Please update the product docs (if not already done). We should also write an article.
--Srinath

On Tue, Sep 13, 2016 at 2:08 PM, Gokul Balakrishnan <go...@wso2.com> wrote:

Hi all,

The objective of this mail is to summarise the results of the recently conducted performance test round for DAS 3.1.0.

These tests were intended to measure the throughput of the batch and interactive analytics capabilities of DAS under different conditions; namely data persistence, Spark analytics job execution and indexing. For this purpose, we've used DAS 3.1.0 RC3 instances backed by an Apache HBase cluster running on HDFS as the data store, tuned for writes.

This test round was conducted on Amazon EC2 nodes, in the following configuration:

3 DAS nodes (variable roles: publisher, receiver, analyzer and indexer): c4.2xlarge
1 HBase master + Hadoop NameNode: c3.2xlarge
9 HBase RegionServers + Hadoop DataNodes: c3.2xlarge

*1. Persisting 1 billion events from the Smart Home DAS sample*

This test was designed to exercise the data layer during sustained event publication. During testing, the TPS hovered around the 150K mark, though the HBase cluster's memstore flushes (which suspend all writes) and minor compaction operations brought it down somewhat in bursts. Overall, we were able to achieve a mean of 96K TPS; a steady rate of around 100-150K TPS should be achievable with flow control in place, as opposed to the current no-flow-control situation.

The published data took around 950GB on the Hadoop filesystem, taking HDFS-level replication into account.

Events      1000000000
Time (s)    10391.768
Mean TPS    96230.01591

*2. Analyzing 1 billion events through Spark*

Spark queries from the Smart Home DAS sample were executed against the published data, with the analyzer node count kept at 2 and 3 respectively for two separate tests. We'd given 6 processor cores and 12GB of dedicated memory to the Spark JVM during this test, and were able to get a throughput of over 1M TPS on Spark for 2 analyzers and about 1.3M TPS for 3 analyzers.
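As a quick aside for readers of the tables below: every "Mean TPS" figure in this report is simply the event count divided by the elapsed wall-clock time. A minimal sketch, using the test 1 (Smart Home persistence) numbers from the table above:

```python
# Back-of-envelope check of the "Mean TPS" figure for test 1:
# mean TPS is total events divided by elapsed wall-clock seconds.
events = 1_000_000_000      # events published in test 1
elapsed_s = 10391.768       # total wall-clock time in seconds

mean_tps = events / elapsed_s
print(f"Mean TPS: {mean_tps:.2f}")   # agrees with the reported 96230.01591
```

The same arithmetic applies to the Spark and indexing throughput figures later in this mail.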
DAS read operations from the HBase cluster also leverage HBase data locality, which would have made the read process more efficient compared to random reads.

The mean throughput readings from 3 tests in each case, with a query involving aggregate functions and GROUP BY, are as follows:

INSERT OVERWRITE TABLE cityUsage SELECT metro_area, avg(power_reading) AS avg_usage,
min(power_reading) AS min_usage, max(power_reading) AS max_usage FROM smartHomeData GROUP BY metro_area;

            2 Analyzer Nodes    3 Analyzer Nodes
Records     1000000000          1000000000
Time (s)    958.802             741.152
Mean TPS    1042968.204         1349250.896

*3. Persisting the entire Wikipedia corpus*

This test involved publishing the entirety of the Wikipedia dataset, where a single event comprises one Wiki article (16.8M articles in total). Events vary greatly in size, with the mean being ~3.5KB; hence, the throughput also varies greatly as expected. Here, we were able to see a mean throughput of around 9K TPS:

Events      16753779
Time (s)    1862.901
Mean TPS    8993.381291

*4. Indexing the full Wikipedia dataset*

In this test, the data from the Wikipedia dataset was indexed, whereby the articles would support full-text search through Lucene. Index worker counts of 2 and 4 were tested, and 2 dedicated indexer nodes were used to run the indexing jobs independently of each other.

The TPS vs. time graph of the first indexer node with 4 dedicated index worker threads is as below:

[Graph attached in the original mail; not reproduced in the plain-text archive.]

The overall results from both indexer nodes can be summarised as below:

Records     16753779

Node        2 Worker threads    4 Worker threads
Indexer 1   2198.66 TPS         2268.62 TPS
Indexer 2   4230.75 TPS         3048.91 TPS

*5. Analyzing the Wikipedia dataset*

Similar to the Smart Home dataset, Spark queries were run against the published Wikipedia dataset, using analyzer clusters of 2 and 3 nodes respectively.
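Before the Wikipedia analytics results, a brief aside on the Spark scaling observed in test 2 above: the throughput gain from adding a third analyzer node can be expressed as a scaling efficiency. A minimal sketch using only the mean TPS figures reported in the test 2 table:

```python
# Scaling efficiency of the Spark batch analytics in test 2,
# computed from the reported mean TPS figures (1B-event Smart Home run).
tps_2_nodes = 1_042_968.204   # mean TPS with 2 analyzer nodes
tps_3_nodes = 1_349_250.896   # mean TPS with 3 analyzer nodes

speedup = tps_3_nodes / tps_2_nodes   # observed speedup going from 2 to 3 nodes
ideal_speedup = 3 / 2                 # perfect linear scaling
efficiency = speedup / ideal_speedup

print(f"Observed speedup: {speedup:.3f}x (ideal {ideal_speedup:.1f}x)")
print(f"Scaling efficiency: {efficiency:.1%}")
```

This works out to roughly 86% of linear scaling, which seems plausible given the shuffle and coordination overhead a GROUP BY query incurs.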
The results of one of these tests are as follows:

INSERT INTO TABLE wikiContributorSummary SELECT contributor_username,
COUNT(*) AS page_count FROM wiki GROUP BY contributor_username;

Records     16753779

            2 Analyzer Nodes    4 Analyzer Nodes
Time (s)    236.107             181.419
TPS         70958.41716         92348.53571

The full findings of this test may be found in the attached spreadsheet.

Best regards,

Testing DAS 3.1.0 Performance on a 10-node HBas...
<https://docs.google.com/a/wso2.com/spreadsheets/d/1Ng7pTR0MpSg3Asn02idBIZaq8AhdNd24HySKp37GzK8/edit?usp=drive_web>

--
Gokul Balakrishnan
Senior Software Engineer,
WSO2, Inc. http://wso2.com
M +94 77 5935 789 | +44 7563 570502

--
============================
Srinath Perera, Ph.D.
http://people.apache.org/~hemapani/
http://srinathsview.blogspot.com/
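For completeness, the throughput figures across the tables above all follow the same records-over-time arithmetic; a small sketch recomputing a few of them from the reported counts and durations (all numbers copied from the tables in this mail):

```python
# Recompute reported mean TPS from record count and duration for several
# of the runs above: (records, elapsed seconds, reported TPS).
runs = {
    "Smart Home persistence":       (1_000_000_000, 10391.768, 96230.016),
    "Wikipedia persistence":        (16_753_779,    1862.901,  8993.381),
    "Wikipedia Spark, 2 analyzers": (16_753_779,    236.107,   70958.417),
    "Wikipedia Spark, 4 analyzers": (16_753_779,    181.419,   92348.536),
}

for name, (records, seconds, reported_tps) in runs.items():
    computed = records / seconds
    # each recomputed value agrees with its table to well under 0.1%
    assert abs(computed - reported_tps) / reported_tps < 1e-3
    print(f"{name}: {computed:,.1f} TPS")
```

All four recomputed values match the tables, so the spreadsheet figures are internally consistent.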
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture