[ https://issues.apache.org/jira/browse/TRAFODION-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708641#comment-14708641 ]
Atanu Mishra commented on TRAFODION-269: ---------------------------------------- Barry Fritchman (barry-fritchman) wrote on 2014-05-30: #1 The performance degradation relative to Seaquest is unquestionable, but I'm not so sure it is limited to Update Stats. I tried using the same query we use to retrieve data from which histograms are constructed for a given column outside the ustat context, i.e., directly from sqlci. The query is SELECT FMTVAL, SUMVAL FROM (SELECT d, TRIM(TRAILING FROM CAST(d AS VARCHAR(30) CHARACTER SET UCS2)), COUNT(*) FROM cube2 GROUP BY d FOR READ UNCOMMITTED ACCESS) T(d, FMTVAL, SUMVAL) ORDER BY d; where d is the column name in this case. On a 1-million row table with 4 partitions, the execution time for this query in Trafodion was 130 seconds, whereas it takes about 2.5 seconds in Seaquest. The plans were very similar in both environments. Barry Fritchman (barry-fritchman) wrote on 2014-05-31: #2 I asked Taoufik to try to reproduce the results I referred to previously, since I ran the SQ and Traf sides of the test on different workstations. His results for SQ were the same, but the time on Trafodion was significantly faster -- 20 seconds as opposed to my 130 seconds. So the disparity is not as severe as I first thought, but still represents an order of magnitude difference, which roughly parallels the performance difference for Update Stats on SQ/Traf. For reference, here are the plans used for SQ and Traf. Plan for SQ: LC RC OP OPERATOR OPT DESCRIPTION CARD ---- ---- ---- -------------------- -------- -------------------- --------- 8 . 9 root 1.00E+001 7 . 8 esp_exchange sm 1:4(hash2) (m) 1.00E+001 6 . 7 sort_partial_groupby 1.00E+001 5 . 6 sort 1.00E+001 4 . 5 esp_exchange sm 4(hash2):4(hash2) 1.00E+001 3 . 4 hash_partial_groupby 1.00E+001 2 . 3 esp_exchange sm 4(hash2):1 1.00E+006 1 . 2 partition_access 1.00E+006 . . 1 file_scan fs fr CUBE2 1.00E+006 Plan for Traf: LC RC OP OPERATOR OPT DESCRIPTION CARD ---- ---- ---- -------------------- -------- -------------------- --------- 6 . 7 root 1.00E+001 5 . 6 sort_partial_groupby 1.00E+001 4 . 5 sort 1.00E+001 3 . 4 esp_exchange 1:3(hash2) 1.00E+001 2 . 3 hash_partial_groupby 1.00E+001 1 . 2 esp_exchange 3(hash2):2(range) 1.00E+006 . . 1 trafodion_scan CUBE2 1.00E+006 Taoufik tinkered with the degree of esp parallelism, but the original plan above was the fastest. Stacey Johnson (sjohnson-w) on 2014-06-10 information type: Proprietary → Public Barry Fritchman (barry-fritchman) wrote on 2014-07-07: #3 In addition to the improvements listed above, a change was made to perform sampling in the hbase layer instead of in Trafodion, which greatly reduces the number of rows returned from hbase. For the default 1% sampling rate, performance improvements of 2-4x were observed. Although performance improvement is an ongoing task for Trafodion, with these changes I believe the situation has improved sufficiently that this defect can be closed. Changed in trafodion: status: In Progress → Fix Committed Julie Thai (julie-y-thai) on 2014-08-06 Changed in trafodion: status: Fix Committed → Fix Released > LP Bug: 1319524 - Update Statistics performance is poor. > -------------------------------------------------------- > > Key: TRAFODION-269 > URL: https://issues.apache.org/jira/browse/TRAFODION-269 > Project: Apache Trafodion > Issue Type: Bug > Components: sql-cmp > Reporter: Barry Fritchman > Assignee: Barry Fritchman > Priority: Critical > > Update Statistics currently takes several times longer to complete than for > Seaquest. The performance was improved somewhat by the partial fix for > LP1301023, which caused the correct plan to be used for internally generated > queries, but the performance is still not adequate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)