[jira] [Commented] (TRAFODION-269) LP Bug: 1319524 - Update Statistics performance is poor.

Atanu Mishra (JIRA) Sun, 23 Aug 2015 18:46:30 -0700

    [ 
https://issues.apache.org/jira/browse/TRAFODION-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708641#comment-14708641
 ]


Atanu Mishra commented on TRAFODION-269:
----------------------------------------

Barry Fritchman (barry-fritchman) wrote on 2014-05-30:  #1
The performance degradation relative to Seaquest is unquestionable, but I'm not 
so sure it is limited to Update Stats. I tried using the same query we use to 
retrieve data from which histograms are constructed for a given column outside 
the ustat context, i.e., directly from sqlci. The query is

SELECT FMTVAL, SUMVAL FROM (SELECT d, TRIM(TRAILING FROM CAST(d AS VARCHAR(30) 
CHARACTER SET UCS2)), COUNT(*) FROM cube2 GROUP BY d FOR READ UNCOMMITTED 
ACCESS) T(d, FMTVAL, SUMVAL) ORDER BY d;

where d is the column name in this case.

On a 1-million row table with 4 partitions, the execution time for this query 
in Trafodion was 130 seconds, whereas it takes about 2.5 seconds in Seaquest. 
The plans were very similar in both environments.

Barry Fritchman (barry-fritchman) wrote on 2014-05-31:  #2
I asked Taoufik to try to reproduce the results I referred to previously, since 
I ran the SQ and Traf sides of the test on different workstations. His results 
for SQ were the same, but the time on Trafodion was significantly faster -- 20 
seconds as opposed to my 130 seconds. So the disparity is not as severe as I 
first thought, but still represents an order of magnitude difference, which 
roughly parallels the performance difference for Update Stats on SQ/Traf. For 
reference, here are the plans used for SQ and Traf.

Plan for SQ:

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

8 . 9 root 1.00E+001
7 . 8 esp_exchange sm 1:4(hash2) (m) 1.00E+001
6 . 7 sort_partial_groupby 1.00E+001
5 . 6 sort 1.00E+001
4 . 5 esp_exchange sm 4(hash2):4(hash2) 1.00E+001
3 . 4 hash_partial_groupby 1.00E+001
2 . 3 esp_exchange sm 4(hash2):1 1.00E+006
1 . 2 partition_access 1.00E+006
. . 1 file_scan fs fr CUBE2 1.00E+006

Plan for Traf:

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

6 . 7 root 1.00E+001
5 . 6 sort_partial_groupby 1.00E+001
4 . 5 sort 1.00E+001
3 . 4 esp_exchange 1:3(hash2) 1.00E+001
2 . 3 hash_partial_groupby 1.00E+001
1 . 2 esp_exchange 3(hash2):2(range) 1.00E+006
. . 1 trafodion_scan CUBE2 1.00E+006

Taoufik tinkered with the degree of esp parallelism, but the original plan 
above was the fastest.

Stacey Johnson (sjohnson-w) on 2014-06-10
information type:       Proprietary → Public
Barry Fritchman (barry-fritchman) wrote on 2014-07-07:  #3
In addition to the improvements listed above, a change was made to perform 
sampling in the hbase layer instead of in Trafodion, which greatly reduces the 
number of rows returned from hbase. For the default 1% sampling rate, 
performance improvements of 2-4x were observed. Although performance 
improvement is an ongoing task for Trafodion, with these changes I believe the 
situation has improved sufficiently that this defect can be closed.

Changed in trafodion:
status: In Progress → Fix Committed
Julie Thai (julie-y-thai) on 2014-08-06
Changed in trafodion:
status: Fix Committed → Fix Released


> LP Bug: 1319524 - Update Statistics performance is poor.
> --------------------------------------------------------
>
>                 Key: TRAFODION-269
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-269
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>            Reporter: Barry Fritchman
>            Assignee: Barry Fritchman
>            Priority: Critical
>
> Update Statistics currently takes several times longer to complete than for 
> Seaquest.  The performance was improved somewhat by the partial fix for 
> LP1301023, which caused the correct plan to be used for internally generated 
> queries, but the performance is still not adequate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TRAFODION-269) LP Bug: 1319524 - Update Statistics performance is poor.

Reply via email to