[Impala-ASF-CR] IMPALA-6714: [DOCS] ORC file format support
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10525 )

Change subject: IMPALA-6714: [DOCS] ORC file format support

IMPALA-6714: [DOCS] ORC file format support

This document was written with reference to the RCFile and Parquet docs. The orc-support patch was merged in impala-2.12 and impala-3.0, so ORC format is supported as an experimental feature starting with impala-2.12.

Change-Id: Ib1ee23ed844653c274babdce5a332dbe5c79b630
---
M docs/impala.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_file_formats.xml
A docs/topics/impala_orc.xml
4 files changed, 296 insertions(+), 3 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/10525/1
--
To view, visit http://gerrit.cloudera.org:8080/10525
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib1ee23ed844653c274babdce5a332dbe5c79b630
Gerrit-Change-Number: 10525
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db

Patch Set 10:

I do not see any more low-hanging fruit for performance improvement. Some overhead could be removed by modifying CCTZ, but that is out of scope for this change, so I created a follow-up Jira: IMPALA-7085: Consider patching Google/CCTZ for Impala's need

--
To view, visit http://gerrit.cloudera.org:8080/9986

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges
Gerrit-Reviewer: Attila Jeges
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Mon, 28 May 2018 16:20:05 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7082: Show human readable size in query backend page
Quanlong Huang has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/10523 )

Change subject: IMPALA-7082: Show human readable size in query backend page

IMPALA-7082: Show human readable size in query backend page

This patch reuses the JS function introduced in IMPALA-6966, which converts a Long size value into a human-readable size, to render the "Peak mem. consumption" column on the Backends page of the query details.

Change-Id: I04afb4091bb8b6bc9dedfeceaf9284a8c65b16a1
---
M www/catalog.tmpl
M www/common-header.tmpl
M www/query_backends.tmpl
3 files changed, 16 insertions(+), 14 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/10523/2
--
To view, visit http://gerrit.cloudera.org:8080/10523

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I04afb4091bb8b6bc9dedfeceaf9284a8c65b16a1
Gerrit-Change-Number: 10523
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
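For readers unfamiliar with this kind of conversion, here is a minimal sketch of turning a raw byte count into a human-readable size. It is written in Python for illustration only. The actual helper from IMPALA-6966 is a JS function that is not shown in this message, so the name `human_readable_size`, the 1024-based units, and the two-decimal formatting are all assumptions.

```python
def human_readable_size(num_bytes, precision=2):
    """Format a byte count with a binary (1024-based) unit suffix.

    Hypothetical helper: the name, unit labels and precision are
    assumptions, not the behaviour of the JS function in Impala.
    """
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(num_bytes)
    index = 0
    # Divide by 1024 until the value fits the current unit
    # or we run out of larger units.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.{precision}f} {units[index]}"


if __name__ == "__main__":
    for value in (0, 1536, 1024 ** 3):
        print(value, "->", human_readable_size(value))
```

Rendering the formatted string in the page while keeping the raw number available for sorting is a common design choice for such columns. Whether the patch does so is not stated here.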
[Impala-ASF-CR] IMPALA-5706: Spilling sort optimisations
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9943

to look at the new patch set (#17).

Change subject: IMPALA-5706: Spilling sort optimisations

IMPALA-5706: Spilling sort optimisations

This patch covers multiple changes with the purpose of optimizing the spilling sort mechanism:
- Remove the hard-coded maximum limit on the number of buffers that can be used for merging the sorted runs. Instead, this number is calculated based on the memory available through the buffer pool.
- The already sorted runs are distributed more evenly between the last intermediate merge and the final merge, to avoid a heavy intermediate merge being followed by a light final merge.
- Right before starting the merging phase, the Sorter tries to allocate additional memory through the buffer pool.
- An output run is no longer allocated for the final merge.

Note that double-buffering the runs during a merge was also planned for this patch. However, performance testing showed that, except for some exotic queries with an unreasonably small amount of buffer pool memory available, double-buffering doesn't improve overall performance. This is basically because half of the available buffers have to be sacrificed to do double-buffering, and as a result the merge tree can get deeper. In addition, the I/O wait time does not reach the level where double-buffering could offset the reduced number of runs in a particular merge.

Performance measurements were made during manual testing to verify that this is in fact an optimization:
- When doing a sort on top of a join with a restricted amount of memory, the Sort node successfully allocates additional memory right before the merging phase. This is feasible because once the Join finishes sending new input data and calls InputDone(), it releases memory that can be picked up by the Sorter. This results in shallower merging trees (more runs grabbed for a merge).
- On a multi-node cluster I verified that when at least one merging step is done, this change reduces the execution time for sorts.
- The more merging steps are done, the bigger the performance gain compared to the baseline.

Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
---
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M tests/query_test/test_sort.py
3 files changed, 149 insertions(+), 120 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/9943/17
--
To view, visit http://gerrit.cloudera.org:8080/9943

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Gerrit-Change-Number: 9943
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong
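The run distribution described in the commit message above can be illustrated with a small planning sketch. This is not the code in be/src/runtime/sorter.cc. The function names `max_runs_per_merge` and `plan_merges`, and the assumption that each input run needs exactly one buffer, are hypothetical simplifications. The idea shown is that the merge width follows from available memory rather than a hard-coded cap, and that the last intermediate merge takes only as many runs as needed so the final merge is as wide as possible.

```python
def max_runs_per_merge(available_bytes, buffer_bytes, needs_output_run):
    """Number of runs one merge can take, derived from available memory.

    Assumption: one buffer per input run, plus one buffer for the output
    run of an intermediate merge. The final merge needs no output run.
    """
    buffers = available_bytes // buffer_bytes
    return buffers - 1 if needs_output_run else buffers


def plan_merges(num_runs, max_final, max_intermediate):
    """Return the width of every merge; the last entry is the final merge."""
    if max_intermediate < 2 or max_final < 1:
        raise ValueError("an intermediate merge must combine at least 2 runs")
    steps = []
    while num_runs > max_final:
        # Runs that must disappear before the final merge can take the rest.
        excess = num_runs - max_final
        # Merging k runs into one removes k - 1 runs, so never merge more
        # than excess + 1: that keeps the final merge as wide as possible.
        k = min(max_intermediate, excess + 1)
        steps.append(k)
        num_runs -= k - 1
    steps.append(num_runs)
    return steps


if __name__ == "__main__":
    # 5 sorted runs, final merge fits 4, intermediate merges fit 3.
    # A greedy plan would do [3, 3]; this plan does a light merge first.
    print(plan_merges(5, max_final=4, max_intermediate=3))
```

With 5 runs, a final width of 4 and an intermediate width of 3, a greedy plan merges 3 runs and then only 3 in the final merge, while the sketch merges 2 and then 4. Fewer rows are rewritten to an intermediate run as a result.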
[Impala-ASF-CR] IMPALA-5706: Spilling sort optimisations
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9943

to look at the new patch set (#16).

Change subject: IMPALA-5706: Spilling sort optimisations

IMPALA-5706: Spilling sort optimisations

This patch covers multiple changes with the purpose of optimizing the spilling sort mechanism:
- Remove the hard-coded maximum limit on the number of buffers that can be used for merging the sorted runs. Instead, this number is calculated based on the memory available through the buffer pool.
- The already sorted runs are distributed more evenly between the last intermediate merge and the final merge, to avoid a heavy intermediate merge being followed by a light final merge.
- Right before starting the merging phase, the Sorter tries to allocate additional memory through the buffer pool.
- An output run is no longer allocated for the final merge.

Note that double-buffering the runs during a merge was also planned for this patch. However, performance testing showed that, except for some exotic queries with an unreasonably small amount of buffer pool memory available, double-buffering doesn't improve overall performance. This is basically because half of the available buffers have to be sacrificed to do double-buffering, and as a result the merge tree can get deeper. In addition, the I/O wait time does not reach the level where double-buffering could offset the reduced number of runs in a particular merge.

Performance measurements were made during manual testing to verify that this is in fact an optimization:
- When doing a sort on top of a join with a restricted amount of memory, the Sort node successfully allocates additional memory right before the merging phase. This is feasible because once the Join finishes sending new input data and calls InputDone(), it releases memory that can be picked up by the Sorter. This results in shallower merging trees (more runs grabbed for a merge).
- On a multi-node cluster I verified that when at least one merging step is done, this change reduces the execution time for sorts.
- The more merging steps are done, the bigger the performance gain compared to the baseline.

Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
---
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A testdata/workloads/functional-query/queries/QueryTest/partitions-with-same-location.test
M tests/query_test/test_sort.py
4 files changed, 149 insertions(+), 120 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/9943/16
--
To view, visit http://gerrit.cloudera.org:8080/9943

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Gerrit-Change-Number: 9943
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong