[Impala-ASF-CR] IMPALA-6714: [DOCS] ORC file format support

2018-05-28 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10525


Change subject: IMPALA-6714: [DOCS] ORC file format support
..

IMPALA-6714: [DOCS] ORC file format support

This document is wrote refering to RCFile and Parquet's docs.

The orc-support patch was merged in impala-2.12 and impala-3.0,
so we start to support ORC format as an experimental feature
since impala-2.12.

Change-Id: Ib1ee23ed844653c274babdce5a332dbe5c79b630
---
M docs/impala.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_file_formats.xml
A docs/topics/impala_orc.xml
4 files changed, 296 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/10525/1
--
To view, visit http://gerrit.cloudera.org:8080/10525
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib1ee23ed844653c274babdce5a332dbe5c79b630
Gerrit-Change-Number: 10525
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3307: Add support for IANA time-zone db

2018-05-28 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
..


Patch Set 10:

I do not see any more low hanging fruits for performance improvement. Some 
overhead could be removed by modifying CCTZ, but this is out of the scope of 
this change, so I created a follow up Jira:
IMPALA-7085:
Consider patching Google/CCTZ for Impala's need


--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 28 May 2018 16:20:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7082: Show human readable size in query backend page

2018-05-28 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/10523 )

Change subject: IMPALA-7082: Show human readable size in query backend page
..

IMPALA-7082: Show human readable size in query backend page

This patch reuse the JS function introduced in IMPALA-6966,
which can convert Long size value into human readable size,
to render the "Peak mem. consumption" column in the Query
details's Backends page.

Change-Id: I04afb4091bb8b6bc9dedfeceaf9284a8c65b16a1
---
M www/catalog.tmpl
M www/common-header.tmpl
M www/query_backends.tmpl
3 files changed, 16 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/10523/2
--
To view, visit http://gerrit.cloudera.org:8080/10523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I04afb4091bb8b6bc9dedfeceaf9284a8c65b16a1
Gerrit-Change-Number: 10523
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 


[Impala-ASF-CR] IMPALA-5706: Spilling sort optimisations

2018-05-28 Thread Gabor Kaszab (Code Review)
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9943

to look at the new patch set (#17).

Change subject: IMPALA-5706: Spilling sort optimisations
..

IMPALA-5706: Spilling sort optimisations

This patch covers multiple changes with the purpose of optimizing
spilling sort mechanism:
  - Remove the hard-coded maximum limit of buffers that can be used
for merging the sorted runs. Instead this number is calculated
based on the available memory through buffer pool.
  - The already sorted runs are distributed more optimally between
the last intermediate merge and the final merge to avoid that a
heavy intermediate merge is followed by a light final merge.
  - Right before starting the merging phase Sorter tries to allocate
additional memory through the buffer pool.
  - An output run is not allocated anymore for the final merge.

Note, double-buffering the runs during a merge was also planned with
this patch. However, performance testing showed that except some
exotic queries with unreasonably small amount of buffer pool memory
available double-buffering doesn't add to the overall performance.
It's basically because the half of the available buffers have to be
sacrificed to do double-buffering and as a result the merge tree can
get deeper. In addition the amount of I/O wait time is not reaching
the level where double-buffering could countervail the reduced number
of runs during a particular merge.

Performance measurements were made during manual testing to verify
that this is in fact an optimization:
  - In case doing a sort on top of a join when working with a
restricted amount of memory then the Sort node successfully
allocates additional memory right before the merging phase. This
is feasible because once Join finishes sending new input data and
calls InputDone() then it releases memory that can be picked up
by the Sorter. This results in shallower merging trees (more runs
grabbed for a merge).
  - On a multi-node cluster I verified that in cases when at least one
merging step is done then this change reduces the execution time
for sorts.
  - The more merging steps are done the bigger the performance gain is
compared to the baseline.

Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
---
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M tests/query_test/test_sort.py
3 files changed, 149 insertions(+), 120 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/9943/17
--
To view, visit http://gerrit.cloudera.org:8080/9943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Gerrit-Change-Number: 9943
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5706: Spilling sort optimisations

2018-05-28 Thread Gabor Kaszab (Code Review)
Hello Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9943

to look at the new patch set (#16).

Change subject: IMPALA-5706: Spilling sort optimisations
..

IMPALA-5706: Spilling sort optimisations

This patch covers multiple changes with the purpose of optimizing
spilling sort mechanism:
  - Remove the hard-coded maximum limit of buffers that can be used
for merging the sorted runs. Instead this number is calculated
based on the available memory through buffer pool.
  - The already sorted runs are distributed more optimally between
the last intermediate merge and the final merge to avoid that a
heavy intermediate merge is followed by a light final merge.
  - Right before starting the merging phase Sorter tries to allocate
additional memory through the buffer pool.
  - An output run is not allocated anymore for the final merge.

Note, double-buffering the runs during a merge was also planned with
this patch. However, performance testing showed that except some
exotic queries with unreasonably small amount of buffer pool memory
available double-buffering doesn't add to the overall performance.
It's basically because the half of the available buffers have to be
sacrificed to do double-buffering and as a result the merge tree can
get deeper. In addition the amount of I/O wait time is not reaching
the level where double-buffering could countervail the reduced number
of runs during a particular merge.

Performance measurements were made during manual testing to verify
that this is in fact an optimization:
  - In case doing a sort on top of a join when working with a
restricted amount of memory then the Sort node successfully
allocates additional memory right before the merging phase. This
is feasible because once Join finishes sending new input data and
calls InputDone() then it releases memory that can be picked up
by the Sorter. This results in shallower merging trees (more runs
grabbed for a merge).
  - On a multi-node cluster I verified that in cases when at least one
merging step is done then this change reduces the execution time
for sorts.
  - The more merging steps are done the bigger the performance gain is
compared to the baseline.

Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
---
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A 
testdata/workloads/functional-query/queries/QueryTest/partitions-with-same-location.test
M tests/query_test/test_sort.py
4 files changed, 149 insertions(+), 120 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/9943/16
--
To view, visit http://gerrit.cloudera.org:8080/9943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Gerrit-Change-Number: 9943
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong