date:20171021

[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-21 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8211 )

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..


Patch Set 4:

(4 comments)

Thanks. I think I got all the long lines out.

"git show | egrep '^\+.{91}' | grep -v '^+//'" doesn't return anything any 
more. (Obviously, we're not wrapping the example codegen.)
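
For readers who want the same check outside a shell pipeline, here is a minimal Python sketch of what the egrep/grep pair above appears to do: report added diff lines with more than 90 characters after the leading "+", skipping added lines that begin with "//". The function name long_added_lines is hypothetical and not part of the Impala tree.

```python
import re

# Mirrors: egrep '^\+.{91}' | grep -v '^+//'
# A line matches when '+' is followed by at least 91 more characters.
LONG_ADDED = re.compile(r"^\+.{91}")


def long_added_lines(diff_text):
    """Return added diff lines longer than 90 characters, ignoring '+//' lines."""
    return [
        line
        for line in diff_text.splitlines()
        if LONG_ADDED.match(line) and not line.startswith("+//")
    ]
```

In practice, diff_text would be the output of "git show"; an empty result corresponds to the pipeline printing nothing.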

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.h
File be/src/exec/hdfs-avro-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.h@241
PS3, Line 241:   const SchemaPath& path, const AvroSchemaElement& record, 
int child_start,
> long line
Done


http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@856
PS3, Line 856:   BasicBlock* helper_block = BasicBlock::Create(
> Long line
Done


http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@867
PS3, Line 867:   Function* fnHelper = helper_functions[i];
> Long line.
Done


http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@889
PS3, Line 889:
> long line.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 22 Oct 2017 04:13:44 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-21 Thread Philip Zeyliger (Code Review)

Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8211

to look at the new patch set (#5).

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..

IMPALA-5243: Speed up code gen for wide Avro tables.

HdfsAvroScanner::CodegenMaterializeTuple generates a function whose size
is linear in the number of columns. On 1000-column tables, codegen time is
significant. This commit roughly halves it for wide tables.
(Note that this had been much worse in recent history (<= Impala 2.9).)

It does so by breaking up MaterializeTuple() into multiple smaller
functions, and then calls them in order. When breaking up into
200-column chunks, there is a noticeable speed-up.
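
The chunking idea can be sketched in a few lines of Python. This is only an illustration of splitting N columns into fixed-size ranges and running one helper per range in order; the actual change emits LLVM IR from C++, and the names chunk_ranges, materialize_chunk and materialize_in_chunks are hypothetical.

```python
def chunk_ranges(num_columns, step):
    """Split [0, num_columns) into consecutive half-open ranges of at most `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [
        (start, min(start + step, num_columns))
        for start in range(0, num_columns, step)
    ]


def materialize_chunk(values, start, end):
    """Stand-in for one generated helper: handles columns [start, end)."""
    return [values[i] for i in range(start, end)]


def materialize_in_chunks(values, step):
    """Stand-in for the top-level function: calls each helper in order."""
    out = []
    for start, end in chunk_ranges(len(values), step):
        out.extend(materialize_chunk(values, start, end))
    return out
```

With 1000 columns, a step size of 200 yields five ranges, while a step size of 1000 yields a single range covering every column.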

I've made the helper code for generating LLVM function prototypes
have a mutable function name, so that the builder can be re-used
multiple times.

I've checked by inspecting optimized LLVM that in the case where there's
only 1 helper function, code gets inlined so that there doesn't seem to
be an extra function.

I measured codegen time for various "step sizes." The case where there
are no helper functions is about 2.7s. The best case was about a step
size of 200, with timings of 1.3s.

For the query "select count(int_col16) from 
functional_avro.widetable_1000_cols",
codegen times as a function of step size are roughly as follows. This is
averaged across 5 executions, and rounded to 0.1s.

  step  time (s)
    10  2.4
    50  2.5
    75  2.9
   100  3.0
   125  3.0
   150  1.4
   175  1.3
   200  1.3 <-- chosen step size
   225  1.5
   250  1.4
   300  1.6
   400  1.6
   500  1.8
  1000  2.7

The raw data was generated like so, with some code that let me change the step 
size at runtime:

  $ (for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do
       for try in $(seq 5); do
         echo $step > /tmp/step_size.txt
         echo -n "$step "
         impala-shell.sh -q "select count(int_col16) from functional_avro.widetable_1000_cols; profile;" 2> /dev/null |
           grep -A9 'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' |
           sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '
         echo
       done
     done) | tee out.txt
  ...
  200  CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) 
CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K 
OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms
  ...
  1000  CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) 
CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K 
OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms
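
One way to reduce raw lines like these to a per-step average is sketched below in Python. It assumes each record in out.txt sits on a single line that starts with the step size followed by "CodeGen:(Total: <duration>,", and that durations use the unit suffixes seen above (s, ms, us, ns) plus h and m. The helper names parse_duration and average_by_step are hypothetical; this is not the script used to produce the table.

```python
import re
from collections import defaultdict

# Seconds per unit suffix. "ms" must be tried before "m" and "s" in the regex.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|h|m|s)")
_LINE = re.compile(r"^\s*(\d+)\s+CodeGen:\(Total: (\S+?),")


def parse_duration(text):
    """Convert a duration such as '1s333ms' or '605.320ms' to seconds."""
    parts = _PART.findall(text)
    if not parts:
        raise ValueError("unrecognized duration: %r" % text)
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def average_by_step(lines):
    """Average the CodeGen total per step size, rounded to 0.1s."""
    samples = defaultdict(list)
    for line in lines:
        match = _LINE.match(line)
        if match:
            samples[int(match.group(1))].append(parse_duration(match.group(2)))
    return {step: round(sum(v) / len(v), 1) for step, v in sorted(samples.items())}
```

Calling average_by_step(open("out.txt")) would then give a mapping from step size to mean codegen time in seconds.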

I have run the core tests with this change.

Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
---
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
3 files changed, 158 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/5
--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-21 Thread Philip Zeyliger (Code Review)

Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8211

to look at the new patch set (#4).

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..

IMPALA-5243: Speed up code gen for wide Avro tables.

HdfsAvroScanner::CodegenMaterializeTuple generates a function whose size
is linear in the number of columns. On 1000-column tables, codegen time is
significant. This commit roughly halves it for wide tables.
(Note that this had been much worse in recent history (<= Impala 2.9).)

It does so by breaking up MaterializeTuple() into multiple smaller
functions, and then calls them in order. When breaking up into
200-column chunks, there is a noticeable speed-up.

I've made the helper code for generating LLVM function prototypes
have a mutable function name, so that the builder can be re-used
multiple times.

I've checked by inspecting optimized LLVM that in the case where there's
only 1 helper function, code gets inlined so that there doesn't seem to
be an extra function.

I measured codegen time for various "step sizes." The case where there
are no helper functions is about 2.7s. The best case was about a step
size of 200, with timings of 1.3s.

For the query "select count(int_col16) from 
functional_avro.widetable_1000_cols",
codegen times as a function of step size are roughly as follows. This is
averaged across 5 executions, and rounded to 0.1s.

  step  time (s)
    10  2.4
    50  2.5
    75  2.9
   100  3.0
   125  3.0
   150  1.4
   175  1.3
   200  1.3 <-- chosen step size
   225  1.5
   250  1.4
   300  1.6
   400  1.6
   500  1.8
  1000  2.7

The raw data was generated like so, with some code that let me change the step 
size at runtime:

  $ (for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do
       for try in $(seq 5); do
         echo $step > /tmp/step_size.txt
         echo -n "$step "
         impala-shell.sh -q "select count(int_col16) from functional_avro.widetable_1000_cols; profile;" 2> /dev/null |
           grep -A9 'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' |
           sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '
         echo
       done
     done) | tee out.txt
  ...
  200  CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) 
CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K 
OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms
  ...
  1000  CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) 
CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K 
OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms

I have run the core tests with this change.

Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
---
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
3 files changed, 156 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/4
--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-21 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 2: Code-Review+1

LGTM. not +2ing so others have a chance to weigh in as to whether you have 
addressed their comments.


--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Sat, 21 Oct 2017 22:23:45 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-21 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 2:

(9 comments)

Thanks for the reviews!

I watched memory usage during the load, and on my 32GB machine I always had
~20GB available.

I agree with Alex on adding in more things: there are similar changes that can 
continue to help here, but I'm doing them one at a time.

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@9
PS1, Line 9: This commit loads functional-query, TPC-H data, and TPC-DS data in
> nit: Can you wrap this at the red line provided by gerrit? I think it is 72
Done. "gqip" does it in vi. It looks like it's 72 chars.


http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@12
PS1, Line 12: 13 minut
> nit: minutes
Done


http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@33
PS1, Line 33: 16:14:33Loading workload 'tpcds' using exploration strategy 
'core' OK (Took: 16 min 29 sec)
> What testing did you do? Does the data load still run on a non-beefy local
Define non-beefy?

My desktop is 32 GB and 8 cores. This ran fine.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@480
PS1, Line 480:   # Run some steps in parallel, with run-step-backgroundable / 
run-step-wait-all.
> Could add a comment about what you decided to background and what you decid
Done.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@492
PS1, Line 492: LOAD_NESTED_ARGS="--cm-host $CM_HOST"
> I don't see any reason this also couldn't run in parallel.
Yes, but I've not tested this one.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@505
PS1, Line 505:   load-data "functional-query" "core" "hbase/none"
 : fi
 :
 : if $KUDU_IS_SUPPORTED; then
 :   # Tests depend on the kudu data being clean, so load
> It should be possible to do the same thing for these. That will only save a
Yes. I am testing this one, but I'll do a separate patch for it.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75
PS1, Line 75:   HADOOP_HEAPSIZE="512" hive --service hiveserver2 > 
${LOGDIR}/hive-server2.out 2>&1 &
> :). I'm still using that good-old machine, mem should be fine (fingers cros
512 works, so that's what I've changed it to. I'm not investigating using -Xms 
-Xmx to give this more flexibility (but even less predictability).


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh
File testdata/bin/run-step.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@53
PS1, Line 53:
> nit: only one empty line, to match context
Done


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@84
PS1, Line 84:   RUN_STEP_MSGS=()
> Do you want to reset MSGS, too?
Good catch. Done.



--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Sat, 21 Oct 2017 21:32:51 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-21 Thread Philip Zeyliger (Code Review)

Hello Jim Apple, Joe McDonnell, Alex Behm, Zach Amsden,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8320

to look at the new patch set (#2).

Change subject: IMPALA-6070: Parallel data load.
..

IMPALA-6070: Parallel data load.

This commit loads functional-query, TPC-H data, and TPC-DS data in
parallel. In parallel, these take about 37 minutes, dominated by
functional-query. Serially, these take about 30 minutes more, namely the
13 minutes of tpch and 16 minutes of tpcds. This works out nicely
because CPU usage during data load is very low in aggregate. (We don't
sustain more than 1 CPU of load, whereas build machines are likely to
have many CPUs.)
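
The arithmetic behind these numbers can be sketched in Python using the three "Took:" durations from the log output in this message. The sketch assumes each load would take the same time when run serially as it did in parallel, which this thread does not measure; the function names are hypothetical.

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


# Durations reported by the "Took:" fields in the log output.
DURATIONS = {
    "tpch": to_seconds(13, 27),
    "tpcds": to_seconds(16, 29),
    "functional-query": to_seconds(37, 4),
}


def serial_time(durations):
    """Wall-clock time if the loads run one after another."""
    return sum(durations.values())


def parallel_time(durations):
    """Wall-clock time if all loads start together: the slowest one dominates."""
    return max(durations.values())


saved = serial_time(DURATIONS) - parallel_time(DURATIONS)
print("serial:   %d min %d sec" % divmod(serial_time(DURATIONS), 60))
print("parallel: %d min %d sec" % divmod(parallel_time(DURATIONS), 60))
print("saved:    %d min %d sec" % divmod(saved, 60))
```

Under that assumption the saving is the sum of the two shorter loads, a little under 30 minutes.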

To do this, I added support to run-step.sh to have a notion of a
backgroundable task, and support waiting for all tasks.
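
run-step.sh is a shell script; the Python sketch below only illustrates the general pattern of starting steps in the background and then waiting for all of them. The class and method names are hypothetical and do not mirror the script's actual functions.

```python
import subprocess
import sys


class BackgroundSteps:
    """Start commands in the background, then wait for all of them."""

    def __init__(self):
        self._running = []  # list of (message, Popen) pairs

    def start(self, message, argv):
        """Launch argv without waiting for it; return its pid."""
        proc = subprocess.Popen(argv)
        print("Started %s in background; pid %d." % (message, proc.pid))
        self._running.append((message, proc))
        return proc.pid

    def wait_all(self):
        """Wait for every started step; return the messages of those that failed."""
        failed = [msg for msg, proc in self._running if proc.wait() != 0]
        self._running = []  # reset so the object can be reused for a later batch
        return failed


if __name__ == "__main__":
    steps = BackgroundSteps()
    steps.start("first step", [sys.executable, "-c", "pass"])
    steps.start("second step", [sys.executable, "-c", "pass"])
    print("failed steps:", steps.wait_all())
```

Resetting the bookkeeping inside wait_all means a later batch of steps does not re-wait on, or re-report, steps from an earlier batch.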

I also increased the heapsize of our HiveServer2 server. When datasets
were being loaded in parallel, we ran out of memory at 256MB of heap.

The resulting log output is currently like so (but without the
timestamps):

15:58:04  Started Loading functional-query data in background; pid 8105.
15:58:04  Started Loading TPC-H data in background; pid 8106.
15:58:04  Loading functional-query data (logging to /home/impdev/Impala/logs/data_loading/load-functional-query.log)...
15:58:04  Started Loading TPC-DS data in background; pid 8107.
15:58:04  Loading TPC-H data (logging to /home/impdev/Impala/logs/data_loading/load-tpch.log)...
15:58:04  Loading TPC-DS data (logging to /home/impdev/Impala/logs/data_loading/load-tpcds.log)...
16:11:31  Loading workload 'tpch' using exploration strategy 'core' OK (Took: 13 min 27 sec)
16:14:33  Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 16 min 29 sec)
16:35:08  Loading workload 'functional-query' using exploration strategy 'exhaustive' OK (Took: 37 min 4 sec)

I tested dataloading with the following command on an 8-core, 32GB
machine. I saw 19GB of available memory during my run:
  ./buildall.sh -testdata -build_shared_libs -start_minicluster 
-start_impala_cluster -format

Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
---
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
3 files changed, 44 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/8320/2
--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden

[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero

2017-10-21 Thread Taras Bobrovytsky (Code Review)

Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8344 )

Change subject: IMPALA-5018: Error on decimal modulo or divide by zero
..


Patch Set 3:

I spoke to Greg and Alex yesterday, and we agreed that decimal_v2 should always 
be in strict mode. This means that when decimal_v2 is enabled, we should always 
error on overflows and division by zero. This is the behavior that more 
traditional databases have.
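
As a rough illustration of what "strict mode" means for division and modulo, here is a small Python sketch. It is not Impala code: the function names and the error class are hypothetical, and the non-strict branch returning None (standing in for SQL NULL) is an assumption made for contrast, not something stated in this thread.

```python
from decimal import Decimal


class DecimalArithmeticError(ArithmeticError):
    """Hypothetical error type raised in strict mode."""


def _check_divisor(divisor, strict, what):
    """Return True if the operation may proceed, False if it should yield NULL."""
    if divisor != 0:
        return True
    if strict:
        raise DecimalArithmeticError("cannot %s decimal by zero" % what)
    return False  # assumption: the lenient mode yields NULL instead of an error


def decimal_divide(a, b, strict=True):
    return a / b if _check_divisor(b, strict, "divide") else None


def decimal_mod(a, b, strict=True):
    return a % b if _check_divisor(b, strict, "take modulo of") else None
```

With strict=True, which is what the comment above proposes whenever decimal_v2 is enabled, a zero divisor always surfaces as an error rather than a silent NULL.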


--
To view, visit http://gerrit.cloudera.org:8080/8344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Gerrit-Change-Number: 8344
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Sat, 21 Oct 2017 19:15:01 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-21 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 21 Oct 2017 10:24:49 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-21 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..

IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

This change only affects uncompressed plain-encoded Parquet where
RowBatches may directly reference strings stored in the I/O
buffers. The proposed fix is to simply copy the data pages if
needed then use the same logic that we use for decompressed data
pages.

This copy inevitably adds some CPU overhead, but I believe this is
acceptable because:
* We generally recommend using compression, and optimize for that
  case.
* Copying memory is cheaper than decompressing data.
* Scans of uncompressed data are very likely to be I/O bound.

This allows several major simplifications:
* The resource management for compressed and uncompressed
  scans is much more similar.
* We don't need to attach Disk I/O buffers to RowBatches.
* We don't need to deal with attaching I/O buffers in
  ScannerContext.
* Column readers can release each I/O buffer *before* advancing to
  the next one, making it easier to reason about resource
  consumption. E.g. each Parquet column only needs one I/O buffer at
  a time to make progress.

Future changes will apply to Avro, Sequence Files and Text. Once
all scanners are converted, ScannerContext::contains_tuple_data_
will always be false and we can remove some dead code.

Testing
=======
Ran core ASAN and exhaustive debug builds.

Perf
====
No difference in most cases when scanning uncompressed parquet.
There is a significant regression (50% increase in runtime) in
targeted perf tests scanning non-dictionary-encoded strings (see
benchmark output below). After the regression, performance is
comparable to that with Snappy compression.

I also did a TPC-H run but ran into some issues with the report
generator. I manually compared times and there were no regressions.

+--------------------+-----------------------+---------+------------+------------+----------------+
| Workload           | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+--------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(_61) | parquet / none / none | 23.02   | +0.60%     | 4.23       | +5.97%         |
+--------------------+-----------------------+---------+------------+------------+----------------+

+--------------------+--------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload           | Query              | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+--------------------+--------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TARGETED-PERF(_61) | PERF_STRING-Q2     | parquet / none / none | 3.00   | 1.98        | R +52.10%  | 0.97%     | 1.25%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q1     | parquet / none / none | 2.86   | 1.92        | R +49.11%  | 0.34%     | 2.34%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q3     | parquet / none / none | 3.16   | 2.15        | R +47.04%  | 1.03%     | 0.72%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q4     | parquet / none / none | 3.16   | 2.17        | R +45.60%  | 0.14%     | 1.11%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q5     | parquet / none / none | 3.51   | 2.55        | R +37.88%  | 0.83%     | 0.49%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_AGG-Q5        | parquet / none / none | 0.79   | 0.61        | R +30.86%  | 1.54%     | 4.10%          | 1           | 5     |
| TARGETED-PERF(_61) | primitive_top-n_al | parquet / none / none | 39.45  | 35.07       |   +12.51%  | 0.29%     | 0.29%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q7     | parquet / none / none | 6.78   | 6.10        |   +11.13%  | 0.99%     | 0.74%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q6     | parquet / none / none | 8.83   | 8.14        |   +8.52%   | 0.15%     | 0.32%          | 1           | 5     |
...

Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Reviewed-on: http://gerrit.cloudera.org:8080/8085
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/scanner-context.h
4 files changed, 66 insertions(+), 53 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit

[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src.

2017-10-21 Thread Michael Ho (Code Review)

Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8305 )

Change subject: IMPALA-5599: Clean up references to TimestampValue in be/src.
..


Patch Set 7: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8305/7/be/src/util/time.h
File be/src/util/time.h:

http://gerrit.cloudera.org:8080/#/c/8305/7/be/src/util/time.h@106
PS7, Line 106: /// Convenience function to convert current time in Unix 
microseconds to date-time string
Convenience function to return the date-time string of the current time derived 
from UnixMicros().


http://gerrit.cloudera.org:8080/#/c/8305/7/be/src/util/time.h@107
PS7, Line 107: NowMicrosToString
How about CurrentTimeString() ?



--
To view, visit http://gerrit.cloudera.org:8080/8305
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
Gerrit-Change-Number: 8305
Gerrit-PatchSet: 7
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Sat, 21 Oct 2017 07:05:18 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-21 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1363/


--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 21 Oct 2017 06:28:36 +
Gerrit-HasComments: No
