[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8211 )

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..

Patch Set 4: (4 comments)

Thanks. I think I got all the long lines out. "git show | egrep '^\+.{91}' | grep -v '^+//'" doesn't return anything any more. (Obviously, we're not wrapping the example codegen.)

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.h
File be/src/exec/hdfs-avro-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.h@241
PS3, Line 241: const SchemaPath& path, const AvroSchemaElement& record, int child_start,
> long line
Done

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@856
PS3, Line 856: BasicBlock* helper_block = BasicBlock::Create(
> Long line
Done

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@867
PS3, Line 867: Function* fnHelper = helper_functions[i];
> Long line.
Done

http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@889
PS3, Line 889:
> long line.
Done

--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger
Gerrit-Reviewer: Philip Zeyliger
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Sun, 22 Oct 2017 04:13:44 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8211 to look at the new patch set (#5).

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..

IMPALA-5243: Speed up code gen for wide Avro tables.

HdfsAvroScanner::CodegenMaterializeTuple generates a function whose size is linear in the number of columns. On 1000-column tables, codegen time is significant. This commit roughly halves it for wide tables. (Note that this had been much worse in recent history, in Impala 2.9 and earlier.)

It does so by breaking up MaterializeTuple() into multiple smaller functions and then calling them in order. When breaking up into 200-column chunks, there is a noticeable speed-up.

I've made the helper code for generating LLVM function prototypes accept a mutable function name, so that the builder can be re-used multiple times. I've checked, by inspecting the optimized LLVM, that in the case where there's only one helper function the code gets inlined, so there doesn't seem to be an extra function.

I measured codegen time for various "step sizes." The case where there are no helper functions takes about 2.7s. The best case was a step size of about 200, with timings of 1.3s. For the query "select count(int_col16) from functional_avro.widetable_1000_cols", codegen times as a function of step size are roughly as follows, averaged across 5 executions and rounded to 0.1s:

step  time (s)
  10   2.4
  50   2.5
  75   2.9
 100   3.0
 125   3.0
 150   1.4
 175   1.3
 200   1.3  <-- chosen step size
 225   1.5
 250   1.4
 300   1.6
 400   1.6
 500   1.8
1000   2.7

The raw data was generated like so, with some code that let me change the step size at runtime:

$(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do
    for try in $(seq 5); do
      echo $step > /tmp/step_size.txt
      echo -n "$step "
      impala-shell.sh -q "select count(int_col16) from functional_avro.widetable_1000_cols; profile;" 2> /dev/null \
        | grep -A9 'CodeGen:(Total: [0-9]*s' -m 1 \
        | sed -e 's/ - / /' | sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '
      echo
    done
  done) | tee out.txt

...
200 CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms
...
1000 CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms

I have run the core tests with this change.

Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
---
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
3 files changed, 158 insertions(+), 100 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/5

--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger
Gerrit-Reviewer: Philip Zeyliger
Gerrit-Reviewer: Tim Armstrong
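[Editor's illustration] The chunking scheme the commit message describes can be sketched as follows. This is a hypothetical Python illustration of the column-partitioning logic only, not Impala's actual C++ LLVM codegen; the names `chunk_ranges` and `STEP` are invented for this sketch.

```python
# Sketch: split N columns into at-most-STEP-sized ranges, one per helper
# function, mirroring the "200-column chunks" described above.
STEP = 200  # the chosen step size from the measurements

def chunk_ranges(num_columns, step=STEP):
    """Yield (start, end) half-open column ranges, one per helper function."""
    for start in range(0, num_columns, step):
        yield (start, min(start + step, num_columns))

# A 1000-column table gets 5 helpers of 200 columns each:
print(list(chunk_ranges(1000)))
# -> [(0, 200), (200, 400), (400, 600), (600, 800), (800, 1000)]
# A 450-column table gets helpers of 200, 200, and 50 columns:
print(list(chunk_ranges(450)))
# -> [(0, 200), (200, 400), (400, 450)]
```

The generated MaterializeTuple() then calls each helper in order; with only one chunk, the optimizer can inline the single helper, which matches the observation above that no extra function survives in that case.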
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8211 to look at the new patch set (#4).

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..

IMPALA-5243: Speed up code gen for wide Avro tables.

HdfsAvroScanner::CodegenMaterializeTuple generates a function whose size is linear in the number of columns. On 1000-column tables, codegen time is significant. This commit roughly halves it for wide tables. (Note that this had been much worse in recent history, in Impala 2.9 and earlier.)

It does so by breaking up MaterializeTuple() into multiple smaller functions and then calling them in order. When breaking up into 200-column chunks, there is a noticeable speed-up.

I've made the helper code for generating LLVM function prototypes accept a mutable function name, so that the builder can be re-used multiple times. I've checked, by inspecting the optimized LLVM, that in the case where there's only one helper function the code gets inlined, so there doesn't seem to be an extra function.

I measured codegen time for various "step sizes." The case where there are no helper functions takes about 2.7s. The best case was a step size of about 200, with timings of 1.3s. For the query "select count(int_col16) from functional_avro.widetable_1000_cols", codegen times as a function of step size are roughly as follows, averaged across 5 executions and rounded to 0.1s:

step  time (s)
  10   2.4
  50   2.5
  75   2.9
 100   3.0
 125   3.0
 150   1.4
 175   1.3
 200   1.3  <-- chosen step size
 225   1.5
 250   1.4
 300   1.6
 400   1.6
 500   1.8
1000   2.7

The raw data was generated like so, with some code that let me change the step size at runtime:

$(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do
    for try in $(seq 5); do
      echo $step > /tmp/step_size.txt
      echo -n "$step "
      impala-shell.sh -q "select count(int_col16) from functional_avro.widetable_1000_cols; profile;" 2> /dev/null \
        | grep -A9 'CodeGen:(Total: [0-9]*s' -m 1 \
        | sed -e 's/ - / /' | sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '
      echo
    done
  done) | tee out.txt

...
200 CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms
...
1000 CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms

I have run the core tests with this change.

Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
---
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
3 files changed, 156 insertions(+), 100 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/4

--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger
Gerrit-Reviewer: Philip Zeyliger
Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6070: Parallel data load.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..

Patch Set 2: Code-Review+1

LGTM. Not +2ing so others have a chance to weigh in as to whether you have addressed their comments.

--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Jim Apple
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Philip Zeyliger
Gerrit-Reviewer: Zach Amsden
Gerrit-Comment-Date: Sat, 21 Oct 2017 22:23:45 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6070: Parallel data load.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..

Patch Set 2: (9 comments)

Thanks for the reviews! I observed memory while watching this, and on my 32GB machine I always had ~20GB available. I agree with Alex on adding in more things: there are similar changes that can continue to help here, but I'm doing them one at a time.

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@9
PS1, Line 9: This commit loads functional-query, TPC-H data, and TPC-DS data in
> nit: Can you wrap this at the red line provided by gerrit? I think it is 72
Done. "gqip" does it in vi. It looks like it's 72 chars.

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@12
PS1, Line 12: 13 minut
> nit: minutes
Done

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@33
PS1, Line 33: 16:14:33 Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 16 min 29 sec)
> What testing did you do? Does the data load still run on a non-beefy local
Define non-beefy? My desktop is 32 GB and 8 cores. This ran fine.

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@480
PS1, Line 480: # Run some steps in parallel, with run-step-backgroundable / run-step-wait-all.
> Could add a comment about what you decided to background and what you decid
Done.

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@492
PS1, Line 492: LOAD_NESTED_ARGS="--cm-host $CM_HOST"
> I don't see any reason this also couldn't run in parallel.
Yes, but I've not tested this one.

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@505
PS1, Line 505: load-data "functional-query" "core" "hbase/none"
           : fi
           :
           : if $KUDU_IS_SUPPORTED; then
           :   # Tests depend on the kudu data being clean, so load
> It should be possible to do the same thing for these. That will only save a
Yes. I am testing this one, but I'll do a separate patch for it.

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75
PS1, Line 75: HADOOP_HEAPSIZE="512" hive --service hiveserver2 > ${LOGDIR}/hive-server2.out 2>&1 &
> :). I'm still using that good-old machine, mem should be fine (fingers cros
512 works, so that's what I've changed it to. I'm not investigating using -Xms -Xmx to give this more flexibility (but even less predictability).

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh
File testdata/bin/run-step.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@53
PS1, Line 53:
> nit: only one empty line, to match context
Done

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@84
PS1, Line 84: RUN_STEP_MSGS=()
> Do you want to reset MSGS, too?
Good catch. Done.

--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Jim Apple
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Philip Zeyliger
Gerrit-Reviewer: Zach Amsden
Gerrit-Comment-Date: Sat, 21 Oct 2017 21:32:51 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6070: Parallel data load.
Hello Jim Apple, Joe McDonnell, Alex Behm, Zach Amsden,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8320 to look at the new patch set (#2).

Change subject: IMPALA-6070: Parallel data load.
..

IMPALA-6070: Parallel data load.

This commit loads functional-query, TPC-H data, and TPC-DS data in parallel. In parallel, these take about 37 minutes, dominated by functional-query. Serially, these take about 30 minutes more, namely the 13 minutes of tpch and the 16 minutes of tpcds. This works out nicely because CPU usage during data load is very low in aggregate. (We don't sustain more than 1 CPU of load, whereas build machines are likely to have many CPUs.)

To do this, I added support to run-step.sh for a notion of a backgroundable task, and support for waiting for all tasks. I also increased the heap size of our HiveServer2 server: when datasets were being loaded in parallel, we ran out of memory at 256MB of heap.

The resulting log output is currently like so (but without the timestamps):

15:58:04 Started Loading functional-query data in background; pid 8105.
15:58:04 Started Loading TPC-H data in background; pid 8106.
15:58:04 Loading functional-query data (logging to /home/impdev/Impala/logs/data_loading/load-functional-query.log)...
15:58:04 Started Loading TPC-DS data in background; pid 8107.
15:58:04 Loading TPC-H data (logging to /home/impdev/Impala/logs/data_loading/load-tpch.log)...
15:58:04 Loading TPC-DS data (logging to /home/impdev/Impala/logs/data_loading/load-tpcds.log)...
16:11:31 Loading workload 'tpch' using exploration strategy 'core' OK (Took: 13 min 27 sec)
16:14:33 Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 16 min 29 sec)
16:35:08 Loading workload 'functional-query' using exploration strategy 'exhaustive' OK (Took: 37 min 4 sec)

I tested data loading with the following command on an 8-core, 32GB machine. I saw 19GB of available memory during my run:

./buildall.sh -testdata -build_shared_libs -start_minicluster -start_impala_cluster -format

Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
---
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
3 files changed, 44 insertions(+), 5 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/8320/2

--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Jim Apple
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Philip Zeyliger
Gerrit-Reviewer: Zach Amsden
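[Editor's illustration] The background-and-wait pattern described in the commit message (run-step-backgroundable / run-step-wait-all in bash) can be sketched in Python. This is a hypothetical sketch, not the actual run-step.sh code; `run_step` and the step names are stand-ins.

```python
# Sketch: start independent data-load steps concurrently, then block until
# all of them finish, mirroring run-step-backgroundable / run-step-wait-all.
from concurrent.futures import ThreadPoolExecutor

def run_step(name):
    # Stand-in for one data-load step (e.g. loading tpch or tpcds).
    return f"{name} OK"

steps = ["functional-query", "tpch", "tpcds"]
with ThreadPoolExecutor() as pool:
    # "Backgroundable": submit each step without waiting for the previous one.
    futures = {name: pool.submit(run_step, name) for name in steps}
    # "Wait-all": collect every result before continuing to dependent steps.
    results = {name: f.result() for name, f in futures.items()}

print(results)
```

The wall-clock win is the same as in the log above: total time is bounded by the slowest step (functional-query) rather than the sum of all steps.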
[Impala-ASF-CR] IMPALA-5018: Error on decimal modulo or divide by zero
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8344 )

Change subject: IMPALA-5018: Error on decimal modulo or divide by zero
..

Patch Set 3:

I spoke to Greg and Alex yesterday, and we agreed that decimal_v2 should always be in strict mode. This means that when decimal_v2 is enabled, we should always error on overflows and division by zero. This is the behavior that more traditional databases have.

--
To view, visit http://gerrit.cloudera.org:8080/8344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Gerrit-Change-Number: 8344
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: Vuk Ercegovac
Gerrit-Reviewer: Zach Amsden
Gerrit-Comment-Date: Sat, 21 Oct 2017 19:15:01 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
..

Patch Set 10: Verified+1

--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Mostafa Mokhtar
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Sat, 21 Oct 2017 10:24:49 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
..

IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

This change only affects uncompressed plain-encoded Parquet, where RowBatches may directly reference strings stored in the I/O buffers. The proposed fix is to simply copy the data pages if needed, then use the same logic that we use for decompressed data pages. This copy inevitably adds some CPU overhead, but I believe this is acceptable because:

* We generally recommend using compression, and optimize for that case.
* Copying memory is cheaper than decompressing data.
* Scans of uncompressed data are very likely to be I/O bound.

This allows several major simplifications:

* The resource management for compressed and uncompressed scans is much more similar.
* We don't need to attach disk I/O buffers to RowBatches.
* We don't need to deal with attaching I/O buffers in ScannerContext.
* Column readers can release each I/O buffer *before* advancing to the next one, making it easier to reason about resource consumption. E.g. each Parquet column only needs one I/O buffer at a time to make progress.

Future changes will apply to Avro, Sequence Files and Text. Once all scanners are converted, ScannerContext::contains_tuple_data_ will always be false and we can remove some dead code.

Testing
=======
Ran core ASAN and exhaustive debug builds.

Perf
====
No difference in most cases when scanning uncompressed Parquet. There is a significant regression (50% increase in runtime) in targeted perf tests scanning non-dictionary-encoded strings (see benchmark output below). After the regression, performance is comparable to Snappy compression. I also did a TPC-H run but ran into some issues with the report generator. I manually compared times and there were no regressions.

+--------------------+-----------------------+---------+------------+------------+----------------+
| Workload           | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+--------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(_61) | parquet / none / none | 23.02   | +0.60%     | 4.23       | +5.97%         |
+--------------------+-----------------------+---------+------------+------------+----------------+

+--------------------+--------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload           | Query              | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+--------------------+--------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TARGETED-PERF(_61) | PERF_STRING-Q2     | parquet / none / none | 3.00   | 1.98        | R +52.10%  | 0.97%     | 1.25%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q1     | parquet / none / none | 2.86   | 1.92        | R +49.11%  | 0.34%     | 2.34%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q3     | parquet / none / none | 3.16   | 2.15        | R +47.04%  | 1.03%     | 0.72%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q4     | parquet / none / none | 3.16   | 2.17        | R +45.60%  | 0.14%     | 1.11%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q5     | parquet / none / none | 3.51   | 2.55        | R +37.88%  | 0.83%     | 0.49%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_AGG-Q5        | parquet / none / none | 0.79   | 0.61        | R +30.86%  | 1.54%     | 4.10%          | 1           | 5     |
| TARGETED-PERF(_61) | primitive_top-n_al | parquet / none / none | 39.45  | 35.07       | +12.51%    | 0.29%     | 0.29%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q7     | parquet / none / none | 6.78   | 6.10        | +11.13%    | 0.99%     | 0.74%          | 1           | 5     |
| TARGETED-PERF(_61) | PERF_STRING-Q6     | parquet / none / none | 8.83   | 8.14        | +8.52%     | 0.15%     | 0.32%          | 1           | 5     |
...

Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Reviewed-on: http://gerrit.cloudera.org:8080/8085
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/scanner-context.h
4 files changed, 66 insertions(+), 53 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit
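[Editor's illustration] The ownership problem this commit fixes, where RowBatches alias strings inside a reusable I/O buffer, can be illustrated in a few lines. This is a hypothetical Python sketch of the general copy-vs-alias trade-off, not Impala's actual C++ scanner code; `scan_zero_copy` and `scan_with_copy` are invented names.

```python
# Sketch: values that alias a recycled I/O buffer get corrupted, while
# copied values (the approach this patch takes for uncompressed pages)
# survive buffer reuse.

def scan_zero_copy(io_buffer, offsets):
    # Row values are memoryview slices that alias the I/O buffer.
    return [memoryview(io_buffer)[a:b] for a, b in offsets]

def scan_with_copy(io_buffer, offsets):
    # Copy each value out of the I/O buffer so the buffer can be released
    # before the row batch is consumed.
    return [bytes(io_buffer[a:b]) for a, b in offsets]

io_buffer = bytearray(b"alphabeta")
offsets = [(0, 5), (5, 9)]  # "alpha", "beta"

aliased = scan_zero_copy(io_buffer, offsets)
copied = scan_with_copy(io_buffer, offsets)

# Recycle the I/O buffer for the next page, as a column reader would.
io_buffer[:] = b"XXXXXXXXX"

print(bytes(aliased[0]))  # corrupted: b'XXXXX'
print(copied[0])          # intact:    b'alpha'
```

Copying costs some CPU, but as the commit message argues, it is cheaper than decompression and removes the need to keep I/O buffers alive past the column reader that filled them.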
[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8305 )

Change subject: IMPALA-5599: Clean up references to TimestampValue in be/src.
..

Patch Set 7: Code-Review+2 (2 comments)

http://gerrit.cloudera.org:8080/#/c/8305/7/be/src/util/time.h
File be/src/util/time.h:

http://gerrit.cloudera.org:8080/#/c/8305/7/be/src/util/time.h@106
PS7, Line 106: /// Convenience function to convert current time in Unix microseconds to date-time string
Convenience function to return the date-time string of the current time derived from UnixMicros().

http://gerrit.cloudera.org:8080/#/c/8305/7/be/src/util/time.h@107
PS7, Line 107: NowMicrosToString
How about CurrentTimeString() ?

--
To view, visit http://gerrit.cloudera.org:8080/8305
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
Gerrit-Change-Number: 8305
Gerrit-PatchSet: 7
Gerrit-Owner: Zoram Thanga
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zoram Thanga
Gerrit-Comment-Date: Sat, 21 Oct 2017 07:05:18 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
..

Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1363/

--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Mostafa Mokhtar
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Sat, 21 Oct 2017 06:28:36 +
Gerrit-HasComments: No