[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 10 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Oct 2017 10:24:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet This change only affects uncompressed plain-encoded Parquet where RowBatches may directly reference strings stored in the I/O buffers. The proposed fix is to simply copy the data pages if needed then use the same logic that we use for decompressed data pages. This copy inevitably adds some CPU overhead, but I believe this is acceptable because: * We generally recommend using compression, and optimize for that case. * Copying memory is cheaper than decompressing data. * Scans of uncompressed data are very likely to be I/O bound. This allows several major simplifications: * The resource management for compressed and uncompressed scans is much more similar. * We don't need to attach Disk I/O buffers to RowBatches. * We don't need to deal with attaching I/O buffers in ScannerContext. * Column readers can release each I/O buffer *before* advancing to the next one, making it easier to reason about resource consumption. E.g. each Parquet column only needs one I/O buffer at a time to make progress. Future changes will apply to Avro, Sequence Files and Text. Once all scanners are converted, ScannerContext::contains_tuple_data_ will always be false and we can remove some dead code. Testing === Ran core ASAN and exhaustive debug builds. Perf No difference in most cases when scanning uncompressed parquet. There is a significant regression (50% increase in runtime) in targeted perf tests scanning non-dictionary-encoded strings (see benchmark output below). After the regression performance is comparable to Snappy compression. I also did a TPC-H run but ran into some issues with the report generator. I manually compared times and there were no regressions. ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TARGETED-PERF(_61) | parquet / none / none | 23.02 | +0.60% | 4.23 | +5.97% | ++---+-++++ +++---++-++++-+---+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +++---++-++++-+---+ | TARGETED-PERF(_61) | PERF_STRING-Q2 | parquet / none / none | 3.00 | 1.98| R +52.10% | 0.97%| 1.25%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q1 | parquet / none / none | 2.86 | 1.92| R +49.11% | 0.34%| 2.34%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q3 | parquet / none / none | 3.16 | 2.15| R +47.04% | 1.03%| 0.72%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q4 | parquet / none / none | 3.16 | 2.17| R +45.60% | 0.14%| 1.11%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q5 | parquet / none / none | 3.51 | 2.55| R +37.88% | 0.83%| 0.49%| 1 | 5 | | TARGETED-PERF(_61) | PERF_AGG-Q5| parquet / none / none | 0.79 | 0.61| R +30.86% | 1.54%| 4.10%| 1 | 5 | | TARGETED-PERF(_61) | primitive_top-n_al | parquet / none / none | 39.45 | 35.07 | +12.51% | 0.29%| 0.29%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q7 | parquet / none / none | 6.78 | 6.10| +11.13% | 0.99%| 0.74%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q6 | parquet / none / none | 8.83 | 8.14| +8.52% | 0.15%| 0.32%| 1 | 5 | ... Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Reviewed-on: http://gerrit.cloudera.org:8080/8085 Reviewed-by: Tim ArmstrongTested-by: Impala Public Jenkins --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/exec/scanner-context.h 4 files changed, 66 insertions(+), 53 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1363/ -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 10 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Oct 2017 06:28:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 10: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1361/ -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 10 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Oct 2017 03:08:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1361/ -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 10 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Oct 2017 23:16:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 10: Code-Review+2 carry Dan's +2 -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 10 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 18 Oct 2017 19:12:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 10: Mostafa, we spoke offline about this but just wanted to confirm it looked ok to you. The summary is that the CPU cost of value materialisation for non-dict-encoded strings in uncompressed parquet scans regresses to roughly the same CPU cost as snappy-compressed parquet. I didn't see any regressions on TPC-H, only narrowly targeted benchmarks. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 10 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 18 Oct 2017 19:12:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/8/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/8/be/src/exec/parquet-column-readers.cc@983 PS8, Line 983: data > buffers? Done -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 8 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 18 Oct 2017 19:09:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Hello Lars Volker, Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8085 to look at the new patch set (#9). Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet This change only affects uncompressed plain-encoded Parquet where RowBatches may directly reference strings stored in the I/O buffers. The proposed fix is to simply copy the data pages if needed then use the same logic that we use for decompressed data pages. This copy inevitably adds some CPU overhead, but I believe this is acceptable because: * We generally recommend using compression, and optimize for that case. * Copying memory is cheaper than decompressing data. * Scans of uncompressed data are very likely to be I/O bound. This allows several major simplifications: * The resource management for compressed and uncompressed scans is much more similar. * We don't need to attach Disk I/O buffers to RowBatches. * We don't need to deal with attaching I/O buffers in ScannerContext. * Column readers can release each I/O buffer *before* advancing to the next one, making it easier to reason about resource consumption. E.g. each Parquet column only needs one I/O buffer at a time to make progress. Future changes will apply to Avro, Sequence Files and Text. Once all scanners are converted, ScannerContext::contains_tuple_data_ will always be false and we can remove some dead code. Testing === Ran core ASAN and exhaustive debug builds. Perf No difference in most cases when scanning uncompressed parquet. There is a significant regression (50% increase in runtime) in targeted perf tests scanning non-dictionary-encoded strings (see benchmark output below). After the regression performance is comparable to Snappy compression. I also did a TPC-H run but ran into some issues with the report generator. I manually compared times and there were no regressions. ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TARGETED-PERF(_61) | parquet / none / none | 23.02 | +0.60% | 4.23 | +5.97% | ++---+-++++ +++---++-++++-+---+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +++---++-++++-+---+ | TARGETED-PERF(_61) | PERF_STRING-Q2 | parquet / none / none | 3.00 | 1.98| R +52.10% | 0.97%| 1.25%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q1 | parquet / none / none | 2.86 | 1.92| R +49.11% | 0.34%| 2.34%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q3 | parquet / none / none | 3.16 | 2.15| R +47.04% | 1.03%| 0.72%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q4 | parquet / none / none | 3.16 | 2.17| R +45.60% | 0.14%| 1.11%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q5 | parquet / none / none | 3.51 | 2.55| R +37.88% | 0.83%| 0.49%| 1 | 5 | | TARGETED-PERF(_61) | PERF_AGG-Q5| parquet / none / none | 0.79 | 0.61| R +30.86% | 1.54%| 4.10%| 1 | 5 | | TARGETED-PERF(_61) | primitive_top-n_al | parquet / none / none | 39.45 | 35.07 | +12.51% | 0.29%| 0.29%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q7 | parquet / none / none | 6.78 | 6.10| +11.13% | 0.99%| 0.74%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q6 | parquet / none / none | 8.83 | 8.14| +8.52% | 0.15%| 0.32%| 1 | 5 | ... Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/exec/scanner-context.h 4 files changed, 66 insertions(+), 53 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/8085/9 -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 8: Code-Review+2 (1 comment) Yes, this is good with me. Could you get Mostafa to sign off on perf implications, perhaps recording that in the JIRA if not already (issues.apache.org not loading for me)? http://gerrit.cloudera.org:8080/#/c/8085/8/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/8/be/src/exec/parquet-column-readers.cc@983 PS8, Line 983: data buffers? -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 8 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 18 Oct 2017 18:16:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 8: Dan, does this look ok to you at a high level? I think the code change itself is relatively straightforward but it would be good to confirm the direction and perf trade-off. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 8 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 18 Oct 2017 17:54:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 8: Code-Review+1 rebase, carry +1 from alex -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 8 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 13 Oct 2017 23:10:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 7: Code-Review+1 Carry +1 -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 7 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 04 Oct 2017 22:15:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071 PS6, Line 1071: if (PageContainsTupleData(current_page_header_.data_page_header.encoding)) { > Does seem like the simplest solution. Might be worth adding a short comment Expanded the comment just below to explain motivation. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 04 Oct 2017 22:07:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Hello Lars Volker, Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8085 to look at the new patch set (#7). Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet This change only affects uncompressed plain-encoded Parquet where RowBatches may directly reference strings stored in the I/O buffers. The proposed fix is to simply copy the data pages if needed then use the same logic that we use for decompressed data pages. This copy inevitably adds some CPU overhead, but I believe this is acceptable because: * We generally recommend using compression, and optimize for that case. * Copying memory is cheaper than decompressing data. * Scans of uncompressed data are very likely to be I/O bound. This allows several major simplifications: * The resource management for compressed and uncompressed scans is much more similar. * We don't need to attach Disk I/O buffers to RowBatches. * We don't need to deal with attaching I/O buffers in ScannerContext. * Column readers can release each I/O buffer *before* advancing to the next one, making it easier to reason about resource consumption. E.g. each Parquet column only needs one I/O buffer at a time to make progress. Future changes will apply to Avro, Sequence Files and Text. Once all scanners are converted, ScannerContext::contains_tuple_data_ will always be false and we can remove some dead code. Testing === Ran core ASAN and exhaustive debug builds. Perf No difference in most cases when scanning uncompressed parquet. There is a significant regression (50% increase in runtime) in targeted perf tests scanning non-dictionary-encoded strings (see benchmark output below). After the regression performance is comparable to Snappy compression. I also did a TPC-H run but ran into some issues with the report generator. I manually compared times and there were no regressions. ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TARGETED-PERF(_61) | parquet / none / none | 23.02 | +0.60% | 4.23 | +5.97% | ++---+-++++ +++---++-++++-+---+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +++---++-++++-+---+ | TARGETED-PERF(_61) | PERF_STRING-Q2 | parquet / none / none | 3.00 | 1.98| R +52.10% | 0.97%| 1.25%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q1 | parquet / none / none | 2.86 | 1.92| R +49.11% | 0.34%| 2.34%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q3 | parquet / none / none | 3.16 | 2.15| R +47.04% | 1.03%| 0.72%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q4 | parquet / none / none | 3.16 | 2.17| R +45.60% | 0.14%| 1.11%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q5 | parquet / none / none | 3.51 | 2.55| R +37.88% | 0.83%| 0.49%| 1 | 5 | | TARGETED-PERF(_61) | PERF_AGG-Q5| parquet / none / none | 0.79 | 0.61| R +30.86% | 1.54%| 4.10%| 1 | 5 | | TARGETED-PERF(_61) | primitive_top-n_al | parquet / none / none | 39.45 | 35.07 | +12.51% | 0.29%| 0.29%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q7 | parquet / none / none | 6.78 | 6.10| +11.13% | 0.99%| 0.74%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q6 | parquet / none / none | 8.83 | 8.14| +8.52% | 0.15%| 0.32%| 1 | 5 | ... Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/exec/scanner-context.h 4 files changed, 66 insertions(+), 53 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/8085/7 -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071 PS6, Line 1071: if (PageContainsTupleData(current_page_header_.data_page_header.encoding)) { > The basic problem is that I want to be able to reason about how many I/O bu Does seem like the simplest solution. Might be worth adding a short comment here about the intention, since it's not necessarily obvious (at least to me). Data pages are indeed smaller, but due to the mempool I think you may end up allocating MAX_CHUNK_SIZE per data page in the worst case. Probably not very common, so seems ok. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 29 Sep 2017 04:34:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071 PS6, Line 1071: if (PageContainsTupleData(current_page_header_.data_page_header.encoding)) { > Playing devil's advocate and really trying to understand the crux of why th The basic problem is that I want to be able to reason about how many I/O buffers you need to make progress in reading through a Parquet file. If each column can make progress independently then it's simple since you can reason about each column in isolation. With the copying out you only need one buffer per column to make progress. If you try to hold onto I/O buffers you end up with dependencies between column readers. E.g. reader A may have materialised 512 values and hit the end of a buffer. Reader A can't free the previous buffer until those rows have been returned. Those rows can't be returned until all of the column values have been materialised for those rows. But maybe column B can only materialise 511 values before hitting the end of its buffer. So then you'd need to output 511 rows, then advance column B, then output 1 row before you advance column A. You can get out of that two ways: * Advance the different column readers different amounts and make progress on materialising as many final rows as you can. I think you would need to try to advance all column readers, then return however many rows you can at that point, then try to inch them forward again. * Let each column reader advance to the next buffer before freeing the previous one and hope that all column readers can materialise the requested number of values without running out of buffers. We'd need to make some assumptions to ensure that we can always materialise batch_size values given M I/O buffers. I don't think we can make any universally valid assumptions so we'd probably have to add code to detect when those assumptions are invalidated and add query options to get out of any problems. It doesn't seem worth spending complexity on either solution, especially since we'd need extra test coverage. Data pages are much smaller (around 64kb) than the 8MB I/O buffers and aren't currently part of the reservation so they're less of a problem for now - the memory overhead per column will be around max(2 * 64kb, 1024 * ) which is tolerable. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 29 Sep 2017 01:41:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071 PS6, Line 1071: if (PageContainsTupleData(current_page_header_.data_page_header.encoding)) { > (if we pushed predicate evaluation down into the column readers, that might Playing devil's advocate and really trying to understand the crux of why this new approach is simpler. You are right that the stream can free the previous I/O buffer immediately, but you are still accumulating the same amount of memory in the data_pool_, so the peak memory consumption seems no different than before. Overall you might even need one more page than before. We copy and accumulate data pages here, but only free the data page from the stream in the next call to ReadDataPage(), so for some period of time one page exists both in the stream and in the data_pool_. I'm not sure I follow why the the "synchronized release of I/O buffers" is difficult in the copy-in-transfer approach. The scratch batch already has all interesting I/O buffers attached, so we can release those after copying the var-len data of survivors. Returned batches will have no I/O buffers attached. Completely agree about the perf tradeoffs you mentioned. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 29 Sep 2017 00:35:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071 PS6, Line 1071: if (PageContainsTupleData(current_page_header_.data_page_header.encoding)) { > That would reintroduce one of the problems I was trying to solve - the need (if we pushed predicate evaluation down into the column readers, that might be viable, although there's a trade-off between optimising the number of bytes copied and optimising the number of individual copies, since each copy has some fixed overhead.). -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 29 Sep 2017 00:01:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071 PS6, Line 1071: if (PageContainsTupleData(current_page_header_.data_page_header.encoding)) { > Instead of eagerly copying the entire page, have you considered only coping That would reintroduce one of the problems I was trying to solve - the need to synchronise release of I/O buffers between different column readers. The big simplification in the current approach is that each column reader can immediately free the previous I/O buffer as soon as it needs to advance to the net data page. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 28 Sep 2017 23:59:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44 PS6, Line 44: There is a significant regression (50% increase in runtime) in > Thanks for the experiment. I'm in favor of moving forward, but still a litt I think the upside is that simplifying the Parquet code incrementally makes it easier to optimise in future. We have plenty of scope to optimise CPU efficiency in future in ways that will benefit most queries. I.e. I wouldn't spend time optimising the uncompressed strings case, rather just try to make it overall more CPU efficient. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 28 Sep 2017 23:57:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44 PS6, Line 44: There is a significant regression (50% increase in runtime) in > I reran with mt_dop=8. Time spent in the scan on STRING-Q1 went from 3.35s Thanks for the experiment. I'm in favor of moving forward, but still a little worried. Maybe we can investigate later whether we can recoup some of the perf losses. The main issue is that affected users have no way to get the performance back, except downgrading. I'm curious to hear what other think of the perf regression. http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071 PS6, Line 1071: if (PageContainsTupleData(current_page_header_.data_page_header.encoding)) { Instead of eagerly copying the entire page, have you considered only coping the var-len data of surviving tuples in TransferScratchTuples() or FinalizeTupleTransfer()? That approach might be more efficient for selective scans, but worse for full scans. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 28 Sep 2017 23:44:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44 PS6, Line 44: There is a significant regression (50% increase in runtime) in > Makes sense. Might still be worth doing a quick experiment with MT_DOP to s I reran with mt_dop=8. Time spent in the scan on STRING-Q1 went from 3.35s -> 5.15s. I.e. about +50%, so no real difference. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 28 Sep 2017 22:08:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44 PS6, Line 44: There is a significant regression (50% increase in runtime) in > I'm pretty sure it's the memory copying - making a memory allocation should Makes sense. Might still be worth doing a quick experiment with MT_DOP to see if the regression remains the same (MT_DOP is easier to reason about). I'm definitely in favor of this change, but uncompressed Parquet with a lot of string columns is unfortunately very common, despite our recommendations. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 27 Sep 2017 16:25:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc@245 PS5, Line 245: context_->ReleaseCompletedResources(nullptr, true); > I can obviously fix it if you feel strongly but it just doesn't seem that i Ah, I understand. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@477 PS5, Line 477: int64_t size, const char* err_ctx, uint8_t** buffer); > Done. Also switch to const char* so we don't need to create a std::string f Thx. ctx somewhat reminds me of the various context objects that we use elsewhere, but I don't feel strongly about it. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 27 Sep 2017 16:20:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44 PS6, Line 44: There is a significant regression (50% increase in runtime) in > Just to clarify my point. My hope is that most of the overhead may ultimate I'm pretty sure it's the memory copying - making a memory allocation should at least an order of magnitude cheaper than doing a pass over a data page. Unsure if the difference is due to the extra instructions executed or the increase in cache pressure from having two copies of the data. I haven't measured but I'm also not convinced that the Disk IO Mgr's buffer caching is necessarily more efficient or scalable than TCMalloc. The IO mgr just has a global lock whereas TCMalloc has locks per size class plus batching via the thread cache. The buffer pool should be more scalable for large allocations than either in any case. The queries that regressed are close to the worst possible case since they don't do any work aside from materialising the strings and evaluating a conjunct. Plus the data is already present in the buffer cache. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 27 Sep 2017 16:09:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44 PS6, Line 44: There is a significant regression (50% increase in runtime) in > Maybe the main difference is that copying into a mempool will ultimately fr Just to clarify my point. My hope is that most of the overhead may ultimately go away when backing the mem pool by the buffer pool (which I presume can recycle buffers?). -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 27 Sep 2017 15:42:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44 PS6, Line 44: There is a significant regression (50% increase in runtime) in > Do we understand where the regression comes from exactly? Somehow I find it Maybe the main difference is that copying into a mempool will ultimately free the copied memory, but I/O buffers may be recycled and would not take the tcmalloc hit from freeing from another thread? -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 27 Sep 2017 15:38:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44 PS6, Line 44: There is a significant regression (50% increase in runtime) in Do we understand where the regression comes from exactly? Somehow I find it hard to believe a 50% overhead just from copying. Is it maybe another tcmalloc issue where different threads allocate/free the memory? Have you tried a similar experiment with MT_DOP? -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 27 Sep 2017 15:34:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc@245 PS5, Line 245: context_->ReleaseCompletedResources(nullptr, true); > I fixed it while I was checking the callsites of ReleaseCompletedResources( I can obviously fix it if you feel strongly but it just doesn't seem that important to me. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 26 Sep 2017 23:50:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Hello Lars Volker, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8085 to look at the new patch set (#6). Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet This change only affects uncompressed plain-encoded Parquet where RowBatches may directly reference strings stored in the I/O buffers. The proposed fix is to simply copy the data pages if needed then use the same logic that we use for decompressed data pages. This copy inevitably adds some CPU overhead, but I believe this is acceptable because: * We generally recommend using compression, and optimize for that case. * Copying memory is cheaper than decompressing data. * Scans of uncompressed data are very likely to be I/O bound. This allows several major simplifications: * The resource management for compressed and uncompressed scans is much more similar. * We don't need to attach Disk I/O buffers to RowBatches. * We don't need to deal with attaching I/O buffers in ScannerContext. * Column readers can release each I/O buffer *before* advancing to the next one, making it easier to reason about resource consumption. E.g. each Parquet column only needs one I/O buffer at a time to make progress. Future changes will apply to Avro, Sequence Files and Text. Once all scanners are converted, ScannerContext::contains_tuple_data_ will always be false and we can remove some dead code. Testing === Ran core ASAN and exhaustive debug builds. Perf No difference in most cases when scanning uncompressed parquet. There is a significant regression (50% increase in runtime) in targeted perf tests scanning non-dictionary-encoded strings (see benchmark output below). After the regression performance is comparable to Snappy compression. I also did a TPC-H run but ran into some issues with the report generator. I manually compared times and there were no regressions. ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TARGETED-PERF(_61) | parquet / none / none | 23.02 | +0.60% | 4.23 | +5.97% | ++---+-++++ +++---++-++++-+---+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +++---++-++++-+---+ | TARGETED-PERF(_61) | PERF_STRING-Q2 | parquet / none / none | 3.00 | 1.98| R +52.10% | 0.97%| 1.25%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q1 | parquet / none / none | 2.86 | 1.92| R +49.11% | 0.34%| 2.34%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q3 | parquet / none / none | 3.16 | 2.15| R +47.04% | 1.03%| 0.72%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q4 | parquet / none / none | 3.16 | 2.17| R +45.60% | 0.14%| 1.11%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q5 | parquet / none / none | 3.51 | 2.55| R +37.88% | 0.83%| 0.49%| 1 | 5 | | TARGETED-PERF(_61) | PERF_AGG-Q5| parquet / none / none | 0.79 | 0.61| R +30.86% | 1.54%| 4.10%| 1 | 5 | | TARGETED-PERF(_61) | primitive_top-n_al | parquet / none / none | 39.45 | 35.07 | +12.51% | 0.29%| 0.29%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q7 | parquet / none / none | 6.78 | 6.10| +11.13% | 0.99%| 0.74%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q6 | parquet / none / none | 8.83 | 8.14| +8.52% | 0.15%| 0.32%| 1 | 5 | ... Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/exec/scanner-context.h 4 files changed, 64 insertions(+), 53 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/8085/6 -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType:
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 5: (8 comments) http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG@7 PS5, Line 7: IMPALA-5307 > Can you add a line at the bottom what the other part(s) would look like? Done http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG@56 PS5, Line 56: +++---++-++++-+---+ > Nit: You could make the second column smaller to make this more readable, a Done http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc@245 PS5, Line 245: context_->ReleaseCompletedResources(nullptr, true); > I think it's best to change the whole file at once, or only change occurren I fixed it while I was checking the callsites of ReleaseCompletedResources(). http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@476 PS5, Line 476: Status AllocateUncompressedDataPage( > Should we call this "AllocateUncompressedDataBuffer"? Otherwise it sounds t That seems vaguer. It would also be confusing since the pool is data_page_pool_. I updated the comment to mention that the contents are uncompressed rather than the page. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@477 PS5, Line 477: int64_t size, const std::string& desc, uint8_t** buffer); > Maybe err_desc, err_detail, or detail? "desc" reminds me of descriptors. Done. Also switch to const char* so we don't need to create a std::string for every call. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@485 PS5, Line 485: IsStringType > This does not say "VarLenStringType" but above in a comment you refer to va Fixed it. We don't need to copy if it's a CHAR. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc@1075 PS5, Line 1075: uncompressed_size, "uncompressed variable-length data", _buffer)); > DCHECK(copy_buffer != nullptr); And maybe initialize it to nullptr, so that It doesn't seem that necessary to me since it's part of the contract of the function and it will crash immediately anyway. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc@1105 PS5, Line 1105: *buffer = data_page_pool_->TryAllocate(size); Bad indentation. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 26 Sep 2017 23:49:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG@7 PS5, Line 7: IMPALA-5307 Can you add a line at the bottom what the other part(s) would look like? http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG@56 PS5, Line 56: +++---++-++++-+---+ Nit: You could make the second column smaller to make this more readable, and add a bottom delimiter line to indicate it was truncated on purposed and not by mistake. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc@245 PS5, Line 245: context_->ReleaseCompletedResources(nullptr, true); I think it's best to change the whole file at once, or only change occurrences where necessary. This looks like it may be left from a previous patchset. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@476 PS5, Line 476: Status AllocateUncompressedDataPage( Should we call this "AllocateUncompressedDataBuffer"? Otherwise it sounds to me like it'll only be needed for uncompressed pages. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@477 PS5, Line 477: int64_t size, const std::string& desc, uint8_t** buffer); Maybe err_desc, err_detail, or detail? "desc" reminds me of descriptors. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@485 PS5, Line 485: IsStringType This does not say "VarLenStringType" but above in a comment you refer to var-len data. Can you clarify one of them? http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc@1075 PS5, Line 1075: uncompressed_size, "uncompressed variable-length data", _buffer)); DCHECK(copy_buffer != nullptr); And maybe initialize it to nullptr, so that it's explicit what the allocation will do. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 25 Sep 2017 23:30:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet This change only affects uncompressed plain-encoded Parquet where RowBatches may directly reference strings stored in the I/O buffers. The proposed fix is to simply copy the data pages if needed then use the same logic that we use for decompressed data pages. This copy inevitably adds some CPU overhead, but I believe this is acceptable because: * We generally recommend using compression, and optimize for that case. * Copying memory is cheaper than decompressing data. * Scans of uncompressed data are very likely to be I/O bound. This allows several major simplifications: * The resource management for compressed and uncompressed scans is much more similar. * We don't need to attach Disk I/O buffers to RowBatches. * We don't need to deal with attaching I/O buffers in ScannerContext. * Column readers can release each I/O buffer *before* advancing to the next one, making it easier to reason about resource consumption. E.g. each Parquet column only needs one I/O buffer at a time to make progress. Testing === Ran core ASAN and exhaustive debug builds. Perf No difference in most cases when scanning uncompressed parquet. There is a significant regression (50% increase in runtime) in targeted perf tests scanning non-dictionary-encoded strings (see benchmark output below). After the regression performance is comparable to Snappy compression. I also did a TPC-H run but ran into some issues with the report generator. I manually compared times and there were no regressions. ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TARGETED-PERF(_61) | parquet / none / none | 23.02 | +0.60% | 4.23 | +5.97% | ++---+-++++ +++---++-++++-+---+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +++---++-++++-+---+ | TARGETED-PERF(_61) | PERF_STRING-Q2 | parquet / none / none | 3.00 | 1.98| R +52.10% | 0.97%| 1.25%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q1 | parquet / none / none | 2.86 | 1.92| R +49.11% | 0.34%| 2.34%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q3 | parquet / none / none | 3.16 | 2.15| R +47.04% | 1.03%| 0.72%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q4 | parquet / none / none | 3.16 | 2.17| R +45.60% | 0.14%| 1.11%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q5 | parquet / none / none | 3.51 | 2.55| R +37.88% | 0.83%| 0.49%| 1 | 5 | | TARGETED-PERF(_61) | PERF_AGG-Q5| parquet / none / none | 0.79 | 0.61| R +30.86% | 1.54%| 4.10%| 1 | 5 | | TARGETED-PERF(_61) | primitive_top-n_all| parquet / none / none | 39.45 | 35.07 | +12.51% | 0.29%| 0.29%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q7 | parquet / none / none | 6.78 | 6.10| +11.13% | 0.99%| 0.74%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q6 | parquet / none / none | 8.83 | 8.14| +8.52% | 0.15%| 0.32%| 1 | 5 | Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/exec/scanner-context.h 4 files changed, 63 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/8085/5 --