Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 )
Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
......................................................................

Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8085/6//COMMIT_MSG@44
PS6, Line 44: There is a significant regression (50% increase in runtime) in
> Just to clarify my point. My hope is that most of the overhead may ultimate

I'm pretty sure it's the memory copying - making a memory allocation should be at least an order of magnitude cheaper than doing a pass over a data page. I'm unsure whether the difference is due to the extra instructions executed or to the increased cache pressure from having two copies of the data.

I haven't measured, but I'm also not convinced that the Disk IO Mgr's buffer caching is necessarily more efficient or scalable than TCMalloc. The IO mgr just has a global lock, whereas TCMalloc has locks per size class plus batching via the thread cache. The buffer pool should be more scalable for large allocations than either in any case.

The queries that regressed are close to the worst possible case, since they don't do any work aside from materialising the strings and evaluating a conjunct. Plus the data is already present in the buffer cache.

--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Wed, 27 Sep 2017 16:09:00 +0000
Gerrit-HasComments: Yes
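
The comment above says it has not been measured, so the claim that an allocation is much cheaper than a pass over a data page is an expectation. One way to check it is a microbenchmark along the lines of the sketch below. This is not Impala code: `kPageBytes`, `AvgNanos`, `AvgAllocNanos` and `AvgCopyNanos` are hypothetical names, the 1 MiB page size is an assumption, and the system `malloc` stands in for whichever allocator is actually linked (TCMalloc in the discussion above). The result will vary by machine, allocator and optimisation level.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical page size chosen for illustration; not taken from Impala.
constexpr std::size_t kPageBytes = 1 << 20;

// Returns the average wall-clock nanoseconds per call of fn over iters calls.
template <typename Fn>
double AvgNanos(int iters, Fn fn) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) fn();
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(end - start).count() / iters;
}

// Allocates and frees one page-sized buffer without touching its contents.
double AvgAllocNanos(int iters) {
  // The volatile sink keeps the compiler from eliding the malloc/free pair.
  volatile std::uintptr_t sink = 0;
  return AvgNanos(iters, [&] {
    void* p = std::malloc(kPageBytes);
    sink = sink + reinterpret_cast<std::uintptr_t>(p);
    std::free(p);
  });
}

// Copies one page from src to dst, i.e. one full pass over the data.
double AvgCopyNanos(int iters) {
  std::vector<char> src(kPageBytes, 'x');
  std::vector<char> dst(kPageBytes);
  volatile char sink = 0;
  return AvgNanos(iters, [&] {
    std::memcpy(dst.data(), src.data(), kPageBytes);
    // Read back one byte so the copy is observably used.
    sink = dst[kPageBytes - 1];
  });
}
```

Calling `AvgAllocNanos(100000)` and `AvgCopyNanos(1000)` from a small `main` and printing both values gives a rough per-operation comparison. A single-threaded loop like this says nothing about lock contention, so it does not address the global-lock versus per-size-class-lock question, and it cannot separate instruction cost from cache pressure.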
