Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14804 )

Change subject: IMPALA-9174: Speedup allocations of ORC Scanner
......................................................................


Patch Set 4: Code-Review+2

(2 comments)

I am ok with the implementation, but I think hdfs-orc-scanner.cc is not the 
ideal place for it, see my comment.

http://gerrit.cloudera.org:8080/#/c/14804/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14804/4//COMMIT_MSG@19
PS4, Line 19: malloc already stores the size of the allocated bytes in front of 
the
            : allocated bytes. That's why 'free()' doesn't take a 'size' 
parameter,
            : it just searches for the size information next to the pointed 
bytes.
            : TC Malloc provides a function called 'tc_malloc_size()' that 
reveals
            : that information programmatically, so instead of the hash table we
            : can just use this function to retrieve the size.
This is not exactly how tc malloc stores sizes. There is an explanation in  
http://goog-perftools.sourceforge.net/doc/tcmalloc.html , mainly in "Small 
Object Allocation", "Spans", and Deallocation sections.

I would only mention that tc malloc has an API to retrieve the allocated size 
and provide link to the doc with details.

The main problem with always storing the size for every object is that in case 
of very small objects it leads to large overhead, so a common optimization in 
heap implementations is to store objects of similar size (class) in a 
contiguous area and only store the size once for that area. It also wouldn't be 
ideal for big allocations, where it makes sense to work only with K*4096 byte 
chunks.


http://gerrit.cloudera.org:8080/#/c/14804/4/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/14804/4/be/src/exec/hdfs-orc-scanner.cc@63
PS4, Line 63: malloc
I think it would be better to move most of this functionality into MemTracker, 
e.g. to functions like TrackedMalloc + TrackedFree. This function could just 
call that and throw a suitable exception based on the result.

It would be also nice to create some tests that exercise most paths.

I am ok with creating a follow up jira for this and do it later.



--
To view, visit http://gerrit.cloudera.org:8080/14804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia09e746883176d6f955c1718267bf55e2abb239b
Gerrit-Change-Number: 14804
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 10 Dec 2019 13:17:16 +0000
Gerrit-HasComments: Yes

Reply via email to