Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21171 )

Change subject: IMPALA-12905: Disk-based tuple caching
......................................................................


Patch Set 6:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h
File be/src/exec/tuple-cache-node.h:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h@27
PS6, Line 27: class TupleFileReader;
> There is a special provision for std::unique_ptr and std::shared_ptr in the
Here's what I think is happening:
1. We only ever need to instantiate TupleCacheNode from TupleCachePlanNode's 
CreateExecNode() in tuple-cache-node.cc.
2. At that point, TupleFileReader and TupleFileWriter are fully defined, 
because tuple-cache-node.cc includes tuple-file-reader.h and 
tuple-file-writer.h.

The compiler implicitly declares the destructor, and that happens immediately 
for anything that includes the header. The compiler implicitly defines the 
destructor when it is first needed, and that can happen in multiple translation 
units.

My guess is that this is pretty fragile. If we needed to instantiate 
TupleCacheNode anywhere other than tuple-cache-node.cc, I think those places 
would need to include tuple-cache-reader.h and tuple-cache-writer.h. The better 
way to do this is to declare the destructor in the .h file and then define it 
in the .cc file with = default.


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc
File be/src/exec/tuple-cache-node.cc:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@33
PS6, Line 33: DECLARE_bool(allow_tuple_caching);
> Is this flag actually used here?
No, removed


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@61
PS6, Line 61:   SCOPED_TIMER(runtime_profile_->total_time_counter());
> runtime_profile() used above.
Good point, changed


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@142
PS6, Line 142:       } else if (writer_->BytesWritten() > 
tuple_cache_mgr->MaxSize()) {
> Should check the size and not write if it is over.
Reworked the code so that the TupleFileWriter enforces the max file size. It 
calculates how many bytes it would write and returns an error if the resulting 
file would exceed the limit.


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@180
PS6, Line 180:     // TODO: is this needed for Reset too?
> Do we need to support Reset?
I don't think we need to support Reset(), so I changed it's definition to throw 
an error / DCHECK.

Separately, I moved the CompleteWrite() logic to happen in GetNext if it 
reaches eos.


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.h
File be/src/exec/tuple-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.h@30
PS6, Line 30: class MemTracker;
> Is this forward declaration needed?
It looks like we don't need it. buffer-pool.h includes 
runtime/mem-tracker-types.h, which does the same thing.

Removed


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc
File be/src/exec/tuple-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc@77
PS6, Line 77:   reader_.read(reinterpret_cast<char*>(&tuple_data_len), 
sizeof(tuple_data_len));
> Check status after reads using if(!reader_) like below.
Changed to use Kudu's RWFile, which returns status for each operation (which we 
check and return).


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc@79
PS6, Line 79:   DCHECK_GE(tuple_data_len, 0);
> These may not be initialized if read fails.
Fixed


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc@89
PS6, Line 89:     // TODO: more detailed error messages. May need to check for 
failure immediately
> This would be nice to address.
Good point, we should check after each read() call. I switch to use Kudu's 
RWFile which returns status after each call.

This also does additional sanity checks for the lengths to make sure the file 
is long enough to contain the necessary data, etc.


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-writer.cc
File be/src/exec/tuple-file-writer.cc:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-writer.cc@79
PS6, Line 79:   // TODO: swap write order of data/offsets since all the 
functions use offsets first.
> I don't remember what this todo meant, we can probably get rid of it.
Removed


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-writer.cc@89
PS6, Line 89:   if (!writer_) {
> I don't know of any particular issue with multiple failed writes.
Changed to Kudu's WritableFile and modified this to check status after each 
write.


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-writer.cc@91
PS6, Line 91:     // TODO: more detailed error messages. May need to check for 
failure immediately
> Would be nice to address.
Fixed


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/runtime/tuple-cache-mgr.cc
File be/src/runtime/tuple-cache-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/runtime/tuple-cache-mgr.cc@101
PS6, Line 101:         "The tuple_cache parameter can only be specified if 
allow_tuple_caching=true.");
> I'm trying to make sense of why we made it work this way. Wouldn't it be us
I will need to think this through again. I'm doing an upload without addressing 
this, but I will come back to it.



--
To view, visit http://gerrit.cloudera.org:8080/21171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Gerrit-Change-Number: 21171
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Fri, 05 Apr 2024 05:57:29 +0000
Gerrit-HasComments: Yes

Reply via email to