Grant Henke has posted comments on this change.

Change subject: Reduce arguments in file Read API
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6770/3/src/kudu/util/env.h
File src/kudu/util/env.h:

Line 355:   virtual Status Read(Slice* result) = 0;
> I think the primary motivation for this refactor was the new ReadV() API th
Before the patch that made the read calls "read fully" there was a length IN 
and a length OUT parameter. The length OUT parameter was in the OUT slice. Now 
they are essentially the same.

Currently the Read() API essentially requires 3 parameters:
- offset (IN): position to start reading from
- length (IN/OUT): the amount of data to read
- scratch (OUT): memory to read into

The ReadV() API will similarly need:
- offset (IN): position to start reading from
- results (IN/OUT): A vector of:
   - length (IN): the amount of data to read
   - scratch (OUT): memory to read into

Since Slice is commonly used today to encapsulate a scratch buffer and define 
it's relevant length I thought it would work as an efficent IN/OUT parameter 
for both READ() and READV(). 

This results in the following API:
- Status Read(uint64_t offset, Slice* result)
- Status ReadV(uint64_t offset, vector<Slice>* results)

> in the case where we might be reading from a memory-mapped file, we wouldn't 
> necessarily want to use the buffer referenced in result.data; we would likely 
> want to reset the Slice to reference mapped memory for zero-copy.

Yes this wouldn't necessarily work well for that. In that case a scratch buffer 
isn't needed at all. It doesn't look like this API handled that use case well 
in general since the Slice didn't "own" the data and required a memcpy to make 
sure the data stuck around.


Line 376:   // Read up to "result.size" from the file starting at "offset".
> Why "up to"? Isn't guaranteed to read exactly that amount, or fail?
Technically if it failed with an IO error after a short read the scratch buffer 
would contain a partial result. Though the error status would indicated the 
result is invalid.


Line 382:   // possible to read exactly 'length' bytes, an IOError is returned.
> This note should probably exist in the other Read() implementations too. Or
It's it on all the "bottom" level reads that any other read wraps. Do you think 
that is enough?


Line 504:   // Read up to "result.size" from the file starting at "offset".
> Same question about "up to".
See earlier response.


http://gerrit.cloudera.org:8080/#/c/6770/3/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 278:                                         req, offset));
> Nit: I think result->length() is actually more correct here. Yes, the EOF o
I think I agree. But the offset is being edited in the loop too. I would need 
to keep and reference the original offset too. Should I do that?


Line 308:         *result = Slice(result->mutable_data(), r);
> Wouldn't result->truncate(r) be simpler?
Good idea. Done.


http://gerrit.cloudera.org:8080/#/c/6770/3/src/kudu/util/pb_util-internal.cc
File src/kudu/util/pb_util-internal.cc:

Line 51:   }
> I don't think this is possible anymore, is it?
Done


http://gerrit.cloudera.org:8080/#/c/6770/3/src/kudu/util/pb_util.cc
File src/kudu/util/pb_util.cc:

Line 183: Status NonShortRead(T* file, uint64_t offset, Slice* result);
> Is the indirection via NonShortRead() still needed? Or can we just call rea
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3e4df425a7387c28eb428e851a9f8c25167755d4
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

Reply via email to