Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 11:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner-ir.cc
File be/src/exec/hdfs-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner-ir.cc@61
PS11, Line 61: return 0;
> Is there a reason why we don't treat this as parse error and returns -1 ?
No that's a good point. I changed it.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.h@347
PS11, Line 347:   /// Returns -1 if parsing should be aborted due to parse 
errors.
> May help to also add "Returns 0 if copying strings into 'pool' failed'.
Clarified that it could be a memory allocation error. Per the other comment, it 
probably makes most sense to just return -1.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-sequence-scanner.cc@a282
PS11, Line 282:
> It may be trivial but does it make sense to add DCHECK(row_batch->num_tuple
The assumption of 1 tuple per row for scans is pretty deeply baked into the 
scanners and planner, so it seemed silly to have this speculative generality 
here. I added a DCHECK to HdfsScanner to document that the assumption is valid 
for all HDFS scanners.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc@849
PS11, Line 849: tuples_returned
> There is a subtle behavior here if Tuple::CopyStrings() failed in WriteAlig
Yeah I think I just made a mistake returning 0. Returning -1 makes more sense. 
I manually tested with this query:
    [localhost:21000] > set num_nodes=1; set disable_codegen=1; set 
mem_limit=5m; select * from widetable_1000_cols;
    NUM_NODES set to 1
    DISABLE_CODEGEN set to 1
    MEM_LIMIT set to 5m
    Query: select * from widetable_1000_cols
    Query submitted at: 2017-11-07 11:52:01 (Coordinator: 
http://tarmstrong-box:25000)
    Query progress can be monitored at: 
http://tarmstrong-box:25000/query_plan?query_id=9948285e1aa8e172:f73fe97100000000
    ERROR: Memory limit exceeded: HdfsScanner::WriteAlignedTuples() failed to 
allocate 125 bytes for strings.
    HDFS_SCAN_NODE (id=0) could not allocate 125.00 B without exceeding limit.
    Error occurred on backend tarmstrong-box:22000 by fragment 
9948285e1aa8e172:f73fe97100000000
    Memory left in process limit: 8.01 GB
    Memory left in query limit: 947.59 KB
    Query(9948285e1aa8e172:f73fe97100000000): Limit=5.00 MB Reservation=0 
ReservationLimit=0 OtherMemory=4.07 MB Total=4.07 MB Peak=4.07 MB
      Fragment 9948285e1aa8e172:f73fe97100000000: Reservation=0 
OtherMemory=4.07 MB Total=4.07 MB Peak=4.07 MB
        HDFS_SCAN_NODE (id=0): Total=4.07 MB Peak=4.07 MB
        PLAN_ROOT_SINK: Total=0 Peak=0

It would be nice to turn it into an end-to-end test but I cant see how to make 
it robustly fail in that place.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc@857
PS11, Line 857: num_tuples
> Why is this not tuples_returned here ? Is there any guarantee max_added_tup
I think num_fields is tracking the number of fields consumed from the input.



--
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Nov 2017 19:54:01 +0000
Gerrit-HasComments: Yes

Reply via email to