Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7714/3/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1160:   // partition index.
> Yes, in case of repartition, the rows do get shuffled around in each iterat
Repartitioning should still work, in that each repartitioning step will reduce 
the size of the partitions: each row ends up in an arbitrary partition. The end 
result is pretty much arbitrary, though.
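To make the repartitioning point concrete, here is a minimal C++ sketch. It assumes a simplified spilled partition (a vector of rows), a fixed fanout of 16, and a seeded random number generator standing in for a nondeterministic grouping expression whose value is used directly as the row's hash. `Repartition`, `NondeterministicKey` and `kFanout` are hypothetical names for illustration, not Impala's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical sketch, not Impala code: repartition a spilled partition whose
// grouping key is nondeterministic, i.e. re-evaluated on every pass.
constexpr int kFanout = 16;  // assumed fanout, for illustration only

using Row = int64_t;

// Stands in for a nondeterministic grouping expr such as rand(): every
// evaluation returns a fresh value, which is used directly as the row's hash.
// The fixed seed only makes the example reproducible.
struct NondeterministicKey {
  std::mt19937_64 rng{42};
  uint64_t operator()(const Row& /*row*/) { return rng(); }
};

// Splits one partition into kFanout children. Every row lands in exactly one
// child, so no child can be larger than the input, and with a uniformly random
// key each child gets roughly 1/kFanout of the rows. Which child a row lands in
// has nothing to do with where it was placed on earlier passes.
std::vector<std::vector<Row>> Repartition(const std::vector<Row>& rows,
                                          NondeterministicKey& key) {
  std::vector<std::vector<Row>> children(kFanout);
  for (const Row& row : rows) {
    uint64_t hash = key(row);
    children[hash % kFanout].push_back(row);
  }
  return children;
}
```

Repartitioning 10,000 rows this way leaves every row in exactly one of the 16 children, each much smaller than the input. Rerunning with a different seed would scatter the same rows differently, which is why the final grouping is arbitrary even though the process terminates.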

I thought this was a reasonable solution. I would be OK with a solution that 
failed queries with nondeterministic grouping functions, but I couldn't think of 
a simpler alternative that met the following two constraints:
* Guarantees that a nondeterministic UDF or builtin can't crash Impala.
* Doesn't impose overhead on the "fast path".

Runtime checks for whether each row mapped to the right partition would impose 
some per-row overhead. We could avoid that by codegen'ing a different version 
of the probe function for the case when we're processing a single spilled 
partition, but that would add codegen overhead instead.
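The tradeoff above can be sketched roughly as follows, assuming rows arrive as precomputed 64-bit hashes and a row's partition is `hash % kFanout`. Impala specializes its probe code by generating LLVM IR at runtime; this sketch swaps in a C++ template parameter purely as an analogy, and `ProbeBatch`, `ProbeStats` and `kFanout` are hypothetical names. The per-row partition check exists only in the instantiation used for a single spilled partition, so the fast path pays nothing for it; the cost is an extra compiled variant of the probe loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not Impala's API.
constexpr int kFanout = 16;  // assumed partition fanout

struct ProbeStats {
  size_t inserted = 0;   // rows handed to the (elided) hash table insert
  size_t misrouted = 0;  // rows whose hash no longer maps to this partition
};

// kCheckPartition is a compile-time constant, so each instantiation is a
// separate function and an optimizing compiler removes the check entirely
// from the fast-path (false) version.
template <bool kCheckPartition>
ProbeStats ProbeBatch(const std::vector<uint64_t>& hashes, int partition_idx) {
  ProbeStats stats;
  for (uint64_t hash : hashes) {
    if (kCheckPartition) {
      // Per-row branch, paid only when processing a single spilled partition.
      if (static_cast<int>(hash % kFanout) != partition_idx) {
        ++stats.misrouted;  // a real fix would handle the row safely here
        continue;
      }
    }
    ++stats.inserted;  // stands in for inserting/updating the hash table
  }
  return stats;
}
```

For example, probing hashes {3, 19, 35, 4, 5} against partition 3 with the check enabled inserts 3 rows and flags 2 as misrouted, while the fast-path instantiation inserts all 5 without looking.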


-- 
To view, visit http://gerrit.cloudera.org:8080/7714

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdb09239577b3f0a19d710b0d148e882b0b73e23
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig
Gerrit-Reviewer: Bikramjeet Vig
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
