[Impala-ASF-CR] IMPALA-4120: Incorrect results with LEAD() analytic function

2016-10-22 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4120: Incorrect results with LEAD() analytic function
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I85bb1745232d8dd383a6047c86019c6378ab571f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4120: Incorrect results with LEAD() analytic function

2016-10-21 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4120: Incorrect results with LEAD() analytic function
..


Patch Set 4: Code-Review+2

Carry Dan's +2 forward.

-- 
To view, visit http://gerrit.cloudera.org:8080/4740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I85bb1745232d8dd383a6047c86019c6378ab571f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4120: Incorrect results with LEAD() analytic function

2016-10-21 Thread Michael Ho (Code Review)
Hello Matthew Jacobs, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4740

to look at the new patch set (#4).

Change subject: IMPALA-4120: Incorrect results with LEAD() analytic function
..

IMPALA-4120: Incorrect results with LEAD() analytic function

This change fixes a memory management problem with LEAD()/LAG()
analytic functions which led to incorrect result. In particular,
the update functions specified for these analytic functions only
make a shallow copy of StringVal (i.e. copying only the pointer
and the length of the string) without copying the string itself.
This may lead to problem if the string is created from some UDFs
which do local allocations whose buffer may be freed and reused
before the result tuple is copied out. This change fixes the problem
above by allocating a buffer at the Init() functions of these
analytic functions to track the intermediate value. In addition,
when the value is copied out in GetValue(), it will be copied into
the MemPool belonging to the AnalyticEvalNode and attached to the
outgoing row batches. This change also fixes a missing free of
local allocations in QueryMaintenance().

Change-Id: I85bb1745232d8dd383a6047c86019c6378ab571f
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exprs/agg-fn-evaluator.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/udf/udf.cc
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
7 files changed, 120 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/4740/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I85bb1745232d8dd383a6047c86019c6378ab571f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-4120: Incorrect results with LEAD() analytic function

2016-10-21 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4120: Incorrect results with LEAD() analytic function
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4740/3/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 403:   }
> so this bug was similar to IMPALA-3311 (see PartitionedAggregationNode::Han
Yes. We need some refactoring to share the code. Probably not worth it.


Line 420:  (next_partition || (fn_scope_ == RANGE && 
window_.__isset.window_end &&
> this indentation seems off. what does clang-format say?
Tried with clang-format but still looks a bit odd. Changed the indentation to 
look closer to the existing code.


http://gerrit.cloudera.org:8080/#/c/4740/3/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

PS3, Line 153: beyond the scope of the call site
> Maybe say "The memory is a local allocation and so needs to be copied out b
Incorporated comments from Dan and Matt.


PS3, Line 153: beyond the scope of the call site
> I'm still not sure this is precise enough... this sounds like it's scoped a
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I85bb1745232d8dd383a6047c86019c6378ab571f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4120: Incorrect results with LEAD() analytic function

2016-10-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4120: Incorrect results with LEAD() analytic function
..


Patch Set 3: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4740/3/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 403:   }
so this bug was similar to IMPALA-3311 (see 
PartitionedAggregationNode::HandleOutputStrings()), right? but i guess we can't 
share the solution since in this case we're not directly populating the result 
row batch.


Line 420:  (next_partition || (fn_scope_ == RANGE && 
window_.__isset.window_end &&
this indentation seems off. what does clang-format say?


http://gerrit.cloudera.org:8080/#/c/4740/3/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

PS3, Line 153: beyond the scope of the call site
> I'm still not sure this is precise enough... this sounds like it's scoped a
Maybe say "The memory is a local allocation and so needs to be copied out 
before local allocations are freed" or something?  (As much as I dislike the 
terminology "local allocations" it is the terminology we currently have).


-- 
To view, visit http://gerrit.cloudera.org:8080/4740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I85bb1745232d8dd383a6047c86019c6378ab571f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4120: Incorrect results with LEAD() analytic function

2016-10-20 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4120: Incorrect results with LEAD() analytic function
..


Patch Set 3: Code-Review+1

(1 comment)

Thanks! It'll be good to run this through the query generator.

http://gerrit.cloudera.org:8080/#/c/4740/3/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

PS3, Line 153: beyond the scope of the call site
I'm still not sure this is precise enough... this sounds like it's scoped at 
the call site. It's memory that lives until QueryMaintenance frees the 
function's memory, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/4740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I85bb1745232d8dd383a6047c86019c6378ab571f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4120: Incorrect results with LEAD() analytic function

2016-10-19 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4120: Incorrect results with LEAD() analytic function
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/4740/2/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

PS2, Line 390: string
string data

to be more specific


PS2, Line 399: copy result string for AnalyticEvalNode
Since this could show up in error messages, let's say something more friendly, 
e.g. "Failed to allocate memory for analytic function result."


PS2, Line 419: if (fn_scope_ != ROWS) {
Any reason not to return early (w/ OK now) if this == ROWS? It's nice to 
simplify the logic after. If we keep it, let's combine the if conditions in a 
single if stmt.


http://gerrit.cloudera.org:8080/#/c/4740/2/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

PS2, Line 138: Return error status on failure.
This can be more specific, right? Returns an error when out of memory.

And below too


http://gerrit.cloudera.org:8080/#/c/4740/2/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

PS2, Line 152: Note that the buffer for StringVal result is allocated from
 :   /// memory managed by Impala so it must be copied out by 
caller.
Do we need to call this out specifically here? It might just be better to add a 
few sentences in the class header about mem usage in general when using 
AggFnEvaluator.


PS2, Line 153: r.
... if it needs to be used beyond the lifetime of ... ?


PS2, Line 153: memory managed by Impala
this isn't specific, most memory in this process is managed by Impala :)


PS2, Line 237:  memory managed by Impala
this isn't specific, most memory in this process is managed by Impala :)

same for the statement above


-- 
To view, visit http://gerrit.cloudera.org:8080/4740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I85bb1745232d8dd383a6047c86019c6378ab571f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4120: Incorrect results with LEAD() analytic function

2016-10-19 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4120: Incorrect results with LEAD() analytic function
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4740/1/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

PS1, Line 149:   /// Puts the finalized value from Tuple* src in Tuple* dst 
just as Finalize() does.
 :   /// 'pool' is the MemPool used for allocating memory for the 
finalized value (e.g.
 :   /// StringVal). However, unlike Finalize(), GetValue() does 
not clean up state in src.
 :   /// GetValue() can be called repeatedly with the same src. 
Only used internally for
 :   /// analytic fn builtins.
 :   void GetValue(FunctionContext* agg_fn_ctx, Tuple* src, Tuple* 
dst, MemPool* pool);
> That makes sense. What you are suggesting is similar to what the PAGG node 
Actually, let's stick with copying out the string from the result_tuple for 
now. We probably should update the interface of Serialize()/Finalize() down the 
road to directly copy the string into a MemPool.


-- 
To view, visit http://gerrit.cloudera.org:8080/4740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I85bb1745232d8dd383a6047c86019c6378ab571f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4120: Incorrect results with LEAD() analytic function

2016-10-17 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4120: Incorrect results with LEAD() analytic function
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4740/1//COMMIT_MSG
Commit Message:

PS1, Line 18: AggFnEvaluatorthese
?


http://gerrit.cloudera.org:8080/#/c/4740/1/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

PS1, Line 389: AggFnEvaluator::GetValue(evaluators_, fn_ctxs_, curr_tuple_, 
result_tuple,
 :   cur_tuple_pool);
wrt my comment in AggFnEvaluator, what if we didn't change GetValue() and 
instead we called a helper fn here that moved any backing memory from 
result_tuple into cur_tuple_pool, e.g.

// Copies memory backing Strings (in analytic result slots) from result_tuple 
into memory allocated from cur_tuple_pool.
RelocateStringMemory(result_tuple, cur_tuple_pool);

(or something like that)


http://gerrit.cloudera.org:8080/#/c/4740/1/be/src/exprs/agg-fn-evaluator.cc
File be/src/exprs/agg-fn-evaluator.cc:

PS1, Line 499: v = StringVal::null();
this means the allocation failed, right? should we be propagating errors? seems 
like that'd lead to incorrect results under low-memory conditions.


http://gerrit.cloudera.org:8080/#/c/4740/1/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

PS1, Line 149:   /// Puts the finalized value from Tuple* src in Tuple* dst 
just as Finalize() does.
 :   /// 'pool' is the MemPool used for allocating memory for the 
finalized value (e.g.
 :   /// StringVal). However, unlike Finalize(), GetValue() does 
not clean up state in src.
 :   /// GetValue() can be called repeatedly with the same src. 
Only used internally for
 :   /// analytic fn builtins.
 :   void GetValue(FunctionContext* agg_fn_ctx, Tuple* src, Tuple* 
dst, MemPool* pool);
I see how this and the below changes to the GetValue() path (all the way 
through SerializeOrFinalize) fix the issue, but I'm worried it's going to be 
more confusing when the interface isn't symmetrical, e.g. other Serialize and 
Finalize can't allocate mem out of pool even though it wouldn't be crazy for 
that to be allowed.

What if the caller handled copying GetValue memory instead of having GetValue() 
handle the string copy? We could have a helper fn somewhere (maybe just in 
AggFnEvaluator, or maybe there's a better place) which moved backing memory 
into a specified pool. Does that make sense?


-- 
To view, visit http://gerrit.cloudera.org:8080/4740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I85bb1745232d8dd383a6047c86019c6378ab571f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes