[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-09 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 6:

FYI we should try to get query generator support for this, I filed: 
https://issues.cloudera.org/browse/IMPALA-4115

-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-09 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4094/13/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 2064:   TestValue("instr('', '')", TYPE_INT, 0);
would probably be good to add this test case for backwards search.


http://gerrit.cloudera.org:8080/#/c/4094/13/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 292:   if (occurrence.val <= 0 || start_position.val == 0) return 
IntVal(0);
is this what other systems do? i.e. do they silently return 0 or do they raise 
a warning or give an error in this case?  (which we would do using 
context->SetError()).


PS13, Line 302: DCHECK
DCHECK_LT.
But why is this true? during the previous iteration, couldn't match_pos = 
haystack.len - 1, and so during this iteration search_start_pos could be equal 
to haystack.len?  I think the condition should be search_start_pos <= 
haystack.len, where the case of equality is handled because the new haystack 
len will be 0 and never matched.

I think this would be the test case: INSTR("xyza", "a", 1, 2) or INSTR("a", 
"a", 1, 2);


PS13, Line 317: DCHECK
DCHECK_GT. but what is the meaning of this DCHECK?  search_start_pos is the 
position into the haystack, right? so shouldn't it always be non-negative?

Also, it seems this loop suffers from the similar problem as commented at line 
302. (test case INSTR("axyz", "a", -1, 2)  or INSTR("a", "a", -1 , 2).


PS13, Line 319: std::min
This is pretty confusing. I think this would be clearer if we did the min() 
outside the loop when choosing the first search_start_pos. i.e. the first 
search_start_pos can always be chosen to be <= haystack.len - needle.len, since 
that's the first possible match position. I think that makes the intent of 
what's going on here clearer.


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-09 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 13: Code-Review+1

Thanks! Let's get Marcel or DanH to do a final review.

-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-09 Thread Zoltan Ivanfi (Code Review)
Hello Lars Volker, Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#13).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/experiments/CMakeLists.txt
R be/src/experiments/string-search-sse-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/string-search-test.cc
M be/src/runtime/string-search.h
M common/function-registry/impala_functions.py
9 files changed, 306 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/13
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-09 Thread Zoltan Ivanfi (Code Review)
Hello Lars Volker, Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#12).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/experiments/CMakeLists.txt
R be/src/experiments/string-search-sse-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/string-search-test.cc
M be/src/runtime/string-search.h
M common/function-registry/impala_functions.py
9 files changed, 306 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/12
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-09 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 11:

(8 comments)

> (9 comments)
 > 
 > just a few more small things... thanks!
 > 
 > Have you tested your test cases against Oracle or another system
 > that supports this?

Yes, I ran these queries in Oracle and used those results as the reference 
values.

http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS11, Line 2088:   TestValue("instr('corporate floor', 'or', 3, 3)", TYPE_INT, 
0);
   :   TestValue("instr('corporate floor', 'or', -3, 3)", TYPE_INT, 
0);
> I think we need a few more cases with empty search strings, e.g. 
Done


Line 2102: 
> add a few more null checks:
These should not be NULL, because none of the parameters are NULL. These simply 
return 0 to indicate an error.


http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS11, Line 292:   if (occurrence.val <= 0 || start_position.val == 0) {
  : return IntVal(0);
  :   }
> 1 line
Done


PS11, Line 306: == -1
> nit: < 0
Done


PS11, Line 322: == -1
> < 0
Done


PS11, Line 324:   DCHECK(search_start_pos < haystack.len);
> this check doesn't seem useful here as it did above. The point there was to
Done, but shouldn't we simply add a DCHECK to Substring itself (in a follow-up 
change)?


http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

PS11, Line 107:   if (pattern_->ptr[i] == pattern_->ptr[mlast])
  : skip_ = mlast - i - 1;
> nit: 1 line
This line was not modified compared to base and I wouldn't like to mix purely 
styleguide-driven changes with actual behavioral changes in a single commit. As 
I already suggested in an earlier comment, I am happy to run clang-format on 
the whole file and fix styleguide-related issues in a follow-up change. This 
way we will end up with a clean file history that allows easy identification of 
the reason of changes later on.


PS11, Line 113:   if (pattern_->ptr[i] == pattern_->ptr[0]) {
  : rskip_ = i - 1;
  :   }
> 1 line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 11:

(9 comments)

just a few more small things... thanks!

Have you tested your test cases against Oracle or another system that supports 
this?

http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS11, Line 2088:   TestValue("instr('corporate floor', 'or', 3, 3)", TYPE_INT, 
0);
   :   TestValue("instr('corporate floor', 'or', -3, 3)", TYPE_INT, 
0);
I think we need a few more cases with empty search strings, e.g. 

  TestValue("instr('corporate floor', '', 3, 3)", TYPE_INT, 0);
  TestValue("instr('corporate floor', '', -3, 3)", TYPE_INT, 0);


Line 2102: 
add a few more null checks:

instr('corporate floor', '', -15, 1) is NULL
instr('corporate floor', '', 15, 1) is NULL


http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS11, Line 292:   if (occurrence.val <= 0 || start_position.val == 0) {
  : return IntVal(0);
  :   }
1 line


PS11, Line 306: == -1
nit: < 0


PS11, Line 322: == -1
< 0


PS11, Line 324:   DCHECK(search_start_pos < haystack.len);
this check doesn't seem useful here as it did above. The point there was to 
guard the check to the next Substring() call, so I think in this case we'd want 
to make sure

search_start_pos + needle.len > 0

to make sure the Substring call is valid.


http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

Line 104: 
add a brief comment about how this is transformed from the python code. e.g. 
they compute the bloom & skip_ each call, we do it all here for reuse. Also why 
we call BloomAdd in this way.

Basically enough for people to look at the python code and see that this maps 
to that code.


PS11, Line 107:   if (pattern_->ptr[i] == pattern_->ptr[mlast])
  : skip_ = mlast - i - 1;
nit: 1 line


PS11, Line 113:   if (pattern_->ptr[i] == pattern_->ptr[0]) {
  : rskip_ = i - 1;
  :   }
1 line


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-08 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS6, Line 322: search_start_pos
> Ok, but since it requires a bit of thinking can you add a DCHECK for #2 to 
Done


http://gerrit.cloudera.org:8080/#/c/4094/10/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

PS10, Line 23: #include 
 : #include 
> can you put the std includes above boost includes, w/ a space btwn?
Done


PS10, Line 104: for (int i = 0; i <= mlast; ++i) {
  :   BloomAdd(pattern_->ptr[i]);
  :   if (pattern_->ptr[i] == pattern_->ptr[mlast] && i != 
mlast) skip_ = mlast - i - 1;
  :   if (pattern_->ptr[i] == pattern_->ptr[0] && i != 0) {
  : rskip_ = std::min(rskip_, i - 1);
  :   }
> Given that this is subtle code (at least the algorithm is tricky), I think 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-08 Thread Zoltan Ivanfi (Code Review)
Hello Lars Volker, Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#11).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/experiments/CMakeLists.txt
R be/src/experiments/string-search-sse-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/string-search-test.cc
M be/src/runtime/string-search.h
M common/function-registry/impala_functions.py
9 files changed, 301 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/11
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-07 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS6, Line 322: search_start_pos
> I don't think it's necessary. I check the starting position before the loop
Ok, but since it requires a bit of thinking can you add a DCHECK for #2 to make 
it clear & validate?


PS6, Line 336: ntVal(1), BigIntVal(1));
> Because the position of the match is the index of its leftmost char, but we
Right, thanks


http://gerrit.cloudera.org:8080/#/c/4094/10/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

PS10, Line 23: #include 
 : #include 
can you put the std includes above boost includes, w/ a space btwn?


PS10, Line 104: for (int i = 0; i <= mlast; ++i) {
  :   BloomAdd(pattern_->ptr[i]);
  :   if (pattern_->ptr[i] == pattern_->ptr[mlast] && i != 
mlast) skip_ = mlast - i - 1;
  :   if (pattern_->ptr[i] == pattern_->ptr[0] && i != 0) {
  : rskip_ = std::min(rskip_, i - 1);
  :   }
Given that this is subtle code (at least the algorithm is tricky), I think I'd 
feel more comfortable if you modeled this more clearly after the code in 
fastsearch.h, e.g. split this into different loops to create skip_ and rskip_, 
where the indexes and logic match that of fastsearch.h. You could create the  
bloom filter in a separate loop or using the loop that creates skip_  and then 
updating the bloom filter at the end for mlast as the code in fastsearch.h does.


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-06 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 9:

(1 comment)

I'm good with fixing the style in a subsequent change.

 > 
 > I would suggest clang-formatting this file in a follow-up change.
 > This way we separate the formatting changes cleanly in the sense
 > that every part of this diff clearly means to a functional change,
 > while every part of the follow-up change will purely reflect a
 > formatting change.
 > 
 > > I'm not sure whether you should
 > > adapt all of string-search.h to our style guide, now that a
 > > significant part of the file changed. I'd be in favor of it, but
 > I
 > > don't feel strongly about it.

http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS9, Line 329: IntVal StringFunctions::Instr(FunctionContext* context, const 
StringVal& str,
 : const StringVal& substr, const BigIntVal& start_position) {
 :   return Instr(context, str, substr, start_position, 
BigIntVal(1));
 : }
 : 
 : IntVal StringFunctions::Instr(
 : FunctionContext* context, const StringVal& str, const 
StringVal& substr) {
 :   return Instr(context, str, substr, BigIntVal(1), BigIntVal(1));
 : }
> Hmmm, this was wrapped by clang-format, I have no idea why the wrapping is 
Yes, I agree. Let's keep it as it is.


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-06 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 9:

(6 comments)

I would suggest clang-formatting this file in a follow-up change. This way we 
separate the formatting changes cleanly in the sense that every part of this 
diff clearly means to a functional change, while every part of the follow-up 
change will purely reflect a formatting change.

 > I'm not sure whether you should
 > adapt all of string-search.h to our style guide, now that a
 > significant part of the file changed. I'd be in favor of it, but I
 > don't feel strongly about it.

http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS9, Line 329: IntVal StringFunctions::Instr(FunctionContext* context, const 
StringVal& str,
 : const StringVal& substr, const BigIntVal& start_position) {
 :   return Instr(context, str, substr, start_position, 
BigIntVal(1));
 : }
 : 
 : IntVal StringFunctions::Instr(
 : FunctionContext* context, const StringVal& str, const 
StringVal& substr) {
 :   return Instr(context, str, substr, BigIntVal(1), BigIntVal(1));
 : }
> nit: Wrap arguments uniformly for both methods, either with clang-format or
Hmmm, this was wrapped by clang-format, I have no idea why the wrapping is 
different. Probably clang-format noticed that all arguments fit a single line 
in line 335, that's why it didn't split it. I think we should accept 
clang-format's output, otherwise we can spend a lot of time on fixing its 
style, which would make using clang-format pointless.


http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

Line 63:   static IntVal Instr(FunctionContext*, const StringVal& str, const 
StringVal& substr);
> nit: have the same order as in the implementation file.
Done


http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:

Line 30: // If the length is -1, use the full string length
> nit: punctuation
Done


Line 43: // If the length is -1, use the full string length
> nit: punctuation
Done


http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

Line 101: skip_ = rskip_ = mlast - 1;
> I haven't seen multiple assignments per line elsewhere, for readability rea
Done, although in my opinion assigning two vars the same value in a single-line 
is cleaner.


Line 105:   if (pattern_->ptr[i] == pattern_->ptr[mlast] && i != mlast) {
> nit: single line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-06 Thread Zoltan Ivanfi (Code Review)
Hello Lars Volker, Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#10).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/experiments/CMakeLists.txt
R be/src/experiments/string-search-sse-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/string-search-test.cc
M be/src/runtime/string-search.h
M common/function-registry/impala_functions.py
9 files changed, 298 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-06 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 9: Code-Review+1

(6 comments)

I only added some minor comments. I'm not sure whether you should adapt all of 
string-search.h to our style guide, now that a significant part of the file 
changed. I'd be in favor of it, but I don't feel strongly about it.

http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS9, Line 329: IntVal StringFunctions::Instr(FunctionContext* context, const 
StringVal& str,
 : const StringVal& substr, const BigIntVal& start_position) {
 :   return Instr(context, str, substr, start_position, 
BigIntVal(1));
 : }
 : 
 : IntVal StringFunctions::Instr(
 : FunctionContext* context, const StringVal& str, const 
StringVal& substr) {
 :   return Instr(context, str, substr, BigIntVal(1), BigIntVal(1));
 : }
nit: Wrap arguments uniformly for both methods, either with clang-format or 
according to the surrounding code.


http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

Line 63:   static IntVal Instr(FunctionContext*, const StringVal& str, const 
StringVal& substr);
nit: have the same order as in the implementation file.


http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:

Line 30: // If the length is -1, use the full string length
nit: punctuation


Line 43: // If the length is -1, use the full string length
nit: punctuation


http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

Line 101: skip_ = rskip_ = mlast - 1;
I haven't seen multiple assignments per line elsewhere, for readability reasons 
I'd personally favor having one per line. However I don't feel strongly about 
this


Line 105:   if (pattern_->ptr[i] == pattern_->ptr[mlast] && i != mlast) {
nit: single line


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-05 Thread Zoltan Ivanfi (Code Review)
Hello Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#9).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/experiments/CMakeLists.txt
R be/src/experiments/string-search-sse-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/string-search-test.cc
M be/src/runtime/string-search.h
M common/function-registry/impala_functions.py
9 files changed, 299 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-05 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

Line 102: 
> This was accidental, will remove.
Done


PS6, Line 102: 
 : for (int i = 0; i <= mlast; ++i) {
 :   BloomAdd(pattern_->ptr[i]);
 :   if (pattern_->ptr[i] == pattern_->ptr[mlast] && i != 
mlast) {
 : skip_ = mlast - i - 1;
 :   }
> Nice catch, thanks. I think we may be able to do better than the Python imp
Done


http://gerrit.cloudera.org:8080/#/c/4094/8/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

Line 227:   int skip_;
I changed skip_ to be int, since we use int-s to calculate it. Alternatively, I 
can keep it as int64 and change all int-s to int64-s. The question is, do we 
want to support strings bigger than 2GB?


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-05 Thread Zoltan Ivanfi (Code Review)
Hello Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#8).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/experiments/CMakeLists.txt
R be/src/experiments/string-search-sse-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/string-search-test.cc
M be/src/runtime/string-search.h
M common/function-registry/impala_functions.py
9 files changed, 299 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-05 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

Line 102: for (int i = 0; i < mlast; ++i) {
This was accidental, will remove.


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-05 Thread Zoltan Ivanfi (Code Review)
Hello Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#7).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/experiments/CMakeLists.txt
R be/src/experiments/string-search-sse-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/string-search-test.cc
M be/src/runtime/string-search.h
M common/function-registry/impala_functions.py
9 files changed, 283 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-05 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 6:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS5, Line 289:   StringValue str_sv = StringValue::FromStringVal(str);
 :   StringValue substr_sv = StringValue::FromStringVal(substr);
 :   StringSearch search(&substr_sv);
 :   // Hive returns positions starting from 1.
 :   return IntVal(search.Search(&str_sv) + 1);
 : }
> I think it makes sense to unify this by using the new method. If there's bu
Done


PS5, Line 329: n + start_pos
> I think we could just remove the comment. The next line by itself explains 
Done


http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS6, Line 311: search_start_pos < 0
> remove- this is not possible since we know it's not 0 from l302 and we know
Indeed, sorry - in an earlier version of my code this check was done in a place 
common to the regular and reverse search.


PS6, Line 311: {
 :   return IntVal(0);
> this will be short enough to return on the same line
Done


PS6, Line 318:   if (match_pos_in_substring == -1) {
 : return IntVal(0);
 :   }
> 1 line
Done


PS6, Line 322: search_start_pos
> I think you also need to make sure search_start_pos is < haystack.len befor
I don't think it's necessary. I check the starting position before the loop, so 
we start with a non-negative length for the Substring call. Then in each 
iteration of the loop, one of two things can happen:
1. There is no match in the substring, in which case the function returns 
immediately.
2. There is a match, in which case the new starting position is one char after 
the match position. Because the match position must be inside the string, 
match_pos < haystack.len, therefore search_start_pos = match_pos + 1 <= 
haystack.len (actually it will be strictly less, because the length of the 
match must be > 0).


PS6, Line 330:  search_start_pos >= str.len
> This isn't possible, str.len = haystack.len, and we know we added something
Indeed, sorry - in an earlier version of my code this check was done in a place 
common to the regular and reverse search.


PS6, Line 330: if (search_start_pos < 0 || search_start_pos >= str.len) {
 :   return IntVal(0);
 : }
> remove
We still need the < 0 check.


PS6, Line 336: search_start_pos + needle.len
> why do you add the needle len?
Because the position of the match is the index of its leftmost char, but we 
need to include the match itself in the substring where we search. For example, 
if we are searching for "234" in "0123456" and we want to find matches where 
the match index is <= 2, the substring "012" is not enough, we have to search 
in "01234".


PS6, Line 338:   if (match_pos == -1) {
 : return IntVal(0);
 :   }
> 1 line
Done


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

PS5, Line 64: context
> nit: remove to match the surrounding code.
Done


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:

Line 28:   return StringValue(const_cast(str), len);
> You're right. Unfortunately the style guide wasn't clear in this regard. Wh
Done


PS5, Line 69: ECT_EQ(-1, TestSearch("abca
> I see, let's fix it here then at least. Please rephrase to "Test searching 
Done


PS5, Line 70: 
> Ok, makes sense to me. Can you add this as a comment to the factory methods
Done


Line 74:   EXPECT_EQ(-1, TestSearch("abcdaba", "abaa", 4, 3));
> Same here
Done


Line 81:   EXPECT_EQ(2, TestSearch("abcdede", "cde"));
> How about "Tests with needle at the end of the haystack." then? Doesn't mak
Done


http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

PS6, Line 102: for (int i = 0; i < mlast; ++i) {
 :   BloomAdd(pattern_->ptr[i]);
 :   if (pattern_->ptr[i] == pattern_->ptr[mlast])
 : skip_ = mlast - i - 1;
 : }
 : BloomAdd(pattern_->ptr[mlast]);
> I don't think this will always work for RSearch, see https://hg.python.org/
Nice catch, thanks. I think we may be able to do better than the Python 
implementation here. That one iterates in reverse indeed, and for this reason 
they moved this part immediately before the search, since the direction is not 
known during construction. However, since the search may be reused for 
searching repeatedly, it would be better to build the bloom filter only once.

The contents of the bloom filter do not depend on t

[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-02 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS6, Line 311: search_start_pos < 0
remove- this is not possible since we know it's not 0 from l302 and we know 
it's not <0 from l308. it might be worthwhile to put a DCHECK above l316 to 
make sure it's always >= 0 though.


PS6, Line 311: {
 :   return IntVal(0);
this will be short enough to return on the same line


PS6, Line 318:   if (match_pos_in_substring == -1) {
 : return IntVal(0);
 :   }
1 line


PS6, Line 322: search_start_pos
I think you also need to make sure search_start_pos is < haystack.len before 
the next call to Substring (l316), otherwise that will return invalid memory 
(Substring doesn't check the length).

In fact, since you'll have to do the check before calling 
Substring(search_start_pos) on l316, I think you can remove the entire block on 
l311 that checks search_start_pos.


PS6, Line 330:  search_start_pos >= str.len
This isn't possible, str.len = haystack.len, and we know we added something 
less than 0 to it to get search_start_pos


PS6, Line 330: if (search_start_pos < 0 || search_start_pos >= str.len) {
 :   return IntVal(0);
 : }
remove


PS6, Line 336: search_start_pos + needle.len
why do you add the needle len?
I don't understand why it's not just search_start_pos.


PS6, Line 338:   if (match_pos == -1) {
 : return IntVal(0);
 :   }
1 line


http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

PS6, Line 102: for (int i = 0; i < mlast; ++i) {
 :   BloomAdd(pattern_->ptr[i]);
 :   if (pattern_->ptr[i] == pattern_->ptr[mlast])
 : skip_ = mlast - i - 1;
 : }
 : BloomAdd(pattern_->ptr[mlast]);
I don't think this will always work for RSearch, see 
https://hg.python.org/cpython/file/6b6c79eba944/Objects/stringlib/fastsearch.h 

l195

I think this would need to be computed in reverse.


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-02 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS5, Line 289:   if (str.is_null || substr.is_null) return IntVal::null();
 :   StringValue str_sv = StringValue::FromStringVal(str);
 :   StringValue substr_sv = StringValue::FromStringVal(substr);
 :   StringSearch search(&substr_sv);
 :   // Hive returns positions starting from 1.
 :   return IntVal(search.Search(&str_sv) + 1);
> I would expect a minor, probably insignificant performance impact. (On the 
I think it makes sense to unify this by using the new method. If there's bugs 
we should be able to find them.


PS5, Line 329: allowed match
> When we are searching from the right, there are two "current" positions. On
I think we could just remove the comment. The next line by itself explains 
where the search starts.


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

PS5, Line 64: context
nit: remove to match the surrounding code.


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:

Line 28:   if (len == -1) {
> This is matter of taste, the style guide allows both variant. I like this o
You're right. Unfortunately the style guide wasn't clear in this regard. What 
we do is put single statements in the same line as the "if", if they fit within 
the 90 character limit. I'm not a fan of this myself, but that's the 
predominant formatting.

I updated the style guide on this occasion to be more specific. The style guide 
will probably also change a bit in the near future, but I'm confident that this 
single-line-if-possible rule will not change.


PS5, Line 69: Haystack is not full string
> I didn't write the test cases, they are almost verbatim copies from https:/
I see, let's fix it here then at least. Please rephrase to "Test searching 
through part of haystack" or similar.


PS5, Line 70: 5
> As I noted in the other comment, I don't know why each individual test case
Ok, makes sense to me. Can you add this as a comment to the factory methods 
above?


Line 81:   // Test last bit, this is the interesting edge case
> I guess last bit means last part. I'm not sure why it's interesting, it was
How about "Tests with needle at the end of the haystack." then? Doesn't make 
sense to me to keep a vague comment if neither of us can fiigure out why it's 
there.


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-02 Thread Zoltan Ivanfi (Code Review)
Hello Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#6).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/experiments/CMakeLists.txt
R be/src/experiments/string-search-sse-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/string-search-test.cc
M be/src/runtime/string-search.h
M common/function-registry/impala_functions.py
9 files changed, 277 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-02 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 5:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 2063:   // Hive returns 0 if substr was not found in str (or on other 
error coditions).
> What happens if you pass other types as the needle or haystack, e.g. int or
The analysis of the query will complain that there is no function with a 
matching signature:

[localhost:21000] > select instr("1", 2, "3");
ERROR: AnalysisException: No matching function with signature: instr(STRING, 
TINYINT, STRING).


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 22: #include 
> are the includes still needed?
Done


PS5, Line 289:   if (str.is_null || substr.is_null) return IntVal::null();
 :   StringValue str_sv = StringValue::FromStringVal(str);
 :   StringValue substr_sv = StringValue::FromStringVal(substr);
 :   StringSearch search(&substr_sv);
 :   // Hive returns positions starting from 1.
 :   return IntVal(search.Search(&str_sv) + 1);
> Will replacing this with a call to the more general version of Instr() come
I would expect a minor, probably insignificant performance impact. (On the 
other, we should be aware that in this case we will replace a well-proven 
implementation with a freshly written one, so if there were any bugs in the new 
implementation, the impact of those would become bigger.)


PS5, Line 299: occurance
> spelling
Done


PS5, Line 316: no
> nit: use 'num' to abbreviate number
Done


PS5, Line 329: allowed match
> what does "allowed match" mean?
When we are searching from the right, there are two "current" positions. One is 
for leftmost char of the match we are looking for, the other for the rightmost 
char of it.

When the user specifies -5 as an input, it means that the leftmost char of the 
pattern must be before the 5th char from the right. So the actual match may be 
partially after the limit.

If you can suggest a better description, please let me know.


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:

Line 28:   if (len == -1) {
> single line
This is matter of taste, the style guide allows both variant. I like this one 
more, because it is consistent with multi-line conditional block and it is easy 
to add new lines.


PS5, Line 69: Haystack is not full string
> This comment could be clearer, Haystack is a full string. Maybe rephrase "T
I didn't write the test cases, they are almost verbatim copies from 
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/experiments/string-search-test.cc#L98


PS5, Line 70: 5
> What is the difference between this and just specifying "abcab" as the hays
As I noted in the other comment, I don't know why each individual test case was 
added, I just reused existing test-cases for strstr functionality. My guess is 
that by taking a prefix of a StringVal, it tests whether the strstr 
implementation respects the length stored in the StringValue or also reads 
parts of the string which should be excluded.


Line 81:   // Test last bit, this is the interesting edge case
> What does "last bit" mean? Can you add a comment why this is an interesting
I guess last bit means last part. I'm not sure why it's interesting, it wasn't 
added by me.


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

Line 204: }
> Can we advance i by skip_ here, too? Is this what the comment in line 139 r
I don't know what line 139 refers to, I did not write that comment. I don't 
think i can be advanced by skip_ here either. As I have written, I created this 
function in the following way:

Our fast search is based on Python's fast search. Python did not support 
reverse search when it was added to our codebase, but now it does. Therefore I 
compared Python's reverse search to its regular search, then applied the same 
changes to our regular search, thereby creating this function. The python 
implementation is linked to in a comment in this file: 
http://hg.python.org/cpython/file/6b6c79eba944/Objects/stringlib/fastsearch.h


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-09-01 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 2063:   // Hive returns 0 if substr was not found in str (or on other 
error coditions).
What happens if you pass other types as the needle or haystack, e.g. int or 
double or bool? Do we cast them? Or throw an error?


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 22: #include 
are the includes still needed?


PS5, Line 289:   if (str.is_null || substr.is_null) return IntVal::null();
 :   StringValue str_sv = StringValue::FromStringVal(str);
 :   StringValue substr_sv = StringValue::FromStringVal(substr);
 :   StringSearch search(&substr_sv);
 :   // Hive returns positions starting from 1.
 :   return IntVal(search.Search(&str_sv) + 1);
Will replacing this with a call to the more general version of Instr() come 
with a performance impact? If not I think we should forward here, similar to 
the other Instr() overload below.


PS5, Line 299: occurance
spelling


PS5, Line 316: no
nit: use 'num' to abbreviate number


PS5, Line 329: allowed match
what does "allowed match" mean?


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:

Line 28:   if (len == -1) {
single line


PS5, Line 69: Haystack is not full string
This comment could be clearer, Haystack is a full string. Maybe rephrase "Test 
searching through part of haystack" or similar.


PS5, Line 70: 5
What is the difference between this and just specifying "abcab" as the 
haystack? It looks like the tests with varying len parameters mostly test that 
the c'tor of StringValue works correctly. Is there a downside to it, if we rely 
on that (or test it elsewhere) and then just specify the strings we want to 
test here without len parameters.


Line 74:   // Haystack and needle not full len
Same here


Line 81:   // Test last bit, this is the interesting edge case
What does "last bit" mean? Can you add a comment why this is an interesting 
case?


http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

Line 204: }
Can we advance i by skip_ here, too? Is this what the comment in line 139 
refers to?


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-31 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 4:

> As we discussed in person our internal strings aren't \0
 > terminated. Let's try using StringFunctions::Reverse() and the
 > InStr implementation we already have and see how fast that is.

To recap our conversation in a little more detail, using the std::string 
constructor that takes a length parameter in addition to the char* param would 
solve the problem that StringValues are not NULL-terminated, but you were also 
worried about the copying and the standard search implementation.

The copying could be easily avoided by using std::search with reverse 
iterators, as suggested here: http://stackoverflow.com/a/1634416 and creating 
the reverse iterators directly on the char*, like suggested here: 
http://stackoverflow.com/a/22488428.

In my opinion this would result in the cleanest solution, but you were also 
worried about using the standard C++ search, which is around four times slower 
than our internal search algorithm that unfortunately only supports searching 
forwards.

According to the comments in the source code of our internal search algorithm, 
it is based on Python's search. I checked this latter and found that reverse 
search have been added to it recently. Since our implementations have diverged 
significantly, it is not possible to use the new Python search code, but it can 
be seen that the reverse search functionality is a trivial modification of the 
regular search, so I added a reverse search variant as well. I also added tests 
for it as well as for our regular search, which did not have tests. The name of 
the new test collided with the name of an existing test that was misnamed, so I 
renamed that. Finally I implemented instr() on top of these bloom-filter-based 
search algorithms.

-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-31 Thread Zoltan Ivanfi (Code Review)
Hello Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#5).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/experiments/CMakeLists.txt
R be/src/experiments/string-search-sse-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/string-search-test.cc
M be/src/runtime/string-search.h
M common/function-registry/impala_functions.py
9 files changed, 283 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-29 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 4:

As we discussed in person our internal strings aren't \0 terminated. Let's try 
using StringFunctions::Reverse() and the InStr implementation we already have 
and see how fast that is.

-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-29 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has uploaded a new patch set (#2).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 87 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-29 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 3:

> Uploaded patch set 4.

This is a result of running git-clang-format as you suggested over chat.

-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-29 Thread Zoltan Ivanfi (Code Review)
Hello Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#4).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 86 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-29 Thread Zoltan Ivanfi (Code Review)
Hello Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#3).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 86 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-29 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 312:   std::basic_string haystack(str.ptr);
> Can you add this in a comment?
Done


http://gerrit.cloudera.org:8080/#/c/4094/2/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS2, Line 306: str.len + start_position.val :
 :   // Position count starts from 1, not 0.
 :   start_position.val - 1;
> line wrapping
Done


http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

PS1, Line 65: str
> Here's a list of exceptions to the Google C++ style guide, that we stick to
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-29 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 312:   std::basic_string haystack(str.ptr);
> No, std::string is signed, therefore I would need to apply a reinterpret_ca
Can you add this in a comment?


http://gerrit.cloudera.org:8080/#/c/4094/2/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS2, Line 306: int search_start_pos = backwards ?
 :   str.len + start_position.val :
 :   start_position.val - 1;
line wrapping


http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

PS1, Line 65: str
> I wrapped the parameters this way intentionally, as the two string paramete
Here's a list of exceptions to the Google C++ style guide, that we stick to: 
https://wiki.cloudera.com/pages/viewpage.action?pageId=24646528

Sadly it hasn't been ported to the apache wiki yet, but it's the authoritative 
information on the subject.

While I understand that personal style is often perceived as more readable, we 
prefer consistency of the code and thus usually wrap lines by the same rules, 
independent of the parameter semantics.


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-29 Thread Zoltan Ivanfi (Code Review)
Hello Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4094

to look at the new patch set (#2).

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 87 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-29 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 2070: 
> I saw below that some tests use something like "cast(10 as bigint)" for num
Done


PS1, Line 2077: 1
> The Oracle docs say this value must be >0, please add failure tests for 0 a
Done


http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 305:   // When searching backwards, position is counted from the end.
> For readability, [lease don't interleave comments and ternary expressions. 
Done


Line 312:   std::basic_string haystack(str.ptr);
> Isn't this equivalent to std::string?
No, std::string is signed, therefore I would need to apply a reinterpret_cast 
on str.ptr, which is far less elegant and more error-prone than using 
std::basic_string.


http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

PS1, Line 65: str
> We usually wrap arguments greedily at 90 chars, so str would fit in the pre
I wrapped the parameters this way intentionally, as the two string parameters 
make one logical group of the arguments and the two bigint parameters can be 
considered another logical group. Wrapping them accordingly makes the function 
signature easier to read in my opinion.

Regarding the line length limit, the Impala wiki 
(https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala) 
links to the Google C++ style guide, which specifies 80 chars as the line 
length. If we use 90 chars instead, we should mention this as an exception on 
our wiki page.


http://gerrit.cloudera.org:8080/#/c/4094/1/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

PS1, Line 435:   [['instr'], 'INT', ['STRING', 'STRING', 'BIGINT', 'BIGINT'], 
'impala::StringFunctions::Instr'],
> long line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-26 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-26 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
..


Patch Set 1:

(7 comments)

Thank you for the change. Please see my comments.

http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 2070: 
I saw below that some tests use something like "cast(10 as bigint)" for number 
typed parameters. Can you add one such test, too?


PS1, Line 2077: 1
The Oracle docs say this value must be >0, please add failure tests for 0 and 
<0, too.


http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 297: IntVal StringFunctions::Instr(FunctionContext* context,
Line wrapping, see .h file.


Line 305:   // When searching backwards, position is counted from the end.
For readability, [lease don't interleave comments and ternary expressions. Use 
an if-else block instead or move the comment above the line.


Line 312:   std::basic_string haystack(str.ptr);
Isn't this equivalent to std::string?


http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

PS1, Line 65: str
We usually wrap arguments greedily at 90 chars, so str would fit in the 
previous line. Please check the other args, too.


http://gerrit.cloudera.org:8080/#/c/4094/1/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

PS1, Line 435:   [['instr'], 'INT', ['STRING', 'STRING', 'BIGINT', 'BIGINT'], 
'impala::StringFunctions::Instr'],
long line


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()

2016-08-23 Thread Zoltan Ivanfi (Code Review)
Zoltan Ivanfi has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4094

Change subject: IMPALA-3973: add position and occurrence to instr()
..

IMPALA-3973: add position and occurrence to instr()

Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 82 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi