[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3973: add position and occurrence to instr() .. Patch Set 6: FYI we should try to get query generator support for this, I filed: https://issues.cloudera.org/browse/IMPALA-4115 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 13:
(5 comments)
http://gerrit.cloudera.org:8080/#/c/4094/13/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:
Line 2064: TestValue("instr('', '')", TYPE_INT, 0);
would probably be good to add this test case for backwards search.
http://gerrit.cloudera.org:8080/#/c/4094/13/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
Line 292: if (occurrence.val <= 0 || start_position.val == 0) return
IntVal(0);
is this what other systems do? i.e. do they silently return 0 or do they raise
a warning or give an error in this case? (which we would do using
context->SetError()).
PS13, Line 302: DCHECK
DCHECK_LT.
But why is this true? during the previous iteration, couldn't match_pos =
haystack.len - 1, and so during this iteration search_start_pos could be equal
to haystack.len? I think the condition should be search_start_pos <=
haystack.len, where the case of equality is handled because the new haystack
len will be 0 and never matched.
I think this would be the test case: INSTR("xyza", "a", 1, 2) or INSTR("a",
"a", 1, 2);
PS13, Line 317: DCHECK
DCHECK_GT. but what is the meaning of this DCHECK? search_start_pos is the
position into the haystack, right? so shouldn't it always be non-negative?
Also, it seems this loop suffers from the similar problem as commented at line
302. (test case INSTR("axyz", "a", -1, 2) or INSTR("a", "a", -1 , 2).
PS13, Line 319: std::min
This is pretty confusing. I think this would be clearer if we did the min()
outside the loop when choosing the first search_start_pos. i.e. the first
search_start_pos can always be chosen to be <= haystack.len - needle.len, since
that's the first possible match position. I think that makes the intent of
what's going on here clearer.
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3973: add position and occurrence to instr() .. Patch Set 13: Code-Review+1 Thanks! Let's get Marcel or DanH to do a final review. -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Lars Volker, Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#13). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/experiments/CMakeLists.txt R be/src/experiments/string-search-sse-test.cc M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/runtime/CMakeLists.txt A be/src/runtime/string-search-test.cc M be/src/runtime/string-search.h M common/function-registry/impala_functions.py 9 files changed, 306 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/13 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Lars Volker, Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#12). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/experiments/CMakeLists.txt R be/src/experiments/string-search-sse-test.cc M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/runtime/CMakeLists.txt A be/src/runtime/string-search-test.cc M be/src/runtime/string-search.h M common/function-registry/impala_functions.py 9 files changed, 306 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/12 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 11:
(8 comments)
> (9 comments)
>
> just a few more small things... thanks!
>
> Have you tested your test cases against Oracle or another system
> that supports this?
Yes, I ran these queries in Oracle and used those results as the reference
values.
http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:
PS11, Line 2088: TestValue("instr('corporate floor', 'or', 3, 3)", TYPE_INT,
0);
: TestValue("instr('corporate floor', 'or', -3, 3)", TYPE_INT,
0);
> I think we need a few more cases with empty search strings, e.g.
Done
Line 2102:
> add a few more null checks:
These should not be NULL, because none of the parameters are NULL. These simply
return 0 to indicate an error.
http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
PS11, Line 292: if (occurrence.val <= 0 || start_position.val == 0) {
: return IntVal(0);
: }
> 1 line
Done
PS11, Line 306: == -1
> nit: < 0
Done
PS11, Line 322: == -1
> < 0
Done
PS11, Line 324: DCHECK(search_start_pos < haystack.len);
> this check doesn't seem useful here as it did above. The point there was to
Done, but shouldn't we simply add a DCHECK to Substring itself (in a follow-up
change)?
http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
PS11, Line 107: if (pattern_->ptr[i] == pattern_->ptr[mlast])
: skip_ = mlast - i - 1;
> nit: 1 line
This line was not modified compared to base and I wouldn't like to mix purely
styleguide-driven changes with actual behavioral changes in a single commit. As
I already suggested in an earlier comment, I am happy to run clang-format on
the whole file and fix styleguide-related issues in a follow-up change. This
way we will end up with a clean file history that allows easy identification of
the reason of changes later on.
PS11, Line 113: if (pattern_->ptr[i] == pattern_->ptr[0]) {
: rskip_ = i - 1;
: }
> 1 line
Done
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Matthew Jacobs has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 11:
(9 comments)
just a few more small things... thanks!
Have you tested your test cases against Oracle or another system that supports
this?
http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:
PS11, Line 2088: TestValue("instr('corporate floor', 'or', 3, 3)", TYPE_INT,
0);
: TestValue("instr('corporate floor', 'or', -3, 3)", TYPE_INT,
0);
I think we need a few more cases with empty search strings, e.g.
TestValue("instr('corporate floor', '', 3, 3)", TYPE_INT, 0);
TestValue("instr('corporate floor', '', -3, 3)", TYPE_INT, 0);
Line 2102:
add a few more null checks:
instr('corporate floor', '', -15, 1) is NULL
instr('corporate floor', '', 15, 1) is NULL
http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
PS11, Line 292: if (occurrence.val <= 0 || start_position.val == 0) {
: return IntVal(0);
: }
1 line
PS11, Line 306: == -1
nit: < 0
PS11, Line 322: == -1
< 0
PS11, Line 324: DCHECK(search_start_pos < haystack.len);
this check doesn't seem useful here as it did above. The point there was to
guard the check to the next Substring() call, so I think in this case we'd want
to make sure
search_start_pos + needle.len > 0
to make sure the Substring call is valid.
http://gerrit.cloudera.org:8080/#/c/4094/11/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
Line 104:
add a brief comment about how this is transformed from the python code. e.g.
they compute the bloom & skip_ each call, we do it all here for reuse. Also why
we call BloomAdd in this way.
Basically enough for people to look at the python code and see that this maps
to that code.
PS11, Line 107: if (pattern_->ptr[i] == pattern_->ptr[mlast])
: skip_ = mlast - i - 1;
nit: 1 line
PS11, Line 113: if (pattern_->ptr[i] == pattern_->ptr[0]) {
: rskip_ = i - 1;
: }
1 line
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 10:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
PS6, Line 322: search_start_pos
> Ok, but since it requires a bit of thinking can you add a DCHECK for #2 to
Done
http://gerrit.cloudera.org:8080/#/c/4094/10/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
PS10, Line 23: #include
: #include
> can you put the std includes above boost includes, w/ a space btwn?
Done
PS10, Line 104: for (int i = 0; i <= mlast; ++i) {
: BloomAdd(pattern_->ptr[i]);
: if (pattern_->ptr[i] == pattern_->ptr[mlast] && i !=
mlast) skip_ = mlast - i - 1;
: if (pattern_->ptr[i] == pattern_->ptr[0] && i != 0) {
: rskip_ = std::min(rskip_, i - 1);
: }
> Given that this is subtle code (at least the algorithm is tricky), I think
Done
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Lars Volker, Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#11). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/experiments/CMakeLists.txt R be/src/experiments/string-search-sse-test.cc M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/runtime/CMakeLists.txt A be/src/runtime/string-search-test.cc M be/src/runtime/string-search.h M common/function-registry/impala_functions.py 9 files changed, 301 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/11 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Matthew Jacobs has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 10:
(4 comments)
http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
PS6, Line 322: search_start_pos
> I don't think it's necessary. I check the starting position before the loop
Ok, but since it requires a bit of thinking can you add a DCHECK for #2 to make
it clear & validate?
PS6, Line 336: ntVal(1), BigIntVal(1));
> Because the position of the match is the index of its leftmost char, but we
Right, thanks
http://gerrit.cloudera.org:8080/#/c/4094/10/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
PS10, Line 23: #include
: #include
can you put the std includes above boost includes, w/ a space btwn?
PS10, Line 104: for (int i = 0; i <= mlast; ++i) {
: BloomAdd(pattern_->ptr[i]);
: if (pattern_->ptr[i] == pattern_->ptr[mlast] && i !=
mlast) skip_ = mlast - i - 1;
: if (pattern_->ptr[i] == pattern_->ptr[0] && i != 0) {
: rskip_ = std::min(rskip_, i - 1);
: }
Given that this is subtle code (at least the algorithm is tricky), I think I'd
feel more comfortable if you modeled this more clearly after the code in
fastsearch.h, e.g. split this into different loops to create skip_ and rskip_,
where the indexes and logic match that of fastsearch.h. You could create the
bloom filter in a separate loop or using the loop that creates skip_ and then
updating the bloom filter at the end for mlast as the code in fastsearch.h does.
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Lars Volker has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 9:
(1 comment)
I'm good with fixing the style in a subsequent change.
>
> I would suggest clang-formatting this file in a follow-up change.
> This way we separate the formatting changes cleanly in the sense
> that every part of this diff clearly means to a functional change,
> while every part of the follow-up change will purely reflect a
> formatting change.
>
> > I'm not sure whether you should
> > adapt all of string-search.h to our style guide, now that a
> > significant part of the file changed. I'd be in favor of it, but
> I
> > don't feel strongly about it.
http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
PS9, Line 329: IntVal StringFunctions::Instr(FunctionContext* context, const
StringVal& str,
: const StringVal& substr, const BigIntVal& start_position) {
: return Instr(context, str, substr, start_position,
BigIntVal(1));
: }
:
: IntVal StringFunctions::Instr(
: FunctionContext* context, const StringVal& str, const
StringVal& substr) {
: return Instr(context, str, substr, BigIntVal(1), BigIntVal(1));
: }
> Hmmm, this was wrapped by clang-format, I have no idea why the wrapping is
Yes, I agree. Let's keep it as it is.
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 9:
(6 comments)
I would suggest clang-formatting this file in a follow-up change. This way we
separate the formatting changes cleanly in the sense that every part of this
diff clearly means to a functional change, while every part of the follow-up
change will purely reflect a formatting change.
> I'm not sure whether you should
> adapt all of string-search.h to our style guide, now that a
> significant part of the file changed. I'd be in favor of it, but I
> don't feel strongly about it.
http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
PS9, Line 329: IntVal StringFunctions::Instr(FunctionContext* context, const
StringVal& str,
: const StringVal& substr, const BigIntVal& start_position) {
: return Instr(context, str, substr, start_position,
BigIntVal(1));
: }
:
: IntVal StringFunctions::Instr(
: FunctionContext* context, const StringVal& str, const
StringVal& substr) {
: return Instr(context, str, substr, BigIntVal(1), BigIntVal(1));
: }
> nit: Wrap arguments uniformly for both methods, either with clang-format or
Hmmm, this was wrapped by clang-format, I have no idea why the wrapping is
different. Probably clang-format noticed that all arguments fit a single line
in line 335, that's why it didn't split it. I think we should accept
clang-format's output, otherwise we can spend a lot of time on fixing its
style, which would make using clang-format pointless.
http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:
Line 63: static IntVal Instr(FunctionContext*, const StringVal& str, const
StringVal& substr);
> nit: have the same order as in the implementation file.
Done
http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:
Line 30: // If the length is -1, use the full string length
> nit: punctuation
Done
Line 43: // If the length is -1, use the full string length
> nit: punctuation
Done
http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
Line 101: skip_ = rskip_ = mlast - 1;
> I haven't seen multiple assignments per line elsewhere, for readability rea
Done, although in my opinion assigning two vars the same value in a single-line
is cleaner.
Line 105: if (pattern_->ptr[i] == pattern_->ptr[mlast] && i != mlast) {
> nit: single line
Done
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Lars Volker, Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#10). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/experiments/CMakeLists.txt R be/src/experiments/string-search-sse-test.cc M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/runtime/CMakeLists.txt A be/src/runtime/string-search-test.cc M be/src/runtime/string-search.h M common/function-registry/impala_functions.py 9 files changed, 298 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/10 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Lars Volker has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 9: Code-Review+1
(6 comments)
I only added some minor comments. I'm not sure whether you should adapt all of
string-search.h to our style guide, now that a significant part of the file
changed. I'd be in favor of it, but I don't feel strongly about it.
http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
PS9, Line 329: IntVal StringFunctions::Instr(FunctionContext* context, const
StringVal& str,
: const StringVal& substr, const BigIntVal& start_position) {
: return Instr(context, str, substr, start_position,
BigIntVal(1));
: }
:
: IntVal StringFunctions::Instr(
: FunctionContext* context, const StringVal& str, const
StringVal& substr) {
: return Instr(context, str, substr, BigIntVal(1), BigIntVal(1));
: }
nit: Wrap arguments uniformly for both methods, either with clang-format or
according to the surrounding code.
http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:
Line 63: static IntVal Instr(FunctionContext*, const StringVal& str, const
StringVal& substr);
nit: have the same order as in the implementation file.
http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:
Line 30: // If the length is -1, use the full string length
nit: punctuation
Line 43: // If the length is -1, use the full string length
nit: punctuation
http://gerrit.cloudera.org:8080/#/c/4094/9/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
Line 101: skip_ = rskip_ = mlast - 1;
I haven't seen multiple assignments per line elsewhere, for readability reasons
I'd personally favor having one per line. However I don't feel strongly about
this
Line 105: if (pattern_->ptr[i] == pattern_->ptr[mlast] && i != mlast) {
nit: single line
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#9). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/experiments/CMakeLists.txt R be/src/experiments/string-search-sse-test.cc M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/runtime/CMakeLists.txt A be/src/runtime/string-search-test.cc M be/src/runtime/string-search.h M common/function-registry/impala_functions.py 9 files changed, 299 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/9 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 8:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
Line 102:
> This was accidental, will remove.
Done
PS6, Line 102:
: for (int i = 0; i <= mlast; ++i) {
: BloomAdd(pattern_->ptr[i]);
: if (pattern_->ptr[i] == pattern_->ptr[mlast] && i !=
mlast) {
: skip_ = mlast - i - 1;
: }
> Nice catch, thanks. I think we may be able to do better than the Python imp
Done
http://gerrit.cloudera.org:8080/#/c/4094/8/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
Line 227: int skip_;
I changed skip_ to be int, since we use int-s to calculate it. Alternatively, I
can keep it as int64 and change all int-s to int64-s. The question is, do we
want to support strings bigger than 2GB?
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#8). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/experiments/CMakeLists.txt R be/src/experiments/string-search-sse-test.cc M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/runtime/CMakeLists.txt A be/src/runtime/string-search-test.cc M be/src/runtime/string-search.h M common/function-registry/impala_functions.py 9 files changed, 299 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/8 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 6:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
Line 102: for (int i = 0; i < mlast; ++i) {
This was accidental, will remove.
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#7). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/experiments/CMakeLists.txt R be/src/experiments/string-search-sse-test.cc M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/runtime/CMakeLists.txt A be/src/runtime/string-search-test.cc M be/src/runtime/string-search.h M common/function-registry/impala_functions.py 9 files changed, 283 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/7 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 6:
(17 comments)
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
PS5, Line 289: StringValue str_sv = StringValue::FromStringVal(str);
: StringValue substr_sv = StringValue::FromStringVal(substr);
: StringSearch search(&substr_sv);
: // Hive returns positions starting from 1.
: return IntVal(search.Search(&str_sv) + 1);
: }
> I think it makes sense to unify this by using the new method. If there's bu
Done
PS5, Line 329: n + start_pos
> I think we could just remove the comment. The next line by itself explains
Done
http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
PS6, Line 311: search_start_pos < 0
> remove- this is not possible since we know it's not 0 from l302 and we know
Indeed, sorry - in an earlier version of my code this check was done in a place
common to the regular and reverse search.
PS6, Line 311: {
: return IntVal(0);
> this will be short enough to return on the same line
Done
PS6, Line 318: if (match_pos_in_substring == -1) {
: return IntVal(0);
: }
> 1 line
Done
PS6, Line 322: search_start_pos
> I think you also need to make sure search_start_pos is < haystack.len befor
I don't think it's necessary. I check the starting position before the loop, so
we start with a non-negative length for the Substring call. Then in each
iteration of the loop, one of two things can happen:
1. There is no match in the substring, in which case the function returns
immediately.
2. There is a match, in which case the new starting position is one char after
the match position. Because the match position must be inside the string,
match_pos < haystack.len, therefore search_start_pos = match_pos + 1 <=
haystack.len (actually it will be strictly less, because the length of the
match must be > 0).
PS6, Line 330: search_start_pos >= str.len
> This isn't possible, str.len = haystack.len, and we know we added something
Indeed, sorry - in an earlier version of my code this check was done in a place
common to the regular and reverse search.
PS6, Line 330: if (search_start_pos < 0 || search_start_pos >= str.len) {
: return IntVal(0);
: }
> remove
We still need the < 0 check.
PS6, Line 336: search_start_pos + needle.len
> why do you add the needle len?
Because the position of the match is the index of its leftmost char, but we
need to include the match itself in the substring where we search. For example,
if we are searching for "234" in "0123456" and we want to find matches where
the match index is <= 2, the substring "012" is not enough, we have to search
in "01234".
PS6, Line 338: if (match_pos == -1) {
: return IntVal(0);
: }
> 1 line
Done
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:
PS5, Line 64: context
> nit: remove to match the surrounding code.
Done
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:
Line 28: return StringValue(const_cast(str), len);
> You're right. Unfortunately the style guide wasn't clear in this regard. Wh
Done
PS5, Line 69: ECT_EQ(-1, TestSearch("abca
> I see, let's fix it here then at least. Please rephrase to "Test searching
Done
PS5, Line 70:
> Ok, makes sense to me. Can you add this as a comment to the factory methods
Done
Line 74: EXPECT_EQ(-1, TestSearch("abcdaba", "abaa", 4, 3));
> Same here
Done
Line 81: EXPECT_EQ(2, TestSearch("abcdede", "cde"));
> How about "Tests with needle at the end of the haystack." then? Doesn't mak
Done
http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
PS6, Line 102: for (int i = 0; i < mlast; ++i) {
: BloomAdd(pattern_->ptr[i]);
: if (pattern_->ptr[i] == pattern_->ptr[mlast])
: skip_ = mlast - i - 1;
: }
: BloomAdd(pattern_->ptr[mlast]);
> I don't think this will always work for RSearch, see https://hg.python.org/
Nice catch, thanks. I think we may be able to do better than the Python
implementation here. That one iterates in reverse indeed, and for this reason
they moved this part immediately before the search, since the direction is not
known during construction. However, since the search may be reused for
searching repeatedly, it would be better to build the bloom filter only once.
The contents of the bloom filter do not depend on t
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Matthew Jacobs has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 6:
(9 comments)
http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
PS6, Line 311: search_start_pos < 0
remove- this is not possible since we know it's not 0 from l302 and we know
it's not <0 from l308. it might be worthwhile to put a DCHECK above l316 to
make sure it's always >= 0 though.
PS6, Line 311: {
: return IntVal(0);
this will be short enough to return on the same line
PS6, Line 318: if (match_pos_in_substring == -1) {
: return IntVal(0);
: }
1 line
PS6, Line 322: search_start_pos
I think you also need to make sure search_start_pos is < haystack.len before
the next call to Substring (l316), otherwise that will return invalid memory
(Substring doesn't check the length).
In fact, since you'll have to do the check before calling
Substring(search_start_pos) on l316, I think you can remove the entire block on
l311 that checks search_start_pos.
PS6, Line 330: search_start_pos >= str.len
This isn't possible, str.len = haystack.len, and we know we added something
less than 0 to it to get search_start_pos
PS6, Line 330: if (search_start_pos < 0 || search_start_pos >= str.len) {
: return IntVal(0);
: }
remove
PS6, Line 336: search_start_pos + needle.len
why do you add the needle len?
I don't understand why it's not just search_start_pos.
PS6, Line 338: if (match_pos == -1) {
: return IntVal(0);
: }
1 line
http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
PS6, Line 102: for (int i = 0; i < mlast; ++i) {
: BloomAdd(pattern_->ptr[i]);
: if (pattern_->ptr[i] == pattern_->ptr[mlast])
: skip_ = mlast - i - 1;
: }
: BloomAdd(pattern_->ptr[mlast]);
I don't think this will always work for RSearch, see
https://hg.python.org/cpython/file/6b6c79eba944/Objects/stringlib/fastsearch.h
l195
I think this would need to be computed in reverse.
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Lars Volker has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 5:
(7 comments)
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
PS5, Line 289: if (str.is_null || substr.is_null) return IntVal::null();
: StringValue str_sv = StringValue::FromStringVal(str);
: StringValue substr_sv = StringValue::FromStringVal(substr);
: StringSearch search(&substr_sv);
: // Hive returns positions starting from 1.
: return IntVal(search.Search(&str_sv) + 1);
> I would expect a minor, probably insignificant performance impact. (On the
I think it makes sense to unify this by using the new method. If there's bugs
we should be able to find them.
PS5, Line 329: allowed match
> When we are searching from the right, there are two "current" positions. On
I think we could just remove the comment. The next line by itself explains
where the search starts.
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:
PS5, Line 64: context
nit: remove to match the surrounding code.
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:
Line 28: if (len == -1) {
> This is matter of taste, the style guide allows both variant. I like this o
You're right. Unfortunately the style guide wasn't clear in this regard. What
we do is put single statements in the same line as the "if", if they fit within
the 90 character limit. I'm not a fan of this myself, but that's the
predominant formatting.
I updated the style guide on this occasion to be more specific. The style guide
will probably also change a bit in the near future, but I'm confident that this
single-line-if-possible rule will not change.
PS5, Line 69: Haystack is not full string
> I didn't write the test cases, they are almost verbatim copies from https:/
I see, let's fix it here then at least. Please rephrase to "Test searching
through part of haystack" or similar.
PS5, Line 70: 5
> As I noted in the other comment, I don't know why each individual test case
Ok, makes sense to me. Can you add this as a comment to the factory methods
above?
Line 81: // Test last bit, this is the interesting edge case
> I guess last bit means last part. I'm not sure why it's interesting, it was
How about "Tests with needle at the end of the haystack." then? Doesn't make
sense to me to keep a vague comment if neither of us can fiigure out why it's
there.
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#6). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/experiments/CMakeLists.txt R be/src/experiments/string-search-sse-test.cc M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/runtime/CMakeLists.txt A be/src/runtime/string-search-test.cc M be/src/runtime/string-search.h M common/function-registry/impala_functions.py 9 files changed, 277 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/6 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 5:
(11 comments)
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:
Line 2063: // Hive returns 0 if substr was not found in str (or on other
error coditions).
> What happens if you pass other types as the needle or haystack, e.g. int or
The analysis of the query will complain that there is no function with a
matching signature:
[localhost:21000] > select instr("1", 2, "3");
ERROR: AnalysisException: No matching function with signature: instr(STRING,
TINYINT, STRING).
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
Line 22: #include
> are the includes still needed?
Done
PS5, Line 289: if (str.is_null || substr.is_null) return IntVal::null();
: StringValue str_sv = StringValue::FromStringVal(str);
: StringValue substr_sv = StringValue::FromStringVal(substr);
: StringSearch search(&substr_sv);
: // Hive returns positions starting from 1.
: return IntVal(search.Search(&str_sv) + 1);
> Will replacing this with a call to the more general version of Instr() come
I would expect a minor, probably insignificant performance impact. (On the
other, we should be aware that in this case we will replace a well-proven
implementation with a freshly written one, so if there were any bugs in the new
implementation, the impact of those would become bigger.)
PS5, Line 299: occurance
> spelling
Done
PS5, Line 316: no
> nit: use 'num' to abbreviate number
Done
PS5, Line 329: allowed match
> what does "allowed match" mean?
When we are searching from the right, there are two "current" positions. One is
for leftmost char of the match we are looking for, the other for the rightmost
char of it.
When the user specifies -5 as an input, it means that the leftmost char of the
pattern must be before the 5th char from the right. So the actual match may be
partially after the limit.
If you can suggest a better description, please let me know.
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:
Line 28: if (len == -1) {
> single line
This is matter of taste, the style guide allows both variant. I like this one
more, because it is consistent with multi-line conditional block and it is easy
to add new lines.
PS5, Line 69: Haystack is not full string
> This comment could be clearer, Haystack is a full string. Maybe rephrase "T
I didn't write the test cases, they are almost verbatim copies from
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/experiments/string-search-test.cc#L98
PS5, Line 70: 5
> What is the difference between this and just specifying "abcab" as the hays
As I noted in the other comment, I don't know why each individual test case was
added, I just reused existing test-cases for strstr functionality. My guess is
that by taking a prefix of a StringVal, it tests whether the strstr
implementation respects the length stored in the StringValue or also reads
parts of the string which should be excluded.
Line 81: // Test last bit, this is the interesting edge case
> What does "last bit" mean? Can you add a comment why this is an interesting
I guess last bit means last part. I'm not sure why it's interesting, it wasn't
added by me.
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
Line 204: }
> Can we advance i by skip_ here, too? Is this what the comment in line 139 r
I don't know what line 139 refers to, I did not write that comment. I don't
think i can be advanced by skip_ here either. As I have written, I created this
function in the following way:
Our fast search is based on Python's fast search. Python did not support
reverse search when it was added to our codebase, but now it does. Therefore I
compared Python's reverse search to its regular search, then applied the same
changes to our regular search, thereby creating this function. The python
implementation is linked to in a comment in this file:
http://hg.python.org/cpython/file/6b6c79eba944/Objects/stringlib/fastsearch.h
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Lars Volker has posted comments on this change.
Change subject: IMPALA-3973: add position and occurrence to instr()
..
Patch Set 5:
(12 comments)
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:
Line 2063: // Hive returns 0 if substr was not found in str (or on other
error coditions).
What happens if you pass other types as the needle or haystack, e.g. int or
double or bool? Do we cast them? Or throw an error?
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:
Line 22: #include
are the includes still needed?
PS5, Line 289: if (str.is_null || substr.is_null) return IntVal::null();
: StringValue str_sv = StringValue::FromStringVal(str);
: StringValue substr_sv = StringValue::FromStringVal(substr);
: StringSearch search(&substr_sv);
: // Hive returns positions starting from 1.
: return IntVal(search.Search(&str_sv) + 1);
Will replacing this with a call to the more general version of Instr() come
with a performance impact? If not I think we should forward here, similar to
the other Instr() overload below.
PS5, Line 299: occurance
spelling
PS5, Line 316: no
nit: use 'num' to abbreviate number
PS5, Line 329: allowed match
what does "allowed match" mean?
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search-test.cc
File be/src/runtime/string-search-test.cc:
Line 28: if (len == -1) {
single line
PS5, Line 69: Haystack is not full string
This comment could be clearer, Haystack is a full string. Maybe rephrase "Test
searching through part of haystack" or similar.
PS5, Line 70: 5
What is the difference between this and just specifying "abcab" as the
haystack? It looks like the tests with varying len parameters mostly test that
the c'tor of StringValue works correctly. Is there a downside to it, if we rely
on that (or test it elsewhere) and then just specify the strings we want to
test here without len parameters.
Line 74: // Haystack and needle not full len
Same here
Line 81: // Test last bit, this is the interesting edge case
What does "last bit" mean? Can you add a comment why this is an interesting
case?
http://gerrit.cloudera.org:8080/#/c/4094/5/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:
Line 204: }
Can we advance i by skip_ here, too? Is this what the comment in line 139
refers to?
--
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Zoltan Ivanfi
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has posted comments on this change. Change subject: IMPALA-3973: add position and occurrence to instr() .. Patch Set 4: > As we discussed in person our internal strings aren't \0 > terminated. Let's try using StringFunctions::Reverse() and the > InStr implementation we already have and see how fast that is. To recap our conversation in a little more detail, using the std::string constructor that takes a length parameter in addition to the char* param would solve the problem that StringValues are not NULL-terminated, but you were also worried about the copying and the standard search implementation. The copying could be easily avoided by using std::search with reverse iterators, as suggested here: http://stackoverflow.com/a/1634416 and creating the reverse iterators directly on the char*, like suggested here: http://stackoverflow.com/a/22488428. In my opinion this would result in the cleanest solution, but you were also worried about using the standard C++ search, which is around four times slower than our internal search algorithm that unfortunately only supports searching forwards. According to the comments in the source code of our internal search algorithm, it is based on Python's search. I checked this latter and found that reverse search have been added to it recently. Since our implementations have diverged significantly, it is not possible to use the new Python search code, but it can be seen that the reverse search functionality is a trivial modification of the regular search, so I added a reverse search variant as well. I also added tests for it as well as for our regular search, which did not have tests. The name of the new test collided with the name of an existing test that was misnamed, so I renamed that. Finally I implemented instr() on top of these bloom-filter-based search algorithms. -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#5). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/experiments/CMakeLists.txt R be/src/experiments/string-search-sse-test.cc M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/runtime/CMakeLists.txt A be/src/runtime/string-search-test.cc M be/src/runtime/string-search.h M common/function-registry/impala_functions.py 9 files changed, 283 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/5 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Lars Volker has posted comments on this change. Change subject: IMPALA-3973: add position and occurrence to instr() .. Patch Set 4: As we discussed in person our internal strings aren't \0 terminated. Let's try using StringFunctions::Reverse() and the InStr implementation we already have and see how fast that is. -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has uploaded a new patch set (#2). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M common/function-registry/impala_functions.py 4 files changed, 87 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/2 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has posted comments on this change. Change subject: IMPALA-3973: add position and occurrence to instr() .. Patch Set 3: > Uploaded patch set 4. This is a result of running git-clang-format as you suggested over chat. -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#4). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M common/function-registry/impala_functions.py 4 files changed, 86 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/4 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#3). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M common/function-registry/impala_functions.py 4 files changed, 86 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/3 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has posted comments on this change. Change subject: IMPALA-3973: add position and occurrence to instr() .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: Line 312: std::basic_string haystack(str.ptr); > Can you add this in a comment? Done http://gerrit.cloudera.org:8080/#/c/4094/2/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS2, Line 306: str.len + start_position.val : : // Position count starts from 1, not 0. : start_position.val - 1; > line wrapping Done http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions.h File be/src/exprs/string-functions.h: PS1, Line 65: str > Here's a list of exceptions to the Google C++ style guide, that we stick to Done -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Lars Volker has posted comments on this change. Change subject: IMPALA-3973: add position and occurrence to instr() .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: Line 312: std::basic_string haystack(str.ptr); > No, std::string is signed, therefore I would need to apply a reinterpret_ca Can you add this in a comment? http://gerrit.cloudera.org:8080/#/c/4094/2/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS2, Line 306: int search_start_pos = backwards ? : str.len + start_position.val : : start_position.val - 1; line wrapping http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions.h File be/src/exprs/string-functions.h: PS1, Line 65: str > I wrapped the parameters this way intentionally, as the two string paramete Here's a list of exceptions to the Google C++ style guide, that we stick to: https://wiki.cloudera.com/pages/viewpage.action?pageId=24646528 Sadly it hasn't been ported to the apache wiki yet, but it's the authoritative information on the subject. While I understand that personal style is often perceived as more readable, we prefer consistency of the code and thus usually wrap lines by the same rules, independent of the parameter semantics. -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Hello Internal Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4094 to look at the new patch set (#2). Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M common/function-registry/impala_functions.py 4 files changed, 87 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/2 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has posted comments on this change. Change subject: IMPALA-3973: add position and occurrence to instr() .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 2070: > I saw below that some tests use something like "cast(10 as bigint)" for num Done PS1, Line 2077: 1 > The Oracle docs say this value must be >0, please add failure tests for 0 a Done http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: Line 305: // When searching backwards, position is counted from the end. > For readability, [lease don't interleave comments and ternary expressions. Done Line 312: std::basic_string haystack(str.ptr); > Isn't this equivalent to std::string? No, std::string is signed, therefore I would need to apply a reinterpret_cast on str.ptr, which is far less elegant and more error-prone than using std::basic_string. http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions.h File be/src/exprs/string-functions.h: PS1, Line 65: str > We usually wrap arguments greedily at 90 chars, so str would fit in the pre I wrapped the parameters this way intentionally, as the two string parameters make one logical group of the arguments and the two bigint parameters can be considered another logical group. Wrapping them accordingly makes the function signature easier to read in my opinion. Regarding the line length limit, the Impala wiki (https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala) links to the Google C++ style guide, which specifies 80 chars as the line length. If we use 90 chars instead, we should mention this as an exception on our wiki page. http://gerrit.cloudera.org:8080/#/c/4094/1/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: PS1, Line 435: [['instr'], 'INT', ['STRING', 'STRING', 'BIGINT', 'BIGINT'], 'impala::StringFunctions::Instr'], > long line Done -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3973: add position and occurrence to instr() .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Lars Volker has posted comments on this change. Change subject: IMPALA-3973: add position and occurrence to instr() .. Patch Set 1: (7 comments) Thank you for the change. Please see my comments. http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 2070: I saw below that some tests use something like "cast(10 as bigint)" for number typed parameters. Can you add one such test, too? PS1, Line 2077: 1 The Oracle docs say this value must be >0, please add failure tests for 0 and <0, too. http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: Line 297: IntVal StringFunctions::Instr(FunctionContext* context, Line wrapping, see .h file. Line 305: // When searching backwards, position is counted from the end. For readability, [lease don't interleave comments and ternary expressions. Use an if-else block instead or move the comment above the line. Line 312: std::basic_string haystack(str.ptr); Isn't this equivalent to std::string? http://gerrit.cloudera.org:8080/#/c/4094/1/be/src/exprs/string-functions.h File be/src/exprs/string-functions.h: PS1, Line 65: str We usually wrap arguments greedily at 90 chars, so str would fit in the previous line. Please check the other args, too. http://gerrit.cloudera.org:8080/#/c/4094/1/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: PS1, Line 435: [['instr'], 'INT', ['STRING', 'STRING', 'BIGINT', 'BIGINT'], 'impala::StringFunctions::Instr'], long line -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has uploaded a new change for review. http://gerrit.cloudera.org:8080/4094 Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M common/function-registry/impala_functions.py 4 files changed, 82 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/1 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi
