gianm commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2077577726
Of course! Thank you for contributing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the
twilliamson commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2075607127
You just made my week! Thank you so much for taking the time to work
through my buggy iterations of this — I really appreciate it!
--
This is an automated message from the
gianm merged PR #16153:
URL: https://github.com/apache/druid/pull/16153
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail:
gianm commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2074084352
I just had a look at the latest patch. The list-of-patterns approach makes
more intuitive sense to me: the LIKE pattern is split on `%` into fixed-length
subpatterns, then each
gianm commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2060362644
@twilliamson thanks! I will have another look at the latest patch. Hopefully
should have some time for that tomorrow. Updating this PR works for me.
--
This is an automated message from
twilliamson commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2059712591
The overnight dedicated-box perf tests finished, and they look good. I've
gone ahead an updated this PR with the list-of-`java.util.regex.Pattern`
approach. Hopefully that's a good
twilliamson commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2057525485
I got the custom implementation to work, but it's pretty complicated, and
I'm bit worried about there being more bugs. Similarly, switching the regex
library to `joni` or `re2`
twilliamson commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2050082812
You're right: I'm making the same mistake as with the suffix — and now
worrying I left the same bug in the Facebook stream processing system…
Sequences of literal characters and
gianm commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2049007865
This test has an issue on the latest patch:
```
assertMatch("%1 _ 5%6", "1 2 3 1 4 5 6", DruidPredicateMatch.TRUE);
```
The matcher strips off the `6` to get `1 2 3 1 4 5
gianm commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2046538087
OK, thanks, I'll keep an eye out for the update.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to
twilliamson commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2046252316
Backtracking isn't needed – the bug in my current PR is in the
suffix-handling logic. ☹️
At a high level, there are three parts to the match:
- Must start with (the prefix)
gianm commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2046220895
Just looked more deeply, I think there's a bug in the logic around the
greediness of the matching. This test fails on the patch but passes on master:
```
assertMatch("%ba_", "foo
gianm commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2046146984
> @abhishekagarwal87 @gianm I was just looking at this PR, not sure what you
are suggesting for how to make forward progress?
Oh, I don't think the discussion about regex libraries
abhishekagarwal87 commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2044166247
@imply-cheddar - Would be good to know what author thinks of adding support
for other regex libraries that do not suffer from catastrophic backtracking in
this PR. If not, we
imply-cheddar commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2044150218
@abhishekagarwal87 @gianm I was just looking at this PR, not sure what you
are suggesting for how to make forward progress?
--
This is an automated message from the Apache Git
twilliamson commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2040716015
This info is from 5 or 6 years back while working on stream processing
systems at Facebook, but my recollection is that `re2j` had issues with UTF-8
multi-byte sequences. Not sure if
abhishekagarwal87 commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2039313940
@gianm - https://github.com/google/re2j seems to fit the bill. A
cluster-wide setting that switches the regex-based filters to use this library
instead of the standard Java
gianm commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2030550437
For `LIKE` it seems reasonable to have an optimized matcher specifically
geared towards the needs of `LIKE`. I am disappointed that the JDK regexp
library doesn't handle these relatively
abhishekagarwal87 commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2009025245
https://github.com/spring-projects/spring-framework/blob/main/spring-expression/src/main/java/org/springframework/expression/spel/ast/OperatorMatches.java#L128
seems like a
19 matches
Mail list logo