Re: Does Accumulo egrep support regex negative lookahead?

2020-03-19 Thread Donald Mackert
Christopher,

Thank you.  Will give these a try.

Don

On Thu, Mar 19, 2020 at 12:57 PM Christopher  wrote:



Re: Does Accumulo egrep support regex negative lookahead?

2020-03-19 Thread Christopher
I inserted some sample data and was able to use a regex to find values
that matched "itemId: " followed by a valid UUID and, using negative
lookahead, values that matched "itemId: " followed by anything other
than a valid UUID.

See below:

root@uno t1> scan
a b:c []itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc
b b:c []itemId: 11aa22bbd33d2abcav34-11d25d334455
c b:c []nope: 11aa22bbd33d2abcav34-11d25d334455
d b:c []nope: 11aa22bb-d33d-2abc-av34-11d25d334455
root@uno t1> egrep '.*itemId: (?:[a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
a b:c []itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc
root@uno t1> egrep '.*itemId: (?\![a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
egrep '.*itemId: (?![a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
b b:c []itemId: 11aa22bbd33d2abcav34-11d25d334455
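To check the same two expressions without the shell's quoting in the way, here is a minimal sketch that runs them through java.util.regex, the engine RegExFilter is built on, against the four sample values above. The class name is hypothetical, and applying Matcher.matches() to the value string alone is an assumption; it is not necessarily how egrep applies the pattern across row, column and value.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class UuidLookaheadCheck {
    // 8-4-4-4-12 hex UUID, written as in the shell session:
    // 8 hex chars, four "-xxxx" groups, then the remaining 8 hex chars
    static final String UUID_REGEX = "[a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}";

    // "itemId: " followed by a valid UUID
    static final Pattern VALID = Pattern.compile(".*itemId: (?:" + UUID_REGEX + ").*");

    // "itemId: " followed by anything that is NOT a valid UUID (negative lookahead)
    static final Pattern NOT_VALID = Pattern.compile(".*itemId: (?!" + UUID_REGEX + ").*");

    // Values from the four sample rows in the scan output
    static final List<String> SAMPLE_VALUES = List.of(
            "itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc",
            "itemId: 11aa22bbd33d2abcav34-11d25d334455",
            "nope: 11aa22bbd33d2abcav34-11d25d334455",
            "nope: 11aa22bb-d33d-2abc-av34-11d25d334455");

    // Keeps the sample values that the whole pattern matches
    static List<String> filter(Pattern p) {
        return SAMPLE_VALUES.stream()
                .filter(v -> p.matcher(v).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("valid UUID:     " + filter(VALID));
        System.out.println("not valid UUID: " + filter(NOT_VALID));
    }
}
```

Only the first value matches VALID and only the second matches NOT_VALID, which lines up with the rows a and b returned by the two egrep commands.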


On Thu, Mar 19, 2020 at 8:17 AM Donald Mackert  wrote:


Re: Does Accumulo egrep support regex negative lookahead?

2020-03-19 Thread Donald Mackert
Christopher,

BLUF: We want to find all rows for a given column that do not have
a valid UUID.

Here is an example of what we do not want to match, which is a UUID
in the itemId: 11aa22bb-d33d-2abc-av34-11d25d334455

What we are looking for is represented by the example column and the
first eight characters of the UUID followed by a dash (-).

Don


On Wed, Mar 18, 2020 at 11:25 PM Christopher  wrote:



Re: Does Accumulo egrep support regex negative lookahead?

2020-03-19 Thread Donald Mackert
Christopher,

Thank you. Will take a look.

Don

On Wed, Mar 18, 2020 at 11:25 PM Christopher  wrote:



Re: Does Accumulo egrep support regex negative lookahead?

2020-03-18 Thread Christopher
The shell command egrep uses the RegExFilter[1] underneath. It
supports Java regular expressions, which do support negative
lookahead, so it should be possible.

However, there may be some quoting issues. The shell itself uses
backslash to escape, but it also uses JLine to parse input, and JLine
might treat the exclamation point specially, so it might need to be
escaped twice. This is just a guess, though.
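One way to see what has to reach the regex engine: any backslash added in front of "!" is only there for the shell's line reader, and if it were passed through unchanged, Java would reject the pattern. The minimal sketch below, with a hypothetical class name, compiles both forms with java.util.regex; it assumes the shell removes such a backslash before RegExFilter compiles the pattern.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LookaheadEscapeCheck {
    // What the regex engine should receive: a plain negative lookahead
    static final String UNESCAPED = "^(?![0-9a-f]{8}-).*$";
    // What it would receive if a shell-level backslash were passed through
    static final String ESCAPED = "^(?\\![0-9a-f]{8}-).*$";

    // Returns true if the string compiles as a Java regular expression
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles(UNESCAPED)); // true
        System.out.println(compiles(ESCAPED));   // false: "(?\" is not a valid group construct
        // The lookahead rejects values that start with 8 hex chars and a dash
        System.out.println(Pattern.matches(UNESCAPED, "11aa22bbd33d2abc")); // true
        System.out.println(Pattern.matches(UNESCAPED, "11aa22bb-d33d"));    // false
    }
}
```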

I would recommend eliminating the shell as a variable and scanning
with the Java API directly to test.

If you can supply some examples on what you want to match, and those
you don't want to match, I could probably try it myself to see if I
can come up with a solution.

[1]: https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/iterators/user/RegExFilter.java

On Wed, Mar 18, 2020 at 7:22 PM Donald Mackert  wrote:
>
> Hello,
>
> Does the Accumulo egrep command support regex negative lookahead?
>
> We are trying to find all rows that do not have a UUID pattern, using the
> following sample command:
>
>   egrep -c column ^\\{\"value\"\\:\\{\"itemId\"\\:\"((?\![0-9a-f]{8}-).)*$
>
> The following egrep returns all rows that match the pattern:
>
>   egrep -c column ^\\{\"value\"\\:\\{\"itemId\"\\:\"([0-9a-f]{8}-).*$
>
> Thank you,
>
> Don
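The first command in the question wraps the lookahead in "((?!...).)*", often called a tempered greedy token. That form fails the whole value if eight hex characters followed by a dash appear anywhere after the opening quote of itemId, not only at its start. The minimal sketch below compares it in java.util.regex with a single lookahead anchored at the start of the itemId value. The JSON values and class name are hypothetical; the sketch assumes the shell escaping has already been removed and writes "\:" as a plain colon, which matches the same character in a Java regex.

```java
import java.util.regex.Pattern;

public class ItemIdPrefixCheck {
    // First pattern from the question, shell escaping removed: the tempered
    // token ((?!X).)* forbids X at EVERY position after the opening quote.
    static final Pattern TEMPERED =
            Pattern.compile("^\\{\"value\":\\{\"itemId\":\"((?![0-9a-f]{8}-).)*$");

    // Alternative: one negative lookahead, checked only where the itemId value starts.
    static final Pattern ANCHORED =
            Pattern.compile("^\\{\"value\":\\{\"itemId\":\"(?![0-9a-f]{8}-).*$");

    // Hypothetical stored values, made up for illustration
    static final String VALID_ID =
            "{\"value\":{\"itemId\":\"11aa22bb-d33d-e44e-f55f-6677889900cc\"}}";
    static final String INVALID_ID =
            "{\"value\":{\"itemId\":\"11aa22bbd33d2abcav34-11d25d334455\"}}";
    static final String INVALID_ID_UUID_LATER =
            "{\"value\":{\"itemId\":\"abc\",\"parentId\":\"11aa22bb-d33d-e44e-f55f-6677889900cc\"}}";

    public static void main(String[] args) {
        for (String v : new String[] {VALID_ID, INVALID_ID, INVALID_ID_UUID_LATER}) {
            // prints "tempered anchored" match results for each value
            System.out.println(TEMPERED.matcher(v).matches() + " " + ANCHORED.matcher(v).matches());
        }
        // prints: false false / true true / false true
    }
}
```

For the third value the tempered form reports no match even though its itemId is not a UUID, because a UUID appears later in the same value; the anchored form only looks at the characters right after the itemId's opening quote.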