Re: Does Accumulo egrep support regex negative lookahead?
Christopher,

Thank you. Will give these a try.

Don

On Thu, Mar 19, 2020 at 12:57 PM Christopher wrote:
> I inserted some sample data and was able to use a regex to find values
> that matched "itemId: " followed by a valid UUID, and using negative
> look ahead, "itemId: " followed by anything other than a valid UUID.
>
> See below:
>
> root@uno t1> scan
> a b:c []    itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc
> b b:c []    itemId: 11aa22bbd33d2abcav34-11d25d334455
> c b:c []    nope: 11aa22bbd33d2abcav34-11d25d334455
> d b:c []    nope: 11aa22bb-d33d-2abc-av34-11d25d334455
> root@uno t1> egrep '.*itemId: (?:[a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
> a b:c []    itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc
> root@uno t1> egrep '.*itemId: (?\![a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
> egrep '.*itemId: (?![a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
> b b:c []    itemId: 11aa22bbd33d2abcav34-11d25d334455
Re: Does Accumulo egrep support regex negative lookahead?
I inserted some sample data and was able to use a regex to find values that matched "itemId: " followed by a valid UUID, and using negative look ahead, "itemId: " followed by anything other than a valid UUID.

See below:

root@uno t1> scan
a b:c []    itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc
b b:c []    itemId: 11aa22bbd33d2abcav34-11d25d334455
c b:c []    nope: 11aa22bbd33d2abcav34-11d25d334455
d b:c []    nope: 11aa22bb-d33d-2abc-av34-11d25d334455
root@uno t1> egrep '.*itemId: (?:[a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
a b:c []    itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc
root@uno t1> egrep '.*itemId: (?\![a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
egrep '.*itemId: (?![a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
b b:c []    itemId: 11aa22bbd33d2abcav34-11d25d334455

On Thu, Mar 19, 2020 at 8:17 AM Donald Mackert wrote:
> Christopher,
>
> BLUF: We want to find all rows for a given column that do not have a
> valid UUID.
>
> Here is an example of what we do not want to match, which is a UUID
> in the itemId: 11aa22bb-d33d-2abc-av34-11d25d334455
>
> What we are looking for is a column represented by the example column
> and the first eight characters of the UUID followed by a dash (-).
>
> Don
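The shell session in Christopher's message can also be replayed outside the shell. The following is only a sketch in plain java.util.regex, not the Accumulo API: the class and method names are made up for illustration, the four values are copied from the scan output above, and it assumes that egrep requires the whole value to match the pattern (which is why the patterns are wrapped in .* on both sides).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical demo class; it only exercises java.util.regex, not Accumulo.
public class ItemIdLookaheadDemo {
    // UUID fragment copied from the egrep commands in the message above.
    public static final String UUID = "[a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}";

    // "itemId: " followed by a valid UUID (non-capturing group).
    public static final Pattern VALID =
        Pattern.compile(".*itemId: (?:" + UUID + ").*");

    // "itemId: " followed by anything other than a valid UUID (negative lookahead).
    public static final Pattern INVALID =
        Pattern.compile(".*itemId: (?!" + UUID + ").*");

    // Row id -> value, copied from the sample scan output.
    public static Map<String, String> sampleData() {
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("a", "itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc");
        rows.put("b", "itemId: 11aa22bbd33d2abcav34-11d25d334455");
        rows.put("c", "nope: 11aa22bbd33d2abcav34-11d25d334455");
        rows.put("d", "nope: 11aa22bb-d33d-2abc-av34-11d25d334455");
        return rows;
    }

    // Returns the row ids whose value matches the pattern in full.
    public static List<String> rowsMatching(Pattern pattern) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> row : sampleData().entrySet()) {
            if (pattern.matcher(row.getValue()).matches()) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("valid UUID:   " + rowsMatching(VALID));
        System.out.println("invalid UUID: " + rowsMatching(INVALID));
    }
}
```

With the sample data this selects row a for the first pattern and row b for the second, the same rows the shell returned. No backslash is needed before the exclamation point here, because no shell is involved.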
Re: Does Accumulo egrep support regex negative lookahead?
Christopher,

BLUF: We want to find all rows for a given column that do not have a valid UUID.

Here is an example of what we do not want to match, which is a UUID in the itemId: 11aa22bb-d33d-2abc-av34-11d25d334455

What we are looking for is a column represented by the example column and the first eight characters of the UUID followed by a dash (-).

Don

On Wed, Mar 18, 2020 at 11:25 PM Christopher wrote:
> The shell command, egrep, uses the RegExFilter[1] underneath. It
> supports Java regular expressions, which do support negative look
> ahead. So, it should be possible.
>
> However, it is possible there are some quoting issues... the shell
> itself uses backslash to escape, but it also uses JLine to parse
> output, and JLine might treat the exclamation point specially, so it
> might need to be escaped twice. However, this is just a guess.
>
> I would recommend trying to eliminate the shell variable, and scan
> using the Java API directly to test.
>
> If you can supply some examples on what you want to match, and those
> you don't want to match, I could probably try it myself to see if I
> can come up with a solution.
>
> [1]: https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/iterators/user/RegExFilter.java
Re: Does Accumulo egrep support regex negative lookahead?
Christopher,

Thank you. Will take a look.

Don

On Wed, Mar 18, 2020 at 11:25 PM Christopher wrote:
> The shell command, egrep, uses the RegExFilter[1] underneath. It
> supports Java regular expressions, which do support negative look
> ahead. So, it should be possible.
>
> However, it is possible there are some quoting issues... the shell
> itself uses backslash to escape, but it also uses JLine to parse
> output, and JLine might treat the exclamation point specially, so it
> might need to be escaped twice. However, this is just a guess.
>
> I would recommend trying to eliminate the shell variable, and scan
> using the Java API directly to test.
>
> If you can supply some examples on what you want to match, and those
> you don't want to match, I could probably try it myself to see if I
> can come up with a solution.
>
> [1]: https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/iterators/user/RegExFilter.java
Re: Does Accumulo egrep support regex negative lookahead?
The shell command, egrep, uses the RegExFilter[1] underneath. It supports Java regular expressions, which do support negative look ahead. So, it should be possible.

However, it is possible there are some quoting issues... the shell itself uses backslash to escape, but it also uses JLine to parse output, and JLine might treat the exclamation point specially, so it might need to be escaped twice. However, this is just a guess.

I would recommend trying to eliminate the shell variable, and scan using the Java API directly to test.

If you can supply some examples on what you want to match, and those you don't want to match, I could probably try it myself to see if I can come up with a solution.

[1]: https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/iterators/user/RegExFilter.java

On Wed, Mar 18, 2020 at 7:22 PM Donald Mackert wrote:
> Hello,
>
> Does the accumulo egrep command support regex negative look ahead?
>
> We are trying to find all rows that do not have a UUID pattern using the
> following sample command
>
> egrep -c column ^\\{\"value\"\\:\\{\"itemId\"\\:\"((?\![0-9a-f]{8}-).)*$
>
> The following egrep returns all rows that match the pattern
>
> egrep -c column ^\\{\"value\"\\:\\{\"itemId\"\\:\"([0-9a-f]{8}-).*$
>
> Thank you,
>
> Don
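One way to take the shell out of the picture, as suggested above, is to try the pattern from the original question in plain java.util.regex before involving Accumulo at all. The sketch below rests on several assumptions: the regex is what the shell command would presumably reduce to once its escaping is removed, the JSON-like values are invented for illustration (the thread does not show the real data), and the class and method names are hypothetical. It does not use RegExFilter itself.

```java
import java.util.regex.Pattern;

// Hypothetical check of the negative lookahead pattern without the Accumulo shell.
public class NegativeLookaheadCheck {
    // Assumed regex after shell unescaping:
    //   ^\{"value"\:\{"itemId"\:"((?![0-9a-f]{8}-).)*$
    // written here as a Java string literal. The "!" needs no backslash in Java.
    public static final Pattern NO_UUID_PREFIX = Pattern.compile(
        "^\\{\"value\"\\:\\{\"itemId\"\\:\"((?![0-9a-f]{8}-).)*$");

    // True when no run of eight hex characters followed by a dash
    // appears anywhere after the opening quote of the itemId value.
    public static boolean lacksUuidPrefix(String value) {
        return NO_UUID_PREFIX.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Invented sample values; the real column contents are not shown in the thread.
        String[] samples = {
            "{\"value\":{\"itemId\":\"11aa22bb-d33d-e44e-f55f-6677889900cc\"}}",
            "{\"value\":{\"itemId\":\"11aa22bbd33d2abcav34-11d25d334455\"}}",
            "{\"value\":{\"itemId\":\"not-a-uuid\"}}"
        };
        for (String sample : samples) {
            System.out.println(lacksUuidPrefix(sample) + "  " + sample);
        }
    }
}
```

Note that the ((?!X).)* form rejects a value if X occurs at any position after the opening quote, not only directly after it. If only the start of the itemId matters, a single lookahead placed right after the quote, in the style of the `(?!...)` pattern shown elsewhere in this thread, is the narrower test.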