Christopher,
Thank you. Will give these a try.
Don
On Thu, Mar 19, 2020 at 12:57 PM Christopher <[email protected]> wrote:
> I inserted some sample data and was able to use a regex to find values
> that matched "itemId: " followed by a valid UUID, and using negative
> look ahead, "itemId: " followed by anything other than a valid UUID.
>
> See below:
>
> root@uno t1> scan
> a b:c [] itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc
> b b:c [] itemId: 11aa22bbd33d2abcav34-11d25d334455
> c b:c [] nope: 11aa22bbd33d2abcav34-11d25d334455
> d b:c [] nope: 11aa22bb-d33d-2abc-av34-11d25d334455
> root@uno t1> egrep '.*itemId: (?:[a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
> a b:c [] itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc
> root@uno t1> egrep '.*itemId: (?\![a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
> egrep '.*itemId: (?![a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
> b b:c [] itemId: 11aa22bbd33d2abcav34-11d25d334455
>
>
> On Thu, Mar 19, 2020 at 8:17 AM Donald Mackert <[email protected]> wrote:
> >
> > Christopher,
> >
> > BLUF: We want to find all rows for a given column that do not have a valid UUID.
> >
> > Here is an example of what we do not want to match, which is a UUID in the itemId: 11aa22bb-d33d-2abc-av34-11d25d334455
> >
> > We are looking for a column represented by the example column and the first eight characters of the UUID followed by a dash (-).
> >
> > Don
> >
> >
> > On Wed, Mar 18, 2020 at 11:25 PM Christopher <[email protected]> wrote:
> >>
> >> The shell command, egrep, uses the RegExFilter[1] underneath. It
> >> supports Java regular expressions, which do support negative look
> >> ahead. So, it should be possible.
> >>
> >> However, it is possible there are some quoting issues... the shell
> >> itself uses backslash to escape, but it also uses JLine to parse
> >> input, and JLine might treat the exclamation point specially, so it
> >> might need to be escaped twice. However, this is just a guess.
> >>
> >> I would recommend trying to eliminate the shell variable, and scan
> >> using the Java API directly to test.
> >>
> >> If you can supply some examples on what you want to match, and those
> >> you don't want to match, I could probably try it myself to see if I
> >> can come up with a solution.
> >>
> >> [1]: https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/iterators/user/RegExFilter.java
> >>
> >> On Wed, Mar 18, 2020 at 7:22 PM Donald Mackert <[email protected]> wrote:
> >> >
> >> > Hello,
> >> >
> >> > Does the accumulo egrep command support regex negative look ahead?
> >> >
> >> > We are trying to find all rows that do not have a UUID pattern using the following sample command
> >> >
> >> > egrep -c column ^\\{\"value\"\\:\\{\"itemId\"\\:\"((?\![0-9a-f]{8}-).)*$
> >> >
> >> > The following egrep returns all rows that match the pattern
> >> >
> >> > egrep -c column ^\\{\"value\"\\:\\{\"itemId\"\\:\"([0-9a-f]{8}-).*$
> >> >
> >> > Thank you,
> >> >
> >> > Don
>
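
For anyone who wants to check the two patterns from the thread without the
shell's quoting in the way, here is a minimal sketch using plain
java.util.regex. It is not the Accumulo API: the class and method names
(ItemIdRegexCheck, hasValidItemId, hasInvalidItemId) are made up for
illustration, and it assumes the value is matched as a whole string
(Matcher.matches), which is why the patterns keep their leading and
trailing ".*". The sample values are the four from the scan output above.

```java
import java.util.List;
import java.util.regex.Pattern;

public class ItemIdRegexCheck {
    // UUID body copied from the thread: 8 hex chars, four "-hhhh" groups, then
    // 8 more hex chars (the last "-hhhh" plus those 8 form the 12-char tail).
    static final String UUID = "[a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}";

    // "itemId: " followed by a valid UUID.
    static final Pattern VALID = Pattern.compile(".*itemId: (?:" + UUID + ").*");

    // "itemId: " followed by anything other than a valid UUID (negative lookahead).
    // In a Java string literal the '!' needs no escaping.
    static final Pattern INVALID = Pattern.compile(".*itemId: (?!" + UUID + ").*");

    static boolean hasValidItemId(String value) {
        return VALID.matcher(value).matches();
    }

    static boolean hasInvalidItemId(String value) {
        return INVALID.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Values of rows a, b, c, d from the scan in the thread.
        List<String> values = List.of(
            "itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc",
            "itemId: 11aa22bbd33d2abcav34-11d25d334455",
            "nope: 11aa22bbd33d2abcav34-11d25d334455",
            "nope: 11aa22bb-d33d-2abc-av34-11d25d334455");
        for (String v : values) {
            System.out.printf("%-45s valid=%b invalid=%b%n",
                v, hasValidItemId(v), hasInvalidItemId(v));
        }
    }
}
```

If the patterns behave here but not in the shell, that would point at
escaping in the shell rather than at the regex itself.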