Christopher,
BLUF: We want to find all rows for a given column that do not have
a valid UUID.
Here is an example of what we do not want to match, which is a UUID
in the itemId: 11aa22bb-d33d-2abc-av34-11d25d334455
The pattern looks at a column, represented by the example column, and
keys on the first eight characters of the UUID followed by a dash (-).
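
For reference, here is a minimal sketch of how the negative lookahead could
be checked with plain java.util.regex, outside the shell, so that shell and
JLine escaping are not a factor. The class name, method name, and sample
values are hypothetical. It assumes the whole value must match the pattern
(matches()), which may not be how the filter is actually configured.

```java
import java.util.regex.Pattern;

final class UuidPrefixCheck {
    // Literal JSON prefix, then a negative lookahead that rejects values whose
    // itemId starts with eight lowercase hex characters followed by a dash.
    static final Pattern NO_UUID_PREFIX = Pattern.compile(
        "^\\{\"value\":\\{\"itemId\":\"(?![0-9a-f]{8}-).*$");

    // True when the value has the expected JSON prefix but the itemId does
    // not begin like a UUID. Assumption: full-string matching via matches().
    static boolean lacksUuidPrefix(String value) {
        return NO_UUID_PREFIX.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Made-up sample values for illustration only.
        String[] samples = {
            "{\"value\":{\"itemId\":\"11aa22bb-d33d-2abc-ab34-11d25d334455\"}}",
            "{\"value\":{\"itemId\":\"not-a-uuid\"}}"
        };
        for (String s : samples) {
            System.out.println(lacksUuidPrefix(s) + "  " + s);
        }
    }
}
```

Only Java string escaping applies here (\\{ for a literal brace, \" for a
quote); the exclamation point in (?! needs no escape.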
Don
On Wed, Mar 18, 2020 at 11:25 PM Christopher <[email protected]> wrote:
> The shell command, egrep, uses the RegExFilter[1] underneath. It
> supports Java regular expressions, which do support negative
> lookahead. So, it should be possible.
>
> However, it is possible there are some quoting issues... the shell
> itself uses backslash to escape, but it also uses JLine to parse
> output, and JLine might treat the exclamation point specially, so it
> might need to be escaped twice. However, this is just a guess.
>
> I would recommend trying to eliminate the shell as a variable, and
> scanning using the Java API directly to test.
>
> If you can supply some examples of what you want to match, and of
> what you don't want to match, I could probably try it myself to see
> if I can come up with a solution.
>
> [1]:
> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/iterators/user/RegExFilter.java
>
> On Wed, Mar 18, 2020 at 7:22 PM Donald Mackert <[email protected]> wrote:
> >
> > Hello,
> >
> > Does the accumulo egrep command support regex negative look ahead?
> >
> > We are trying to find all rows that do not have a UUID pattern using
> > the following sample command:
> >
> > egrep -c column ^\\{\"value\"\\:\\{\"itemId\"\\:\"((?\![0-9a-f]{8}-).)*$
> >
> > The following egrep returns all rows that match the pattern
> >
> > egrep -c column ^\\{\"value\"\\:\\{\"itemId\"\\:\"([0-9a-f]{8}-).*$
> >
> > Thank you,
> >
> > Don
>