[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-07-23 Thread Pierre Villard (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163937#comment-17163937
 ] 

Pierre Villard commented on NIFI-2072:
--

I might be able to have a look over the weekend, but if someone else can give it 
a try, that would be helpful.

> Support named captures in ExtractText
> -
>
> Key: NIFI-2072
> URL: https://issues.apache.org/jira/browse/NIFI-2072
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Joey Frazee
>Assignee: Otto Fowler
>Priority: Major
>  Labels: extracttext
>
> ExtractText currently captures and creates attributes using numeric indices 
> (e.g., attribute.name.0, attribute.name.1, etc.) whether or not the capture 
> groups are named, i.e., patterns like (?<name>\w+).
> In addition to being more faithful to the provided regexes, named captures 
> could help simplify data flows because you wouldn't have to add superfluous 
> UpdateAttribute steps which are just renaming the indexed captures to more 
> interpretable names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-07-23 Thread Otto Fowler (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163925#comment-17163925
 ] 

Otto Fowler commented on NIFI-2072:
---

The PR is up for review.  The next step is for somebody to review it.  If 
that person is a committer, they can +1 it and merge it.

[~pvillard] is pretty busy.

You are welcome to review it and try it out, if that is in the realm of things 
you are comfortable doing.



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-07-23 Thread Malthe Borch (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163913#comment-17163913
 ] 

Malthe Borch commented on NIFI-2072:


[~otto] Nice work. I think this is ready for the next step (not sure who 
handles that or how it works).



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-07-23 Thread Otto Fowler (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163793#comment-17163793
 ] 

Otto Fowler commented on NIFI-2072:
---

[~malthe] I just pushed validation support



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-07-07 Thread Otto Fowler (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153069#comment-17153069
 ] 

Otto Fowler commented on NIFI-2072:
---

I'm more inclined to do the validation, since I think handling mixed and 
scoped (nested) groups etc. goes downhill real fast, because Java doesn't 
support it.

What would be nice is if, when I called group(String), I could also call 
getGroupIndex(String) so that I could mix and match, but you can't.
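
To illustrate the limitation described above, here is a minimal sketch (not taken from the PR). It assumes a Java version such as 8 or 11, whose `Matcher` offers `group(String)` and `group(int)` but no public name-to-index lookup. The class name `MixedGroupsDemo` and the sample pattern are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MixedGroupsDemo {
    /**
     * Looks up the first match three ways: by name, by index 1, by index 2.
     * Returns an empty array when the input does not match.
     */
    static String[] lookups(String input) {
        // One named group ("key") followed by one unnamed group.
        Pattern p = Pattern.compile("(?<key>\\w+)=(\\d+)");
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        // The named group is also group 1, but nothing in the Matcher API
        // used here tells the caller that "key" and index 1 are the same group.
        return new String[] { m.group("key"), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", lookups("a=1")));
    }
}
```

Because the caller cannot tell which indices are already covered by names, emitting both named and indexed attributes for a mixed pattern would risk duplicates, which is one argument for validating instead.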



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-07-07 Thread Malthe Borch (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152904#comment-17152904
 ] 

Malthe Borch commented on NIFI-2072:


I would be happy then with "Enable named group support".

In terms of what happens if an unnamed capture group is used, I think it would 
be better to either:

- Allow it.
- Implement a validation step that scans the expression for unnamed capture 
groups (i.e. those that are not named and not non-capturing).
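
The validation option could look roughly like the sketch below. It is an illustration under stated assumptions, not the check implemented in the PR: the class and method names are hypothetical, and the scan does not handle `\Q...\E` quoting, nested character classes, or comments mode.

```java
final class UnnamedGroupCheck {
    /**
     * Returns true if the regex source contains a capturing group without
     * a name, i.e. an unescaped "(" outside a character class that is not
     * followed by "?".
     */
    static boolean hasUnnamedCapturingGroup(String regex) {
        boolean inCharClass = false; // true while scanning inside [...]
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (inCharClass) {
                if (c == ']') {
                    inCharClass = false;
                }
            } else if (c == '[') {
                inCharClass = true;
            } else if (c == '(') {
                // "(?" introduces a named group, a non-capturing group,
                // a lookaround or inline flags; a bare "(" captures by index.
                boolean special = i + 1 < regex.length() && regex.charAt(i + 1) == '?';
                if (!special) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasUnnamedCapturingGroup("(?<left>\\w+)=(?<right>\\d+)"));
        System.out.println(hasUnnamedCapturingGroup("(\\w+)=(?<right>\\d+)"));
    }
}
```

A property validator could call such a check only when named group support is enabled, so existing flows with unnamed groups remain valid.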



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-07-07 Thread Pierre Villard (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152886#comment-17152886
 ] 

Pierre Villard commented on NIFI-2072:
--

Sorry, I don't really have the time to look into it as much as I'd like right 
now. The only rule is that we can't make any breaking change: meaning that any 
existing flow should keep working the exact same way after an upgrade. That's 
why we usually provide a property to explicitly allow users to enable this new 
behavior.



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-07-06 Thread Otto Fowler (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152081#comment-17152081
 ] 

Otto Fowler commented on NIFI-2072:
---

That is a good question.  Usually (in my experience) when a behavior of a 
processor is changed it is put behind a configuration property, so that is the 
convention I followed.   [~pvillard] did this as well.



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-07-06 Thread Malthe Borch (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151827#comment-17151827
 ] 

Malthe Borch commented on NIFI-2072:


Is it really necessary to enable named capture groups rather than just use them? 
If I don't want named capture groups, I suppose I am just not going to name 
them, opting instead for enumerated ones.



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-07-03 Thread Otto Fowler (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151064#comment-17151064
 ] 

Otto Fowler commented on NIFI-2072:
---

OK

I have a PR just about ready for this.  But just to get some feedback first:

After the PR there are implicitly two ways the processor works, based on the 
enable named groups property:

- The old way, if it is not enabled.
- The new way.

The new way is different in that numeric indices are not added until the second 
set of matches (if you have that enabled).

The root attribute name is used for the 0 group, or for the whole matched line 
if there are no groups specified.

Such as:

{code:java}
@Test
public void testFindAll() throws Exception {
    final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
    testRunner.setProperty(ENABLE_NAMED_GROUPS, "true");
    testRunner.setProperty(ExtractText.ENABLE_REPEATING_CAPTURE_GROUP, "true");
    final String attributeKey = "regex.result";
    testRunner.setProperty(attributeKey, "(?s)(?<W>\\w+)");
    testRunner.enqueue("This is my text".getBytes(StandardCharsets.UTF_8));
    testRunner.run();
    testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
    final MockFlowFile out =
            testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
    // Ensure the zero capture group is in the resultant attributes
    out.assertAttributeExists(attributeKey);
    out.assertAttributeExists(attributeKey + ".W");
    out.assertAttributeExists(attributeKey + ".W.1");
    out.assertAttributeExists(attributeKey + ".W.2");
    out.assertAttributeExists(attributeKey + ".W.3");
    out.assertAttributeEquals(attributeKey, "This");
    out.assertAttributeEquals(attributeKey + ".W", "This");
    out.assertAttributeEquals(attributeKey + ".W.1", "is");
    out.assertAttributeEquals(attributeKey + ".W.2", "my");
    out.assertAttributeEquals(attributeKey + ".W.3", "text");
}

@Test
public void testFindAllPair() throws Exception {
    final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
    testRunner.setProperty(ENABLE_NAMED_GROUPS, "true");
    testRunner.setProperty(ExtractText.ENABLE_REPEATING_CAPTURE_GROUP, "true");
    final String attributeKey = "regex.result";
    testRunner.setProperty(attributeKey, "(?<LEFT>\\w+)=(?<RIGHT>\\d+)");
    testRunner.enqueue("a=1,b=10,c=100".getBytes(StandardCharsets.UTF_8));
    testRunner.run();
    testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
    final MockFlowFile out =
            testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
    // Ensure the zero capture group is in the resultant attributes
    out.assertAttributeExists(attributeKey);
    out.assertAttributeExists(attributeKey + ".LEFT");
    out.assertAttributeExists(attributeKey + ".RIGHT");
    out.assertAttributeExists(attributeKey + ".LEFT.1");
    out.assertAttributeExists(attributeKey + ".RIGHT.1");
    out.assertAttributeExists(attributeKey + ".LEFT.2");
    out.assertAttributeExists(attributeKey + ".RIGHT.2");
    // Ensure there are no more attributes
    out.assertAttributeNotExists(attributeKey + ".LEFT.3");
    out.assertAttributeNotExists(attributeKey + ".RIGHT.3");
    out.assertAttributeEquals(attributeKey, "a=1");
    out.assertAttributeEquals(attributeKey + ".LEFT", "a");
    out.assertAttributeEquals(attributeKey + ".RIGHT", "1");
    out.assertAttributeEquals(attributeKey + ".LEFT.1", "b");
    out.assertAttributeEquals(attributeKey + ".RIGHT.1", "10");
    out.assertAttributeEquals(attributeKey + ".LEFT.2", "c");
    out.assertAttributeEquals(attributeKey + ".RIGHT.2", "100");
}
{code}
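
The naming scheme these tests assert can be sketched in plain `java.util.regex` as follows. This is an illustration only, not the ExtractText implementation: the class `NamedAttributeSketch`, the method `extract`, and passing the group names in explicitly are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedAttributeSketch {
    /**
     * Builds attributes for every match of regex in content:
     * "key" holds the whole first match, "key.NAME" the named groups of
     * the first match, and "key.NAME.n" those of the n-th later match.
     */
    static Map<String, String> extract(String key, String regex,
                                       List<String> groupNames, String content) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(content);
        int matchIndex = 0;
        while (m.find()) {
            if (matchIndex == 0) {
                attrs.put(key, m.group(0)); // root attribute: the 0 group
            }
            // Numeric suffixes only start with the second match.
            String suffix = matchIndex == 0 ? "" : "." + matchIndex;
            for (String name : groupNames) {
                String value = m.group(name);
                if (value != null) {
                    attrs.put(key + "." + name + suffix, value);
                }
            }
            matchIndex++;
        }
        return attrs;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = extract("regex.result",
                "(?<LEFT>\\w+)=(?<RIGHT>\\d+)",
                Arrays.asList("LEFT", "RIGHT"),
                "a=1,b=10,c=100");
        attrs.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```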




[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-07-03 Thread Otto Fowler (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151065#comment-17151065
 ] 

Otto Fowler commented on NIFI-2072:
---

[~pvillard]



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-06-30 Thread Otto Fowler (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148956#comment-17148956
 ] 

Otto Fowler commented on NIFI-2072:
---

[~pvillard]

Something like this?  The restriction on the property to enable is:  if you 
want named groups, all your capturing groups MUST be named.  You can't mix named 
and unnamed captures.


{code:java}
final String SAMPLE_STRING = "foo\r\nbar1\r\nbar2\r\nbar3\r\nhello\r\nworld\r\n";

@Test
public void testProcessorWithGroupNames() throws Exception {
    final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());

    // Each dynamic property is a regex whose capturing groups are all named
    testRunner.setProperty("regex.result1", "(?s)(?<all>.*)");
    testRunner.setProperty("regex.result2", "(?s).*(?<bar1>bar1).*");
    testRunner.setProperty("regex.result3", "(?s).*?(?<bar1>bar\\d).*");
    testRunner.setProperty("regex.result4",
            "(?s).*?(?:bar\\d).*?(?<bar2>bar\\d).*?(?<bar3>bar3).*");
    testRunner.setProperty("regex.result5", "(?s).*(?<bar3>bar\\d).*");
    testRunner.setProperty("regex.result6", "(?s)^(?<all>.*)$");
    testRunner.setProperty("regex.result7", "(?s)(?<miss>XXX)");
    testRunner.setProperty(ENABLE_NAMED_GROUPS, "true");
    testRunner.enqueue(SAMPLE_STRING.getBytes("UTF-8"));
    testRunner.run();

    testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
    final MockFlowFile out =
            testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
    java.util.Map<String, String> attributes = out.getAttributes();
    // Attributes are named <property>.<group name>
    out.assertAttributeEquals("regex.result1.all", SAMPLE_STRING);
    out.assertAttributeEquals("regex.result2.bar1", "bar1");
    out.assertAttributeEquals("regex.result3.bar1", "bar1");
    out.assertAttributeEquals("regex.result4.bar2", "bar2");
    out.assertAttributeEquals("regex.result4.bar2", "bar2");
    out.assertAttributeEquals("regex.result4.bar3", "bar3");
    out.assertAttributeEquals("regex.result5.bar3", "bar3");
    out.assertAttributeEquals("regex.result6.all", SAMPLE_STRING);
    // A pattern that does not match produces no attribute
    out.assertAttributeEquals("regex.result7.miss", null);
}
{code}




[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-06-28 Thread Otto Fowler (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147556#comment-17147556
 ] 

Otto Fowler commented on NIFI-2072:
---

I'll take a shot



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-06-26 Thread Pierre Villard (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146483#comment-17146483
 ] 

Pierre Villard commented on NIFI-2072:
--

Hi [~malthe] - I didn't go further on this one and that's definitely open to 
anyone willing to give it a try.



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2020-06-26 Thread Malthe Borch (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146104#comment-17146104
 ] 

Malthe Borch commented on NIFI-2072:


[~pvillard] did you ever make any headway with this or is it open for work, 
assuming that you are still happy with the suggested behavior?



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2017-06-26 Thread Andre F de Miranda (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063392#comment-16063392
 ] 

Andre F de Miranda commented on NIFI-2072:
--

[~jfrazee] [~pvillard] There is a workaround for this: using ExtractGrok. 
Grok supports named captures (and their extraction into attributes) out of the 
box, making it a suitable alternative to the functionality requested here.

All you need to do is paste a pure regex into the grok pattern and voila: you 
get an attribute {{grok.captureName}} with the captured value.

Unless there are edge cases that ExtractGrok can't handle, I suggest closing 
this as a won't fix?





[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2016-09-23 Thread Joey Frazee (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516790#comment-15516790
 ] 

Joey Frazee commented on NIFI-2072:
---

[~pvillard] Yeah, I honestly just assumed that would have been there in the 
Pattern and/or Matcher. I do want to ask, though, from a usability perspective, 
if getting a bit nasty might be justified here?



[jira] [Commented] (NIFI-2072) Support named captures in ExtractText

2016-09-22 Thread Pierre Villard (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512784#comment-15512784
 ] 

Pierre Villard commented on NIFI-2072:
--

[~jfrazee] I was looking at this JIRA and I agree that it would be a great 
addition. However, it seems it is not possible to get the group names from the 
Pattern expression unless we get a bit nasty...
http://stackoverflow.com/questions/15588903/get-group-names-in-java-regex
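
One workaround of the kind alluded to above is to scan the regex source text for named-group openers. The sketch below is a hedged illustration, not code from NiFi: the class `GroupNames` and method `namesIn` are hypothetical, and the scan ignores escaping, character classes and `\Q...\E` quoting, so it can report false positives on unusual patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupNames {
    // Matches the opener of a named group, "(?<name>". Java group names
    // start with an ASCII letter and continue with ASCII letters or digits,
    // so lookbehinds such as "(?<=" and "(?<!" are not matched.
    private static final Pattern NAMED_GROUP =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    /** Returns the group names found in the regex source, in order of appearance. */
    static List<String> namesIn(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = NAMED_GROUP.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesIn("(?<left>\\w+)=(?<right>\\d+)"));
    }
}
```

Each returned name can then be passed to `Matcher.group(String)` on a matcher built from the same pattern.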
