Chris,

I was stumped on this for a few minutes, but then realized I had only tried your template against the latest 0.3.0 code, which has not been released yet. Sure enough, after switching to the 0.2.1 release, I now see your issue where the content of the FlowFile gets the matched value twice.
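For anyone following along, one common way a regex replacement ends up written twice is an unanchored pattern such as `.*`: the engine matches the whole input, then finds one more empty match at the very end. The sketch below shows that effect in Python (3.7 or later) rather than in NiFi's Java code, and the thread does not confirm that this is the exact root cause behind NIFI-911. The sample content and the replacement string are made up for illustration; only the anchored pattern `(?s:^.*$)` comes from this reply.

```python
import re

# Hypothetical FlowFile content and replacement value, for illustration only.
content = '{"event_type": "identity.authenticate"}'
replacement = "REPLACED"

# Unanchored pattern: matches the whole string, then matches the empty
# string at the end of the input, so the replacement is emitted twice.
unanchored = re.sub(r".*", replacement, content)

# Anchored pattern with dot-matches-newline scoped to the group: "^" can
# only match at the start, so there is exactly one match.
anchored = re.sub(r"(?s:^.*$)", replacement, content)

print(unanchored)
print(anchored)
```

If the first line printed shows the replacement repeated and the second shows it once, the anchoring is doing what the workaround relies on.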
The good news is that this was identified and fixed for the upcoming release:

https://issues.apache.org/jira/browse/NIFI-911

In the meantime, it looks like you could change the regular expression to (?s:^.*$) on the ReplaceText that comes after ExtractText.

Another ticket in 0.3.0 that may be relevant for you is this one:

https://issues.apache.org/jira/browse/NIFI-808

It allows you to turn off capturing group 0, since in a lot of cases it isn't used and could be large, so you would end up with only secaudit.json and secaudit.json.1.

-Bryan

On Thu, Sep 10, 2015 at 12:16 PM, Christopher Wilson <[email protected]> wrote:

> The behavior I see is for the ExtractText -> ReplaceText path where the
> attributes, secaudit.json, secaudit.json.0, and secaudit.json.1 are
> concatenated into the payload (below).
>
> What I expected was that the attribute, secaudit.json, would have replaced
> the payload. I've tried .0 and .1 as the replacement attribute and I still
> see the same behavior.
>
> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
> "python-keystoneclient", "address": "10.0.0.60"}, "id":
> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
> "service/security/account/user", "id":
> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
> "service/security", "id":
> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
> "outcome": "success", "id":
> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}{"priority": "INFO", "event_type":
> "identity.authenticate", "timestamp": "2015-08-18 23:29:17.358460",
> "publisher_id": "identity.ip-10-0-0-60", "payload": {"typeURI":
> "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI":
> "service/security/account/user", "host": {"agent": "python-keystoneclient",
> "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"},
> "target": {"typeURI": "service/security/account/user", "id":
> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
> "service/security", "id":
> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
> "outcome": "success", "id":
> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>
> -Chris
>
> On Thu, Sep 10, 2015 at 11:55 AM, Bryan Bende <[email protected]> wrote:
>
>> Chris,
>>
>> I've been playing around with your template, and as far as I can tell
>> both routes (ExtractText+ReplaceText vs. just ReplaceText) are producing a
>> FlowFile with the same content, the difference is in the attributes...
>>
>> For ExtractText + ReplaceText I see this:
>>
>> Key: 'secaudit.json'
>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>> "identity.ip-10-0-0-60", "payload": {"typeURI":
>> "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI":
>> "service/security/account/user", "host": {"agent": "python-keystoneclient",
>> "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"},
>> "target": {"typeURI": "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>> Key: 'secaudit.json.0'
>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>> "identity.ip-10-0-0-60", "payload": {"typeURI":
>> "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI":
>> "service/security/account/user", "host": {"agent": "python-keystoneclient",
>> "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"},
>> "target": {"typeURI": "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>> Key: 'secaudit.json.1'
>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>> "identity.ip-10-0-0-60", "payload": {"typeURI":
>> "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI":
>> "service/security/account/user", "host": {"agent": "python-keystoneclient",
>> "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"},
>> "target": {"typeURI": "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>> --------------------------------------------------
>> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
>> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
>> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
>> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>> "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>>
>>
>> The content/payload is the part below the --------------------, and the
>> three attributes secaudit.json, secaudit.json.0, and secaudit.json.1 are
>> the resulting attributes from ExtractText.
>> The reason for those three attributes is that it puts the first match
>> into an attribute with the name of the property you specified
>> (secaudit.json), then it puts the entire match into index 0 (in case you
>> had multiple capture groups this would have them all), then it puts each
>> capture group after that starting with 1.
>>
>> For the ReplaceText by itself I see:
>> ....
>> --------------------------------------------------
>> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
>> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
>> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
>> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>> "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>>
>>
>> Is this the same behavior you are seeing?
>>
>>
>> -Bryan
>>
>>
>> On Thu, Sep 10, 2015 at 11:22 AM, Matt Gilman <[email protected]>
>> wrote:
>>
>>> Chris,
>>>
>>> Since you're dealing with JSON data, you may want to consider using
>>> EvaluateJsonPath. It supports specifying XPath-like expressions to extract
>>> values and store them in FlowFile attributes or content. If you're
>>> extracting into attributes, you can evaluate multiple paths. However, if
>>> you're extracting into FlowFile content, you can only specify a single path.
>>>
>>> I'll take a look at your template to see what's going on.
>>>
>>> Matt
>>>
>>> On Thu, Sep 10, 2015 at 11:00 AM, Christopher Wilson <
>>> [email protected]> wrote:
>>>
>>>> I've run into an issue with ReplaceText on another thread but thought
>>>> I'd move this over to its own.
>>>>
>>>> What I have is a syslog entry from OpenStack that contains CADF (Cloud
>>>> Audit Data Federation) JSON as the payload. In the context of OpenStack
>>>> these are login/security events that we'd like to see outside of a normal
>>>> syslog stream and passed directly over to the security team. I'd started
>>>> down the path of ExtractText and pulling out the associated JSON into an
>>>> attribute, but found that when I wired in a ReplaceText and tried to
>>>> replace the content with the attribute, 3 copies of the JSON data were
>>>> written to the file content.
>>>>
>>>> What I've since learned is I can just replace the text in place without
>>>> yanking it into an attribute. However, I can see cases where I might want
>>>> to replace/append text using one or more attributes. Wanted to see if
>>>> others have handled this differently and if there is an enhancement
>>>> request in the offing?
>>>>
>>>> I put the template I was working from, with a line of the syslog data,
>>>> up on GitHub in case anyone wants to see this behavior in action. You just
>>>> have to play with turning processors on/off when viewing the full bulletin
>>>> board.
>>>>
>>>> https://github.com/cj-wilson/NiFi-Templates
>>>>
>>>> Thanks in advance.
>>>>
>>>> -Chris
>>>>
>>>
>>>
>>
>
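The attribute naming Bryan describes for ExtractText (property name, then index 0 for the entire match, then one index per capture group starting at 1) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not NiFi's implementation: the function `extract_attributes`, its `include_group_zero` flag (standing in for the option discussed in NIFI-808), the sample syslog line, and the pattern are all hypothetical.

```python
import re

def extract_attributes(prop_name, pattern, content, include_group_zero=True):
    """Mimic the naming scheme described in the thread (hypothetical helper).

    Assumes the pattern has at least one capture group.
    """
    match = re.search(pattern, content, re.DOTALL)
    if match is None:
        return {}
    # The property name itself holds the first capture group.
    attrs = {prop_name: match.group(1)}
    # Index 0 is the entire match; indexes 1..n are the capture groups.
    first_index = 0 if include_group_zero else 1
    for i in range(first_index, match.re.groups + 1):
        attrs[f"{prop_name}.{i}"] = match.group(i)
    return attrs

# Made-up syslog line with a JSON payload, for illustration only.
line = 'Aug 18 23:29:17 host keystone: {"event_type": "identity.authenticate"}'
attrs = extract_attributes("secaudit.json", r"(\{.*\})", line)
attrs_no_zero = extract_attributes(
    "secaudit.json", r"(\{.*\})", line, include_group_zero=False
)

print(sorted(attrs))
print(sorted(attrs_no_zero))
```

With a single capture group that spans the whole match, all three attributes hold the same JSON, which is consistent with what Bryan saw; with group 0 turned off, only secaudit.json and secaudit.json.1 remain.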
