That's awesome, thank you very much!

-Chris
On Thu, Sep 10, 2015 at 12:44 PM, Bryan Bende <[email protected]> wrote:
> Chris,
>
> I was stumped on this for a few minutes, but then realized I was only
> trying your template against the latest 0.3.0 code that has not been
> released.
> Sure enough, switching to the 0.2.1 release, I now see your issue where
> the content of the FlowFile is getting the matched value twice.
>
> The good news is this was identified and fixed for the upcoming release:
> https://issues.apache.org/jira/browse/NIFI-911
>
> It looks like in the meantime you could change the ReplaceText regular
> expression to (?s:^.*$) for the ReplaceText coming after ExtractText.
>
> Another ticket in 0.3.0 that may be relevant for you is this one:
> https://issues.apache.org/jira/browse/NIFI-808
>
> It allows you to turn off capturing group 0 since in a lot of cases this
> isn't used and could be large, so you would only end up with secaudit.json
> and secaudit.json.1
>
> -Bryan
>
>
> On Thu, Sep 10, 2015 at 12:16 PM, Christopher Wilson <[email protected]>
> wrote:
>
>> The behavior I see is for the ExtractText -> ReplaceText path where the
>> attributes, secaudit.json, secaudit.json.0, and secaudit.json.1 are
>> concatenated into the payload (below).
>>
>> What I expected was that the attribute, secaudit.json, would have
>> replaced the payload. I've tried .0 and .1 as the replacement attribute
>> and I still see the same behavior.
>>
>> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
>> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
>> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
>> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>> "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}{"priority": "INFO", "event_type":
>> "identity.authenticate", "timestamp": "2015-08-18 23:29:17.358460",
>> "publisher_id": "identity.ip-10-0-0-60", "payload": {"typeURI":
>> "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI":
>> "service/security/account/user", "host": {"agent": "python-keystoneclient",
>> "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"},
>> "target": {"typeURI": "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>>
>> -Chris
>>
>> On Thu, Sep 10, 2015 at 11:55 AM, Bryan Bende <[email protected]> wrote:
>>
>>> Chris,
>>>
>>> I've been playing around with your template, and as far as I can tell
>>> both routes (ExtractText+ReplaceText vs. just ReplaceText) are producing a
>>> FlowFile with the same content, the difference is in the attributes...
>>>
>>> For ExtractText + ReplaceText I see this:
>>>
>>> Key: 'secaudit.json'
>>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>>> "identity.ip-10-0-0-60", "payload": {"typeURI":
>>> "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator":
>>> {"typeURI": "service/security/account/user", "host": {"agent":
>>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>>> "service/security/account/user", "id":
>>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>>> "service/security", "id":
>>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>>> "outcome": "success", "id":
>>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>>> Key: 'secaudit.json.0'
>>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>>> "identity.ip-10-0-0-60", "payload": {"typeURI":
>>> "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator":
>>> {"typeURI": "service/security/account/user", "host": {"agent":
>>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>>> "service/security/account/user", "id":
>>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>>> "service/security", "id":
>>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>>> "outcome": "success", "id":
>>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>>> Key: 'secaudit.json.1'
>>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>>> "identity.ip-10-0-0-60", "payload": {"typeURI":
>>> "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator":
>>> {"typeURI": "service/security/account/user", "host": {"agent":
>>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>>> "service/security/account/user", "id":
>>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>>> "service/security", "id":
>>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>>> "outcome": "success", "id":
>>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>>> --------------------------------------------------
>>> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
>>> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
>>> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
>>> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
>>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>>> "service/security/account/user", "id":
>>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>>> "service/security", "id":
>>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>>> "outcome": "success", "id":
>>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>>>
>>>
>>> The content/payload is the part below the --------------------, and the
>>> three attributes secaudit.json, secaudit.json.0, and secaudit.json.1 are
>>> the resulting attributes from ExtractText.
>>> The reason for those three attributes is that it puts the first match
>>> into an attribute with the name of the property you specified
>>> (secaudit.json), then it puts the entire match into index 0 (in case you
>>> had multiple capture groups this would have them all) then it puts each
>>> capture group after that starting with 1.
>>>
>>> For the ReplaceText by itself I see:
>>> ....
>>> --------------------------------------------------
>>> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
>>> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
>>> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
>>> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
>>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>>> "service/security/account/user", "id":
>>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>>> "service/security", "id":
>>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>>> "outcome": "success", "id":
>>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>>>
>>>
>>> Is this the same behavior you are seeing?
>>>
>>>
>>> -Bryan
>>>
>>>
>>> On Thu, Sep 10, 2015 at 11:22 AM, Matt Gilman <[email protected]>
>>> wrote:
>>>
>>>> Chris,
>>>>
>>>> Since you're dealing with JSON data, you may want to consider using
>>>> EvaluateJsonPath. It supports specifying XPath-like expressions to extract
>>>> values and store into FlowFile attributes or content. If you're extracting
>>>> into attributes, you can evaluate multiple paths. However, if you're
>>>> extracting into FlowFile content you can only specify a single path.
>>>>
>>>> I'll take a look at your template to see what's going on.
>>>>
>>>> Matt
>>>>
>>>> On Thu, Sep 10, 2015 at 11:00 AM, Christopher Wilson <
>>>> [email protected]> wrote:
>>>>
>>>>> I've run into an issue with ReplaceText on another thread but thought
>>>>> I'd move this over to its own.
>>>>>
>>>>> What I have is a syslog entry from OpenStack that contains CADF (Cloud
>>>>> Auditing Data Federation) JSON as the payload. In the context of OpenStack
>>>>> these are login/security events that we'd like to see outside of a normal
>>>>> syslog stream and passed directly over to the security team. I'd started
>>>>> down the path of ExtractText and pulling out the associated JSON into an
>>>>> attribute but found when I wired in a ReplaceText and tried to replace the
>>>>> content with the attribute 3 copies of the JSON data were written to the
>>>>> file content.
>>>>>
>>>>> What I've since learned is I can just replace the text in place
>>>>> without yanking into an attribute. However, I can see cases where I might
>>>>> want to replace/append text using one or more attributes. Wanted to see
>>>>> if others have handled this differently and if there is an enhancement
>>>>> request in the offing?
>>>>>
>>>>> I put the template I was working from, with a line of the syslog data,
>>>>> up on GitHub in case anyone wants to see this behavior in action. You
>>>>> just have to play with turning processors on/off when viewing the full
>>>>> bulletin board.
>>>>>
>>>>> https://github.com/cj-wilson/NiFi-Templates
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> -Chris
>>>>>
>>>>
>>>>
>>>
>>
>
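
For anyone finding this thread later: the attribute layout Bryan describes for ExtractText, and the difference the (?s:^.*$) workaround makes, can be tried outside NiFi with plain java.util.regex. The sketch below is not NiFi code. The class and method names are hypothetical, and the patterns (\{.*\}) and (.*) are assumed stand-ins, since the thread never states which expressions the template used. Treating the property-name attribute as a copy of the first capture group is one reading of "first match". The thread also does not confirm that the empty trailing match shown here is the cause behind NIFI-911; it is only one way a replace-all with an unanchored pattern can write its replacement twice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of this is NiFi source code.
final class ExtractReplaceSketch {

    // Builds attributes in the layout described in the thread for ExtractText:
    // "<name>.0" is the entire match, "<name>.1", "<name>.2", ... are the
    // capture groups, and "<name>" repeats the first capture group
    // (an assumed reading of "first match").
    static Map<String, String> extract(String name, String regex, String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = Pattern.compile(regex).matcher(content);
        if (matcher.find()) {
            if (matcher.groupCount() >= 1) {
                attributes.put(name, matcher.group(1));
            }
            for (int i = 0; i <= matcher.groupCount(); i++) {
                attributes.put(name + "." + i, matcher.group(i));
            }
        }
        return attributes;
    }

    // Replaces every match of regex in content with a literal replacement.
    static String replaceAll(String regex, String content, String replacement) {
        return Pattern.compile(regex)
                .matcher(content)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String json = "{\"event_type\": \"identity.authenticate\"}";
        String syslogLine = "Aug 18 23:29:17 host keystone: " + json;

        // Three attributes: secaudit.json, secaudit.json.0, secaudit.json.1
        System.out.println(extract("secaudit.json", "(\\{.*\\})", syslogLine).keySet());

        // Unanchored pattern: matches the whole text, then an empty string at
        // the end, so the replacement is written twice.
        System.out.println(replaceAll("(.*)", json, json));

        // Anchored workaround from the thread: exactly one match, one copy.
        System.out.println(replaceAll("(?s:^.*$)", json, json));
    }
}
```

With capturing group 0 turned off (the NIFI-808 option mentioned above), the equivalent of the "<name>.0" entry would simply be skipped, leaving secaudit.json and secaudit.json.1.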
