Joe and Brandon,

Thanks for your input here. I agree that changing the behavior of an existing processor (one that is used in people’s flows) is a breaking change and probably requires a major release, which is why I didn’t do that in the PRs. As written today, they are fully backward-compatible. My concern is that users have had issues because they deploy HashAttribute expecting it to hash individual attributes. The introduction of CryptographicHashAttribute provides this functionality, but the discoverability (“do what the name says”) issue feels to me like a new addition to the “death by 1000 paper cuts” list that any complex project like NiFi has to endure.
The CryptographicHashContent (CHC) processor is fully backward-compatible in its (expected) functionality, but because its property descriptors are named differently, it is not an “in-place” replacement. Instead, the legacy HashContent processor is marked as deprecated, and CHC should be used going forward. Given your well-described use cases for HashAttribute, I think I may be able to provide that functionality in CryptographicHashAttribute (CHA) as well. I would expect to add a dropdown property descriptor for “attribute enumeration style” offering three options:

- “individual”: each hash is generated over the value of a single attribute.
- “list”: each hash is generated over the values of an ordered, delimited list of literal attribute names.
- “regex”: each hash is generated over the values of all attributes whose names match the provided regex, ordered by attribute name.

The dynamic properties would then describe the output, as they do in the existing PR. A custom delimiter property may be needed too, but for now an empty string (‘’) could be used to join the values. I’ll write up a Jira for this, and hopefully you can both let me know if this meets your requirements.

Example:

*Incoming Flowfile*
attributes: [username: “alopresto”, role: “security”, email: “[email protected]”, git_account: “alopresto”]

*CHA Properties (Individual)*
attribute_enumeration_style: “individual”
(dynamic) username_sha256: “username”
(dynamic) git_account_sha256: “git_account”

*Behavior (Individual)*
username_sha256 = git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23

*Resulting Flowfile (Individual)*
attributes: [username: “alopresto”, role: “security”, email: “[email protected]”, git_account: “alopresto”, username_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”, git_account_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]

*CHA Properties (List)*
attribute_enumeration_style: “list”
(dynamic) username_and_email_sha256: “username, email”
(dynamic) git_account_sha256: “git_account”

*Behavior (List)*
username_and_email_sha256 = $(echo -n "[email protected]" | shasum -a 256) = 22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f
git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23

*Resulting Flowfile (List)*
attributes: [username: “alopresto”, role: “security”, email: “[email protected]”, git_account: “alopresto”, username_and_email_sha256: “22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f”, git_account_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]

*CHA Properties (Regex)*
attribute_enumeration_style: “regex”
(dynamic) all_sha256: “.*”
(dynamic) git_account_sha256: “git_account”

*Behavior (Regex)*
all_sha256 = sort(attributes_that_match_regex) = [email, git_account, role, username] = $(echo -n "[email protected]" | shasum -a 256) = b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced
git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23

*Resulting Flowfile (Regex)*
attributes: [username: “alopresto”, role: “security”, email: “[email protected]”, git_account: “alopresto”, all_sha256: “b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced”, git_account_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]

Mike,

I don’t think it makes sense to remove this functionality and relocate it to the Elasticsearch bundle. Adding the capability to calculate a unique identifier over multiple inputs may be valuable for Elasticsearch specifically, but the functionality described here is independent of that use case. NiFi’s processor design philosophy is similar to that of *nix tools (do one thing, and do it well), so it makes more sense to chain the individual processors that are needed.
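The three enumeration styles in the example above can be sketched in Python. This is a hypothetical illustration, not the CHA code in the PRs. The names `compute_hashes` and `select_attribute_names` are made up. The sketch also makes three assumptions of its own: “list” values are split on commas, “regex” requires the whole attribute name to match, and a missing attribute raises `KeyError`. For UTF-8 text, `hashlib.sha256(...).hexdigest()` gives the same digest as `echo -n ... | shasum -a 256`.

```python
import hashlib
import re


def sha256_hex(value: str) -> str:
    """Hex SHA-256 of the UTF-8 bytes of value (same digest as `echo -n value | shasum -a 256`)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def select_attribute_names(attributes: dict, style: str, selector: str) -> list:
    """Resolve one dynamic property value to an ordered list of attribute names (hypothetical helper)."""
    if style == "individual":
        # The property value names exactly one attribute.
        return [selector]
    if style == "list":
        # Assumption: comma-delimited literal names, kept in the order given.
        return [name.strip() for name in selector.split(",") if name.strip()]
    if style == "regex":
        # Assumption: the whole attribute name must match; matches are sorted by name.
        pattern = re.compile(selector)
        return sorted(name for name in attributes if pattern.fullmatch(name))
    raise ValueError(f"unknown attribute enumeration style: {style!r}")


def compute_hashes(attributes: dict, style: str, dynamic_props: dict, delimiter: str = "") -> dict:
    """Return {output attribute name: hex digest} for one flowfile's attributes.

    dynamic_props maps each output attribute name to its selector, mirroring the
    (dynamic) properties in the example. Selected values are joined with
    `delimiter` (empty by default, as proposed) and then hashed. A referenced
    attribute that is absent raises KeyError; the proposal does not say how CHA
    should handle that case. If a regex matches nothing, this sketch hashes "".
    """
    results = {}
    for output_name, selector in dynamic_props.items():
        names = select_attribute_names(attributes, style, selector)
        joined = delimiter.join(attributes[name] for name in names)
        results[output_name] = sha256_hex(joined)
    return results


if __name__ == "__main__":
    # Hypothetical attribute values; the email address is a placeholder.
    flowfile_attributes = {
        "username": "alopresto",
        "role": "security",
        "email": "user@example.com",
        "git_account": "alopresto",
    }
    print(compute_hashes(flowfile_attributes, "individual",
                         {"username_sha256": "username", "git_account_sha256": "git_account"}))
    print(compute_hashes(flowfile_attributes, "list",
                         {"username_and_email_sha256": "username, email"}))
    print(compute_hashes(flowfile_attributes, "regex",
                         {"all_sha256": ".*", "git_account_sha256": "git_account"}))
```

Sorting the regex matches keeps the result independent of the order in which attributes are stored. With the empty-string delimiter, however, ("ab", "c") and ("a", "bc") join to the same input and hash identically, which is one argument for a custom delimiter property. In a real flowfile, `.*` would also match core attributes such as `uuid`, which differs per flowfile, so a pattern meant to build a stable key should exclude it.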
Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Sep 5, 2018, at 9:33 AM, Brandon DeVries <[email protected]> wrote:
>
> Mike,
>
> We don't use it with Elasticsearch.
>
> Fundamentally, it feels like the problem is that this change would break
> backwards compatibility, which would require a major version bump. So, in
> lieu of that, the options are probably 1) use a different name or 2) put the
> new functionality in HashContent as something that can be toggled on, but
> leaving the current behavior as the default.
>
> Brandon
>
> On Wed, Sep 5, 2018 at 12:21 PM Mike Thomsen <[email protected]> wrote:
> Brandon,
>
> What processor do you use it for in that capacity? If it's an ElasticSearch
> one we can look into ways to bring this functionality into that bundle so
> Andy can refactor.
>
> Thanks,
>
> Mike
>
> On Wed, Sep 5, 2018 at 12:07 PM Brandon DeVries <[email protected]> wrote:
> Andy,
>
> We use it pretty much how Joe is... to create a unique composite key. It
> seems as though that shouldn't be a difficult functionality to add.
> Possibly, you could flip your current dynamic key/value properties. Make the
> key the name of the attribute you want to create, and the value is the
> attribute / attributes (newline delimited) that you want to include in the
> hash. This does mean you can't use "${algorithm.name}" in the name of the
> created hash attribute, but I don't know if you'd consider that a big loss.
> In any case, I'm sure there are other solutions, this is just a thought.
>
> Brandon
>
> On Wed, Sep 5, 2018 at 10:27 AM Joe Percivall <[email protected]> wrote:
> Hey Andy,
>
> We're currently using the HashAttribute processor. The use-case is that we
> have various events that come in but sometimes those events are just updates
> of previous ones. We store everything in ElasticSearch. So for certain
> events, we'll calculate a hash based on a couple of attributes in order to
> have a composite unique key to upsert as the ES _id. This allows us to easily
> just insert/update events that are the same (as determined by the hashed
> composite key).
>
> As for the configuration of the processors, we're essentially just specifying
> exact attributes as dynamic properties of HashAttribute. Then passing that FF
> to PutElasticSearchHttp with the resulting attribute from HashAttribute as
> the "Identifier Attribute".
>
> Joe
>
> On Mon, Sep 3, 2018 at 9:52 PM Andy LoPresto <[email protected]> wrote:
> I opened PRs for 2980 [1] and 2983 [2] which add more performant, consistent,
> and full-featured processors to calculate cryptographic hashes of flowfile
> content and flowfile attributes. I would like to deprecate and drop support
> for HashAttribute, as it performs a convoluted calculation that was probably
> useful in an old scenario, but doesn’t “hash attributes” like the name
> implies. As it blocks the new implementation from using that name and
> following our naming convention, I am hoping to find anyone still using the
> old implementation and understand their use case. Thanks for your help.
>
> [1] https://github.com/apache/nifi/pull/2980
> [2] https://github.com/apache/nifi/pull/2983
>
> --
> Joe Percivall
> linkedin.com/in/Percivall
> e: [email protected]
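Joe’s upsert use case, combined with the property layout Brandon suggests, could look like the Python sketch below. In that layout, the property name is the attribute to create and the property value is a newline-delimited list of input attribute names. Everything here is hypothetical: the name `composite_keys`, the attribute names `source`, `event_id`, `status`, and `es_id`, and joining the values with a newline before hashing. It is not what HashAttribute computes today.

```python
import hashlib


def composite_keys(attributes: dict, dynamic_props: dict, joiner: str = "\n") -> dict:
    """Hash selected attribute values into composite keys (hypothetical sketch).

    Each property name is the attribute to create; each property value lists
    the input attribute names, one per line, in the order they are hashed.
    """
    keys = {}
    for output_name, spec in dynamic_props.items():
        # One attribute name per non-blank line, surrounding whitespace ignored.
        names = [line.strip() for line in spec.splitlines() if line.strip()]
        # Assumption: the selected values are joined with a newline before hashing.
        joined = joiner.join(attributes[name] for name in names)
        keys[output_name] = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return keys


if __name__ == "__main__":
    props = {"es_id": "source\nevent_id"}
    original = {"source": "sensor-7", "event_id": "42", "status": "open"}
    update = {"source": "sensor-7", "event_id": "42", "status": "closed"}
    # Only source and event_id feed the hash, so the update gets the same es_id.
    print(composite_keys(original, props)["es_id"] == composite_keys(update, props)["es_id"])  # True
```

Because only the listed attributes feed the hash, an update event that changes `status` keeps the same `es_id`. Using that attribute as PutElasticSearchHttp’s “Identifier Attribute” then lets the second write update the existing document, as Joe describes, instead of creating a new one.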
