Re: MergeContent resulting in corrupted JSON

2020-07-03 Thread Jason Iannone
Jason On Tue, Jun 30, 2020 at 10:48 PM Darren Govoni wrote: Run the NiFi JVM in a runtime profiler/analyzer like AppDynamics and see if it detects any memory leaks or dangling uncl

Re: MergeContent resulting in corrupted JSON

2020-07-03 Thread Joe Witt
i jvm in a runtime profiler/analyzer like AppDynamics and see if it detects any memory leaks or dangling unclosed file buffers/IO. Throwing darts, but the problem could be as deep as the Linux kernel or confined inside the JVM for your specific scenario.

Re: MergeContent resulting in corrupted JSON

2020-07-03 Thread Jason Iannone
Throwing darts, but the problem could be as deep as the Linux kernel or confined inside the JVM for your specific scenario.

Re: MergeContent resulting in corrupted JSON

2020-07-03 Thread Joe Witt
Sent: Tuesday, June 30, 2020 10:36:02 PM To: users@nifi.apache.org Subject: Re: MergeContent resulting in corrupted JSON Previous spotting of the issue was a red herring. We removed our custom code and are still facing random "org.codehaus.jackso

Re: MergeContent resulting in corrupted JSON

2020-07-03 Thread Jason Iannone
ifi.apache.org Subject: Re: MergeContent resulting in corrupted JSON Previous spotting of the issue was a red herring. We removed our custom code and are still facing random "org.codehaus.jackson.JsonParseException: Illegal Character" during PutDatabaseRecord due to

Re: MergeContent resulting in corrupted JSON

2020-06-30 Thread Darren Govoni
From: Jason Iannone Sent: Tuesday, June 30, 2020 10:36:02 PM To: users@nifi.apache.org Subject: Re: MergeContent resulting in corrupted JSON Previous spotting of the issue was a red herri

Re: MergeContent resulting in corrupted JSON

2020-06-30 Thread Jason Iannone
Previous spotting of the issue was a red herring. We removed our custom code and are still facing random "org.codehaus.jackson.JsonParseException: Illegal Character" errors during PutDatabaseRecord due to a flowfile containing malformed JSON after MergeContent. The error never occurs immediately and is

Re: MergeContent resulting in corrupted JSON

2020-06-24 Thread Darren Govoni
Subject: Re: MergeContent resulting in corrupted JSON Exactly my thought. We've been combing through the code, but nothing significant has jumped out. What does stand out are the NiFi JIRAs NIFI-6923, NIFI-6924, and NIFI-6846. Since we're on 1.10.0, I've requested an upgrade to 1.11.4. Thanks

Re: MergeContent resulting in corrupted JSON

2020-06-24 Thread Jason Iannone
Exactly my thought. We've been combing through the code, but nothing significant has jumped out. What does stand out are the NiFi JIRAs NIFI-6923, NIFI-6924, and NIFI-6846. Since we're on 1.10.0, I've requested an upgrade to 1.11.4. Thanks, Jason On Tue, Jun 23, 2020 at 9:05 AM Mark Payne

Re: MergeContent resulting in corrupted JSON

2020-06-23 Thread Mark Payne
It should be okay to create a buffer like that, assuming the FlowFile is small. Typically we try to avoid buffering the content of a FlowFile into memory, but if it's a reasonably small FlowFile, that's probably fine. To be honest, if the issue is intermittent and doesn't always happen on the

Re: MergeContent resulting in corrupted JSON

2020-06-22 Thread Jason Iannone
I'm now thinking it's due to how we handled reading the flowfile content into a buffer. Previous: session.read(flowFile, in -> { atomicVessel.set(ByteStreams.toByteArray(in)); }); Current: final byte[] buffer = new byte[(int) flowFile.getSize()]; session.read(flowFile, in ->
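
Whatever the truncated "Current" snippet goes on to do, one well-known pitfall with pre-sizing a buffer from flowFile.getSize() is that a single InputStream.read(byte[]) call may legally return fewer bytes than requested, leaving the tail of the buffer as zero (NUL) bytes, which a JSON parser could then reject as an illegal character. The "Previous" approach (Guava's ByteStreams.toByteArray) reads to end of stream and does not have this problem. The sketch below is JDK-only and is not NiFi code or the code from this thread; the `trickle` helper is hypothetical and simply simulates a stream that delivers data in small chunks.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReadFullyDemo {
    // Hypothetical helper: a stream that returns at most `chunk` bytes per
    // read call, which the InputStream contract explicitly allows.
    static InputStream trickle(byte[] data, int chunk) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, chunk));
            }
        };
    }

    // Pitfall: one read() call may fill only part of the buffer;
    // the unfilled tail silently stays zero-valued.
    static byte[] singleRead(InputStream in, int size) throws IOException {
        byte[] buffer = new byte[size];
        in.read(buffer); // the count of bytes actually read is ignored here
        return buffer;
    }

    // Safer: readFully loops until the buffer is full and throws
    // EOFException if the stream ends early.
    static byte[] readFully(InputStream in, int size) throws IOException {
        byte[] buffer = new byte[size];
        new DataInputStream(in).readFully(buffer);
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "{\"hexBytes\":\"0A1B\"}".getBytes(StandardCharsets.UTF_8);
        byte[] partial = singleRead(trickle(data, 3), data.length);
        byte[] full = readFully(trickle(data, 3), data.length);
        System.out.println("single read intact: " + Arrays.equals(data, partial));
        System.out.println("readFully intact:   " + Arrays.equals(data, full));
    }
}
```

With a well-behaved in-memory stream both methods appear to work, which is why this kind of bug tends to show up only intermittently.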

Re: MergeContent resulting in corrupted JSON

2020-06-22 Thread Mark Payne
Jason, Glad to hear it. This is where the data provenance becomes absolutely invaluable. So now you should be able to trace the lineage of that FlowFile back to the start using data provenance. You can see exactly what it looked like when it was received. If it looks wrong there, the

Re: MergeContent resulting in corrupted JSON

2020-06-22 Thread Jason Iannone
I spoke too soon; it must be the magic of sending an email! We found what appears to be corrupted content and captured the binary, and we're hoping to play it through the code and see what's going on. Thanks, Jason On Mon, Jun 22, 2020 at 4:35 PM Jason Iannone wrote: Hey Mark, We hit the issue

Re: MergeContent resulting in corrupted JSON

2020-06-22 Thread Jason Iannone
Hey Mark, We hit the issue again, and when digging into the lineage we can see the content is fine coming into MergeContent but is corrupt on output of Join. Any other suggestions? Thanks, Jason On Wed, Jun 10, 2020 at 2:26 PM Mark Payne wrote: Jason, Control characters should not cause
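
With the pre-merge and post-merge content captured (for example, downloaded from the provenance events), a small JDK-only helper can show exactly where the output first diverges. This hypothetical `ContentDiff` class assumes you can rebuild the expected merged bytes yourself by concatenating the inputs in order, plus any configured header, demarcator, or footer. An offset that lands in the middle of a record rather than at a record boundary would suggest the problem is not ordering or delimiter configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ContentDiff {
    // Index of the first differing byte, or -1 if the arrays are identical.
    // If one array is a proper prefix of the other, returns the shorter length.
    static int firstDifference(byte[] expected, byte[] actual) {
        return Arrays.mismatch(expected, actual); // Java 9+
    }

    // Hex dump of the bytes around `offset`, with the byte at `offset` bracketed.
    static String hexWindow(byte[] data, int offset, int radius) {
        StringBuilder sb = new StringBuilder();
        int start = Math.max(0, offset - radius);
        int end = Math.min(data.length, offset + radius + 1);
        for (int i = start; i < end; i++) {
            String hex = String.format("%02X", data[i] & 0xFF);
            sb.append(i == offset ? "[" + hex + "]" : hex);
            if (i < end - 1) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] expected = "{\"id\":1}\n{\"id\":2}".getBytes(StandardCharsets.UTF_8);
        byte[] actual = expected.clone();
        actual[12] = 0x00; // simulate a single corrupted byte
        int at = firstDifference(expected, actual);
        System.out.println("first difference at offset " + at + ": " + hexWindow(actual, at, 4));
    }
}
```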

Re: MergeContent resulting in corrupted JSON

2020-06-11 Thread Andy LoPresto
Sorry, TWR = try-with-resources. Definitely a lot of old code that “still works” but is brittle. We should do better about highlighting modern implementations and paying down tech debt, but the project just moves so quickly. Not a perfect rule, but if I see code from one of the core

Re: MergeContent resulting in corrupted JSON

2020-06-11 Thread Jason Iannone
We currently have it encapsulated in code that allows proper isolation and testing, as this is the same methodology we apply in standard development. What I wasn't sure about is whether NiFi is opinionated and actually prefers and/or performs better with callbacks. There's a lot of older NiFi

Re: MergeContent resulting in corrupted JSON

2020-06-11 Thread Andy LoPresto
To give another perspective on the “callback vs. non”, I’d say “heavy” or “messy” operations (like encryption, for example) should be contained in encapsulated code (other classes which provide a service) and then invoked from the callback or TWR. This allows for much more testable business
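
Andy's suggestion can be sketched in plain Java. The class name `HexEncodingService` and its hex transform are hypothetical, chosen only to echo this thread's hex-encoding step; the point is that the logic depends on java.io alone, so it can be unit-tested without a NiFi session. Inside a processor, the same `transform` method could be invoked from the StreamCallback passed to session.write(flowFile, (in, out) -> service.transform(in, out)); that wiring is left out so the sketch compiles and runs without NiFi on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical service class: the "heavy" transformation lives here.
class HexEncodingService {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Stream-to-stream transform: each input byte becomes two uppercase hex chars.
    // The loop honors the actual count returned by each read call.
    void transform(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                int v = chunk[i] & 0xFF;
                out.write(HEX[v >>> 4]);
                out.write(HEX[v & 0x0F]);
            }
        }
    }

    // Test-friendly wrapper; try-with-resources closes both in-memory streams.
    String encode(byte[] input) {
        try (InputStream in = new ByteArrayInputStream(input);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            transform(in, out);
            return out.toString(StandardCharsets.US_ASCII.name());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new HexEncodingService().encode(new byte[] {0x00, 0x1F, (byte) 0xFF}));
    }
}
```

Running main prints 001FFF.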

Re: MergeContent resulting in corrupted JSON

2020-06-11 Thread Mark Payne
Jason, Modify vs. clone vs. create new: You would clone a FlowFile if you want an exact copy of the FlowFile (with the exception that the clone will have a unique UUID, Entry Date, etc.). It's very rare that a Processor will actually do this. Modify vs. create "child" FlowFiles (i.e.,

Re: MergeContent resulting in corrupted JSON

2020-06-11 Thread Jason Iannone
I confirmed what you mentioned as well. I also looked over many custom processor examples and am looking for clarification on a few things that I didn't see explicitly called out in the developer's guide. - Are there guidelines on when one should modify the original flowfile vs when you

Re: MergeContent resulting in corrupted JSON

2020-06-10 Thread Mark Payne
I don’t think flushing should matter, if you’re writing directly to the provided OutputStream. If you wrap it in a BufferedOutputStream or something like that, then of course you’ll want to flush that. Assuming that you are extending AbstractProcessor, it will call session.commit() for you
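
Mark's point about wrapped streams can be shown with the JDK alone. In this sketch a ByteArrayOutputStream stands in for the stream a ProcessSession hands to a write callback (an assumption made so the example runs without NiFi), and a BufferedOutputStream is the wrapper added on top. Flushing the wrapper, rather than closing it, pushes the bytes through while leaving the underlying stream's lifecycle to whoever provided it.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class FlushDemo {
    // `target` stands in for the OutputStream a framework hands to your callback;
    // `wrapper` is a BufferedOutputStream added on top of it.
    static int bytesVisible(byte[] payload, boolean flush) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        OutputStream wrapper = new BufferedOutputStream(target); // default-sized internal buffer
        wrapper.write(payload);
        if (flush) {
            wrapper.flush(); // push the buffered bytes down into `target`
        }
        // Without the flush, a payload smaller than the buffer is still sitting
        // inside `wrapper`, so `target` has received nothing yet.
        return target.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "{\"a\":1}".getBytes(StandardCharsets.UTF_8);
        System.out.println("without flush: " + bytesVisible(payload, false));
        System.out.println("with flush:    " + bytesVisible(payload, true));
    }
}
```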

Re: MergeContent resulting in corrupted JSON

2020-06-10 Thread Jason Iannone
Excellent advice, thank you! When writing via ProcessSession.write(FlowFile, OutputStreamCallback), is it advisable to flush and/or call session.commit()? I noticed we aren't doing either, but we are invoking session.transfer. Thanks, Jason On Wed, Jun 10, 2020 at 2:26 PM Mark Payne wrote: Jason,

Re: MergeContent resulting in corrupted JSON

2020-06-10 Thread Mark Payne
Jason, Control characters should not cause any problem with MergeContent. MergeContent just copies bytes from one stream to another. It’s also worth noting that attributes don’t really come into play here. MergeContent is combining the FlowFile content, so even if it has some weird attributes,
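
Mark's description can be modeled in a few lines of plain Java. This is a simplified sketch, not MergeContent's actual source: the header, demarcator, and footer parameters are assumed to correspond roughly to the optional delimiter settings MergeContent offers for binary concatenation. Because nothing between input and output is parsed, escaped, or rewritten, control characters in record values should not, by themselves, break the merge.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class BinaryConcat {
    // Simplified model: header, then each record's bytes copied verbatim with a
    // demarcator between consecutive records, then footer.
    static byte[] merge(List<byte[]> records, byte[] header, byte[] demarcator, byte[] footer) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header, 0, header.length);
        for (int i = 0; i < records.size(); i++) {
            if (i > 0) {
                out.write(demarcator, 0, demarcator.length);
            }
            byte[] record = records.get(i);
            out.write(record, 0, record.length);
        }
        out.write(footer, 0, footer.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<byte[]> records = Arrays.asList(
                "{\"id\":1}".getBytes(StandardCharsets.UTF_8),
                "{\"id\":2}".getBytes(StandardCharsets.UTF_8));
        byte[] merged = merge(records,
                "[".getBytes(StandardCharsets.UTF_8),
                ",".getBytes(StandardCharsets.UTF_8),
                "]".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(merged, StandardCharsets.UTF_8)); // [{"id":1},{"id":2}]
    }
}
```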

Re: MergeContent resulting in corrupted JSON

2020-06-10 Thread Jason Iannone
Hey Mark, I was thinking this over more: despite no complaints from Jackson's ObjectMapper, is it possible that hidden and/or control characters are present in the JSON values, which would then cause MergeContent to behave this way? I looked over the code and nothing jumped out, but there is
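
One way to rule control characters in or out is to scan the record text before it reaches MergeContent, or the merged output afterwards. Per Mark's reply, such characters would not trip up MergeContent itself, but they would make the JSON invalid for a downstream parser. This JDK-only sketch uses a hypothetical `findControlChars` helper and assumes the content has already been decoded to a String (e.g. as UTF-8).

```java
import java.util.ArrayList;
import java.util.List;

class ControlCharScanner {
    // Returns the offsets of raw control characters (U+0000..U+001F) other than
    // tab, LF, and CR. RFC 8259 requires control characters inside strings to be
    // escaped, and only tab/LF/CR (plus space) are allowed as whitespace between
    // tokens, so any offset reported here means the text is not valid JSON.
    // Limitation: a raw tab/LF/CR *inside* a string value is also invalid JSON,
    // but this simple check does not report it.
    static List<Integer> findControlChars(String json) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        String record = "{\"a\":\"x" + (char) 0x01 + "y\"}\n";
        System.out.println(findControlChars(record)); // [7]
    }
}
```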

Re: MergeContent resulting in corrupted JSON

2020-06-09 Thread Jason Iannone
Andy, it was a retyping issue and it should be correct JSON. Mark, the flow is ConsumeKafka_2.0 -> Custom Processor -> MergeContent -> PutDatabaseRecord. ConsumeKafka is consuming encrypted payloads, and the custom processor is decrypting the payload, hex-encoding the decrypted bytes and putting
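
The custom processor's hex-encoding step can be sketched with the JDK alone. Decryption is omitted, and the record layout (a numeric `id` alongside `hexBytes`) is a hypothetical stand-in because the thread does not show the full structure; production code would more likely build the record with a JSON library such as the Jackson ObjectMapper mentioned in this thread. Hex output contains only 0-9 and A-F, so the value needs no escaping, provided it is quoted on both sides (the unpaired-quote issue Andy spotted).

```java
class HexJsonRecord {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Uppercase hex: two characters per input byte.
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;
            out[2 * i] = HEX[v >>> 4];
            out[2 * i + 1] = HEX[v & 0x0F];
        }
        return new String(out);
    }

    // Hypothetical record layout: numeric "id" plus quoted "hexBytes".
    static String toJsonRecord(long id, byte[] decrypted) {
        return "{\"id\":" + id + ",\"hexBytes\":\"" + toHex(decrypted) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(toJsonRecord(1, new byte[] {0x0A, (byte) 0xFF}));
    }
}
```

Running main prints {"id":1,"hexBytes":"0AFF"}.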

Re: MergeContent resulting in corrupted JSON

2020-06-09 Thread Andy LoPresto
It may just be a copy/paste or retyping issue, but in the example you provided, I see unpaired double quotes (the hexBytes values have trailing quotes but not leading ones), which could be causing issues in parsing… Andy LoPresto alopre...@apache.org alopresto.apa...@gmail.com He/Him PGP

Re: MergeContent resulting in corrupted JSON

2020-06-09 Thread Mark Payne
Hey Jason, Thanks for reaching out. That is definitely odd and not something that I’ve seen or heard about before. Are you certain that the data is not being corrupted upstream of the processor? I ask because the code for the processor that handles writing out the content is pretty straight

MergeContent resulting in corrupted JSON

2020-06-09 Thread Jason Iannone
Hi all, In NiFi 1.10.0 we're seeing unexpected behavior with MergeContent. The processor is being fed many flowfiles with individual JSON records. The records have various field types, including a hex-encoded byte[]. We are not trying to merge JSON records themselves but rather consolidate