The JIRA is created [1]. While I can't promise a time frame for when I can get around to it, it is on my radar, and I'd be happy to contribute it back to the NiFi project. =)
[1] https://issues.apache.org/jira/browse/NIFI-3503

On Fri, Feb 17, 2017 at 7:26 PM, Andy LoPresto <[email protected]> wrote:

> Wes,
>
> Do you mind raising a Jira [1] and providing a PR with your fix once you
> have it translated? I know people would appreciate it. Thanks.
>
> [1] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>
> Andy LoPresto
> [email protected]
> *[email protected] <[email protected]>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
> On Feb 17, 2017, at 9:15 AM, Joe Witt <[email protected]> wrote:
>
> Wes
>
> Absolutely yes. Would be a very welcome contribution and is such a common
> case.
>
> Thanks
> Joe
>
> On Feb 17, 2017 11:27 AM, "Wes Lawrence" <[email protected]> wrote:
>
>> This might be more of a question for the dev mailing list, but does it
>> make sense to have a 'SplitCSV' processor?
>>
>> A situation we encounter a lot at Interset are CSV files whose records
>> extend across multiple lines, similar to Prabhu's data.
>>
>> We currently have code written for Flume for isolating multiline CSV
>> records from a file, but I've been planning on migrating that to NiFi, if
>> it would be useful.
>>
>> --Wes
>>
>> On Fri, Feb 17, 2017 at 1:19 AM, Andy LoPresto <[email protected]> wrote:
>>
>>> This isn’t working because of known issue NIFI-3255. Oleg has submitted
>>> a PR with a patch and Koji has been reviewing. There are some outstanding
>>> questions about provenance chain decisions with original vs. split, but the
>>> code fixes the exception which was raised and I was able to make a working
>>> flow once I applied the patch.
>>>
>>> All of this is updated on the StackOverflow question as well.
>>>
>>> Andy LoPresto
>>> [email protected]
>>> *[email protected] <[email protected]>*
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>>>
>>> On Feb 15, 2017, at 2:19 AM, prabhu Mahendran <[email protected]> wrote:
>>>
>>> Andy,
>>>
>>> I have used the following properties in the ReplaceText processor.
>>>
>>> Search Value: "(.*?)(\n)(.*?)"
>>>
>>> Replacement Value: "$1\\n$3"
>>>
>>> Character Set: UTF-8
>>>
>>> Maximum Buffer Size: 1MB
>>>
>>> Replacement Strategy: Regex Replace
>>>
>>> Evaluation Mode: Entire Text
>>>
>>> The result of this processor is the same as the input. It did not make
>>> any change.
>>>
>>> Thanks,
>>> prabhu
>>>
>>> On Wed, Feb 15, 2017 at 12:35 PM, Andy LoPresto <[email protected]> wrote:
>>>
>>>> Prabhu,
>>>>
>>>> I answered this on Stack Overflow [1] but I think you could do it with
>>>> ReplaceText before the SplitText using a regex like
>>>>
>>>> "(.*?)(\n)(.*?)" replaced with "$1\\n$3"
>>>>
>>>> [1] http://stackoverflow.com/a/42242665/70465
>>>>
>>>> Andy LoPresto
>>>> [email protected]
>>>> *[email protected] <[email protected]>*
>>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>>>>
>>>> On Feb 14, 2017, at 10:52 PM, Lee Laim <[email protected]> wrote:
>>>>
>>>> Prabhu,
>>>>
>>>> You need to remove the new lines from within the last field. I'd
>>>> recommend using awk in an ExecuteStreamCommand processor first, then
>>>> splitting the text. Alternatively, you could write a custom processor to
>>>> specifically handle the incoming data.
>>>>
>>>> Lee
>>>>
>>>> On Feb 14, 2017, at 11:01 PM, prabhu Mahendran <[email protected]> wrote:
>>>>
>>>> I have a CSV file which contains the following lines.
>>>>
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>> -- Jiuaslkm asdasdasd"
>>>>
>>>> I used the processor structure below: GetFile-->SplitText
>>>>
>>>> In SplitText I have set both the header line count and the line split
>>>> count to 1.
>>>>
>>>> So I expected it to produce a single split, as below:
>>>>
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>> -- Jiuaslkm asdasdasd:"
>>>>
>>>> But it actually split the CSV into two splits, as below:
>>>>
>>>> *First Split:*
>>>>
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>>
>>>> *Second Split:*
>>>>
>>>> No,NAme,ID,Description
>>>> -- Jiuaslkm asdasdasd"
>>>>
>>>> So the data is not being handled correctly.
>>>>
>>>> *Goal: I need to handle those data lines as a single line.*
>>>>
>>>> Can anyone help me resolve this?
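
For anyone reading this thread later: the two splits in the original question seem to come from splitting on physical lines, so the newline inside the quoted Description field ends the record early. Below is a minimal Python sketch of the quote-aware grouping that a 'SplitCSV'-style step would need. It is not the Flume code mentioned above and not NiFi code; `split_csv_records` is a hypothetical name, and the sketch assumes double-quote quoting with embedded quotes escaped by doubling (no backslash escapes) and a single header line.

```python
def split_csv_records(text, header_lines=1):
    """Yield one string per logical CSV record, each prefixed with the header.

    Assumption: fields are quoted with double quotes and an embedded quote
    is written as two double quotes, so only an odd number of quote
    characters on a physical line can open or close a quoted field.
    """
    lines = text.splitlines(keepends=True)
    header = "".join(lines[:header_lines])
    buffer = []
    in_quotes = False
    for line in lines[header_lines:]:
        buffer.append(line)
        # An odd quote count flips the "inside a quoted field" state.
        if line.count('"') % 2 == 1:
            in_quotes = not in_quotes
        # Emit the record only once every quoted field has been closed.
        if not in_quotes:
            yield header + "".join(buffer)
            buffer = []
    if buffer:
        # Unterminated quote at end of input: emit what is left as-is.
        yield header + "".join(buffer)


sample = (
    'No,NAme,ID,Description\n'
    '1,Stack,232,"ABCDEFGHIJKLMNO\n'
    '-- Jiuaslkm asdasdasd"\n'
)
splits = list(split_csv_records(sample))
print(len(splits))  # prints 1: the multiline record stays in one split
```

The sketch keeps the embedded newline inside the record. If the goal is a single physical line per record, the newline within each emitted record could be replaced afterwards, for example with a literal `\n` marker.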
