Wes,

Absolutely yes. It would be a very welcome contribution, and this is such a common case.

Thanks
Joe

On Feb 17, 2017 11:27 AM, "Wes Lawrence" <[email protected]> wrote:

> This might be more of a question for the dev mailing list, but does it
> make sense to have a 'SplitCSV' processor?
>
> A situation we encounter a lot at Interset is CSV files whose records
> extend across multiple lines, similar to Prabhu's data.
>
> We currently have code written for Flume for isolating multiline CSV
> records from a file, but I've been planning on migrating that to NiFi, if
> it would be useful.
>
> --Wes
>
> On Fri, Feb 17, 2017 at 1:19 AM, Andy LoPresto <[email protected]>
> wrote:
>
>> This isn't working because of known issue NIFI-3255. Oleg has submitted a
>> PR with a patch and Koji has been reviewing. There are some outstanding
>> questions about provenance chain decisions with original vs. split, but the
>> code fixes the exception which was raised and I was able to make a working
>> flow once I applied the patch.
>>
>> All of this is updated on the StackOverflow question as well.
>>
>> Andy LoPresto
>> [email protected]
>> *[email protected] <[email protected]>*
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>>
>> On Feb 15, 2017, at 2:19 AM, prabhu Mahendran <[email protected]>
>> wrote:
>>
>> Andy,
>>
>> I have used the following properties in the ReplaceText processor.
>>
>> Search Value: "(.*?)(\n)(.*?)"
>>
>> Replacement Value: "$1\\n$3"
>>
>> Character Set: UTF-8
>>
>> Maximum Buffer Size: 1 MB
>>
>> Replacement Strategy: Regex Replace
>>
>> Evaluation Mode: Entire Text
>>
>>
>> The result of this processor is the same as the input. It couldn't
>> perform any change.
>>
>> Thanks,
>> prabhu
>>
>> On Wed, Feb 15, 2017 at 12:35 PM, Andy LoPresto <[email protected]>
>> wrote:
>>
>>> Prabhu,
>>>
>>> I answered this on Stack Overflow [1] but I think you could do it with
>>> ReplaceText before the SplitText using a regex like
>>>
>>> "(.*?)(\n)(.*?)" replaced with "$1\\n$3"
>>>
>>> [1] http://stackoverflow.com/a/42242665/70465
>>>
>>> Andy LoPresto
>>> [email protected]
>>> *[email protected] <[email protected]>*
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>>>
>>> On Feb 14, 2017, at 10:52 PM, Lee Laim <[email protected]> wrote:
>>>
>>> Prabhu,
>>>
>>> You need to remove the new lines from within the last field. I'd
>>> recommend using awk in an execute stream command processor first, then
>>> splitting the text. Alternatively, you could write a custom processor to
>>> specifically handle the incoming data.
>>>
>>> Lee
>>>
>>> On Feb 14, 2017, at 11:01 PM, prabhu Mahendran <[email protected]>
>>> wrote:
>>>
>>> I have a CSV file which contains the following lines.
>>>
>>> No,NAme,ID,Description
>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>> -- Jiuaslkm asdasdasd"
>>>
>>> I used the processor structure below: GetFile-->SplitText
>>>
>>> In SplitText I have set the header and line split counts to 1.
>>>
>>> So I expected it to split the rows as below:
>>>
>>> No,NAme,ID,Description
>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>> -- Jiuaslkm asdasdasd:"
>>>
>>> But it actually split the CSV into 2 splits, like below:
>>>
>>> *First Split:*
>>>
>>> No,NAme,ID,Description
>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>
>>> *Second Split:*
>>>
>>> No,NAme,ID,Description
>>> -- Jiuaslkm asdasdasd"
>>>
>>> So I am missing something in the data handling.
>>>
>>> *Goal: Now I need to handle those data lines as a single line.*
>>>
>>> Can anyone help me resolve this?
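
For anyone reading this thread later, here is a rough sketch of the pre-processing step Lee describes: removing the newlines from inside quoted fields so that a line-based splitter such as SplitText sees one record per line. It is written in Python rather than awk, and it is not NiFi code. The function name flatten_multiline_records is made up for this illustration, and replacing each embedded newline with the two characters \n is an assumption based on the "$1\\n$3" replacement discussed above.

```python
import csv
import io


def flatten_multiline_records(text):
    """Rewrite CSV text so that every record sits on one physical line.

    Hypothetical helper, not part of NiFi. Line breaks inside quoted
    fields are replaced with the two characters backslash + 'n'.
    """
    # newline="" hands line breaks to the csv module untouched, so it can
    # tell a record separator from a break inside a quoted field.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [field.replace("\r\n", "\\n").replace("\n", "\\n") for field in row]
        )
    return out.getvalue()


# Prabhu's sample from the original message.
sample = (
    "No,NAme,ID,Description\n"
    '1,Stack,232,"ABCDEFGHIJKLMNO\n'
    '-- Jiuaslkm asdasdasd"\n'
)

flat = flatten_multiline_records(sample)
print(flat)
```

After this step the sample contains a header line and a single data line, so splitting with a line count of 1 should yield one record per split. A script like this could presumably be run from ExecuteStreamCommand ahead of SplitText, in the same spot where Lee suggests awk; a quote-aware parser is used here instead of a regex because a plain "\n" pattern cannot tell record separators from line breaks inside a quoted field.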
