This might be more of a question for the dev mailing list, but does it make sense to have a 'SplitCSV' processor?
A situation we encounter a lot at Interset is CSV files whose records extend across multiple lines, similar to Prabhu's data. We currently have code written for Flume for isolating multiline CSV records from a file, but I've been planning on migrating that to NiFi, if it would be useful.

--Wes

On Fri, Feb 17, 2017 at 1:19 AM, Andy LoPresto <[email protected]> wrote:

> This isn’t working because of known issue NIFI-3255. Oleg has submitted a
> PR with a patch and Koji has been reviewing. There are some outstanding
> questions about provenance chain decisions with original vs. split, but the
> code fixes the exception which was raised and I was able to make a working
> flow once I applied the patch.
>
> All of this is updated on the StackOverflow question as well.
>
> Andy LoPresto
> [email protected]
> *[email protected] <[email protected]>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
> On Feb 15, 2017, at 2:19 AM, prabhu Mahendran <[email protected]>
> wrote:
>
> Andy,
>
> I have used following properties in ReplaceText processor.
>
> Search Value:"(.*?)(\n)(.*?)"
>
> Replacement Value:"$1\\n$3"
>
> Character Set:UTF-8
>
> MaximumBuffer Size:1MB
>
> Replacement Strategy:Regex Replace
>
> Evaluation Mode:Entire Text
>
>
> Result of this processor same as like input.It could n't perform any
> change.
>
> Thanks,
> prabhu
>
> On Wed, Feb 15, 2017 at 12:35 PM, Andy LoPresto <[email protected]>
> wrote:
>
>> Prabhu,
>>
>> I answered this on Stack Overflow [1] but I think you could do it with
>> ReplaceText before the SplitText using a regex like
>>
>> "(.*?)(\n)(.*?)" replaced with "$1\\n$3"
>>
>> [1] http://stackoverflow.com/a/42242665/70465
>>
>> Andy LoPresto
>> [email protected]
>> *[email protected] <[email protected]>*
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>>
>> On Feb 14, 2017, at 10:52 PM, Lee Laim <[email protected]> wrote:
>>
>> Prabhu,
>>
>> You need to remove the new lines from within the last field. I'd
>> recommend using awk in an execute stream command processor first, then
>> splitting the text. Alternatively, you could write a custom processor to
>> specifically handle the incoming data.
>>
>> Lee
>>
>> On Feb 14, 2017, at 11:01 PM, prabhu Mahendran <[email protected]>
>> wrote:
>>
>> I have CSV file which contains following line.
>>
>> No,NAme,ID,Description
>> 1,Stack,232,"ABCDEFGHIJKLMNO
>> -- Jiuaslkm asdasdasd"
>>
>> used below processor structure GetFile-->SplitText
>>
>> In SplitText i have given header and line split count as 1.
>>
>> So i think it could be split row as below..,
>>
>> No,NAme,ID,Description
>> 1,Stack,232,"ABCDEFGHIJKLMNO
>> -- Jiuaslkm asdasdasd:"
>>
>> But it actually split the csv as "2" splits like below.,
>>
>> *First SPlit:*
>>
>> No,NAme,ID,Description
>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>
>> *Second Split:*
>>
>> No,NAme,ID,Description
>> -- Jiuaslkm asdasdasd"
>>
>> So i have faced data handling missed something.
>>
>> *GOal:Now i need to handle those data lines as single line.*
>>
>> Any one help me to resolve this?

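For illustration only, here is a minimal Python sketch of the record isolation that a 'SplitCSV'-style step would need to perform on the data from the original question. It is not the Flume code mentioned in this thread and does not use any NiFi API. The function name `split_csv_records` is hypothetical, and the sketch assumes `\n` line endings, double-quote quoting, and embedded quotes escaped by doubling (`""`).

```python
def split_csv_records(text, quote='"'):
    """Return logical CSV records, keeping quoted line breaks inside one record.

    Hypothetical helper for illustration; assumes quotes inside a field are
    escaped by doubling them ("") and that lines end with "\n".
    """
    records = []
    buffer = []          # physical lines of the record being assembled
    in_quotes = False    # True while a quoted field is still open
    for line in text.split("\n"):
        if not buffer and line == "":
            continue     # skip blank lines between records
        buffer.append(line)
        # An odd number of quote characters toggles the open/closed state;
        # a doubled quote ("") toggles twice and so cancels out.
        if line.count(quote) % 2 == 1:
            in_quotes = not in_quotes
        if not in_quotes:
            records.append("\n".join(buffer))
            buffer = []
    if buffer:
        # Unterminated quoted field: keep the remainder instead of dropping it.
        records.append("\n".join(buffer))
    return records


# Sample data from the original question in this thread.
sample = (
    "No,NAme,ID,Description\n"
    '1,Stack,232,"ABCDEFGHIJKLMNO\n'
    '-- Jiuaslkm asdasdasd"\n'
)

header, *rows = split_csv_records(sample)
# Re-attach the header to each record, mirroring a header line count of 1.
splits = [header + "\n" + row for row in rows]
```

With the sample above, `rows` holds a single record that still contains the embedded line break, so `splits` has one entry instead of the two broken splits described in the question. Counting quote characters per line is a deliberate simplification; a fuller implementation would likely lean on a real CSV parser.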