Wes,

Do you mind raising a Jira [1] and providing a PR with your fix once you have it translated? I know people would appreciate it. Thanks.
[1] https://issues.apache.org/jira/secure/CreateIssue!default.jspa

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Feb 17, 2017, at 9:15 AM, Joe Witt <joe.w...@gmail.com> wrote:
>
> Wes
>
> Absolutely yes. Would be a very welcome contribution and is such a common
> case.
>
> Thanks
> Joe
>
> On Feb 17, 2017 11:27 AM, "Wes Lawrence" <wesleyll...@gmail.com> wrote:
> This might be more of a question for the dev mailing list, but does it make
> sense to have a 'SplitCSV' processor?
>
> A situation we encounter a lot at Interset are CSV files whose records extend
> across multiple lines, similar to Prabhu's data.
>
> We currently have code written for Flume for isolating multiline CSV records
> from a file, but I've been planning on migrating that to NiFi, if it would be
> useful.
>
> --Wes
>
> On Fri, Feb 17, 2017 at 1:19 AM, Andy LoPresto <alopre...@apache.org> wrote:
> This isn’t working because of known issue NIFI-3255. Oleg has submitted a PR
> with a patch and Koji has been reviewing. There are some outstanding
> questions about provenance chain decisions with original vs. split, but the
> code fixes the exception which was raised and I was able to make a working
> flow once I applied the patch.
>
> All of this is updated on the StackOverflow question as well.
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
>> On Feb 15, 2017, at 2:19 AM, prabhu Mahendran <prabhuu161...@gmail.com> wrote:
>>
>> Andy,
>>
>> I have used following properties in ReplaceText processor.
>>
>> Search Value:"(.*?)(\n)(.*?)"
>>
>> Replacement Value:"$1\\n$3"
>>
>> Character Set:UTF-8
>>
>> MaximumBuffer Size:1MB
>>
>> Replacement Strategy:Regex Replace
>>
>> Evaluation Mode:Entire Text
>>
>> Result of this processor same as like input.It could n't perform any change.
>>
>> Thanks,
>> prabhu
>>
>> On Wed, Feb 15, 2017 at 12:35 PM, Andy LoPresto <alopre...@apache.org> wrote:
>> Prabhu,
>>
>> I answered this on Stack Overflow [1] but I think you could do it with
>> ReplaceText before the SplitText using a regex like
>>
>> "(.*?)(\n)(.*?)" replaced with "$1\\n$3"
>>
>> [1] http://stackoverflow.com/a/42242665/70465
>>
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>>
>>> On Feb 14, 2017, at 10:52 PM, Lee Laim <lee.l...@gmail.com> wrote:
>>>
>>> Prabhu,
>>>
>>> You need to remove the new lines from within the last field. I'd recommend
>>> using awk in an execute stream command processor first, then splitting the
>>> text. Alternatively, you could write a custom processor to specifically
>>> handle the incoming data.
>>>
>>> Lee
>>>
>>> On Feb 14, 2017, at 11:01 PM, prabhu Mahendran <prabhuu161...@gmail.com> wrote:
>>>
>>>> I have CSV file which contains following line.
>>>>
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>> -- Jiuaslkm asdasdasd"
>>>> used below processor structure GetFile-->SplitText
>>>>
>>>> In SplitText i have given header and line split count as 1.
>>>>
>>>> So i think it could be split row as below..,
>>>>
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>> -- Jiuaslkm asdasdasd:"
>>>> But it actually split the csv as "2" splits like below.,
>>>>
>>>> First SPlit:
>>>>
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>> Second Split:
>>>>
>>>> No,NAme,ID,Description
>>>> -- Jiuaslkm asdasdasd"
>>>> So i have faced data handling missed something.
>>>>
>>>> GOal:Now i need to handle those data lines as single line.
>>>>
>>>> Any one help me to resolve this?
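
For anyone following along, the split behaviour described in the quoted thread can be reproduced outside NiFi. The snippet below is only a rough Python sketch using the standard `csv` module; it is not NiFi code, not the Flume code Wes mentions, and not how a 'SplitCSV' processor would necessarily work. The function names `split_csv_records` and `to_single_line` are hypothetical, and the sample is Prabhu's data as posted. It contrasts a line-based split with a quote-aware split, then escapes the embedded newline so the record fits on one line.

```python
import csv
import io

# Sample data from the thread: the last field contains an embedded newline.
SAMPLE = (
    'No,NAme,ID,Description\n'
    '1,Stack,232,"ABCDEFGHIJKLMNO\n'
    '-- Jiuaslkm asdasdasd"\n'
)


def split_csv_records(text):
    """Return (header, records), treating quoted newlines as field content."""
    # newline="" leaves line endings untouched so the csv reader sees them as-is.
    rows = list(csv.reader(io.StringIO(text, newline="")))
    return rows[0], rows[1:]


def to_single_line(record):
    """Hypothetical helper: replace embedded newlines with a literal backslash-n."""
    out = io.StringIO()
    csv.writer(out).writerow([field.replace("\n", "\\n") for field in record])
    # Drop the row terminator the writer appends.
    return out.getvalue().rstrip("\r\n")


header, records = split_csv_records(SAMPLE)

# A line-based split (what SplitText effectively does) sees three lines.
print(len(SAMPLE.splitlines()))

# A quote-aware split sees the header plus one data record.
print(len(records))

# Each record can then be emitted on one line, with the header repeated.
for record in records:
    print(",".join(header))
    print(to_single_line(record))
```

Whether the embedded newline should be escaped, removed, or preserved inside quotes depends on what the downstream processors expect, so treat the escaping step as one option rather than a recommendation.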