Wes,

Do you mind raising a Jira [1] and providing a PR with your fix once you have it translated? I know people would appreciate it. Thanks.

[1] https://issues.apache.org/jira/secure/CreateIssue!default.jspa

Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Feb 17, 2017, at 9:15 AM, Joe Witt <[email protected]> wrote:
>
> Wes
>
> Absolutely yes. Would be a very welcome contribution and is such a common
> case.
>
> Thanks
> Joe
>
> On Feb 17, 2017 11:27 AM, "Wes Lawrence" <[email protected]> wrote:
> This might be more of a question for the dev mailing list, but does it make
> sense to have a 'SplitCSV' processor?
>
> A situation we encounter a lot at Interset are CSV files whose records extend
> across multiple lines, similar to Prabhu's data.
>
> We currently have code written for Flume for isolating multiline CSV records
> from a file, but I've been planning on migrating that to NiFi, if it would be
> useful.
>
> --Wes
>
> On Fri, Feb 17, 2017 at 1:19 AM, Andy LoPresto <[email protected]> wrote:
> This isn’t working because of known issue NIFI-3255. Oleg has submitted a PR
> with a patch and Koji has been reviewing. There are some outstanding
> questions about provenance chain decisions with original vs. split, but the
> code fixes the exception which was raised and I was able to make a working
> flow once I applied the patch.
>
> All of this is updated on the StackOverflow question as well.
>
> Andy LoPresto
> [email protected]
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
>> On Feb 15, 2017, at 2:19 AM, prabhu Mahendran <[email protected]> wrote:
>>
>> Andy,
>>
>> I have used following properties in ReplaceText processor.
>>
>> Search Value: "(.*?)(\n)(.*?)"
>>
>> Replacement Value: "$1\\n$3"
>>
>> Character Set: UTF-8
>>
>> Maximum Buffer Size: 1 MB
>>
>> Replacement Strategy: Regex Replace
>>
>> Evaluation Mode: Entire Text
>>
>> The result of this processor is the same as the input. It did not make
>> any change.
>>
>> Thanks,
>> prabhu
>>
>> On Wed, Feb 15, 2017 at 12:35 PM, Andy LoPresto <[email protected]> wrote:
>> Prabhu,
>>
>> I answered this on Stack Overflow [1] but I think you could do it with
>> ReplaceText before the SplitText using a regex like
>>
>> "(.*?)(\n)(.*?)" replaced with "$1\\n$3"
>>
>> [1] http://stackoverflow.com/a/42242665/70465
>>
>> Andy LoPresto
>> [email protected]
>> [email protected]
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>>
>>> On Feb 14, 2017, at 10:52 PM, Lee Laim <[email protected]> wrote:
>>>
>>> Prabhu,
>>>
>>> You need to remove the new lines from within the last field. I'd recommend
>>> using awk in an ExecuteStreamCommand processor first, then splitting the
>>> text. Alternatively, you could write a custom processor to specifically
>>> handle the incoming data.
>>>
>>> Lee
>>>
>>> On Feb 14, 2017, at 11:01 PM, prabhu Mahendran <[email protected]> wrote:
>>>
>>>> I have a CSV file which contains the following lines.
>>>>
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>> -- Jiuaslkm asdasdasd"
>>>> I used the processor structure below: GetFile-->SplitText
>>>>
>>>> In SplitText I have set the header line count and the line split count to 1.
>>>>
>>>> So I expected the row to be split as below:
>>>>
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>> -- Jiuaslkm asdasdasd:"
>>>> But it actually split the CSV into 2 splits, like below:
>>>>
>>>> First Split:
>>>>
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>> Second Split:
>>>>
>>>> No,NAme,ID,Description
>>>> -- Jiuaslkm asdasdasd"
>>>> So the data is not being handled correctly.
>>>>
>>>> Goal: I need to handle those data lines as a single line.
>>>>
>>>> Can anyone help me resolve this?
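
For illustration, the quote-aware splitting discussed in this thread (keeping a record together when a quoted field contains a newline) could be sketched roughly as follows. This is plain Python, not NiFi processor code and not the Flume code Wes mentions. The function names split_csv_records and split_with_header are hypothetical, and the sketch assumes RFC 4180 style quoting, where a literal double quote inside a field is written as two double quotes.

```python
import io


def split_csv_records(text):
    """Group physical lines into logical CSV records.

    A newline inside a double-quoted field does not end the record.
    Assumes quotes inside fields are doubled (""), so only an odd
    number of quotes on a line changes the open/closed quote state.
    """
    records = []
    current = []
    in_quotes = False
    # Iterating a StringIO yields lines terminated by "\n" only.
    for line in io.StringIO(text):
        current.append(line)
        if line.count('"') % 2 == 1:
            in_quotes = not in_quotes
        if not in_quotes:
            records.append("".join(current))
            current = []
    if current:
        # Unterminated quoted field: keep the remainder as one record.
        records.append("".join(current))
    return records


def split_with_header(text, header_count=1):
    """Return one chunk per data record, each prefixed with the header."""
    records = split_csv_records(text)
    header = "".join(records[:header_count])
    return [header + record for record in records[header_count:]]
```

Applied to the sample file from the original question, split_with_header returns a single split that contains the header line plus the whole two-line record, which is the result Prabhu expected from SplitText.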
