Re: How to avoid this splitting of single line as multi lines in SplitText?

Andy LoPresto Fri, 17 Feb 2017 16:26:22 -0800

Wes,

Do you mind raising a Jira [1] and providing a PR with your fix once you have 
it translated? I know people would appreciate it. Thanks.


[1] https://issues.apache.org/jira/secure/CreateIssue!default.jspa 

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Feb 17, 2017, at 9:15 AM, Joe Witt <joe.w...@gmail.com> wrote:
> 
> Wes
> 
> Absolutely yes.  Would be a very welcome contribution and is such a common 
> case.
> 
> Thanks
> Joe
> 
> On Feb 17, 2017 11:27 AM, "Wes Lawrence" <wesleyll...@gmail.com> wrote:
> This might be more of a question for the dev mailing list, but does it make 
> sense to have a 'SplitCSV' processor?
> 
> A situation we encounter a lot at Interset is CSV files whose records extend 
> across multiple lines, similar to Prabhu's data.
> 
> We currently have code written for Flume for isolating multiline CSV records 
> from a file, but I've been planning on migrating that to NiFi, if it would be 
> useful.
> 
> --Wes
> 
> On Fri, Feb 17, 2017 at 1:19 AM, Andy LoPresto <alopre...@apache.org> wrote:
> This isn’t working because of known issue NIFI-3255. Oleg has submitted a PR 
> with a patch and Koji has been reviewing. There are some outstanding 
> questions about provenance chain decisions with original vs. split, but the 
> code fixes the exception which was raised and I was able to make a working 
> flow once I applied the patch.
> 
> All of this is updated on the StackOverflow question as well.
> 
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Feb 15, 2017, at 2:19 AM, prabhu Mahendran <prabhuu161...@gmail.com> wrote:
>> 
>> Andy,
>> 
>> I used the following properties in the ReplaceText processor:
>> 
>> Search Value: "(.*?)(\n)(.*?)"
>> 
>> Replacement Value: "$1\\n$3"
>> 
>> Character Set: UTF-8
>> 
>> Maximum Buffer Size: 1 MB
>> 
>> Replacement Strategy: Regex Replace
>> 
>> Evaluation Mode: Entire Text
>> 
>> The output of this processor is the same as the input. It did not make any change.
>> 
>> Thanks,
>> prabhu
>> 
>> On Wed, Feb 15, 2017 at 12:35 PM, Andy LoPresto <alopre...@apache.org> wrote:
>> Prabhu,
>> 
>> I answered this on Stack Overflow [1] but I think you could do it with 
>> ReplaceText before the SplitText using a regex like
>> 
>> "(.*?)(\n)(.*?)" replaced with "$1\\n$3"
>> 
>> [1] http://stackoverflow.com/a/42242665/70465 
>> 
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
>>> On Feb 14, 2017, at 10:52 PM, Lee Laim <lee.l...@gmail.com> wrote:
>>> 
>>> Prabhu,
>>> 
>>> You need to remove the newlines from within the last field.  I'd recommend 
>>> using awk in an ExecuteStreamCommand processor first, then splitting the 
>>> text.  Alternatively, you could write a custom processor to specifically 
>>> handle the incoming data.
>>> 
>>> Lee
>>> 
>>> On Feb 14, 2017, at 11:01 PM, prabhu Mahendran <prabhuu161...@gmail.com> wrote:
>>> 
>>>> I have a CSV file that contains the following lines:
>>>> 
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>>  -- Jiuaslkm asdasdasd"
>>>> 
>>>> I used the processor structure GetFile --> SplitText.
>>>> 
>>>> In SplitText I set the header line count and the line split count to 1.
>>>> 
>>>> So I expected the row to come out as a single split, as below:
>>>> 
>>>>  No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>>  -- Jiuaslkm asdasdasd:"
>>>> 
>>>> But it actually split the CSV into two splits, as below:
>>>> 
>>>> First split:
>>>> 
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>> 
>>>> Second split:
>>>> 
>>>> No,NAme,ID,Description
>>>>     -- Jiuaslkm asdasdasd"
>>>> 
>>>> So the data is not being handled correctly.
>>>> 
>>>> Goal: I need to handle those data lines as a single line.
>>>> 
>>>> Can anyone help me resolve this?
>>>> 
>> 
>> 
> 
>
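For readers following the SplitCSV idea quoted above, here is a minimal sketch of quote-aware record splitting. It is written in Python with the standard csv module purely for illustration; it is not NiFi code and not Wes's Flume implementation. The function name split_csv_records and its header_lines parameter are hypothetical. It assumes the whole file fits in memory and that fields are quoted with double quotes.

```python
import csv
import io


def split_csv_records(text, header_lines=1):
    """Return one CSV string per record, each prefixed with the header.

    Records are found with csv.reader, so a newline inside a quoted
    field does not end the record.
    """
    rows = list(csv.reader(io.StringIO(text, newline="")))
    header, records = rows[:header_lines], rows[header_lines:]
    splits = []
    for record in records:
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerows(header)   # repeat the header in every split
        writer.writerow(record)    # fields with newlines are re-quoted
        splits.append(out.getvalue())
    return splits


# Prabhu's sample: one header line and one record spanning two lines.
sample = (
    "No,NAme,ID,Description\n"
    '1,Stack,232,"ABCDEFGHIJKLMNO\n'
    ' -- Jiuaslkm asdasdasd"\n'
)

for part in split_csv_records(sample):
    print(part)
```

With the sample data this yields a single split that still contains the two-line Description field, which is the result Prabhu was expecting from SplitText.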

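The other approach suggested in the thread (removing or escaping the newlines inside the quoted field before a line-based split) can be sketched the same way. This is a Python illustration of the idea only, not the awk script or the ReplaceText configuration discussed above. The function name escape_embedded_newlines is hypothetical, and the sketch assumes LF line endings and double-quote quoting.

```python
def escape_embedded_newlines(text, quote='"', replacement="\\n"):
    """Replace newlines that occur inside quoted fields.

    Newlines outside quotes are kept, so each record ends up on exactly
    one physical line and a line-based splitter can handle it.
    """
    out = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            # A doubled quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "\n" and in_quotes:
            out.append(replacement)  # literal backslash + n by default
        else:
            out.append(ch)
    return "".join(out)


sample = (
    "No,NAme,ID,Description\n"
    '1,Stack,232,"ABCDEFGHIJKLMNO\n'
    ' -- Jiuaslkm asdasdasd"\n'
)

flattened = escape_embedded_newlines(sample)
print(flattened)
```

After this step the sample has two physical lines (the header and one record), so splitting with a line count of 1 keeps the record intact. A downstream step would need to turn the escaped sequence back into a newline if the original field content matters.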