Re: How to avoid this splitting of single line as multi lines in SplitText?

Wes Lawrence Fri, 17 Feb 2017 08:27:17 -0800

This might be more of a question for the dev mailing list, but does it make
sense to have a 'SplitCSV' processor?


A situation we encounter a lot at Interset is CSV files whose records
extend across multiple lines, similar to Prabhu's data.

We currently have code written for Flume for isolating multiline CSV
records from a file, but I've been planning on migrating that to NiFi, if
it would be useful.
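
For illustration, here is a minimal sketch of what record-aware splitting
could look like. It is not our Flume code and not an existing NiFi API; it
just uses Python's standard csv module, and the function name
split_csv_records and its header_lines parameter are made up for this
example. The idea is to let a CSV parser decide where a record ends, so a
newline inside a quoted field does not start a new split.

```python
import csv
import io


def split_csv_records(text, header_lines=1):
    """Return one CSV chunk per record, each prefixed with the header.

    Hypothetical helper: newlines inside quoted fields stay within
    their record because csv.reader parses the quoting for us.
    """
    # newline="" leaves line endings untouched so csv.reader sees them as-is.
    rows = list(csv.reader(io.StringIO(text, newline="")))
    header, records = rows[:header_lines], rows[header_lines:]

    chunks = []
    for record in records:
        out = io.StringIO(newline="")
        writer = csv.writer(out, lineterminator="\n")
        writer.writerows(header)   # repeat the header in every split
        writer.writerow(record)    # fields containing newlines are re-quoted
        chunks.append(out.getvalue())
    return chunks


# Prabhu's sample: one header line and one record spanning two lines.
sample = (
    "No,NAme,ID,Description\n"
    '1,Stack,232,"ABCDEFGHIJKLMNO\n'
    ' -- Jiuaslkm asdasdasd"\n'
)

splits = split_csv_records(sample)
```

With Prabhu's sample this yields a single split that still holds the whole
two-line record, rather than the two broken splits SplitText produces. The
sketch reads the whole input into memory, so a real processor would need to
stream instead.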

--Wes

On Fri, Feb 17, 2017 at 1:19 AM, Andy LoPresto <[email protected]> wrote:

> This isn’t working because of known issue NIFI-3255. Oleg has submitted a
> PR with a patch and Koji has been reviewing. There are some outstanding
> questions about provenance chain decisions with original vs. split, but the
> code fixes the exception which was raised and I was able to make a working
> flow once I applied the patch.
>
> All of this is updated on the StackOverflow question as well.
>
> Andy LoPresto
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Feb 15, 2017, at 2:19 AM, prabhu Mahendran <[email protected]>
> wrote:
>
> Andy,
>
> I used the following properties in the ReplaceText processor.
>
> Search Value: "(.*?)(\n)(.*?)"
>
> Replacement Value: "$1\\n$3"
>
> Character Set: UTF-8
>
> Maximum Buffer Size: 1 MB
>
> Replacement Strategy: Regex Replace
>
> Evaluation Mode: Entire Text
>
>
> The output of this processor is the same as the input. It did not make
> any changes.
>
> Thanks,
> prabhu
>
> On Wed, Feb 15, 2017 at 12:35 PM, Andy LoPresto <[email protected]>
> wrote:
>
>> Prabhu,
>>
>> I answered this on Stack Overflow [1] but I think you could do it with
>> ReplaceText before the SplitText using a regex like
>>
>> "(.*?)(\n)(.*?)" replaced with "$1\\n$3"
>>
>> [1] http://stackoverflow.com/a/42242665/70465
>>
>> Andy LoPresto
>> [email protected]
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> On Feb 14, 2017, at 10:52 PM, Lee Laim <[email protected]> wrote:
>>
>> Prabhu,
>>
>> You need to remove the newlines from within the last field.  I'd
>> recommend using awk in an ExecuteStreamCommand processor first, then
>> splitting the text.  Alternatively, you could write a custom processor to
>> specifically handle the incoming data.
>>
>> Lee
>>
>> On Feb 14, 2017, at 11:01 PM, prabhu Mahendran <[email protected]>
>> wrote:
>>
>> I have a CSV file that contains the following lines.
>>
>> No,NAme,ID,Description
>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>  -- Jiuaslkm asdasdasd"
>>
>> I used the following processor flow: GetFile --> SplitText
>>
>> In SplitText I set both the header line count and the line split count
>> to 1.
>>
>> So I expected it to produce a single split, as below:
>>
>> No,NAme,ID,Description
>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>  -- Jiuaslkm asdasdasd"
>>
>> But it actually splits the CSV into two splits, as below:
>>
>> *First Split:*
>>
>> No,NAme,ID,Description
>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>
>> *Second Split:*
>>
>> No,NAme,ID,Description
>>     -- Jiuaslkm asdasdasd"
>>
>> So the data is not being handled correctly.
>>
>> *Goal: I need to handle these data lines as a single line.*
>>
>> Can anyone help me resolve this?
>>
>>
>>
>
>
