Re: How to avoid this splitting of single line as multi lines in SplitText?

Joe Witt Fri, 17 Feb 2017 09:16:25 -0800

Wes

Absolutely yes.  Would be a very welcome contribution and is such a common
case.


Thanks
Joe

On Feb 17, 2017 11:27 AM, "Wes Lawrence" <[email protected]> wrote:

> This might be more of a question for the dev mailing list, but does it
> make sense to have a 'SplitCSV' processor?
>
> A situation we encounter a lot at Interset is CSV files whose records
> extend across multiple lines, similar to Prabhu's data.
>
> We currently have code written for Flume for isolating multiline CSV
> records from a file, but I've been planning on migrating that to NiFi, if
> it would be useful.
>
> --Wes
>
> On Fri, Feb 17, 2017 at 1:19 AM, Andy LoPresto <[email protected]>
> wrote:
>
>> This isn’t working because of known issue NIFI-3255. Oleg has submitted a
>> PR with a patch and Koji has been reviewing. There are some outstanding
>> questions about provenance chain decisions with original vs. split, but the
>> code fixes the exception which was raised and I was able to make a working
>> flow once I applied the patch.
>>
>> All of this is updated on the StackOverflow question as well.
>>
>> Andy LoPresto
>> [email protected]
>> *[email protected] <[email protected]>*
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> On Feb 15, 2017, at 2:19 AM, prabhu Mahendran <[email protected]>
>> wrote:
>>
>> Andy,
>>
>> I used the following properties in the ReplaceText processor.
>>
>> Search Value: "(.*?)(\n)(.*?)"
>>
>> Replacement Value: "$1\\n$3"
>>
>> Character Set: UTF-8
>>
>> Maximum Buffer Size: 1 MB
>>
>> Replacement Strategy: Regex Replace
>>
>> Evaluation Mode: Entire Text
>>
>> The output of this processor is the same as the input. It did not make
>> any change.
>>
>> Thanks,
>> prabhu
>>
>> On Wed, Feb 15, 2017 at 12:35 PM, Andy LoPresto <[email protected]>
>> wrote:
>>
>>> Prabhu,
>>>
>>> I answered this on Stack Overflow [1] but I think you could do it with
>>> ReplaceText before the SplitText using a regex like
>>>
>>> "(.*?)(\n)(.*?)" replaced with "$1\\n$3"
>>>
>>> [1] http://stackoverflow.com/a/42242665/70465
>>>
>>> Andy LoPresto
>>> [email protected]
>>> *[email protected] <[email protected]>*
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>
>>> On Feb 14, 2017, at 10:52 PM, Lee Laim <[email protected]> wrote:
>>>
>>> Prabhu,
>>>
>>> You need to remove the new lines from within the last field.  I'd
>>> recommend using awk in an ExecuteStreamCommand processor first, then
>>> splitting the text.  Alternatively, you could write a custom processor to
>>> specifically handle the incoming data.
>>>
>>> Lee
>>>
>>> On Feb 14, 2017, at 11:01 PM, prabhu Mahendran <[email protected]>
>>> wrote:
>>>
>>> I have a CSV file which contains the following lines.
>>>
>>> No,NAme,ID,Description
>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>  -- Jiuaslkm asdasdasd"
>>>
>>> I used the processor structure GetFile --> SplitText.
>>>
>>> In SplitText I set the header line count and the line split count to 1.
>>>
>>> So I expected the file to be split into a single row, as below:
>>>
>>> No,NAme,ID,Description
>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>  -- Jiuaslkm asdasdasd"
>>>
>>> But it actually split the CSV into two splits, as below:
>>>
>>> *First Split:*
>>>
>>> No,NAme,ID,Description
>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>
>>> *Second Split:*
>>>
>>> No,NAme,ID,Description
>>>     -- Jiuaslkm asdasdasd"
>>>
>>> So the record is broken apart and the data is not handled correctly.
>>>
>>> *Goal: I need to handle those data lines as a single line.*
>>>
>>> Can anyone help me resolve this?
>>>
>>>
>>>
>>
>>
>
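For readers of the archive, the two approaches discussed in this thread
(flattening the embedded newlines before SplitText, and a record-aware
'SplitCSV') can be sketched outside NiFi in a few lines of Python. This is
only an illustration under assumptions: it is not NiFi code and not the
Flume code Wes mentions, the function names split_csv_records and
flatten_embedded_newlines are hypothetical, and it assumes the input fits
in memory and uses standard double-quoted CSV fields.

```python
import csv
import io


def _read_rows(text):
    # newline="" passes embedded line breaks to the csv module unchanged.
    # csv.reader keeps a quoted field that spans several physical lines
    # inside one record; blank lines come back as empty rows and are skipped.
    return [row for row in csv.reader(io.StringIO(text, newline="")) if row]


def _write_rows(rows):
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def split_csv_records(text, header_lines=1):
    """Return one CSV document per record, each prefixed with the header."""
    rows = _read_rows(text)
    header, records = rows[:header_lines], rows[header_lines:]
    return [_write_rows(header + [record]) for record in records]


def flatten_embedded_newlines(text, replacement=" "):
    """Rewrite the CSV so that every record occupies one physical line."""
    rows = _read_rows(text)
    flat = [[replacement.join(field.splitlines()) for field in row]
            for row in rows]
    return _write_rows(flat)
```

With Prabhu's sample, split_csv_records should yield a single split that
contains the header plus the complete two-line record, while
flatten_embedded_newlines should produce a file that a line-based splitter
can then handle one record at a time.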
