Re: How to avoid this splitting of single line as multi lines in SplitText?

Wes Lawrence Fri, 17 Feb 2017 19:45:52 -0800

The JIRA has been created [1]. I can't promise a time frame for when I'll
get around to it, but it is on my radar, and I'd be happy to contribute it back
to the NiFi project. =)



[1] https://issues.apache.org/jira/browse/NIFI-3503
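
As a rough illustration of the multi-line record isolation discussed in this
thread, here is a minimal Python sketch. It is not the Flume code mentioned
below, nor anything taken from NIFI-3503. The function names
split_csv_records and split_with_header are hypothetical. It assumes the
double quote is the quote character and that quotes embedded in a field are
escaped by doubling them.

```python
import io
from typing import Iterator, List


def split_csv_records(text: str, quote: str = '"') -> Iterator[str]:
    """Yield one string per CSV record, keeping newlines that fall inside quotes."""
    record: List[str] = []
    in_quotes = False
    # io.StringIO iterates on "\n" only, so other separators are left untouched.
    for line in io.StringIO(text):
        record.append(line)
        # An odd number of quote characters on a physical line flips the
        # "inside a quoted field" state; doubled quotes ("") add two and
        # therefore leave the state unchanged.
        if line.count(quote) % 2 == 1:
            in_quotes = not in_quotes
        if not in_quotes:
            yield "".join(record).rstrip("\r\n")
            record = []
    if record:
        # Unterminated quote (assumption: emit the remainder rather than drop it).
        yield "".join(record).rstrip("\r\n")


def split_with_header(text: str) -> List[str]:
    """Return one split per data record, each prefixed with the header record."""
    records = list(split_csv_records(text))
    if not records:
        return []
    header, rows = records[0], records[1:]
    return [f"{header}\n{row}" for row in rows]


# The sample data from the original question.
sample = (
    'No,NAme,ID,Description\n'
    '1,Stack,232,"ABCDEFGHIJKLMNO\n'
    ' -- Jiuaslkm asdasdasd"\n'
)
for split in split_with_header(sample):
    print(split)
    print("---")
```

With the sample data this produces a single split containing the header and
the complete two-line record, instead of the two broken splits a purely
line-based split gives. Counting quotes per line is the simplest approach
that works for this data; a real processor would more likely stream the
input through a proper CSV parser.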

On Fri, Feb 17, 2017 at 7:26 PM, Andy LoPresto <[email protected]> wrote:

> Wes,
>
> Do you mind raising a Jira [1] and providing a PR with your fix once you
> have it translated? I know people would appreciate it. Thanks.
>
> [1] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>
> Andy LoPresto
> [email protected]
> *[email protected] <[email protected]>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Feb 17, 2017, at 9:15 AM, Joe Witt <[email protected]> wrote:
>
> Wes
>
> Absolutely yes.  Would be a very welcome contribution and is such a common
> case.
>
> Thanks
> Joe
>
> On Feb 17, 2017 11:27 AM, "Wes Lawrence" <[email protected]> wrote:
>
>> This might be more of a question for the dev mailing list, but does it
>> make sense to have a 'SplitCSV' processor?
>>
>> A situation we encounter a lot at Interset is CSV files whose records
>> extend across multiple lines, similar to Prabhu's data.
>>
>> We currently have code written for Flume for isolating multiline CSV
>> records from a file, but I've been planning on migrating that to NiFi, if
>> it would be useful.
>>
>> --Wes
>>
>> On Fri, Feb 17, 2017 at 1:19 AM, Andy LoPresto <[email protected]>
>> wrote:
>>
>>> This isn’t working because of a known issue, NIFI-3255. Oleg has submitted
>>> a PR with a patch and Koji has been reviewing. There are some outstanding
>>> questions about provenance chain decisions with original vs. split, but the
>>> code fixes the exception which was raised and I was able to make a working
>>> flow once I applied the patch.
>>>
>>> All of this is updated on the StackOverflow question as well.
>>>
>>> Andy LoPresto
>>> [email protected]
>>> *[email protected] <[email protected]>*
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>
>>> On Feb 15, 2017, at 2:19 AM, prabhu Mahendran <[email protected]>
>>> wrote:
>>>
>>> Andy,
>>>
>>> I used the following properties in the ReplaceText processor:
>>>
>>> Search Value: "(.*?)(\n)(.*?)"
>>>
>>> Replacement Value: "$1\\n$3"
>>>
>>> Character Set: UTF-8
>>>
>>> Maximum Buffer Size: 1 MB
>>>
>>> Replacement Strategy: Regex Replace
>>>
>>> Evaluation Mode: Entire Text
>>>
>>>
>>> The output of this processor is the same as the input; it did not make
>>> any change.
>>>
>>> Thanks,
>>> prabhu
>>>
>>> On Wed, Feb 15, 2017 at 12:35 PM, Andy LoPresto <[email protected]>
>>> wrote:
>>>
>>>> Prabhu,
>>>>
>>>> I answered this on Stack Overflow [1] but I think you could do it with
>>>> ReplaceText before the SplitText using a regex like
>>>>
>>>> "(.*?)(\n)(.*?)" replaced with "$1\\n$3"
>>>>
>>>> [1] http://stackoverflow.com/a/42242665/70465
>>>>
>>>> Andy LoPresto
>>>> [email protected]
>>>> *[email protected] <[email protected]>*
>>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>>
>>>> On Feb 14, 2017, at 10:52 PM, Lee Laim <[email protected]> wrote:
>>>>
>>>> Prabhu,
>>>>
>>>> You need to remove the newlines from within the last field.  I'd
>>>> recommend using awk in an ExecuteStreamCommand processor first, then
>>>> splitting the text.  Alternatively, you could write a custom processor to
>>>> specifically handle the incoming data.
>>>>
>>>> Lee
>>>>
>>>> On Feb 14, 2017, at 11:01 PM, prabhu Mahendran <[email protected]>
>>>> wrote:
>>>>
>>>> I have a CSV file which contains the following lines.
>>>>
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>>  -- Jiuaslkm asdasdasd"
>>>>
>>>> I used the following processor structure: GetFile --> SplitText
>>>>
>>>> In SplitText I set both the header line count and the line split count
>>>> to 1.
>>>>
>>>> So I expected the row to be split as below:
>>>>
>>>>  No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>>  -- Jiuaslkm asdasdasd:"
>>>>
>>>> But it actually splits the CSV into two splits, as below:
>>>>
>>>> *First Split:*
>>>>
>>>> No,NAme,ID,Description
>>>> 1,Stack,232,"ABCDEFGHIJKLMNO
>>>>
>>>> *Second Split:*
>>>>
>>>> No,NAme,ID,Description
>>>>     -- Jiuaslkm asdasdasd"
>>>>
>>>> So the data is not being handled correctly.
>>>>
>>>> *Goal: I need to handle those data lines as a single line.*
>>>>
>>>> Can anyone help me resolve this?
>>>>
>>>>
>>>>
>>>
>>>
>>
>