Re: Routing File based on CSV header schema

Austin Heyne Thu, 16 Mar 2017 13:47:19 -0700

Sure, I've stripped things down to eliminate some variables, and I think I have the problem cornered. I created one CSV with a header of "csvA" and another with "csvB", and I'm matching them with the regexes "^csvA$" and "^csvB$", respectively. You may see the problem right away: NiFi doesn't seem to like '$' as an end-of-line marker. I've tried "csvB" and "^csvB", which both work fine, while "csvB$" fails. However, those patterns aren't strict enough for our purposes. We may have cases where one file's header is "col1, col2" and another's is "col1,col2,col3", which could cause duplicate ingests later, eating resources. Is there a way to mark the end of the line in the regex, or am I going to have to do a nested regex filter or something else?


Thanks for the help,


Austin
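A likely explanation, assuming RouteOnContent applies the pattern with Java's java.util.regex and Matcher.find() to the whole buffered content (an assumption, not checked against the NiFi source): without the MULTILINE flag, '$' matches only at the very end of the input or just before a final line terminator. So "^csvB$" fails as soon as the file has data rows after the header, while "^csvB" still works because '^' matches at the start of the input. The Java sketch below compares a few patterns; the class name AnchorDemo is hypothetical. Note that "(?m)^csvB$" would also match "csvB" on any later line, so the sketch uses '\A' (always the start of input) to pin the match to the header line.

```java
import java.util.regex.Pattern;

// Sketch only: assumes the router applies java.util.regex with Matcher.find()
// to the full file content (mirroring "content must contain match").
public class AnchorDemo {

    // True if the pattern is found anywhere in the content.
    static boolean contains(String regex, String content) {
        return Pattern.compile(regex).matcher(content).find();
    }

    public static void main(String[] args) {
        String csvB = "csvB\nvalue1\nvalue2\n";
        String csvAThenB = "csvA\ncsvB\n";
        String wider = "col1,col2,col3\n1,2,3\n";

        // No MULTILINE: '$' only matches at end of input (or before a final
        // line terminator), so the data rows after the header break the match.
        System.out.println(contains("^csvB$", csvB));            // false
        // Without '$', '^' alone matches at the start of input.
        System.out.println(contains("^csvB", csvB));             // true
        // (?m): '^' and '$' match at every line boundary, so this succeeds...
        System.out.println(contains("(?m)^csvB$", csvB));        // true
        // ...but it also matches "csvB" on a later line.
        System.out.println(contains("(?m)^csvB$", csvAThenB));   // true
        // \A always means start of input, pinning the match to the header line.
        System.out.println(contains("(?m)\\AcsvB$", csvAThenB)); // false
        // In MULTILINE mode '$' also matches just before a "\r\n" terminator.
        System.out.println(contains("(?m)\\AcsvB$", "csvB\r\nvalue1\r\n")); // true
        // The end anchor keeps a two-column pattern off a three-column header.
        System.out.println(contains("(?m)\\Acol1,col2$", wider)); // false
    }
}
```

In a processor property the pattern would be typed with a single backslash, as (?m)\AcsvB$; the doubled backslash above is only Java string-literal escaping.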

On 03/16/2017 02:27 PM, James Wing wrote:

Austin,

I think you are on the right track with RouteOnContent. Any chance you can share a sample CSV header and the settings of your RouteOnContent processor, including the regex?


Thanks,

James

On Thu, Mar 16, 2017 at 11:14 AM, Austin Heyne <[email protected]> wrote:


    Hi,

    I have a set of CSV files with headers that utilize various
    schemas. I'd like to route the CSV files to processors based on
    the schema set in the header. I've tried using the RouteOnContent
    processor to sort the files based on "content must contain match"
    and a regex statement that matches the first line (header).
    However, every file I send through is being routed to 'unmatched'.

    I've also looked at the ValidateCsv processor, but it doesn't
    appear to work with the header; it just validates data types.
    Unfortunately, that won't work for us, as columns with the same
    data type could be in a different order.

    Is there a ready-made solution for this problem that I missed, or
    perhaps a more clever way to approach it?

    Thanks,

    Austin Heyne
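If a regex route stays awkward, the same decision can be made by comparing the whole header line against a table of known schemas. The plain-Java sketch below is hypothetical: HeaderRouter, addSchema, and route are illustrative names, not a NiFi API. It assumes unquoted column names, trims whitespace around each name, and compares column lists in order, so "col1,col2" and "col1,col2,col3" route differently and reordered columns fall through to "unmatched".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not a NiFi API: picks a route by exact, order-sensitive
// comparison of a CSV file's header columns against known schemas.
public class HeaderRouter {

    private final Map<List<String>, String> routes = new HashMap<>();

    // Registers a known header (e.g. "col1,col2") under a route name.
    void addSchema(String header, String route) {
        routes.put(parseHeader(header), route);
    }

    // Splits on commas and trims each name. Assumes no quoted names with
    // embedded commas; limit -1 keeps trailing empty columns.
    static List<String> parseHeader(String line) {
        return Arrays.stream(line.split(",", -1))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    // Returns the first line without its "\n" or "\r\n" terminator.
    static String firstLine(String content) {
        int end = content.indexOf('\n');
        String line = end < 0 ? content : content.substring(0, end);
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    // List equality compares elements in order, so column order matters.
    String route(String content) {
        return routes.getOrDefault(parseHeader(firstLine(content)), "unmatched");
    }

    public static void main(String[] args) {
        HeaderRouter router = new HeaderRouter();
        router.addSchema("col1,col2", "schemaA");
        router.addSchema("col1,col2,col3", "schemaB");

        System.out.println(router.route("col1, col2\n1,2\n"));            // schemaA
        System.out.println(router.route("col1,col2,col3\r\n1,2,3\r\n"));  // schemaB
        System.out.println(router.route("col2,col1\n2,1\n"));             // unmatched
    }
}
```

Trimming is a design choice here: it treats "col1, col2" and "col1,col2" as the same schema. Drop the trim if spacing should distinguish schemas. Wiring this into a flow (for example through a scripted or custom processor) is left out of the sketch.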
