Thank you Joe.

The Sqoop-to-HDFS data load is outside the NiFi flow. Once the data is
pushed to HDFS, I have to process each record and perform validations.

By validation I meant that we will be picking a particular column from each
record stored in HDFS and then performing a SQL query against another
database.
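To make the idea concrete, here is a rough Python sketch of the check I have in mind. The table, column, and sample values are made up for illustration, and an in-memory SQLite database stands in for the other database we would actually query:

```python
import sqlite3

# Stand-in for the "another database" holding valid reference values.
# Table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE valid_customers (customer_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO valid_customers VALUES (?)",
                 [("C001",), ("C002",)])

def validate(records: str, column_index: int = 0):
    """Split newline-delimited records and check one column per record
    against the reference table with a parameterized SQL query."""
    valid, invalid = [], []
    for line in records.splitlines():
        fields = line.split(",")
        key = fields[column_index]
        row = conn.execute(
            "SELECT 1 FROM valid_customers WHERE customer_id = ?",
            (key,)).fetchone()
        (valid if row else invalid).append(line)
    return valid, invalid

data = "C001,alice\nC999,mallory\nC002,bob"
ok, bad = validate(data)
```

The real flow would of course run one query per record against the external database, which is why I am asking below whether the lookups can be batched.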

On Sun, Jan 10, 2016 at 9:17 AM, Joe Witt <[email protected]> wrote:

> Hello Sudeep,
>
> "Which NiFi processor can I use to split each record (separated by a
> new line character)"
>
>   For this the SplitText processor is rather helpful if you want to
> split each line.  I recommend you do two SplitText processors in a
> chain where one splits on every 1000 lines for example and then the
> next one splits each line.  As long as you have back-pressure set up,
> this means you could split arbitrarily large (in terms of number of
> lines) source files and have good behavior.
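(To make sure I follow the two-stage split, here is a rough Python sketch of what the chained SplitText processors would do to the content. The line counts are illustrative, not a recommendation:)

```python
def split_chunks(text: str, lines_per_chunk: int):
    """First SplitText stage: break a large file into N-line chunks."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def split_lines(chunk: str):
    """Second SplitText stage: one piece of content per line."""
    return chunk.splitlines()

# A hypothetical 2500-line source file.
big = "\n".join(f"record {i}" for i in range(2500))
chunks = split_chunks(big, 1000)                     # 3 chunks
singles = [line for c in chunks for line in split_lines(c)]
```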
>
> ..."and perform validations?"
>
>   Consider if you want to validate each line in a text file and route
> valid lines one way and invalid lines another way.  If this is the
> case then you may be able to avoid using SplitText and simply use
> RouteText instead as it can operate on the original file in a line by
> line manner and perform expression based validation.  This would
> operate in bulk and be quite efficient.
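(If I understand RouteText correctly, the per-line routing is, in spirit, something like this Python sketch. The validity pattern is a made-up example; my actual check needs a database lookup, which is why I describe the flow above:)

```python
import re

# Hypothetical validity rule: a line must look like "C<3 digits>,<word>".
VALID_LINE = re.compile(r"^C\d{3},\w+$")

def route_text(content: str):
    """Route each line of the original file to 'valid' or 'invalid',
    mimicking line-by-line routing without splitting the file first."""
    routes = {"valid": [], "invalid": []}
    for line in content.splitlines():
        routes["valid" if VALID_LINE.match(line) else "invalid"].append(line)
    return routes

routes = route_text("C001,alice\nbadline\nC002,bob")
```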
>
> "For validations I want to verify a particular column value for each
> record using a SQL query"
>
>   Our ExecuteSQL processor is designed for executing SQL against a
> JDBC accessible database.  It is not helpful at this point for
> executing queries on line oriented data even if that data were valid
> DML or something.  Interesting idea but not something we support at
> this time.
>
> I'm interested to understand your case more if you don't mind though.
> You mention you're getting data from Sqoop into HDFS.  How is NiFi
> involved in that flow - is it after data lands in HDFS you're pulling
> it into NiFi?
>
> Thanks
> Joe
>
> On Sat, Jan 9, 2016 at 10:32 PM, sudeep mishra <[email protected]>
> wrote:
> > Hi,
> >
> > I am pushing some database records into HDFS using Sqoop.
> >
> > I want to perform some validations on each record in the HDFS data. Which
> > NiFi processor can I use to split each record (separated by a new line
> > character) and perform validations?
> >
> > For validations I want to verify a particular column value for each
> > record using a SQL query. I can see an ExecuteQuery processor. How can I
> > dynamically pass query parameters to it? Also, is there a way to
> > execute the queries in bulk rather than for each record?
> >
> > Kindly suggest.
> >
> > Appreciate your help.
> >
> >
> > Thanks & Regards,
> >
> > Sudeep Shekhar Mishra
> >
> > +91-9167519029
> > [email protected]
>



-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
[email protected]
