For what it's worth, this scenario sounds very similar to the one that led me to write the MongoDBLookupService. I had a client that was using a CSV file with a companion data-dictionary CSV file.
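If it's useful, the rough shape of that setup looks like the sketch below. The property names are approximate (verify against your NiFi version), and the database/collection/field names are made up for illustration:

    MongoDBLookupService (controller service):
      Mongo URI             = mongodb://localhost:27017   # example connection string
      Mongo Database Name   = lookups
      Mongo Collection Name = categories
      Lookup Value Field    = category_name   # if I recall correctly, leaving this
                                              # blank returns the whole document as a record

LookupRecord then points its Lookup Service property at this service, exactly as with the CSV-backed lookup services discussed below.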
On Mon, Feb 26, 2018 at 8:30 AM, Matt Burgess <[email protected]> wrote:
> Mausam,
>
> You could use PutFile to store off the Category CSV, then you can use
> LookupRecord with either a CSVRecordLookupService or a
> SimpleCsvLookupService; the former is for fetching multiple fields
> from the lookup, the latter is for a single-value lookup. You'll also
> use a CSVReader to read in the data, and a CSVRecordSetWriter (or some
> other writer if you are converting the format).
>
> For the input format, if the fields are all strings you can configure
> the reader to "Use String Fields From Header", but that assumes a
> header line and that all fields are of String type. If the fields are
> of primitive types (String, int, float) you can use InferAvroSchema
> first to get the schema into the "avro.schema" attribute, then
> configure the reader to "Use Schema Text" as the access strategy and
> ${avro.schema} as the Schema Text property.
>
> For the writer, you need to provide the adjusted schema (with the
> added output fields from the Category CSV), so you can't use "Inherit
> Record Schema" as the access strategy in the writer. Alternatively, I
> suggest you explicitly create the correct input and output schemas:
> you can either paste them directly into the "Schema Text" property for
> the reader and writer, or set up an AvroSchemaRegistry, name the
> schemas as user-defined properties (see the documentation for more
> details), then choose "Use Schema Name" as the access strategy and
> use the name from the registry in the Schema Name property.
>
> If storing the CSV as a file is not prudent, you can (currently) use
> MongoDB to persist it and use MongoDBLookupService; the same goes for
> HBase. In the future I hope we have an RDBMSLookupService to look up
> records from an RDBMS table, and possibly a Redis-backed one or
> anything else the community would like to contribute :)
>
> Regards,
> Matt
>
>
> On Mon, Feb 26, 2018 at 5:04 AM, mausam <[email protected]> wrote:
> > Hi,
> >
> > I am trying to use NiFi + Kafka to import multiple CSV files into my
> > application.
> >
> > Problem statement:
> > Some of these CSV files are interlinked at the attribute level.
> >
> > For example, the Product CSV has a reference to the Category CSV,
> > and the Price CSV has a reference to the Product CSV.
> >
> > It is also possible that the Category CSV arrives only once, at the
> > beginning, and Product CSVs then arrive for months afterward. In
> > that case, I need to store the Category CSV data in NiFi for future
> > reference.
> >
> > I am able to create a flow for the independent files, but I am not
> > able to solve the interlinking between files.
> >
> > Queries:
> >
> > 1. Is there an out-of-the-box processor that can help implement this?
> > 2. Do I need to use a DB (like MongoDB) to persist data for future
> > reference?
> >
> > Thanks in advance.
> >
> > -Mausam
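For anyone wiring this up later, here is a minimal sketch of the lookup piece Matt describes. The property names match recent NiFi releases as far as I recall (double-check against your version), and the column names (category_id, category_name) are invented for illustration:

    SimpleCsvLookupService (controller service):
      CSV File            = /data/lookups/category.csv   # the file PutFile stored off
      Lookup Key Column   = category_id
      Lookup Value Column = category_name

    LookupRecord (processor):
      Record Reader       = CSVReader
      Record Writer       = CSVRecordSetWriter
      Lookup Service      = SimpleCsvLookupService
      Result RecordPath   = /category_name   # where the looked-up value lands in each record
      key                 = /category_id     # user-defined property: maps the service's
                                             # "key" coordinate to a RecordPath in the data

With CSVRecordLookupService the wiring is the same, but the result is a whole record rather than a single value, so Result RecordPath should point at wherever the set of looked-up fields belongs.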
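And since the writer can't use "Inherit Record Schema" once fields are added, below is the sort of schema text you'd paste into the writer's "Schema Text" property (or register under a name in an AvroSchemaRegistry). This is a hypothetical Product schema with the looked-up field appended; the reader's schema would be the same minus the last field:

    {
      "type": "record",
      "name": "Product",
      "fields": [
        { "name": "product_id",    "type": "string" },
        { "name": "product_name",  "type": "string" },
        { "name": "category_id",   "type": "string" },
        { "name": "category_name", "type": ["null", "string"], "default": null }
      ]
    }

Making the added field nullable with a default means records still write cleanly when a lookup misses.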
