For what it's worth, this scenario sounds very similar to the one that led me to write the MongoDBLookupService. I had a client that was using a CSV file with a companion data-dictionary CSV file.
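If it's useful, the rough shape of that setup looks like the sketch below. The property names are approximate (verify against your NiFi version), and the database/collection/field names are made up for illustration:

    MongoDBLookupService (controller service):
      Mongo URI             = mongodb://localhost:27017   # example connection string
      Mongo Database Name   = lookups
      Mongo Collection Name = categories
      Lookup Value Field    = category_name   # if I recall correctly, leaving this
                                              # blank returns the whole document as a record

LookupRecord then points its Lookup Service property at this service, exactly as with the CSV-backed lookup services discussed below.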
On Mon, Feb 26, 2018 at 8:30 AM, Matt Burgess <[email protected]> wrote:
> Mausam,
>
> You could use PutFile to store off the Category CSV, then you can use
> LookupRecord with either a CSVRecordLookupService or a
> SimpleCsvLookupService; the former is for fetching multiple fields
> from the lookup, the latter is for a single-value lookup. You'll also
> use a CSVReader to read in the data, and a CSVRecordSetWriter (or some
> other writer if you are converting the format).
>
> For the input format, if the fields are all strings you can configure
> the reader to "Use String Fields From Header", but that assumes a
> header line and that all fields are of String type. If the fields are
> of primitive types (String, int, float) you can use InferAvroSchema
> first to get the schema into the "avro.schema" attribute, then
> configure the reader to "Use Schema Text" as the access strategy and
> ${avro.schema} as the Schema Text property.
>
> For the writer, you need to provide the adjusted schema (with the
> added output fields from the Category CSV), so you can't use "Inherit
> Record Schema" as the access strategy in the writer. Alternatively, I
> suggest you explicitly create the correct input and output schemas:
> you can either paste them directly into the "Schema Text" property for
> the reader and writer, or set up an AvroSchemaRegistry, name the
> schemas as user-defined properties (see the documentation for more
> details), then choose "Use Schema Name" as the access strategy and
> use the name from the registry in the Schema Name property.
>
> If storing the CSV as a file is not prudent, you can (currently) use
> MongoDB to persist it and use MongoDBLookupService; the same goes for
> HBase. In the future I hope we have an RDBMSLookupService to look up
> records from an RDBMS table, and possibly a Redis-backed one or
> anything else the community would like to contribute :)
>
> Regards,
> Matt
>
>
> On Mon, Feb 26, 2018 at 5:04 AM, mausam <[email protected]> wrote:
> > Hi,
> >
> > I am trying to use NiFi + Kafka to import multiple CSV files into my
> > application.
> >
> > Problem statement:
> > Some of these CSV files are interlinked at the attribute level.
> >
> > For example, the Product CSV has a reference to the Category CSV,
> > and the Price CSV has a reference to the Product CSV.
> >
> > It is also possible that the Category CSV arrives only once, at the
> > beginning, and Product CSVs then arrive for months afterward. In
> > that case, I need to store the Category CSV data in NiFi for future
> > reference.
> >
> > I am able to create a flow for the independent files, but I am not
> > able to solve the interlinking between files.
> >
> > Queries:
> >
> > 1. Is there an out-of-the-box processor that can help implement this?
> > 2. Do I need to use a DB (like MongoDB) to persist data for future
> > reference?
> >
> > Thanks in advance.
> >
> > -Mausam
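For anyone wiring this up later, here is a minimal sketch of the lookup piece Matt describes. The property names match recent NiFi releases as far as I recall (double-check against your version), and the column names (category_id, category_name) are invented for illustration:

    SimpleCsvLookupService (controller service):
      CSV File            = /data/lookups/category.csv   # the file PutFile stored off
      Lookup Key Column   = category_id
      Lookup Value Column = category_name

    LookupRecord (processor):
      Record Reader       = CSVReader
      Record Writer       = CSVRecordSetWriter
      Lookup Service      = SimpleCsvLookupService
      Result RecordPath   = /category_name   # where the looked-up value lands in each record
      key                 = /category_id     # user-defined property: maps the service's
                                             # "key" coordinate to a RecordPath in the data

With CSVRecordLookupService the wiring is the same, but the result is a whole record rather than a single value, so Result RecordPath should point at wherever the set of looked-up fields belongs.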
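And since the writer can't use "Inherit Record Schema" once fields are added, below is the sort of schema text you'd paste into the writer's "Schema Text" property (or register under a name in an AvroSchemaRegistry). This is a hypothetical Product schema with the looked-up field appended; the reader's schema would be the same minus the last field:

    {
      "type": "record",
      "name": "Product",
      "fields": [
        { "name": "product_id",    "type": "string" },
        { "name": "product_name",  "type": "string" },
        { "name": "category_id",   "type": "string" },
        { "name": "category_name", "type": ["null", "string"], "default": null }
      ]
    }

Making the added field nullable with a default means records still write cleanly when a lookup misses.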
