Re: Re: QueryDatabaseTable - Schema
Hi Uwe,

Answer to Q1 and Q2:

I agree with you; I was confused the first time I tried to understand NiFi records, schemas and so on. To understand them, you need to get familiar with the following concepts. Grouping the keywords and concepts as follows might help you grasp how they relate to each other:

- NiFi data models:
  - FlowFile: used to pass data around a NiFi flow. It has String key/value pairs ("Attributes") and opaque binary "Content". Some serialization mechanisms, such as Avro, can embed the schema within the content.
  - Record: represents a data unit consisting of multiple fields; its structure is defined by a "Schema". Used inside NiFi components such as Processors, Controller Services, etc. A Record resides on the heap; when it is passed to the next component, it is serialized/deserialized by a RecordReader and RecordWriter from/to various data formats: CSV, JSON, XML, Avro, etc.

- RecordReader
  -- How to retrieve the schema of an incoming "FlowFile" (Schema Access Strategy)
     - Embedded schema: from content binary serialized as Avro with the schema embedded
     - Schema text attribute: a FlowFile attribute contains a String representation of the schema
     - Schema Registry: Hortonworks, Confluent, AvroSchema

- RecordWriter
  -- How to retrieve the schema of a processed "Record" (Schema Access Strategy)
     - Inherit schema (from the processed Record): useful for components like ConvertRecord, because the component already knows the schema of the record being processed, so there is no need to retrieve it
     - Schema text attribute
     - Schema Registry
  -- How to write the schema of a processed "Record" (Schema Write Strategy)
     - Embedded schema: write the schema within the output FlowFile's content
     - Schema text attribute: put the schema into an attribute of the output FlowFile
     - Schema Registry: put reference keys into attributes of the output FlowFile

Answer to Q3:

To debug a schema, the 'Schema text attribute' Schema Write Strategy might be the easiest option; you can then see the schema of a FlowFile in the NiFi UI as a FlowFile attribute.
I'm not aware of any module that can write to an external schema registry at the moment.

Thanks,
Koji

On Wed, Sep 13, 2017 at 3:00 AM, Uwe Geercken <uwe.geerc...@web.de> wrote:
> Thank you Koji!
>
> That is good news. But I have 3 questions:
>
> 1. You quote Bryan Bende: "When a reader produces a record it attaches the
> schema it used to the record...": What happens here exactly? Is the schema
> attached to the flowfile? Is it an attribute?
>
> 2. I cannot see an exact definition of what "inherit" means. It may be
> linked to my question above though. I am a bit puzzled by the use of
> "embedded" versus "inherit". Does it not mean "embedded" in both cases? If
> it really means inherit, from where does it inherit? Or can I choose it?
>
> 3. What if I really do want to save the schema of e.g. a database table or
> file to the registry? I don't know, maybe as a reference or for debugging.
> How would I do that (I mean: not manually)?
>
> From the first look I found NiFi a kick-ass tool. It continues to evolve
> very fast and I use it at work for smaller things. Now I want to start
> using it for more challenging things such as feeding Kafka and maybe also
> Hadoop. So I am experimenting a lot and want to find the best possible
> setup.
>
> Greetings and thanks again.
>
> Uwe
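To make the answer to Q1 and Q2 above a bit more concrete, here is a rough Python model of the idea. It is not NiFi code: the names `FlowFile`, `Record`, `read_embedded` and `write_csv_inherit` are hypothetical, and JSON stands in for Avro as the format that carries its schema inside the content. It only illustrates that the reader attaches the schema to each in-memory record, and that an "inherit" writer takes the schema from those records instead of from an attribute or a registry.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy stand-in for a FlowFile: string attributes plus opaque bytes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


@dataclass
class Record:
    """Toy stand-in for an in-memory record: values plus their schema."""
    values: dict
    schema: dict


def read_embedded(flowfile):
    """Reader using an 'embedded schema' access strategy.

    Assumption: the content is JSON shaped like {"schema": ..., "rows": [...]},
    standing in for a format such as Avro that embeds its schema.
    """
    payload = json.loads(flowfile.content)
    schema = payload["schema"]
    # The reader attaches the schema it used to every record it produces.
    return [Record(values=row, schema=schema) for row in payload["rows"]]


def write_csv_inherit(records, write_schema_attribute=True):
    """Writer using an 'inherit record schema' access strategy.

    The schema comes from the records themselves. Optionally it is also
    written to an 'avro.schema' attribute on the output FlowFile.
    """
    if not records:
        raise ValueError("no records, so there is no schema to inherit")
    schema = records[0].schema
    names = [f["name"] for f in schema["fields"]]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record.values)
    attributes = {}
    if write_schema_attribute:
        attributes["avro.schema"] = json.dumps(schema)
    return FlowFile(content=out.getvalue().encode("utf-8"), attributes=attributes)


# Small demo with made-up data.
demo_schema = {
    "type": "record",
    "name": "users",
    "fields": [{"name": "id", "type": "int"}, {"name": "name", "type": "string"}],
}
incoming = FlowFile(
    content=json.dumps(
        {"schema": demo_schema, "rows": [{"id": 1, "name": "Uwe"}, {"id": 2, "name": "Koji"}]}
    ).encode("utf-8")
)
outgoing = write_csv_inherit(read_embedded(incoming))
print(outgoing.content.decode("utf-8"))
print(outgoing.attributes["avro.schema"])
```

In this model, "embedded" describes where the reader finds the schema (inside the content), while "inherit" describes where the writer finds it (on the records handed over by the reader), which is the distinction asked about in Q2.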
Re: Re: QueryDatabaseTable - Schema
Thank you Koji!

That is good news. But I have 3 questions:

1. You quote Bryan Bende: "When a reader produces a record it attaches the schema it used to the record...": What happens here exactly? Is the schema attached to the flowfile? Is it an attribute?

2. I cannot see an exact definition of what "inherit" means. It may be linked to my question above though. I am a bit puzzled by the use of "embedded" versus "inherit". Does it not mean "embedded" in both cases? If it really means inherit, from where does it inherit? Or can I choose it?

3. What if I really do want to save the schema of e.g. a database table or file to the registry? I don't know, maybe as a reference or for debugging. How would I do that (I mean: not manually)?

From the first look I found NiFi a kick-ass tool. It continues to evolve very fast and I use it at work for smaller things. Now I want to start using it for more challenging things such as feeding Kafka and maybe also Hadoop. So I am experimenting a lot and want to find the best possible setup.

Greetings and thanks again.

Uwe

Sent: Tuesday, 12 September 2017 at 03:05
From: "Koji Kawamura" <ijokaruma...@gmail.com>
To: users@nifi.apache.org
Subject: Re: QueryDatabaseTable - Schema

Hi Uwe,

I had a similar expectation when I was combining QueryDatabaseTable, or any other processor that creates an Avro FlowFile with its schema embedded, with the new record reader/writer controller services.

Now, NiFi has an "Inherit Record Schema" option as a "Schema Access Strategy" of RecordWriter, already merged into the master branch.
https://issues.apache.org/jira/browse/NIFI-3921

I was able to reuse the Avro schema in the subsequent flow using "Inherit Record Schema"; it's really useful. You can construct a flow like the one below:

- QueryDatabaseTable
  - outputs FlowFile with Avro schema embedded
- ConvertRecord
  - AvroReader:
    - "Schema Access Strategy" = "Use Embedded Avro Schema"
  - CSVRecordSetWriter:
    - "Schema Access Strategy" = "Inherit Record Schema"
    - "Schema Write Strategy" = "Set 'avro.schema' Attribute"

This way, you don't have to have the schema in a registry, and the resulting CSV FlowFile has an 'avro.schema' attribute inheriting the one created by QueryDatabaseTable.

Hope this helps.

Thanks,
Koji
Re: QueryDatabaseTable - Schema
Hi Uwe,

I had a similar expectation when I was combining QueryDatabaseTable, or any other processor that creates an Avro FlowFile with its schema embedded, with the new record reader/writer controller services.

Now, NiFi has an "Inherit Record Schema" option as a "Schema Access Strategy" of RecordWriter, already merged into the master branch.
https://issues.apache.org/jira/browse/NIFI-3921

I was able to reuse the Avro schema in the subsequent flow using "Inherit Record Schema"; it's really useful. You can construct a flow like the one below:

- QueryDatabaseTable
  - outputs FlowFile with Avro schema embedded
- ConvertRecord
  - AvroReader:
    - "Schema Access Strategy" = "Use Embedded Avro Schema"
  - CSVRecordSetWriter:
    - "Schema Access Strategy" = "Inherit Record Schema"
    - "Schema Write Strategy" = "Set 'avro.schema' Attribute"

This way, you don't have to have the schema in a registry, and the resulting CSV FlowFile has an 'avro.schema' attribute inheriting the one created by QueryDatabaseTable.

Hope this helps.

Thanks,
Koji

On Tue, Sep 12, 2017 at 5:02 AM, Uwe Geercken wrote:
> Hello,
>
> I was wondering: if the QueryDatabaseTable processor internally creates
> an Avro schema, why is this schema not available as an attribute or saved
> to the registry?
>
> If it were, one could reuse the schema. E.g. if I use the
> ConvertRecord processor and I specify an AvroReader as RecordReader, then
> this Reader will take the schema from the flowfile the QueryDatabaseTable
> processor creates. But the RecordWriter in the ConvertRecord (in my example
> a CSVRecordSetWriter) requires the schema as an attribute or as a reference
> to the schema registry.
>
> I can see there is an ExtractAvroSchema processor but I don't see a
> way of combining the metadata into e.g. the ConvertRecord processor.
>
> Any help or ideas?
>
> Rgds,
>
> Uwe
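As background to "Use Embedded Avro Schema" in the flow above: an Avro object container file stores its schema as JSON in the file header, under the metadata key `avro.schema`. The following standard-library Python sketch reads that header and returns the schema. It assumes the FlowFile content is a regular Avro object container file; the function names are hypothetical, and this is neither NiFi's nor the Avro library's own implementation.

```python
import io
import json

AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def _read_long(stream):
    """Decode one Avro long: a zigzag-encoded, variable-length integer."""
    shift = 0
    raw = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear: last byte of this number
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zigzag encoding


def _read_bytes(stream):
    """Decode one Avro bytes/string value: a long length, then the data."""
    length = _read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_embedded_schema(content):
    """Return the schema embedded in an Avro container file as a dict."""
    stream = io.BytesIO(content)
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    # The header metadata is an Avro map, written as a series of blocks
    # and terminated by a block with a count of zero.
    while True:
        count = _read_long(stream)
        if count == 0:
            break
        if count < 0:
            # A negative count is followed by the block size in bytes.
            _read_long(stream)
            count = -count
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            metadata[key] = _read_bytes(stream)
    return json.loads(metadata["avro.schema"])
```

Given the raw bytes of such a file, `read_embedded_schema(data)` returns the schema as a Python dict. This is the same information that ends up in the 'avro.schema' attribute when the writer is configured as in the flow above.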