Hi,
I'm looking for the simplest way possible of creating CSV files from SQL 
queries using Apache NiFi 1.3. 
The flow I currently have (the files are to be SFTP'ed to a remote server):

ExecuteSQL -> UpdateAttribute -> ConversionRecord [3 CSs] -> PutSFTP

The concept of SchemaRegistry is new to me, but if I understood it correctly in 
order for the ConversionRecord to work properly is necessary to have 3 
Controller Services ([3 CSs]) associated with it:   
   - AvroSchemaRegistry, with the schema defined in Avro Schema (JSON);   

   - AvroReader, referring to the above schema;   

   - CSVRecordSetWriter, also referring to the same schema.   


It seems there are many benefits in using the schema registry, including 
versioning, validation, etc, but in my example a simpler configuration would be 
welcome. 
Isn't the schema already defined by ExecuteSQL? Can I have the ConversionRecord 
alone with no dedicated SchemaRegistry (property), AvroReader,(instance), or 
CSVRecordSetWriter (instance)? Of course, we'd still need to specify the output 
is a CSV, so perhaps a shared CSVRecordSetWriter that also gets its schema from 
the flow file would still be useful.
By the way, would the Schema Access Strategy named "Use Embedded Avro Schema" 
be part of a simpler solution? How?
In the same vein, what about having the schema-name property optionally defined 
by the ExecuteSQL itself, so we don't have to depend on the UpdateAttribute 
component? 
In summary, I'm wondering if it's possible to have 3 (+ 1 generic) components 
instead of 6 per query:


ExecuteSQL -> ConversionRecord [CSVRecordSetWriter] -> PutSFTP

That would make a difference when defining multiple conversions from SQL to 
CSV, or other equivalent flows.
In addition, consider that someone might want to have maximum flexibility, 
meaning that it would be totally acceptable to change the query and get a 
different layout for the resulting CSV file, without having to change any 
SchemaRegistry, Reader, or Writer. I've found a few tickets out there covering 
a similar topic. In particular, [1] mentions the difficulty with more complex 
Avro data types. But I don't see that being a blocker when the data source is 
an old-fashioned SQL query. 
Recommendations?
P.S.1 Maybe templates would save the effort, but since Controller Services are 
"global", I'm still wondering if having too many parts would make it more 
difficult to manage lots of flows than it could be.
P.S.2 Will my 1st flow have a good performance? I'm wondering if another 
advantage of using SchemaRegistry etc is that it prevents the creation of too 
many records at once.
Thank you,
Marcio
[1] NIFI-1372 Create ConvertAvroToCSV
  
|  
|   |  
[NIFI-1372] Create ConvertAvroToCSV - ASF JIRA
   |  |

  |

 

Reply via email to