Matt,

I have used all default compression settings so far, i.e., compression is set to None for AvroRecordSetWriter, PutHDFS, and ConvertAvroToORC.
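Since compression is None everywhere, the other thing worth checking is the mismatch between the data and the table definition that you guessed at below. A minimal HiveQL sketch of that check, with a placeholder table name (my_orc_table is not the real table from this flow):

    -- Show the full definition Hive holds for the table (columns, SerDe, formats)
    SHOW CREATE TABLE my_orc_table;

    -- Show column names/types plus storage details and the HDFS location
    DESCRIBE FORMATTED my_orc_table;

Comparing that output against the columns and types actually in the ORC (or Avro) files is usually where a NULLs-on-read mismatch shows up. (A rough sketch of the DDL shapes discussed in this thread is appended after the quoted messages below.)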
Regards,
Mohit

-----Original Message-----
From: Matt Burgess <[email protected]>
Sent: 24 April 2018 20:58
To: [email protected]
Subject: Re: Nifi 1.6.0 ValidateRecord Processor- AvroRecordSetWriter issue

Mohit,

What are you using for compression settings for ConvertCSVToAvro, ConvertAvroToORC, AvroRecordSetWriter, PutHDFS, etc.? The default for ConvertCSVToAvro is Snappy, whereas the rest default to None (although IIRC ConvertAvroToORC will retain an incoming Snappy codec in the outgoing ORC). If you used all defaults, I'm surprised the exact opposite isn't the case, as we've seen issues with processors like CompressContent using Snappy not working with Hive tables.

Also, are you using the same schema in ConvertCSVToAvro as you are in AvroRecordSetWriter? If you view the Avro flow files (in the UI, by listing the queue and then viewing a flow file in it) before ConvertAvroToORC, do they appear to look the same in terms of the data and the data types?

Regards,
Matt

On Tue, Apr 24, 2018 at 9:29 AM, Mohit <[email protected]> wrote:
> Hi,
>
> I'm using the hive.ddl attribute formed by ConvertAvroToORC for the DDL
> statement and passing it to a custom processor, which replaces the content
> of the flow file with the hive.ddl statement plus the location of the
> external table. (ReplaceText routes the file to failure if the content
> exceeds the 1 MB maximum buffer size, so it is not a good option for large
> data.) I don't think this is causing any issue.
>
> I tried to create the table on top of the avro file as well; it also
> displays null.
>
> The AvroRecordSetWriter doc says "Writes the contents of a RecordSet in
> Binary Avro format." Is this format different from what ConvertCSVToAvro
> writes? Because the same flow works with ValidateRecord (writing it to csv)
> + ConvertCSVToAvro.
>
> Thanks,
> Mohit
>
> -----Original Message-----
> From: Matt Burgess <[email protected]>
> Sent: 24 April 2018 18:43
> To: [email protected]
> Subject: Re: Nifi 1.6.0 ValidateRecord Processor- AvroRecordSetWriter issue
>
> Mohit,
>
> Can you share the config for your ConvertAvroToORC processor? Also, by
> "CreateHiveTable", do you mean ReplaceText (to set the content to the
> hive.ddl attribute formed by ConvertAvroToORC) -> PutHiveQL (to execute
> the DDL)? If not, are you using a custom processor, ExecuteStreamCommand,
> or something else?
>
> If you are not using the generated DDL to create the table, can you share
> your CREATE TABLE statement for the target table? I'm guessing there's a
> mismatch somewhere between the data and the table definition.
>
> Regards,
> Matt
>
>
> On Tue, Apr 24, 2018 at 9:09 AM, Mohit <[email protected]> wrote:
>> Hi all,
>>
>> I'm using the ValidateRecord processor to validate the csv and convert it
>> into Avro. Later, I convert this avro to orc using the ConvertAvroToORC
>> processor, write it to hdfs, and create a hive table on top of it.
>>
>> When I query the table, it displays null, though the record count matches.
>>
>> Flow - ValidateRecord -> ConvertAvroToORC -> PutHDFS -> CreateHiveTable
>>
>> To debug, I also wrote the avro data to hdfs and created the hive table
>> on top of it. It also displays null results.
>>
>> Flow - ValidateRecord -> ConvertCSVToAvro -> PutHDFS
>> I manually created the hive table with avro format.
>>
>> When I use ValidateRecord + ConvertCSVToAvro, it is working fine.
>>
>> Flow - ValidateRecord -> ConvertCSVToAvro -> ConvertAvroToORC -> PutHDFS -> CreateHiveTable
>>
>> Is there anything I'm doing wrong?
>>
>> Thanks,
>> Mohit
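For context on the DDL shapes discussed in this thread, a rough HiveQL sketch follows. The hive.ddl attribute from ConvertAvroToORC is typically a CREATE EXTERNAL TABLE ... STORED AS ORC statement to which the flow appends a LOCATION clause; the manually created Avro table is similar but uses STORED AS AVRO. The table names, columns, and HDFS paths here are placeholders, not the actual ones from this thread:

    -- Sketch only: placeholder table names, columns, and paths.
    -- ORC case: hive.ddl from ConvertAvroToORC with the LOCATION appended in the flow,
    -- as would be executed by PutHiveQL in the ReplaceText -> PutHiveQL pattern above.
    CREATE EXTERNAL TABLE IF NOT EXISTS example_orc (id INT, name STRING)
    STORED AS ORC
    LOCATION '/data/example/orc';

    -- Avro case: table created manually over the files written by PutHDFS.
    CREATE EXTERNAL TABLE IF NOT EXISTS example_avro (id INT, name STRING)
    STORED AS AVRO
    LOCATION '/data/example/avro';

If both variants return NULLs while the row count is right, the usual culprit is column names or types in the table definition not matching what is actually in the files; DESCRIBE FORMATTED on each table (and, for the Avro table, the avro.schema.literal / avro.schema.url properties if set) is a quick way to compare.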
