Re: Any sugesstions java.io.IOException: Not a data file error

Jakob Homan Fri, 08 Nov 2013 11:35:30 -0800

This is not supported. The assumption is that all the files in the directory will be Avro. This is a general assumption across Hive, not specific to the Avro serde.
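Since every file under the table location has to be Avro, one practical step is to find the stray files before running the query. The sketch below is only an illustration, not part of Hive or Avro: the class `AvroMagicCheck` and its methods are hypothetical names, and it scans a local directory with plain `java.nio` (for HDFS the same check would have to go through the Hadoop `FileSystem` API instead). It relies on the fact that an Avro object container file starts with the four bytes `O`, `b`, `j`, 1; a file without that header is what appears to trigger "Not a data file".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists files in a local directory that do not
// start with the Avro object container magic bytes.
public class AvroMagicCheck {
    // Avro object container files begin with 'O', 'b', 'j', 1.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    // Returns true if the file starts with the Avro magic bytes.
    public static boolean hasAvroMagic(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[AVRO_MAGIC.length];
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    return false; // file is shorter than the magic header
                }
                read += n;
            }
            return Arrays.equals(header, AVRO_MAGIC);
        }
    }

    // Returns the regular files directly under dir that lack the Avro magic.
    public static List<Path> listNonAvroFiles(Path dir) throws IOException {
        List<Path> offenders = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry) && !hasAvroMagic(entry)) {
                    offenders.add(entry);
                }
            }
        }
        Collections.sort(offenders); // stable order for reporting
        return offenders;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is a local copy or mount of the table directory.
        for (Path p : listNonAvroFiles(Paths.get(args[0]))) {
            System.out.println("Not an Avro container file: " + p);
        }
    }
}
```

Any file this reports (for example a test.txt) would then need to be moved out of the table's location so that the directory contains only Avro files.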


On 10/30/2013 01:50 AM, Valluri, Sathish wrote:

Resending after disabling security signing..

*From:*Valluri, Sathish [mailto:sathish.vall...@emc.com]
*Sent:* Wednesday, October 30, 2013 2:17 PM
*To:* user@hive.apache.org
*Subject:* Any sugesstions java.io.IOException: Not a data file error

Hi All,
Hive MapReduce jobs are failing with the error *java.io.IOException: Not a data file* if there are files other than Avro files in the HDFS location.
I have created a Hive external table as shown below,
CREATE EXTERNAL TABLE testTable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{ <schema json literal>')
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/testdata/';
Running select count(*) from testTable;
When /testdata contains only Avro files, the query works fine and returns the results properly.
If /testdata contains files in some other format, say */testdata/test.txt*, the query fails with the following error.
java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:341)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:327)
    ... 11 more
*Caused by: java.io.IOException: Not a data file.*
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
    at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:72)
    at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
    ... 16 more
Can anyone suggest a parameter or any change that needs to be made for the query to succeed? Basically, Hive should skip the files in other formats and load only the Avro files when processing data on HDFS.
Waiting for any suggestions to resolve this issue.

Regards

Sathish Valluri
