Exporting parquet, issues with schema

Brian Henriksen Wed, 09 Dec 2015 12:09:12 -0800

I am trying to use Sqoop to export some Parquet data from HDFS to Oracle.  The 
first problem I ran into is that Parquet export requires a .metadata directory 
that is created by a Sqoop Parquet IMPORT.  (Can anyone explain this to me?  It 
seems odd that one can only send data to a database that was just pulled from 
a database.)  I got around this by converting a small subset of my Parquet 
data to text, running a Sqoop export of the text to Oracle, and then running a 
Sqoop import of that data back to HDFS as Parquet, which brought the .metadata 
directory with it.  Here is the error I'm getting:




java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.apache.avro.Schema$Parser.parse(Schema.java:917)
at parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:54)
at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:142)
at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:118)
at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:107)
at org.kitesdk.data.spi.AbstractKeyRecordReaderWrapper.initialize(AbstractKeyRecordReaderWrapper.java:50)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:478)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:671)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroup

It looks like Sqoop gets as far as starting up the mappers, but they are not 
aware of my Parquet / Avro schema.  Where does Sqoop look for these schemas?  
As far as I know, Parquet files include the schema within the data files 
themselves.  In addition, the .metadata directory contains a .avsc JSON file 
with the same schema.  Any ideas?
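
For anyone wanting to check the second of those two locations, here is a rough sketch. It assumes the dataset directory has been copied from HDFS to the local filesystem first (for example with `hdfs dfs -get`), and the function names `find_avro_schemas` and `summarize` are made up for illustration. It only inspects the .avsc files under .metadata; it does not read the schema stored inside the Parquet files themselves, and it says nothing about where Sqoop actually looks.

```python
import json
from pathlib import Path


def find_avro_schemas(dataset_dir):
    """Return {path: parsed schema} for every .avsc file under <dataset_dir>/.metadata.

    Assumes dataset_dir is a local copy of the dataset directory, not an HDFS path.
    """
    metadata_dir = Path(dataset_dir) / ".metadata"
    if not metadata_dir.is_dir():
        raise FileNotFoundError(f"no .metadata directory under {dataset_dir}")
    schemas = {}
    # Search recursively, since the .avsc file may sit in a subdirectory.
    for avsc in sorted(metadata_dir.rglob("*.avsc")):
        # An .avsc file is plain JSON, so the json module can parse it.
        schemas[str(avsc)] = json.loads(avsc.read_text(encoding="utf-8"))
    return schemas


def summarize(schema):
    """Return (record name, list of field names) for an Avro record schema."""
    if not isinstance(schema, dict) or schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return schema["name"], [field["name"] for field in schema["fields"]]
```

Calling `find_avro_schemas("my_table")` on a local copy (the directory name `my_table` is a placeholder) and passing each result to `summarize` gives a field list that can be compared by eye against the columns of the Oracle table and against the schema in the Parquet files.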
