It looks like this will do the trick...
_____
FileSystem fs = etc.
FSDataInputStream dataInputStream = fs.open(firstInputPath);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
DataFileStream<GenericRecord> dataFileStream =
    new DataFileStream<GenericRecord>(dataInputStream, reader);
Schema s = dataFileStream.getSchema();
_____
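
For completeness, here is the same idea spelled out as a self-contained sketch. The AvroSchemaUtil class and readSchema method are just placeholder names, and I'm assuming you can get the FileSystem from the Path itself via Path.getFileSystem(conf) rather than hard-coding the default filesystem:

_____
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AvroSchemaUtil {

    // Reads the writer's schema from the header of an Avro data file on HDFS.
    public static Schema readSchema(Path avroPath, Configuration conf) throws IOException {
        // Resolve the FileSystem that owns this Path (hdfs://, file://, etc.)
        FileSystem fs = avroPath.getFileSystem(conf);
        FSDataInputStream dataInputStream = fs.open(avroPath);
        DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
        DataFileStream<GenericRecord> dataFileStream =
            new DataFileStream<GenericRecord>(dataInputStream, reader);
        try {
            // The schema lives in the file header, so no records need to be read.
            return dataFileStream.getSchema();
        } finally {
            dataFileStream.close();
            dataInputStream.close();
        }
    }
}
_____

Then something like Schema s = AvroSchemaUtil.readSchema(firstInputPath, new Configuration()); should do it from driver code.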
On Mon, Feb 27, 2012 at 3:41 PM, David B. Martin <[email protected]> wrote:
> I've been getting my feet wet writing pseudo-distributed code. In
> that environment:
>
> _____
>
> File file = new File(input);
> DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
> DataFileReader<GenericRecord> dataFileReader =
>     new DataFileReader<GenericRecord>(file, reader);
>
> Schema s = dataFileReader.getSchema();
> _____
>
> Something like this works just fine. Now I have the schema of
> my input and am ready for real action.
>
> But on HDFS, I have to work in terms of Path instances instead of
> File instances. Right? I can't figure out how to perform the above
> operation when my inputs are of type org.apache.hadoop.fs.Path and
> not java.io.File.
>
> Dave