2011/1/17 Guy Doulberg <guy.doulb...@conduit.com>:
> Hey again,
>
> I thought it would be easy to combine the key and the value, but I ran
> into difficulties. Has anyone written a generic FileInputFormat
> that prepends the key to the value?
>
> Anyhow here is the code I am trying to write:
>
> I have a class that extends the SequenceFileInputFormat
>
> public class CombinedSequenceFileInputFormat<K extends Writable,V extends 
> Writable > extends SequenceFileInputFormat<K, V> {
>
>
>    @Override
>    public org.apache.hadoop.mapred.RecordReader<K, V> getRecordReader(
>            org.apache.hadoop.mapred.InputSplit split, JobConf job,
>            Reporter reporter) throws IOException {
>        // TODO Auto-generated method stub
>
>        CombinedSequenceRecordReader<K, V> wrap =  new 
> CombinedSequenceRecordReader<K, V>(super.getRecordReader(split, job, 
> reporter));
>
>        return wrap;
>    }
>
> }
>
> I then return the wrapped RecordReader. The code of that wrapper is:
>
> public class CombinedSequenceRecordReader<K extends Writable,V > implements 
> RecordReader<K, V> {
>
>    private RecordReader<K, V> proxy;
>    private K currentKey;
>
>    public CombinedSequenceRecordReader(RecordReader<K, V> proxy){
>        this.proxy = proxy;
>    }
>
>    public void setProxy(RecordReader<K, V> proxy) {
>        this.proxy = proxy;
>    }
>
>    public RecordReader<K, V> getProxy() {
>        return proxy;
>    }
>
>    @Override
>    public boolean next(K key, V value) throws IOException {
>
>        return proxy.next(key, value);
>    }
>
>    @Override
>    public K createKey() {
>        currentKey = proxy.createKey() ;
>        return currentKey;
>    }
>
>    @Override
>    public V createValue() {
>        V val = proxy.createValue();
>        return val;
>    }
>
>    @Override
>    public long getPos() throws IOException {
>        // TODO Auto-generated method stub
>        return proxy.getPos();
>    }
>
>    @Override
>    public void close() throws IOException {
>        proxy.close();
>
>    }
>
>    @Override
>    public float getProgress() throws IOException {
>        // TODO Auto-generated method stub
>        return proxy.getProgress();
>    }
>
>
>
> }
>
>
> Now I am trying to extend createValue so that the value also carries
> the key. Any suggestions?
>
>
>
>
>
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Sunday, January 16, 2011 10:33 PM
> To: user@hive.apache.org
> Subject: Re: Sequence file- custom serdes - question
>
> 2011/1/16 Guy Doulberg <guy.doulb...@conduit.com>:
>> Hey all,
>>
>> I am new to Hive, but I have a fairly complex task to perform and I am
>> a little stuck. I hope someone here can help.
>>
>> My team has been storing data to a custom sequence file that has a custom 
>> key and a custom value. We want to expose a hive interface to query this 
>> data.
>> I have been trying to write a custom SerDe that deserializes the sequence
>> file into a Hive table.
>>
>> As long as I only needed fields from the value part of the record, everything
>> was all right. But when I needed to extract a field from the key part, I got
>> stuck: I realized that in deserialize(Writable o), o is an instance of the
>> value class, and I don't know how to access the key object.
>>
>> It could be that I am missing something in the configuration in the Java code or
>> in the declaration in Hive.
>>
>>
>>
>> Thanks,
>> Guy
>>
>>
>>
>>
>>
>
> Hive ignores the key! (I know, how crazy, right?) What I have done is
> use my InputFormat to combine the key and the value and make the
> combined field the value.
>

This approach should work. A simple way to do it is to convert your
custom Writables to Text at this point.

Source:               Writable A (name:car type:ford), Writable B (windows:4)
InputFormat (result): Byte[0], "car\tford\t4"

From this point you can just use Hive's delimited SerDe as normal.
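
To make the combining step concrete, here is a minimal, self-contained Java
sketch of a wrapping reader. It deliberately does not use the Hadoop classes,
so everything in it is an assumption made for illustration: `Reader` is a
stripped-down stand-in for org.apache.hadoop.mapred.RecordReader, `Fields` is a
hypothetical stand-in for the custom key and value Writables, and a
StringBuilder plays the role of Text.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CombinedReaderSketch {

    /** Stand-in for Hadoop's RecordReader, reduced to the one method that matters here. */
    interface Reader<K, V> {
        boolean next(K key, V value);
    }

    /** Hypothetical mutable record, playing the role of a custom Writable. */
    static final class Fields {
        String[] values = new String[0];

        String toDelimited() {
            return String.join("\t", values);
        }
    }

    /** Wraps a key/value reader and emits "key fields <tab> value fields" as the value. */
    static final class CombinedReader implements Reader<Object, StringBuilder> {
        private final Reader<Fields, Fields> proxy;
        // The wrapper owns the inner key and value, so both are visible in next().
        private final Fields innerKey = new Fields();
        private final Fields innerValue = new Fields();

        CombinedReader(Reader<Fields, Fields> proxy) {
            this.proxy = proxy;
        }

        @Override
        public boolean next(Object ignoredKey, StringBuilder value) {
            if (!proxy.next(innerKey, innerValue)) {
                return false; // underlying input is exhausted
            }
            // Overwrite the reusable output value with the combined line.
            value.setLength(0);
            value.append(innerKey.toDelimited())
                 .append('\t')
                 .append(innerValue.toDelimited());
            return true;
        }
    }

    /** In-memory source used only to exercise the wrapper; row[0] is the key, row[1] the value. */
    static Reader<Fields, Fields> fromRows(List<String[][]> rows) {
        Iterator<String[][]> it = rows.iterator();
        return (key, value) -> {
            if (!it.hasNext()) {
                return false;
            }
            String[][] row = it.next();
            key.values = row[0];
            value.values = row[1];
            return true;
        };
    }

    public static void main(String[] args) {
        List<String[][]> rows = new ArrayList<>();
        rows.add(new String[][] {{"car", "ford"}, {"4"}});

        CombinedReader reader = new CombinedReader(fromRows(rows));
        StringBuilder line = new StringBuilder();
        while (reader.next(null, line)) {
            System.out.println(line); // one tab-delimited line per record
        }
    }
}
```

In a real wrapper the inner key and value would presumably come from
proxy.createKey() and proxy.createValue(), the outer value type would be Text,
and the outer key would be a throwaway, since Hive ignores it anyway.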

If your source input is set up in such a way that you cannot decode it
in the InputFormat stage, you probably need to write your own SerDe, as
the SerDe will have access to both the Hive table information and the
source data.
