Thanks,

I eventually did it in the following way:

If next (the method of RecordReader) returns true, then the key and value 
objects passed to it now hold the current key and the current value.

I made my value class implement the following interface:

public interface ValueHoldsKey<K> {
    K getKey();
    void setKey(K k);
}


Then I changed the wrapper declaration to the following:

public class CombinedSequenceRecordReader<K extends Writable, V extends ValueHoldsKey<K>>
        implements RecordReader<K, V>

And changed the code of next to:

        @Override
        public boolean next(K key, V value) throws IOException {
                boolean retVal = proxy.next(key, value);
                if (retVal){
                        value.setKey(key);
                }
                return retVal;
        }


Now, in the custom SerDe, I can call my getKey method on the value to reach the key.
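
For anyone who wants to try the pattern without a Hadoop classpath, here is a minimal sketch. It is not the real Hadoop code: SimpleRecordReader is a simplified stand-in I made up for the next(K, V) part of org.apache.hadoop.mapred.RecordReader (the real method also throws IOException), and DemoKey, DemoValue, ListReader and KeyAttachingReader are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in (assumption) for the next(K, V) contract of
// org.apache.hadoop.mapred.RecordReader; the real method also throws IOException.
interface SimpleRecordReader<K, V> {
    boolean next(K key, V value);
}

// The interface from the message: a value that can carry its key.
interface ValueHoldsKey<K> {
    K getKey();
    void setKey(K k);
}

// Hypothetical key type, mutable so the reader can refill it in place.
class DemoKey {
    String id;
}

// Hypothetical value type that implements ValueHoldsKey.
class DemoValue implements ValueHoldsKey<DemoKey> {
    int count;
    private DemoKey key;

    @Override public DemoKey getKey() { return key; }
    @Override public void setKey(DemoKey k) { this.key = k; }
}

// Hypothetical in-memory reader standing in for the sequence-file reader.
class ListReader implements SimpleRecordReader<DemoKey, DemoValue> {
    private final Iterator<String[]> rows;

    ListReader(List<String[]> rows) { this.rows = rows.iterator(); }

    @Override
    public boolean next(DemoKey key, DemoValue value) {
        if (!rows.hasNext()) {
            return false;
        }
        String[] row = rows.next();
        key.id = row[0];                         // refill the caller's key object
        value.count = Integer.parseInt(row[1]);  // refill the caller's value object
        return true;
    }
}

// The wrapper: delegates to the proxy, then attaches the key to the value.
class KeyAttachingReader<K, V extends ValueHoldsKey<K>> implements SimpleRecordReader<K, V> {
    private final SimpleRecordReader<K, V> proxy;

    KeyAttachingReader(SimpleRecordReader<K, V> proxy) { this.proxy = proxy; }

    @Override
    public boolean next(K key, V value) {
        boolean hasRecord = proxy.next(key, value);
        if (hasRecord) {
            value.setKey(key);  // only attach the key when a record was actually read
        }
        return hasRecord;
    }
}

class KeyAttachingDemo {
    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(new String[] {"a", "1"}, new String[] {"b", "2"});
        SimpleRecordReader<DemoKey, DemoValue> reader = new KeyAttachingReader<>(new ListReader(rows));
        DemoKey key = new DemoKey();
        DemoValue value = new DemoValue();
        while (reader.next(key, value)) {
            // A SerDe-like consumer sees only the value, yet can still reach the key.
            System.out.println(value.getKey().id + " " + value.count);
        }
    }
}
```

One thing to keep in mind: the old mapred API reuses the same key and value objects across calls to next, so the value holds a reference to that reused key, not a copy. Read whatever you need from getKey before the next record is fetched.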

Hope that helps someone.


-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Monday, January 17, 2011 4:36 PM
To: user@hive.apache.org
Subject: Re: Sequence file- custom serdes - question

On Mon, Jan 17, 2011 at 9:20 AM, Guy Doulberg <guy.doulb...@conduit.com> wrote:
> Thanks Edward,
>
> But I don't understand your suggestion,
>
> How do I convert the custom object that I have to text?
>
> And where?
> In the createValue method?
>
> Thanks again
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Monday, January 17, 2011 4:13 PM
> To: user@hive.apache.org
> Subject: Re: Sequence file- custom serdes - question
>
> 2011/1/17 Guy Doulberg <guy.doulb...@conduit.com>:
>> Hey again,
>>
>> I thought it would be easy to combine the key and the value, but I ran 
>> into difficulties. Has anyone written a generic FileInputFormat that 
>> prepends the key to the value?
>>
>> Anyhow here is the code I am trying to write:
>>
>> I have a class that extends the SequenceFileInputFormat
>>
>> public class CombinedSequenceFileInputFormat<K extends Writable,V extends 
>> Writable > extends SequenceFileInputFormat<K, V> {
>>
>>
>>    @Override
>>    public org.apache.hadoop.mapred.RecordReader<K, V> getRecordReader(
>>            org.apache.hadoop.mapred.InputSplit split, JobConf job,
>>            Reporter reporter) throws IOException {
>>        // TODO Auto-generated method stub
>>
>>        CombinedSequenceRecordReader<K, V> wrap =  new 
>> CombinedSequenceRecordReader<K, V>(super.getRecordReader(split, job, 
>> reporter));
>>
>>        return wrap;
>>    }
>>
>> }
>>
>> I then return the wrapped RecordReader. The code of that wrapper is:
>>
>> public class CombinedSequenceRecordReader<K extends Writable,V > implements 
>> RecordReader<K, V> {
>>
>>    private RecordReader<K, V> proxy;
>>    private K currentKey;
>>
>>    public CombinedSequenceRecordReader(RecordReader<K, V> proxy){
>>        this.proxy = proxy;
>>    }
>>
>>    public void setProxy(RecordReader<K, V> proxy) {
>>        this.proxy = proxy;
>>    }
>>
>>    public RecordReader<K, V> getProxy() {
>>        return proxy;
>>    }
>>
>>    @Override
>>    public boolean next(K key, V value) throws IOException {
>>
>>        return proxy.next(key, value);
>>    }
>>
>>    @Override
>>    public K createKey() {
>>        currentKey = proxy.createKey() ;
>>        return currentKey;
>>    }
>>
>>    @Override
>>    public V createValue() {
>>        V val = proxy.createValue();
>>        return val;
>>    }
>>
>>    @Override
>>    public long getPos() throws IOException {
>>        // TODO Auto-generated method stub
>>        return proxy.getPos();
>>    }
>>
>>    @Override
>>    public void close() throws IOException {
>>        proxy.close();
>>
>>    }
>>
>>    @Override
>>    public float getProgress() throws IOException {
>>        // TODO Auto-generated method stub
>>        return proxy.getProgress();
>>    }
>>
>>
>>
>> }
>>
>>
>> Now I am trying to extend createValue in such a way that the value will 
>> also hold the key. Any suggestions?
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
>> Sent: Sunday, January 16, 2011 10:33 PM
>> To: user@hive.apache.org
>> Subject: Re: Sequence file- custom serdes - question
>>
>> 2011/1/16 Guy Doulberg <guy.doulb...@conduit.com>:
>>> Hey all,
>>>
>>> I am new to Hive, but I have a very complex task to perform and I 
>>> am a little stuck. I hope someone here can help.
>>>
>>> My team has been storing data in a custom sequence file that has a custom 
>>> key and a custom value. We want to expose a Hive interface to query this 
>>> data.
>>> I have been trying to write a custom SerDe that deserializes the sequence 
>>> file into a Hive table.
>>>
>>> As long as I only needed fields from the value part of the record, 
>>> everything was all right. But when I needed to extract a field from the 
>>> key part, I got stuck: I realized that in the method 
>>> deserialize(Writable o), o is an instance of the value class, and I 
>>> don't know how to access the key object.
>>>
>>> It could be that I am missing something in the configuration, in the Java 
>>> code, or in the declaration in Hive.
>>>
>>>
>>>
>>> Thanks,
>>> Guy
>>>
>>>
>>>
>>>
>>>
>>
>> Hive ignores the key! (I know, how crazy, right?) What I have done is
>> use my InputFormat to combine the key and the value and make the
>> combined field the value.
>>
>
> This approach should work. A simple approach is to convert your
> custom Writable to Text at this point.
>
> source:    Writable A( name:car type:ford) Writable B ( windows:4)
> InputFormat(Result):    Byte[0],"car\tford\t4"
>
> From this point you can just use Hive's delimited SerDe as normal.
>
> If your source input is set up in such a way that you cannot decode it
> in the InputFormat stage, you probably need to write your own SerDe, as
> the SerDe will have access to the Hive table information and the
> source data.
>

If you know the types of your key and value, you can cast them to a
known type and then write some kind of toString() for them.

I do this when I know K and V are ALWAYS Text, Text.

However, this is shortcutting the process a bit. Your InputFormat
should return key/value objects, and the SerDe is supposed to
extract the data from them. In some cases, though, you do not need both
an InputFormat and a SerDe, just one or the other.
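
The cast-then-toString shortcut above could look roughly like the sketch below. It reuses the car/ford/4 example from earlier in the thread. CarKey, CarValue and RowFlattener are hypothetical names made up for illustration, and plain Java classes stand in for the real Writable types so the sketch runs without Hadoop.

```java
// Hypothetical key type matching the example "name:car type:ford".
class CarKey {
    final String name;
    final String type;

    CarKey(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

// Hypothetical value type matching the example "windows:4".
class CarValue {
    final int windows;

    CarValue(int windows) {
        this.windows = windows;
    }
}

class RowFlattener {
    // Casts the untyped key and value to their known types and joins
    // their fields into one tab-delimited row.
    static String toDelimitedRow(Object key, Object value) {
        CarKey k = (CarKey) key;      // throws ClassCastException if the type is not what we assumed
        CarValue v = (CarValue) value;
        return String.join("\t", k.name, k.type, Integer.toString(v.windows));
    }

    public static void main(String[] args) {
        // Prints the fields separated by tabs: car, ford, 4
        System.out.println(toDelimitedRow(new CarKey("car", "ford"), new CarValue(4)));
    }
}
```

In a real InputFormat the resulting string would presumably be wrapped in a Text and returned as the value. Note that Hive's default field delimiter is Ctrl-A (\001), not tab, so the table would need to be declared with FIELDS TERMINATED BY '\t' for a row like this to split into columns.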
