Thanks, I eventually did it in the following way:
If next (the method of RecordReader) returns true, then the reader now holds the current key and the current value.

I made my value class implement this interface:

    public interface ValueHoldsKey<K> {
        K getKey();
        void setKey(K k);
    }

Then I changed the wrapper to the following:

    public class CombinedSequenceRecordReader<K extends Writable, V extends ValueHoldsKey<K>>
            implements RecordReader<K, V>

And changed the code of next to:

    @Override
    public boolean next(K key, V value) throws IOException {
        boolean retVal = proxy.next(key, value);
        if (retVal) {
            // Copy the key into the value so the SerDe can reach it.
            value.setKey(key);
        }
        return retVal;
    }

Now in the custom SerDe I can use my getKey method.

Hope that helps someone.

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Monday, January 17, 2011 4:36 PM
To: user@hive.apache.org
Subject: Re: Sequence file- custom serdes - question

On Mon, Jan 17, 2011 at 9:20 AM, Guy Doulberg <guy.doulb...@conduit.com> wrote:
> Thanks Edward,
>
> But I don't understand your suggestion,
>
> How do I convert the custom object that I have to text?
>
> And where?
> In the createValue method?
>
> Thanks again
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Monday, January 17, 2011 4:13 PM
> To: user@hive.apache.org
> Subject: Re: Sequence file- custom serdes - question
>
> 2011/1/17 Guy Doulberg <guy.doulb...@conduit.com>:
>> Hey again,
>>
>> I thought it would be easy to combine the key and the value, however I ran
>> into difficulties. I wonder if someone has made a generic FileInputFormat
>> that prepends the key to the value?
>>
>> Anyhow, here is the code I am trying to write:
>>
>> I have a class that extends SequenceFileInputFormat:
>>
>>     public class CombinedSequenceFileInputFormat<K extends Writable, V extends Writable>
>>             extends SequenceFileInputFormat<K, V> {
>>
>>         @Override
>>         public org.apache.hadoop.mapred.RecordReader<K, V> getRecordReader(
>>                 org.apache.hadoop.mapred.InputSplit split, JobConf job,
>>                 Reporter reporter) throws IOException {
>>             // Wrap the stock SequenceFile reader so its behavior can be customized.
>>             CombinedSequenceRecordReader<K, V> wrap =
>>                     new CombinedSequenceRecordReader<K, V>(
>>                             super.getRecordReader(split, job, reporter));
>>             return wrap;
>>         }
>>     }
>>
>> And then I return the wrapped RecordReader; the code of that wrapper is:
>>
>>     public class CombinedSequenceRecordReader<K extends Writable, V>
>>             implements RecordReader<K, V> {
>>
>>         private RecordReader<K, V> proxy;
>>         private K currentKey;
>>
>>         public CombinedSequenceRecordReader(RecordReader<K, V> proxy) {
>>             this.proxy = proxy;
>>         }
>>
>>         public void setProxy(RecordReader<K, V> proxy) {
>>             this.proxy = proxy;
>>         }
>>
>>         public RecordReader<K, V> getProxy() {
>>             return proxy;
>>         }
>>
>>         @Override
>>         public boolean next(K key, V value) throws IOException {
>>             return proxy.next(key, value);
>>         }
>>
>>         @Override
>>         public K createKey() {
>>             // Remember the key object handed out to the caller.
>>             currentKey = proxy.createKey();
>>             return currentKey;
>>         }
>>
>>         @Override
>>         public V createValue() {
>>             V val = proxy.createValue();
>>             return val;
>>         }
>>
>>         @Override
>>         public long getPos() throws IOException {
>>             return proxy.getPos();
>>         }
>>
>>         @Override
>>         public void close() throws IOException {
>>             proxy.close();
>>         }
>>
>>         @Override
>>         public float getProgress() throws IOException {
>>             return proxy.getProgress();
>>         }
>>     }
>>
>> Now I am trying to extend createValue in such a way that I will also have
>> the key. Any suggestions?
>>
>> -----Original Message-----
>> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
>> Sent: Sunday, January 16, 2011 10:33 PM
>> To: user@hive.apache.org
>> Subject: Re: Sequence file- custom serdes - question
>>
>> 2011/1/16 Guy Doulberg <guy.doulb...@conduit.com>:
>>> Hey all,
>>>
>>> I am new to this Hive thing, but I have a very complex task to perform and I
>>> am a little stuck. I hope someone here can help.
>>>
>>> My team has been storing data in a custom sequence file that has a custom
>>> key and a custom value. We want to expose a Hive interface to query this
>>> data.
>>> I have been trying to write a custom SerDe that deserializes the sequence
>>> file into a Hive table.
>>>
>>> As long as I needed values from the value part of the object everything was
>>> all right, but when I needed to extract a value from the key part, I got
>>> stuck. I realized that in the method deserialize(Writable o), o is an
>>> instance of the value class, and I don't know how I can access the
>>> key object.
>>>
>>> It could be that I am missing something in the configuration, in the Java
>>> code, or in the declaration in Hive.
>>>
>>> Thanks,
>>> Guy
>>>
>>
>> Hive ignores the key! (I know, how crazy, right?) What I have done is
>> use my InputFormat to combine the key and the value and make the
>> combined field the value.
>>
>
> This approach should work. A simple approach is to convert your
> custom Writable to Text at this point.
>
> source: Writable A (name:car type:ford) Writable B (windows:4)
> InputFormat(Result): Byte[0],"car\tford\t4"
>
> From this point you can just use the Hive delimited SerDe as normal.
>
> If your source input is set up in such a way that you cannot decode it
> in the InputFormat stage, you probably need to write your own SerDe, as
> the SerDe will have access to the Hive table information and the
> source data.
>

If you know the type of your key and value, you can cast them to a known type and then write some kind of toString() on them. I do this when I know K and V are ALWAYS Text, Text.

However, this is shortcutting the process a bit. Your InputFormat should return key and value objects, and the SerDe is supposed to interrogate the data from them, but in some cases you do not need both an InputFormat and a SerDe, just one or the other.
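
For anyone following along, here is a minimal sketch of the "convert the key and value to text" idea from the thread, using the car/ford/4 example. It uses only the JDK so it can be run on its own: KeyValueFlattener and flatten are hypothetical names, not part of Hadoop or Hive, and plain strings stand in for the fields of the custom Writable key and value. It also assumes no field contains a tab or newline, since those would break a tab-delimited row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hadoop or Hive class): folds the key's fields
// into the value by emitting one tab-delimited row per record.
final class KeyValueFlattener {
    private KeyValueFlattener() {}

    /** Returns the key fields followed by the value fields, separated by tabs. */
    static String flatten(List<String> keyFields, List<String> valueFields) {
        List<String> all = new ArrayList<>(keyFields);
        all.addAll(valueFields);
        for (String field : all) {
            // Assumption: fields never contain the row or column delimiters.
            if (field.indexOf('\t') >= 0 || field.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("field contains a delimiter: " + field);
            }
        }
        return String.join("\t", all);
    }

    public static void main(String[] args) {
        // Key A (name:car type:ford) and value B (windows:4) from the thread.
        String row = flatten(Arrays.asList("car", "ford"), Arrays.asList("4"));
        System.out.println(row);
    }
}
```

In a real InputFormat the equivalent logic would presumably live in the wrapping RecordReader's next method and write the joined string into the value handed back to Hive, so that a delimited SerDe can split it into columns.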