On bucketing: fewer files than buckets
Hello,

In the documentation I read that as many files are created in each partition as there are buckets. In the following sample script, I created 32 buckets, but only find 2 files in each partition directory. Am I missing something?

In this sample script, I'm trying to load a tab-separated file from disk into the table trades, and then transfer the data into alltrades, based on the example at:
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL/BucketedTables

BTW, another question: how does one put comments in a hive .q file?

Sample script:

    SET hive.enforce.bucketing=TRUE;

    CREATE TABLE trades (symbol STRING, time STRING, exchange STRING, price FLOAT, volume INT)
    PARTITIONED BY (dt STRING)
    CLUSTERED BY (symbol) SORTED BY (time ASC) INTO 1 BUCKETS
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    LOAD DATA LOCAL INPATH 'data/2001-05-22' INTO TABLE trades PARTITION (dt='2001-05-22');

    CREATE TABLE alltrades (symbol STRING, time STRING, exchange STRING, price FLOAT, volume INT)
    PARTITIONED BY (dt STRING)
    CLUSTERED BY (symbol) SORTED BY (time ASC) INTO 32 BUCKETS
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    FROM trades
    INSERT OVERWRITE TABLE alltrades PARTITION (dt='2001-05-22')
    SELECT symbol, time, exchange, price, volume
    WHERE dt='2001-05-22';
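As background on how rows map to buckets: Hive hashes the CLUSTERED BY column and takes it modulo the bucket count. A rough sketch in Java — String.hashCode() here is only a stand-in, the exact hash Hive applies internally may differ:

```java
public class BucketSketch {
    // Sketch of bucket assignment: hash the CLUSTERED BY value and take it
    // modulo the bucket count. String.hashCode() is a stand-in for Hive's
    // actual hash function, which may differ.
    static int bucketFor(String symbol, int numBuckets) {
        return (symbol.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Every row with the same symbol lands in the same bucket, so a
        // small number of distinct symbols can leave most buckets empty.
        System.out.println(bucketFor("IBM", 32));
        System.out.println(bucketFor("IBM", 32) == bucketFor("IBM", 32));
    }
}
```

The sketch only illustrates that bucket placement is deterministic per clustering key; it does not by itself explain the 2-files-per-partition observation.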
RE: Sequence file- custom serdes - question
Thanks, I eventually did it in the following way:

If next() (the method of RecordReader) returns true, it now holds the current key and the current value. I made my value implement the interface:

    public interface ValueHoldsKey<K> {
        K getKey();
        void setKey(K k);
    }

Then I changed the wrapper declaration to:

    public class CombinedSequenceRecordReader<K, V extends ValueHoldsKey<K>>
            implements RecordReader<K, V>

and changed the code of next() to:

    @Override
    public boolean next(K key, V value) throws IOException {
        boolean retVal = proxy.next(key, value);
        if (retVal) {
            value.setKey(key);
        }
        return retVal;
    }

Now in the custom SerDe I can use my getKey() method.

Hope that helps someone.

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Monday, January 17, 2011 4:36 PM
To: user@hive.apache.org
Subject: Re: Sequence file- custom serdes - question

[...]
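The pattern described at the top of this message — copying the key into the value after each successful next() — can be sketched in plain Java. This is a simplified stand-in: a list iterator takes the place of Hadoop's RecordReader, plain Strings take the place of Writables, and TradeValue is a hypothetical toy type invented for the sketch:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Iterator;

// Toy interface following the names used in this thread.
interface ValueHoldsKey<K> {
    K getKey();
    void setKey(K k);
}

// Hypothetical value type standing in for the custom Writable.
class TradeValue implements ValueHoldsKey<String> {
    final String payload;
    private String key;
    TradeValue(String payload) { this.payload = payload; }
    public String getKey() { return key; }
    public void setKey(String k) { this.key = k; }
}

public class ValueHoldsKeyDemo {
    // Mimics the wrapper's next(): on a successful read, copy the key
    // into the value so downstream code (the SerDe) can reach it even
    // though Hive only hands it the value object.
    static TradeValue next(Iterator<SimpleEntry<String, TradeValue>> reader) {
        if (!reader.hasNext()) return null;
        SimpleEntry<String, TradeValue> record = reader.next();
        TradeValue value = record.getValue();
        value.setKey(record.getKey());
        return value;
    }

    public static void main(String[] args) {
        Iterator<SimpleEntry<String, TradeValue>> reader = Arrays.asList(
                new SimpleEntry<>("2001-05-22", new TradeValue("IBM"))).iterator();
        TradeValue v = next(reader);
        // The value object now carries its key.
        System.out.println(v.getKey() + " " + v.payload);
    }
}
```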
Re: Sequence file- custom serdes - question
On Mon, Jan 17, 2011 at 9:20 AM, Guy Doulberg wrote:
> Thanks Edward,
>
> But I don't understand your suggestion.
>
> How do I convert the custom object that I have to text? And where?
> In the createValue method?
>
> Thanks again
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Monday, January 17, 2011 4:13 PM
> To: user@hive.apache.org
> Subject: Re: Sequence file- custom serdes - question
>
> [...]
>
> This approach should work. A simple approach is to convert your
> custom Writable to Text at this point.
>
> source: Writable A (name:car type:ford), Writable B (windows:4)
> InputFormat (result): Byte[0], "car\tford\t4"
>
> From this point you can just use the Hive delimited SerDe as normal.
>
> If your source input is set up in such a way that you cannot decode it
> in the InputFormat stage, you probably need to write your own SerDe, as
> the SerDe will have access to the Hive table information and the
> source data.

If you know the type of your Key and Value, you can cast them into a known type and then write some kind of toString() on them. I do this when I know K and V are ALWAYS Text,Text. However, this is short-cutting the process a bit. Your InputFormat should return Key/Value objects, and the SerDe is supposed to interrogate the data from th
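The source-to-InputFormat example above can be sketched in plain Java. Plain string arrays stand in for the fields of the key and value Writables; the field values car/ford/4 come directly from the example in the message:

```java
public class WritableFlattenSketch {
    // Sketch of the conversion described above: fields from the key
    // Writable (name=car, type=ford) and the value Writable (windows=4)
    // are joined with tabs so Hive's stock delimited SerDe can parse the
    // combined line. Strings stand in for the actual Writable fields.
    static String toDelimitedRow(String[] keyFields, String[] valueFields) {
        StringBuilder sb = new StringBuilder();
        for (String f : keyFields) {
            if (sb.length() > 0) sb.append('\t');
            sb.append(f);
        }
        for (String f : valueFields) {
            if (sb.length() > 0) sb.append('\t');
            sb.append(f);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // key (car, ford) + value (4) -> "car\tford\t4"
        System.out.println(toDelimitedRow(
                new String[] {"car", "ford"}, new String[] {"4"}));
    }
}
```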
RE: Sequence file- custom serdes - question
Thanks Edward,

But I don't understand your suggestion.

How do I convert the custom object that I have to text? And where?
In the createValue method?

Thanks again

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Monday, January 17, 2011 4:13 PM
To: user@hive.apache.org
Subject: Re: Sequence file- custom serdes - question

2011/1/17 Guy Doulberg :
> Hey again,
>
> I thought it would be easy to combine the key and the value, but I ran
> into difficulties. I wonder if someone has made a generic FileInputFormat
> that prepends the key to the value?
>
> [...]
>
> Now I am trying to extend createValue in such a way that I will also have
> the key. Any suggestions?

This approach should work. A simple approach is to convert your custom Writable to Text at this point.

source: Writable A (name:car type:ford), Writable B (windows:4)
InputFormat (result): Byte[0], "car\tford\t4"

From this point you can just use the Hive delimited SerDe as normal.

If your source input is set up in such a way that you cannot decode it in the InputFormat stage, you probably need to write your own SerDe, as the SerDe will have access to the Hive table information and the source data.
Re: Sequence file- custom serdes - question
2011/1/17 Guy Doulberg :
> Hey again,
>
> I thought it would be easy to combine the key and the value, but I ran
> into difficulties. I wonder if someone has made a generic FileInputFormat
> that prepends the key to the value?
>
> [...]
>
> Now I am trying to extend createValue in such a way that I will also have
> the key. Any suggestions?
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Sunday, January 16, 2011 10:33 PM
> To: user@hive.apache.org
> Subject: Re: Sequence file- custom serdes - question
>
> [...]
>
> Hive ignores the Key! (I know, how crazy, right?) What I have done is
> used my InputFormat to combine the key and the value and make the
> combined field the value.

This approach should work. A simple approach is to convert your custom Writable to Text at this point.

source: Writable A (name:car type:ford), Writable B (windows:4)
InputFormat (result): Byte[0], "car\tford\t4"

From this point you can just use the Hive delimited SerDe as normal.

If your source input is set up in such a way that you cannot decode it in the InputFormat stage, you probably need to write your own SerDe, as the SerDe will have access to the Hive table information and the source data.
RE: Sequence file- custom serdes - question
Hey again,

I thought it would be easy to combine the key and the value, but I ran into difficulties. I wonder if someone has made a generic FileInputFormat that prepends the key to the value?

Anyhow, here is the code I am trying to write.

I have a class that extends SequenceFileInputFormat:

    public class CombinedSequenceFileInputFormat<K extends Writable, V extends Writable>
            extends SequenceFileInputFormat<K, V> {

        @Override
        public org.apache.hadoop.mapred.RecordReader<K, V> getRecordReader(
                org.apache.hadoop.mapred.InputSplit split, JobConf job,
                Reporter reporter) throws IOException {
            CombinedSequenceRecordReader<K, V> wrap = new CombinedSequenceRecordReader<K, V>(
                    super.getRecordReader(split, job, reporter));
            return wrap;
        }
    }

It returns the wrapped RecordReader, and the code of that wrapper is:

    public class CombinedSequenceRecordReader<K, V> implements RecordReader<K, V> {

        private RecordReader<K, V> proxy;
        private K currentKey;

        public CombinedSequenceRecordReader(RecordReader<K, V> proxy) {
            this.proxy = proxy;
        }

        public void setProxy(RecordReader<K, V> proxy) {
            this.proxy = proxy;
        }

        public RecordReader<K, V> getProxy() {
            return proxy;
        }

        @Override
        public boolean next(K key, V value) throws IOException {
            return proxy.next(key, value);
        }

        @Override
        public K createKey() {
            currentKey = proxy.createKey();
            return currentKey;
        }

        @Override
        public V createValue() {
            return proxy.createValue();
        }

        @Override
        public long getPos() throws IOException {
            return proxy.getPos();
        }

        @Override
        public void close() throws IOException {
            proxy.close();
        }

        @Override
        public float getProgress() throws IOException {
            return proxy.getProgress();
        }
    }

Now I am trying to extend createValue in such a way that I will also have the key. Any suggestions?
-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Sunday, January 16, 2011 10:33 PM
To: user@hive.apache.org
Subject: Re: Sequence file- custom serdes - question

2011/1/16 Guy Doulberg :
> Hey all,
>
> I am new to this Hive thing, but I have a very complex task to perform and
> I am a little stuck. I hope someone here can help.
>
> My team has been storing data in a custom sequence file that has a custom
> key and a custom value. We want to expose a Hive interface to query this
> data, so I have been trying to write a custom SerDe that deserializes the
> sequence file to a Hive table.
>
> As long as I needed values from the value part of the object, everything
> was all right, but when I needed to extract a value from the key part, I
> got stuck: I realized that in the deserialize(Writable o) method, o is an
> instance of the value class, and I don't know how I can access the key
> object.
>
> It could be I am missing something in the configuration in the Java code
> or in the declaration in Hive.
>
> Thanks,
> Guy

Hive ignores the Key! (I know, how crazy, right?) What I have done is used my InputFormat to combine the key and the value and make the combined field the value.
Re: Can't drop table
Hi All,

I'm using PostgreSQL for the Hive metastore in production. As far as I know, MySQL has limitations on constraints in Unicode environments: I remember Hive could not create the metastore schema automatically with MySQL, InnoDB and UTF-8 encoding, so I switched to PostgreSQL for the metastore.

It does work with PostgreSQL, but the first time one runs the Hive CLI against PostgreSQL there are problems creating the metastore schema, because JDO does not work well with PostgreSQL. I created the metastore schema manually; after that, everything works fine.

Regards,
- Youngwoo

2011/1/17 Edward Capriolo
> You are the first person I have heard of using postgres. I commend you
> for not succumbing to the social pressure and just installing mysql.
> However, I would advise succumbing to the social pressure and using
> either derby or mysql.
>
> The reason I say this is because jpox "has support" for a number of
> data stores (M$ SQL Server); however, people have run into issues with
> them. Databases other than derby and mysql 'should work' but are
> generally untested.
>
> Edward
>
> On Mon, Jan 17, 2011 at 2:52 AM, wd wrote:
> > Finally I found that with hive-0.5.0-bin, 'drop table' hangs the first
> > time; after Ctrl-C kills the client, running hive again can successfully
> > drop the table. With hive-0.6.0-bin, it always hangs there.
> >
> > 2011/1/6 wd
> >> hi,
> >>
> >> I've set up a single-node hadoop and hive. I can create tables in hive,
> >> but can't drop them; the hive cli hangs there with no further info.
> >>
> >> hive-0.6.0-bin
> >> hadoop-0.20.2
> >> jre1.6.0_23
> >> postgresql-9.0-801.jdbc4.jar (have tried postgresql-8.4-701.jdbc4.jar)
> >> pgsql 9.0.2
> >>
> >> How do I find out what went wrong? thx.
Re: Can't drop table
Hi all,

We use Postgres as a metastore for Hive and haven't come across any problems. The Postgres driver jar is postgresql-8.4.701-jdbc.jar. We use the version of Hive that comes out of Cloudera's CDH3b2, which I believe is some variant of Hive 0.5.0.

Java: HotSpot 1.6.0_20
OS: Ubuntu Lucid (10.04)
Postgres: 8.4.4
Postgres JDBC driver: postgresql-8.4.701-jdbc.jar
Hadoop: 0.20.2+320 (Cloudera CDH3b2)
Hive: 0.5.0+20 (Cloudera CDH3b2)

Thanks,
Jamie

On 17 January 2011 13:13, Edward Capriolo wrote:
> You are the first person I have heard of using postgres. I commend you
> for not succumbing to the social pressure and just installing mysql.
> However, I would advise succumbing to the social pressure and using
> either derby or mysql.
>
> The reason I say this is because jpox "has support" for a number of
> data stores (M$ SQL Server); however, people have run into issues with
> them. Databases other than derby and mysql 'should work' but are
> generally untested.
>
> Edward
>
> On Mon, Jan 17, 2011 at 2:52 AM, wd wrote:
> > Finally I found that with hive-0.5.0-bin, 'drop table' hangs the first
> > time; after Ctrl-C kills the client, running hive again can successfully
> > drop the table. With hive-0.6.0-bin, it always hangs there.
> >
> > 2011/1/6 wd
> >> hi,
> >>
> >> I've set up a single-node hadoop and hive. I can create tables in hive,
> >> but can't drop them; the hive cli hangs there with no further info.
> >>
> >> hive-0.6.0-bin
> >> hadoop-0.20.2
> >> jre1.6.0_23
> >> postgresql-9.0-801.jdbc4.jar (have tried postgresql-8.4-701.jdbc4.jar)
> >> pgsql 9.0.2
> >>
> >> How do I find out what went wrong? thx.
Re: Can't drop table
You are the first person I have heard of using postgres. I commend you for not succumbing to the social pressure and just installing mysql. However, I would advise succumbing to the social pressure and using either derby or mysql.

The reason I say this is because jpox "has support" for a number of data stores (M$ SQL Server); however, people have run into issues with them. Databases other than derby and mysql 'should work' but are generally untested.

Edward

On Mon, Jan 17, 2011 at 2:52 AM, wd wrote:
> Finally I found that with hive-0.5.0-bin, 'drop table' hangs the first
> time; after Ctrl-C kills the client, running hive again can successfully
> drop the table. With hive-0.6.0-bin, it always hangs there.
>
> 2011/1/6 wd
>> hi,
>>
>> I've set up a single-node hadoop and hive. I can create tables in hive,
>> but can't drop them; the hive cli hangs there with no further info.
>>
>> hive-0.6.0-bin
>> hadoop-0.20.2
>> jre1.6.0_23
>> postgresql-9.0-801.jdbc4.jar (have tried postgresql-8.4-701.jdbc4.jar)
>> pgsql 9.0.2
>>
>> How do I find out what went wrong? thx.