Add keys to column family in HBase through Hive/Python

2015-04-22 Thread Manoj Venkatesh
Dear All, I have a Hadoop cluster which has Hive, HBase installed along with other Hadoop components. I am currently exploring ways to automate a data migration process from Hive to HBase which involves new columns of data added every so often. I was successful in creating an HBase table using
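A minimal sketch of creating an HBase-backed table from Hive via the HBase storage handler, as the sender describes; the table name, column family, and column names here are illustrative assumptions, not the sender's actual schema:

```sql
-- Hypothetical sketch: expose an HBase table to Hive through the
-- HBaseStorageHandler. 'events', 'cf1', and the column names are
-- placeholder assumptions.
CREATE TABLE hbase_events (
  rowkey STRING,
  payload STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:payload')
TBLPROPERTIES ('hbase.table.name' = 'events');
```

New columns added to the Hive side would need a matching entry in `hbase.columns.mapping`, which is one reason this kind of migration is often scripted.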

Re: Transactional table read lifecycle

2015-04-22 Thread Alan Gates
Whether you obtain a read lock depends on the guarantees you want to make to your readers. Obtaining the lock will do a couple of things your users might want: 1) It will prevent DDL statements such as DROP TABLE from removing the data while they are reading it. 2) It will prevent the compactor
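A sketch of the configuration under which Hive takes these shared read locks, assuming the lock-aware transaction manager is in use (the table name is a placeholder):

```sql
-- Sketch: enable the transaction manager so reads acquire shared locks
-- that block DROP TABLE and compactor cleanup while readers are active.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Inspect locks currently held on a table ('my_acid_table' is a placeholder):
SHOW LOCKS my_acid_table;
```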

Re: Parsing and moving data to ORC from HDFS

2015-04-22 Thread Gopal Vijayaraghavan
> In production we run HDP 2.2.4. Any thought when crazy stuff like bloom filters might move to GA? I'd say that it will be in the next release, considering it is already checked into hive-trunk. Bloom filters aren't too crazy today. They are written within the ORC file right next to the row-in
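A sketch of how per-column bloom filters are requested on an ORC table; the table and column names are assumptions, while the `orc.bloom.filter.*` property names are the standard ORC table properties:

```sql
-- Sketch: ask ORC to write bloom filters for a lookup column, stored
-- alongside the row-group indexes inside the file. Names are placeholders.
CREATE TABLE events_orc (
  event_id STRING,
  ts       BIGINT
)
STORED AS ORC
TBLPROPERTIES (
  'orc.bloom.filter.columns' = 'event_id',
  'orc.bloom.filter.fpp'     = '0.05'   -- target false-positive probability
);
```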

Re: Parsing and moving data to ORC from HDFS

2015-04-22 Thread Kjell Tore Fossbakk
Hey Gopal. Thanks for your answers. I did some followups; On Wed, Apr 22, 2015 at 3:46 PM, Gopal Vijayaraghavan wrote: > > I have about 100 TB of data, approximately 180 billion events, in my HDFS cluster. It is my raw data stored as GZIP files. At the time of setup this was due to "sav

RE: MapredContext not available when tez enabled

2015-04-22 Thread Frank Luo
Gopal, Here is basically my code and I can clearly see configure() was not called and JavaCode on GenericUDF#configure reads: "This is only called in runtime of MapRedTask.". Also based on my observation, the query is not executed as a M/R because Yarn monitoring knows nothing about the job.
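Since the Javadoc quoted above says `GenericUDF#configure` is only called from `MapRedTask`, one possible workaround sketch is pinning the affected query back to the MapReduce engine; the UDF and table names below are placeholders:

```sql
-- Possible workaround sketch: force MapReduce execution for the query
-- that depends on GenericUDF#configure, which Tez does not invoke.
SET hive.execution.engine=mr;
SELECT my_udf(col) FROM some_table;  -- my_udf / some_table are placeholders
```

A more Tez-friendly alternative discussed in later Hive versions is having the UDF fall back gracefully when `configure` was never called.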

Re: Parsing and moving data to ORC from HDFS

2015-04-22 Thread Gopal Vijayaraghavan
> I have about 100 TB of data, approximately 180 billion events, in my HDFS cluster. It is my raw data stored as GZIP files. At the time of setup this was due to "saving the data" until we figured out what to do with it. > After attending @t3rmin4t0r's ORC 2015 session @hadoopsummit in Brusse

Re: Parsing and moving data to ORC from HDFS

2015-04-22 Thread Kjell Tore Fossbakk
It is worth mentioning that it is 100 TB raw size, approximately 19 TB with gzip -9 (best/slowest compression) On Wed, Apr 22, 2015 at 2:50 PM, Kjell Tore Fossbakk wrote: > Hello user@hive.apache.org > > I have about 100 TB of data, approximately 180 billion events, in my HDFS > cluster. It is my raw da

Parsing and moving data to ORC from HDFS

2015-04-22 Thread Kjell Tore Fossbakk
Hello user@hive.apache.org I have about 100 TB of data, approximately 180 billion events, in my HDFS cluster. It is my raw data stored as GZIP files. At the time of setup this was due to "saving the data" until we figured out what to do with it. After attending @t3rmin4t0r's ORC 2015 session @had
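A minimal sketch of the gzip-to-ORC conversion this thread is about: read the raw gzipped files through an external text table, then rewrite them into an ORC table. The paths, schema, and compression codec are assumptions for illustration:

```sql
-- Sketch: Hive reads .gz text files transparently via the external table,
-- then INSERT ... SELECT rewrites the data in ORC format.
-- Path, column layout, and codec are placeholder assumptions.
CREATE EXTERNAL TABLE events_raw (line STRING)
LOCATION '/data/raw/events';

CREATE TABLE events_orc (line STRING)
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'ZLIB');

INSERT OVERWRITE TABLE events_orc
SELECT line FROM events_raw;
```

In practice the raw lines would usually be parsed into typed columns during the `SELECT`, so ORC's column encodings and indexes have real structure to work with.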

Re: Question on MAPJOIN Vs JOIN performance

2015-04-22 Thread Harsha HN
Hi, Thanks for your reply. I will go through the link. By the way my hive version is 0.12 Thanks, Harsha On Fri, Apr 17, 2015 at 4:16 AM, Lefty Leverenz wrote: > Harsha, that document is from 2010. What version of Hive are you using? > > Here's some up-to-date information in the Hive wiki: J

Question on Hive Join performance

2015-04-22 Thread Harsha HN
Hi All, I went through below mentioned Facebook engineering page, https://www.facebook.com/notes/facebook-engineering/join-optimization-in-apache-hive/470667928919 I set following for auto conversion of joins, set hive.auto.convert.join=true; set hive.mapjoin.smalltable.filesize=1
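A sketch of the auto map-join settings mentioned above; the 25 MB threshold shown is the commonly cited default for `hive.mapjoin.smalltable.filesize`, not the sender's actual (truncated) value:

```sql
-- Sketch: let Hive convert a join to a map-side join when one input
-- is small enough to cache in memory on each mapper.
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;  -- size threshold in bytes
```

When the smaller table fits under the threshold, Hive ships it to every task and avoids the shuffle phase entirely, which is where the map-join speedup comes from.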