Dear All,
I have a Hadoop cluster which has Hive and HBase installed along with other
Hadoop components. I am currently exploring ways to automate a data migration
process from Hive to HBase, in which new columns of data are added every so
often. I was successful in creating an HBase table using
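For reference, one common way to do this is with the HBase storage handler;
here is a minimal sketch (the table, column family, and column names are
hypothetical, not taken from the thread):

  -- Hive table backed by an HBase table via the storage handler.
  CREATE TABLE hbase_events (rowkey STRING, payload STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:payload')
  TBLPROPERTIES ('hbase.table.name' = 'events');

  -- Data can then be copied from an existing Hive table with a plain INSERT.
  INSERT OVERWRITE TABLE hbase_events
  SELECT id, payload FROM hive_events;

Note that newly added columns would mean updating hbase.columns.mapping,
which is part of what makes automating this non-trivial.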
Whether you obtain a read lock depends on the guarantees you want to
make to your readers. Obtaining the lock will do a couple of things
your users might want:
1) It will prevent DDL statements such as DROP TABLE from removing the
data while it is being read.
2) It will prevent the compactor from cleaning up files that are still
being read (see the sketch below).
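For reference, explicit locks can be taken and inspected from HiveQL; a
minimal sketch, assuming a hypothetical table named events and a lock manager
that supports explicit locks (e.g. the ZooKeeper-based one):

  LOCK TABLE events SHARED;    -- take a read (shared) lock
  SELECT COUNT(*) FROM events;
  SHOW LOCKS events;           -- inspect who currently holds locks
  UNLOCK TABLE events;         -- release the explicit lock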
> In production we run HDP 2.2.4. Any thought when crazy stuff like bloom
>filters might move to GA?
I'd say that it will be in the next release, considering it is already
checked into hive-trunk.
Bloom filters aren't too crazy today. They are written within the ORC file
right next to the row index.
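For anyone wanting to try them, bloom filters are enabled per column through
ORC table properties; a minimal sketch with hypothetical table and column
names:

  CREATE TABLE events_orc (user_id BIGINT, payload STRING)
  STORED AS ORC
  TBLPROPERTIES (
    'orc.bloom.filter.columns' = 'user_id',  -- columns to build bloom filters for
    'orc.bloom.filter.fpp'     = '0.05'      -- target false-positive probability
  );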
Hey Gopal,
Thanks for your answers. I have some follow-ups:
On Wed, Apr 22, 2015 at 3:46 PM, Gopal Vijayaraghavan wrote:
>
> > I have about 100 TB of data, approximately 180 billion events, in my
> > HDFS cluster. It is my raw data stored as GZIP files. At the time of
> > setup this was due to "saving the data" until we figured out what to do
> > with it.
Gopal,
Here is basically my code, and I can clearly see that configure() was not
called. The Javadoc on GenericUDF#configure reads: "This is only called in
runtime of MapRedTask." Also, based on my observation, the query is not
executed as an M/R job, because YARN monitoring knows nothing about it.
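One plausible cause: if the query is simple enough, Hive may run it as a
local fetch task instead of launching an M/R job, which would explain both
why configure() never fires and why YARN sees nothing. A way to test that
theory (UDF and table names below are hypothetical):

  -- Force even simple queries through MapReduce so MapRedTask runtime
  -- hooks such as GenericUDF#configure actually run.
  set hive.fetch.task.conversion=minimal;  -- or 'none' on later Hive versions
  SELECT my_udf(col) FROM some_table;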
> I have about 100 TB of data, approximately 180 billion events, in my
>HDFS cluster. It is my raw data stored as GZIP files. At the time of
>setup this was due to "saving the data" until we figured out what to do
>with it.
>
> After attending @t3rmin4t0r's ORC 2015 session @hadoopsummit in Brussels
It is worth mentioning that it is 100 TB raw size, approximately 19 TB with
gzip -9 (best/slowest compression), i.e. roughly a 5:1 ratio.
On Wed, Apr 22, 2015 at 2:50 PM, Kjell Tore Fossbakk wrote:
> Hello user@hive.apache.org
>
> I have about 100 TB of data, approximately 180 billion events, in my HDFS
> cluster. It is my raw data stored as GZIP files.
Hello user@hive.apache.org
I have about 100 TB of data, approximately 180 billion events, in my HDFS
cluster. It is my raw data stored as GZIP files. At the time of setup this
was due to "saving the data" until we figured out what to do with it.
After attending @t3rmin4t0r's ORC 2015 session @hadoopsummit in Brussels
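Presumably the goal is to convert the gzip data into ORC; a minimal sketch,
assuming a hypothetical raw_events table already declared over the gzip
files:

  -- Rewrite the raw data as ORC in one pass with a CTAS.
  CREATE TABLE events_orc
  STORED AS ORC
  TBLPROPERTIES ('orc.compress' = 'ZLIB')  -- ORC's default codec
  AS SELECT * FROM raw_events;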
Hi,
Thanks for your reply. I will go through the link.
By the way, my Hive version is 0.12.
Thanks,
Harsha
On Fri, Apr 17, 2015 at 4:16 AM, Lefty Leverenz wrote:
> Harsha, that document is from 2010. What version of Hive are you using?
>
> Here's some up-to-date information in the Hive wiki: Join Optimization
Hi All,
I went through the below-mentioned Facebook engineering page:
https://www.facebook.com/notes/facebook-engineering/join-optimization-in-apache-hive/470667928919
I set the following for auto-conversion of joins:
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=1
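For reference on these knobs, hive.mapjoin.smalltable.filesize is a byte
threshold (the default is 25000000, i.e. 25 MB): if one side of the join is
smaller than this, it is loaded into memory and the join runs map-side. A
minimal sketch with hypothetical table names:

  set hive.auto.convert.join=true;
  set hive.mapjoin.smalltable.filesize=25000000;  -- bytes; small side must fit below this
  -- small_dim is under the threshold, so this should become a map join.
  SELECT f.id, d.name
  FROM fact_events f
  JOIN small_dim d ON f.dim_id = d.id;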