Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

2015-04-14 Thread Slava Markeyev
This is something I've encountered when doing ETL with Hive and having it create tens of thousands of partitions. The issue is that each partition needs to be added to the metastore, and this is an expensive operation to perform. My workaround was adding a flag to Hive that optionally disables the

Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

2015-04-14 Thread Edward Capriolo
That is too many partitions. Way too much overhead in anything that has that many partitions. On Tue, Apr 14, 2015 at 12:53 PM, Tianqi Tong tt...@brightedge.com wrote: Hi Slava and Ferdinand, Thanks for the reply! Later when I was looking at the hive.log, I found Hive was indeed calculating

Re: partition and bucket

2015-04-14 Thread Ashok Kumar
Thank you sir. Much appreciated. On Sunday, 12 April 2015, 21:05, Mich Talebzadeh m...@peridale.co.uk wrote:

RE: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-14 Thread Xu, Cheng A
Congrats Mithun! From: Gunther Hagleitner [mailto:ghagleit...@hortonworks.com] Sent: Wednesday, April 15, 2015 8:10 AM To: d...@hive.apache.org; Chris Drome; user@hive.apache.org Cc: mit...@apache.org Subject: Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan Congrats Mithun! Thanks,

Re: partition and bucket

2015-04-14 Thread Devopam Mittra
+1 quite well explained. liked it much regards Dev On Mon, Apr 13, 2015 at 1:34 AM, Mich Talebzadeh m...@peridale.co.uk wrote: Hi, I will try to have a go at your points but I am sure there are many experts around. As you may know already in RDBMS partitioning (dividing a very large

Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-14 Thread Lefty Leverenz
Congrats Mithun -- when they gave me the cape, they called it a cloak of invisibility. But the only thing it makes invisible is itself. Maybe I should open a jira -- Lefty On Tue, Apr 14, 2015 at 9:03 PM, Xu, Cheng A cheng.a...@intel.com wrote: Congrats Mithun! *From:* Gunther

Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-14 Thread Mithun RK
Thank you, chaps. :] One is happy to contribute. This is an honour, and more than a little daunting. Many thanks, Mithun P.S. Where do I pick up my cape? I was told there were capes... On Tue, Apr 14, 2015 at 5:10 PM Gunther Hagleitner ghagleit...@hortonworks.com wrote: Congrats Mithun!

Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-14 Thread Jimmy Xiang
Congrats! On Tue, Apr 14, 2015 at 8:46 PM, Lefty Leverenz leftylever...@gmail.com wrote: Congrats Mithun -- when they gave me the cape, they called it a cloak of invisibility. But the only thing it makes invisible is itself. Maybe I should open a jira -- Lefty. On Tue, Apr 14, 2015

Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-14 Thread Gunther Hagleitner
Congrats Mithun! Thanks, Gunther. From: Chao Sun c...@cloudera.com Sent: Tuesday, April 14, 2015 3:48 PM To: d...@hive.apache.org; Chris Drome Cc: user@hive.apache.org; mit...@apache.org Subject: Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-14 Thread Prasanth Jayachandran
Congrats Mithun! Thanks, Prasanth. On Tue, Apr 14, 2015 at 8:51 PM -0700, Jimmy Xiang jxi...@cloudera.com wrote: Congrats! On Tue, Apr 14, 2015 at 8:46 PM, Lefty Leverenz leftylever...@gmail.com wrote: Congrats Mithun -- when they gave

Re: External Table with unclosed orc files.

2015-04-14 Thread Chad Dotzenrod
unsubscribe On Tue, Apr 14, 2015 at 4:28 PM, Gopal Vijayaraghavan gop...@apache.org wrote: 0.14. Acid tables have been a real pain for us. We don’t believe they are production ready. At least in our use cases, Tez crashes for assorted reasons or only assigns 1 mapper to the partition.

Re: External Table with unclosed orc files.

2015-04-14 Thread Grant Overby (groverby)
The remainder of my ranting paragraph is intended as an expansion on that comment. Sorry, I wasn’t clear. Grant Overby Software Engineer Cisco.com http://www.cisco.com/ grove...@cisco.com Mobile: 865 724 4910 Think before you print. This email may contain confidential and privileged material

[ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-14 Thread Carl Steinbach
The Apache Hive PMC has voted to make Mithun Radhakrishnan a committer on the Apache Hive Project. Please join me in congratulating Mithun. Thanks. - Carl

RE: External Table with unclosed orc files.

2015-04-14 Thread Mich Talebzadeh
Hi, I believe in the same way as UNIX file/partitions behave. If the file is opened by the first process writing to it, a swap file will be created. If the second process is querying it only, then it will see the data at the time of last save by the first process but not the changes after

Default schema

2015-04-14 Thread Maciek
Is it possible to customize the schema user logs on to? I was thinking of setting some bash environment variable or setting param file (like hive-env.sh, hiverc or hive-site.xml…)?

Re: External Table with unclosed orc files.

2015-04-14 Thread Alan Gates
It will fail. ORC writes info in the footers that is required to properly read the file. If close hasn't been called, then that footer hasn't been written yet. Alan. Grant Overby (groverby) grove...@cisco.com April 14, 2015 at 20:46 What will Hive do if querying an external table

Re: External Table with unclosed orc files.

2015-04-14 Thread Gopal Vijayaraghavan
What will Hive do if querying an external table containing orc files that are still being written to? Doing that directly won't work at all, because ORC files are only readable after the Footer is written out, which won't be for any open files. I won't be able to test these scenarios till
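
The footer point made in this thread can be sketched with a toy file format. This is only an illustration under stated assumptions, not ORC's actual on-disk layout: the names ToyWriter, read_rows and the TOYF marker are hypothetical. The writer appends its row index only when close() is called, so a reader that starts from the end of the file has nothing to work with until then.

```python
import io
import json
import struct

# Hypothetical end-of-file marker for this toy format (not ORC's real magic).
MAGIC = b"TOYF"


class ToyWriter:
    """Writes rows back to back; the index of rows is only written on close()."""

    def __init__(self, stream):
        self.stream = stream
        self.offsets = []  # (start, length) of every row written so far

    def write_row(self, row: bytes) -> None:
        self.offsets.append((self.stream.tell(), len(row)))
        self.stream.write(row)

    def close(self) -> None:
        # Footer = JSON row index, then its 4-byte length, then the marker.
        footer = json.dumps(self.offsets).encode("utf-8")
        self.stream.write(footer)
        self.stream.write(struct.pack(">I", len(footer)))
        self.stream.write(MAGIC)


def read_rows(data: bytes) -> list:
    """Reads rows by locating the footer at the end of the data."""
    if len(data) < 8 or data[-4:] != MAGIC:
        raise ValueError("no footer found: the file was never closed")
    (footer_len,) = struct.unpack(">I", data[-8:-4])
    footer_start = len(data) - 8 - footer_len
    if footer_start < 0:
        raise ValueError("corrupt footer length")
    offsets = json.loads(data[footer_start:len(data) - 8])
    return [data[start:start + length] for start, length in offsets]


if __name__ == "__main__":
    buf = io.BytesIO()
    writer = ToyWriter(buf)
    writer.write_row(b"alpha")
    writer.write_row(b"beta")
    try:
        read_rows(buf.getvalue())  # still open: no footer yet
    except ValueError as err:
        print("read failed:", err)
    writer.close()
    print(read_rows(buf.getvalue()))
```

In this sketch the row bytes are already in the buffer before close(), yet the reader rejects the file because it cannot find the index. That mirrors the behaviour described above, where a query over a file that is still open fails rather than returning partial data.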

Re: External Table with unclosed orc files.

2015-04-14 Thread Grant Overby (groverby)
IIRC the HW Trucking Demo creates a temporary table from csv files of the new data then issues a select … insert into an orc table. For the love of google, I can’t find this demo atm, and I’m out of time. If I recall correctly, this strikes me as suboptimal compared to writing orc files

Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-14 Thread Chris Drome
Congratulations Mithun! On Tuesday, April 14, 2015 2:57 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Mithun Radhakrishnan a committer on the Apache Hive Project. Please join me in congratulating Mithun. Thanks. - Carl

Re: Default schema

2015-04-14 Thread Bala Krishna Gangisetty
Yes, certainly. There are a couple of ways to do this. One such way is to define an alias for hive --database *custom_database* --Bala G. On Tue, Apr 14, 2015 at 1:30 PM, Maciek mac...@sonra.io wrote: Is it possible to customize the schema user logs on to? I was thinking of setting some bash

Re: External Table with unclosed orc files.

2015-04-14 Thread Grant Overby (groverby)
Thanks for the link to the hive streaming bolt. We rolled our own bolt many moons ago to utilize hive streaming. We’ve tried it against 0.13 and 0.14. Acid tables have been a real pain for us. We don’t believe they are production ready. At least in our use cases, Tez crashes for assorted reasons

RE: External Table with unclosed orc files.

2015-04-14 Thread Mich Talebzadeh
Hi Grant, Thanks for the insight. You mentioned, and I quote, "Acid tables have been a real pain for us. We don’t believe they are production ready." Can you please elaborate on this? Thanks, Mich Talebzadeh http://talebzadehmich.wordpress.com Author of the books A Practitioner’s Guide to

Re: Default schema

2015-04-14 Thread Maciek
Thought about that, but not sure if it's the most suitable one. Would you mind sharing those other ways? Thanks! On Tue, Apr 14, 2015 at 9:49 PM, Bala Krishna Gangisetty b...@altiscale.com wrote: Yes, certainly. There are a couple of ways to do this. One such way is to define an alias for hive

Re: External Table with unclosed orc files.

2015-04-14 Thread Gopal Vijayaraghavan
0.14. Acid tables have been a real pain for us. We don’t believe they are production ready. At least in our use cases, Tez crashes for assorted reasons or only assigns 1 mapper to the partition. Having delta files and no base files borks mapper assignments. Some of the chicken-egg problems for

Re: Default schema

2015-04-14 Thread matshyeq
For the following I suggested: …or setting param file (like hive-env.sh, hiverc or hive-site.xml…)? I don't know what property or variable to set up. Would you provide an example excerpt? Thank you, Kind Regards ~Maciek On Tue, Apr 14, 2015 at 10:43 PM, Bala Krishna Gangisetty