Re: Hive, Tez, clustering, buckets, and Presto

2018-04-03 Thread Edward Capriolo
True. The spec does not mandate that the bucket files be there if they are empty (missing directories are 0-row tables).

Thanks,
Edward

On Tue, Apr 3, 2018 at 4:42 PM, Richard A. Bross wrote:
> Gopal,
>
> The Presto devs say they are willing to make the changes to

Re: Hive, Tez, clustering, buckets, and Presto

2018-04-03 Thread Richard A. Bross
Gopal,

The Presto devs say they are willing to make the changes to adhere to the Hive bucket spec. I quoted "Presto could fix their fail-safe for bucketing implementation to actually trust the Hive bucketing spec & get you out of this mess - the bucketing contract for Hive is actual file

Re: Hive, Tez, clustering, buckets, and Presto

2018-04-03 Thread Richard A. Bross
Gopal,

Thanks for this. Great information and something to look at more closely to better understand the internals.

Rick

- Original Message -
From: "Gopal Vijayaraghavan"
To: user@hive.apache.org
Sent: Tuesday, April 3, 2018 3:15:46 AM
Subject: Re: Hive, Tez,

Re: Hive, Tez, clustering, buckets, and Presto

2018-04-03 Thread Gopal Vijayaraghavan
>* I'm interested in your statement that CLUSTERED BY does not CLUSTER BY.
> My understanding was that this was related to the number of buckets, but you
> are relating it to ORC stripes. It is odd that no examples that I've seen
> include the SORTED BY statement other than in relation to