Re: non map-reduce for simple queries

2012-07-31 Thread Owen O'Malley
On Mon, Jul 30, 2012 at 9:12 PM, Namit Jain nj...@fb.com wrote: The total number of bytes of the input will be used to determine whether or not to launch a map-reduce job for this query. That was in my original mail. However, given any complex where condition and the lack of column statistics

Re: non map-reduce for simple queries

2012-07-31 Thread Namit Jain
On 7/31/12 12:01 PM, Owen O'Malley omal...@apache.org wrote: On Mon, Jul 30, 2012 at 9:12 PM, Namit Jain nj...@fb.com wrote: The total number of bytes of the input will be used to determine whether or not to launch a map-reduce job for this query. That was in my original mail. However,

Re: non map-reduce for simple queries

2012-07-31 Thread Owen O'Malley
On Mon, Jul 30, 2012 at 11:38 PM, Namit Jain nj...@fb.com wrote: That would be difficult. The % done can be estimated from the data already read. I'm confused. Wouldn't the maximum size of the data remaining over the maximum size of the original query give a reasonable approximation of the
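A minimal sketch of the ratio described above, assuming the truncated sentence refers to estimating how much of the query is done. The function name and parameters are hypothetical and are not part of Hive.

```python
def estimate_fraction_done(bytes_remaining: int, total_bytes: int) -> float:
    """Approximate progress as 1 - (data remaining / original input size).

    Both arguments are assumed to be upper bounds on the byte counts,
    as suggested in the message above.
    """
    if total_bytes <= 0:
        # Nothing to read, so treat the query as complete.
        return 1.0
    # Clamp so that a bad estimate never yields a value outside [0, 1].
    remaining = min(max(bytes_remaining, 0), total_bytes)
    return 1.0 - remaining / total_bytes
```

For example, with 25 bytes remaining out of an original 100, the estimate is 0.75.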

Re: non map-reduce for simple queries

2012-07-31 Thread Namit Jain
On 7/31/12 9:23 PM, Owen O'Malley omal...@apache.org wrote: On Mon, Jul 30, 2012 at 11:38 PM, Namit Jain nj...@fb.com wrote: That would be difficult. The % done can be estimated from the data already read. I'm confused. Wouldn't the maximum size of the data remaining over the maximum size

Re: non map-reduce for simple queries

2012-07-30 Thread Owen O'Malley
On Sat, Jul 28, 2012 at 6:17 PM, Navis류승우 navis@nexr.com wrote: I was thinking of timeout for fetching, 2000msec for example. How about that? Instead of time, which requires launching the query and letting it timeout, how about determining the number of bytes that would need to be fetched

Re: non map-reduce for simple queries

2012-07-30 Thread Navis류승우
It also supports table sampling: select * from src TABLESAMPLE (BUCKET 1 OUT OF 40 ON key); select * from src TABLESAMPLE (0.25 PERCENT); However, there is no sampling option that specifies a number of bytes. That can be handled in a separate issue. 2012/7/31 Owen O'Malley omal...@apache.org On Sat, Jul 28,

Re: non map-reduce for simple queries

2012-07-30 Thread Namit Jain
The total number of bytes of the input will be used to determine whether or not to launch a map-reduce job for this query. That was in my original mail. However, given any complex where condition and the lack of column statistics in Hive, we cannot determine the number of bytes that would be needed
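The size-based decision described above can be sketched as follows. This is only an illustration: the function name and the threshold parameter are hypothetical, and the thread does not state what threshold value Hive would use.

```python
def should_skip_mapreduce(total_input_bytes: int, threshold_bytes: int) -> bool:
    """Return True when the total input is small enough to fetch directly.

    total_input_bytes is the size of all input files for the query, not the
    (unknown) number of bytes that actually satisfy the where condition.
    """
    if total_input_bytes < 0 or threshold_bytes < 0:
        raise ValueError("byte counts must be non-negative")
    return total_input_bytes <= threshold_bytes
```

Because the check uses the total input size, it is conservative: a selective where condition over a large table would still launch a map-reduce job.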

Re: non map-reduce for simple queries

2012-07-29 Thread Namit Jain
I like Navis's idea. The timeout can be configurable. On 7/29/12 6:47 AM, Navis류승우 navis@nexr.com wrote: I was thinking of timeout for fetching, 2000msec for example. How about that? On Sunday, July 29, 2012, Edward Capriolo edlinuxg...@gmail.com wrote: If where condition is too complex, selecting

Re: non map-reduce for simple queries

2012-07-29 Thread Namit Jain
This can be a follow-up to HIVE-2925. Navis, if you want, I can work on it. On 7/29/12 7:58 PM, Namit Jain nj...@fb.com wrote: I like Navis's idea. The timeout can be configurable. On 7/29/12 6:47 AM, Navis류승우 navis@nexr.com wrote: I was thinking of timeout for fetching, 2000msec for

non map-reduce for simple queries

2012-07-28 Thread Namit Jain
Currently, Hive does not launch map-reduce jobs for the following queries: select * from T where <condition on partition columns> (limit n)? This behavior is not configurable and cannot be altered. HIVE-2925 wants to extend this behavior. The goal is not to spawn map-reduce jobs for the

Re: non map-reduce for simple queries

2012-07-28 Thread Edward Capriolo
If where condition is too complex, selecting specific columns seems simple enough and useful. On Saturday, July 28, 2012, Namit Jain nj...@fb.com wrote: Currently, Hive does not launch map-reduce jobs for the following queries: select * from T where <condition on partition columns> (limit n)?

Re: non map-reduce for simple queries

2012-07-28 Thread Navis류승우
I was thinking of timeout for fetching, 2000msec for example. How about that? On Sunday, July 29, 2012, Edward Capriolo edlinuxg...@gmail.com wrote: If where condition is too complex, selecting specific columns seems simple enough and useful. On Saturday, July 28, 2012, Namit Jain nj...@fb.com wrote:
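A rough sketch of the fetch timeout idea, using the 2000 msec figure from the message as the default. The function, its parameters, and the choice to signal a fallback by returning None are all assumptions for illustration; the thread does not say what should happen once the timeout expires.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def fetch_with_timeout(
    rows: Iterable[T],
    timeout_ms: int = 2000,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[List[T]]:
    """Collect rows directly, giving up once the time budget is spent.

    Returns the fetched rows, or None if the deadline passed first, in
    which case the caller could fall back to a map-reduce job (assumed).
    """
    deadline = clock() + timeout_ms / 1000.0
    fetched: List[T] = []
    for row in rows:
        # The deadline is only checked between rows, so a single slow
        # read is not interrupted by this sketch.
        if clock() > deadline:
            return None
        fetched.append(row)
    return fetched
```

As noted in a reply above, a time-based limit requires starting the fetch and letting it time out, whereas a byte-based limit can be decided before any data is read.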