Re: Hive huge 'startup time'

2014-07-18 Thread diogo
are, that is what you
> should try to tune first.
>
> On Fri, Jul 18, 2014 at 9:36 AM, diogo wrote:
>
>> This is probably a simple question, but I'm noticing that for queries
>> that run on 1+TB of data, it can take Hive up to 30 minutes to actually

Hive huge 'startup time'

2014-07-18 Thread diogo
This is probably a simple question, but I'm noticing that for queries that run on 1+TB of data, it can take Hive up to 30 minutes to actually start the first map-reduce stage. What is it doing? I imagine it's gathering information about the data somehow; this 'startup' time is clearly a function of

Multiple joins cause failures in Reduce phase

2014-07-10 Thread diogo
So, I have a query like this:

    select user.id, ud_name.value as name, ud_age.value as age
    from user
    left outer join user_data ud_name
      on user.id = ud_name.user_id and ud_name.key = 'name'
    left outer join user_data ud_age
      on user.id = ud_age.user_id and ud_age.key = 'age'
    ...
    ;

With multiple joins
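The join pattern in this query can be sanity-checked on a small scale before running it on the cluster. The following is a minimal sketch, assuming SQLite (via Python's `sqlite3` module) as a local stand-in for Hive and using hypothetical sample rows. Because each `key` filter sits in the `ON` clause rather than in a `WHERE` clause, users with no `name` or `age` row are still returned, with NULLs in those columns. It only illustrates the join semantics. It says nothing about why Hive's reduce phase fails.

```python
import sqlite3


def run_query():
    # In-memory SQLite database used as a stand-in for Hive (assumption, not Hive itself).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (id INTEGER);
        CREATE TABLE user_data (user_id INTEGER, key TEXT, value TEXT);
        -- Hypothetical sample rows: user 2 has no 'age' row, user 3 has no rows at all.
        INSERT INTO user VALUES (1), (2), (3);
        INSERT INTO user_data VALUES
            (1, 'name', 'alice'), (1, 'age', '30'),
            (2, 'name', 'bob');
    """)
    # Same shape as the query in the thread: one left outer join per attribute,
    # with the key filter in the ON clause so unmatched users are kept.
    rows = conn.execute("""
        SELECT user.id, ud_name.value AS name, ud_age.value AS age
        FROM user
        LEFT OUTER JOIN user_data ud_name
          ON user.id = ud_name.user_id AND ud_name.key = 'name'
        LEFT OUTER JOIN user_data ud_age
          ON user.id = ud_age.user_id AND ud_age.key = 'age'
        ORDER BY user.id
    """).fetchall()
    conn.close()
    return rows


print(run_query())
# [(1, 'alice', '30'), (2, 'bob', None), (3, None, None)]
```

Moving `ud_age.key = 'age'` into a `WHERE` clause would instead drop users 2 and 3, because the filter would then be applied to the NULLs produced by the outer join.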