Re: Hi, Hive People urgent question about [Distribute By] function

2015-10-27 Thread Philip Lee
Hello, the same question about DISTRIBUTE BY on Hive. Accorring to you, you do not use hashCode of Object class on DBY, Distribute By. I tried to understand how ObjectInspectorUtils works for distribution, but it seemed it has a lot of Hive API. It is not much understnading. I want to override

Re: Hi, Hive People urgent question about [Distribute By] function

2015-10-27 Thread Gopal Vijayaraghavan
> I want to override partitionByHash function on Flink like the same way >of DBY on Hive. > I am working on implementing some benchmark system for these two system, >which could be contritbutino to Hive as well. I would be very disappointed if Flink fails to outperform Hive with a Distribute BY,

Re: Hi, Hive People urgent question about [Distribute By] function

2015-10-24 Thread Philip Lee
Hello, the same question about DISTRIBUTE BY on Hive. Accorring to you, you do not use hashCode of Object class on DBY, Distribute By. I tried to understand how ObjectInspectorUtils works for distribution, but it seemed it has a lot of Hive API. It is not much understnading. I want to override

Re: Hi, Hive People urgent question about [Distribute By] function

2015-10-22 Thread Gopal Vijayaraghavan
> so do you think if we want the same result from Hive and Spark or the >other freamwork, how could we try this one ? There's a special backwards compat slow codepath that gets triggered if you do set mapred.reduce.tasks=199; (or any number) This will produce the exact same hash-code as the

Hi, Hive People urgent question about [Distribute By] function

2015-10-22 Thread Philip Lee
Hello, I am working on Flink and Spark majoring in Computer Science in Berlin. I have the important question. Well, this question is from what I do these days, which is translations Hive Query to Flink. When applying [Distribute By] on Hive to the framework, the function should be

Re: Hi, Hive People urgent question about [Distribute By] function

2015-10-22 Thread Philip Lee
Thanks for your help. so do you think if we want the same result from Hive and Spark or the other freamwork, how could we try this one ? could you tell me in detail. Regards, Philip On Thu, Oct 22, 2015 at 6:25 PM, Gopal Vijayaraghavan wrote: > > > When applying

Re: Hi, Hive People urgent question about [Distribute By] function

2015-10-22 Thread Gopal Vijayaraghavan
> When applying [Distribute By] on Hive to the framework, the function >should be partitionByHash on Flink. This is to spread out all the rows >distributed by a hash key from Object Class in Java. Hive does not use the Object hashCode - the identityHashCode is inconsistent, so Object.hashCode()