Thanks for your help.

So, if we want the same result from Hive and Spark, or from another
framework, how could we achieve this?
Could you explain in detail?
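
To make the question concrete, here is a minimal sketch of what I have in
mind. It is only my assumption, not Hive's actual ObjectInspectorUtils logic:
if every framework assigned a row to a partition from the bytes of the key
(rather than from an object identity hash), the same key would land in the
same partition everywhere. The names stable_hash and partition_for are
hypothetical helpers made up for this illustration.

```python
def stable_hash(key: str) -> int:
    """Content-based 32-bit hash over the UTF-8 bytes of the key.

    Hypothetical helper: it depends only on the key's content, so it
    gives the same value in every process and on every machine.
    """
    h = 0
    for b in key.encode("utf-8"):
        h = (31 * h + b) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions)."""
    return stable_hash(key) % num_partitions


# Rows with the same key always go to the same partition.
rows = ["apple", "banana", "cherry", "apple"]
assignment = [(row, partition_for(row, 4)) for row in rows]
```

Whether the partitions then match Hive's would depend on using exactly the
same hash function and the same number of partitions, which this sketch does
not guarantee.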

Regards,
Philip

On Thu, Oct 22, 2015 at 6:25 PM, Gopal Vijayaraghavan wrote:

>
> > When applying [Distribute By] on Hive to the framework, the function
> >should be partitionByHash on Flink. This is to spread out all the rows
> >distributed by a hash key from Object Class in Java.
>
> Hive does not use the Object hashCode - the identityHashCode is
> inconsistent, so Object.hashCode() cannot be relied on.
>
> ObjectInspectorUtils::hashCode() is the hashcode used by the DBY in hive
> (SORT BY uses a Random number generator).
>
> Cheers,
> Gopal
>
>


-- 

==========================================================

*Hae Joon Lee*


Now, in Germany,

M.S. Candidate, Interested in Distributed System, Iterative Processing

Dept. of Computer Science, Informatik in German, TUB

Technical University of Berlin


In Korea,

M.S. Candidate, Computer Architecture Laboratory

Dept. of Computer Science, KAIST


Rm# 4414 CS Dept. KAIST

373-1 Guseong-dong, Yuseong-gu, Daejon, South Korea (305-701)


Mobile) 49) 015-251-448-278 in Germany, no cellular in Korea

==========================================================
