Thanks for your help. If we want the same result from Hive, Spark, or another framework, how should we go about it? Could you explain in detail?
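
To make the question concrete, here is a minimal sketch of one possible approach. It assumes the key is a string and that each framework can be given a custom partitioner. The class StablePartitioner and its methods stableHash and partitionFor are hypothetical names for illustration, not part of Hive, Spark, or Flink. The idea is to hash the key's bytes deterministically instead of relying on Object.hashCode(), so that every framework using the same function and the same partition count sends a given key to the same partition. This is not Hive's ObjectInspectorUtils::hashCode(); matching Hive's actual row placement would require reproducing that function for each column type.

```java
import java.nio.charset.StandardCharsets;

public final class StablePartitioner {
    private StablePartitioner() {}

    // Hypothetical helper: deterministic hash over the key's UTF-8 bytes.
    // It depends only on the key's content, never on object identity.
    public static int stableHash(String key) {
        int h = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + b;
        }
        return h;
    }

    // Map a key to a partition in [0, numPartitions).
    // Masking the sign bit keeps the modulo result non-negative.
    public static int partitionFor(String key, int numPartitions) {
        return (stableHash(key) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"a", "ab", "abc"}) {
            System.out.println(key + " -> " + partitionFor(key, 4));
        }
    }
}
```

The same partitionFor logic would then have to be plugged into each framework's own custom partitioning hook, with the same number of partitions everywhere.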

Regards,
Philip

On Thu, Oct 22, 2015 at 6:25 PM, Gopal Vijayaraghavan <[email protected]> wrote:

> > When applying [Distribute By] on Hive to the framework, the function
> > should be partitionByHash on Flink. This is to spread out all the rows
> > distributed by a hash key from Object Class in Java.
>
> Hive does not use the Object hashCode - the identityHashCode is
> inconsistent, so Object.hashCode() .
>
> ObjectInspectorUtils::hashCode() is the hashcode used by the DBY in hive
> (SORT BY uses a Random number generator).
>
> Cheers,
> Gopal

--
==========================================================
*Hae Joon Lee*

Now, in Germany,
M.S. Candidate, Interested in Distributed System, Iterative Processing
Dept. of Computer Science, Informatik in German, TUB
Technical University of Berlin

In Korea,
M.S. Candidate, Computer Architecture Laboratory
Dept. of Computer Science, KAIST
Rm# 4414 CS Dept. KAIST
373-1 Guseong-dong, Yuseong-gu, Daejon, South Korea (305-701)

Mobile) 49) 015-251-448-278 in Germany, no cellular in Korea
==========================================================
