But do you think I can change the default behavior? For example, I have ten nodes in my cluster, and my table, which has 1000 rows, is stored on only two of them. With the default behavior, only two nodes will work on a map/reduce task, won't they?
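To make the scenario concrete, here is a toy model in plain Python. It is not the Hadoop or HBase API: the node names and function names are made up for illustration, and it assumes the scheduler runs each task exactly on the node its split names, which a real cluster may not do. It compares the default placement with the manual placement I ask about below.

```python
from collections import Counter

# Hypothetical cluster: ten nodes, table stored on only two of them.
NODES = [f"node{i}" for i in range(1, 11)]
DATA_NODES = ["node1", "node2"]
TOTAL_ROWS = 1000


def default_splits():
    """One split per data-holding node, placed where the data is stored."""
    rows_per_split = TOTAL_ROWS // len(DATA_NODES)
    return [
        (i * rows_per_split, (i + 1) * rows_per_split, node)
        for i, node in enumerate(DATA_NODES)
    ]


def custom_splits(rows_per_split=100):
    """Fixed-size splits handed out round-robin, ignoring data location."""
    starts = range(0, TOTAL_ROWS, rows_per_split)
    return [
        (start, min(start + rows_per_split, TOTAL_ROWS), NODES[i % len(NODES)])
        for i, start in enumerate(starts)
    ]


def busy_nodes(splits):
    """Count how many splits each node would process."""
    return Counter(node for _, _, node in splits)


print(len(busy_nodes(default_splits())))  # 2 nodes do work by default
print(len(busy_nodes(custom_splits())))   # 10 nodes do work with 100-row splits
```

In this model the custom placement keeps all ten nodes busy, but eight of them would have to read their rows over the network from the two nodes that actually hold the table.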
If I write a custom input format that splits the table into chunks of 100 rows, can I manually assign each chunk to a node regardless of where the data is stored?

On 5 April 2012 00:36, Doug Meil <[email protected]> wrote:

> The default behavior is that the input splits are where the data is stored.
>
> On 4/4/12 5:24 PM, "sdnetwork" <[email protected]> wrote:
>
>> ok thanks,
>>
>> but I can't find the information that tells me how the result of the split
>> is distributed across the different nodes of the cluster?
>>
>> 1) randomly?
>> 2) where the data is stored?
