I was looking at the baseball MR example on the blog.

http://basho.com/blog/technical/2011/01/20/Baseball-Batting-Averages-Riak-Map-Reduce/

One thing I was wondering is how the file split mechanism knows about record 
lengths. It doesn't look like the author is using any particular split function 
to identify record boundaries and make a clean cut. So any record that spans a 
1 MB block boundary is likely cut in two and reaches the map job corrupted.

Perhaps this is the flaw the author hints at? If so, what's the proper way in 
Riak MR to specify a split function to be sure proper boundaries are applied to 
Luwak files?
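
To make the boundary concern concrete, here is a minimal Python sketch. It does
not use Riak or Luwak at all; split_fixed and realign are hypothetical helper
names, the sample records are made up, and the 16-byte block size just stands
in for the 1 MB blocks. It assumes newline-delimited records and shows a
fixed-size cut landing mid-record, plus one possible way to stitch the
fragments back together.

```python
def split_fixed(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into fixed-size blocks, ignoring record boundaries."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def realign(blocks: list[bytes], sep: bytes = b"\n") -> list[list[bytes]]:
    """Group whole records per block (assumption: records end with sep).

    A record cut by a block boundary is carried forward and emitted with
    the block in which it ends.
    """
    result = []
    carry = b""
    for block in blocks:
        parts = (carry + block).split(sep)
        carry = parts.pop()  # trailing fragment, empty if the block ends on sep
        result.append(parts)
    if carry:
        # Final record had no trailing separator; keep it with the last block.
        result[-1].append(carry)
    return result


# Made-up sample records, purely for illustration.
sample = b"rec-one,1\nrec-two,2\nrec-three,3\n"
blocks = split_fixed(sample, 16)
whole_records = realign(blocks)
```

With this sample, the first block ends in the fragment "rec-tw", which is the
kind of truncated record I'd expect a map function to see. Note that realign
walks the blocks in order, so it only illustrates the fix; a real map phase
would see each block independently and would need some other way to reach the
neighboring block.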

-Nate


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
