I was looking at the baseball MR example on the blog. http://basho.com/blog/technical/2011/01/20/Baseball-Batting-Averages-Riak-Map-Reduce/
One thing I was wondering is how the file split mechanism is aware of record lengths. It doesn't look like the author is using any particular split function to identify record boundaries and make a clean cut, so you likely have records that span a 1 MB boundary and arrive corrupted at the map job. Perhaps this is the flaw the author hints at? If so, what's the proper way in Riak MR to specify a split function so that proper record boundaries are applied to Luwak files?

-Nate
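
To make the concern concrete, here is a minimal sketch in plain Python. It is not Riak or Luwak code; the function names, the newline-delimited record format, and the chunk size are all assumptions for illustration. The first function cuts at fixed byte offsets, which can split a record in two. The second backs each cut up to the last record separator so every chunk holds only whole records.

```python
def fixed_size_chunks(data: bytes, size: int) -> list[bytes]:
    """Cut at fixed byte offsets, ignoring record boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def record_aligned_chunks(data: bytes, size: int, sep: bytes = b"\n") -> list[bytes]:
    """Cut near `size` bytes, but only directly after a record separator."""
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + size, len(data))
        if end < len(data):
            # Back up to the last separator inside the window.
            cut = data.rfind(sep, start, end)
            if cut == -1:
                # A single record is longer than `size`: extend the chunk
                # forward to the next separator (or to the end of the data).
                nxt = data.find(sep, end)
                end = len(data) if nxt == -1 else nxt + len(sep)
            else:
                end = cut + len(sep)
        chunks.append(data[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    # Hypothetical sample records, one per line.
    sample = b"aaron,1954,0.280\nruth,1927,0.356\nmays,1954,0.345\n"
    print(fixed_size_chunks(sample, 20))      # some chunks end mid-record
    print(record_aligned_chunks(sample, 20))  # every chunk ends on a newline
```

With fixed offsets, a map function that parses each chunk line by line would see a truncated record at the end of one chunk and the remainder at the start of the next. What I'm asking is whether Riak MR has a hook for doing something like the second function.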
