Hey John, I posted a patch here: https://issues.apache.org/jira/browse/CRUNCH-209
I created it against master, since I don't think there have been any changes to the MR execution stuff in 0.6.0 that we need to worry about. If you can't apply it, let me know and I'll find a way to backport it. I'm 50-50 on whether this will fix the issue, so please let me know if it doesn't do the trick.

J

On Wed, May 22, 2013 at 4:42 PM, John Jensen <[email protected]> wrote:

> Certainly. Appreciate it.
>
> ------------------------------
> *From:* Josh Wills [[email protected]]
> *Sent:* Wednesday, May 22, 2013 4:38 PM
> *To:* [email protected]
> *Subject:* Re: Problem running job with large number of directories
>
> Hey John,
>
> I haven't hit that one before, but I have some hypotheses we could test
> if you're up for trying out some patches I write.
>
> J
>
> On Wed, May 22, 2013 at 4:01 PM, John Jensen <[email protected]> wrote:
>
>> I have a curious problem when running a crunch job on (avro) files in a
>> fairly large set of directories (just slightly less than 100).
>> After running some fraction of the mappers they start failing with the
>> exception below. Things work fine with a smaller number of directories.
>>
>> The magic
>> 'zdHJpbmcifSx7Im5hbWUiOiJ2YWx1ZSIsInR5cGUiOiJzdHJpbmcifV19fSwiZGVmYXVsdCI'
>> string shows up in the 'crunch.inputs.dir' entry in the job config, so I
>> assume it has something to do with deserializing that value, but reading
>> through the code I don't see any obvious way how.
>>
>> Furthermore, the crunch.inputs.dir config entry is just under 1.5M, so
>> it would not surprise me if I'm running up against a hadoop limit somewhere.
>>
>> Has anybody else seen similar issues? (this is 0.5.0, btw).
>>
>> -- John
>>
>> java.io.IOException: Split class zdHJp
>> bmcifSx7Im5hbWUiOiJ2YWx1ZSIsInR5cGUiOiJzdHJpbmcifV19fSwiZGVmYXVsdCI not found
>>     at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:342)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:614)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> Caused by: java.lang.ClassNotFoundException: Class zdHJp
>> bmcifSx7Im5hbWUiOiJ2YWx1ZSIsInR5cGUiOiJzdHJpbmcifV19fSwiZGVmYXVsdCI not found
>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
>>     at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:340)
>>     ... 7 more

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
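
A quick way to sanity-check the guess that the "magic" string comes from the serialized crunch.inputs.dir value is to treat it as a slice cut out of a longer base64 stream and decode it. The sketch below is only an illustration under stated assumptions: that the text is standard base64, and that dropping one leading character lands on a 4-character group boundary (the skip value was found by trial, it is not something the thread states). The helper name decode_base64_fragment is made up for this example.

```python
import base64

# Fragment copied from the "Split class ... not found" message in the report.
FRAGMENT = "zdHJpbmcifSx7Im5hbWUiOiJ2YWx1ZSIsInR5cGUiOiJzdHJpbmcifV19fSwiZGVmYXVsdCI"


def decode_base64_fragment(fragment: str, skip: int) -> str:
    """Decode a slice taken from the middle of a base64 stream.

    `skip` is the number of leading characters to drop so that the
    remainder starts on a 4-character group boundary (an assumption).
    """
    chunk = fragment[skip:]
    # Pad the trailing partial group so the decoder accepts it.
    chunk += "=" * (-len(chunk) % 4)
    return base64.b64decode(chunk).decode("ascii")


decoded = decode_base64_fragment(FRAGMENT, skip=1)
print(decoded)
```

The decoded text reads like the middle of a JSON record schema (name/type fields), which is consistent with the fragment being a piece of the encoded input configuration rather than a real class name. It says nothing about why the fragment ended up in the split class position.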
