Thanks Gabriel, I'll give that a try now. I was actually planning on making that change once I realized that my current strategy was forcing me to materialize data early on.

On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <[email protected]> wrote:

> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <[email protected]> wrote:
> > No luck. I get the same error even when using a single reducer. I'm
> > attaching the job configuration as shown in the web ui.
> >
> > When I look at the job tracker for the job, it has no map tasks. Is that
> > expected? I've never heard of a reduce only job.
> >
>
> Nope, a job with no map tasks doesn't sound right to me. I noticed
> that you're effectively doing a materialize at [1], and then
> using a BloomFilterJoinStrategy. While this should work fine, I'm
> thinking that it could also potentially lead to some issues such as
> the one you're having (i.e. a job with no map tasks).
>
> Could you try using the default join strategy there to see what
> happens? I'm thinking that the AvroPathPerKeyTarget issue could just be a
> consequence of something else going wrong earlier on.
>
> [1] https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
>
> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <[email protected]> wrote:
> >>
> >> This is my first time on a cluster. I'll try what Josh suggests now.
> >>
> >> J
> >>
> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <[email protected]> wrote:
> >>>
> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid <[email protected]> wrote:
> >>>>
> >>>> Hi Jeremy,
> >>>>
> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <[email protected]> wrote:
> >>>> > Hi
> >>>> >
> >>>> > I'm hitting the exception pasted below when using
> >>>> > AvroPathPerKeyTarget.
> >>>> > Interestingly, my code works just fine when I run on a small dataset
> >>>> > using the LocalJobTracker. However, when I run on a large dataset
> >>>> > using a hadoop cluster I hit the exception.
> >>>> >
> >>>>
> >>>> Have you ever been able to successfully use the AvroPathPerKeyTarget
> >>>> on a real cluster, or is this the first try with it?
> >>>>
> >>>> I'm wondering if this could be a problem that's always been around (as
> >>>> the integration test for AvroPathPerKeyTarget also runs in the local
> >>>> jobtracker), or if this could be something new.
> >>>
> >>> +1-- Jeremy, if you force the job to run w/a single reducer on the
> >>> cluster (i.e., via groupByKey(1)), does it work?
> >>>
> >>>>
> >>>> - Gabriel
