On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <[email protected]> wrote:
> No luck. I get the same error even when using a single reducer. I'm
> attaching the job configuration as shown in the web ui.
>
> When I look at the job tracker for the job, it has no map tasks. Is that
> expected? I've never heard of a reduce only job.
>

Nope, a job with no map tasks doesn't sound right to me.

I noticed that you're effectively doing a materialize at [1], and then
using a BloomFilterJoinStrategy. While this should work fine, I'm
thinking that it could also lead to issues such as the one you're
having (i.e. a job with no map tasks).

Could you try using the default join strategy there to see what
happens? I'm thinking that the AvroPathPerKeyTarget issue could just be
a consequence of something else going wrong earlier on.

1. https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156

>
> On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <[email protected]> wrote:
>>
>> This is my first time on a cluster I'll try what Josh suggests now.
>>
>> J
>>
>>
>> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <[email protected]> wrote:
>>>
>>>
>>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid <[email protected]>
>>> wrote:
>>>>
>>>> Hi Jeremy,
>>>>
>>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <[email protected]> wrote:
>>>> > Hi
>>>> >
>>>> > I'm hitting the exception pasted below when using
>>>> > AvroPathPerKeyTarget.
>>>> > Interestingly, my code works just fine when I run on a small dataset
>>>> > using the LocalJobTracker. However, when I run on a large dataset
>>>> > using a hadoop cluster I hit the exception.
>>>> >
>>>>
>>>> Have you ever been able to successfully use the AvroPathPerKeyTarget
>>>> on a real cluster, or is this the first try with it?
>>>>
>>>> I'm wondering if this could be a problem that's always been around (as
>>>> the integration test for AvroPathPerKeyTarget also runs in the local
>>>> jobtracker), or if this could be something new.
>>>
>>>
>>> +1-- Jeremy, if you force the job to run w/a single reducer on the
>>> cluster (i.e., via groupByKey(1)), does it work?
>>>
>>>>
>>>>
>>>> - Gabriel
>>>
>>>
>>
>
