On Fri, Mar 28, 2014 at 6:13 PM, Jeremy Lewi <[email protected]> wrote:
> Unfortunately that didn't work. I still have a reduce only job.
>
> Here's a link to the console output in case that's helpful:
> https://drive.google.com/a/lewi.us/file/d/0B6ngy4MCihWwcy1sdE9DQ2hiYnc/edit?usp=sharing
>
> I'm currently ungrouping my records before writing them (an earlier attempt
> to fix this issue). I'm trying without the ungroup now.
Looking at the console output, I noticed that the second and third jobs are
logging "Total input paths to process : 0", which makes me think that the
first job being run doesn't produce any output. Could you check the job
counters there to see if it is indeed outputting anything? And was your
local job running on the same data? The lack of inputs would explain the
reduce-only job, and I'm guessing/hoping that it is also the reason the
AvroPathPerKeyTarget is breaking.

- Gabriel

> J
>
> On Fri, Mar 28, 2014 at 10:08 AM, Jeremy Lewi <[email protected]> wrote:
>>
>> Unfortunately that didn't work. I still have a reduce-only job. I'm
>> attaching the console output from when I run my job in case that's
>> helpful. I'm currently ungrouping my records before writing them (an
>> earlier attempt to fix this). I'm trying to undo that.
>>
>> J
>>
>> On Fri, Mar 28, 2014 at 9:51 AM, Jeremy Lewi <[email protected]> wrote:
>>>
>>> Thanks Gabriel, I'll give that a try now. I was actually planning on
>>> making that change once I realized that my current strategy was forcing
>>> me to materialize data early on.
>>>
>>> On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <[email protected]>
>>> wrote:
>>>>
>>>> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <[email protected]> wrote:
>>>> > No luck. I get the same error even when using a single reducer. I'm
>>>> > attaching the job configuration as shown in the web UI.
>>>> >
>>>> > When I look at the job tracker for the job, it has no map tasks. Is
>>>> > that expected? I've never heard of a reduce-only job.
>>>>
>>>> Nope, a job with no map tasks doesn't sound right to me. I noticed
>>>> that you're effectively doing a materialize at [1], and then
>>>> using a BloomFilterJoinStrategy. While this should work fine, I'm
>>>> thinking that it could also potentially lead to some issues such as
>>>> the one you're having (i.e. a job with no map tasks).
>>>> Could you try using the default join strategy there to see what
>>>> happens? I'm thinking that the AvroPathPerKeyTarget issue could just
>>>> be a consequence of something else going wrong earlier on.
>>>>
>>>> 1. https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
>>>>
>>>> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <[email protected]> wrote:
>>>> >>
>>>> >> This is my first time on a cluster. I'll try what Josh suggests now.
>>>> >>
>>>> >> J
>>>> >>
>>>> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <[email protected]>
>>>> >> wrote:
>>>> >>>
>>>> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid
>>>> >>> <[email protected]> wrote:
>>>> >>>>
>>>> >>>> Hi Jeremy,
>>>> >>>>
>>>> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <[email protected]>
>>>> >>>> wrote:
>>>> >>>> > Hi
>>>> >>>> >
>>>> >>>> > I'm hitting the exception pasted below when using
>>>> >>>> > AvroPathPerKeyTarget. Interestingly, my code works just fine
>>>> >>>> > when I run on a small dataset using the LocalJobTracker.
>>>> >>>> > However, when I run on a large dataset using a hadoop cluster
>>>> >>>> > I hit the exception.
>>>> >>>>
>>>> >>>> Have you ever been able to successfully use the
>>>> >>>> AvroPathPerKeyTarget on a real cluster, or is this the first try
>>>> >>>> with it?
>>>> >>>>
>>>> >>>> I'm wondering if this could be a problem that's always been
>>>> >>>> around (as the integration test for AvroPathPerKeyTarget also
>>>> >>>> runs in the local jobtracker), or if this could be something new.
>>>> >>>
>>>> >>> +1 -- Jeremy, if you force the job to run w/ a single reducer on
>>>> >>> the cluster (i.e., via groupByKey(1)), does it work?
>>>> >>>
>>>> >>>> - Gabriel
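[Editor's note: for readers following along, the two debugging suggestions in this thread (forcing a single reducer via groupByKey(1), and swapping the BloomFilterJoinStrategy for the default join strategy) might look roughly like the sketch below against the Apache Crunch API. This is not Jeremy's actual code from FilterReads.java: the method, variable names, and key/value types are hypothetical, and since it depends on the Crunch library it is shown as an untested fragment.]

```java
// Hedged sketch only: names, types, and paths here are hypothetical,
// not taken from the FilterReads code referenced in the thread.
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.Pipeline;
import org.apache.crunch.io.avro.AvroPathPerKeyTarget;
import org.apache.crunch.lib.join.DefaultJoinStrategy;
import org.apache.crunch.lib.join.JoinStrategy;
import org.apache.crunch.lib.join.JoinType;

public class FilterReadsSketch {
  public static void run(Pipeline pipeline,
                         PTable<String, Long> left,
                         PTable<String, Long> right,
                         String outputPath) {
    // Gabriel's suggestion: use the default (reduce-side) join strategy
    // instead of BloomFilterJoinStrategy, to rule the bloom-filter path
    // out as the cause of the map-less job.
    JoinStrategy<String, Long, Long> strategy =
        new DefaultJoinStrategy<String, Long, Long>();
    PTable<String, Pair<Long, Long>> joined =
        strategy.join(left, right, JoinType.INNER_JOIN);

    // Josh's suggestion: force a single reducer by grouping with one
    // partition, then ungroup before writing.
    PTable<String, Pair<Long, Long>> singleReducer =
        joined.groupByKey(1).ungroup();

    // AvroPathPerKeyTarget writes each key's values under a separate
    // subdirectory of outputPath.
    pipeline.write(singleReducer, new AvroPathPerKeyTarget(outputPath));
    pipeline.done();
  }
}
```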
