Thanks for the tip. I'll look into it and try to figure it out.
On Fri, Mar 28, 2014 at 11:11 AM, Gabriel Reid <[email protected]> wrote:

> On Fri, Mar 28, 2014 at 6:13 PM, Jeremy Lewi <[email protected]> wrote:
> > Unfortunately that didn't work. I still have a reduce only job.
> >
> > Here's a link to the console output in case that's helpful:
> >
> > https://drive.google.com/a/lewi.us/file/d/0B6ngy4MCihWwcy1sdE9DQ2hiYnc/edit?usp=sharing
> >
> > I'm currently ungrouping my records before writing them (an earlier attempt
> > to fix this issue). I'm trying without the ungroup now.
>
> Looking at the console output, I noticed that the second and third
> jobs are logging "Total input paths to process : 0", which makes me
> think that the first job being run doesn't have any output. Could you
> check the job counters there to see if it is indeed outputting
> anything? And was your local job running on the same data?
>
> The fact that there are no inputs would explain the reduce-only job,
> and I'm guessing/hoping that will be the reason the
> AvroPathPerKeyTarget is breaking.
>
> - Gabriel
>
> >
> > J
> >
> > On Fri, Mar 28, 2014 at 10:08 AM, Jeremy Lewi <[email protected]> wrote:
> >>
> >> Unfortunately that didn't work. I still have a reduce only job. I'm
> >> attaching the console output from when I run my job in case that's helpful.
> >> I'm currently ungrouping my records before writing them (an earlier
> >> attempt to fix this). I'll try undoing that.
> >>
> >> J
> >>
> >> On Fri, Mar 28, 2014 at 9:51 AM, Jeremy Lewi <[email protected]> wrote:
> >>>
> >>> Thanks Gabriel, I'll give that a try now. I was actually planning on
> >>> making that change once I realized that my current strategy was forcing me
> >>> to materialize data early on.
> >>>
> >>> On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <[email protected]> wrote:
> >>>>
> >>>> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <[email protected]> wrote:
> >>>> > No luck. I get the same error even when using a single reducer. I'm
> >>>> > attaching the job configuration as shown in the web ui.
> >>>> >
> >>>> > When I look at the job tracker for the job, it has no map tasks. Is that
> >>>> > expected? I've never heard of a reduce only job.
> >>>> >
> >>>>
> >>>> Nope, a job with no map tasks doesn't sound right to me. I noticed
> >>>> that you're effectively doing a materialize at [1], and then
> >>>> using a BloomFilterJoinStrategy. While this should work fine, I'm
> >>>> thinking that it could also potentially lead to some issues such as
> >>>> the one you're having (i.e. a job with no map tasks).
> >>>>
> >>>> Could you try using the default join strategy there to see what
> >>>> happens? I'm thinking that the AvroPathPerKeyTarget issue could just be a
> >>>> consequence of something else going wrong earlier on.
> >>>>
> >>>> 1.
> >>>> https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
> >>>>
> >>>> >
> >>>> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <[email protected]> wrote:
> >>>> >>
> >>>> >> This is my first time on a cluster. I'll try what Josh suggests now.
> >>>> >>
> >>>> >> J
> >>>> >>
> >>>> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <[email protected]> wrote:
> >>>> >>>
> >>>> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid <[email protected]> wrote:
> >>>> >>>>
> >>>> >>>> Hi Jeremy,
> >>>> >>>>
> >>>> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <[email protected]> wrote:
> >>>> >>>> > Hi
> >>>> >>>> >
> >>>> >>>> > I'm hitting the exception pasted below when using
> >>>> >>>> > AvroPathPerKeyTarget.
> >>>> >>>> > Interestingly, my code works just fine when I run on a small dataset
> >>>> >>>> > using the LocalJobTracker. However, when I run on a large dataset
> >>>> >>>> > using a hadoop cluster I hit the exception.
> >>>> >>>> >
> >>>> >>>>
> >>>> >>>> Have you ever been able to successfully use the AvroPathPerKeyTarget
> >>>> >>>> on a real cluster, or is this the first try with it?
> >>>> >>>>
> >>>> >>>> I'm wondering if this could be a problem that's always been around (as
> >>>> >>>> the integration test for AvroPathPerKeyTarget also runs in the local
> >>>> >>>> jobtracker), or if this could be something new.
> >>>> >>>
> >>>> >>> +1-- Jeremy, if you force the job to run w/a single reducer on the
> >>>> >>> cluster (i.e., via groupByKey(1)), does it work?
> >>>> >>>
> >>>> >>>>
> >>>> >>>> - Gabriel
