Gabriel,

Thanks for investigating. I'm fixing the issue that's causing me to produce empty PCollections, so that should resolve the problem for me.
J

On Sat, Mar 29, 2014 at 5:13 AM, Gabriel Reid <[email protected]> wrote:
> Hi Jeremy,
>
> I just took some time to dig into this a bit deeper. It turns out that
> it is indeed an issue with handling an empty output PCollection in the
> AvroPathPerKeyTarget -- I've logged
> https://issues.apache.org/jira/browse/CRUNCH-371 to resolve it.
>
> The reason it was working on the local job tracker is a difference in
> the implementation of LocalFileSystem and DistributedFileSystem in
> hadoop-1. The good/bad news is that the current code will consistently
> crash with a consistent exception on hadoop-2 with both the local
> file system and HDFS. The short-term solution (other than patching
> your Crunch build with the patch in CRUNCH-371) would be to just
> ensure that the PCollection being output isn't empty.
>
> - Gabriel
>
>
> On Sat, Mar 29, 2014 at 2:27 AM, Jeremy Lewi <[email protected]> wrote:
> > Thanks for the tip. I'll look into it and try to figure it out.
> >
> >
> > On Fri, Mar 28, 2014 at 11:11 AM, Gabriel Reid <[email protected]>
> > wrote:
> >>
> >> On Fri, Mar 28, 2014 at 6:13 PM, Jeremy Lewi <[email protected]> wrote:
> >> > Unfortunately that didn't work. I still have a reduce-only job.
> >> >
> >> > Here's a link to the console output in case that's helpful:
> >> >
> >> > https://drive.google.com/a/lewi.us/file/d/0B6ngy4MCihWwcy1sdE9DQ2hiYnc/edit?usp=sharing
> >> >
> >> > I'm currently ungrouping my records before writing them (an earlier
> >> > attempt to fix this issue). I'm trying without the ungroup now.
> >>
> >> Looking at the console output, I noticed that the second and third
> >> jobs are logging "Total input paths to process : 0", which makes me
> >> think that the first job being run doesn't have any output. Could you
> >> check the job counters there to see if it is indeed outputting
> >> anything? And was your local job running on the same data?
> >>
> >> The fact that there are no inputs would explain the reduce-only job,
> >> and I'm guessing/hoping that will be the reason the
> >> AvroPathPerKeyTarget is breaking.
> >>
> >> - Gabriel
> >>
> >> >
> >> > J
> >> >
> >> >
> >> > On Fri, Mar 28, 2014 at 10:08 AM, Jeremy Lewi <[email protected]> wrote:
> >> >>
> >> >> Unfortunately that didn't work. I still have a reduce-only job. I'm
> >> >> attaching the console output from when I run my job in case that's
> >> >> helpful. I'm currently ungrouping my records before writing them (an
> >> >> earlier attempt to fix this). I'll try undoing that.
> >> >>
> >> >> J
> >> >>
> >> >>
> >> >> On Fri, Mar 28, 2014 at 9:51 AM, Jeremy Lewi <[email protected]> wrote:
> >> >>>
> >> >>> Thanks Gabriel, I'll give that a try now. I was actually planning on
> >> >>> making that change once I realized that my current strategy was
> >> >>> forcing me to materialize data early on.
> >> >>>
> >> >>>
> >> >>> On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <[email protected]>
> >> >>> wrote:
> >> >>>>
> >> >>>> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <[email protected]> wrote:
> >> >>>> > No luck. I get the same error even when using a single reducer. I'm
> >> >>>> > attaching the job configuration as shown in the web UI.
> >> >>>> >
> >> >>>> > When I look at the job tracker for the job, it has no map tasks. Is
> >> >>>> > that expected? I've never heard of a reduce-only job.
> >> >>>> >
> >> >>>>
> >> >>>> Nope, a job with no map tasks doesn't sound right to me. I noticed
> >> >>>> that you're effectively doing a materialize at [1], and then
> >> >>>> using a BloomFilterJoinStrategy. While this should work fine, I'm
> >> >>>> thinking that it could also potentially lead to some issues such as
> >> >>>> the one you're having (i.e. a job with no map tasks).
> >> >>>>
> >> >>>> Could you try using the default join strategy there to see what
> >> >>>> happens? I'm thinking that the AvroPathPerKeyTarget issue could just
> >> >>>> be a consequence of something else going wrong earlier on.
> >> >>>>
> >> >>>> 1. https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
> >> >>>>
> >> >>>> >
> >> >>>> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <[email protected]>
> >> >>>> > wrote:
> >> >>>> >>
> >> >>>> >> This is my first time on a cluster. I'll try what Josh suggests
> >> >>>> >> now.
> >> >>>> >>
> >> >>>> >> J
> >> >>>> >>
> >> >>>> >>
> >> >>>> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <[email protected]>
> >> >>>> >> wrote:
> >> >>>> >>>
> >> >>>> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid
> >> >>>> >>> <[email protected]> wrote:
> >> >>>> >>>>
> >> >>>> >>>> Hi Jeremy,
> >> >>>> >>>>
> >> >>>> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <[email protected]>
> >> >>>> >>>> wrote:
> >> >>>> >>>> > Hi
> >> >>>> >>>> >
> >> >>>> >>>> > I'm hitting the exception pasted below when using
> >> >>>> >>>> > AvroPathPerKeyTarget. Interestingly, my code works just fine
> >> >>>> >>>> > when I run on a small dataset using the LocalJobTracker.
> >> >>>> >>>> > However, when I run on a large dataset using a hadoop cluster
> >> >>>> >>>> > I hit the exception.
> >> >>>> >>>> >
> >> >>>> >>>>
> >> >>>> >>>> Have you ever been able to successfully use the
> >> >>>> >>>> AvroPathPerKeyTarget on a real cluster, or is this the first
> >> >>>> >>>> try with it?
> >> >>>> >>>>
> >> >>>> >>>> I'm wondering if this could be a problem that's always been
> >> >>>> >>>> around (as the integration test for AvroPathPerKeyTarget also
> >> >>>> >>>> runs in the local jobtracker), or if this could be something
> >> >>>> >>>> new.
> >> >>>> >>>
> >> >>>> >>> +1 -- Jeremy, if you force the job to run w/ a single reducer on
> >> >>>> >>> the cluster (i.e., via groupByKey(1)), does it work?
> >> >>>> >>>
> >> >>>> >>>> - Gabriel
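For reference, a rough sketch of the short-term workaround Gabriel describes above (skip the write when the output is empty) could look like the code below. The class, method, and path names are placeholders rather than code from Jeremy's pipeline, the table is assumed to use Avro PTypes, and the exact AvroPathPerKeyTarget constructor may differ between Crunch versions.

import org.apache.crunch.PTable;
import org.apache.crunch.io.avro.AvroPathPerKeyTarget;
import org.apache.hadoop.fs.Path;

public class GuardedAvroWrite {

  // Writes a PTable to an AvroPathPerKeyTarget only if it actually contains
  // records; each key's values end up in a per-key subdirectory under
  // outputPath.
  public static <K, V> void writeIfNonEmpty(PTable<K, V> table, String outputPath) {
    // length() returns a PObject<Long>; calling getValue() forces the
    // upstream jobs to run, so this costs an extra pipeline execution just
    // to obtain the count.
    long count = table.length().getValue();
    if (count > 0) {
      table.write(new AvroPathPerKeyTarget(new Path(outputPath)));
    }
    // When count == 0, skip the write entirely: per CRUNCH-371, handing an
    // empty collection to AvroPathPerKeyTarget is what triggers the crash.
  }
}

Fixing the upstream step so it never produces an empty PCollection, as Jeremy ended up doing, avoids the extra counting job entirely.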
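Gabriel's suggestion to fall back to the default join strategy amounts to swapping the strategy object. A minimal sketch, assuming the two sides of the join are ordinary PTables (the class name, method name, and type parameters are illustrative, not taken from FilterReads.java):

import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.lib.join.DefaultJoinStrategy;
import org.apache.crunch.lib.join.JoinStrategy;
import org.apache.crunch.lib.join.JoinType;

public class JoinStrategySwap {

  // Joins two PTables with the default reduce-side strategy instead of
  // BloomFilterJoinStrategy. The reduce-side join shuffles both inputs, so
  // it runs as a normal map+reduce job and avoids the extra pass that
  // builds a bloom filter from one side.
  public static <K, U, V> PTable<K, Pair<U, V>> joinWithDefaults(
      PTable<K, U> left, PTable<K, V> right) {
    JoinStrategy<K, U, V> strategy = new DefaultJoinStrategy<K, U, V>();
    return strategy.join(left, right, JoinType.INNER_JOIN);
  }
}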
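Josh's single-reducer test from earlier in the thread is a one-line change in the same spirit. A sketch, again with illustrative names:

import org.apache.crunch.PTable;

public class SingleReducerTest {

  // Re-groups a PTable with an explicit parallelism of one so that the job
  // feeding the AvroPathPerKeyTarget runs with a single reducer.
  public static <K, V> PTable<K, V> forceSingleReducer(PTable<K, V> table) {
    // groupByKey(1) requests a single partition; ungroup() flattens the
    // grouped table back into a PTable so it can be written as before.
    return table.groupByKey(1).ungroup();
  }
}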
