On 9/7/07, Ned Rockson <[EMAIL PROTECTED]> wrote:
> Oh great, I didn't know that was an option. How would I go about
> running the parse by itself?

bin/nutch parse <segment>
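For example, with a segment under crawl/segments (the path below is only an
illustration), a cycle that defers parsing would look roughly like:

  # fetch without parsing; the fetcher's reduce phase is then just an identity reduce
  bin/nutch fetch crawl/segments/20070907000000 -noParsing

  # then run the parse step separately on the same segment
  bin/nutch parse crawl/segments/20070907000000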
>
> On 9/7/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> > On 9/7/07, Ned Rockson <[EMAIL PROTECTED]> wrote:
> > > So I ran a thread dump and got output that I consider pretty
> > > meaningless. It doesn't seem to say I'm stuck in a regex filter,
> > > although when I printed out the URLs being handled by the reducer,
> > > there was one that had some unprintable characters in it. Also,
> > > there were a lot of URLs that were severely malformed, so I assume
> > > that could be a problem; I'm going to look into it. The last URL
> > > that was printed (on both of the tasks) looked pretty harmless,
> > > though: a wiki entry and a .js page, so I assume there must be a
> > > buffer that writes when it fills up. Where is this buffer located,
> > > and would it be pretty easy to dump it to stdout rather than a file
> > > for debug purposes?
> >
> > I keep forgetting that people run fetch with parse. Can you try
> > running fetch with the "-noParsing" option? If the reduce phase has a
> > problem with urlfiltering, this should solve it, as a no-parsing
> > fetch's reduce phase is just identity reducing.
> >
> > >
> > > Here is the thread dump:
> > >
> > > "[EMAIL PROTECTED]" daemon prio=1 tid=0x00002aaaab72c6b0 nid=0x4ae8 waiting on condition [0x0000000041367000..0x0000000041367b80]
> > >   at java.lang.Thread.sleep(Native Method)
> > >   at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:458)
> > >   at java.lang.Thread.run(Thread.java:595)
> > >
> > > "Pinger for task_0018_r_000002_0" daemon prio=1 tid=0x00002aaaac2f1d80 nid=0x4ae5 waiting on condition [0x0000000041165000..0x0000000041165c80]
> > >   at java.lang.Thread.sleep(Native Method)
> > >   at org.apache.hadoop.mapred.TaskTracker$Child$1.run(TaskTracker.java:1488)
> > >   at java.lang.Thread.run(Thread.java:595)
> > >
> > > "IPC Client connection to 0.0.0.0/0.0.0.0:50050" daemon prio=1 tid=0x00002aaaac2d0670 nid=0x4ae4 in Object.wait() [0x0000000041064000..0x0000000041064d00]
> > >   at java.lang.Object.wait(Native Method)
> > >   - waiting on <0x00002b141d61d130> (a org.apache.hadoop.ipc.Client$Connection)
> > >   at java.lang.Object.wait(Object.java:474)
> > >   at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:213)
> > >   - locked <0x00002b141d61d130> (a org.apache.hadoop.ipc.Client$Connection)
> > >   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:252)
> > >
> > > "org.apache.hadoop.io.ObjectWritable Connection Culler" daemon prio=1 tid=0x00002aaaac332a20 nid=0x4ae3 waiting on condition [0x0000000040f63000..0x0000000040f63d80]
> > >   at java.lang.Thread.sleep(Native Method)
> > >   at org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:401)
> > >
> > > "Low Memory Detector" daemon prio=1 tid=0x00002aaaac0025a0 nid=0x4ae1 runnable [0x0000000000000000..0x0000000000000000]
> > >
> > > "CompilerThread1" daemon prio=1 tid=0x00002aaaac000ab0 nid=0x4ae0 waiting on condition [0x0000000000000000..0x0000000040c5f3e0]
> > >
> > > "CompilerThread0" daemon prio=1 tid=0x00002aaab00f3290 nid=0x4adf waiting on condition [0x0000000000000000..0x0000000040b5e460]
> > >
> > > "AdapterThread" daemon prio=1 tid=0x00002aaab00f1c70 nid=0x4ade waiting on condition [0x0000000000000000..0x0000000000000000]
> > >
> > > "Signal Dispatcher" daemon prio=1 tid=0x00002aaab00f07b0 nid=0x4add runnable [0x0000000000000000..0x0000000000000000]
> > >
> > > "Finalizer" daemon prio=1 tid=0x00002aaab00dbd70 nid=0x4adc in Object.wait() [0x000000004085c000..0x000000004085cd00]
> > >   at java.lang.Object.wait(Native Method)
> > >   - waiting on <0x00002b141d606288> (a java.lang.ref.ReferenceQueue$Lock)
> > >   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
> > >   - locked <0x00002b141d606288> (a java.lang.ref.ReferenceQueue$Lock)
> > >   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
> > >   at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> > >
> > > "Reference Handler" daemon prio=1 tid=0x00002aaab00db290 nid=0x4adb in Object.wait()
> > >
> > > On 9/6/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> > > > Ned Rockson wrote:
> > > > > (sorry if this is a repost, I'm not sure if it sent last time).
> > > > >
> > > > > I have a very strange, reproducible bug that shows up when running
> > > > > fetch across any number of documents >10000. I'm running 47 map
> > > > > tasks and 47 reduce tasks on 24 nodes. The map phase finishes fine
> > > > > and so does the majority of the reduce phase; however, there are
> > > > > always two segments that perpetually hang in the reduce phase.
> > > > > What happens is the reducer gets to 85.xx% and then stops
> > > > > responding. Once 10 minutes go by, a new worker starts the task,
> > > > > gets to the same 85.xx% (+/- .1%) and hangs. The other consistent
> > > > > part is that it's always segment 2 and segment 5 (out of 47
> > > > > segments).
> > > > >
> > > > > I figured I could fix it by simply copying data from a different
> > > > > segment in and continuing on the next iteration, but lo and behold
> > > > > the same exact problem happens in segment 2 and segment 5.
> > > > >
> > > > > I assume it's not an IO problem because all of the nodes involved
> > > > > in these segments finish other reduce tasks in the same iteration
> > > > > with no problems. Furthermore, I have seen this happen persistently
> > > > > over the last many iterations. My last iteration had 400,000 (+/-)
> > > > > documents pulled down and I saw the same behavior.
> > > > >
> > > > > Does anyone have any suggestions?
> > > > >
> > > >
> > > > Yes. Most likely this is a problem with urlfilter-regex getting stuck
> > > > on an abnormal URL (e.g. an extremely long URL, or a URL that
> > > > contains control characters).
> > > >
> > > > Please check in the JobTracker UI which task is stuck, and on which
> > > > machine it's executing. Log in to that machine, identify the pid of
> > > > the task process, and then generate a thread dump (using 'kill
> > > > -SIGQUIT', which does NOT quit the process). If the thread dump shows
> > > > some threads stuck in regex code, then it's likely that this is the
> > > > problem.
> > > >
> > > > The solution is to avoid urlfilter-regex, or to change the order of
> > > > urlfilters and put simpler filters in front of urlfilter-regex, in
> > > > the hope that they will eliminate abnormal URLs before they are
> > > > passed to urlfilter-regex.
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrzej Bialecki     <><
> > > >  ___. ___ ___ ___ _ _   __________________________________
> > > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > > http://www.sigram.com  Contact: info at sigram dot com
> > > >
> > >
> >
> > --
> > Doğacan Güney
>
>

--
Doğacan Güney
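For reference, the thread-dump procedure Andrzej describes can be scripted
roughly as follows (the grep pattern and log location are assumptions about a
typical Hadoop tasktracker setup; adjust for your installation):

  # on the node running the stuck reduce task, find the pid of the child task JVM
  ps -ef | grep 'TaskTracker\$Child' | grep -v grep

  # SIGQUIT makes the JVM print a thread dump without exiting; the dump ends up
  # in that task's stdout log under the tasktracker's userlogs directory
  kill -QUIT <pid>

If the dump shows threads sitting in java.util.regex code, the urlfilter.order
property in nutch-site.xml can be used to run cheaper filters (e.g.
urlfilter-prefix, urlfilter-suffix) before urlfilter-regex, as Andrzej suggests.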
