Does it happen roughly when the memory goes out of control? It could be a dodgy
URL sending the URL normalisation into a spin: a crawl turns up all sorts of
horrors after a while.
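
If you want to hunt for such a URL outside of Nutch, something along these
lines might help. It is only a rough sketch: the class name, the two rules and
the 50 ms threshold are made up for illustration, and it does not use Nutch's
RegexURLNormalizer or the rules in your regex-normalize.xml. It simply applies
a list of regex substitutions to each URL and reports the ones that take
suspiciously long.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone helper, not part of Nutch.
class SlowUrlFinder {

    /** One regex substitution rule: a compiled pattern plus its replacement. */
    static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    // Illustrative rules only; swap in the patterns you actually use.
    static final List<Rule> RULES = Arrays.asList(
        new Rule("#.*$", ""),           // drop the fragment
        new Rule("(?<!:)/{2,}", "/"));  // collapse slash runs that do not start right after ':'

    /** Applies every rule in order and returns the rewritten URL. */
    static String normalize(String url) {
        String result = url;
        for (Rule rule : RULES) {
            result = rule.pattern.matcher(result).replaceAll(rule.replacement);
        }
        return result;
    }

    /** Returns the URLs whose normalisation took longer than thresholdNanos. */
    static List<String> findSlow(List<String> urls, long thresholdNanos) {
        List<String> slow = new ArrayList<>();
        for (String url : urls) {
            long start = System.nanoTime();
            normalize(url);
            long elapsed = System.nanoTime() - start;
            if (elapsed > thresholdNanos) {
                slow.add(url);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // URLs are passed as command-line arguments; 50 ms is an arbitrary cut-off.
        for (String url : findSlow(Arrays.asList(args), 50_000_000L)) {
            System.out.println("SLOW: " + url);
        }
    }
}
```

Wall-clock timing of a single call is noisy, so treat anything it flags as a
candidate to look at rather than proof.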

Maybe try using '-noNorm' on the generate step and see if that has any impact.
It would also be good to know in which job and map/reduce task the issue is
happening; you can use the Hadoop JobTracker web UI in pseudo-distributed mode
to see that.

Thanks

Julien



On 3 February 2011 00:28, axierr <[email protected]> wrote:

>
>
> Here are the results. I'm now going to try without URL partitioning:
>
> nutch generator output -
> Generator: starting at 2011-02-02 20:11:00
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: jobtracker is 'local', generating exactly one partition.
>
> jstack output -
> Full thread dump Java HotSpot(TM) Client VM (17.1-b03 mixed mode, sharing):
>
> "communication thread" daemon prio=10 tid=0x0a104800 nid=0x637 runnable
> [0xb3cad000]
>   java.lang.Thread.State: RUNNABLE
>        at java.lang.Object.getClass(Native Method)
>        at java.util.ArrayList.<init>(ArrayList.java:134)
>        at org.apache.hadoop.fs.FileSystem.getAllStatistics(FileSystem.java:1567)
>        - locked <0x8ef584c8> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
>        at org.apache.hadoop.mapred.Task.updateCounters(Task.java:652)
>        - locked <0x66a3d020> (a org.apache.hadoop.mapred.ReduceTask)
>        at org.apache.hadoop.mapred.Task.access$600(Task.java:56)
>        at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:539)
>        at java.lang.Thread.run(Thread.java:662)
>
> "Attach Listener" daemon prio=10 tid=0x0a0a9800 nid=0x7e31 waiting on
> condition [0x00000000]
>   java.lang.Thread.State: RUNNABLE
>
> "Thread-13" prio=10 tid=0xb3b27800 nid=0x6fd9 runnable [0xb3cfe000]
>   java.lang.Thread.State: RUNNABLE
>        at java.util.ArrayList.size(ArrayList.java:177)
>        at java.util.AbstractList$Itr.hasNext(AbstractList.java:339)
>        at org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer.regexNormalize(RegexURLNormalizer.java:168)
>        - locked <0x66a48778> (a org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer)
>        at org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer.normalize(RegexURLNormalizer.java:179)
>        - locked <0x66a48778> (a org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer)
>        at org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:286)
>        at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:244)
>        at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:109)
>        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>
> "Low Memory Detector" daemon prio=10 tid=0x09ec8800 nid=0x6fb3 runnable
> [0x00000000]
>   java.lang.Thread.State: RUNNABLE
>
> "CompilerThread0" daemon prio=10 tid=0x09ec6800 nid=0x6fb2 waiting on
> condition [0x00000000]
>   java.lang.Thread.State: RUNNABLE
>
> "Signal Dispatcher" daemon prio=10 tid=0x09ec4c00 nid=0x6fb1 runnable
> [0x00000000]
>   java.lang.Thread.State: RUNNABLE
>
> "Finalizer" daemon prio=10 tid=0x09ec0800 nid=0x6fb0 in Object.wait()
> [0xb46cc000]
>   java.lang.Thread.State: WAITING (on object monitor)
>        at java.lang.Object.wait(Native Method)
>        - waiting on <0x65410258> (a java.lang.ref.ReferenceQueue$Lock)
>        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
>        - locked <0x65410258> (a java.lang.ref.ReferenceQueue$Lock)
>        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
>        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>
> "Reference Handler" daemon prio=10 tid=0x09ebbc00 nid=0x6faf in
> Object.wait() [0xb471d000]
>   java.lang.Thread.State: WAITING (on object monitor)
>        at java.lang.Object.wait(Native Method)
>        - waiting on <0x654102e8> (a java.lang.ref.Reference$Lock)
>        at java.lang.Object.wait(Object.java:485)
>        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>        - locked <0x654102e8> (a java.lang.ref.Reference$Lock)
>
> "main" prio=10 tid=0x09e97000 nid=0x6fad runnable [0xb6c97000]
>   java.lang.Thread.State: RUNNABLE
>        at java.text.DecimalFormatSymbols.initialize(DecimalFormatSymbols.java:509)
>        at java.text.DecimalFormatSymbols.<init>(DecimalFormatSymbols.java:77)
>        at java.text.DecimalFormat.<init>(DecimalFormat.java:416)
>        at org.apache.hadoop.util.StringUtils.formatPercent(StringUtils.java:113)
>        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1283)
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
>        at org.apache.nutch.crawl.Generator.generate(Generator.java:526)
>        at org.apache.nutch.crawl.Generator.run(Generator.java:692)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.nutch.crawl.Generator.main(Generator.java:648)
>
> "VM Thread" prio=10 tid=0x09eba400 nid=0x6fae runnable
>
> "VM Periodic Task Thread" prio=10 tid=0x09ecac00 nid=0x6fb4 waiting on
> condition
>
> JNI global references: 1419
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-1-2-performance-and-memory-issues-tp2407256p2410061.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
