Here are the results. I'm going to run it now without URL partitioning:

Nutch generator output:
Generator: starting at 2011-02-02 20:11:00
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.

jstack output:
Full thread dump Java HotSpot(TM) Client VM (17.1-b03 mixed mode, sharing):

"communication thread" daemon prio=10 tid=0x0a104800 nid=0x637 runnable [0xb3cad000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Object.getClass(Native Method)
        at java.util.ArrayList.<init>(ArrayList.java:134)
        at org.apache.hadoop.fs.FileSystem.getAllStatistics(FileSystem.java:1567)
        - locked <0x8ef584c8> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
        at org.apache.hadoop.mapred.Task.updateCounters(Task.java:652)
        - locked <0x66a3d020> (a org.apache.hadoop.mapred.ReduceTask)
        at org.apache.hadoop.mapred.Task.access$600(Task.java:56)
        at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:539)
        at java.lang.Thread.run(Thread.java:662)

"Attach Listener" daemon prio=10 tid=0x0a0a9800 nid=0x7e31 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Thread-13" prio=10 tid=0xb3b27800 nid=0x6fd9 runnable [0xb3cfe000]
   java.lang.Thread.State: RUNNABLE
        at java.util.ArrayList.size(ArrayList.java:177)
        at java.util.AbstractList$Itr.hasNext(AbstractList.java:339)
        at org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer.regexNormalize(RegexURLNormalizer.java:168)
        - locked <0x66a48778> (a org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer)
        at org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer.normalize(RegexURLNormalizer.java:179)
        - locked <0x66a48778> (a org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer)
        at org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:286)
        at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:244)
        at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:109)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)

"Low Memory Detector" daemon prio=10 tid=0x09ec8800 nid=0x6fb3 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x09ec6800 nid=0x6fb2 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x09ec4c00 nid=0x6fb1 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x09ec0800 nid=0x6fb0 in Object.wait() [0xb46cc000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x65410258> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0x65410258> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x09ebbc00 nid=0x6faf in Object.wait() [0xb471d000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x654102e8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x654102e8> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x09e97000 nid=0x6fad runnable [0xb6c97000]
   java.lang.Thread.State: RUNNABLE
        at java.text.DecimalFormatSymbols.initialize(DecimalFormatSymbols.java:509)
        at java.text.DecimalFormatSymbols.<init>(DecimalFormatSymbols.java:77)
        at java.text.DecimalFormat.<init>(DecimalFormat.java:416)
        at org.apache.hadoop.util.StringUtils.formatPercent(StringUtils.java:113)
        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1283)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
        at org.apache.nutch.crawl.Generator.generate(Generator.java:526)
        at org.apache.nutch.crawl.Generator.run(Generator.java:692)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Generator.main(Generator.java:648)

"VM Thread" prio=10 tid=0x09eba400 nid=0x6fae runnable

"VM Periodic Task Thread" prio=10 tid=0x09ecac00 nid=0x6fb4 waiting on condition

JNI global references: 1419
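
For anyone reading the dump: Thread-13 is the reduce task, and at the moment of the snapshot it is inside RegexURLNormalizer.regexNormalize, holding that object's monitor and iterating over a list. The sketch below is only an illustration of that general pattern (a synchronized method applying a list of regex rules to each URL). It is not the Nutch source; the class name RuleListNormalizer, the two rules, and the sample URL are all made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT the Nutch RegexURLNormalizer source.
public class RuleListNormalizer {

    // One rewrite rule: a compiled pattern plus its replacement string.
    static final class Rule {
        final Pattern pattern;
        final String substitution;

        Rule(String regex, String substitution) {
            this.pattern = Pattern.compile(regex);
            this.substitution = substitution;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    public void addRule(String regex, String substitution) {
        rules.add(new Rule(regex, substitution));
    }

    // synchronized: a thread dump taken during this call shows the
    // normalizer instance as "locked" by the calling thread.
    public synchronized String normalize(String url) {
        String result = url;
        for (Rule rule : rules) { // every rule runs against every URL
            result = rule.pattern.matcher(result).replaceAll(rule.substitution);
        }
        return result;
    }

    public static void main(String[] args) {
        RuleListNormalizer normalizer = new RuleListNormalizer();
        // Made-up rules for the sketch: drop a session id, drop a fragment.
        normalizer.addRule(";jsessionid=[0-9A-Za-z]+", "");
        normalizer.addRule("#.*$", "");

        String url = "http://example.com/a;jsessionid=ABC123#top";
        System.out.println(normalizer.normalize(url));

        // Rough timing loop: cost grows with (number of URLs) x (number of rules).
        long start = System.nanoTime();
        for (int i = 0; i < 100000; i++) {
            normalizer.normalize(url);
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println("100000 normalizations took " + elapsedMs + " ms");
    }
}
```

A single dump only shows where the thread was at one instant, so several dumps taken a few seconds apart would be needed before concluding that normalization is where the reduce step spends its time.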
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-1-2-performance-and-memory-issues-tp2407256p2410061.html
Sent from the Nutch - User mailing list archive at Nabble.com.
