Here are the results. Next I'm going to try without URL partitioning:
Nutch generator output:
Generator: starting at 2011-02-02 20:11:00
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
jstack output:
Full thread dump Java HotSpot(TM) Client VM (17.1-b03 mixed mode, sharing):
"communication thread" daemon prio=10 tid=0x0a104800 nid=0x637 runnable [0xb3cad000]
java.lang.Thread.State: RUNNABLE
at java.lang.Object.getClass(Native Method)
at java.util.ArrayList.<init>(ArrayList.java:134)
at org.apache.hadoop.fs.FileSystem.getAllStatistics(FileSystem.java:1567)
- locked <0x8ef584c8> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
at org.apache.hadoop.mapred.Task.updateCounters(Task.java:652)
- locked <0x66a3d020> (a org.apache.hadoop.mapred.ReduceTask)
at org.apache.hadoop.mapred.Task.access$600(Task.java:56)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:539)
at java.lang.Thread.run(Thread.java:662)
"Attach Listener" daemon prio=10 tid=0x0a0a9800 nid=0x7e31 waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
"Thread-13" prio=10 tid=0xb3b27800 nid=0x6fd9 runnable [0xb3cfe000]
java.lang.Thread.State: RUNNABLE
at java.util.ArrayList.size(ArrayList.java:177)
at java.util.AbstractList$Itr.hasNext(AbstractList.java:339)
at org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer.regexNormalize(RegexURLNormalizer.java:168)
- locked <0x66a48778> (a org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer)
at org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer.normalize(RegexURLNormalizer.java:179)
- locked <0x66a48778> (a org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer)
at org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:286)
at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:244)
at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:109)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
"Low Memory Detector" daemon prio=10 tid=0x09ec8800 nid=0x6fb3 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
"CompilerThread0" daemon prio=10 tid=0x09ec6800 nid=0x6fb2 waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x09ec4c00 nid=0x6fb1 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x09ec0800 nid=0x6fb0 in Object.wait() [0xb46cc000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x65410258> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
- locked <0x65410258> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
"Reference Handler" daemon prio=10 tid=0x09ebbc00 nid=0x6faf in Object.wait() [0xb471d000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x654102e8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
- locked <0x654102e8> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x09e97000 nid=0x6fad runnable [0xb6c97000]
java.lang.Thread.State: RUNNABLE
at java.text.DecimalFormatSymbols.initialize(DecimalFormatSymbols.java:509)
at java.text.DecimalFormatSymbols.<init>(DecimalFormatSymbols.java:77)
at java.text.DecimalFormat.<init>(DecimalFormat.java:416)
at org.apache.hadoop.util.StringUtils.formatPercent(StringUtils.java:113)
at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1283)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
at org.apache.nutch.crawl.Generator.generate(Generator.java:526)
at org.apache.nutch.crawl.Generator.run(Generator.java:692)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Generator.main(Generator.java:648)
"VM Thread" prio=10 tid=0x09eba400 nid=0x6fae runnable
"VM Periodic Task Thread" prio=10 tid=0x09ecac00 nid=0x6fb4 waiting on condition
JNI global references: 1419
--
View this message in context:
http://lucene.472066.n3.nabble.com/Nutch-1-2-performance-and-memory-issues-tp2407256p2410061.html
Sent from the Nutch - User mailing list archive at Nabble.com.