Hello,

I just migrated from Nutch 0.8.1 to 0.9 and ran into a problem when parsing a 500,000-page segment (we run the parse step after fetching).
The process is using 0% CPU but a lot of memory, and it has stayed like that for hours. According to the log files, it seems to be stalled.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15427 vetseeke  16   0 1291m 835m  14m T  0.0 41.2 864:45.87 java

I examined the process, and it turns out that the perm space (permanent generation) is 99.9% full. Below is the output of:

(1) jmap -heap 15427
(2) jstack 15427
(3) the first lines of jmap -histo 15427

Can anybody see what is going wrong (and maybe even what I can do about it)? We have limited file.content.length to 2 MB, so why would the parse process need so much memory? Any hints are very much appreciated!

Best wishes,
Karsten


(1) jmap -heap 15427

Attaching to process ID 15427, please wait...
Debugger attached successfully.
Client compiler detected.
JVM version is 1.5.0_06-b05

using thread-local object allocation.
Mark Sweep Compact GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 1048576000 (1000.0MB)
   NewSize          = 655360 (0.625MB)
   MaxNewSize       = 4294901760 (4095.9375MB)
   OldSize          = 1441792 (1.375MB)
   NewRatio         = 12
   SurvivorRatio    = 8
   PermSize         = 8388608 (8.0MB)
   MaxPermSize      = 67108864 (64.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 51380224 (49.0MB)
   used     = 6208 (0.00592041015625MB)
   free     = 51374016 (48.99407958984375MB)
   0.012082469706632654% used
Eden Space:
   capacity = 45678592 (43.5625MB)
   used     = 6208 (0.00592041015625MB)
   free     = 45672384 (43.55657958984375MB)
   0.013590611549497847% used
From Space:
   capacity = 5701632 (5.4375MB)
   used     = 0 (0.0MB)
   free     = 5701632 (5.4375MB)
   0.0% used
To Space:
   capacity = 5701632 (5.4375MB)
   used     = 0 (0.0MB)
   free     = 5701632 (5.4375MB)
   0.0% used
tenured generation:
   capacity = 684777472 (653.0546875MB)
   used     = 387925944 (369.9550094604492MB)
   free     = 296851528 (283.0996780395508MB)
   56.649927876132% used
Perm Generation:
   capacity = 67108864 (64.0MB)
   used     = 67108784 (63.99992370605469MB)
   free     = 80 (7.62939453125E-5MB)
   99.99988079071045% used


(2) jstack 15427

Attaching to process ID 15427, please wait...
Debugger attached successfully.
Client compiler detected.
JVM version is 1.5.0_06-b05

Thread 16025: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask$2.run() @bci=44, line=201 (Compiled frame)

Thread 15450: (state = BLOCKED)
 - java.lang.ClassLoader.defineClass1(java.lang.String, byte[], int, int, java.security.ProtectionDomain, java.lang.String) @bci=0 (Interpreted frame)
 - java.lang.ClassLoader.defineClass(java.lang.String, byte[], int, int, java.security.ProtectionDomain) @bci=34, line=620 (Interpreted frame)
 - java.security.SecureClassLoader.defineClass(java.lang.String, byte[], int, int, java.security.CodeSource) @bci=27, line=124 (Interpreted frame)
 - java.net.URLClassLoader.defineClass(java.lang.String, sun.misc.Resource) @bci=253, line=260 (Interpreted frame)
 - java.net.URLClassLoader.access$100(java.net.URLClassLoader, java.lang.String, sun.misc.Resource) @bci=3, line=56 (Compiled frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
 - java.net.URLClassLoader.findClass(java.lang.String) @bci=13, line=188 (Compiled frame)
 - java.lang.ClassLoader.loadClass(java.lang.String, boolean) @bci=43, line=306 (Compiled frame)
 - java.lang.ClassLoader.loadClass(java.lang.String) @bci=3, line=251 (Interpreted frame)
 - java.lang.ClassLoader.loadClassInternal(java.lang.String) @bci=2, line=319 (Interpreted frame)
 - org.apache.nutch.parse.pdf.PdfParser.getParse(org.apache.nutch.protocol.Content) @bci=111, line=90 (Interpreted frame)
 - org.apache.nutch.parse.ParseUtil.parse(org.apache.nutch.protocol.Content) @bci=174, line=84 (Compiled frame)
 - org.apache.nutch.parse.ParseSegment.map(org.apache.hadoop.io.WritableComparable, org.apache.hadoop.io.Writable, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter) @bci=50, line=75 (Compiled frame)
 - org.apache.hadoop.mapred.MapRunner.run(org.apache.hadoop.mapred.RecordReader, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter) @bci=39, line=48 (Compiled frame)
 - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=239, line=175 (Interpreted frame)
 - org.apache.hadoop.mapred.LocalJobRunner$Job.run() @bci=225, line=126 (Interpreted frame)

Thread 15444: (state = BLOCKED)

Thread 15443: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=116 (Compiled frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=132 (Compiled frame)

Thread 15442: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=474 (Compiled frame)

Thread 15427: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
 - org.apache.hadoop.mapred.JobClient.runJob(org.apache.hadoop.mapred.JobConf) @bci=120, line=550 (Interpreted frame)
 - org.apache.nutch.parse.ParseSegment.parse(org.apache.hadoop.fs.Path) @bci=155, line=131 (Interpreted frame)
 - org.apache.nutch.parse.ParseSegment.main(java.lang.String[]) @bci=43, line=149 (Interpreted frame)


(3) jmap -histo 15427 (first lines)

Object Histogram:

Size       Count    Class description
-------------------------------------------------------
149696016  6237334  java.lang.String
130090736  6233990  char[]
27550488   1147937  java.util.HashMap$Entry
24244600   172572   * ConstMethodKlass
19227080   24261    java.lang.String[]
15574904   19114    byte[]
11129512   16591    * ConstantPoolKlass
9769240    13075    java.util.HashMap$Entry[]
9664528    172572   * MethodKlass
8334296    102729   java.lang.Object[]
6365728    14246    * ConstantPoolCacheKlass
6364552    16590    * InstanceKlassKlass
2333928    97247    java.util.ArrayList
1482976    16852    java.lang.Class
1482312    25182    * SymbolKlass
1256112    17990    short[]
1034840    21837    java.lang.Object[]
901824     56364    java.lang.Integer
703680     10499    int[]
586728     24447    java.util.Hashtable$Entry
522600     13065    java.util.HashMap
486848     7607     java.lang.reflect.Constructor
457104     19046    org.pdfbox.afmtypes.KernPair
457104     19046    org.pdfbox.afmtypes.KernPair
457104     19046    org.pdfbox.afmtypes.KernPair
457104     19046    org.pdfbox.afmtypes.KernPair
408192     17008    org.pdfbox.afmtypes.KernPair
408192     17008    org.pdfbox.afmtypes.KernPair
408192     17008    org.pdfbox.afmtypes.KernPair
397560     16565    org.pdfbox.afmtypes.KernPair
397560     16565    org.pdfbox.afmtypes.KernPair
397560     16565    org.pdfbox.afmtypes.KernPair
348648     14527    org.pdfbox.afmtypes.KernPair
336480     14020    org.pdfbox.afmtypes.KernPair
332640     13860    org.pdfbox.afmtypes.KernPair
292944     12206    org.pdfbox.afmtypes.KernPair
283728     11822    org.pdfbox.afmtypes.KernPair
273096     11379    org.pdfbox.afmtypes.KernPair
271840     2196     java.util.Hashtable$Entry[]
267720     11155    org.pdfbox.afmtypes.KernPair
267008     4172     org.pdfbox.afmtypes.CharMetric
228024     9501     org.pdfbox.afmtypes.KernPair
228024     9501     org.pdfbox.afmtypes.KernPair
224184     9341     org.pdfbox.afmtypes.KernPair
222864     13929    java.lang.Character
221760     3465     org.pdfbox.afmtypes.CharMetric
220136     3931     java.net.URL
186368     2912     org.pdfbox.afmtypes.CharMetric
184008     7667     org.pdfbox.afmtypes.KernPair
184008     7667     org.pdfbox.afmtypes.KernPair
181440     2835     org.pdfbox.afmtypes.CharMetric
174216     7259     org.pdfbox.afmtypes.KernPair
173440     2710     org.pdfbox.afmtypes.CharMetric
173440     2710     org.pdfbox.afmtypes.CharMetric
173440     2710     org.pdfbox.afmtypes.CharMetric
165000     6875     org.pdfbox.afmtypes.KernPair
153280     2395     org.pdfbox.afmtypes.CharMetric
153280     2395     org.pdfbox.afmtypes.CharMetric
153280     2395     org.pdfbox.afmtypes.CharMetric
153280     2395     org.pdfbox.afmtypes.CharMetric
152320     2380     org.apache.nutch.plugin.PluginDescriptor
148416     4638     java.lang.ref.SoftReference
146048     2282     org.pdfbox.afmtypes.CharMetric
133120     2080     org.pdfbox.afmtypes.CharMetric
133120     2080     org.pdfbox.afmtypes.CharMetric
133120     2080     org.pdfbox.afmtypes.CharMetric
125888     1967     org.pdfbox.afmtypes.CharMetric
125888     1967     org.pdfbox.afmtypes.CharMetric
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
124464     5186     org.pdfbox.afmtypes.KernPair
122176     7636     org.pdfbox.cos.COSName
118832     7427     org.pdfbox.cos.COSName
115056     2397     java.lang.Package
112960     1765     org.pdfbox.afmtypes.CharMetric
112960     1765     org.pdfbox.afmtypes.CharMetric
108408     4517     java.util.Vector
105728     1652     org.pdfbox.afmtypes.CharMetric
104832     6552     org.pdfbox.cos.COSName
102912     6432     org.pdfbox.cos.COSName
100464     4186     org.pdfbox.util.BoundingBox
99568      6223     org.pdfbox.cos.COSName
96576      6036     org.pdfbox.cos.COSName
96048      6003     org.pdfbox.cos.COSName
93744      5859     org.pdfbox.cos.COSName

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
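The "Perm Generation" figures in the jmap -heap output can also be read from inside a running JVM. As a minimal sketch (not part of Nutch or Hadoop), the standard java.lang.management API, available since Java 5, exposes each memory pool plus class-loading counters. The class name PermGenReport and its helpers are hypothetical, and the pool names are an assumption that varies by JVM and collector: older HotSpot JVMs report names such as "Perm Gen" or "PS Perm Gen", while Java 8 and later replaced the permanent generation with "Metaspace".

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints class-metadata pool usage and class-loading counts.
public class PermGenReport {

    // Assumption: pool names containing "Perm" (HotSpot <= 7) or "Metaspace" (Java 8+)
    // hold class metadata. Other JVMs may use different names.
    static boolean isClassMetadataPool(String poolName) {
        return poolName.contains("Perm") || poolName.contains("Metaspace");
    }

    // Formats one pool's usage; getMax() returns -1 when the pool has no defined maximum.
    static String describe(String name, MemoryUsage u) {
        long max = u.getMax();
        String pct = max > 0
                ? String.format("%.2f%% of max", 100.0 * u.getUsed() / max)
                : "no max set";
        return String.format("%s: used=%d committed=%d max=%d (%s)",
                name, u.getUsed(), u.getCommitted(), max, pct);
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (isClassMetadataPool(pool.getName())) {
                System.out.println(describe(pool.getName(), pool.getUsage()));
            }
        }
        // A loaded-class count that keeps rising while unloaded stays near zero
        // means class metadata is accumulating rather than being reclaimed.
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        System.out.println("loaded classes: " + cl.getLoadedClassCount()
                + ", total loaded: " + cl.getTotalLoadedClassCount()
                + ", unloaded: " + cl.getUnloadedClassCount());
    }
}
```

Printing this periodically during a parse run would show whether the class-metadata pool and the loaded-class count grow steadily, which is a different kind of growth from the per-document heap usage that a content-length limit controls.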
