hello,

i just migrated from 0.8.1 to 0.9 and ran into a problem parsing a segment
of 500,000 pages (we parse after fetching).

the process uses 0% cpu but a lot of memory, and it has stayed like that
for hours. according to the logfiles it seems to be stalled.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15427 vetseeke  16   0 1291m 835m  14m T  0.0 41.2 864:45.87 java

i examined the process, and it turns out that the perm space (the
permanent generation) is 99.9% full.
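If you want to watch the permanent generation from inside the JVM instead of attaching jmap, a minimal sketch using the standard java.lang.management API (present since Java 5, so it should run on the 1.5.0_06 JVM shown below) could look like this. The class name PermGenUsage is hypothetical. Matching pool names on "Perm" is an assumption: HotSpot usually calls the pool "Perm Gen" (or "PS Perm Gen" with the parallel collector), but the names depend on collector and JVM version. On Java 8 and later the permanent generation was replaced by Metaspace, so the loop prints nothing there.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;

// Hypothetical helper: reports permanent-generation usage for the current JVM.
public class PermGenUsage {

    /** Percentage of max that is used; returns -1 when max is undefined (getMax() == -1). */
    public static double percentUsed(long used, long max) {
        if (max <= 0) {
            return -1.0;
        }
        return 100.0 * used / max;
    }

    public static void main(String[] args) {
        List<MemoryPoolMXBean> pools = ManagementFactory.getMemoryPoolMXBeans();
        for (MemoryPoolMXBean pool : pools) {
            // Assumption: the perm pool's name contains "Perm" (e.g. "Perm Gen").
            if (pool.getName().contains("Perm")) {
                MemoryUsage u = pool.getUsage();
                System.out.println(pool.getName() + ": " + u.getUsed() + " / "
                        + u.getMax() + " bytes ("
                        + percentUsed(u.getUsed(), u.getMax()) + "% used)");
            }
        }
    }
}
```

Fed the "Perm Generation" numbers from the jmap output (67108784 used of 67108864), percentUsed gives about 99.99988%, the same figure jmap reports.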

below is the output of
(1) jmap -heap 15427
(2) jstack 15427
(3) the first lines of jmap -histo 15427
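jstack has to attach to the process from outside. As an in-process alternative, a rough sketch using Thread.getAllStackTraces() (also available since Java 5) can write a similar dump to a log when a task looks stalled. ThreadDumper is a hypothetical helper name. Unlike the jstack output below, it prints Java thread names rather than native thread ids, and a thread inside Thread.sleep reports TIMED_WAITING rather than BLOCKED.

```java
import java.util.Map;

// Hypothetical helper: builds a jstack-like text dump of all live threads.
public class ThreadDumper {

    public static String dump() {
        StringBuilder sb = new StringBuilder();
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> e : traces.entrySet()) {
            Thread t = e.getKey();
            // Header line per thread, loosely modeled on jstack's format.
            sb.append("Thread ").append(t.getName())
              .append(" (state = ").append(t.getState()).append(")\n");
            // One line per stack frame, innermost frame first.
            for (StackTraceElement frame : e.getValue()) {
                sb.append(" - ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```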

can anybody see what is going wrong (and maybe even what i can do about it)?
we have limited file.content.length to 2mb, so why would the parse
process need so much memory?

any hints are very much appreciated!

best wishes
karsten


(1) jmap -heap 15427

Attaching to process ID 15427, please wait...
Debugger attached successfully.
Client compiler detected.
JVM version is 1.5.0_06-b05

using thread-local object allocation.
Mark Sweep Compact GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 1048576000 (1000.0MB)
   NewSize          = 655360 (0.625MB)
   MaxNewSize       = 4294901760 (4095.9375MB)
   OldSize          = 1441792 (1.375MB)
   NewRatio         = 12
   SurvivorRatio    = 8
   PermSize         = 8388608 (8.0MB)
   MaxPermSize      = 67108864 (64.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 51380224 (49.0MB)
   used     = 6208 (0.00592041015625MB)
   free     = 51374016 (48.99407958984375MB)
   0.012082469706632654% used
Eden Space:
   capacity = 45678592 (43.5625MB)
   used     = 6208 (0.00592041015625MB)
   free     = 45672384 (43.55657958984375MB)
   0.013590611549497847% used
From Space:
   capacity = 5701632 (5.4375MB)
   used     = 0 (0.0MB)
   free     = 5701632 (5.4375MB)
   0.0% used
To Space:
   capacity = 5701632 (5.4375MB)
   used     = 0 (0.0MB)
   free     = 5701632 (5.4375MB)
   0.0% used
tenured generation:
   capacity = 684777472 (653.0546875MB)
   used     = 387925944 (369.9550094604492MB)
   free     = 296851528 (283.0996780395508MB)
   56.649927876132% used
Perm Generation:
   capacity = 67108864 (64.0MB)
   used     = 67108784 (63.99992370605469MB)
   free     = 80 (7.62939453125E-5MB)
   99.99988079071045% used



(2) jstack 15427

Attaching to process ID 15427, please wait...
Debugger attached successfully.
Client compiler detected.
JVM version is 1.5.0_06-b05
Thread 16025: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask$2.run() @bci=44, line=201 (Compiled frame)


Thread 15450: (state = BLOCKED)
 - java.lang.ClassLoader.defineClass1(java.lang.String, byte[], int, int, java.security.ProtectionDomain, java.lang.String) @bci=0 (Interpreted frame)
 - java.lang.ClassLoader.defineClass(java.lang.String, byte[], int, int, java.security.ProtectionDomain) @bci=34, line=620 (Interpreted frame)
 - java.security.SecureClassLoader.defineClass(java.lang.String, byte[], int, int, java.security.CodeSource) @bci=27, line=124 (Interpreted frame)
 - java.net.URLClassLoader.defineClass(java.lang.String, sun.misc.Resource) @bci=253, line=260 (Interpreted frame)
 - java.net.URLClassLoader.access$100(java.net.URLClassLoader, java.lang.String, sun.misc.Resource) @bci=3, line=56 (Compiled frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
 - java.net.URLClassLoader.findClass(java.lang.String) @bci=13, line=188 (Compiled frame)
 - java.lang.ClassLoader.loadClass(java.lang.String, boolean) @bci=43, line=306 (Compiled frame)
 - java.lang.ClassLoader.loadClass(java.lang.String) @bci=3, line=251 (Interpreted frame)
 - java.lang.ClassLoader.loadClassInternal(java.lang.String) @bci=2, line=319 (Interpreted frame)
 - org.apache.nutch.parse.pdf.PdfParser.getParse(org.apache.nutch.protocol.Content) @bci=111, line=90 (Interpreted frame)
 - org.apache.nutch.parse.ParseUtil.parse(org.apache.nutch.protocol.Content) @bci=174, line=84 (Compiled frame)
 - org.apache.nutch.parse.ParseSegment.map(org.apache.hadoop.io.WritableComparable, org.apache.hadoop.io.Writable, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter) @bci=50, line=75 (Compiled frame)
 - org.apache.hadoop.mapred.MapRunner.run(org.apache.hadoop.mapred.RecordReader, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter) @bci=39, line=48 (Compiled frame)
 - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=239, line=175 (Interpreted frame)
 - org.apache.hadoop.mapred.LocalJobRunner$Job.run() @bci=225, line=126 (Interpreted frame)


Thread 15444: (state = BLOCKED)


Thread 15443: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=116 (Compiled frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=132 (Compiled frame)


Thread 15442: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=474 (Compiled frame)


Thread 15427: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
 - org.apache.hadoop.mapred.JobClient.runJob(org.apache.hadoop.mapred.JobConf) @bci=120, line=550 (Interpreted frame)
 - org.apache.nutch.parse.ParseSegment.parse(org.apache.hadoop.fs.Path) @bci=155, line=131 (Interpreted frame)
 - org.apache.nutch.parse.ParseSegment.main(java.lang.String[]) @bci=43, line=149 (Interpreted frame)


(3) jmap -histo 15427

Object Histogram:

Size    Count   Class description
-------------------------------------------------------
149696016       6237334 java.lang.String
130090736       6233990 char[]
27550488        1147937 java.util.HashMap$Entry
24244600        172572  * ConstMethodKlass
19227080        24261   java.lang.String[]
15574904        19114   byte[]
11129512        16591   * ConstantPoolKlass
9769240 13075   java.util.HashMap$Entry[]
9664528 172572  * MethodKlass
8334296 102729  java.lang.Object[]
6365728 14246   * ConstantPoolCacheKlass
6364552 16590   * InstanceKlassKlass
2333928 97247   java.util.ArrayList
1482976 16852   java.lang.Class
1482312 25182   * SymbolKlass
1256112 17990   short[]
1034840 21837   java.lang.Object[]
901824  56364   java.lang.Integer
703680  10499   int[]
586728  24447   java.util.Hashtable$Entry
522600  13065   java.util.HashMap
486848  7607    java.lang.reflect.Constructor
457104  19046   org.pdfbox.afmtypes.KernPair
457104  19046   org.pdfbox.afmtypes.KernPair
457104  19046   org.pdfbox.afmtypes.KernPair
457104  19046   org.pdfbox.afmtypes.KernPair
408192  17008   org.pdfbox.afmtypes.KernPair
408192  17008   org.pdfbox.afmtypes.KernPair
408192  17008   org.pdfbox.afmtypes.KernPair
397560  16565   org.pdfbox.afmtypes.KernPair
397560  16565   org.pdfbox.afmtypes.KernPair
397560  16565   org.pdfbox.afmtypes.KernPair
348648  14527   org.pdfbox.afmtypes.KernPair
336480  14020   org.pdfbox.afmtypes.KernPair
332640  13860   org.pdfbox.afmtypes.KernPair
292944  12206   org.pdfbox.afmtypes.KernPair
283728  11822   org.pdfbox.afmtypes.KernPair
273096  11379   org.pdfbox.afmtypes.KernPair
271840  2196    java.util.Hashtable$Entry[]
267720  11155   org.pdfbox.afmtypes.KernPair
267008  4172    org.pdfbox.afmtypes.CharMetric
228024  9501    org.pdfbox.afmtypes.KernPair
228024  9501    org.pdfbox.afmtypes.KernPair
224184  9341    org.pdfbox.afmtypes.KernPair
222864  13929   java.lang.Character
221760  3465    org.pdfbox.afmtypes.CharMetric
220136  3931    java.net.URL
186368  2912    org.pdfbox.afmtypes.CharMetric
184008  7667    org.pdfbox.afmtypes.KernPair
184008  7667    org.pdfbox.afmtypes.KernPair
181440  2835    org.pdfbox.afmtypes.CharMetric
174216  7259    org.pdfbox.afmtypes.KernPair
173440  2710    org.pdfbox.afmtypes.CharMetric
173440  2710    org.pdfbox.afmtypes.CharMetric
173440  2710    org.pdfbox.afmtypes.CharMetric
165000  6875    org.pdfbox.afmtypes.KernPair
153280  2395    org.pdfbox.afmtypes.CharMetric
153280  2395    org.pdfbox.afmtypes.CharMetric
153280  2395    org.pdfbox.afmtypes.CharMetric
153280  2395    org.pdfbox.afmtypes.CharMetric
152320  2380    org.apache.nutch.plugin.PluginDescriptor
148416  4638    java.lang.ref.SoftReference
146048  2282    org.pdfbox.afmtypes.CharMetric
133120  2080    org.pdfbox.afmtypes.CharMetric
133120  2080    org.pdfbox.afmtypes.CharMetric
133120  2080    org.pdfbox.afmtypes.CharMetric
125888  1967    org.pdfbox.afmtypes.CharMetric
125888  1967    org.pdfbox.afmtypes.CharMetric
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
124464  5186    org.pdfbox.afmtypes.KernPair
122176  7636    org.pdfbox.cos.COSName
118832  7427    org.pdfbox.cos.COSName
115056  2397    java.lang.Package
112960  1765    org.pdfbox.afmtypes.CharMetric
112960  1765    org.pdfbox.afmtypes.CharMetric
108408  4517    java.util.Vector
105728  1652    org.pdfbox.afmtypes.CharMetric
104832  6552    org.pdfbox.cos.COSName
102912  6432    org.pdfbox.cos.COSName
100464  4186    org.pdfbox.util.BoundingBox
99568   6223    org.pdfbox.cos.COSName
96576   6036    org.pdfbox.cos.COSName
96048   6003    org.pdfbox.cos.COSName
93744   5859    org.pdfbox.cos.COSName

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
