I believe even a simple regular expression (Pattern, Matcher) may create
several hundred 'child' instances of Perl5Repetition, Perl5Substitution, etc.

The same happens when parsing XML.
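
If the churn comes from building regex objects over and over, the usual remedy is to compile the pattern once and reuse the matcher. Below is a minimal sketch of that idea. It uses java.util.regex from the JDK instead of jakarta-oro so that it runs without extra jars, and the class name ReusedRegex, the method countLinks and the href pattern are made up for illustration. I have not checked whether this is what produces the counts in the jmap output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReusedRegex {
    // Compiled once and shared: Pattern is immutable and thread-safe.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    // One Matcher per ReusedRegex instance, re-targeted with reset().
    // Matcher is NOT thread-safe, so do not share an instance across threads.
    private final Matcher matcher = HREF.matcher("");

    // Counts href="..." attributes without allocating a new Pattern or Matcher.
    public int countLinks(String html) {
        matcher.reset(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ReusedRegex regex = new ReusedRegex();
        String[] pages = {
            "<a href=\"a.html\">a</a>",
            "<a href=\"b.html\">b</a> <a href=\"c.html\">c</a>",
            "no links here"
        };
        int total = 0;
        for (String page : pages) {
            total += regex.countLinks(page);
        }
        System.out.println("links found: " + total);
    }
}
```

With jakarta-oro the equivalent would presumably be to keep the compiled Pattern and the Perl5Matcher around instead of creating them for every input, but that is an assumption worth verifying against the calling code.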


-Fuad
http://www.tokenizer.ca


> -----Original Message-----
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: February-12-10 7:54 PM
> To: nutch-user@lucene.apache.org
> Subject: memory consumed by jakarta-oro
> 
> Hi,
> We use jakarta-oro-2.0.7.jar
> I see the following from jmap output:
> 14369 instances of class org.apache.oro.text.regex.Perl5Repetition
> 4972 instances of class org.apache.oro.text.regex.PatternMatcherInput
> 4445 instances of class org.apache.hadoop.hbase.HColumnDescriptor
> 2969 instances of class org.apache.oro.text.regex.Perl5Substitution
> 
> I am wondering why so many objects from org.apache.oro.text.regex are held
> in memory. I see GC every 10 seconds.
> 
> Here is the list:
> 63916 instances of class org.apache.hadoop.hbase.io.ImmutableBytesWritable
> 26612 instances of class org.apache.hadoop.hbase.KeyValue
> 14369 instances of class org.apache.oro.text.regex.Perl5Repetition
> 4972 instances of class org.apache.oro.text.regex.PatternMatcherInput
> 4445 instances of class org.apache.hadoop.hbase.HColumnDescriptor
> 2969 instances of class org.apache.oro.text.regex.Perl5Substitution
> 2313 instances of class org.apache.nutch.util.domain.DomainSuffix
> 1709 instances of class org.apache.hadoop.hbase.client.Put
> 581 instances of class org.apache.nutch.parse.Outlink
> 553 instances of class org.apache.nutch.util.hbase.ColumnData
> 496 instances of class com.rialto.nutchbase.fetcher.FetcherReducer$FetchItem
> 495 instances of class org.apache.nutch.util.hbase.WebTableRow
> 422 instances of class org.apache.hadoop.hbase.HRegionLocation
> 422 instances of class org.apache.hadoop.hbase.HServerAddress
> 414 instances of class org.apache.hadoop.hbase.HRegionInfo
> 414 instances of class org.apache.hadoop.hbase.HTableDescriptor
> 412 instances of class org.apache.hadoop.hbase.util.SoftValue
> 293 instances of class org.apache.nutch.util.domain.TopLevelDomain
> 253 instances of class org.cyberneko.html.HTMLEntities$IntProperties$Entry
> 219 instances of class org.apache.oro.text.regex.Perl5Matcher
> 
> Any hint would be appreciated.

