I believe a simple regular expression (Pattern, Matcher) may create several hundred 'child' instances of Perl5Repetition, Perl5Substitution, etc.
Same as with parsing XML.

-Fuad
http://www.tokenizer.ca

> -----Original Message-----
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: February-12-10 7:54 PM
> To: nutch-user@lucene.apache.org
> Subject: memory consumed by jakarta-oro
>
> Hi,
> We use jakarta-oro-2.0.7.jar
> I see the following from jmap output:
> 14369 instances of class org.apache.oro.text.regex.Perl5Repetition
> 4972 instances of class org.apache.oro.text.regex.PatternMatcherInput
> 4445 instances of class org.apache.hadoop.hbase.HColumnDescriptor
> 2969 instances of class org.apache.oro.text.regex.Perl5Substitution
>
> I am wondering why so many objects from org.apache.oro.text.regex are held
> in memory. I see GC every 10 seconds.
>
> Here is the list:
> 63916 instances of class org.apache.hadoop.hbase.io.ImmutableBytesWritable
> 26612 instances of class org.apache.hadoop.hbase.KeyValue
> 14369 instances of class org.apache.oro.text.regex.Perl5Repetition
> 4972 instances of class org.apache.oro.text.regex.PatternMatcherInput
> 4445 instances of class org.apache.hadoop.hbase.HColumnDescriptor
> 2969 instances of class org.apache.oro.text.regex.Perl5Substitution
> 2313 instances of class org.apache.nutch.util.domain.DomainSuffix
> 1709 instances of class org.apache.hadoop.hbase.client.Put
> 581 instances of class org.apache.nutch.parse.Outlink
> 553 instances of class org.apache.nutch.util.hbase.ColumnData
> 496 instances of class com.rialto.nutchbase.fetcher.FetcherReducer$FetchItem
> 495 instances of class org.apache.nutch.util.hbase.WebTableRow
> 422 instances of class org.apache.hadoop.hbase.HRegionLocation
> 422 instances of class org.apache.hadoop.hbase.HServerAddress
> 414 instances of class org.apache.hadoop.hbase.HRegionInfo
> 414 instances of class org.apache.hadoop.hbase.HTableDescriptor
> 412 instances of class org.apache.hadoop.hbase.util.SoftValue
> 293 instances of class org.apache.nutch.util.domain.TopLevelDomain
> 253 instances of class org.cyberneko.html.HTMLEntities$IntProperties$Entry
> 219 instances of class org.apache.oro.text.regex.Perl5Matcher
>
> Your hint is helpful.
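
If the counts above come from patterns being compiled again and again (an assumption; nothing in this thread confirms it), compiling each expression once and reusing it is one way to keep the number of compiled-pattern objects down. The sketch below is only an illustration: it uses the JDK's java.util.regex rather than jakarta-oro, and the PatternCache class and the sample rule are hypothetical names, not anything from Nutch or ORO.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/** Hypothetical helper: keeps one compiled Pattern per regex string. */
final class PatternCache {
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    private PatternCache() {}

    /** Returns the cached compiled pattern, compiling it on first use only. */
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    /** Searches the input for the regex without recompiling the regex. */
    static boolean matches(String regex, CharSequence input) {
        // A Matcher is still created per call; only the compiled
        // pattern (the expensive object graph) is shared.
        return get(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Made-up rule in the style of a URL filter, for illustration only.
        String rule = "\\.(gif|jpg|png)$";
        System.out.println(matches(rule, "http://example.com/a.png"));
        System.out.println(matches(rule, "http://example.com/index.html"));
        // The same Pattern instance comes back for the same regex string.
        System.out.println(get(rule) == get(rule));
    }
}
```

Whether this helps depends on where the Perl5Repetition and Perl5Substitution instances are actually created; a heap histogram taken before and after such a change would show it.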