Are you trying to use Nutch's indexer? AFAIK that's deprecated, isn't it?

On Wed, Jun 22, 2011 at 2:19 PM, caomanhdat <[email protected]> wrote:

> Hi all,
>
> I have a problem getting the frequency of a word in Nutch :|
> In Lucene it is quite easy with this code:
>
> Directory dir2 = FSDirectory.open(new File(indexDir));
> IndexReader ir = IndexReader.open(dir2);
> TermDocs termDocs = ir.termDocs(new Term("contents", "eBank"));
> int count = 0;
> while (termDocs.next()) {
>     count += termDocs.freq();
> }
>
> But in Nutch the indexer is quite weird, so I can't do the same thing:
>
> Directory dir2 = FSDirectory.open(new File("D:\\nutch\\crawl\\indexes"));
> IndexReader ir = IndexReader.open(dir2);
> TermDocs termDocs = ir.termDocs(new Term("contents", "eBank"));
> int count = 0;
> while (termDocs.next()) {
>     count += termDocs.freq();
> }
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Get-frequency-of-word-tp3095236p3095236.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
--
Regards,
K. Gabriele

