Problem Extracting HTML Meta Tags

2007-03-29 Thread z0mbi3
Hi, I am working on a focused crawler for which I need the HTML meta tag info in ParseOutputFormat.java. It provides me with the parse of the HTML page, so is there a way to extract the HTML meta tag values through parse.getData? For example, for the HTML page BBCHindi.com I would like to ext

Re: Image Search Engine Input

2007-03-29 Thread Doug Cutting
Steve Severance wrote: I am not looking to really make an image retrieval engine. During indexing referencing docs will be analyzed and text content will be associated with the image. Currently I want to keep this in a separate index. So despite the fact that images will be returned the search

RE: Sequence File Question

2007-03-29 Thread Steve Severance
Got it. I am going to document this on the wiki. Thanks. Steve -----Original Message----- From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] Sent: Thursday, March 29, 2007 2:31 PM To: nutch-dev@lucene.apache.org Subject: Re: Sequence File Question Steve Severance wrote: >> DB updates - or actually

Re: Sequence File Question

2007-03-29 Thread Andrzej Bialecki
Steve Severance wrote: DB updates - or actually replacements - see e.g. CrawlDb.install() method for details. This is not needed in case of segments, which are created once and never updated. How does the reader know which one it is expecting? For instance I can make a reader to read a linkD

RE: Sequence File Question

2007-03-29 Thread Steve Severance
> -----Original Message----- > From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] > Sent: Wednesday, March 28, 2007 4:34 PM > To: nutch-dev@lucene.apache.org > Subject: Re: Sequence File Question > > Steve Severance wrote: > > Let me actually refine that question we do some directories like the > li

[jira] Commented: (NUTCH-435) Synonym-Editor that creates OWL for the ontology plugin

2007-03-29 Thread Urs Krebs (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485171 ] Urs Krebs commented on NUTCH-435: The Synonym-Editor is now a SourceForge project and you can find it at: http://sour

Re: [VOTE] Release Apache Nutch 0.9

2007-03-29 Thread Sami Siren
> The branch should have been done when we kind of fixed > the features for 0.9, not when the first RC is cut. That's the past for me. We're discussing how we should modify the procedure to get the best results in the future, given the team and the momentum we have now. I think we're discussing a

Re: [VOTE] Release Apache Nutch 0.9

2007-03-29 Thread Andrzej Bialecki
Sami Siren wrote: 2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>: Sami Siren wrote: > 2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>: >> >> Sami Siren wrote: >> >> > IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have >> > 0.9-rc2 tag and so on until we are satisfied. >> > >> > T

Re: [VOTE] Release Apache Nutch 0.9

2007-03-29 Thread Sami Siren
2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>: Sami Siren wrote: > 2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>: >> >> Sami Siren wrote: >> >> > IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have >> > 0.9-rc2 tag and so on until we are satisfied. >> > >> > Then when we're actu

Re: [VOTE] Release Apache Nutch 0.9

2007-03-29 Thread Andrzej Bialecki
Sami Siren wrote: 2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>: Sami Siren wrote: > IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have > 0.9-rc2 tag and so on until we are satisfied. > > Then when we're actually satisfied create tag for 0.9 (copy from rc > that got promoted).

Re: [VOTE] Release Apache Nutch 0.9

2007-03-29 Thread Sami Siren
2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>: Sami Siren wrote: > IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have > 0.9-rc2 tag and so on until we are satisfied. > > Then when we're actually satisfied create tag for 0.9 (copy from rc > that got promoted). > > What is the ben

Re: [VOTE] Release Apache Nutch 0.9

2007-03-29 Thread Andrzej Bialecki
Sami Siren wrote: IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have 0.9-rc2 tag and so on until we are satisfied. Then when we're actually satisfied create tag for 0.9 (copy from rc that got promoted). What is the benefit of using a branch before a release? That you don't with