Hi,
I am working on a focused crawler for which I need the HTML meta tag information in
ParseOutputFormat.java. It gives me the parse of the HTML page, so is
there a way to extract the HTML meta tag values through parse.getData()?
For example, for the HTML page:
BBCHindi.com
I would like to ext
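One possible workaround, assuming the raw HTML of the page is available as a String, is to pull the meta tags out directly with the JDK's regex classes. This is a minimal standalone sketch, not part of Nutch's API: the class MetaTagExtractor and its extract() method are hypothetical, the sample page in main() is made up, and a regex is only a rough substitute for a real HTML parser (it expects quoted attribute values and does not skip comments or scripts).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Nutch class: regex-based meta tag extraction.
public class MetaTagExtractor {
    // Matches a whole <meta ...> tag, case-insensitively; group 1 holds its attributes.
    private static final Pattern META_TAG =
        Pattern.compile("<meta\\s+([^>]*)>", Pattern.CASE_INSENSITIVE);
    // Matches one quoted attribute: group 1 = name, group 3 = "value", group 4 = 'value'.
    private static final Pattern ATTR =
        Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    /** Returns lower-cased meta name (or http-equiv) -> content, in document order. */
    public static Map<String, String> extract(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String key = null;
            String content = null;
            Matcher attr = ATTR.matcher(tag.group(1));
            while (attr.find()) {
                String attrName = attr.group(1).toLowerCase(Locale.ROOT);
                String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
                if (attrName.equals("name") || attrName.equals("http-equiv")) {
                    key = value.toLowerCase(Locale.ROOT);
                } else if (attrName.equals("content")) {
                    content = value;
                }
            }
            // Keep only tags that carry both a key and a content value.
            if (key != null && content != null) {
                result.put(key, content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample page, for illustration only.
        String html = "<html><head><title>BBCHindi.com</title>"
            + "<meta name=\"keywords\" content=\"news, hindi\">"
            + "<META NAME='description' CONTENT='BBC Hindi service'>"
            + "</head><body></body></html>";
        // prints {keywords=news, hindi, description=BBC Hindi service}
        System.out.println(extract(html));
    }
}
```

Inside a Nutch plugin the same idea would apply to whatever HTML text the parser has access to; keying the map by lower-cased name keeps lookups such as "keywords" independent of how the page capitalizes its attributes.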
Steve Severance wrote:
I am not really looking to make an image retrieval engine. During indexing,
the referencing documents will be analyzed and their text content will be associated with the
image. Currently I want to keep this in a separate index. So despite the fact
that images will be returned the search
Got it. I am going to document this on the wiki. Thanks.
Steve
-Original Message-
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 29, 2007 2:31 PM
To: nutch-dev@lucene.apache.org
Subject: Re: Sequence File Question
Steve Severance wrote:
>> DB updates - or actually
Steve Severance wrote:
DB updates (or actually replacements; see e.g. the CrawlDb.install()
method for details). This is not needed in the case of segments, which
are created once and never updated.
How does the reader know which one it is expecting? For instance, I
can make a reader to read a linkD
> -Original Message-
> From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 28, 2007 4:34 PM
> To: nutch-dev@lucene.apache.org
> Subject: Re: Sequence File Question
>
> Steve Severance wrote:
> > Let me actually refine that question we do some directories like the
> li
[
https://issues.apache.org/jira/browse/NUTCH-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485171
]
Urs Krebs commented on NUTCH-435:
-
The Synonym-Editor is now a SourceForge project and you can find it under:
http://sour
> The branch should have been done when we kind of fixed
> the features for 0.9, not when the first rc is cut.
That's the past for me. We're discussing how we should modify the
procedure to get the best results in the future, given the team and the
momentum we have now.
I think we're discussing a
Sami Siren wrote:
2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>:
Sami Siren wrote:
> 2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>:
>>
>> Sami Siren wrote:
>>
>> > IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
>> > 0.9-rc2 tag and so on until we are satisfied.
>> >
>> > T
2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>:
Sami Siren wrote:
> 2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>:
>>
>> Sami Siren wrote:
>>
>> > IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
>> > 0.9-rc2 tag and so on until we are satisfied.
>> >
>> > Then when we're actu
Sami Siren wrote:
2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>:
Sami Siren wrote:
> IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
> 0.9-rc2 tag and so on until we are satisfied.
>
> Then when we're actually satisfied create tag for 0.9 (copy from rc
> that got promoted).
2007/3/29, Andrzej Bialecki <[EMAIL PROTECTED]>:
Sami Siren wrote:
> IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
> 0.9-rc2 tag and so on until we are satisfied.
>
> Then when we're actually satisfied create tag for 0.9 (copy from rc
> that got promoted).
>
> What is the ben
Sami Siren wrote:
IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
0.9-rc2 tag and so on until we are satisfied.
Then when we're actually satisfied create tag for 0.9 (copy from rc
that got promoted).
What is the benefit of using a branch before a release?
That you don't with