I managed to "hack" HtmlParser by modifying the class HTMLMetaProcessor.
Now I'm able to parse my metadata.
I agree with you. I will write my own plugin later. At the moment I'm
only interested in finding out whether it is possible to start using
Solr/Nutch instead of paying A LOT for a Fast/Ul
On 2010-01-11 13:18, Erlend Garåsen wrote:
First of all: I didn't know about the list archive, so sorry for not
searching that resource before I sent a new post.
MilleBii wrote:
For lastModified, just enable the index-more and query-more plugins; they will do
the job for you.
Unfortunately not. Our pages include Dublin Core metadata which has a
Something like this may work for your filter. I have not tested it, but
maybe it will give you a better idea of what you need to do for the author
data. This is based on Nutch 1.0, so I'm not sure whether it would work with the
trunk version.
public class AuthorFilter implements HtmlParseFilter {
p
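A standalone sketch of the DOM walk such a filter would need, assuming the author is stored in a Dublin Core `DC.creator` meta tag. It does not use the Nutch plugin API: the class `MetaTagSketch` and method `collectMetaTags` are hypothetical, and the sample page is parsed from an XHTML string with the JDK's XML parser only so that the example runs on its own. The walk is written against `org.w3c.dom.Node`, so the same method would accept either a `Document` or a `DocumentFragment`; element-name case can differ depending on which HTML parser built the DOM, hence the case-insensitive tag match.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MetaTagSketch {

    // Hypothetical helper: recursively walks any DOM node (Document or
    // DocumentFragment) and records <meta name="..." content="..."> pairs.
    // The tag name is matched case-insensitively; meta names are lower-cased
    // so that "DC.Creator" and "dc.creator" end up under one key.
    public static void collectMetaTags(Node node, Map<String, String> tags) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && node.getNodeName().equalsIgnoreCase("meta")) {
            Element meta = (Element) node;
            String name = meta.getAttribute("name"); // "" when absent
            if (!name.isEmpty()) {
                tags.put(name.toLowerCase(Locale.ROOT), meta.getAttribute("content"));
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectMetaTags(children.item(i), tags);
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample page; the author value is a placeholder.
        String xhtml = "<html><head>"
                + "<meta name=\"DC.creator\" content=\"Jane Doe\"/>"
                + "<meta name=\"DC.date\" content=\"2010-01-11\"/>"
                + "</head><body/></html>";
        Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        Map<String, String> tags = new HashMap<>();
        collectMetaTags(doc, tags);
        System.out.println("author = " + tags.get("dc.creator"));
    }
}
```

Inside a real filter you would then copy the `dc.creator` value into the parse metadata; exactly how that is done depends on the Nutch version.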
For lastModified, just enable the index-more and query-more plugins; they will do
the job for you.
For other meta tags, search the mailing list; how to do it has been explained many times.
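Whether lastModified works out of the box can depend on where the date lives. An HTTP `Last-Modified` header uses the RFC 1123 format (`Mon, 11 Jan 2010 13:18:00 GMT`), while a Dublin Core date inside the page is commonly written as an ISO 8601 date (`2010-01-11`). A minimal sketch, assuming those two formats, that normalizes either one to epoch milliseconds; `LastModifiedSketch` and `toMillis` are hypothetical names, not part of Nutch, and `java.time` needs Java 8 or later.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class LastModifiedSketch {

    // Hypothetical helper: accepts an RFC 1123 HTTP date or a plain ISO 8601
    // date and returns milliseconds since the epoch, or -1 if neither matches.
    public static long toMillis(String value) {
        try {
            return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant().toEpochMilli();
        } catch (DateTimeParseException notHttpDate) {
            // Not an HTTP date; fall through and try the ISO form.
        }
        try {
            // A date without a time is taken as midnight UTC (an assumption).
            return LocalDate.parse(value).atStartOfDay(ZoneOffset.UTC)
                    .toInstant().toEpochMilli();
        } catch (DateTimeParseException notIsoDate) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        System.out.println(toMillis("Mon, 11 Jan 2010 13:18:00 GMT"));
        System.out.println(toMillis("2010-01-11"));
    }
}
```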
2010/1/8, Erlend Garåsen wrote:
Hello,
I have tried to add additional metadata by changing the code in
HtmlParser.java and MoreIndexingFilter.java without any luck. Do I
really have to do what is described on the following wiki page in
order to fetch the content of the metadata, i.e. write my own parser,
filter and a