Hi. We are using Nutch with a Solr backend. I have some questions about the
field boost used by Nutch when indexing documents. I can't find the numbers
anywhere, but it seems like Nutch is not using the default values?
When the document is indexed by Nutch I get this result when searching for
the
I'm on 1.0 and it works fine: returning null from the indexing filter actually
avoids indexing the document.
So you could consider switching to 1.0.
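
To make the "return null to skip" idea concrete, here is a minimal sketch. It does not use the real Nutch classes: Doc, Filter, ArticlesOnlyFilter and the "/articles/" rule are hypothetical stand-ins that only model the contract described above (return the document to keep it, return null to drop it).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipByReturningNull {

    // Hypothetical stand-in for an indexable document: a URL and some text.
    static final class Doc {
        final String url;
        final String text;

        Doc(String url, String text) {
            this.url = url;
            this.text = text;
        }
    }

    // Hypothetical stand-in for the indexing filter contract:
    // return the document to keep it, or null to leave it out of the index.
    interface Filter {
        Doc filter(Doc doc);
    }

    // Keeps only documents whose URL contains "/articles/" (an arbitrary rule
    // chosen for illustration).
    static final class ArticlesOnlyFilter implements Filter {
        @Override
        public Doc filter(Doc doc) {
            return doc.url.contains("/articles/") ? doc : null;
        }
    }

    // Runs every crawled document through the filter and collects the ones
    // that were not dropped.
    static List<Doc> index(List<Doc> crawled, Filter filter) {
        List<Doc> indexed = new ArrayList<>();
        for (Doc doc : crawled) {
            Doc kept = filter.filter(doc);
            if (kept != null) {
                indexed.add(kept);
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<Doc> crawled = Arrays.asList(
                new Doc("http://example.com/articles/1", "an article"),
                new Doc("http://example.com/about", "an about page"));
        for (Doc doc : index(crawled, new ArticlesOnlyFilter())) {
            System.out.println(doc.url);
        }
    }
}
```

In a real plugin the same decision would live in your indexing filter's filter method; check the signature against the Nutch version you run.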
2009/10/8 Magnús Skúlason magg...@gmail.com
Hi,
I want Nutch to index only some of the documents that it crawls. I have
tried what is suggested here:
I don't think it will work, because at the indexing filter stage all the HTML
tags are gone from the text.
I think you need to modify the HTML parser to filter out the tags you want
to get rid of.
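
As a rough illustration of filtering at parse time, the sketch below removes unwanted elements from a DOM tree before the text is extracted. It assumes you can get at an org.w3c.dom tree during parsing and that the input is well-formed XHTML. TagStripper, removeElements and extractText are hypothetical names, not part of Nutch.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TagStripper {

    // Removes every element whose tag name is in `unwanted`, including its
    // whole subtree, so that its text never reaches the index.
    static void removeElements(Node node, Set<String> unwanted) {
        NodeList children = node.getChildNodes();
        // Walk backwards: removing a child shifts the indices after it.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && unwanted.contains(child.getNodeName().toLowerCase())) {
                node.removeChild(child);
            } else {
                removeElements(child, unwanted);
            }
        }
    }

    // Parses well-formed XHTML, strips the unwanted elements, and returns
    // the remaining text with whitespace collapsed.
    static String extractText(String xhtml, Set<String> unwanted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            removeElements(doc.getDocumentElement(), unwanted);
            return doc.getDocumentElement().getTextContent()
                    .trim().replaceAll("\\s+", " ");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><div>menu</div><p>Real content</p>"
                + "<script>var x;</script></body></html>";
        Set<String> unwanted = new HashSet<>(Arrays.asList("div", "script"));
        System.out.println(extractText(page, unwanted));
    }
}
```

Real-world HTML is rarely well-formed, so in practice the tree would come from whatever HTML parser the crawler already uses rather than from an XML parser.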
In one of my use cases I would like to perform 'intelligent indexing', i.e.
use the tag information to
On Fri, 9 Oct 2009 18:00:41 +0200
MilleBii mille...@gmail.com wrote:
I don't think it will work, because at the indexing filter stage all
the HTML tags are gone from the text.
I think you need to modify the HTML parser to filter out the tags
you want to get rid of.
In one of my use cases I
Hi,
Thanks for your detailed answer, you saved me a lot of time. I was thinking
of starting on an HTML tag filter class.
Maybe I can create my own HTML parser, as I do for parsing and indexing
DublinCore metadata. It sounds possible, don't you think?
I just have to create also or to
BELLINI ADAM wrote:
Hi,
Thanks for your detailed answer, you saved me a lot of time. I was thinking
of starting on an HTML tag filter class.
Maybe I can create my own HTML parser, as I do for parsing and indexing
DublinCore metadata. It sounds possible, don't you think?
I just have
Hi,
Can you please just tell us in plain English what the creativecommons plugin is for?
I mean, if I include this plugin in my nutch-site.xml, what will I get as a
result?
Thanks
Date: Fri, 9 Oct 2009 19:16:44 +0200
From: a...@getopt.org
To: nutch-user@lucene.apache.org
Subject: Re: indexing
Can you please just tell us in plain English what the creativecommons plugin
is for?
I mean, if I include this plugin in my nutch-site.xml, what will
I get as a result?
I think Andrzej is suggesting that you read the code.
If you look at the beginning of the CCParseFilter.java file, you'll see:
Yes, I did read the code, but I didn't understand what 'the Creative Commons
license' is. That's why I asked what creativecommons means.
But as you said, I have to be familiar with DOM manipulation to understand the
code, so let's start by learning DOM.
Thanks
From: kkrugler_li...@transpac.com