Scoring when using solrindex

2009-10-09 Thread Ole-Martin Mørk
Hi. We are using Nutch with a Solr backend. I have some questions about the field boost used by Nutch when indexing documents. I can't find the numbers anywhere, but it seems like Nutch is not using the default values? When the document is indexed by Nutch I get this result when searching for the

Re: Only indexing pages meeting certain criteria

2009-10-09 Thread MilleBii
I'm on 1.0 and it works fine: returning null from the indexing filter actually avoids indexing the document. So you could consider switching to 1.0. 2009/10/8 Magnús Skúlason magg...@gmail.com Hi, I want Nutch to only index some of the documents that it crawls. I have tried what is suggested here:
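
A minimal sketch of the "return null to skip the document" idea described above. The `DocFilter` interface, the `UrlCriteriaFilter` class, and the `index` loop are hypothetical stand-ins written for illustration; they are not Nutch's real `IndexingFilter` API, whose signature takes Nutch-specific types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SkipByNullSketch {

    /** Hypothetical stand-in for an indexing filter; NOT Nutch's real interface. */
    interface DocFilter {
        /** Returns the (possibly modified) document, or null to drop it. */
        Map<String, String> filter(Map<String, String> doc, String url);
    }

    /** Keeps only documents whose URL contains a required substring (an assumed criterion). */
    static class UrlCriteriaFilter implements DocFilter {
        private final String required;

        UrlCriteriaFilter(String required) {
            this.required = required;
        }

        @Override
        public Map<String, String> filter(Map<String, String> doc, String url) {
            // Returning null signals "do not index this document".
            return url.contains(required) ? doc : null;
        }
    }

    /** Hypothetical indexer loop: a null result from the filter means the document is skipped. */
    static List<String> index(Map<String, Map<String, String>> crawled, DocFilter filter) {
        List<String> indexedUrls = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : crawled.entrySet()) {
            Map<String, String> doc = filter.filter(entry.getValue(), entry.getKey());
            if (doc != null) {
                indexedUrls.add(entry.getKey());
            }
        }
        return indexedUrls;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> crawled = new LinkedHashMap<>();
        crawled.put("http://example.com/docs/a", new HashMap<>());
        crawled.put("http://example.com/blog/b", new HashMap<>());
        System.out.println(index(crawled, new UrlCriteriaFilter("/docs/")));
    }
}
```

In a real plugin the criterion would live inside the filter implementation, and whether a null return is honored depends on the Nutch version, as the message above notes for 1.0.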

Re: indexing just certain content

2009-10-09 Thread MilleBii
Don't think it will work, because at the indexing filter stage all the HTML tags are gone from the text. I think you need to modify the HTML parser to filter out the tags you want to get rid of. In one use case I have, I would like to perform 'intelligent indexing', i.e. use the tag information to
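
A rough sketch of the approach suggested above: drop the unwanted elements while the page is still a DOM tree, then extract the text. It uses only the JDK's built-in XML parser and therefore assumes well-formed XHTML input; real-world HTML would need a tolerant HTML parser. The class and method names are hypothetical, and this is not the code of Nutch's HTML parser plugin.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TagStripSketch {

    /**
     * Parses well-formed XHTML, removes every element named tagName
     * (including its children), and returns the remaining text.
     * Assumes tagName is not the root element.
     */
    static String textWithout(String xhtml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            NodeList hits = doc.getElementsByTagName(tagName);
            // The NodeList is live: removing a node shrinks it, so keep taking item(0).
            while (hits.getLength() > 0) {
                Node node = hits.item(0);
                node.getParentNode().removeChild(node);
            }
            return doc.getDocumentElement().getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><div>menu</div><p>Hello</p></body></html>";
        System.out.println(textWithout(page, "div"));
    }
}
```

Doing the removal before text extraction is the point: once the text has been flattened, as at the indexing filter stage, there is no tag information left to filter on.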

Re: indexing just certain content

2009-10-09 Thread Gora Mohanty
On Fri, 9 Oct 2009 18:00:41 +0200 MilleBii mille...@gmail.com wrote: Don't think it will work because at the indexing filter stage all the HTML tags are gone from the text. I think you need to modify the HTML parser to filter out the tags you want to get rid of. In some use case I have I

RE: indexing just certain content

2009-10-09 Thread BELLINI ADAM
Hi, thanks for your detailed answer... you saved me a lot of time. I was thinking of starting to create an HTML tag filter class. Maybe I can create my own HTML parser, as I do for parsing and indexing Dublin Core metadata... it sounds possible, don't you think so? I just have to create also or to

Re: indexing just certain content

2009-10-09 Thread Andrzej Bialecki
BELLINI ADAM wrote: Hi, thanks for your detailed answer... you saved me a lot of time. I was thinking of starting to create an HTML tag filter class. Maybe I can create my own HTML parser, as I do for parsing and indexing Dublin Core metadata... it sounds possible, don't you think so? I just have

RE: indexing just certain content

2009-10-09 Thread BELLINI ADAM
Hi, can you please just tell us in plain English what the creativecommons plugin is for? I mean, if I include this plugin in my nutch-site.txt, what will I get as a result? Thanks. Date: Fri, 9 Oct 2009 19:16:44 +0200 From: a...@getopt.org To: nutch-user@lucene.apache.org Subject: Re: indexing

Re: indexing just certain content

2009-10-09 Thread Ken Krugler
can you please just tell us in plain English what the creativecommons plugin is for? I mean, if I include this plugin in my nutch-site.txt, what will I get as a result? I think Andrzej is suggesting that you read the code. If you look at the beginning of the CCParseFilter.java file, you'll see:

RE: indexing just certain content

2009-10-09 Thread BELLINI ADAM
Yes, I did read the code, but I didn't understand what 'the Creative Commons license' is; that's why I asked what creativecommons means. But as you said, I have to be familiar with DOM manipulation to understand the code... so let's start learning DOM. Thanks. From: kkrugler_li...@transpac.com