RE: indexing just certain content

2009-10-11 Thread MilleBii
This is not very clear: there is a big difference between removing garbage for the indexingFilter and removing search results... I think you want the first one. You just need to build a custom Parser that will filter out the tags you don't want

RE: indexing just certain content

2009-10-11 Thread BELLINI ADAM
: indexing just certain content Date: Sun, 11 Oct 2009 11:02:21 +0200 To: nutch-user@lucene.apache.org This is not very clear: there is a big difference between removing garbage for the indexingFilter and removing search results... I think you want the first one. You just need to build a custom

Re: indexing just certain content

2009-10-10 Thread Andrzej Bialecki
MilleBii wrote: Andrzej, the use case you are thinking of is: at the parsing stage, filter out garbage content and index only the rest. I have a different use case: I want to keep everything as standard indexing _AND_ also extract a part to be indexed in a dedicated field (which will be
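
A rough sketch of the second use case described above (keep the full text and also copy one part of the page into a dedicated field). It uses only the JDK's DOM classes and is not Nutch code: the class name DedicatedFieldExtractor, the field names "content" and "main", and the idea of selecting the part by its id attribute are all assumptions made for illustration. It also assumes well-formed XHTML input; real pages would need a tolerant HTML parser first.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper (not a Nutch class): full text plus one dedicated field. */
class DedicatedFieldExtractor {

    /** Returns "content" (all text) and, if the element exists, "main" (text of that element). */
    static Map<String, String> extractFields(String xhtml, String elementId) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        // Standard indexing: the whole page text, untouched by the extra field.
        fields.put("content", normalize(doc.getDocumentElement().getTextContent()));
        // Dedicated field: only the text below the element with the given id.
        Element target = findById(doc.getDocumentElement(), elementId);
        if (target != null) {
            fields.put("main", normalize(target.getTextContent()));
        }
        return fields;
    }

    /** Depth-first search for the first element whose id attribute equals id. */
    private static Element findById(Element element, String id) {
        if (id.equals(element.getAttribute("id"))) {
            return element;
        }
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                Element hit = findById((Element) c, id);
                if (hit != null) {
                    return hit;
                }
            }
        }
        return null;
    }

    /** Trims and collapses runs of whitespace to single spaces. */
    private static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }
}
```

Calling DedicatedFieldExtractor.extractFields(page, "main") gives a map that an indexing step could turn into two document fields; how those fields reach the index in Nutch or Solr is outside this sketch.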

RE: indexing just certain content

2009-10-10 Thread BELLINI ADAM
are using SOLR, so I just have to index the important content...the search will be performed with Solr so I guess I don't need the QueryFilter. best regards Date: Sat, 10 Oct 2009 16:04:10 +0200 From: a...@getopt.org To: nutch-user@lucene.apache.org Subject: Re: indexing just certain content

RE: indexing just certain content

2009-10-10 Thread BELLINI ADAM
an HTML page :( if I find this piece the rest will be a piece of cake :) Date: Sat, 10 Oct 2009 16:41:44 +0200 Subject: Re: indexing just certain content From: mille...@gmail.com To: nutch-user@lucene.apache.org Andrzej, Great !!! I did not realize you could put your own

RE: indexing just certain content

2009-10-10 Thread BELLINI ADAM
what i want is exactly explained in this second post : How to ignore search results that don't have related keywords in main body? From: mbel...@msn.com To: nutch-user@lucene.apache.org Subject: RE: indexing just certain content Date: Sat, 10 Oct 2009 15:35:31 + yes

Re: indexing just certain content

2009-10-09 Thread MilleBii
Don't think it will work, because at the indexing filter stage all the HTML tags are gone from the text. I think you need to modify the HTML parser to filter out the tags you want to get rid of. In one use case I have, I would like to perform 'intelligent indexing', i.e. use the tag information to
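
A minimal sketch of the point above: the unwanted tags have to be dropped while the markup is still a tree, before it is flattened to plain text. This is plain JDK DOM code, not the Nutch HTML parser; the class name TagSkippingTextExtractor and its methods are hypothetical, and the input is assumed to be well-formed XHTML (real pages would need a tolerant HTML parser first).

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper (not a Nutch class): collects text while skipping unwanted elements. */
class TagSkippingTextExtractor {

    /** Parses well-formed XHTML into a DOM tree. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    /** Returns the text below root, ignoring every element whose lower-cased tag name is in skipTags. */
    static String extractText(Node root, Set<String> skipTags) {
        StringBuilder out = new StringBuilder();
        collect(root, skipTags, out);
        return out.toString().trim().replaceAll("\\s+", " ");
    }

    private static void collect(Node node, Set<String> skipTags, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && skipTags.contains(node.getNodeName().toLowerCase())) {
            return; // drop the whole subtree, text included
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue()).append(' ');
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, skipTags, out);
        }
    }
}
```

For example, extractText(parse(page), Collections.singleton("div")) keeps paragraph text and discards everything inside div elements. Filtering by id or class instead of tag name would follow the same tree walk with a different test in collect.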

Re: indexing just certain content

2009-10-09 Thread Gora Mohanty
On Fri, 9 Oct 2009 18:00:41 +0200 MilleBii mille...@gmail.com wrote: Don't think it will work because at the indexing filter stage all the HTML tags are gone from the text. I think you need to modify the HTML parser to filter out the tags you want to get rid of. In some use case I have I

RE: indexing just certain content

2009-10-09 Thread BELLINI ADAM
or to find a class which could filter an HTML page and delete certain tags from it. Thx. Date: Fri, 9 Oct 2009 22:04:41 +0530 From: g...@srijan.in To: nutch-user@lucene.apache.org Subject: Re: indexing just certain content On Fri, 9 Oct 2009 18:00:41 +0200 MilleBii mille...@gmail.com wrote

Re: indexing just certain content

2009-10-09 Thread Andrzej Bialecki
BELLINI ADAM wrote: Hi, thx for your detailed answer...you saved me a lot of time, I was thinking of starting to create an HTML tag filter class. Maybe I can create my own HTML parser! As I do for parsing and indexing DublinCore metadata...it sounds possible, don't you think so? i just hv

RE: indexing just certain content

2009-10-09 Thread BELLINI ADAM
just certain content BELLINI ADAM wrote: Hi, thx for your detailed answer...you saved me a lot of time, I was thinking of starting to create an HTML tag filter class. Maybe I can create my own HTML parser! As I do for parsing and indexing DublinCore metadata...it sounds possible

Re: indexing just certain content

2009-10-09 Thread Ken Krugler
2009 19:16:44 +0200 From: a...@getopt.org To: nutch-user@lucene.apache.org Subject: Re: indexing just certain content BELLINI ADAM wrote: Hi, thx for your detailed answer...you saved me a lot of time, I was thinking of starting to create an HTML tag filter class. Maybe I can create my own HTML

RE: indexing just certain content

2009-10-09 Thread BELLINI ADAM
To: nutch-user@lucene.apache.org Subject: Re: indexing just certain content Date: Fri, 9 Oct 2009 16:39:31 -0700 can you plz just tell us in plain English what the creativecommons plugin is for? I mean, if I include this plugin in my nutch-site.xml, what will I get as a result? I

Re: indexing just certain content

2009-10-07 Thread BELLINI ADAM
In the class BasicIndexingFilter.java, I think that before adding the content to the document I could parse it again to filter out certain div tags?? text = parse.getText(); // I have to parse and filter the text here before adding it to the document new_Filtred_text =

Re: indexing just certain content

2009-10-05 Thread Eric
Adam, You could turn off all the indexing plugins and write your own plugin that only indexes certain meta content from your intranet - giving you complete control of the fields indexed. Eric On Oct 5, 2009, at 1:06 PM, BELLINI ADAM wrote: hi does anybody know if it's possible to