Hi Franklin,

My application ended without a result. I had great difficulty deleting unwanted URLs from the index and have not been able to delete them, so instead I apply a second filtering pass to my search results, keeping only content from a list of wanted URLs.
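Here is a minimal sketch of that second filtering pass. It assumes the search hits are available as URL strings and that the wanted list holds host names. `WantedUrlFilter` and its methods are hypothetical helpers, not part of Nutch. The example uses only the Java standard library and needs Java 9+ for `Set.of`/`List.of`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical post-search filter (not Nutch API): keeps only hits whose host
// is on a wanted list, assuming hits are plain URL strings.
public class WantedUrlFilter {
    private final Set<String> wantedHosts;

    public WantedUrlFilter(Set<String> wantedHosts) {
        // Normalize the wanted hosts to lower case so matching is case-insensitive.
        this.wantedHosts = new HashSet<>();
        for (String h : wantedHosts) {
            this.wantedHosts.add(h.toLowerCase(Locale.ROOT));
        }
    }

    // Returns the lower-cased host of a URL, or null if it cannot be parsed.
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // True if the host equals a wanted host or is a subdomain of one.
    boolean isWanted(String url) {
        String host = hostOf(url);
        if (host == null) {
            return false;
        }
        for (String wanted : wantedHosts) {
            if (host.equals(wanted) || host.endsWith("." + wanted)) {
                return true;
            }
        }
        return false;
    }

    // Second filtering pass over the URLs returned by the search.
    public List<String> filter(List<String> hitUrls) {
        List<String> kept = new ArrayList<>();
        for (String url : hitUrls) {
            if (isWanted(url)) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        WantedUrlFilter f = new WantedUrlFilter(Set.of("example.com"));
        List<String> hits = List.of(
                "http://www.example.com/page.html",
                "http://unwanted.example.org/index.html",
                "not a url");
        System.out.println(f.filter(hits));
    }
}
```

Running `main` prints `[http://www.example.com/page.html]`. If you invert the `isWanted` check, the same helper works as a blocklist, for example to draw up the list of porn-site URLs that should be pruned. In both cases this only hides results at query time. The documents themselves stay in the index.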
I think in your case you can use PruneIndexTool, which can prune all the unwanted URLs related to porn sites from the index. If I find anything new, I will let you know.

Thanks,
"Ratnesh,V2Solutions,India"

franklinb4u wrote:
>
> Even I am facing the same problem...
> I don't know how to eliminate or delete the index entry of a particular URL
> that has been crawled.
> I need to eliminate the porn URLs from my search engine.
>
> I have the crawled data with me, and now I need to
> find the index entries of the porn URLs.
>
> Please help me with this.
>
> With thanks,
> Franklin.S
>
> Ratnesh,V2Solutions India wrote:
>>
>> No,
>> I don't think we have to deal with something like that, because if I remove
>> them, I won't be able to index my own files, which are what I am crawling for.
>>
>> But I will surely check, as at this moment I am not very sure.
>> Can you tell me about your whereabouts?
>>
>> Thanks
>> Ratnesh, V2Solutions, India
>>
>> Siddharth Jonathan wrote:
>>>
>>> Hmmm... I haven't had to do this, but my guess would be to remove the
>>> corresponding plugin entries from the nutch-default.xml file.
>>> There is a plugin include property in that file which includes the
>>> default indexing filters (index-basic, index-more, etc.)
>>> and the query filter plugins (query-basic, query-more, etc.). Try removing
>>> those. That might keep them from being used.
>>>
>>> Jonathan
>>>
>>> On 4/2/07, Ratnesh,V2Solutions India
>>> <[EMAIL PROTECTED]>
>>> wrote:
>>>>
>>>> Exactly, of course.
>>>>
>>>> That is exactly what I want. Do you have any solution for this?
>>>>
>>>> Looking forward to your reply.
>>>>
>>>> Thanks
>>>>
>>>> Siddharth Jonathan wrote:
>>>> >
>>>> > Do you mean how to get rid of some of the fields that are indexed by
>>>> > default, e.g. content, anchor text, etc.?
>>>> >
>>>> > Jonathan
>>>> > On 4/2/07, Ratnesh,V2Solutions India
>>>> > <[EMAIL PROTECTED]>
>>>> > wrote:
>>>> >>
>>>> >> Hi,
>>>> >> I have written a plugin which finds the number of object tags in an
>>>> >> HTML page and the corresponding URLs.
>>>> >> I am storing "objects" as fields and page URLs as values.
>>>> >>
>>>> >> Ultimately I am interested in searching only the "objects" fields I
>>>> >> indexed, not the fields that are already stored in the index.
>>>> >>
>>>> >> So how do I delete the index fields that are already stored?
>>>> >>
>>>> >> Looking forward to your reply (valuable inputs).
>>>> >>
>>>> >> Thanks to the Nutch community
>>>> >> --
>>>> >> View this message in context:
>>>> >> http://www.nabble.com/How-to-delete-already-stored-indexed-fields----tf3504164.html#a9786377
>>>> >> Sent from the Nutch - User mailing list archive at Nabble.com.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
