Hi,
I am crawling a filesystem with Nutch 0.7.2 on Windows. I have enabled the
parse plugins for text and HTML.
To my surprise, the search results include files with extensions such as
.java, .class, .jar, .dll and so on.
I could add these to the ignore list in regex-urlfilter.txt, but that is not a
real solution: there are many file formats, and I can't add each of them to the
ignore list.
An alternative would be for Nutch to fetch and show results only for parsable
documents.
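
To illustrate the behaviour I am after, here is a rough sketch in plain Python.
It is not Nutch code and does not use any Nutch API. The names PARSABLE and
should_fetch are made up for this example, the extension list is only my guess
at what the text and HTML parsers handle, and I am assuming that directory URLs
end with "/" so that the crawl can still descend into folders.

```python
import re

# Hypothetical allow-list: extensions assumed to be handled by the
# enabled text and HTML parsers. Everything else is rejected.
PARSABLE = re.compile(r"\.(txt|html?)$", re.IGNORECASE)


def should_fetch(url: str) -> bool:
    """Allow-list check: keep directories and parsable files, drop the rest."""
    # Assumption: directory URLs end with a slash; keep them so that
    # the crawl can follow links into subdirectories.
    if url.endswith("/"):
        return True
    return PARSABLE.search(url) is not None


if __name__ == "__main__":
    candidates = [
        "file:///C:/project/docs/",
        "file:///C:/project/docs/index.html",
        "file:///C:/project/README.TXT",
        "file:///C:/project/src/Main.java",
        "file:///C:/project/lib/tools.jar",
    ]
    for url in candidates:
        print(url, "->", "fetch" if should_fetch(url) else "skip")
```

In other words, instead of listing every extension to ignore, I would like to
list only the few that should be accepted and have everything else skipped.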
Can anybody help me with this?
   


    Regards, 
Arun Sharma (Tech Lead-Java/J2EE ) 
  www.voltix.com, www.voltixindia.com
  SCO 13-15, Sector 34A
  Chandigarh




                                