Arun, please keep the discussion on the list.
I think what you want is not possible as such; it has to be achieved through the regex-urlfilter.

Rgrds, Thomas

---------- Forwarded message ----------
From: Arun Kumar Sharma <[EMAIL PROTECTED]>
Date: Apr 25, 2006 9:48 AM
Subject: Re: unable to filter different file format like .java,.jar,.class with nutch version 0.7.2
To: TDLN <[EMAIL PROTECTED]>

Yes, you misunderstood my question. I do not want to fetch anything for which I did not enable a parse plugin. I only want to parse HTML and text files, so I put only the respective parsers for these documents in the plugins directory. But my search results (and the fetch log) show results for .java, .class, .jar and .dll files. I hope this time you got my problem right.

TDLN <[EMAIL PROTECTED]> wrote:

> But earlier, I think with nutch 0.7.1, un-parseable file types were
> neither fetched nor shown in search results.

Isn't this what you want? If so, just use the regex-urlfilter, I would say. What is the sense in fetching files that a) can't be parsed, b) can't be indexed as a result and thus c) will not show in the search results? Or am I misunderstanding your question?

Rgrds, Thomas

> Why is the urlfilter not returning a "null" value for unparseable content?
> Why is it being added to the fetchlist, unlike what happened earlier with
> nutch 0.7.1?
> FYI, I crawled the same things earlier with nutch 0.7.1, and unparseable
> files were not added to the fetchlist then. Why is it happening now? Do you
> think I have modified something which has side effects of this kind?
>
> TDLN wrote:
> > > Since there are a number of file formats, I can't add each of them to
> > > the ignore list.
> >
> > Why not? You can add something like
> >
> > -\.(java|class|jar|dll)
> >
> > etc.
> >
> > Rgrds, Thomas
> >
> > > An alternative could be that it fetches and shows results only for
> > > parsable documents.
> > > Can anybody help me in this regard?
> > >
> > > Regards,
> > > Arun Sharma (Tech Lead-Java/J2EE)
> > > www.voltix.com, www.voltixindia.com
> > > SCO 13-15, Sector 34A
> > > Chandigarh

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
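
For anyone following the thread, here is a rough sketch of the regex-urlfilter idea discussed above. It is not Nutch code: it is a small Python model of a first-match rule list, where a `-` rule rejects a URL (the filter returns "null", as mentioned in the thread) and a `+` rule accepts it. The names `RULES` and `url_filter` and the example URLs are hypothetical, the first-match-wins behaviour is an assumption about how the filter evaluates its rules, and the trailing `$` is an addition so that only URLs ending in one of the listed extensions are rejected.

```python
import re

# Hypothetical rule list in the style of a regex-urlfilter file.
# Each entry is (sign, pattern): "-" rejects a matching URL, "+" accepts it.
RULES = [
    ("-", r"\.(java|class|jar|dll)$"),  # skip source, bytecode and binary files
    ("+", r"."),                        # accept everything else
]

# Compile once so each URL check only has to run the matchers.
COMPILED = [(sign, re.compile(pattern)) for sign, pattern in RULES]


def url_filter(url):
    """Return the URL if it is accepted, or None if it is rejected.

    Assumption: the first rule whose pattern is found in the URL decides,
    and a URL that matches no rule at all is rejected.
    """
    for sign, pattern in COMPILED:
        if pattern.search(url):
            return url if sign == "+" else None
    return None


if __name__ == "__main__":
    for candidate in (
        "http://example.com/index.html",
        "http://example.com/src/Foo.java",
        "http://example.com/lib/tools.jar",
        "http://example.com/javadoc/overview.html",
    ):
        print(candidate, "->", url_filter(candidate))
```

A URL rejected at this stage never reaches the fetchlist, so it is neither fetched nor parsed, which is why the extension list has to live in the URL filter rather than in the set of enabled parse plugins.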
