Hi,

Could you please submit a JIRA issue and attach this to it (or perhaps the diff for the whole plugin, excluding the jcifs .jar because it is LGPL)?

René Treffer wrote:
> Hi,
>
> I've just written a protocol-smb plugin. It's really simple (code attached).
> It uses the jcifs lib and seems to work, but there is some stuff I'd
> like to discuss...
>
> Nutch is glued to URL, which works if you write a URLHandler. No
> problem so far, but you can't install a URLHandler everywhere. Have a
> look at the jcifs FAQ ( http://jcifs.samba.org/src/docs/faq.html ). Most
> important: it won't work in your WAR, so protocol plugins will be
> useless in a web context! Might cause a lot of trouble.
> Moreover, Nutch will never be able to handle \\192.168.0.1\ correctly
> with URL....

Perhaps a custom URL parser (Nutch currently uses the URL class only for parsing URLs) could do the job here. I have seen custom implementations, at least in Tomcat, that we could perhaps borrow and extend if required.

>
> Converting directories into HTML lists sucks, and reproducing the code is
> even worse. Perhaps a virtual MIME type could be added (e.g.
> "nutch/dir"). Almost forgot: tell me how I should index files with "
> and ' in their names (currently I check for ' and change the href
> quotes). Same problem for file://

There could perhaps be a different crawler implementation to crawl the local filesystem and these shared Windows resources (and perhaps WebDAV too) efficiently.

--
Sami Siren

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
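The custom-parser idea and the \\192.168.0.1\ case could be approached roughly as follows. This is a minimal sketch that assumes jcifs-style smb://host/share/path URLs are the target form. The class `UncToSmb` and its method `fromUnc` are hypothetical and not part of Nutch or jcifs. It substitutes `java.net.URI` for a hand-written parser. Unlike `java.net.URL`, `URI` parses any scheme purely syntactically and needs no registered URLStreamHandler, so nothing has to be installed in the JVM or the web container.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical helper (not part of Nutch or jcifs): converts a Windows UNC
 * path such as \\192.168.0.1\share\dir into an smb:// URI. It relies on
 * java.net.URI, which needs no URLStreamHandler for the "smb" scheme.
 */
public final class UncToSmb {

    private UncToSmb() {}

    /** Converts "\\host\share\path" to smb://host/share/path, quoting illegal characters. */
    public static URI fromUnc(String unc) throws URISyntaxException {
        if (!unc.startsWith("\\\\")) {
            throw new IllegalArgumentException("not a UNC path: " + unc);
        }
        // Drop the leading "\\" and turn the remaining backslashes into '/'.
        String rest = unc.substring(2).replace('\\', '/');
        int slash = rest.indexOf('/');
        String host = slash < 0 ? rest : rest.substring(0, slash);
        String path = slash < 0 ? "/" : rest.substring(slash);
        // The multi-argument constructor percent-encodes characters that are
        // illegal in a URI path (e.g. space and '"'), so no manual escaping is needed.
        return new URI("smb", host, path, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(fromUnc("\\\\192.168.0.1\\"));
        System.out.println(fromUnc("\\\\server\\share\\my \"docs\"\\"));
    }
}
```

Running `main` prints `smb://192.168.0.1/` and `smb://server/share/my%20%22docs%22/`. `URI` treats `'` as a legal character and leaves it unencoded. Writing links inside double-quoted href attributes would therefore avoid the quoting problem for both `"` and `'`. The sketch deliberately ignores special UNC forms such as `\\?\` prefixes.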
