protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows shares remotely using the CIFS/SMB protocol implementation.
------------------------------------------------------------------------
Key: NUTCH-427
URL: https://issues.apache.org/jira/browse/NUTCH-427
Project: Nutch
Issue Type: New Feature
Components: fetcher
Affects Versions: 0.8.1
Environment: JAVA - OS independent
Reporter: Armel Nene
Priority: Critical

Title: protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
Author: Armel T. Nene
Email: armel.nene NOSPAM-AT-NOSPAM idna-solutions.com

A. Introduction

The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements the CIFS/SMB protocol, which is commonly used on Microsoft operating systems. The plugin replicates the behaviour of protocol-file over CIFS/SMB. It uses the jCIFS library and supports all of that library's properties. You can find more information on the following site: http://jcifs.samba.org/

The smb protocol syntax is as follows: smb://xxxxx (e.g. smb://server/share).

B. Installation

1) Binaries only:
- Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
- Put the "smb.properties" file in the NUTCHHOME/conf directory.
- Configure the properties in the "smb.properties" file.
- Enable the plugin by adding protocol-smb to the plugin.includes property in the "nutch-site.xml" file found in the NUTCHHOME/conf directory.

2) Source code:
Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
- Copy the "protocol-smb" folder to NUTCHHOME/src/plugin.
- Update build.xml in NUTCHHOME/src/plugin to include the plugin.
- Update the NUTCHHOME/default.properties file to include the plugin.
- Run ant to build.
- Copy the "smb.properties" file to NUTCHHOME/conf and configure the properties.
- Enable the plugin by adding protocol-smb to the plugin.includes property in the "nutch-site.xml" file.

C. Known Issues

1) MalformedURLException: unknown protocol: smb
The SMB URL protocol handler is not being installed successfully. In short, the jCIFS jar must be loaded by the system class loader.
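Before applying the workaround, it can help to confirm which part of the setup is missing. The class below is a minimal diagnostic sketch; its name `SmbHandlerCheck` is hypothetical and not part of the plugin. It relies only on the JDK convention that, for each package listed in `java.protocol.handler.pkgs`, `java.net.URL` looks for a handler class named `<package>.smb.Handler`, which is why the property value is `jcifs` (giving `jcifs.smb.Handler`) and why the jar must be visible to the system class loader.

```java
import java.net.MalformedURLException;
import java.net.URI;

public class SmbHandlerCheck {

    // With -Djava.protocol.handler.pkgs=jcifs, java.net.URL resolves the
    // "smb" protocol to the class <package>.smb.Handler, i.e. this name.
    static final String HANDLER_CLASS = "jcifs.smb.Handler";

    /** Returns true if the jCIFS handler class is visible to the system class loader. */
    static boolean handlerOnSystemClassPath() {
        try {
            // initialize=false: only check visibility, do not run static initializers
            Class.forName(HANDLER_CLASS, false, ClassLoader.getSystemClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Returns true if java.net can build a URL object for the given address. */
    static boolean smbUrlSupported(String smbUrl) {
        try {
            URI.create(smbUrl).toURL();
            return true;
        } catch (MalformedURLException | IllegalArgumentException e) {
            // MalformedURLException (typically "unknown protocol: smb") means no
            // handler is installed; IllegalArgumentException means a bad URI.
            return false;
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "smb://server/share/";
        System.out.println("java.protocol.handler.pkgs = "
                + System.getProperty("java.protocol.handler.pkgs"));
        System.out.println("jcifs.smb.Handler on system class path: "
                + handlerOnSystemClassPath());
        System.out.println("URL accepted by java.net: " + smbUrlSupported(url));
    }
}
```

Run it with the same classpath and JVM arguments as Nutch, for example `java -Djava.protocol.handler.pkgs=jcifs SmbHandlerCheck smb://server/share/`. If the handler class is not visible, step a) applies; if it is visible but the URL is still rejected, the system property from step b) is probably missing.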
Workaround:
a) A short-term solution is to install the jCIFS jar found in the protocol-smb folder in JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.
b) If the exception is still thrown after completing step a), set the system property by passing the following argument to the JVM:
   -Djava.protocol.handler.pkgs=jcifs
You can also visit the jCIFS FAQ page: http://jcifs.samba.org/src/docs/faq.html

2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
This problem usually occurs if the following properties are not set correctly in the "smb.properties" file:
- username
- password
- domain
For more information on the available properties and how to set them, refer to:
http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
You can also visit the jCIFS FAQ page: http://jcifs.samba.org/src/docs/faq.html
N.B. All properties should be set in the "smb.properties" file, which accepts every supported jCIFS property.

3) Only tested on Windows XP and Windows Server 2003. It should run on other operating systems without any change; please report the results of any tests on other operating systems.

--
This message is automatically generated by JIRA.
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers