[ 
https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Bauer updated NUTCH-427:
------------------------------

          Description: 
Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
Author:   Armel T. Nene
Update:   Vadim Bauer
Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com, V a d i m B a u e r <AT> g m x . d e

A.  Introduction

    The protocol-smb plugin allows you to crawl Microsoft Windows shares. It
    implements the CIFS/SMB protocol, which is commonly used on Microsoft
    operating systems. The plugin replicates the behaviour of protocol-file
    over CIFS/SMB. It uses the JCIFS library and supports all of that
    library's properties.
    You can find more information at: http://jcifs.samba.org/
    The SMB URL syntax for crawling is: smb://xxxxx (e.g. smb://server/share).
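
To make the smb://server/share syntax concrete, here is a minimal sketch that splits such a URL into its server, share, and remaining path using only java.net.URI. The class name SmbUrlParts and its fields are hypothetical and are not part of the plugin or of JCIFS.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of protocol-smb or JCIFS.
final class SmbUrlParts {
    final String server;
    final String share;
    final String path;

    private SmbUrlParts(String server, String share, String path) {
        this.server = server;
        this.share = share;
        this.path = path;
    }

    /** Splits smb://server/share/rest into server, share and rest. */
    static SmbUrlParts parse(String url) {
        URI uri = URI.create(url);
        if (!"smb".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not an smb URL: " + url);
        }
        String rawPath = uri.getPath() == null ? "" : uri.getPath();
        // The path starts with "/", so segment 0 is empty and segment 1 is the share.
        String[] segments = rawPath.split("/", 3);
        String share = segments.length > 1 ? segments[1] : "";
        String rest = segments.length > 2 ? segments[2] : "";
        return new SmbUrlParts(uri.getHost(), share, rest);
    }

    public static void main(String[] args) {
        SmbUrlParts parts = parse("smb://server/share/docs/report.txt");
        System.out.println(parts.server + " | " + parts.share + " | " + parts.path);
    }
}
```

URI is used here instead of java.net.URL because URL rejects the smb scheme until a handler for it is installed (see Known Issues).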
    
B.  Installation

    1) Binaries only:   The protocol-smb files can be found in the ../plugins directory.
                        - Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins
                          directory.
                        - Put the "smb.properties" file in the NUTCHHOME/conf directory.
                        - Configure the properties in the "smb.properties" file.
                        - Enable the plugin by updating the "nutch-site.xml" file
                          found in the NUTCHHOME/conf directory, e.g.

                          <property>
                            <name>plugin.includes</name>
                            <value>protocol-smb|other plugins...</value>
                            <description>
                            </description>
                          </property>

    2) Source code:     The protocol-smb sources can be found in the ../src directory.
                        Always refer to the Nutch wiki for detailed instructions on
                        building Nutch. In short:
                        - Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
                        - Update the build.xml in NUTCHHOME/src/plugin to include the
                          plugin.
                        - Update the NUTCHHOME/default.properties file to include the
                          plugin.
                        - Run ant to build.
                        - Copy the 'smb.properties' file to NUTCHHOME/conf and
                          configure the properties.
                        - Enable the plugin by updating the nutch-site.xml file.
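
The plugin.includes value used to enable the plugin is, as far as I understand Nutch, a regular expression that is matched against each plugin id. The sketch below shows how such a value selects plugins. The class PluginIncludesCheck is hypothetical, and the other plugin names in the sample value are only placeholders for whatever your crawl already enables.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; Nutch does its own matching internally.
final class PluginIncludesCheck {

    /** Returns true if the whole plugin id matches the plugin.includes expression. */
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Sample value: protocol-smb plus a few placeholder plugins.
        String includes = "protocol-smb|parse-(text|html)|index-basic";
        System.out.println(isIncluded(includes, "protocol-smb"));  // true
        System.out.println(isIncluded(includes, "protocol-http")); // false
    }
}
```

Because the value is a regular expression, stray spaces around "|" become part of the pattern, so it is safer to write the alternatives without spaces.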

C.  Known Issues

    1) MalformedURLException: unknown protocol: smb

       The SMB URL protocol handler is not being successfully installed. 
       In short, the jCIFS jar must be loaded by the System class loader.

       Workaround: a) A short-term solution is to install the JCIFS jar
                      found in the protocol-smb folder into
                      JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.

                   b) If the exception is still thrown after completing step a),
                      set the system property by passing the following argument
                      to the JVM:

                      -Djava.protocol.handler.pkgs=jcifs

                   c) You can also set the property in your code. For example, if
                      you start crawling with org.apache.nutch.crawl.Crawl, add
                      the following two lines to its main method. This has the
                      same effect as b).

                      public static void main(String args[]) throws Exception {
                          // Same effect as -Djava.protocol.handler.pkgs=jcifs
                          System.setProperty("java.protocol.handler.pkgs", "jcifs");
                          // Permission this property needs when a security manager is active
                          new java.util.PropertyPermission("java.protocol.handler.pkgs", "read,write");
                          // ... rest of the existing main method
                      }

       See also the JCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html
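
To check whether any of the workarounds for issue 1 took effect, a small standalone test can ask the JVM to build an smb URL and see whether it is rejected. This is a sketch using only the standard library; the class name SmbHandlerCheck is hypothetical. Run it with the same classpath and JVM arguments that you use for the crawl.

```java
import java.net.MalformedURLException;
import java.net.URI;

// Hypothetical diagnostic for illustration only; not part of protocol-smb.
final class SmbHandlerCheck {

    /** Returns true if the JVM can resolve a URL handler for the given scheme. */
    static boolean hasHandler(String scheme) {
        try {
            URI.create(scheme + "://server/share/").toURL();
            return true;
        } catch (MalformedURLException e) {
            // Raised as "unknown protocol: <scheme>" when no handler is found.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("smb handler installed: " + hasHandler("smb"));
    }
}
```

If this prints false while the crawl JVM is started the same way, the JCIFS handler is still not visible to the JVM and the exception above is to be expected.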

    2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx

       This problem usually occurs if the following properties are not set
       correctly in the "smb.properties" file:

       - username
       - password
       - domain

       For the list of available properties and how to set them, see:

       http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
       JCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html

       N.B. All properties should be set in the "smb.properties" file, which
            accepts every property that JCIFS supports.
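
Before starting a crawl, it can help to verify that the three credentials from issue 2 are present. The sketch below loads properties and reports which ones are missing or empty. It assumes the file uses the standard JCIFS client property names (jcifs.smb.client.username, jcifs.smb.client.password, jcifs.smb.client.domain); whether the plugin expects exactly these keys is an assumption, so check the smb.properties shipped with the plugin. The class SmbPropertiesCheck and the sample values are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for illustration only; not part of protocol-smb.
final class SmbPropertiesCheck {

    // Standard JCIFS property names; assumed to be the keys used in smb.properties.
    static final String[] REQUIRED = {
        "jcifs.smb.client.username",
        "jcifs.smb.client.password",
        "jcifs.smb.client.domain"
    };

    /** Returns the required keys that are absent or blank. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Sample content standing in for NUTCHHOME/conf/smb.properties.
        String sample = "jcifs.smb.client.username=crawler\n"
                      + "jcifs.smb.client.domain=EXAMPLE\n";
        Properties props = new Properties();
        props.load(new StringReader(sample));
        System.out.println("Missing: " + missingKeys(props));
    }
}
```

To check a real file, replace the StringReader with a reader opened on your smb.properties.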
     
    3) Only tested on Windows XP and Windows Server 2003. Please report test
       results on other operating systems.

  was:
Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
Author:   Armel T. Nene
Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com

A.  Introduction

    The protocol-smb plugin allows you to crawl Microsoft Windows shares. It
    implements the CIFS/SMB protocol, which is commonly used on Microsoft
    operating systems. The plugin replicates the behaviour of protocol-file
    over CIFS/SMB. It uses the JCIFS library and supports all of that
    library's properties.
    You can find more information at: http://jcifs.samba.org/
    The SMB URL syntax is: smb://xxxxx (e.g. smb://server/share).
    
B.  Installation

    1) Binaries only:   - Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins
                          directory.
                        - Put the "smb.properties" file in the NUTCHHOME/conf directory.
                        - Configure the properties in the "smb.properties" file.
                        - Enable the plugin by updating the "nutch-site.xml" file
                          found in the NUTCHHOME/conf directory.

    2) Source code:     Always refer to the Nutch wiki for detailed instructions on
                        building Nutch. In short:
                        - Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
                        - Update the build.xml in NUTCHHOME/src/plugin to include the
                          plugin.
                        - Update the NUTCHHOME/default.properties file to include the
                          plugin.
                        - Run ant to build.
                        - Copy the 'smb.properties' file to NUTCHHOME/conf and
                          configure the properties.
                        - Enable the plugin by updating the nutch-site.xml file.

C.  Known Issues

    1) MalformedURLException: unknown protocol: smb

       The SMB URL protocol handler is not being successfully installed. 
       In short, the jCIFS jar must be loaded by the System class loader.

       Workaround: a) A short-term solution is to install the JCIFS jar
                      found in the protocol-smb folder into
                      JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.

                   b) If the exception is still thrown after completing step a),
                      set the system property by passing the following argument
                      to the JVM:

                      -Djava.protocol.handler.pkgs=jcifs

       See also the JCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html

    2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx

       This problem usually occurs if the following properties are not set
       correctly in the "smb.properties" file:

       - username
       - password
       - domain

       For the list of available properties and how to set them, see:

       http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
       JCIFS FAQ: http://jcifs.samba.org/src/docs/faq.html

       N.B. All properties should be set in the "smb.properties" file, which
            accepts every property that JCIFS supports.
     
    3) Only tested on Windows XP and Windows Server 2003. Please report test
       results on other operating systems. It should also run on any other OS
       without changes.

    Affects Version/s: 1.0.0
                       0.9.0

The update fixes some issues I ran into when trying to use the old version
with Nutch 1.0-dev.

> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This
> protocol allows Nutch to crawl Microsoft Windows shares remotely using the
> CIFS/SMB protocol implementation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>         Attachments: protocol-smb.zip, protocol-smb.zip
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
