Hi,

One available solution is to set up WebDAV and crawl the resources as
files, e.g. via file://. But it won't remove the need for authentication.
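
A rough sketch of the Nutch side, assuming the WebDAV share is already
mounted as a local directory (the mount itself still has to authenticate
against SharePoint). The plugin list below is only an illustrative
assumption, not a tested configuration; the point is that protocol-file
has to be present in plugin.includes for file:// URLs to be fetched:

```xml
<!-- conf/nutch-site.xml: hypothetical fragment, adjust to your own plugin list -->
<property>
  <name>plugin.includes</name>
  <!-- protocol-file enables fetching of file:// URLs; the other entries are assumed -->
  <value>protocol-file|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The stock conf/regex-urlfilter.txt normally skips file: URLs, so that rule
would probably need relaxing as well, and the seed list would then point at
the mounted path (e.g. a hypothetical file:///mnt/sharepoint/).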


Alexander

On 24/11/2011, Lewis John Mcgibbney <[email protected]> wrote:
> Hi Arkadi,
>
> Are you saying that this has been solved and that you are successfully able
> to crawl the server?
>
> Thanks
>
> On Thu, Nov 24, 2011 at 12:48 AM, <[email protected]> wrote:
>
>> Hi,
>>
>> I am crawling a SharePoint server, no major problems. I do have to use
>> protocol-httpclient for this. Here is an extract from my
>> httpclient-auth.xml file, if it helps:
>>
>> <auth-configuration>
>>  <credentials username="myusername" password="mypassword">
>>    <default realm="myrealm" />
>>  </credentials>
>> </auth-configuration>
>>
>> Regards,
>>
>> Arkadi
>>
>> > -----Original Message-----
>> > From: Lewis John Mcgibbney [mailto:[email protected]]
>> > Sent: Tuesday, 22 November 2011 9:43 PM
>> > To: [email protected]
>> > Subject: Re: Nutch and Sharepoint authentication
>> >
>> > Hi,
>> >
>> > From what I have read on the Nutch user@ archives [1] it is possible to
>> > crawl a MS Sharepoint server which includes setting up NTLM
>> > authentication
>> > for your crawler. It is becoming a pretty major problem now that the
>> > protocol-httpclient plugin is unstable, there are Jira issues open for
>> > this.
>> >
>> > Unfortunately as Manifold CF is in incubation status, it can only be
>> > expected that they might have not completed all documentation yet,
>> > however
>> > I advise you to try there as well, and ask them about the Sharepoint
>> > configuration/documentation if it is not possible for you to work with
>> > Nutch protocol-httpclient.
>> >
>> > hth
>> >
>> > [1]
>> > http://www.mail-archive.com/search?q=sharepoint&l=user%40nutch.apache.org
>> >
>> > On Tue, Nov 22, 2011 at 5:27 AM, remi tassing <[email protected]>
>> > wrote:
>> >
>> > > Hello guys,
>> > >
>> > > I read the wiki on
>> > > "HttpAuthenticationSchemes<
>> > > http://wiki.apache.org/nutch/HttpAuthenticationSchemes>".
>> > > I previously managed to make Nutch crawl local folders and websites
>> > (with
>> > > SSL authentication). However, I'm trying to crawl some sites in a
>> > corporate
>> > > intranet environment running under MS Sharepoint. I was unsuccessful so
>> > far
>> > > and I believe it's because of authentication.
>> > >
>> > >
>> > >   - Is Nutch able to crawl Sharepoint? If yes, do you have a
>> > link/mail
>> > >   tutorial on this?
>> > >
>> > >
>> > > I was recently aware of the ManifoldCF initiative and it seems to be
>> > an
>> > > eventual solution to my problem. But it's currently poorly documented
>> > (as
>> > > far as Sharepoint connector is concerned).
>> > >
>> > >   - Do you have any recommendations in this regard?
>> > >
>> > >
>> > > Thanks in advance for your help, I'll really appreciate it!
>> > >
>> > > --
>> > > Remi Tassing
>> > >
>> >
>> >
>> >
>> > --
>> > *Lewis*
>>
>
>
>
> --
> *Lewis*
>


-- 
Best Regards
Alexander Aristov
