Hi Phil,

It is not surprising that the connector doesn't like 302 responses and
doesn't know what to do with them, because it isn't supposed to ever be
getting any of these.

I am puzzled by your statement that "only a couple of documents have
redirections in them", because the connector crawls Lists and Library
documents within SharePoint *only*, and these are very specifically
accessible through a SharePoint URL hierarchy structure.  There's no room
in any of that for a 302 redirection.  Since you see a 302 in the UI, I
feel pretty certain you have a problem with your configuration and it is
not just "a couple of documents".
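
One way to check this independently of ManifoldCF is to request the
SharePoint web-service URL without following redirects and look at the
status code and Location header. What follows is only a minimal sketch,
not connector code: the class RedirectCheck and its methods are
hypothetical names, and the URL you pass in (for example your site URL
followed by /_vti_bin/Webs.asmx) is an assumption about your setup.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical diagnostic helper; not part of ManifoldCF.
class RedirectCheck {

    // True for the 3xx status codes that normally carry a Location header.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    // Builds a one-line summary of a response status and its Location header.
    static String describe(int status, String location) {
        if (isRedirect(status)) {
            String target = (location == null) ? "(no Location header)" : location;
            return status + " redirect -> " + target;
        }
        return status + " (no redirect)";
    }

    // Sends a single GET with automatic redirect following switched off,
    // so that a 302 is reported instead of being silently followed.
    static String probe(String url) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(false);
        conn.setRequestMethod("GET");
        try {
            return describe(conn.getResponseCode(), conn.getHeaderField("Location"));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RedirectCheck <url>");
            return;
        }
        System.out.println(probe(args[0]));
    }
}
```

If this reports a 302 whose target is a login or error page, that would
suggest the server or site path in the repository connection is not the
real one. A 401 would more likely just mean the endpoint exists and wants
credentials, which this sketch does not send.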

Karl


On Tue, Feb 16, 2016 at 5:22 PM, Phil Riethmuller <
[email protected]> wrote:

> Thanks Karl,
>
> The majority of content is not being redirected; it’s probably just a
> handful of documents that behave this way.
>
> I’d agree that whether the document itself is indexed is of lesser
> concern; however, I wouldn’t expect the 302 to be treated as a fatal
> error that brings the job to a halt. I’d expect the document to be
> passed over and the crawl to continue.
>
> Is the only solution at this point to remove the documents that return a
> 302 so that the crawl can run in full?
>
> Regards,
>
> *Phil Riethmuller*
> Technical Consultant
>
> *Funnelback |* 437 Kent Street, Sydney, NSW 2000
> *T* +61 2 9045 2882 | funnelback.com <http://www.funnelback.com/>
>
> *AUSTRALIA* | UNITED KINGDOM | NEW ZEALAND | POLAND | UNITED STATES
>
> Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> -
> *Twitter*
>
>
> From: Karl Wright <[email protected]>
> Reply-To: <[email protected]>
> Date: Wednesday, 17 February 2016 8:58 am
>
> To: "[email protected]" <[email protected]>
> Subject: Re: HTTP 302 error causing job to abort
>
> Hi Phil,
>
> You probably want to point your SharePoint repository connection to the
> proper server and site, and not rely on redirections.  It's also possible
> that you are missing the site entirely and the redirection you are seeing
> is taking you to some error page somewhere.
>
> I will be raising the question of redirections with the
> HttpComponents/HttpClient team, since I see no obvious problems with the
> SharePoint connector code.  However, if your connection is properly set up,
> redirections should be unneeded.
>
> I would read the documentation on the Wiki page for debugging SharePoint
> connections at the bottom of this page:
> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>
> Thanks,
> Karl
>
>
> On Tue, Feb 16, 2016 at 4:55 PM, Phil Riethmuller <
> [email protected]> wrote:
>
>> Do you mean in the job status in the ManifoldCF interface?
>>
>> The job status also shows the same:
>> Error: Unexpected http error code 302 accessing SharePoint at <url>:
>> (302)HTTP/1.0 302 Found
>>
>> I agree, I wouldn’t have thought that the crawler would follow any links
>> or redirections.
>>
>> Which configuration settings could be wrong, so that I can look at
>> revising them?
>>
>> Phil
>>
>>
>> From: Karl Wright <[email protected]>
>> Reply-To: <[email protected]>
>> Date: Wednesday, 17 February 2016 8:45 am
>>
>> To: "[email protected]" <[email protected]>
>> Subject: Re: HTTP 302 error causing job to abort
>>
>> Thanks.
>>
>> When you view the repository connection in the UI, do you get a 302 error
>> also?
>>
>> I have looked at the code; HttpClient is supposedly configured to honor
>> redirections.  Obviously it is not doing that, so I'll have to dig deeper
>> into why that is.  On the other hand, I would not expect you to be getting
>> any redirections, unless you have configured your connection incorrectly.
>>
>> Karl
>>
>>
>> On Tue, Feb 16, 2016 at 4:31 PM, Phil Riethmuller <
>> [email protected]> wrote:
>>
>>> Thanks Karl -
>>>
>>> I’ve replaced the actual URL with <URL> below, but here is the stack
>>> trace:
>>>
>>> ERROR 2016-02-16 12:10:55,251 (Worker thread '16') - Exception tossed: Unexpected http error code 302 accessing SharePoint at <URL>: (302)HTTP/1.0 302 Found
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected http error code 302 accessing SharePoint at <URL>: (302)HTTP/1.0 302 Found
>>>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getSites(SPSProxyHelper.java:2246)
>>>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1549)
>>>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>> Caused by: (302)HTTP/1.0 302 Found
>>>         at org.apache.manifoldcf.connectorcommon.common.CommonsHTTPSender.invoke(CommonsHTTPSender.java:201)
>>>         at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
>>>         at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
>>>         at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
>>>         at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
>>>         at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
>>>         at org.apache.axis.client.Call.invoke(Call.java:2767)
>>>         at org.apache.axis.client.Call.invoke(Call.java:2443)
>>>         at org.apache.axis.client.Call.invoke(Call.java:2366)
>>>         at org.apache.axis.client.Call.invoke(Call.java:1812)
>>>         at com.microsoft.schemas.sharepoint.soap.WebsSoapStub.getWebCollection(WebsSoapStub.java:854)
>>>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getSites(SPSProxyHelper.java:2161)
>>>
>>>
>>>
>>> Regards,
>>>
>>> *Phil Riethmuller*
>>> Technical Consultant
>>>
>>>
>>> From: Karl Wright <[email protected]>
>>> Reply-To: <[email protected]>
>>> Date: Tuesday, 16 February 2016 6:54 pm
>>> To: "[email protected]" <[email protected]>
>>> Subject: Re: HTTP 302 error causing job to abort
>>>
>>> Hi Phil,
>>>
>>> An HTTP 302 response is simply a redirection.  It should not, by itself,
>>> cause a job to abort.  I would expect that to go by in wire/http logging,
>>> but you should not see it anywhere else.  So it is not clear to me what you
>>> are really seeing here.
>>>
>>> Can you include an example stack trace from the manifoldcf log?
>>>
>>> Karl
>>>
>>>
>>> On Tue, Feb 16, 2016 at 12:22 AM, Phil Riethmuller <
>>> [email protected]> wrote:
>>>
>>>> Hi -
>>>>
>>>> When crawling a SharePoint repository, I’m receiving an HTTP 302 error
>>>> which is causing the ManifoldCF job to abort. How do I prevent the
>>>> crawler from aborting the job?
>>>>
>>>> I’m using v2.3 of ManifoldCF with a PostgreSQL database.
>>>>
>>>> Regards,
>>>> Phil
>>>>
>>>
>>>
>>
>
