Hi Phil,

It is not surprising that the connector doesn't like 302 responses and doesn't know what to do with them, because it isn't supposed to ever be getting any of these.
I am puzzled by your statement that "only a couple of documents have redirections in them", because the connector crawls Lists and Library documents within SharePoint *only*, and these are very specifically accessible through a SharePoint URL hierarchy structure. There's no room in any of that for a 302 redirection. Since you see a 302 in the UI, I feel pretty certain you have a problem with your configuration and it is not just "a couple of documents".

Karl

On Tue, Feb 16, 2016 at 5:22 PM, Phil Riethmuller <[email protected]> wrote:

> Thanks Karl,
>
> The majority of content is not going to the redirect; it’s probably just a
> handful of documents that are behaving this way.
>
> I’d agree that it’s of lesser concern whether or not the document itself
> is indexing, however I wouldn’t expect the 302 to be treated as a fatal
> error that causes the job to come to a halt. I’d expect the document to be
> passed over, and the crawl to continue.
>
> Is the only solution at this point to remove the documents that respond
> with a 302 to get the crawl to run in full?
>
> Regards,
>
> *Phil Riethmuller*
> Technical Consultant
>
> *Funnelback |* 437 Kent Street, Sydney, NSW 2000
> *T* +61 2 9045 2882 | funnelback.com <http://www.funnelback.com/>
>
> *AUSTRALIA* | UNITED KINGDOM | NEW ZEALAND | POLAND | UNITED STATES
>
> Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> -
> *Twitter*
>
> From: Karl Wright <[email protected]>
> Reply-To: <[email protected]>
> Date: Wednesday, 17 February 2016 8:58 am
> To: "[email protected]" <[email protected]>
> Subject: Re: HTTP 302 error causing job to abort
>
> Hi Phil,
>
> You probably want to point your SharePoint repository connection to the
> proper server and site, and not rely on redirections. It's also possible
> that you are missing the site entirely and the redirection you are seeing
> is taking you to some error page somewhere.
>
> I will be raising the question of redirections with the
> HttpComponents/HttpClient team, since I see no obvious problems with the
> SharePoint connector code. However, if your connection is properly set up,
> redirections should be unneeded.
>
> I would read the documentation on the Wiki page for debugging SharePoint
> connections at the bottom of this page:
> https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
>
> Thanks,
> Karl
>
> On Tue, Feb 16, 2016 at 4:55 PM, Phil Riethmuller <[email protected]> wrote:
>
>> Do you mean in the job status in the ManifoldCF interface?
>>
>> The job status also shows the same:
>> Error: Unexpected http error code 302 accessing SharePoint at <url>:
>> (302)HTTP/1.0 302 Found
>>
>> I agree, I wouldn’t have thought that the crawler would follow any links
>> or redirections.
>>
>> What sort of settings could be incorrectly configured that I could look
>> at revising?
>>
>> Phil
>>
>> From: Karl Wright <[email protected]>
>> Reply-To: <[email protected]>
>> Date: Wednesday, 17 February 2016 8:45 am
>> To: "[email protected]" <[email protected]>
>> Subject: Re: HTTP 302 error causing job to abort
>>
>> Thanks.
>>
>> When you view the repository connection in the UI, do you get a 302 error
>> also?
>>
>> I have looked at the code; HttpClient is supposedly configured to honor
>> redirections. Obviously it is not doing that, so I'll have to dig deeper
>> into why that is. On the other hand, I would not expect you to be getting
>> any redirections, unless you have configured your connection incorrectly.
>>
>> Karl
>>
>> On Tue, Feb 16, 2016 at 4:31 PM, Phil Riethmuller <[email protected]> wrote:
>>
>>> Thanks Karl -
>>>
>>> I’ve replaced the actual URL with <URL> below, but here is the stack
>>> trace:
>>>
>>> ERROR 2016-02-16 12:10:55,251 (Worker thread '16') - Exception tossed: Unexpected http error code 302 accessing SharePoint at <URL>: (302)HTTP/1.0 302 Found
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected http error code 302 accessing SharePoint at <URL>: (302)HTTP/1.0 302 Found
>>>     at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getSites(SPSProxyHelper.java:2246)
>>>     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1549)
>>>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>> Caused by: (302)HTTP/1.0 302 Found
>>>     at org.apache.manifoldcf.connectorcommon.common.CommonsHTTPSender.invoke(CommonsHTTPSender.java:201)
>>>     at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
>>>     at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
>>>     at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
>>>     at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
>>>     at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
>>>     at org.apache.axis.client.Call.invoke(Call.java:2767)
>>>     at org.apache.axis.client.Call.invoke(Call.java:2443)
>>>     at org.apache.axis.client.Call.invoke(Call.java:2366)
>>>     at org.apache.axis.client.Call.invoke(Call.java:1812)
>>>     at com.microsoft.schemas.sharepoint.soap.WebsSoapStub.getWebCollection(WebsSoapStub.java:854)
>>>     at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getSites(SPSProxyHelper.java:2161)
>>>
>>> Regards,
>>> Phil Riethmuller
>>>
>>> From: Karl Wright <[email protected]>
>>> Reply-To: <[email protected]>
>>> Date: Tuesday, 16 February 2016 6:54 pm
>>> To: "[email protected]" <[email protected]>
>>> Subject: Re: HTTP 302 error causing job to abort
>>>
>>> Hi Phil,
>>>
>>> An HTTP 302 response is simply a redirection. It should not, by itself,
>>> cause a job to abort. I would expect that to go by in wire/http logging,
>>> but you should not see it anywhere else. So it is not clear to me what you
>>> are really seeing here.
>>>
>>> Can you include an example stack trace from the manifoldcf log?
>>>
>>> Karl
>>>
>>> On Tue, Feb 16, 2016 at 12:22 AM, Phil Riethmuller <[email protected]> wrote:
>>>
>>>> Hi -
>>>>
>>>> When crawling a SharePoint repository, I’m receiving an HTTP 302 error
>>>> which is causing the ManifoldCF job to abort. How do I prevent the
>>>> crawler from aborting the job?
>>>>
>>>> I’m using v2.3 of ManifoldCF with a PostgreSQL database.
>>>>
>>>> Regards,
>>>> Phil
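One way to act on Karl's suggestion that the connection may be pointed at the wrong server or site is to request the configured URL without following redirections and look at the status code and the Location header. The Java sketch below does that with the JDK's `HttpURLConnection`. It is only an illustration under stated assumptions: the class `RedirectProbe` and its methods are hypothetical names and are not part of ManifoldCF, the URL to probe is whatever you pass on the command line, and no authentication is sent, so a correctly configured SharePoint server may well answer 401 rather than 200. A 302 at this stage would suggest the redirection happens before the connector's SOAP calls are ever processed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of ManifoldCF: reports whether a URL answers
// with a redirect instead of silently following it.
class RedirectProbe {

    // True for the 3xx status codes that redirect via a Location header.
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    // Formats a status code and optional Location header into one line.
    public static String describe(int status, String location) {
        if (isRedirect(status)) {
            String target = (location == null) ? "(no Location header)" : location;
            return "redirect " + status + " -> " + target;
        }
        return "status " + status;
    }

    // Sends an unauthenticated GET with automatic redirect handling disabled,
    // so that a 302 is reported rather than followed.
    public static String probe(String address) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setInstanceFollowRedirects(false);
        conn.setRequestMethod("GET");
        try {
            return describe(conn.getResponseCode(), conn.getHeaderField("Location"));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a URL to check, e.g. the server and site path
        // entered in the repository connection.
        for (String address : args) {
            System.out.println(address + ": " + probe(address));
        }
    }
}
```

Running it as `java RedirectProbe.java <URL>` with the same server and site path used in the repository connection prints either the plain status or the redirect target. If the target turns out to be a login or error page, or the same path on a different host or scheme, that is the value to compare against the connection settings.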
