Hi Shigeki-san,

I don't know the root cause, but when I looked at Solr's log,
I found some exceptions that had been raised while indexing certain files.
I excluded those files from indexing,
and as a result the crawl completed successfully.
If you check Solr's log, you may find something similar.
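For reference, here is a minimal Java sketch of that first step. It assumes Solr writes a plain-text, UTF-8 log file. The class name `SolrLogScan` and the default path `solr.log` are hypothetical, so pass your own log path. It simply lists every line that mentions an exception, so you can see which documents failed before excluding them in the job's Paths tab.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists Solr log lines that mention an exception.
public class SolrLogScan {

    /** Returns "lineNumber: text" for every line containing "Exception". */
    public static List<String> findExceptionLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.contains("Exception")) {
                hits.add((i + 1) + ": " + line); // 1-based line numbers
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass the real Solr log as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "solr.log");
        // Assumes the log is UTF-8 text.
        List<String> lines = Files.readAllLines(log, StandardCharsets.UTF_8);
        for (String hit : findExceptionLines(lines)) {
            System.out.println(hit);
        }
    }
}
```

Run it as `java SolrLogScan /path/to/solr.log`. The lines around each hit usually show which document Solr was processing when it failed.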

Regards,
Shinichiro Abe

On 2012/09/05, at 14:37, Shigeki Kobayashi wrote:

> Hi Abe-san
> 
> I've just run into the same problem you did, and I'm having trouble
> figuring out how to solve it.
> 
> Did you figure out how to get rid of this problem? If so, it would be nice
> if you could share how you did it.
> 
> 
> Regards,
> 
> Shigeki
> 
> 2012/8/2 Shinichiro Abe <[email protected]>
> Thanks very much for the help!
> I understand.
> Shinichiro Abe
> 
> On 2012/08/01, at 19:35, Karl Wright wrote:
> 
> > On Wed, Aug 1, 2012 at 5:48 AM, Shinichiro Abe
> > <[email protected]> wrote:
> >> Hi Karl,
> >>
> >> I still have a problem.
> >> I reduced the maximum number of connections to 2.
> >> I rebooted the file server, but not the domain controller.
> >> When I configured the paths as in [1], the log showed no errors
> >> and the ShareDrive connector crawled the files successfully.
> >> When I set the path configuration back to the default (matching *),
> >> the log reported the "all pipe instances are busy" error.
> >> Both path configurations pointed to the same location.
> >>
> >> Also, when this error occurred, I watched the ingest log:
> >> HttpPoster was waiting for the response stream,
> >> couldn't get a response from Solr,
> >> and threw a SocketTimeoutException.
> >> I increased jcifs.smb.client.responseTimeout,
> >> but it still threw the exception.
> >> On the Solr side, Jetty threw a SocketException (socket write error).
> >> I'm now checking the Solr logs.
> >> Solr may be doing something wrong when handling /update/extract.
> >>
> >
> > If Solr threw the exception, that sounds likely.
> >
> >> Have you seen anything like this?
> >> Does the path-matching configuration affect these errors?
> >>
> >> [1] Paths tab:
> >> Include directory(s) matching /01*
> >>
> >
> > This should have nothing to do with socket exceptions, except possibly
> > that the crawler winds up trying to read something that isn't actually a
> > file, such as a named pipe.  This typically doesn't happen if the
> > server is a Windows machine, but if it is a Samba server I could
> > imagine something like that happening.
> >
> > Karl
> >
> >> P.S.
> >> Thank you for fixing CONNECTORS-494.
> >> I checked the trunk code, and it worked well.
> >>
> >> Thank you,
> >> Shinichiro Abe
> >>
> >> On 2012/07/24, at 22:13, Karl Wright wrote:
> >>
> >>> Hi Abe-san,
> >>>
> >>> Did you figure out what the problem was?
> >>>
> >>> Karl
> >>>
> >>> On Thu, Jul 19, 2012 at 5:52 AM, Karl Wright <[email protected]> wrote:
> >>>> Hi Abe-san,
> >>>>
> >>>> Sometimes what looks like a server error can actually be due to the
> >>>> domain controller.  I wonder if the domain controller needs to be
> >>>> rebooted?
> >>>>
> >>>> Karl
> >>>>
> >>>> On Thu, Jul 19, 2012 at 5:12 AM, Shinichiro Abe
> >>>> <[email protected]> wrote:
> >>>>> Hi Karl,
> >>>>> Thank you for the reply.
> >>>>> I tried reducing the maximum number of connections from 10
> >>>>> to 5, but that didn't avoid the busy error. I'll try reducing it further.
> >>>>> Thank you.
> >>>>> Shinichiro Abe
> >>>>>
> >>>>> On 2012/07/19, at 15:55, Karl Wright wrote:
> >>>>>
> >>>>>> Hi Abe-san,
> >>>>>>
> >>>>>> The "all pipe instances are busy" error is coming from the Windows
> >>>>>> server you are trying to crawl.  I don't know what is happening there
> >>>>>> but here are some possibilities:
> >>>>>>
> >>>>>> (1) The Windows server is just overloaded; you can try reducing the
> >>>>>> maximum number of connections to 2 or 3 to see if that helps.
> >>>>>> (2) The Windows server needs rebooting.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Karl
> >>>>>>
> >>>>>> On Wed, Jul 18, 2012 at 10:09 PM, Shinichiro Abe
> >>>>>> <[email protected]> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I'm using the Windows shares connector and ran a job.
> >>>>>>> The job was aborted instead of finishing normally, and the job's status said:
> >>>>>>> Error: Repeated service interruptions - failure processing document: 
> >>>>>>> Read timed out
> >>>>>>>
> >>>>>>> Why was the job aborted? I'm using ManifoldCF 0.5.1 and the latest
> >>>>>>> version of jcifs.jar.
> >>>>>>> Is the crawled server busy? The server where MCF is installed does
> >>>>>>> not seem to be busy, but the other servers that MCF crawls seem to
> >>>>>>> be busy.
> >>>>>>> How can I run the job without error? What's wrong?
> >>>>>>>
> >>>>>>>
> >>>>>>> The connector logs:
> >>>>>>>
> >>>>>>> WARN 2012-07-12 16:28:52,648 (Worker thread '19') - JCIFS: Possibly 
> >>>>>>> transient exception detected on attempt 1 while getting share 
> >>>>>>> security: All pipe instances are busy.
> >>>>>>>      at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
> >>>>>>>      at jcifs.smb.SmbTransport.send(SmbTransport.java:663)
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - JCIFS: Possibly 
> >>>>>>> transient exception detected on attempt 3 while getting share 
> >>>>>>> security: All pipe instances are busy.
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - JCIFS: 'Busy' 
> >>>>>>> response when getting document version for 
> >>>>>>> smb://XX.XX.XX.XX/D$/abcde/1234/123456789/e123456789a.pdf: retrying...
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - Pre-ingest 
> >>>>>>> service interruption reported for job 1342076182624 connection 
> >>>>>>> 'Windows shares': Timeout or other service interruption: All pipe 
> >>>>>>> instances are busy.
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 19:14:30,335 (Worker thread '19') - Service 
> >>>>>>> interruption reported for job 1342076182624 connection 'Windows 
> >>>>>>> shares': Ingestion API socket timeout exception waiting for response 
> >>>>>>> code: Read timed out; ingestion will be retried again later
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 20:43:50,210 (Worker thread '19') - Service 
> >>>>>>> interruption reported for job 1342076182624 connection 'Windows 
> >>>>>>> shares': Ingestion API socket timeout exception waiting for response 
> >>>>>>> code: Read timed out; ingestion will be retried again later
> >>>>>>> ..
> >>>>>>> ERROR 2012-07-12 20:43:50,210 (Worker thread '19') - Exception 
> >>>>>>> tossed: Repeated service interruptions - failure processing document: 
> >>>>>>> Read timed out
> >>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated 
> >>>>>>> service interruptions - failure processing document: Read timed out
> >>>>>>>      at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:606)
> >>>>>>> Caused by: java.net.SocketTimeoutException: Read timed out
> >>>>>>>      at java.net.SocketInputStream.socketRead0(Native Method)
> >>>>>>>      at java.net.SocketInputStream.read(Unknown Source)
> >>>>>>>      at java.net.SocketInputStream.read(Unknown Source)
> >>>>>>>      at org.apache.manifoldcf.agents.output.solr.HttpPoster.readLine(HttpPoster.java:571)
> >>>>>>>      at org.apache.manifoldcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:598)
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>> Shinichiro Abe
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>
> 
> 
