Hi Shigeki-san,

I do not know the root cause, but when I looked at the Solr log, there were some exceptions raised while indexing certain files. I excluded those files from indexing, and as a result the crawl completed successfully. If you check Solr's log, you may find similar exceptions.

Regards,
Shinichiro Abe

On 2012/09/05, at 14:37, Shigeki Kobayashi wrote:

> Hi Abe-san
>
> I've just faced the same thing as you did, and now having a trouble in
> figuring out how to solve this problem.
>
> Did you figure out how to get ride of this problem? If so, it would be nice
> if you could share how you did it.
>
>
> Regards,
>
> Shigeki
>
> 2012/8/2 Shinichiro Abe <[email protected]>
> Thanks very much for the help!
> I understand.
> Shinichiro Abe
>
> On 2012/08/01, at 19:35, Karl Wright wrote:
>
> > On Wed, Aug 1, 2012 at 5:48 AM, Shinichiro Abe
> > <[email protected]> wrote:
> >> Hi Karl,
> >>
> >> I still have a problem.
> >> I reduced maximum number of connections into 2.
> >> I rebooted the file server, not domain controller.
> >> When I configured the paths[1], the log said no error
> >> and ShareDrive connector crawled the files successfully.
> >> When I made the path's config default(matching * ),
> >> the log said "all pipe instances are busy" error.
> >> Both of path's config pointed the same location.
> >>
> >> Also when this error occurred, watching the log of ingest,
> >> HttpPoster was waiting for response stream
> >> and couldn't get response from Solr,
> >> and threw SocketTimeoutException.
> >> I increased jcifs.smb.client.responseTimeout
> >> but still threw the exception.
> >> On Solr, Jetty threw SocketException(socket write error).
> >> I'm working on checking Solr logs.
> >> Solr may do something wrong when running /update/extract.
> >>
> >
> > If Solr threw the exception this sounds likely.
> >
> >> Do you know something like this?
> >> Does path's matching config affect those errors?
> >>
> >> [1]Paths Tab:
> >> Include directory(s) matching /01*
> >>
> >
> > This should have nothing to do with socket exceptions, except possibly
> > that the crawler winds up trying to read a file that isn't actually a
> > file but is something else, like a named pipe or something. This
> > typically doesn't happen if the server is a Windows machine but if it
> > is a Samba server I could imagine something like that happening.
> >
> > Karl
> >
> >> P.S.
> >> Thank you for fix CONNECTORS-494.
> >> I checked trunk code, worked well.
> >>
> >> Thank you,
> >> Shinichiro Abe
> >>
> >> On 2012/07/24, at 22:13, Karl Wright wrote:
> >>
> >>> Hi Abe-san,
> >>>
> >>> Did you figure out what the problem was?
> >>>
> >>> Karl
> >>>
> >>> On Thu, Jul 19, 2012 at 5:52 AM, Karl Wright <[email protected]> wrote:
> >>>> Hi Abe-san,
> >>>>
> >>>> Sometimes what looks like a server error can actually be due to the
> >>>> domain controller. I wonder if the domain controller needs to be
> >>>> rebooted?
> >>>>
> >>>> Karl
> >>>>
> >>>> On Thu, Jul 19, 2012 at 5:12 AM, Shinichiro Abe
> >>>> <[email protected]> wrote:
> >>>>> Hi Karl,
> >>>>> Thank you for the reply.
> >>>>> I tried to reduce maximum number of connections from 10
> >>>>> to 5, but didn't avoid busy error. I'll try to reduce more.
> >>>>> Thank you.
> >>>>> Shinichiro Abe
> >>>>>
> >>>>> On 2012/07/19, at 15:55, Karl Wright wrote:
> >>>>>
> >>>>>> Hi Abe-san,
> >>>>>>
> >>>>>> The "all pipe instances are busy" error is coming from the Windows
> >>>>>> server you are trying to crawl. I don't know what is happening there
> >>>>>> but here are some possibilities:
> >>>>>>
> >>>>>> (1) The Windows server is just overloaded; you can try reducing the
> >>>>>> maximum number of connections to 2 or 3 to see if that helps.
> >>>>>> (2) The Windows server needs rebooting.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Karl
> >>>>>>
> >>>>>> On Wed, Jul 18, 2012 at 10:09 PM, Shinichiro Abe
> >>>>>> <[email protected]> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I use windows shares connector and ran a job.
> >>>>>>> The job was aborted without done normally and the job's status said:
> >>>>>>> Error: Repeated service interruptions - failure processing document:
> >>>>>>> Read timed out
> >>>>>>>
> >>>>>>> Why was the job aborted? I use ManifoldCF 0.5.1 and the latest
> >>>>>>> version's jcifs.jar.
> >>>>>>> Is the crawled server busy? I think the server MCF is installed seems
> >>>>>>> not to be busy,
> >>>>>>> the other servers in which MCF will crawls seem to be busy.
> >>>>>>> How can I run the job without error? What's wrong?
> >>>>>>>
> >>>>>>>
> >>>>>>> the logs of connector:
> >>>>>>>
> >>>>>>> WARN 2012-07-12 16:28:52,648 (Worker thread '19') - JCIFS: Possibly
> >>>>>>> transient exception detected on attempt 1 while getting share
> >>>>>>> security: All pipe instances are busy.
> >>>>>>> at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
> >>>>>>> at jcifs.smb.SmbTransport.send(SmbTransport.java:663)
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - JCIFS: Possibly
> >>>>>>> transient exception detected on attempt 3 while getting share
> >>>>>>> security: All pipe instances are busy.
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - JCIFS: 'Busy'
> >>>>>>> response when getting document version for
> >>>>>>> smb://XX.XX.XX.XX/D$/abcde/1234/123456789/e123456789a.pdf: retrying...
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - Pre-ingest
> >>>>>>> service interruption reported for job 1342076182624 connection
> >>>>>>> 'Windows shares': Timeout or other service interruption: All pipe
> >>>>>>> instances are busy.
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 19:14:30,335 (Worker thread '19') - Service
> >>>>>>> interruption reported for job 1342076182624 connection 'Windows
> >>>>>>> shares': Ingestion API socket timeout exception waiting for response
> >>>>>>> code: Read timed out; ingestion will be retried again later
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 20:43:50,210 (Worker thread '19') - Service
> >>>>>>> interruption reported for job 1342076182624 connection 'Windows
> >>>>>>> shares': Ingestion API socket timeout exception waiting for response
> >>>>>>> code: Read timed out; ingestion will be retried again later
> >>>>>>> ..
> >>>>>>> ERROR 2012-07-12 20:43:50,210 (Worker thread '19') - Exception
> >>>>>>> tossed: Repeated service interruptions - failure processing document:
> >>>>>>> Read timed out
> >>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated
> >>>>>>> service interruptions - failure processing document: Read timed out
> >>>>>>> at
> >>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:606)
> >>>>>>> Caused by: java.net.SocketTimeoutException: Read timed out
> >>>>>>> at java.net.SocketInputStream.socketRead0(Native Method)
> >>>>>>> at java.net.SocketInputStream.read(Unknown Source)
> >>>>>>> at java.net.SocketInputStream.read(Unknown Source)
> >>>>>>> at
> >>>>>>> org.apache.manifoldcf.agents.output.solr.HttpPoster.readLine(HttpPoster.java:571)
> >>>>>>> at
> >>>>>>> org.apache.manifoldcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:598)
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>> Shinichiro Abe
> >>>>>>>
> >>>>>
> >>
> >
