Hi Othman, I used the Java dependency inspector to see what the issue is, and it turns out that poi-ooxml.jar does refer back to poi.jar in the class that is failing. So you will need to move poi-3.15.jar and commons-collections4-4.1.jar to the other place as well.
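For reference, the jar moves discussed in this thread could be scripted roughly as below. This is only a sketch under assumptions: the directory names connector-common-lib and lib and the jar names are taken from the messages in this thread, while `move_jars` and its `mcf_home` argument are hypothetical names, not part of ManifoldCF. It does not edit options.env, which may still need the jars added to the classpath, and all MCF processes should be shut down before running it.

```python
import shutil
from pathlib import Path

# Jar names as mentioned in this thread (assumed to match the files on disk).
JARS = [
    "xmlbeans-2.6.0.jar",
    "poi-ooxml-schemas-3.15.jar",
    "poi-ooxml-3.15.jar",
    "dom4j-1.6.1.jar",
    "curvesapi-1.04.jar",
    "poi-3.15.jar",
    "commons-collections4-4.1.jar",
]


def move_jars(mcf_home, jar_names=JARS, src="connector-common-lib", dst="lib"):
    """Move each named jar from src to dst under mcf_home.

    mcf_home is a hypothetical parameter: the install directory that
    contains both folders. Returns (moved, missing) lists of jar names.
    """
    src_dir = Path(mcf_home) / src
    dst_dir = Path(mcf_home) / dst
    dst_dir.mkdir(parents=True, exist_ok=True)
    moved, missing = [], []
    for name in jar_names:
        source = src_dir / name
        if source.is_file():
            shutil.move(str(source), str(dst_dir / name))
            moved.append(name)
        else:
            # Not found in src: already moved, or named differently on disk.
            missing.append(name)
    return moved, missing
```

The function reports the jars it could not find rather than failing, since some of them may already have been moved by hand.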
Let's hope that finally fixes this issue. I'm very unhappy about the quality of the POI project code; it is definitely not using reasonable engineering practices, and I will be opening a ticket with them.

Thanks,
Karl

On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <[email protected]> wrote:

> I'm using the file based example and all the changes you told me to do. I
> reproduced them in the file based example. I'll try to install zookeeper
> and use the zookeeper example. Will I need a configuration to do in order
> to run the zookeeper example ?
>
> Othman.
>
> On Thu, 31 Aug 2017 at 17:46, Karl Wright <[email protected]> wrote:
>
>> Are you using the zookeeper example, or the file-based example?
>>
>> If these jars have all been moved, and the options.env includes them,
>> then I have to conclude that Apache POI's pom.xml is incorrect too. It
>> will take a while to figure out what's missing that poi-ooxml.jar needs
>> that is not listed.
>>
>> Karl
>>
>>
>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <[email protected]>
>> wrote:
>>
>>> All the dependencies you mentioned have already been added in the
>>> options.env.win file in the multiprocess-file-example repository.
>>>
>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <[email protected]> wrote:
>>>
>>>> Yes, I added it in the options.env.win file. Should it be the one in
>>>> the multiprocess-zk-example document or multiprocess-file-example ?
>>>>
>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <[email protected]> wrote:
>>>>
>>>>> It's not related at all to elasticsearch.
>>>>> Karl
>>>>>
>>>>>
>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Could it be a problem of elasticsearch's version ? I'm actually using
>>>>>> 2.1.0 which is pretty old for this new version of ManifoldCF?
>>>>>>
>>>>>> Othman.
>>>>>>
>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I moved back both the jars you mentioned and a different is showing.
>>>>>>> You will find the stack trace attached.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Othman
>>>>>>>
>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I've looked at the dependencies; you should not have moved
>>>>>>>> poi-3.15.jar. Please move that back, and commons-collections4-4.1.jar
>>>>>>>> too.
>>>>>>>>
>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar must also
>>>>>>>>> be included. This would mean that curvesapi-1.04.jar and
>>>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Karl,
>>>>>>>>>>
>>>>>>>>>> I added the two jars that you have mentioned and another one :
>>>>>>>>>> poi-3.15.jar . Unfortunately, there is another error showing. This
>>>>>>>>>> time, it
>>>>>>>>>> concerns excel files. You will find attached the stack trace.
>>>>>>>>>>
>>>>>>>>>> Othman.
>>>>>>>>>>
>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>
>>>>>>>>>>> Yes, this shows that the jar we moved calls back into another
>>>>>>>>>>> jar, which will also need to be moved. *That* jar has yet another
>>>>>>>>>>> dependency too.
>>>>>>>>>>>
>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>
>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> You will find attached the stack trace. My apologies for the
>>>>>>>>>>>> bad quality of the image, I'm doing my best to send you the stack
>>>>>>>>>>>> trace as
>>>>>>>>>>>> I don't have the right to send documents outside the company.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>
>>>>>>>>>>>> Othman
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the problem
>>>>>>>>>>>>> is.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into the
>>>>>>>>>>>>>> log file and saw the following error:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have tried what you told me to do, and you expected the
>>>>>>>>>>>>>>> crawling resumed. How about the regular expressions? How can I
>>>>>>>>>>>>>>> make complex
>>>>>>>>>>>>>>> regular expressions in the job's paths tab ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it works.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env files
>>>>>>>>>>>>>>>>> to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround you
>>>>>>>>>>>>>>>>>> could try. Specifically:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika server
>>>>>>>>>>>>>>>>>>> transformer rather than the embedded Tika Extractor. I'm
>>>>>>>>>>>>>>>>>>> still looking
>>>>>>>>>>>>>>>>>>> into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary version, and
>>>>>>>>>>>>>>>>>>>> my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it in the
>>>>>>>>>>>>>>>>>>>> attached file. For your information, the job started
>>>>>>>>>>>>>>>>>>>> yesterday.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is missing.
>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, if you
>>>>>>>>>>>>>>>>>>>>> are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For security
>>>>>>>>>>>>>>>>>>>>>> reasons, I can't send any files from my computer. I have
>>>>>>>>>>>>>>>>>>>>>> copied the stack
>>>>>>>>>>>>>>>>>>>>>> trace and scanned it with my cellphone. I hope it will
>>>>>>>>>>>>>>>>>>>>>> be helpful.
>>>>>>>>>>>>>>>>>>>>>> Meanwhile, I have read the documentation about how to
>>>>>>>>>>>>>>>>>>>>>> restrict the crawling
>>>>>>>>>>>>>>>>>>>>>> and I don't think the '|' works in the specified. For
>>>>>>>>>>>>>>>>>>>>>> instance, I would
>>>>>>>>>>>>>>>>>>>>>> like to restrict the crawling for the documents that
>>>>>>>>>>>>>>>>>>>>>> counts the 'sound'
>>>>>>>>>>>>>>>>>>>>>> word . I proceed as follows: *(SON)* .
the document is
>>>>>>>>>>>>>>>>>>>>>> with capital letters
>>>>>>>>>>>>>>>>>>>>>> and I noticed that it didn't take it into consideration.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the windows
>>>>>>>>>>>>>>>>>>>>>>> share connector is by specifying information on the
>>>>>>>>>>>>>>>>>>>>>>> "Paths" tab in jobs
>>>>>>>>>>>>>>>>>>>>>>> that crawl windows shares. There is end-user
>>>>>>>>>>>>>>>>>>>>>>> documentation both online and
>>>>>>>>>>>>>>>>>>>>>>> distributed with all binary distributions that describe
>>>>>>>>>>>>>>>>>>>>>>> how to do this.
>>>>>>>>>>>>>>>>>>>>>>> Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response, I will start using
>>>>>>>>>>>>>>>>>>>>>>>> zookeeper and I will let you know if it works. I have
>>>>>>>>>>>>>>>>>>>>>>>> another question to
>>>>>>>>>>>>>>>>>>>>>>>> ask. Actually, I need to make some filters while
>>>>>>>>>>>>>>>>>>>>>>>> crawling. I don't want to
>>>>>>>>>>>>>>>>>>>>>>>> crawl some files and some folders. Could you give me
>>>>>>>>>>>>>>>>>>>>>>>> an example of how to
>>>>>>>>>>>>>>>>>>>>>>>> use the regex. Does the regex allow to use /i to
>>>>>>>>>>>>>>>>>>>>>>>> ignore cases ?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people often
>>>>>>>>>>>>>>>>>>>>>>>>> have problems with getting file permissions right,
>>>>>>>>>>>>>>>>>>>>>>>>> and they do not
>>>>>>>>>>>>>>>>>>>>>>>>> understand how to shut processes down cleanly, and
>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper is resilient
>>>>>>>>>>>>>>>>>>>>>>>>> against that. I highly recommend using zookeeper
>>>>>>>>>>>>>>>>>>>>>>>>> sync.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files into
>>>>>>>>>>>>>>>>>>>>>>>>> memory so you do not need huge amounts of memory.
>>>>>>>>>>>>>>>>>>>>>>>>> The default values are
>>>>>>>>>>>>>>>>>>>>>>>>> more than enough for 35,000 files, which is a pretty
>>>>>>>>>>>>>>>>>>>>>>>>> small job for
>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. i want to know
>>>>>>>>>>>>>>>>>>>>>>>>>> how is zookeeper different from file based sync? I
>>>>>>>>>>>>>>>>>>>>>>>>>> also need a guidance on
>>>>>>>>>>>>>>>>>>>>>>>>>> how to manage my pc's memory. How many Go should I
>>>>>>>>>>>>>>>>>>>>>>>>>> allocate for the
>>>>>>>>>>>>>>>>>>>>>>>>>> start-agent of ManifoldCF? Is 4Go enough in order to
>>>>>>>>>>>>>>>>>>>>>>>>>> crawler 35K files ?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason, and
>>>>>>>>>>>>>>>>>>>>>>>>>>> that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of file-based
>>>>>>>>>>>>>>>>>>>>>>>>>>> sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures after
>>>>>>>>>>>>>>>>>>>>>>>>>>> that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick response. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>> have looked into the ManifoldCF log file and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> extracted the following
>>>>>>>>>>>>>>>>>>>>>>>>>>>> warnings :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Lowercase) Synapses.lock' failed : Access is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be
>>>>>>>>>>>>>>>>>>>>>>>>>>>> full. Shutting down process; locks may be left
>>>>>>>>>>>>>>>>>>>>>>>>>>>> dangling. You must cleanup
>>>>>>>>>>>>>>>>>>>>>>>>>>>> before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the elasticsearch
>>>>>>>>>>>>>>>>>>>>>>>>>>>> output connection. Moreover, the job uses Tika to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> extract metadata and a
>>>>>>>>>>>>>>>>>>>>>>>>>>>> file system as a repository connection. During the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> job, I don't extract the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> content of the documents. I was wandering if the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue comes from
>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that looks like it might go away on retry, but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does not. It can be either
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on the repository side or on the output side. If
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you look at the Simple
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> History in the UI, or at the manifoldcf.log file,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you should be able to get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a better sense of what went wrong. Without
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> further information, I can't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> société générale in France. I'm actually using
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your recent version of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifoldCF 2.8 . I'm working on an internal
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> search engine. For this reason,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm using manifoldcf in order to index documents
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on windows shares. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> encountered a serious problem while crawling 35K
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents. Most of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> time, when manifoldcf start crawling a big sized
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents (19Mo for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example), it ends the job with the following
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error: repeated service
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interruptions - failure processing document :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> software caused connection
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to solve
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this problem, please ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and elasticsearch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.1.0 .
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward for your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
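As a footnote to the NoClassDefFoundError for org/apache/poi/POIXMLTypeLoader reported in this thread: one quick way to see which jar in a directory actually ships a given class is to scan the jars as zip archives. The sketch below is an assumption-laden illustration, not the dependency inspector mentioned above; `jars_containing` is a hypothetical helper name, and it only reports where a class lives, not what that class depends on.

```python
import zipfile
from pathlib import Path


def jars_containing(directory, class_name):
    """Return the names of the *.jar files in directory that contain class_name.

    class_name may be written with dots or slashes, for example
    "org.apache.poi.POIXMLTypeLoader" or "org/apache/poi/POIXMLTypeLoader".
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(directory).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not readable zip archives.
            continue
    return hits
```

Running it once against connector-common-lib and once against lib would show on which side of the move a class currently sits.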
