Hi Karl,

By 'other place', do you mean the \lib directory? If so, I have already tried it and it didn't work.
Othman.

On Thu, 31 Aug 2017 at 18:07, Karl Wright <[email protected]> wrote:

> Hi Othman,
>
> I used the java dependency inspector to see what the issue is and it turns out that poi-ooxml.jar does refer back to poi.jar in the class that is failing. So you will need to move poi-3.15.jar and commons-collections4-4.1.jar to the other place as well.
>
> Let's hope that finally fixes this issue.
>
> I'm very unhappy about the quality of the POI project code; it is definitely not using reasonable engineering practices, and I will be opening a ticket with them.
>
> Thanks,
> Karl
>
> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <[email protected]> wrote:
>
>> I'm using the file-based example, and I reproduced all the changes you told me to make there. I'll try to install zookeeper and use the zookeeper example. Will I need to do any configuration in order to run the zookeeper example?
>>
>> Othman.
>>
>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <[email protected]> wrote:
>>
>>> Are you using the zookeeper example, or the file-based example?
>>>
>>> If these jars have all been moved, and the options.env includes them, then I have to conclude that Apache POI's pom.xml is incorrect too. It will take a while to figure out what's missing that poi-ooxml.jar needs that is not listed.
>>>
>>> Karl
>>>
>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>
>>>> All the dependencies you mentioned have already been added in the options.env.win file in the multiprocess-file-example directory.
>>>>
>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <[email protected]> wrote:
>>>>
>>>>> Yes, I added it in the options.env.win file. Should it be the one in the multiprocess-zk-example directory or the one in multiprocess-file-example?
>>>>>
>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> It's not related at all to elasticsearch.
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>
>>>>>>> Could it be a problem with elasticsearch's version? I'm actually using 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>>>>
>>>>>>> Othman.
>>>>>>>
>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>
>>>>>>>> I moved back both the jars you mentioned and a different error is showing. You will find the stack trace attached.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Othman
>>>>>>>>
>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I've looked at the dependencies; you should not have moved poi-3.15.jar. Please move that back, and commons-collections4-4.1.jar too.
>>>>>>>>>
>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar must also be included. This would mean that curvesapi-1.04.jar and commons-collections4-4.1.jar should also be included.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>
>>>>>>>>>>> I added the two jars that you mentioned and another one: poi-3.15.jar. Unfortunately, there is another error showing. This time, it concerns Excel files. You will find the stack trace attached.
>>>>>>>>>>>
>>>>>>>>>>> Othman.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into another jar, which will also need to be moved. *That* jar has yet another dependency too.
>>>>>>>>>>>>
>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>
>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> You will find attached the stack trace. My apologies for the bad quality of the image, I'm doing my best to send you the stack trace as I don't have the right to send documents outside the company.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the problem is.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into the log file and saw the following error:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have tried what you told me to do and, as you expected, the crawling resumed. How about the regular expressions? How can I write complex regular expressions in the job's Paths tab?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it works.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround you could try. Specifically:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>> (2) Move the following two files from connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika server transformer rather than the embedded Tika Extractor. I'm still looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary version, and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it in the attached file. For your information, the job started yesterday.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is missing.
>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, if you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version.
>>>>>>>>>>>>>>>>>>>>>>> For security reasons, I can't send any files from my computer. I have copied the stack trace and scanned it with my cellphone. I hope it will be helpful.
>>>>>>>>>>>>>>>>>>>>>>> Meanwhile, I have read the documentation about how to restrict the crawling and I don't think the '|' works in the specified. For instance, I would like to restrict the crawling to the documents that contain the word 'sound'. I proceed as follows: *(SON)* . The document is in capital letters and I noticed that it wasn't taken into consideration.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the windows share connector is by specifying information on the "Paths" tab in jobs that crawl windows shares. There is end-user documentation both online and distributed with all binary distributions that describes how to do this.
>>>>>>>>>>>>>>>>>>>>>>>> Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response. I will start using zookeeper and I will let you know if it works. I have another question to ask. Actually, I need to apply some filters while crawling. I don't want to crawl some files and some folders. Could you give me an example of how to use the regex? Does the regex allow using /i to ignore case?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people often have problems with getting file permissions right, and they do not understand how to shut processes down cleanly, and zookeeper is resilient against that. I highly recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files into memory so you do not need huge amounts of memory.
>>>>>>>>>>>>>>>>>>>>>>>>>> The default values are more than enough for 35,000 files, which is a pretty small job for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want to know how zookeeper is different from file-based sync. I also need guidance on how to manage my PC's memory. How many GB should I allocate for the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason, and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your quick response. I have looked into the ManifoldCF log file and extracted the following warnings:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase) Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be full. Shutting down process; locks may be left dangling. You must cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "ES (lowercase) synapses" is the elasticsearch output connection. Moreover, the job uses Tika to extract metadata and a file system as a repository connection. During the job, I don't extract the content of the documents. I was wondering if the issue comes from elasticsearch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an error that looks like it might go away on retry, but does not. It can be either on the repository side or on the output side. If you look at the Simple History in the UI, or at the manifoldcf.log file, you should be able to get a better sense of what went wrong. Without further information, I can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer from Société Générale in France. I'm actually using your recent version of ManifoldCF, 2.8. I'm working on an internal search engine. For this reason, I'm using ManifoldCF in order to index documents on windows shares. I encountered a serious problem while crawling 35K documents.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Most of the time, when ManifoldCF starts crawling a big document (19 MB for example), it ends the job with the following error: repeated service interruptions - failure processing document : software caused connection abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to solve this problem, please?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and elasticsearch 2.1.0.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
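
For anyone following this thread with the same java.lang.NoClassDefFoundError: a minimal sketch of how one might check whether a class can be loaded and which jar it comes from. It is not part of ManifoldCF; the class name ClassLocator and the method locate are made up for illustration, and it assumes you run it with a classpath that contains the jars you are testing (for example those in lib and connector-common-lib). It only reports what a plain, flat classpath provides, so it may not reproduce how the agents process actually splits jars between the two directories.

```java
import java.security.CodeSource;

// Hypothetical helper, not part of ManifoldCF or Apache POI.
public class ClassLocator {

    /** Describes where the named class is loaded from, or why it cannot be loaded. */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself have no code source
            return src == null ? "bootstrap/JDK" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError
            return "NOT LOADABLE: " + e;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the log excerpt quoted above
        String[] names = args.length > 0
            ? args
            : new String[] {"org.apache.poi.POIXMLTypeLoader"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Running it once with only the jars from lib on the classpath and once with both directories would show which directory supplies a given class, which may help decide which jar still needs to be moved.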
