Hi Karl,

Thank you very much for your help. I'm going to try out the zookeeper example. Should I initialize a new database? And how can I run the zookeeper start-agent?

Othman.

On Fri, 1 Sep 2017 at 11:37, Karl Wright <[email protected]> wrote:

> Hi Othman,
>
> These exceptions are now coming from file locking and are due to
> permissions problems. I suggest you go to Zookeeper for file locking.
>
> I am building a 2.8.1 release candidate. When it is available for download,
> I'll send you the URL.
>
> Thanks,
> Karl
>
>
> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <[email protected]> wrote:
>
>> Hi Karl,
>>
>> This morning, I have followed the steps you told me to do and I still got
>> stack traces. I have attached the stack traces as well as the content of my
>> lib repo and options.env.
>> I have installed zookeeper and I'm ready to use the zookeeper example.
>> Could you guide me through it? I don't know if I follow the same steps in the
>> file based example, I may not get stack traces.
>>
>> Thanks,
>> Othman
>>
>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <[email protected]> wrote:
>>
>>> Please do the following:
>>>
>>> (0) Shut down all ManifoldCF processes.
>>> (1) Move poi*.jar from connector-common-lib to lib.
>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>> (3) Move commons-collections4*.jar from connector-common-lib to lib.
>>> (4) Move xmlbeans*.jar from connector-common-lib to lib.
>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>> (6) Modify your options.env to include all of the jars you moved.
>>> (7) Start up all ManifoldCF processes.
>>> (8) If you still get stack traces, please send them to me.
>>>
>>> Karl
>>>
>>>
>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <[email protected]>
>>> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> By 'other place', do you mean the \lib repository? If that is so, then I
>>>> have already tried it and it didn't work.
>>>>
>>>> Othman.
>>>>
>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Othman,
>>>>>
>>>>> I used the java dependency inspector to see what the issue is and it
>>>>> turns out that poi-ooxml.jar does refer back to poi.jar in the class that
>>>>> is failing. So you will need to move poi-3.15.jar and
>>>>> commons-collections4-4.1.jar to the other place as well.
>>>>>
>>>>> Let's hope that finally fixes this issue.
>>>>>
>>>>> I'm very unhappy about the quality of the POI project code; it is
>>>>> definitely not using reasonable engineering practices, and I will be
>>>>> opening a ticket with them.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I'm using the file based example and all the changes you told me to
>>>>>> do. I reproduced them in the file based example. I'll try to install
>>>>>> zookeeper and use the zookeeper example. Will I need a configuration to
>>>>>> do in order to run the zookeeper example ?
>>>>>>
>>>>>> Othman.
>>>>>>
>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <[email protected]> wrote:
>>>>>>
>>>>>>> Are you using the zookeeper example, or the file-based example?
>>>>>>>
>>>>>>> If these jars have all been moved, and the options.env includes
>>>>>>> them, then I have to conclude that Apache POI's pom.xml is incorrect
>>>>>>> too. It will take a while to figure out what's missing that
>>>>>>> poi-ooxml.jar needs that is not listed.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> All the dependencies you mentioned have already been added in the
>>>>>>>> options.env.win file in the multiprocess-file-example repository.
>>>>>>>>
>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Yes, I added it in the options.env.win file.
>>>>>>>>> Should it be the one in the multiprocess-zk-example document or
>>>>>>>>> multiprocess-file-example ?
>>>>>>>>>
>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Could it be a problem of elasticsearch's version ? I'm actually
>>>>>>>>>>> using 2.1.0 which is pretty old for this new version of ManifoldCF?
>>>>>>>>>>>
>>>>>>>>>>> Othman.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I moved back both the jars you mentioned and a different error is
>>>>>>>>>>>> showing. You will find the stack trace attached.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Othman
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I've looked at the dependencies; you should not have moved
>>>>>>>>>>>>> poi-3.15.jar. Please move that back, and
>>>>>>>>>>>>> commons-collections4-4.1.jar too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar must
>>>>>>>>>>>>>> also be included. This would mean that curvesapi-1.04.jar and
>>>>>>>>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I added the two jars that you have mentioned and another one:
>>>>>>>>>>>>>>> poi-3.15.jar. Unfortunately, there is another error showing.
>>>>>>>>>>>>>>> This time, it concerns excel files. You will find attached the
>>>>>>>>>>>>>>> stack trace.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into
>>>>>>>>>>>>>>>> another jar, which will also need to be moved. *That* jar has
>>>>>>>>>>>>>>>> yet another dependency too.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You will find attached the stack trace. My apologies for
>>>>>>>>>>>>>>>>> the bad quality of the image, I'm doing my best to send you
>>>>>>>>>>>>>>>>> the stack trace as I don't have the right to send documents
>>>>>>>>>>>>>>>>> outside the company.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the
>>>>>>>>>>>>>>>>>> problem is.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into
>>>>>>>>>>>>>>>>>>> the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do and, as you expected,
>>>>>>>>>>>>>>>>>>>> the crawling resumed. How about the regular expressions?
>>>>>>>>>>>>>>>>>>>> How can I make complex regular expressions in the job's
>>>>>>>>>>>>>>>>>>>> paths tab ?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it
>>>>>>>>>>>>>>>>>>>>> works.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env
>>>>>>>>>>>>>>>>>>>>>> files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround you
>>>>>>>>>>>>>>>>>>>>>>> could try. Specifically:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika
>>>>>>>>>>>>>>>>>>>>>>>> server transformer rather than the embedded Tika
>>>>>>>>>>>>>>>>>>>>>>>> Extractor. I'm still looking into why the jar is
>>>>>>>>>>>>>>>>>>>>>>>> not being found.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary version,
>>>>>>>>>>>>>>>>>>>>>>>>> and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running.
>>>>>>>>>>>>>>>>>>>>>>>>> You can see it in the attached file. For your
>>>>>>>>>>>>>>>>>>>>>>>>> information, the job started yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is
>>>>>>>>>>>>>>>>>>>>>>>>>> missing.
>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, if
>>>>>>>>>>>>>>>>>>>>>>>>>> you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For
>>>>>>>>>>>>>>>>>>>>>>>>>>> security reasons, I can't send any files from my
>>>>>>>>>>>>>>>>>>>>>>>>>>> computer. I have copied the stack trace and
>>>>>>>>>>>>>>>>>>>>>>>>>>> scanned it with my cellphone. I hope it will be
>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful. Meanwhile, I have read the documentation
>>>>>>>>>>>>>>>>>>>>>>>>>>> about how to restrict the crawling and I don't
>>>>>>>>>>>>>>>>>>>>>>>>>>> think the '|' works in the specified. For
>>>>>>>>>>>>>>>>>>>>>>>>>>> instance, I would like to restrict the crawling
>>>>>>>>>>>>>>>>>>>>>>>>>>> for the documents that counts the 'sound' word .
>>>>>>>>>>>>>>>>>>>>>>>>>>> I proceed as follows: *(SON)* . the document is
>>>>>>>>>>>>>>>>>>>>>>>>>>> with capital letters and I noticed that it didn't
>>>>>>>>>>>>>>>>>>>>>>>>>>> take it into consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the windows
>>>>>>>>>>>>>>>>>>>>>>>>>>>> share connector is by specifying information on
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the "Paths" tab in jobs that crawl windows
>>>>>>>>>>>>>>>>>>>>>>>>>>>> shares. There is end-user documentation both
>>>>>>>>>>>>>>>>>>>>>>>>>>>> online and distributed with all binary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> distributions that describe how to do this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response, I will start
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using zookeeper and I will let you know if it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> works. I have another question to ask.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Actually, I need to make some filters while
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling. I don't want to crawl some files and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some folders. Could you give me an example of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how to use the regex. Does the regex allow to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> use /i to ignore cases ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> often have problems with getting file
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> permissions right, and they do not understand
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how to shut processes down cleanly, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper is resilient against that. I highly
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into memory so you do not need huge amounts of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> memory. The default values are more than
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> enough for 35,000 files, which is a pretty
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> small job for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know how zookeeper is different from file
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> based sync. I also need guidance on how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manage my PC's memory. How many GB should I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> allocate for the start-agent of ManifoldCF?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Is 4 GB enough in order to crawl 35K files ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and that's interfering with ManifoldCF 2.8
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have looked into the ManifoldCF log file
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and extracted the following warnings :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Lowercase)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> full. Shutting down process; locks may be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> left dangling. You must cleanup
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch output connection. Moreover,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the job uses Tika to extract metadata and a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file system as a repository connection.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> During the job, I don't extract the content
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the documents. I was wondering if the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue comes from elasticsearch ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error that looks like it might go away on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> retry, but does not. It can be either on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the repository side or on the output side.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you look at the Simple History in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UI, or at the manifoldcf.log file, you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be able to get a better sense of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what went wrong. Without further
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information, I can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from société générale in France. I'm
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> actually using your recent version of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifoldCF 2.8 . I'm working on an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal search engine.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For this reason, I'm using manifoldcf in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> order to index documents on windows shares.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I encountered a serious problem while
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling 35K documents. Most of the time,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when manifoldcf starts crawling a large
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> document (19 MB for example), it ends the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job with the following error: repeated
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> service interruptions - failure processing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> document : software caused connection
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> solve this problem, please ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and elasticsearch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.1.0 .
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
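
For anyone repeating the jar moves from Karl's steps (1) to (5) earlier in this thread, they could be scripted roughly as below. This is only a sketch, not part of ManifoldCF: the function name move_jars and the JAR_PATTERNS list are made up for illustration, the glob patterns are copied from the thread (reading "xmlbeans*.java" as "xmlbeans*.jar"), and the directory paths are placeholders you would point at your own install. All ManifoldCF processes should be shut down first, and options.env still has to be edited by hand afterwards.

```python
import shutil
from pathlib import Path

# Glob patterns taken from the steps quoted in the thread.
# Assumption: "xmlbeans*.java" in the original mail was meant as "xmlbeans*.jar".
JAR_PATTERNS = [
    "poi*.jar",
    "dom4j*.jar",
    "commons-collections4*.jar",
    "xmlbeans*.jar",
    "curvesapi*.jar",
]


def move_jars(src_dir, dst_dir, patterns=JAR_PATTERNS):
    """Move every jar in src_dir matching one of the patterns into dst_dir.

    Returns the moved file names, in the order they were moved, so they
    can be added to the classpath in options.env by hand afterwards.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for pattern in patterns:
        # sorted() keeps the order stable within each pattern
        for jar in sorted(src.glob(pattern)):
            shutil.move(str(jar), str(dst / jar.name))
            moved.append(jar.name)
    return moved
```

A call would look like move_jars("connector-common-lib", "lib"), run from the ManifoldCF install directory. Printing the returned list gives the names that step (6) asks you to add to options.env.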
