Hi Othman,

You do not need a new database instance.
You can download MCF 2.8.1 RC0 from here:
https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1

Karl

On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <[email protected]> wrote:

> Hi Karl,
>
> Thank you very much for your help. I'm going to try out the zookeeper
> example. Should I initialize a new database? And how can I run the
> zookeeper start-agent?
>
> Othman.
>
> On Fri, 1 Sep 2017 at 11:37, Karl Wright <[email protected]> wrote:
>
>> Hi Othman,
>>
>> These exceptions are now coming from file locking and are due to
>> permissions problems. I suggest you go to Zookeeper for file locking.
>>
>> I am building a 2.8.1 release candidate. When it is available for download,
>> I'll send you the URL.
>>
>> Thanks,
>> Karl
>>
>>
>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <[email protected]>
>> wrote:
>>
>>> Hi Karl,
>>>
>>> This morning, I have followed the steps you told me to do and I still
>>> got stack traces. I have attached the stack traces as well as the content
>>> of my lib repo and option.env.
>>> I have installed zookeeper and I'm ready to use the zookeeper example.
>>> Could you guide me through it? I don't know if I follow the same steps in the
>>> file based example, I may not get stack traces.
>>>
>>> Thanks,
>>> Othman
>>>
>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <[email protected]> wrote:
>>>
>>>> Please do the following:
>>>>
>>>> (0) Shut down all ManifoldCF processes.
>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>> (3) Move commons-collections4*.jar from connector-common-lib to lib.
>>>> (4) Move xmlbeans*.jar from connector-common-lib to lib.
>>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>>> (6) Modify your options.env to include all of the jars you moved.
>>>> (7) Start up all ManifoldCF processes.
>>>> (8) If you still get stack traces, please send them to me.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> By 'other place', do you mean the \lib repository? If so, then I
>>>>> have already tried it and it didn't work.
>>>>>
>>>>> Othman.
>>>>>
>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> Hi Othman,
>>>>>>
>>>>>> I used the java dependency inspector to see what the issue is and it
>>>>>> turns out that poi-ooxml.jar does refer back to poi.jar in the class that
>>>>>> is failing. So you will need to move poi-3.15.jar and
>>>>>> commons-collections4-4.1.jar to the other place as well.
>>>>>>
>>>>>> Let's hope that finally fixes this issue.
>>>>>>
>>>>>> I'm very unhappy about the quality of the POI project code; it is
>>>>>> definitely not using reasonable engineering practices, and I will be
>>>>>> opening a ticket with them.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm using the file based example, and I reproduced all the changes
>>>>>>> you told me to make in the file based example. I'll try to install
>>>>>>> zookeeper and use the zookeeper example. Will I need to do any
>>>>>>> configuration in order to run the zookeeper example?
>>>>>>>
>>>>>>> Othman.
>>>>>>>
>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Are you using the zookeeper example, or the file-based example?
>>>>>>>>
>>>>>>>> If these jars have all been moved, and the options.env includes
>>>>>>>> them, then I have to conclude that Apache POI's pom.xml is incorrect
>>>>>>>> too. It will take a while to figure out what's missing that
>>>>>>>> poi-ooxml.jar needs that is not listed.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> All the dependencies you mentioned have already been added in the
>>>>>>>>> options.env.win file in the multiprocess-file-example repository.
>>>>>>>>>
>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, I added it in the options.env.win file. Should it be the one
>>>>>>>>>> in the multiprocess-zk-example directory or multiprocess-file-example?
>>>>>>>>>>
>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Could it be a problem with elasticsearch's version? I'm actually
>>>>>>>>>>>> using 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>>>>>>>>>
>>>>>>>>>>>> Othman.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I moved back both the jars you mentioned and a different error is
>>>>>>>>>>>>> showing. You will find the stack trace attached.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've looked at the dependencies; you should not have moved
>>>>>>>>>>>>>> poi-3.15.jar. Please move that back, and
>>>>>>>>>>>>>> commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Karl >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar >>>>>>>>>>>>>>> must also be included. This would mean that curvesapi-1.04.jar >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> commons-collections4-4.1.jar should also be included. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Karl, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I added the two jars that you have mentioned and another >>>>>>>>>>>>>>>> one : poi-3.15.jar . Unfortunately, there is another error >>>>>>>>>>>>>>>> showing. This >>>>>>>>>>>>>>>> time, it concerns excel files. You will find attached the >>>>>>>>>>>>>>>> stack trace. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into >>>>>>>>>>>>>>>>> another jar, which will also need to be moved. *That* jar >>>>>>>>>>>>>>>>> has yet another >>>>>>>>>>>>>>>>> dependency too. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The list of jars is thus extended to include: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar >>>>>>>>>>>>>>>>> dom4j-1.6.1.jar >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> You will find attached the stack trace. 
My apologies for >>>>>>>>>>>>>>>>>> the bad quality of the image, I'm doing my best to send you >>>>>>>>>>>>>>>>>> the stack trace >>>>>>>>>>>>>>>>>> as I don't have the right to send documents outside the >>>>>>>>>>>>>>>>>> company. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank you for your time, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the >>>>>>>>>>>>>>>>>>> problem is. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into >>>>>>>>>>>>>>>>>>>> the log file and saw the following error: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader >>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/ >>>>>>>>>>>>>>>>>>>> POIXMLTypeLoader. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do, and you expected >>>>>>>>>>>>>>>>>>>>> the crawling resumed. How about the regular expressions? >>>>>>>>>>>>>>>>>>>>> How can I make >>>>>>>>>>>>>>>>>>>>> complex regular expressions in the job's paths tab ? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thank you very much for your help. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Othman. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it >>>>>>>>>>>>>>>>>>>>>> works. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright < >>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env >>>>>>>>>>>>>>>>>>>>>>> files to include them in the classpath for startup. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround >>>>>>>>>>>>>>>>>>>>>>>> you could try. Specifically: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes. >>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from >>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar >>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl >>>>>>>>>>>>>>>>>>>>>>>> resumes. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450. 
>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika >>>>>>>>>>>>>>>>>>>>>>>>> server transformer rather than the embedded Tika >>>>>>>>>>>>>>>>>>>>>>>>> Extractor. I'm still >>>>>>>>>>>>>>>>>>>>>>>>> looking into why the jar is not being found. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary >>>>>>>>>>>>>>>>>>>>>>>>>> version, and my job got stuck on that specific file. >>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it >>>>>>>>>>>>>>>>>>>>>>>>>> in the attached file. For your information, the job >>>>>>>>>>>>>>>>>>>>>>>>>> started yesterday. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is >>>>>>>>>>>>>>>>>>>>>>>>>>> missing. >>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, >>>>>>>>>>>>>>>>>>>>>>>>>>> if you are indeed using the binary distribution. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For >>>>>>>>>>>>>>>>>>>>>>>>>>>> security reasons, I can't send any files from my >>>>>>>>>>>>>>>>>>>>>>>>>>>> computer. 
I have copied >>>>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace and scanned it with my cellphone. >>>>>>>>>>>>>>>>>>>>>>>>>>>> I hope it will be >>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful. Meanwhile, I have read the documentation >>>>>>>>>>>>>>>>>>>>>>>>>>>> about how to restrict the >>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling and I don't think the '|' works in the >>>>>>>>>>>>>>>>>>>>>>>>>>>> specified. For instance, I >>>>>>>>>>>>>>>>>>>>>>>>>>>> would like to restrict the crawling for the >>>>>>>>>>>>>>>>>>>>>>>>>>>> documents that counts the >>>>>>>>>>>>>>>>>>>>>>>>>>>> 'sound' word . I proceed as follows: *(SON)* . the >>>>>>>>>>>>>>>>>>>>>>>>>>>> document is with capital >>>>>>>>>>>>>>>>>>>>>>>>>>>> letters and I noticed that it didn't take it into >>>>>>>>>>>>>>>>>>>>>>>>>>>> consideration. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the >>>>>>>>>>>>>>>>>>>>>>>>>>>>> windows share connector is by specifying >>>>>>>>>>>>>>>>>>>>>>>>>>>>> information on the "Paths" tab in >>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs that crawl windows shares. There is >>>>>>>>>>>>>>>>>>>>>>>>>>>>> end-user documentation both >>>>>>>>>>>>>>>>>>>>>>>>>>>>> online and distributed with all binary >>>>>>>>>>>>>>>>>>>>>>>>>>>>> distributions that describe how to >>>>>>>>>>>>>>>>>>>>>>>>>>>>> do this. Have you found it? 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki >>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response, I will start >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using zookeeper and I will let you know if it >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> works. I have another >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> question to ask. Actually, I need to make some >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filters while crawling. I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't want to crawl some files and some folders. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Could you give me an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example of how to use the regex. Does the regex >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> allow to use /i to ignore >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cases ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> often have problems with getting file >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> permissions right, and they do not >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> understand how to shut processes down cleanly, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and zookeeper is resilient >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> against that. I highly recommend using >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper sync. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into memory so you do not need huge amounts of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> memory. The default values >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are more than enough for 35,000 files, which is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a pretty small job for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. i want to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know how is zookeeper different from file >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> based sync? I also need a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> guidance on how to manage my pc's memory. How >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> many Go should I allocate for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the start-agent of ManifoldCF? Is 4Go enough >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in order to crawler 35K files ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and that's interfering with ManifoldCF 2.8 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> locking. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based sync. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> after that. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response. I have looked into the ManifoldCF >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> log file and extracted the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following warnings : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 8\multiprocess-file-example\.\.\synch >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Lowercase) Synapses.lock' failed : Access >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is denied. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be full. Shutting down process; locks may be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> left dangling. You must >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cleanup before restarting. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch output connection. Moreover, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the job uses Tika to extract >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata and a file system as a repository >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection. 
During the job, I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't extract the content of the documents. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I was wandering if the issue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comes from elasticsearch ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error that looks like it might go away on >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> retry, but does not. It can be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> either on the repository side or on the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> output side. If you look at the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Simple History in the UI, or at the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifoldcf.log file, you should be able >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to get a better sense of what went wrong. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Without further information, I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can't say any more. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from société générale in France. 
I'm >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> actually using your recent version of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifoldCF 2.8 . I'm working on an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal search engine. For this reason, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm using manifoldcf in order to index >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents on windows shares. I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> encountered a serious problem while >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling 35K documents. Most of the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> time, when manifoldcf start crawling a big >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sized documents (19Mo for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example), it ends the job with the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following error: repeated service >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interruptions - failure processing >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> document : software caused connection >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> solve this problem, please ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch 2.1.0 . >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward for your response. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>> >>>>>> >>>> >>
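
For reference, the jar moves in steps (1) through (5) of the instructions quoted near the top of this thread could be scripted roughly as follows. This is only a sketch under assumptions: the function name `move_jars` and its `mcf_home` argument (the root of the ManifoldCF installation) are made up for illustration, the glob patterns are the ones given in the thread, and all ManifoldCF processes are assumed to be shut down already. It does not touch options.env, which still has to be edited by hand (step 6).

```python
import shutil
from pathlib import Path

# Glob patterns taken from steps (1)-(5) in the thread.
JAR_PATTERNS = [
    "poi*.jar",
    "dom4j*.jar",
    "commons-collections4*.jar",
    "xmlbeans*.jar",
    "curvesapi*.jar",
]


def move_jars(mcf_home, patterns=JAR_PATTERNS):
    """Move matching jars from connector-common-lib to lib.

    mcf_home is a hypothetical argument: the root directory of the
    ManifoldCF installation. Both subdirectories are assumed to exist.
    Returns the sorted file names of the jars that were moved.
    """
    src = Path(mcf_home) / "connector-common-lib"
    dst = Path(mcf_home) / "lib"
    moved = []
    for pattern in patterns:
        # sorted() materializes the matches before any file is moved.
        for jar in sorted(src.glob(pattern)):
            shutil.move(str(jar), str(dst / jar.name))
            moved.append(jar.name)
    return sorted(moved)
```

Calling `move_jars("/path/to/manifoldcf")` (placeholder path) returns the names of the jars that were moved, which is the list that then needs to be added to the classpath in options.env before restarting.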
