Hi Karl, I took the binary from the ManifoldCF 2.8.1 RC0. It contained version 3.9 of POI, and when I changed it to 3.15 it worked fine. I really want to try the zookeeper example if, as you told me, it performs better than the file-based example. For the time being I'm using the file-based example because it is the only one that works for me, but I need a stable version for my production environment. That is one point. The other point is that the Paths tab is still an issue for me: I exclude some files and it still crawls them. I want to exclude files with specific extensions as well as specific directories. For instance, I don't want to index .exe files or anything containing a specific word. To do this, I create a first exclude rule with *.exe and a second one with *word*. Only the second one doesn't work. How can I solve this issue, please?
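To make the expected behaviour concrete, here is a minimal Python sketch of the two exclude rules described above. It is only an illustration of the intended result, not ManifoldCF code and not how the Paths tab is implemented: the names `EXCLUDE_PATTERNS` and `is_excluded` and the sample file names are hypothetical, and matching case-insensitively against the file name only is an assumption.

```python
from fnmatch import fnmatchcase

# Hypothetical exclude rules mirroring the two described above.
EXCLUDE_PATTERNS = ["*.exe", "*word*"]


def is_excluded(file_name, patterns=EXCLUDE_PATTERNS):
    """Return True if file_name matches any exclude pattern.

    Both sides are lowercased first, so matching is case-insensitive
    (an assumption about the desired behaviour, not about ManifoldCF).
    """
    lowered = file_name.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)


# Illustrative file names only.
for name in ["setup.exe", "My WORD notes.txt", "report.pdf"]:
    print(name, "->", "excluded" if is_excluded(name) else "indexed")
```

With rules like these, I would expect both the .exe file and the file with the word in its name to be skipped, and only the remaining file to be indexed.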
Thank you very much, and have a nice weekend,
Othman

On Fri, 1 Sep 2017 at 16:46, Karl Wright <[email protected]> wrote:

> Hi Othman,
>
> I will respin a new 2.8.1 (RC1) to address the zookeeper issue.
>
> The failure you are seeing is "NoSuchMethodError". Therefore, the class
> is being found, but it is the *wrong* class. When you deployed the new
> release, did you deploy it in a new directory, or did you overwrite the
> previous deployment? If you overwrote it, you probably have multiple
> versions of the POI jars.
>
> Karl
>
>
> On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <[email protected]> wrote:
>
>> Hi Karl,
>>
>> I have just tried the new release of ManifoldCF. The first job ended
>> normally, but in the second one I got a new stack trace concerning POI.
>> Moreover, runzookeeper.bat doesn't run properly; it produces the
>> attached stack trace.
>>
>> PS:
>> The second attached file contains the POI stack trace.
>>
>> Othman.
>>
>> On Fri, 1 Sep 2017 at 12:21, Karl Wright <[email protected]> wrote:
>>
>>> Hi Othman,
>>>
>>> You do not need a new database instance.
>>>
>>> You can download MCF 2.8.1 RC0 from here:
>>>
>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>
>>> Karl
>>>
>>>
>>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> Thank you very much for your help. I'm going to try out the zookeeper
>>>> example. Should I initialize a new database? And how can I run the
>>>> zookeeper start-agent?
>>>>
>>>> Othman.
>>>>
>>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Othman,
>>>>>
>>>>> These exceptions are now coming from file locking and are due to
>>>>> permissions problems. I suggest you go to Zookeeper for file locking.
>>>>>
>>>>> I am building a 2.8.1 release candidate. When it is available for
>>>>> download, I'll send you the URL.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> This morning I followed the steps you gave me and I still got
>>>>>> stack traces. I have attached the stack traces as well as the contents
>>>>>> of my lib directory and options.env.
>>>>>> I have installed zookeeper and I'm ready to use the zookeeper
>>>>>> example. Could you guide me through it? I don't know whether following
>>>>>> the same steps as in the file-based example would avoid the stack traces.
>>>>>>
>>>>>> Thanks,
>>>>>> Othman
>>>>>>
>>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <[email protected]> wrote:
>>>>>>
>>>>>>> Please do the following:
>>>>>>>
>>>>>>> (0) Shut down all ManifoldCF processes.
>>>>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>>>>> (3) Move commons-collections4*.jar from connector-common-lib to lib.
>>>>>>> (4) Move xmlbeans*.jar from connector-common-lib to lib.
>>>>>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>>>>>> (6) Modify your options.env to include all of the jars you moved.
>>>>>>> (7) Start up all ManifoldCF processes.
>>>>>>> (8) If you still get stack traces, please send them to me.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> By 'other place', do you mean the \lib directory? If so, then
>>>>>>>> I have already tried it and it didn't work.
>>>>>>>>
>>>>>>>> Othman.
>>>>>>>>
>>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Othman,
>>>>>>>>>
>>>>>>>>> I used the java dependency inspector to see what the issue is and
>>>>>>>>> it turns out that poi-ooxml.jar does refer back to poi.jar in the
>>>>>>>>> class that is failing.
>>>>>>>>> So you will need to move poi-3.15.jar and
>>>>>>>>> commons-collections4-4.1.jar to the other place as well.
>>>>>>>>>
>>>>>>>>> Let's hope that finally fixes this issue.
>>>>>>>>>
>>>>>>>>> I'm very unhappy about the quality of the POI project code; it is
>>>>>>>>> definitely not using reasonable engineering practices, and I will be
>>>>>>>>> opening a ticket with them.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I'm using the file-based example, and I made all the changes you
>>>>>>>>>> told me to make in that example. I'll try to install zookeeper
>>>>>>>>>> and use the zookeeper example. Will I need to do any configuration
>>>>>>>>>> in order to run the zookeeper example?
>>>>>>>>>>
>>>>>>>>>> Othman.
>>>>>>>>>>
>>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Are you using the zookeeper example, or the file-based example?
>>>>>>>>>>>
>>>>>>>>>>> If these jars have all been moved, and the options.env includes
>>>>>>>>>>> them, then I have to conclude that Apache POI's pom.xml is
>>>>>>>>>>> incorrect too.
>>>>>>>>>>> It will take a while to figure out what's missing that
>>>>>>>>>>> poi-ooxml.jar needs that is not listed.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> All the dependencies you mentioned have already been added to
>>>>>>>>>>>> the options.env.win file in the multiprocess-file-example
>>>>>>>>>>>> directory.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I added it in the options.env.win file.
Should it be the >>>>>>>>>>>>> one in the multiprocess-zk-example document or >>>>>>>>>>>>> multiprocess-file-example ? >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <[email protected]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> It's not related at all to elasticsearch. >>>>>>>>>>>>>> Karl >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Could it be a problem of elasticsearch's version ? I'm >>>>>>>>>>>>>>> actually using 2.1.0 which is pretty old for this new version >>>>>>>>>>>>>>> of ManifoldCF? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I moved back both the jars you mentioned and a different is >>>>>>>>>>>>>>>> showing. You will find the stack trace attached. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I've looked at the dependencies; you should not have moved >>>>>>>>>>>>>>>>> poi-3.15.jar. Please move that back, and >>>>>>>>>>>>>>>>> commons-collections4-4.1.jar too. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar >>>>>>>>>>>>>>>>>> must also be included. This would mean that >>>>>>>>>>>>>>>>>> curvesapi-1.04.jar and >>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar should also be included. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Karl, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I added the two jars that you have mentioned and another >>>>>>>>>>>>>>>>>>> one : poi-3.15.jar . Unfortunately, there is another error >>>>>>>>>>>>>>>>>>> showing. This >>>>>>>>>>>>>>>>>>> time, it concerns excel files. You will find attached the >>>>>>>>>>>>>>>>>>> stack trace. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into >>>>>>>>>>>>>>>>>>>> another jar, which will also need to be moved. *That* jar >>>>>>>>>>>>>>>>>>>> has yet another >>>>>>>>>>>>>>>>>>>> dependency too. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The list of jars is thus extended to include: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar >>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> You will find attached the stack trace. My apologies >>>>>>>>>>>>>>>>>>>>> for the bad quality of the image, I'm doing my best to >>>>>>>>>>>>>>>>>>>>> send you the stack >>>>>>>>>>>>>>>>>>>>> trace as I don't have the right to send documents outside >>>>>>>>>>>>>>>>>>>>> the company. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thank you for your time, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the >>>>>>>>>>>>>>>>>>>>>> problem is. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked >>>>>>>>>>>>>>>>>>>>>>> into the log file and saw the following error: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader >>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError: >>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do, and you >>>>>>>>>>>>>>>>>>>>>>>> expected the crawling resumed. How about the regular >>>>>>>>>>>>>>>>>>>>>>>> expressions? How can I >>>>>>>>>>>>>>>>>>>>>>>> make complex regular expressions in the job's paths >>>>>>>>>>>>>>>>>>>>>>>> tab ? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Othman. 
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if >>>>>>>>>>>>>>>>>>>>>>>>> it works. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your >>>>>>>>>>>>>>>>>>>>>>>>>> options.env files to include them in the classpath >>>>>>>>>>>>>>>>>>>>>>>>>> for startup. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround >>>>>>>>>>>>>>>>>>>>>>>>>>> you could try. Specifically: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes. >>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from >>>>>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar >>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl >>>>>>>>>>>>>>>>>>>>>>>>>>> resumes. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens. 
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external >>>>>>>>>>>>>>>>>>>>>>>>>>>> Tika server transformer rather than the embedded >>>>>>>>>>>>>>>>>>>>>>>>>>>> Tika Extractor. I'm still >>>>>>>>>>>>>>>>>>>>>>>>>>>> looking into why the jar is not being found. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki >>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary >>>>>>>>>>>>>>>>>>>>>>>>>>>>> version, and my job got stuck on that specific >>>>>>>>>>>>>>>>>>>>>>>>>>>>> file. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see >>>>>>>>>>>>>>>>>>>>>>>>>>>>> it in the attached file. For your information, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the job started yesterday. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> missing. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this, if you are indeed using the binary >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> distribution. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> security reasons, I can't send any files from >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> my computer. I have copied >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace and scanned it with my >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cellphone. I hope it will be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful. Meanwhile, I have read the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documentation about how to restrict the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling and I don't think the '|' works in the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specified. For instance, I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would like to restrict the crawling for the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents that counts the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'sound' word . I proceed as follows: *(SON)* . >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the document is with capital >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> letters and I noticed that it didn't take it >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into consideration. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> windows share connector is by specifying >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information on the "Paths" tab in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs that crawl windows shares. There is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> end-user documentation both >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> online and distributed with all binary >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> distributions that describe how to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do this. Have you found it? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response, I will start >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using zookeeper and I will let you know if it >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> works. I have another >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> question to ask. Actually, I need to make >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some filters while crawling. I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't want to crawl some files and some >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> folders. Could you give me an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example of how to use the regex. 
Does the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> regex allow to use /i to ignore >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cases ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> people often have problems with getting file >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> permissions right, and they do >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not understand how to shut processes down >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cleanly, and zookeeper is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> resilient against that. I highly recommend >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using zookeeper sync. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into memory so you do not need huge amounts >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of memory. The default values >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are more than enough for 35,000 files, which >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is a pretty small job for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. 
i want >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to know how is zookeeper different from >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file based sync? I also need a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> guidance on how to manage my pc's memory. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How many Go should I allocate for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the start-agent of ManifoldCF? Is 4Go >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> enough in order to crawler 35K files ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reason, and that's interfering with >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF 2.8 locking. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based sync. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failures after that. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response. 
I have looked into the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF log file and extracted the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following warnings : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Lowercase) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is denied. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be full. Shutting down process; locks >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be left dangling. You must >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cleanup before restarting. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch output connection. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Moreover, the job uses Tika to extract >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata and a file system as a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repository connection. During the job, I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't extract the content of the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents. I was wandering if the issue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comes from elasticsearch ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error that looks like it might go away >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on retry, but does not. It can be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> either on the repository side or on the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> output side. If you look at the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Simple History in the UI, or at the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifoldcf.log file, you should be able >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to get a better sense of what went >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrong. Without further information, I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can't say any more. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <[email protected]> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engineer from société générale in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> France. I'm actually using your recent >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version of manifoldCF 2.8 . 
I'm working >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on an internal search engine. For >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this reason, I'm using manifoldcf in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> order to index documents on windows >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shares. I encountered a serious problem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> while crawling 35K documents. Most >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the time, when manifoldcf start >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling a big sized documents (19Mo for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example), it ends the job with the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following error: repeated service >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interruptions - failure processing >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> document : software caused connection >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> solve this problem, please ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch 2.1.0 . >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward for your >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >
