(1) I would create a ticket for the "*word*" exclusion. It would be helpful to include a screenshot of the view page of your job as well.
(2) I will be uploading a new ManifoldCF 2.8.1 RC shortly.

Karl

On Fri, Sep 1, 2017 at 12:05 PM, Beelz Ryuzaki <[email protected]> wrote:

> Hi Karl,
>
> I took the binary from the ManifoldCF 2.8.1 RC0. It had version 3.9 of
> POI, and when I changed the version to 3.15 it worked fine. I really want to
> try the zookeeper example if, as you told me, its performance is better than the
> file-based example. For the time being, I'm using the file-based one because it
> is the only one that works for me, but I actually need a stable version for
> my production environment. That is one point.
> Another point is that the Paths tab is still an issue for me because I
> exclude some files and it still crawls them. I want to exclude some
> specific file extensions and some specific directories. For instance, I
> don't want to index .exe files or files whose path contains a specific word. I do as
> follows: I make the first exclude with *.exe and the second one with *word*.
> Only the second one doesn't work. How can I solve this issue, please?
>
> Thank you very much, have a nice weekend,
>
> Othman
> On Fri, 1 Sep 2017 at 16:46, Karl Wright <[email protected]> wrote:
>
>> Hi Othman,
>>
>> I will respin a new 2.8.1 (RC1) to address the zookeeper issue.
>>
>> The failure you are seeing is "NoSuchMethodError". Therefore, the class
>> is being found, but it is the *wrong* class. When you deployed the new
>> release, did you deploy it in a new directory, or did you overwrite the
>> previous deployment? If you overwrote it, you probably have multiple
>> versions of the POI jars.
>>
>> Karl
>>
>>
>> On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <[email protected]>
>> wrote:
>>
>>> Hi Karl,
>>>
>>> I have just tried the new release of ManifoldCF. The first job
>>> ended normally, but in the second I got a new stack trace concerning
>>> POI. Moreover, runzookeeper.bat doesn't run properly. It shows me the
>>> stack trace attached.
>>>
>>> PS:
>>> The second attached file contains the POI stack trace.
>>>
>>> Othman.
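For what it's worth, here is a minimal sketch of how one might expect wildcard excludes such as *.exe and *word* to behave if '*' stands for any run of characters and matching ignores case. It is an illustration in plain Java, not ManifoldCF's matching code; the class name GlobSketch and its methods are hypothetical, and the Paths tab may well apply different rules (for example about case, or about whether a pattern is tested against the file name or the full path).

```java
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical wildcard matcher, not ManifoldCF's implementation. */
final class GlobSketch {

    /** Translates a wildcard pattern ('*' = any run of characters, '?' = one character) to a regex. */
    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                // Flush any pending literal text, quoted so regex metacharacters stay literal.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // Assumption: matching should ignore case, which is what the thread asks for.
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.DOTALL);
    }

    /** True when the whole name matches the wildcard pattern. */
    static boolean matches(String glob, String name) {
        return compile(glob).matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*.exe", "setup.EXE"));
        System.out.println(matches("*word*", "Annual WORD report.docx"));
        System.out.println(matches("*word*", "notes.txt"));
    }
}
```

Comparing what this prints with what the job actually crawls is one way to describe the unexpected behaviour precisely in a ticket.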
>>>
>>> On Fri, 1 Sep 2017 at 12:21, Karl Wright <[email protected]> wrote:
>>>
>>>> Hi Othman,
>>>>
>>>> You do not need a new database instance.
>>>>
>>>> You can download MCF 2.8.1 RC0 from here:
>>>>
>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> Thank you very much for your help, I'm going to try out the zookeeper
>>>>> example. Should I initialize a new database? And how can I run the
>>>>> zookeeper start-agent ?
>>>>>
>>>>> Othman.
>>>>>
>>>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> Hi Othman,
>>>>>>
>>>>>> These exceptions are now coming from file locking and are due to
>>>>>> permissions problems. I suggest you go to Zookeeper for file locking.
>>>>>>
>>>>>> I am building a 2.8.1 release candidate. When it is available for
>>>>>> download, I'll send you the URL.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>>
>>>>>>> This morning, I followed the steps you told me to do and I
>>>>>>> still got stack traces. I have attached the stack traces as well as the
>>>>>>> content of my lib directory and options.env.
>>>>>>> I have installed zookeeper and I'm ready to use the zookeeper
>>>>>>> example. Could you guide me through it? I don't know whether following the same
>>>>>>> steps as in the file-based example will avoid the stack traces.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Othman
>>>>>>>
>>>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Please do the following:
>>>>>>>>
>>>>>>>> (0) Shut down all ManifoldCF processes.
>>>>>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>>>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>>>>>> (3) Move commons-collections4*.jar from connector-common-lib to lib.
>>>>>>>> (4) Move xmlbeans*.jar from connector-common-lib to lib.
>>>>>>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>>>>>>> (6) Modify your options.env to include all of the jars you moved.
>>>>>>>> (7) Start up all ManifoldCF processes.
>>>>>>>> (8) If you still get stack traces, please send them to me.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Karl,
>>>>>>>>>
>>>>>>>>> By 'other place', do you mean the \lib directory? If that is so,
>>>>>>>>> then I have already tried it and it didn't work.
>>>>>>>>>
>>>>>>>>> Othman.
>>>>>>>>>
>>>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Othman,
>>>>>>>>>>
>>>>>>>>>> I used the java dependency inspector to see what the issue is and
>>>>>>>>>> it turns out that poi-ooxml.jar does refer back to poi.jar in the class
>>>>>>>>>> that is failing. So you will need to move poi-3.15.jar and
>>>>>>>>>> commons-collections4-4.1.jar to the other place as well.
>>>>>>>>>>
>>>>>>>>>> Let's hope that finally fixes this issue.
>>>>>>>>>>
>>>>>>>>>> I'm very unhappy about the quality of the POI project code; it is
>>>>>>>>>> definitely not using reasonable engineering practices, and I will be
>>>>>>>>>> opening a ticket with them.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm using the file-based example, and I reproduced all the changes
>>>>>>>>>>> you told me to make in the file-based example. I'll try to install
>>>>>>>>>>> zookeeper and use the zookeeper example. Will I need to do any
>>>>>>>>>>> configuration in order to run the zookeeper example ?
>>>>>>>>>>>
>>>>>>>>>>> Othman.
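When a NoClassDefFoundError or NoSuchMethodError shows up after jars have been moved between lib and connector-common-lib, it can help to list which jars actually contain the class in question. The sketch below does that with the JDK's JarFile. The class name ClassLocator and its method are hypothetical, and it only inspects jars sitting directly in the directories passed to it. No hit would suggest a missing jar; more than one hit would suggest duplicate or mixed versions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/** Illustrative helper (hypothetical name): lists the jars that contain a given class. */
final class ClassLocator {

    /** Returns every *.jar directly inside the given directories that has an entry for the class. */
    static List<Path> findJarsContaining(String className, List<Path> dirs) throws IOException {
        // org.apache.poi.POIXMLTypeLoader -> org/apache/poi/POIXMLTypeLoader.class
        String entry = className.replace('.', '/') + ".class";
        List<Path> hits = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue; // skip directories that do not exist in this deployment
            }
            try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path jar : jars) {
                    try (JarFile jarFile = new JarFile(jar.toFile())) {
                        if (jarFile.getEntry(entry) != null) {
                            hits.add(jar);
                        }
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java ClassLocator <class name> <dir> [<dir> ...]");
            return;
        }
        List<Path> dirs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            dirs.add(Paths.get(args[i]));
        }
        for (Path hit : findJarsContaining(args[0], dirs)) {
            System.out.println(hit);
        }
    }
}
```

A possible invocation, with the directory arguments adjusted to your own deployment: java ClassLocator org.apache.poi.POIXMLTypeLoader lib connector-common-lib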
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Are you using the zookeeper example, or the file-based example?
>>>>>>>>>>>>
>>>>>>>>>>>> If these jars have all been moved, and the options.env includes
>>>>>>>>>>>> them, then I have to conclude that Apache POI's pom.xml is incorrect too.
>>>>>>>>>>>> It will take a while to figure out what's missing that poi-ooxml.jar needs
>>>>>>>>>>>> that is not listed.
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> All the dependencies you mentioned have already been added in
>>>>>>>>>>>>> the options.env.win file in the multiprocess-file-example directory.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I added it in the options.env.win file. Should it be the
>>>>>>>>>>>>>> one in the multiprocess-zk-example directory or
>>>>>>>>>>>>>> multiprocess-file-example ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Could it be a problem with the elasticsearch version ? I'm
>>>>>>>>>>>>>>>> currently using 2.1.0, which is pretty old for this new version
>>>>>>>>>>>>>>>> of ManifoldCF.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I moved back both the jars you mentioned and a different error
>>>>>>>>>>>>>>>>> is showing. You will find the stack trace attached.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've looked at the dependencies; you should not have
>>>>>>>>>>>>>>>>>> moved poi-3.15.jar. Please move that back, and
>>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar
>>>>>>>>>>>>>>>>>>> must also be included. This would mean that curvesapi-1.04.jar and
>>>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I added the two jars that you mentioned and
>>>>>>>>>>>>>>>>>>>> another one: poi-3.15.jar. Unfortunately, there is another error showing.
>>>>>>>>>>>>>>>>>>>> This time, it concerns Excel files. You will find the stack trace attached.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into
>>>>>>>>>>>>>>>>>>>>> another jar, which will also need to be moved. *That* jar has yet another
>>>>>>>>>>>>>>>>>>>>> dependency too.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> You will find the stack trace attached. My apologies
>>>>>>>>>>>>>>>>>>>>>> for the bad quality of the image. I'm doing my best to send you the stack
>>>>>>>>>>>>>>>>>>>>>> trace, as I don't have the right to send documents outside the company.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what
>>>>>>>>>>>>>>>>>>>>>>> the problem is.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked
>>>>>>>>>>>>>>>>>>>>>>>> into the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do and, as you
>>>>>>>>>>>>>>>>>>>>>>>>> expected, the crawling resumed. How about the regular expressions? How can I
>>>>>>>>>>>>>>>>>>>>>>>>> write complex regular expressions in the job's Paths tab ?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if
>>>>>>>>>>>>>>>>>>>>>>>>>> it works.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your
>>>>>>>>>>>>>>>>>>>>>>>>>>> options.env files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another
>>>>>>>>>>>>>>>>>>>>>>>>>>>> workaround you could try. Specifically:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl
>>>>>>>>>>>>>>>>>>>>>>>>>>>> resumes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tika server transformer rather than the embedded Tika Extractor. I'm still
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version, and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it in the attached file. For your information, the job started yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> missing.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this, if you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> security reasons, I can't send any files from my computer. I have copied
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace and scanned it with my cellphone. I hope it will be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful. Meanwhile, I have read the documentation about how to restrict the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling and I don't think the '|' works in the specified pattern. For instance, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would like to restrict the crawling for the documents that contain the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> word 'sound'. I proceed as follows: *(SON)*. The document is in capital
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> letters and I noticed that it wasn't taken into consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> windows share connector is by specifying information on the "Paths" tab in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs that crawl windows shares. There is end-user documentation both
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> online and distributed with all binary distributions that describes how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do this. Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response. I will start
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using zookeeper and I will let you know if it works. I have another
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> question to ask. I need to apply some filters while crawling. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't want to crawl some files and some folders. Could you give me an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example of how to use the regex? Does the regex allow using /i to ignore
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> case ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> people often have problems with getting file permissions right, and they do
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not understand how to shut processes down cleanly, and zookeeper is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> resilient against that. I highly recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> files into memory, so you do not need huge amounts of memory. The default
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> values are more than enough for 35,000 files, which is a pretty small job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm currently not using zookeeper. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> want to know how zookeeper is different from file-based sync. I also need
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> guidance on how to manage my PC's memory. How many GB should I allocate for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reason, and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response. I have looked into the ManifoldCF log file and extracted the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following warnings:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\syncharea\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase) Synapses.lock'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be full. Shutting down process; locks may be left dangling. You must
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "ES (Lowercase) Synapses" is the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch output connection. Moreover, the job uses Tika to extract
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata and a file system as a repository connection. During the job, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't extract the content of the documents. I was wondering if the issue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comes from elasticsearch ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an error that looks like it might go away on retry, but does not. It can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be either on the repository side or on the output side. If you look at the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Simple History in the UI, or at the manifoldcf.log file, you should be able
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to get a better sense of what went wrong. Without further information, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engineer from Société Générale in France. I'm currently using your recent
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version of ManifoldCF, 2.8. I'm working on an internal search engine. For
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this reason, I'm using ManifoldCF to index documents on Windows
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shares. I encountered a serious problem while crawling 35K documents. Most
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the time, when ManifoldCF starts crawling a large document (19 MB for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example), it ends the job with the following error: repeated service
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interruptions - failure processing document : software caused connection
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> solve this problem, please ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch 2.1.0.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
