Hi Karl, I resolved the elasticsearch problem however the application doesn't seem to work after I have run a job to crawl over 500k documents. I get an GC overhead limit exceeded in the hsql database. How many should I allocate for it?
Best regards, Othman On Tue, 5 Sep 2017 at 12:43, Karl Wright <[email protected]> wrote: > Hi Othman, > > Thanks for doing the evaluation of the problem. > > Generally, the ManifoldCF project does not have the expertise to diagnose > problems with external systems like Solr or Elasticsearch. So going to > another newsgroup for those kinds of issues would be a good idea. > > Thanks! > Karl > > > On Tue, Sep 5, 2017 at 4:33 AM, Beelz Ryuzaki <[email protected]> wrote: > >> Hi Karl, >> >> I have analyzed the error and found out that it was mainly an >> elasticsearch problem. I saw in some forums that one of the adopted >> solution is to modify elasticsearch.yml and set the http.max_content_length >> to a greater value. However, the job got stuck in the last two indexable >> files ( two pptx files with 22Mo and 2Mo respectively). The job eventually >> ended but a stack trace showed that elasticsearch ran out of memory. For >> your information, I have allocated 4Go for elasticsearch execution. Is it >> enough in order to have a good performance. You will find attached the >> stack traces of elasticsearch. >> >> Best regards, >> >> Othman BELHAJ. >> >> On Mon, 4 Sep 2017 at 16:40, Beelz Ryuzaki <[email protected]> wrote: >> >>> Hi Karl, >>> >>> I'm sorry to bother on your holiday. I will try to analyze it today and >>> let it you know what I have found. Enjoy your day ! >>> >>> Best regards, >>> >>> Othman BELHAJ. >>> >>> On Mon, 4 Sep 2017 at 16:06, Karl Wright <[email protected]> wrote: >>> >>>> Hi Othman, >>>> >>>> I won't be able to look at this today; it is a holiday here. But, the >>>> "socket write" error is coming from ElasticSearch. If ES is configured to >>>> not accept documents greater than a certain size, that might explain it. >>>> Maybe the ES logs would help? >>>> >>>> I'm afraid you're going to need to do the work to find out what is >>>> going wrong in those cases now. >>>> >>>> Thanks, >>>> Karl >>>> >>>> >>>> On Mon, Sep 4, 2017 at 4:53 AM, Beelz Ryuzaki <[email protected]> >>>> wrote: >>>> >>>>> Hi Karl, >>>>> >>>>> This morning, I have tried the zookeeper based file and it worked >>>>> really good. However, I still have one error which is bugging me. It is a >>>>> socket write error. You will find attached the simple history report. >>>>> Surprisingly, I didn't have any stack trace in the ManifoldCF log file. >>>>> >>>>> Best regards, >>>>> >>>>> Othman. >>>>> >>>>> On Fri, 1 Sep 2017 at 19:39, Karl Wright <[email protected]> wrote: >>>>> >>>>>> This is from file locking yet again. >>>>>> >>>>>> I have uploaded a new RC. Please download and try out the zookeeper >>>>>> locking. >>>>>> >>>>>> >>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1 >>>>>> >>>>>> Karl >>>>>> >>>>>> >>>>>> On Fri, Sep 1, 2017 at 1:11 PM, Beelz Ryuzaki <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> There is another issue as well that gives the following stack trace. >>>>>>> >>>>>>> Othman. >>>>>>> >>>>>>> On Fri, 1 Sep 2017 at 18:05, Beelz Ryuzaki <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Karl, >>>>>>>> >>>>>>>> I took the binary from the ManifoldCF 2.8.1 RC0. It had the version >>>>>>>> 3.9 of POI and when I changed the version to 3.15 it worked fine. I >>>>>>>> really >>>>>>>> want to try the zookeeper if as you told me its performance is better >>>>>>>> than >>>>>>>> the file-based example. For the time being, I'm using the file-based >>>>>>>> because it is the only part that works for me but I actually need a >>>>>>>> stable >>>>>>>> version for my production environment. That is one point. >>>>>>>> Another point is, the path's tab is still an issue for me because I >>>>>>>> exclude some files and it still crawls them. I want to exclude some >>>>>>>> specific extensions of files and some specific directories. For >>>>>>>> instance, i >>>>>>>> don't want to index .exe files and contains a specific word. I do as >>>>>>>> follows I make the first exclude with *.exe and the second one with >>>>>>>> *word*. >>>>>>>> Only the second one which doesn't work. How can I solve this issue, >>>>>>>> please? >>>>>>>> >>>>>>>> Thank you very much, have a nice week-end, >>>>>>>> >>>>>>>> Othman >>>>>>>> On Fri, 1 Sep 2017 at 16:46, Karl Wright <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Othman, >>>>>>>>> >>>>>>>>> I will respin a new 2.8.1 (RC1) to address the zookeeper issue. >>>>>>>>> >>>>>>>>> The failure you are seeing is "NoSuchMethodError". Therefore, the >>>>>>>>> class is being found, but it is the *wrong* class. When you deployed >>>>>>>>> the >>>>>>>>> new release, did you deploy it in a new directory, or did you >>>>>>>>> overwrite the >>>>>>>>> previous deployment? If you overwrote it, you probably have multiple >>>>>>>>> versions of the POI jars. >>>>>>>>> >>>>>>>>> Karl >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <[email protected] >>>>>>>>> > wrote: >>>>>>>>> >>>>>>>>>> Hi Karl, >>>>>>>>>> >>>>>>>>>> I have just tried the new release of ManifoldCF. At first, the >>>>>>>>>> first job ended normally, but in the second I got a new stack trace >>>>>>>>>> concerning the POI. Moreover, the runzookeeper.bat doesn't run >>>>>>>>>> properly. It >>>>>>>>>> shows me the stack trace attached. >>>>>>>>>> >>>>>>>>>> Ps: >>>>>>>>>> The second attached file contains the POI stack trace. >>>>>>>>>> >>>>>>>>>> Othman. >>>>>>>>>> >>>>>>>>>> On Fri, 1 Sep 2017 at 12:21, Karl Wright <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Othman, >>>>>>>>>>> >>>>>>>>>>> You do not need a new database instance. >>>>>>>>>>> >>>>>>>>>>> You can download MCF 2.8.1 RC0 from here: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1 >>>>>>>>>>> >>>>>>>>>>> Karl >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Karl, >>>>>>>>>>>> >>>>>>>>>>>> Thank you very much for your help, I'm going to try out the >>>>>>>>>>>> zookeeper example. Should I initialize a new database? And how can >>>>>>>>>>>> I run >>>>>>>>>>>> the zookeeper start-agent ? >>>>>>>>>>>> >>>>>>>>>>>> Othman. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>> >>>>>>>>>>>>> These exceptions are now coming from file locking and are due >>>>>>>>>>>>> to permissions problems. I suggest you go to Zookeeper for file >>>>>>>>>>>>> locking. >>>>>>>>>>>>> >>>>>>>>>>>>> I am building a 2.8.1 release candidate. When it available >>>>>>>>>>>>> for download, I'll send you the URL. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Karl >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Karl, >>>>>>>>>>>>>> >>>>>>>>>>>>>> This morning, I have followed the steps you told me to do and >>>>>>>>>>>>>> I still got stack traces. I have attached the stack traces as >>>>>>>>>>>>>> well as the >>>>>>>>>>>>>> content of my lib repo and option.env. >>>>>>>>>>>>>> I have installed zookeeper and I'm ready to use the zookeeper >>>>>>>>>>>>>> example. Could you guide through it? I don't know if I follow >>>>>>>>>>>>>> the same >>>>>>>>>>>>>> steps in the file based example, I may not get stack traces. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Othman >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <[email protected]> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please do the following: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> (0) Shut down all ManifoldCF processes. >>>>>>>>>>>>>>> (1) Move poi*.jar from connector-common-lib to lib. >>>>>>>>>>>>>>> (2) Move dom4j*.jar from connector-common-lib to lib. >>>>>>>>>>>>>>> (3) Move commons-collections4*.jar from connector-common-lib >>>>>>>>>>>>>>> to lib. >>>>>>>>>>>>>>> (4) Move xmlbeans*.java from connector-common-lib to lib. >>>>>>>>>>>>>>> (5) Move curvesapi*.jar from connector-common-lib to lib. >>>>>>>>>>>>>>> (6) Modify your options.env to include all of the jars you >>>>>>>>>>>>>>> moved. >>>>>>>>>>>>>>> (7) Start up all ManifoldCF processes. >>>>>>>>>>>>>>> (8) If you still get stack traces, please send them to me. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Karl, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> By 'other place', do you mean the \lib repository? If that >>>>>>>>>>>>>>>> so, then I have already tried it and it didn't work. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I used the java dependency inspector to see what the issue >>>>>>>>>>>>>>>>> is and it turns out that poi-ooxml.jar does refer back to >>>>>>>>>>>>>>>>> poi.jar in the >>>>>>>>>>>>>>>>> class that is failing. So you will need to move poi-3.15.jar >>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> commons-collections4-1.4.jar to the other place as well. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Let's hope that finally fixes this issue. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm very unhappy about the quality of the POI project >>>>>>>>>>>>>>>>> code; it is definitely not using reasonable engineering >>>>>>>>>>>>>>>>> practices, and I >>>>>>>>>>>>>>>>> will be opening a ticket with them. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'm using the file based example and all the changes you >>>>>>>>>>>>>>>>>> told me to do. I reproduced them in the file based example. >>>>>>>>>>>>>>>>>> I'll try to >>>>>>>>>>>>>>>>>> install zookeeper and use the zookeeper example. Will I need >>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>> configuration to do in order to run the zookeeper example ? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Are you using the zookeeper example, or the file-based >>>>>>>>>>>>>>>>>>> example? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> If these jars have all been moved, and the options.env >>>>>>>>>>>>>>>>>>> includes them, then I have to conclude that Apache POI's >>>>>>>>>>>>>>>>>>> pom.xml is >>>>>>>>>>>>>>>>>>> incorrect too. It will take a while to figure out what's >>>>>>>>>>>>>>>>>>> missing that >>>>>>>>>>>>>>>>>>> poi-ooxml.jar needs that is not listed. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> All the dependencies you mentioned have already been >>>>>>>>>>>>>>>>>>>> added in the options.env.win file in the >>>>>>>>>>>>>>>>>>>> multiprocess-file-example >>>>>>>>>>>>>>>>>>>> repository. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Yes, I added it in the options.env.win file. Should it >>>>>>>>>>>>>>>>>>>>> be the one in the multiprocess-zk-example document or >>>>>>>>>>>>>>>>>>>>> multiprocess-file-example ? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> It's not related at all to elasticsearch. >>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Could it be a problem of elasticsearch's version ? >>>>>>>>>>>>>>>>>>>>>>> I'm actually using 2.1.0 which is pretty old for this >>>>>>>>>>>>>>>>>>>>>>> new version of >>>>>>>>>>>>>>>>>>>>>>> ManifoldCF? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I moved back both the jars you mentioned and a >>>>>>>>>>>>>>>>>>>>>>>> different is showing. You will find the stack trace >>>>>>>>>>>>>>>>>>>>>>>> attached. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I've looked at the dependencies; you should not >>>>>>>>>>>>>>>>>>>>>>>>> have moved poi-3.15.jar. Please move that back, and >>>>>>>>>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar too. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of >>>>>>>>>>>>>>>>>>>>>>>>>> poi.jar must also be included. This would mean that >>>>>>>>>>>>>>>>>>>>>>>>>> curvesapi-1.04.jar and >>>>>>>>>>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar should also be included. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I added the two jars that you have mentioned and >>>>>>>>>>>>>>>>>>>>>>>>>>> another one : poi-3.15.jar . Unfortunately, there >>>>>>>>>>>>>>>>>>>>>>>>>>> is another error showing. >>>>>>>>>>>>>>>>>>>>>>>>>>> This time, it concerns excel files. You will find >>>>>>>>>>>>>>>>>>>>>>>>>>> attached the stack trace. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls >>>>>>>>>>>>>>>>>>>>>>>>>>>> back into another jar, which will also need to be >>>>>>>>>>>>>>>>>>>>>>>>>>>> moved. *That* jar has >>>>>>>>>>>>>>>>>>>>>>>>>>>> yet another dependency too. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> The list of jars is thus extended to include: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar >>>>>>>>>>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki >>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> You will find attached the stack trace. My >>>>>>>>>>>>>>>>>>>>>>>>>>>>> apologies for the bad quality of the image, I'm >>>>>>>>>>>>>>>>>>>>>>>>>>>>> doing my best to send you >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace as I don't have the right to send >>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents outside the >>>>>>>>>>>>>>>>>>>>>>>>>>>>> company. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your time, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what the problem is. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> looked into the log file and saw the following >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Error tossed : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do, and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you expected the crawling resumed. How about >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the regular expressions? How >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can I make complex regular expressions in the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job's paths tab ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know if it works. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> options.env files to include them in the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> classpath for startup. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> workaround you could try. Specifically: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawl resumes. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONNECTORS-1450. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> external Tika server transformer rather >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> than the embedded Tika Extractor. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm still looking into why the jar is not >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> being found. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> binary version, and my job got stuck on >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that specific file. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can see it in the attached file. For your >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information, the job started >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> yesterday. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> POI is missing. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> address this, if you are indeed using >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the binary distribution. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <[email protected]> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version. For security reasons, I can't >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> send any files from my computer. I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have copied the stack trace and scanned >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it with my cellphone. I hope it >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will be helpful. Meanwhile, I have read >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the documentation about how to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> restrict the crawling and I don't think >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the '|' works in the specified. For >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instance, I would like to restrict the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling for the documents that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> counts the 'sound' word . I proceed as >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> follows: *(SON)* . the document is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with capital letters and I noticed that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it didn't take it into >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consideration. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the windows share connector is by >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specifying information on the "Paths" >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tab >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in jobs that crawl windows shares. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There is end-user documentation both >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> online and distributed with all binary >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> distributions that describe how to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do this. Have you found it? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <[email protected]> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response, I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will start using zookeeper and I will >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> let you know if it works. I have >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> another question to ask. Actually, I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to make some filters while >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling. I don't want to crawl some >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> files and some folders. Could you give >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me an example of how to use the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> regex. Does the regex allow to use /i >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ignore cases ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because people often have problems >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with getting file permissions right, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> they do not understand how to shut >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processes down cleanly, and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> resilient against that. I highly >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recommend using zookeeper sync. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> put files into memory so you do not >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need huge amounts of memory. The >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> default values are more than enough >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for 35,000 files, which is a pretty >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> small job for ManifoldCF. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper. i want to know how is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper different from file based >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sync? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I also need a guidance on how to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manage my pc's memory. How many Go >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I allocate for the start-agent of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF? Is 4Go enough in order >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawler 35K files ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <[email protected]> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some reason, and that's >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interfering with ManifoldCF 2.8 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> locking. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instead of file-based sync. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> get failures after that. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> quick response. I have looked >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into the ManifoldCF log file and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extracted >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the following warnings : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Lowercase) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> denied. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> disk may be full. Shutting down >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> process; locks may be left >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dangling. You >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> must cleanup before restarting. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the elasticsearch output >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection. Moreover, the job >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses Tika to extract >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata and a file system as a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repository connection. During the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job, I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't extract the content of the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents. I was wandering if the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comes from elasticsearch ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's an error that looks like >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it might go away on retry, but >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does not. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It can be either on the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repository side or on the output >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> side. If you look >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at the Simple History in the UI, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or at the manifoldcf.log file, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you should >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be able to get a better sense of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what went wrong. Without further >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information, I can't say any >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> more. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> software engineer from société >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> générale in France. I'm >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> actually using your >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recent version of manifoldCF >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.8 . I'm working on an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal search >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engine. For this reason, I'm >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using manifoldcf in order to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> index documents >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on windows shares. I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> encountered a serious problem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> while crawling 35K >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents. Most of the time, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when manifoldcf start crawling >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a big sized >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents (19Mo for example), >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it ends the job with the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following error: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repeated service interruptions >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - failure processing document : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> software >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caused connection abort: socket >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> write error. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how to solve this problem, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> please ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch 2.1.0 . >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward for your >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>> >>>> >
