Hi Karl,

I resolved the elasticsearch problem however the application doesn't seem
to work after I have run a job to crawl over 500k documents. I get an GC
overhead limit exceeded in the hsql database. How many should I allocate
for it?

Best regards,

Othman

On Tue, 5 Sep 2017 at 12:43, Karl Wright <[email protected]> wrote:

> Hi Othman,
>
> Thanks for doing the evaluation of the problem.
>
> Generally, the ManifoldCF project does not have the expertise to diagnose
> problems with external systems like Solr or Elasticsearch.  So going to
> another newsgroup for those kinds of issues would be a good idea.
>
> Thanks!
> Karl
>
>
> On Tue, Sep 5, 2017 at 4:33 AM, Beelz Ryuzaki <[email protected]> wrote:
>
>> Hi Karl,
>>
>> I have analyzed the error and found out that it was mainly an
>> elasticsearch problem. I saw in some forums that one of the adopted
>> solution is to modify elasticsearch.yml and set the http.max_content_length
>> to a greater value. However, the job got stuck in the last two indexable
>> files ( two pptx files with 22Mo and 2Mo respectively). The job eventually
>> ended but a stack trace showed that elasticsearch ran out of memory. For
>> your information, I have allocated 4Go for elasticsearch execution. Is it
>> enough in order to have a good performance. You will find attached the
>> stack traces of elasticsearch.
>>
>> Best regards,
>>
>> Othman BELHAJ.
>>
>> On Mon, 4 Sep 2017 at 16:40, Beelz Ryuzaki <[email protected]> wrote:
>>
>>> Hi Karl,
>>>
>>> I'm sorry to bother on your holiday. I will try to analyze it today and
>>> let it you know what I have found. Enjoy your day !
>>>
>>> Best regards,
>>>
>>> Othman BELHAJ.
>>>
>>> On Mon, 4 Sep 2017 at 16:06, Karl Wright <[email protected]> wrote:
>>>
>>>> Hi Othman,
>>>>
>>>> I won't be able to look at this today; it is a holiday here.  But, the
>>>> "socket write" error is coming from ElasticSearch.  If ES is configured to
>>>> not accept documents greater than a certain size, that might explain it.
>>>> Maybe the ES logs would help?
>>>>
>>>> I'm afraid you're going to need to do the work to find out what is
>>>> going wrong in those cases now.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>> On Mon, Sep 4, 2017 at 4:53 AM, Beelz Ryuzaki <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> This morning, I have tried the zookeeper based file and it worked
>>>>> really good. However, I still have one error which is bugging me. It is a
>>>>> socket write error. You will find attached the simple history report.
>>>>> Surprisingly, I didn't have any stack trace in the ManifoldCF log file.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Othman.
>>>>>
>>>>> On Fri, 1 Sep 2017 at 19:39, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> This is from file locking yet again.
>>>>>>
>>>>>> I have uploaded a new RC.  Please download and try out the zookeeper
>>>>>> locking.
>>>>>>
>>>>>>
>>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 1:11 PM, Beelz Ryuzaki <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> There is another issue as well that gives the following stack trace.
>>>>>>>
>>>>>>> Othman.
>>>>>>>
>>>>>>> On Fri, 1 Sep 2017 at 18:05, Beelz Ryuzaki <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> I took the binary from the ManifoldCF 2.8.1 RC0. It had the version
>>>>>>>> 3.9 of POI and when I changed the version to 3.15 it worked fine. I 
>>>>>>>> really
>>>>>>>> want to try the zookeeper if as you told me its performance is better 
>>>>>>>> than
>>>>>>>> the file-based example. For the time being, I'm using the file-based
>>>>>>>> because it is the only part that works for me but I actually need a 
>>>>>>>> stable
>>>>>>>> version for my production environment. That is one point.
>>>>>>>> Another point is, the path's tab is still an issue for me because I
>>>>>>>> exclude some files and it still crawls them. I want to exclude some
>>>>>>>> specific extensions of files and some specific directories. For 
>>>>>>>> instance, i
>>>>>>>> don't want to index .exe files and contains a specific word. I do as
>>>>>>>> follows I make the first exclude with *.exe and the second one with 
>>>>>>>> *word*.
>>>>>>>> Only the second one which doesn't work. How can I solve this issue, 
>>>>>>>> please?
>>>>>>>>
>>>>>>>> Thank you very much, have a nice week-end,
>>>>>>>>
>>>>>>>> Othman
>>>>>>>> On Fri, 1 Sep 2017 at 16:46, Karl Wright <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Othman,
>>>>>>>>>
>>>>>>>>> I will respin a new 2.8.1 (RC1) to address the zookeeper issue.
>>>>>>>>>
>>>>>>>>> The failure you are seeing is "NoSuchMethodError".  Therefore, the
>>>>>>>>> class is being found, but it is the *wrong* class.  When you deployed 
>>>>>>>>> the
>>>>>>>>> new release, did you deploy it in a new directory, or did you 
>>>>>>>>> overwrite the
>>>>>>>>> previous deployment?  If you overwrote it, you probably have multiple
>>>>>>>>> versions of the POI jars.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <[email protected]
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> Hi Karl,
>>>>>>>>>>
>>>>>>>>>> I have just tried the new release of ManifoldCF. At first, the
>>>>>>>>>> first job ended normally, but in the second I got a new stack trace
>>>>>>>>>> concerning the POI. Moreover, the runzookeeper.bat doesn't run 
>>>>>>>>>> properly. It
>>>>>>>>>> shows me the stack trace attached.
>>>>>>>>>>
>>>>>>>>>> Ps:
>>>>>>>>>> The second attached file contains the POI stack trace.
>>>>>>>>>>
>>>>>>>>>> Othman.
>>>>>>>>>>
>>>>>>>>>> On Fri, 1 Sep 2017 at 12:21, Karl Wright <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>
>>>>>>>>>>> You do not need a new database instance.
>>>>>>>>>>>
>>>>>>>>>>> You can download MCF 2.8.1 RC0 from here:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you very much for your help, I'm going to try out the
>>>>>>>>>>>> zookeeper example. Should I initialize a new database? And how can 
>>>>>>>>>>>> I run
>>>>>>>>>>>> the zookeeper start-agent ?
>>>>>>>>>>>>
>>>>>>>>>>>> Othman.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>
>>>>>>>>>>>>> These exceptions are now coming from file locking and are due
>>>>>>>>>>>>> to permissions problems.  I suggest you go to Zookeeper for file 
>>>>>>>>>>>>> locking.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am building a 2.8.1 release candidate.  When it available
>>>>>>>>>>>>> for download, I'll send you the URL.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This morning, I have followed the steps you told me to do and
>>>>>>>>>>>>>> I still got stack traces. I have attached the stack traces as 
>>>>>>>>>>>>>> well as the
>>>>>>>>>>>>>> content of my lib repo and option.env.
>>>>>>>>>>>>>> I have installed zookeeper and I'm ready to use the zookeeper
>>>>>>>>>>>>>> example. Could you guide through it? I don't know if I follow 
>>>>>>>>>>>>>> the same
>>>>>>>>>>>>>> steps in the file based example, I may not get stack traces.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please do the following:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (0) Shut down all ManifoldCF processes.
>>>>>>>>>>>>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>>> (3) Move commons-collections4*.jar from connector-common-lib
>>>>>>>>>>>>>>> to lib.
>>>>>>>>>>>>>>> (4) Move xmlbeans*.java from connector-common-lib to lib.
>>>>>>>>>>>>>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>>> (6) Modify your options.env to include all of the jars you
>>>>>>>>>>>>>>> moved.
>>>>>>>>>>>>>>> (7) Start up all ManifoldCF processes.
>>>>>>>>>>>>>>> (8) If you still get stack traces, please send them to me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> By 'other place', do you mean the \lib repository? If that
>>>>>>>>>>>>>>>> so, then I have already tried it and it didn't work.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I used the java dependency inspector to see what the issue
>>>>>>>>>>>>>>>>> is and it turns out that poi-ooxml.jar does refer back to 
>>>>>>>>>>>>>>>>> poi.jar in the
>>>>>>>>>>>>>>>>> class that is failing.  So you will need to move poi-3.15.jar 
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> commons-collections4-1.4.jar to the other place as well.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let's hope that finally fixes this issue.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm very unhappy about the quality of the POI project
>>>>>>>>>>>>>>>>> code; it is definitely not using reasonable engineering 
>>>>>>>>>>>>>>>>> practices, and I
>>>>>>>>>>>>>>>>> will be opening a ticket with them.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm using the file based example and all the changes you
>>>>>>>>>>>>>>>>>> told me to do. I reproduced them in the file based example. 
>>>>>>>>>>>>>>>>>> I'll try to
>>>>>>>>>>>>>>>>>> install zookeeper and use the zookeeper example. Will I need 
>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>> configuration to do in order to run the zookeeper example ?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Are you using the zookeeper example, or the file-based
>>>>>>>>>>>>>>>>>>> example?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If these jars have all been moved, and the options.env
>>>>>>>>>>>>>>>>>>> includes them, then I have to conclude that Apache POI's 
>>>>>>>>>>>>>>>>>>> pom.xml is
>>>>>>>>>>>>>>>>>>> incorrect too.  It will take a while to figure out what's 
>>>>>>>>>>>>>>>>>>> missing that
>>>>>>>>>>>>>>>>>>> poi-ooxml.jar needs that is not listed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> All the dependencies you mentioned have already been
>>>>>>>>>>>>>>>>>>>> added in the options.env.win file in the 
>>>>>>>>>>>>>>>>>>>> multiprocess-file-example
>>>>>>>>>>>>>>>>>>>> repository.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes, I added it in the options.env.win file. Should it
>>>>>>>>>>>>>>>>>>>>> be the one in the multiprocess-zk-example document or
>>>>>>>>>>>>>>>>>>>>> multiprocess-file-example ?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Could it be a problem of elasticsearch's version ?
>>>>>>>>>>>>>>>>>>>>>>> I'm actually using 2.1.0 which is pretty old for this 
>>>>>>>>>>>>>>>>>>>>>>> new version of
>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I moved back both the jars you mentioned and a
>>>>>>>>>>>>>>>>>>>>>>>> different is showing. You will find the stack trace 
>>>>>>>>>>>>>>>>>>>>>>>> attached.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I've looked at the dependencies; you should not
>>>>>>>>>>>>>>>>>>>>>>>>> have moved poi-3.15.jar.  Please move that back, and
>>>>>>>>>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of
>>>>>>>>>>>>>>>>>>>>>>>>>> poi.jar must also be included.  This would mean that 
>>>>>>>>>>>>>>>>>>>>>>>>>> curvesapi-1.04.jar and
>>>>>>>>>>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I added the two jars that you have mentioned and
>>>>>>>>>>>>>>>>>>>>>>>>>>> another one : poi-3.15.jar . Unfortunately, there 
>>>>>>>>>>>>>>>>>>>>>>>>>>> is another error showing.
>>>>>>>>>>>>>>>>>>>>>>>>>>> This time, it concerns excel files. You will find 
>>>>>>>>>>>>>>>>>>>>>>>>>>> attached the stack trace.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls
>>>>>>>>>>>>>>>>>>>>>>>>>>>> back into another jar, which will also need to be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> moved.  *That* jar has
>>>>>>>>>>>>>>>>>>>>>>>>>>>> yet another dependency too.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You will find attached the stack trace. My
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> apologies for the bad quality of the image, I'm 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> doing my best to send you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace as I don't have the right to send 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents outside the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> company.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what the problem is.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> looked into the log file and saw the following 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Error tossed :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you expected the crawling resumed. How about 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the regular expressions? How
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can I make complex regular expressions in the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job's paths tab ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know if it works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> options.env files to include them in the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> workaround you could try.  Specifically:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawl resumes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> external Tika server transformer rather 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> than the embedded Tika Extractor.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm still looking into why the jar is not 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> being found.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> binary version, and my job got stuck on 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can see it in the attached file. For your 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information, the job started
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> POI is missing.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> address this, if you are indeed using 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version. For security reasons, I can't 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> send any files from my computer. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have copied the stack trace and scanned 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it with my cellphone. I hope it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will be helpful. Meanwhile, I have read 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the documentation about how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> restrict the crawling and I don't think 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the '|' works in the specified. For
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instance, I would like to restrict the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling for the documents that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> counts the 'sound' word . I proceed as 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> follows: *(SON)* . the document is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with capital letters and I noticed that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it didn't take it into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the windows share connector is by 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specifying information on the "Paths" 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tab
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in jobs that crawl windows shares.  
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There is end-user documentation both
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> online and distributed with all binary 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> distributions that describe how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do this.  Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will start using zookeeper and I will 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> let you know if it works. I have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> another question to ask. Actually, I 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to make some filters while
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling. I don't want to crawl some 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> files and some folders. Could you give
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me an example of how to use the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> regex. Does the regex allow to use /i 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ignore cases ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because people often have problems 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with getting file permissions right, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> they do not understand how to shut 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processes down cleanly, and 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> resilient against that.  I highly 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> put files into memory so you do not 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need huge amounts of memory.  The
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> default values are more than enough 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for 35,000 files, which is a pretty
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> small job for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper. i want to know how is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper different from file based 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sync?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I also need a guidance on how to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manage my pc's memory. How many Go 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I allocate for the start-agent of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF? Is 4Go enough in order 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawler 35K files ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some reason, and that's 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interfering with ManifoldCF 2.8 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instead of file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> get failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> quick response. I have looked 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into the ManifoldCF log file and 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extracted
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the following warnings :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  (Lowercase)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> disk may be full. Shutting down 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> process; locks may be left 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dangling. You
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> must cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the elasticsearch output 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection. Moreover, the job 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses Tika to extract
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata and a file system as a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repository connection. During the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't extract the content of the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents. I was wandering if the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comes from elasticsearch ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's an error that looks like 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it might go away on retry, but 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does not.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It can be either on the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repository side or on the output 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> side.  If you look
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at the Simple History in the UI, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or at the manifoldcf.log file, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you should
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be able to get a better sense of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what went wrong.  Without further
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information, I can't say any 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> software engineer from société 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> générale in France. I'm 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> actually using your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recent version of manifoldCF 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.8 . I'm working on an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal search
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engine. For this reason, I'm 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using manifoldcf in order to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> index documents
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on windows shares. I 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> encountered a serious problem 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> while crawling 35K
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents. Most of the time, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when manifoldcf start crawling 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a big sized
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents (19Mo for example), 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it ends the job with the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repeated service interruptions 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - failure processing document : 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> software
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caused connection abort: socket 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how to solve this problem, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> please ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch 2.1.0 .
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward for your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>
>

Reply via email to