Hi Karl,

I took the binary from the ManifoldCF 2.8.1 RC0. It shipped with version 3.9
of POI, and when I changed the version to 3.15 it worked fine. I would really
like to try the zookeeper example if, as you told me, its performance is
better than the file-based example. For the time being, I'm using the
file-based example because it is the only one that works for me, but I need
a stable version for my production environment. That is one point.
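
To check a deployment for mixed jar versions like the POI 3.9 / 3.15 case above, something like the following could be used. It is only a minimal Python sketch and not part of ManifoldCF: the function name find_duplicate_jars is my own, and it assumes the jars follow the usual name-version.jar naming and sit in directories such as lib and connector-common-lib.

```python
import re
from collections import defaultdict
from pathlib import Path

# Splits "poi-ooxml-3.15.jar" into ("poi-ooxml", "3.15").
# Assumption: jars are named <artifact>-<version>.jar and the version starts with a digit.
JAR_NAME = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.]*)\.jar$")


def find_duplicate_jars(directories):
    """Return {artifact: {version: [paths]}} for artifacts found in more than one version."""
    found = defaultdict(lambda: defaultdict(list))
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue  # silently skip directories that do not exist
        for jar in sorted(root.glob("*.jar")):
            match = JAR_NAME.match(jar.name)
            if match:
                found[match["name"]][match["version"]].append(str(jar))
    # Keep only the artifacts that appear with at least two different versions.
    return {name: dict(versions) for name, versions in found.items() if len(versions) > 1}


if __name__ == "__main__":
    # Directory names taken from this thread; adjust them to the actual install path.
    for name, versions in find_duplicate_jars(["lib", "connector-common-lib"]).items():
        print(name, sorted(versions))
```

Any artifact it reports would be a candidate for the "wrong class" situation, but it does not prove which copy the JVM actually loads.
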
Another point is that the Paths tab is still an issue for me, because I
exclude some files and it still crawls them. I want to exclude some specific
file extensions and some specific directories. For instance, I don't want to
index .exe files or files whose names contain a specific word. I proceed as
follows: the first exclude rule is *.exe and the second one is *word*. Only
the second one doesn't work. How can I solve this issue, please?
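
To make the behaviour I expect concrete, here is a minimal Python sketch of the filtering. It is only an illustration and not ManifoldCF's own matching code: the function name is_excluded is hypothetical, and it assumes that a pattern is tested against every directory name and the file name in the path, ignoring case.

```python
from fnmatch import fnmatchcase


def is_excluded(path, exclude_patterns, ignore_case=True):
    """Return True if any component of `path` matches one of the wildcard patterns."""
    # Accept both Windows and Unix separators, then split into components.
    parts = [part for part in path.replace("\\", "/").split("/") if part]
    patterns = list(exclude_patterns)
    if ignore_case:
        # Assumption: matching should not depend on upper or lower case.
        parts = [part.lower() for part in parts]
        patterns = [pattern.lower() for pattern in patterns]
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)


rules = ["*.exe", "*word*"]
print(is_excluded(r"share\tools\setup.EXE", rules))
print(is_excluded(r"share\My WORD docs\report.pdf", rules))
print(is_excluded(r"share\docs\report.pdf", rules))
```

With these rules I would expect the first two paths to be skipped and the third one to be crawled.
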

Thank you very much, have a nice week-end,

Othman
On Fri, 1 Sep 2017 at 16:46, Karl Wright <[email protected]> wrote:

> Hi Othman,
>
> I will respin a new 2.8.1 (RC1) to address the zookeeper issue.
>
> The failure you are seeing is "NoSuchMethodError".  Therefore, the class
> is being found, but it is the *wrong* class.  When you deployed the new
> release, did you deploy it in a new directory, or did you overwrite the
> previous deployment?  If you overwrote it, you probably have multiple
> versions of the POI jars.
>
> Karl
>
>
> On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <[email protected]> wrote:
>
>> Hi Karl,
>>
>> I have just tried the new release of ManifoldCF. At first, the first job
>> ended normally, but in the second I got a new stack trace concerning the
>> POI. Moreover, the runzookeeper.bat doesn't run properly. It shows me the
>> stack trace attached.
>>
>> Ps:
>> The second attached file contains the POI stack trace.
>>
>> Othman.
>>
>> On Fri, 1 Sep 2017 at 12:21, Karl Wright <[email protected]> wrote:
>>
>>> Hi Othman,
>>>
>>> You do not need a new database instance.
>>>
>>> You can download MCF 2.8.1 RC0 from here:
>>>
>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>
>>> Karl
>>>
>>>
>>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <[email protected]>
>>> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> Thank you very much for your help, I'm going to try out the zookeeper
>>>> example. Should I initialize a new database? And how can I run the
>>>> zookeeper start-agent ?
>>>>
>>>> Othman.
>>>>
>>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Othman,
>>>>>
>>>>> These exceptions are now coming from file locking and are due to
>>>>> permissions problems.  I suggest you go to Zookeeper for file locking.
>>>>>
>>>>> I am building a 2.8.1 release candidate.  When it is available for
>>>>> download, I'll send you the URL.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> This morning, I followed the steps you told me to do and I still
>>>>>> got stack traces. I have attached the stack traces as well as the content
>>>>>> of my lib directory and options.env.
>>>>>> I have installed zookeeper and I'm ready to use the zookeeper
>>>>>> example. Could you guide me through it? I don't know whether following
>>>>>> the same steps as in the file-based example will avoid the stack traces.
>>>>>>
>>>>>> Thanks,
>>>>>> Othman
>>>>>>
>>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <[email protected]> wrote:
>>>>>>
>>>>>>> Please do the following:
>>>>>>>
>>>>>>> (0) Shut down all ManifoldCF processes.
>>>>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>>>>> (3) Move commons-collections4*.jar from connector-common-lib to lib.
>>>>>>> (4) Move xmlbeans*.jar from connector-common-lib to lib.
>>>>>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>>>>>> (6) Modify your options.env to include all of the jars you moved.
>>>>>>> (7) Start up all ManifoldCF processes.
>>>>>>> (8) If you still get stack traces, please send them to me.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <[email protected]
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> By 'other place', do you mean the \lib directory? If that is so, then
>>>>>>>> I have already tried it and it didn't work.
>>>>>>>>
>>>>>>>> Othman.
>>>>>>>>
>>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Othman,
>>>>>>>>>
>>>>>>>>> I used the java dependency inspector to see what the issue is and
>>>>>>>>> it turns out that poi-ooxml.jar does refer back to poi.jar in the
>>>>>>>>> class that is failing.  So you will need to move poi-3.15.jar and
>>>>>>>>> commons-collections4-4.1.jar to the other place as well.
>>>>>>>>>
>>>>>>>>> Let's hope that finally fixes this issue.
>>>>>>>>>
>>>>>>>>> I'm very unhappy about the quality of the POI project code; it is
>>>>>>>>> definitely not using reasonable engineering practices, and I will be
>>>>>>>>> opening a ticket with them.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I'm using the file-based example, and I applied all the changes
>>>>>>>>>> you told me to make there. I'll try to install zookeeper and use
>>>>>>>>>> the zookeeper example. Will I need to do any configuration in
>>>>>>>>>> order to run the zookeeper example?
>>>>>>>>>>
>>>>>>>>>> Othman.
>>>>>>>>>>
>>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Are you using the zookeeper example, or the file-based example?
>>>>>>>>>>>
>>>>>>>>>>> If these jars have all been moved, and the options.env includes
>>>>>>>>>>> them, then I have to conclude that Apache POI's pom.xml is
>>>>>>>>>>> incorrect too.  It will take a while to figure out what's missing
>>>>>>>>>>> that poi-ooxml.jar needs that is not listed.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> All the dependencies you mentioned have already been added in
>>>>>>>>>>>> the options.env.win file in the multiprocess-file-example
>>>>>>>>>>>> directory.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I added it in the options.env.win file. Should it be the
>>>>>>>>>>>>> one in the multiprocess-zk-example directory or the one in
>>>>>>>>>>>>> multiprocess-file-example?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Could it be a problem with the elasticsearch version? I'm
>>>>>>>>>>>>>>> actually using 2.1.0, which is pretty old for this new
>>>>>>>>>>>>>>> version of ManifoldCF.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I moved back both the jars you mentioned and a different
>>>>>>>>>>>>>>>> error is showing. You will find the stack trace attached.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've looked at the dependencies; you should not have moved
>>>>>>>>>>>>>>>>> poi-3.15.jar.  Please move that back, and
>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar
>>>>>>>>>>>>>>>>>> must also be included.  This would mean that
>>>>>>>>>>>>>>>>>> curvesapi-1.04.jar and commons-collections4-4.1.jar
>>>>>>>>>>>>>>>>>> should also be included.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I added the two jars that you mentioned and another
>>>>>>>>>>>>>>>>>>> one: poi-3.15.jar. Unfortunately, another error is
>>>>>>>>>>>>>>>>>>> showing. This time, it concerns Excel files. You will
>>>>>>>>>>>>>>>>>>> find the stack trace attached.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into
>>>>>>>>>>>>>>>>>>>> another jar, which will also need to be moved.  *That*
>>>>>>>>>>>>>>>>>>>> jar has yet another dependency too.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You will find attached the stack trace. My apologies
>>>>>>>>>>>>>>>>>>>>> for the bad quality of the image; I'm doing my best to
>>>>>>>>>>>>>>>>>>>>> send you the stack trace, as I don't have the right to
>>>>>>>>>>>>>>>>>>>>> send documents outside the company.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the
>>>>>>>>>>>>>>>>>>>>>> problem is.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked
>>>>>>>>>>>>>>>>>>>>>>> into the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do and, as you
>>>>>>>>>>>>>>>>>>>>>>>> expected, the crawling resumed. What about the
>>>>>>>>>>>>>>>>>>>>>>>> regular expressions? How can I write complex
>>>>>>>>>>>>>>>>>>>>>>>> regular expressions in the job's Paths tab?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if
>>>>>>>>>>>>>>>>>>>>>>>>> it works.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your
>>>>>>>>>>>>>>>>>>>>>>>>>> options.env files to include them in the
>>>>>>>>>>>>>>>>>>>>>>>>>> classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround
>>>>>>>>>>>>>>>>>>>>>>>>>>> you could try.  Specifically:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl
>>>>>>>>>>>>>>>>>>>>>>>>>>> resumes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tika server transformer rather than the embedded
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tika Extractor.  I'm still looking into why the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version, and my job got stuck on that specific
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it in the attached file. For your information,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the job started yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> missing.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this, if you are indeed using the binary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> security reasons, I can't send any files from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> my computer. I have copied the stack trace and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scanned it with my cellphone. I hope it will be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful. Meanwhile, I have read the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documentation about how to restrict the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling and I don't think the '|' works in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specification. For instance, I would like to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> restrict the crawling for the documents that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> contain the word 'sound'. I proceed as follows:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *(SON)* . The document is in capital letters
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and I noticed that it was not taken into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> windows share connector is by specifying
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information on the "Paths" tab in jobs that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawl windows shares.  There is end-user
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documentation both online and distributed with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all binary distributions that describes how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do this.  Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response. I will start
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using zookeeper and I will let you know if it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> works. I have another question to ask.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Actually, I need to apply some filters while
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling. I don't want to crawl some files and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some folders. Could you give me an example of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how to use the regex? Does the regex allow the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> use of /i to ignore case?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> people often have problems with getting file
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> permissions right, and they do not understand
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how to shut processes down cleanly, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper is resilient against that.  I highly
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into memory so you do not need huge amounts of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> memory.  The default values are more than
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> enough for 35,000 files, which is a pretty
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> small job for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to know how zookeeper is different from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based sync. I also need some guidance on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how to manage my PC's memory. How many GB
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should I allocate for the start-agent of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF? Is 4 GB enough to crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reason, and that's interfering with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response. I have looked into the ManifoldCF
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> log file and extracted the following
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> warnings:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  (Lowercase)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be full. Shutting down process; locks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be left dangling. You must cleanup
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses is the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch output connection. Moreover,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the job uses Tika to extract metadata and a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file system as a repository connection.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> During the job, I don't extract the content
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the documents. I was wondering whether
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the issue comes from elasticsearch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error that looks like it might go away on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> retry, but does not.  It can be either on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the repository side or on the output side.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you look at the Simple History in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UI, or at the manifoldcf.log file, you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be able to get a better sense of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what went wrong.  Without further
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information, I can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <[email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engineer from Société Générale in France.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm currently using the recent version 2.8
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of ManifoldCF. I'm working on an internal
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> search engine. For this reason, I'm using
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF to index documents on Windows
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shares. I encountered a serious problem
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> while crawling 35K documents. Most of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> time, when ManifoldCF starts crawling a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> large document (19 MB for example), it ends
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the job with the following error: repeated
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> service interruptions - failure processing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> document : software caused connection
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to solve
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this problem, please?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and elasticsearch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.1.0.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
