Re: Question about ManifoldCF 2.8

Beelz Ryuzaki Thu, 31 Aug 2017 09:13:04 -0700

Hi Karl,

By 'other place', do you mean the \lib directory? If so, then I have
already tried it and it didn't work.


Othman.

On Thu, 31 Aug 2017 at 18:07, Karl Wright <[email protected]> wrote:

> Hi Othman,
>
> I used the java dependency inspector to see what the issue is and it turns
> out that poi-ooxml.jar does refer back to poi.jar in the class that is
> failing.  So you will need to move poi-3.15.jar and
> commons-collections4-4.1.jar to the other place as well.
>
> Let's hope that finally fixes this issue.
>
> I'm very unhappy about the quality of the POI project code; it is
> definitely not using reasonable engineering practices, and I will be
> opening a ticket with them.
>
> Thanks,
> Karl
>
>
> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <[email protected]>
> wrote:
>
>> I'm using the file-based example, and I reproduced all the changes you
>> told me to make there. I'll try to install zookeeper and use the zookeeper
>> example. Will I need to do any configuration in order to run the
>> zookeeper example?
>>
>> Othman.
>>
>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <[email protected]> wrote:
>>
>>> Are you using the zookeeper example, or the file-based example?
>>>
>>> If these jars have all been moved, and the options.env includes them,
>>> then I have to conclude that Apache POI's pom.xml is incorrect too.  It
>>> will take a while to figure out what's missing that poi-ooxml.jar needs
>>> that is not listed.
>>>
>>> Karl
>>>
>>>
>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <[email protected]>
>>> wrote:
>>>
>>>> All the dependencies you mentioned have already been added in the
>>>> options.env.win file in the multiprocess-file-example directory.
>>>>
>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <[email protected]>
>>>> wrote:
>>>>
>>>>> Yes, I added it in the options.env.win file. Should it be the one in
>>>>> the multiprocess-zk-example directory or multiprocess-file-example?
>>>>>
>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> It's not related at all to elasticsearch.
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Could it be a problem with the elasticsearch version? I'm actually
>>>>>>> using 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>>>>
>>>>>>> Othman.
>>>>>>>
>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I moved back both the jars you mentioned and a different error is
>>>>>>>> showing. You will find the stack trace attached.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Othman
>>>>>>>>
>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I've looked at the dependencies; you should not have moved
>>>>>>>>> poi-3.15.jar.  Please move that back, and 
>>>>>>>>> commons-collections4-4.1.jar too.
>>>>>>>>>
>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar must
>>>>>>>>>> also be included.  This would mean that curvesapi-1.04.jar and
>>>>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>
>>>>>>>>>>> I added the two jars that you mentioned and another one:
>>>>>>>>>>> poi-3.15.jar. Unfortunately, there is another error showing. This
>>>>>>>>>>> time, it concerns Excel files. You will find the stack trace
>>>>>>>>>>> attached.
>>>>>>>>>>>
>>>>>>>>>>> Othman.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into another
>>>>>>>>>>>> jar, which will also need to be moved.  *That* jar has yet another
>>>>>>>>>>>> dependency too.
>>>>>>>>>>>>
>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>
>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> You will find the stack trace attached. My apologies for the
>>>>>>>>>>>>> bad quality of the image; I'm doing my best to send you the
>>>>>>>>>>>>> stack trace, as I don't have the right to send documents
>>>>>>>>>>>>> outside the company.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the problem
>>>>>>>>>>>>>> is.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into the
>>>>>>>>>>>>>>> log file and saw the following error:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman.
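
A NoClassDefFoundError like the one above names a class (here
org/apache/poi/POIXMLTypeLoader) that was not visible when it was needed.
As a rough aid, and not something used in this thread, the Python sketch
below lists which jars in a given directory contain a named class, so you
can tell whether the class sits in lib or in connector-common-lib. The
function name and the directory arguments are illustrative assumptions.

```python
import zipfile
from pathlib import Path


def jars_containing(lib_dir, class_name):
    """Return the sorted names of jars in lib_dir that contain class_name.

    class_name may use either '/' or '.' separators, for example
    'org/apache/poi/POIXMLTypeLoader'.
    """
    # A jar is a zip archive; classes are stored as path/to/Name.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as archive:
            if entry in archive.namelist():
                hits.append(jar.name)
    return hits
```

Calling jars_containing("connector-common-lib", "org/apache/poi/POIXMLTypeLoader")
and then the same for "lib" would show which of the two directories holds
the class.
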
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have tried what you told me to do and, as you expected,
>>>>>>>>>>>>>>>> the crawling resumed. How about the regular expressions?
>>>>>>>>>>>>>>>> How can I write complex regular expressions in the job's
>>>>>>>>>>>>>>>> Paths tab?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it works.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env files
>>>>>>>>>>>>>>>>>> to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
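
The thread does not show the layout of the options.env files, so the
following is only a plain-text check under that caveat: it reports which
of the moved jar names do not appear anywhere in a given options file.
The function name and the file path in the usage note are hypothetical.

```python
from pathlib import Path


def jars_not_mentioned(options_file, jar_names):
    """Return the jar names that do not occur anywhere in options_file."""
    # Simple substring search; it does not interpret the file's syntax.
    text = Path(options_file).read_text(encoding="utf-8", errors="replace")
    return [name for name in jar_names if name not in text]
```

For instance, jars_not_mentioned("options.env.win", ["xmlbeans-2.6.0.jar",
"poi-ooxml-schemas-3.15.jar"]) returns an empty list when both names are
present in the file.
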
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround you
>>>>>>>>>>>>>>>>>>> could try.  Specifically:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl
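
Step (2) above can be done by hand; for anyone scripting it, here is a
minimal Python sketch, assuming the installation root contains both the
connector-common-lib and lib directories. The function name and the
mcf_home argument are illustrative, and all MCF processes should be
stopped first, as step (1) says.

```python
import shutil
from pathlib import Path

# Jar names taken from step (2) of the message above.
JARS_TO_MOVE = ("xmlbeans-2.6.0.jar", "poi-ooxml-schemas-3.15.jar")


def move_jars(mcf_home, jar_names=JARS_TO_MOVE,
              src="connector-common-lib", dst="lib"):
    """Move the named jars from src to dst under mcf_home.

    Returns the names that were actually moved; jars that are not
    present in src are skipped.
    """
    src_dir = Path(mcf_home) / src
    dst_dir = Path(mcf_home) / dst
    moved = []
    for name in jar_names:
        source = src_dir / name
        if source.is_file():
            shutil.move(str(source), str(dst_dir / name))
            moved.append(name)
    return moved
```

Swapping the src and dst arguments moves the jars back, which is what a
later message in this thread asks for with poi-3.15.jar.
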
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika
>>>>>>>>>>>>>>>>>>>> server transformer rather than the embedded Tika
>>>>>>>>>>>>>>>>>>>> Extractor.  I'm still looking into why the jar is
>>>>>>>>>>>>>>>>>>>> not being found.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary version, and
>>>>>>>>>>>>>>>>>>>>> my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it in the
>>>>>>>>>>>>>>>>>>>>> attached file. For your information, the job started
>>>>>>>>>>>>>>>>>>>>> yesterday.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is missing.
>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, if you
>>>>>>>>>>>>>>>>>>>>>> are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For security
>>>>>>>>>>>>>>>>>>>>>>> reasons, I can't send any files from my computer. I
>>>>>>>>>>>>>>>>>>>>>>> have copied the stack trace and scanned it with my
>>>>>>>>>>>>>>>>>>>>>>> cellphone. I hope it will be helpful. Meanwhile, I
>>>>>>>>>>>>>>>>>>>>>>> have read the documentation about how to restrict
>>>>>>>>>>>>>>>>>>>>>>> the crawling, and I don't think the '|' works in the
>>>>>>>>>>>>>>>>>>>>>>> specified. For instance, I would like to restrict
>>>>>>>>>>>>>>>>>>>>>>> the crawling to the documents that contain the word
>>>>>>>>>>>>>>>>>>>>>>> 'sound'. I proceed as follows: *(SON)*. The document
>>>>>>>>>>>>>>>>>>>>>>> is in capital letters, and I noticed that it wasn't
>>>>>>>>>>>>>>>>>>>>>>> taken into consideration.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the windows
>>>>>>>>>>>>>>>>>>>>>>>> share connector is by specifying information on the
>>>>>>>>>>>>>>>>>>>>>>>> "Paths" tab in jobs that crawl windows shares.
>>>>>>>>>>>>>>>>>>>>>>>> There is end-user documentation both online and
>>>>>>>>>>>>>>>>>>>>>>>> distributed with all binary distributions that
>>>>>>>>>>>>>>>>>>>>>>>> describes how to do this.  Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response. I will start using
>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper and I will let you know if it works. I
>>>>>>>>>>>>>>>>>>>>>>>>> have another question to ask. Actually, I need to
>>>>>>>>>>>>>>>>>>>>>>>>> apply some filters while crawling. I don't want to
>>>>>>>>>>>>>>>>>>>>>>>>> crawl some files and some folders. Could you give
>>>>>>>>>>>>>>>>>>>>>>>>> me an example of how to use the regex? Does the
>>>>>>>>>>>>>>>>>>>>>>>>> regex syntax allow /i to ignore case?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people
>>>>>>>>>>>>>>>>>>>>>>>>>> often have problems with getting file permissions
>>>>>>>>>>>>>>>>>>>>>>>>>> right, and they do not understand how to shut
>>>>>>>>>>>>>>>>>>>>>>>>>> processes down cleanly, and zookeeper is resilient
>>>>>>>>>>>>>>>>>>>>>>>>>> against that.  I highly recommend using zookeeper
>>>>>>>>>>>>>>>>>>>>>>>>>> sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files into
>>>>>>>>>>>>>>>>>>>>>>>>>> memory so you do not need huge amounts of memory.
>>>>>>>>>>>>>>>>>>>>>>>>>> The default values are more than enough for 35,000
>>>>>>>>>>>>>>>>>>>>>>>>>> files, which is a pretty small job for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want to know
>>>>>>>>>>>>>>>>>>>>>>>>>>> how zookeeper is different from file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>> I also need guidance on how to manage my PC's
>>>>>>>>>>>>>>>>>>>>>>>>>>> memory. How many GB should I allocate for the
>>>>>>>>>>>>>>>>>>>>>>>>>>> start-agent of ManifoldCF? Is 4 GB enough to
>>>>>>>>>>>>>>>>>>>>>>>>>>> crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures after
>>>>>>>>>>>>>>>>>>>>>>>>>>>> that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
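
Before switching sync mechanisms, it may help to confirm the diagnosis
that the synch directory is not writable. The Python sketch below is one
way to probe that, assuming you run it as the same user that runs the
ManifoldCF processes; the function name is illustrative and the check is
not part of ManifoldCF.

```python
import tempfile


def can_write_lock_file(sync_dir):
    """Return True if a file can be created and written in sync_dir."""
    try:
        # Creates a real temporary file inside sync_dir; it is deleted
        # automatically when the with-block exits.
        with tempfile.NamedTemporaryFile(dir=sync_dir, suffix=".lock") as probe:
            probe.write(b"probe")
            probe.flush()
        return True
    except OSError:
        # Covers permission errors, a missing directory, and a full disk.
        return False
```

A False result for the synch directory under multiprocess-file-example
would be consistent with the "Access is denied" warning quoted further
down the thread.
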
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick response. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have looked into the ManifoldCF log file and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extracted the following warnings:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  (Lowercase)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> full. Shutting down process; locks may be left 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dangling. You must cleanup
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch output connection. Moreover, the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job uses Tika to extract metadata and a file
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> system as a repository connection. During the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job, I don't extract the content of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents. I was wondering if the issue comes
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from elasticsearch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that looks like it might go away on retry, but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does not.  It can be either on the repository
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> side or on the output side.  If you look at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the Simple History in the UI, or at the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifoldcf.log file, you should be able to get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a better sense of what went wrong.  Without
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> further information, I can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Société Générale in France. I'm currently
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using your recent version, ManifoldCF 2.8.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm working on an internal search engine. For
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this reason, I'm using ManifoldCF to index
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents on Windows shares. I encountered a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> serious problem while crawling 35K documents.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Most of the time, when ManifoldCF starts
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling a large document (19 MB, for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example), it ends the job with the following
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error: repeated service interruptions -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failure processing document : software caused
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to solve
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this problem, please?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and elasticsearch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.1.0.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>
>
