Hi Othman,

I used the Java dependency inspector to see what the issue is, and it turns
out that poi-ooxml.jar does refer back to poi.jar in the class that is
failing.  So you will need to move poi-3.15.jar and
commons-collections4-4.1.jar to the lib directory as well.
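
For reference, from the root of the binary distribution the moves would look
roughly like this (this assumes the standard layout, with the jars currently
sitting in connector-common-lib; adjust the paths to your install):

  move connector-common-lib\poi-3.15.jar lib
  move connector-common-lib\commons-collections4-4.1.jar lib

Afterwards, make sure options.env.win also lists the jars at their new
location.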

Let's hope that finally fixes this issue.
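
If it doesn't, you can usually locate the jar that provides a missing class
yourself.  From a command prompt in the binary distribution root, something
along these lines should do it (untested, off the top of my head):

  for %f in (connector-common-lib\*.jar) do @(jar -tf %f | findstr POIXMLTypeLoader >nul && echo %f)

Whichever jar it prints still needs to move to lib, along with anything it
depends on in turn.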

I'm very unhappy about the quality of the POI project code; it is
definitely not using reasonable engineering practices, and I will be
opening a ticket with them.
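
One more thing, on your earlier regex question: ManifoldCF is written in
Java, so fields that accept regular expressions use java.util.regex syntax.
There is no /i suffix; instead you embed the flag in the pattern itself with
(?i).  A quick sketch (the file name below is just an invented example):

  import java.util.regex.Pattern;

  public class CaseInsensitiveMatch {
    public static void main(String[] args) {
      // (?i) turns on case-insensitive matching for the rest of the pattern
      Pattern p = Pattern.compile("(?i).*son.*");
      System.out.println(p.matcher("RAPPORT_SON_2017.docx").matches()); // true
    }
  }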

Thanks,
Karl


On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <[email protected]> wrote:

> I'm using the file-based example, and I made all the changes you suggested
> there.  I'll try to install zookeeper and use the zookeeper example.  Is
> there any configuration I need to do in order to run the zookeeper example?
>
> Othman.
>
> On Thu, 31 Aug 2017 at 17:46, Karl Wright <[email protected]> wrote:
>
>> Are you using the zookeeper example, or the file-based example?
>>
>> If these jars have all been moved, and the options.env includes them,
>> then I have to conclude that Apache POI's pom.xml is incorrect too.  It
>> will take a while to figure out what poi-ooxml.jar needs that is not
>> listed.
>>
>> Karl
>>
>>
>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <[email protected]>
>> wrote:
>>
>>> All the dependencies you mentioned have already been added to the
>>> options.env.win file in the multiprocess-file-example directory.
>>>
>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <[email protected]> wrote:
>>>
>>>> Yes, I added it in the options.env.win file.  Should it be the one in
>>>> the multiprocess-zk-example directory or the multiprocess-file-example one?
>>>>
>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <[email protected]> wrote:
>>>>
>>>>> It's not related at all to elasticsearch.
>>>>> Karl
>>>>>
>>>>>
>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Could it be a problem with the elasticsearch version?  I'm actually
>>>>>> using 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>>>
>>>>>> Othman.
>>>>>>
>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I moved back both of the jars you mentioned and a different error is
>>>>>>> showing.  You will find the stack trace attached.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Othman
>>>>>>>
>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I've looked at the dependencies; you should not have moved
>>>>>>>> poi-3.15.jar.  Please move that back, and commons-collections4-4.1.jar 
>>>>>>>> too.
>>>>>>>>
>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar must also
>>>>>>>>> be included.  This would mean that curvesapi-1.04.jar and
>>>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Karl,
>>>>>>>>>>
>>>>>>>>>> I added the two jars you mentioned, plus another one: poi-3.15.jar.
>>>>>>>>>> Unfortunately, another error is showing.  This time, it concerns
>>>>>>>>>> Excel files.  You will find the stack trace attached.
>>>>>>>>>>
>>>>>>>>>> Othman.
>>>>>>>>>>
>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>
>>>>>>>>>>> Yes, this shows that the jar we moved calls back into another
>>>>>>>>>>> jar, which will also need to be moved.  *That* jar has yet another
>>>>>>>>>>> dependency too.
>>>>>>>>>>>
>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>
>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> You will find the stack trace attached.  My apologies for the
>>>>>>>>>>>> poor quality of the image; I'm doing my best to send you the
>>>>>>>>>>>> stack trace, as I don't have the right to send documents outside
>>>>>>>>>>>> the company.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>
>>>>>>>>>>>> Othman
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the problem
>>>>>>>>>>>>> is.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into the
>>>>>>>>>>>>>> log file and saw the following error:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Error tossed: org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have tried what you told me to do, and as you expected, the
>>>>>>>>>>>>>>> crawling resumed.  How about the regular expressions?  How can
>>>>>>>>>>>>>>> I make complex regular expressions in the job's Paths tab?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it works.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env files
>>>>>>>>>>>>>>>>> to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround you
>>>>>>>>>>>>>>>>>> could try.  Specifically:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika server
>>>>>>>>>>>>>>>>>>> transformer rather than the embedded Tika Extractor.  I'm 
>>>>>>>>>>>>>>>>>>> still looking
>>>>>>>>>>>>>>>>>>> into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary version, and
>>>>>>>>>>>>>>>>>>>> my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it in the
>>>>>>>>>>>>>>>>>>>> attached file. For your information, the job started 
>>>>>>>>>>>>>>>>>>>> yesterday.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is missing.
>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, if you
>>>>>>>>>>>>>>>>>>>>> are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For security
>>>>>>>>>>>>>>>>>>>>>> reasons, I can't send any files from my computer. I have 
>>>>>>>>>>>>>>>>>>>>>> copied the stack
>>>>>>>>>>>>>>>>>>>>>> trace and scanned it with my cellphone. I hope it will 
>>>>>>>>>>>>>>>>>>>>>> be helpful.
>>>>>>>>>>>>>>>>>>>>>> Meanwhile, I have read the documentation about how to
>>>>>>>>>>>>>>>>>>>>>> restrict the crawling, and I don't think the '|' works in
>>>>>>>>>>>>>>>>>>>>>> the path specification.  For instance, I would like to
>>>>>>>>>>>>>>>>>>>>>> restrict the crawling to documents whose names contain
>>>>>>>>>>>>>>>>>>>>>> the word 'sound' ('SON' in French).  I proceed as
>>>>>>>>>>>>>>>>>>>>>> follows: *(SON)*.  The document name is in capital
>>>>>>>>>>>>>>>>>>>>>> letters, and I noticed that it wasn't taken into account.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the windows
>>>>>>>>>>>>>>>>>>>>>>> share connector is by specifying information on the 
>>>>>>>>>>>>>>>>>>>>>>> "Paths" tab in jobs
>>>>>>>>>>>>>>>>>>>>>>> that crawl windows shares.  There is end-user 
>>>>>>>>>>>>>>>>>>>>>>> documentation both online and
>>>>>>>>>>>>>>>>>>>>>>> distributed with all binary distributions that describes 
>>>>>>>>>>>>>>>>>>>>>>> how to do this.
>>>>>>>>>>>>>>>>>>>>>>> Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response, I will start using
>>>>>>>>>>>>>>>>>>>>>>>> zookeeper and I will let you know if it works. I have 
>>>>>>>>>>>>>>>>>>>>>>>> another question to
>>>>>>>>>>>>>>>>>>>>>>>> ask. Actually, I need to make some filters while 
>>>>>>>>>>>>>>>>>>>>>>>> crawling. I don't want to
>>>>>>>>>>>>>>>>>>>>>>>> crawl some files and some folders.  Could you give me
>>>>>>>>>>>>>>>>>>>>>>>> an example of how to use the regex?  Does the regex
>>>>>>>>>>>>>>>>>>>>>>>> allow using /i to ignore case?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people often
>>>>>>>>>>>>>>>>>>>>>>>>> have problems getting file permissions right and do
>>>>>>>>>>>>>>>>>>>>>>>>> not understand how to shut processes down cleanly;
>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper is resilient against both.  I highly
>>>>>>>>>>>>>>>>>>>>>>>>> recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files into
>>>>>>>>>>>>>>>>>>>>>>>>> memory so you do not need huge amounts of memory.  
>>>>>>>>>>>>>>>>>>>>>>>>> The default values are
>>>>>>>>>>>>>>>>>>>>>>>>> more than enough for 35,000 files, which is a pretty 
>>>>>>>>>>>>>>>>>>>>>>>>> small job for
>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF.
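>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> (If you ever do need to give the agents process more
>>>>>>>>>>>>>>>>>>>>>>>>> heap, it is just the standard JVM -Xmx option in the
>>>>>>>>>>>>>>>>>>>>>>>>> options.env file you start it with, e.g. -Xmx1024m,
>>>>>>>>>>>>>>>>>>>>>>>>> but for 35K documents the defaults should be fine.)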
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper.  I want to know how
>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper is different from file-based sync.  I also
>>>>>>>>>>>>>>>>>>>>>>>>>> need some guidance on how to manage my PC's memory.
>>>>>>>>>>>>>>>>>>>>>>>>>> How many GB should I allocate for the start-agents
>>>>>>>>>>>>>>>>>>>>>>>>>> process of ManifoldCF?  Is 4 GB enough in order to
>>>>>>>>>>>>>>>>>>>>>>>>>> crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason, and
>>>>>>>>>>>>>>>>>>>>>>>>>>> that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of file-based
>>>>>>>>>>>>>>>>>>>>>>>>>>> sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) See whether you still get failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your quick response.  I have looked
>>>>>>>>>>>>>>>>>>>>>>>>>>>> into the ManifoldCF log file and extracted the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> following warnings:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\syncharea\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase) Synapses.lock'
>>>>>>>>>>>>>>>>>>>>>>>>>>>> failed: Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be
>>>>>>>>>>>>>>>>>>>>>>>>>>>> full. Shutting down process; locks may be left 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> dangling. You must cleanup
>>>>>>>>>>>>>>>>>>>>>>>>>>>> before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the elasticsearch
>>>>>>>>>>>>>>>>>>>>>>>>>>>> output connection. Moreover, the job uses Tika to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> extract metadata and a
>>>>>>>>>>>>>>>>>>>>>>>>>>>> file system as a repository connection. During the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> job, I don't extract the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> content of the documents.  I was wondering if the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue comes from elasticsearch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that looks like it might go away on retry, but 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does not.  It can be either
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on the repository side or on the output side.  If 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you look at the Simple
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> History in the UI, or at the manifoldcf.log file, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you should be able to get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a better sense of what went wrong.  Without 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> further information, I can't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Société Générale in France.  I'm using the recent
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version of ManifoldCF, 2.8.  I'm working on an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal search engine, and for this reason I'm
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using ManifoldCF to index documents on Windows
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shares.  I encountered a serious problem while
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling 35K documents.  Most of the time, when
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF starts crawling a big document (19 MB,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for example), it ends the job with the following
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error: repeated service interruptions - failure
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processing document: software caused connection
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to solve this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> problem, please?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and elasticsearch 2.1.0.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>
>>
