Are you using the zookeeper example, or the file-based example?

If these jars have all been moved, and the options.env includes them, then
I have to conclude that Apache POI's pom.xml is incorrect as well.  It will
take a while to figure out which dependency poi-ooxml.jar needs that is not
listed.
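
In the meantime, if it helps, a quick way to see which jar actually provides
a given class is a small throwaway helper like the sketch below.  It is
illustrative only and not part of ManifoldCF; the default directory and class
name are just examples (POIXMLTypeLoader is the class from your stack trace).
Run it once against lib and once against connector-common-lib.

    // Illustrative sketch: scan the jars in a directory and report which
    // of them (if any) contains the named class.
    import java.io.File;
    import java.util.jar.JarFile;

    public class FindClassInJars {
      public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : "lib");
        String className = args.length > 1
            ? args[1] : "org.apache.poi.POIXMLTypeLoader";  // example default
        String entry = className.replace('.', '/') + ".class";
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) {
          System.out.println("Not a directory: " + dir);
          return;
        }
        for (File jar : jars) {
          try (JarFile jf = new JarFile(jar)) {
            if (jf.getEntry(entry) != null) {
              System.out.println(className + " -> " + jar.getName());
            }
          }
        }
      }
    }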

Karl


On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <[email protected]> wrote:

> All the dependencies you mentioned have already been added in the
> options.env.win file in the multiprocess-file-example repository.
>
> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <[email protected]> wrote:
>
>> Yes, I added it in the options.env.win file. Should it be the one in the
>> multiprocess-zk-example directory or the multiprocess-file-example one?
>>
>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <[email protected]> wrote:
>>
>>> It's not related at all to elasticsearch.
>>> Karl
>>>
>>>
>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <[email protected]>
>>> wrote:
>>>
>>>> Could it be a problem with the Elasticsearch version? I'm actually using
>>>> 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>
>>>> Othman.
>>>>
>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <[email protected]>
>>>> wrote:
>>>>
>>>>> I moved back both the jars you mentioned and a different error is showing.
>>>>> You will find the stack trace attached.
>>>>>
>>>>> Thanks,
>>>>> Othman
>>>>>
>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> I've looked at the dependencies; you should not have moved
>>>>>> poi-3.15.jar.  Please move that back, and commons-collections4-4.1.jar 
>>>>>> too.
>>>>>>
>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> If you include poi.jar, then all dependencies of poi.jar must also
>>>>>>> be included.  This would mean that curvesapi-1.04.jar and
>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <[email protected]
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> I added the two jars that you mentioned and another one: poi-3.15.jar.
>>>>>>>> Unfortunately, there is another error showing. This time, it concerns
>>>>>>>> Excel files. You will find the stack trace attached.
>>>>>>>>
>>>>>>>> Othman.
>>>>>>>>
>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Othman,
>>>>>>>>>
>>>>>>>>> Yes, this shows that the jar we moved calls back into another jar,
>>>>>>>>> which will also need to be moved.  *That* jar has yet another
>>>>>>>>> dependency too.
>>>>>>>>>
>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>
>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> You will find the stack trace attached. My apologies for the poor
>>>>>>>>>> quality of the image; I'm doing my best to send you the stack trace,
>>>>>>>>>> as I don't have the right to send documents outside the company.
>>>>>>>>>>
>>>>>>>>>> Thank you for your time,
>>>>>>>>>>
>>>>>>>>>> Othman
>>>>>>>>>>
>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Once again, I need a stack trace to diagnose what the problem is.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into the log
>>>>>>>>>>>> file and saw the following error:
>>>>>>>>>>>>
>>>>>>>>>>>> Error tossed: org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe another jar is missing?
>>>>>>>>>>>>
>>>>>>>>>>>> Othman.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I have tried what you told me to do, and as you expected, the
>>>>>>>>>>>>> crawling resumed. How about the regular expressions? How can I
>>>>>>>>>>>>> make complex regular expressions in the job's Paths tab?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it works.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env files to
>>>>>>>>>>>>>>> include them in the classpath for startup.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you are amenable, there is another workaround you could
>>>>>>>>>>>>>>>> try.  Specifically:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>> (2) Move the following two files from connector-common-lib
>>>>>>>>>>>>>>>> to lib:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
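>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you want to double-check that the two jars are now visible, a tiny
>>>>>>>>>>>>>>>> throwaway check like the sketch below, run with the same classpath the
>>>>>>>>>>>>>>>> agents process uses, will tell you.  It is only an illustration, not
>>>>>>>>>>>>>>>> something shipped with ManifoldCF, and org.apache.xmlbeans.XmlObject is
>>>>>>>>>>>>>>>> just an example class that lives in xmlbeans-2.6.0.jar:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     // Illustrative sketch: report whether a class resolves on the
>>>>>>>>>>>>>>>>     // current classpath.  Pass a fully-qualified class name as args[0].
>>>>>>>>>>>>>>>>     public class ClassResolves {
>>>>>>>>>>>>>>>>       public static void main(String[] args) {
>>>>>>>>>>>>>>>>         String name = args.length > 0
>>>>>>>>>>>>>>>>             ? args[0] : "org.apache.xmlbeans.XmlObject";  // example
>>>>>>>>>>>>>>>>         try {
>>>>>>>>>>>>>>>>           Class.forName(name);
>>>>>>>>>>>>>>>>           System.out.println("OK: " + name);
>>>>>>>>>>>>>>>>         } catch (ClassNotFoundException | NoClassDefFoundError e) {
>>>>>>>>>>>>>>>>           System.out.println("MISSING: " + name + " (" + e + ")");
>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>     }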
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika server
>>>>>>>>>>>>>>>>> transformer rather than the embedded Tika Extractor.  I'm 
>>>>>>>>>>>>>>>>> still looking
>>>>>>>>>>>>>>>>> into why the jar is not being found.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary version, and my
>>>>>>>>>>>>>>>>>> job got stuck on that specific file.
>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it in the
>>>>>>>>>>>>>>>>>> attached file. For your information, the job started 
>>>>>>>>>>>>>>>>>> yesterday.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is missing.
>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, if you
>>>>>>>>>>>>>>>>>>> are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For security reasons, I
>>>>>>>>>>>>>>>>>>>> can't send any files from my computer, so I have copied the stack
>>>>>>>>>>>>>>>>>>>> trace and scanned it with my cellphone; I hope it will be helpful.
>>>>>>>>>>>>>>>>>>>> Meanwhile, I have read the documentation about how to restrict the
>>>>>>>>>>>>>>>>>>>> crawling, and I don't think the '|' works in the path specification.
>>>>>>>>>>>>>>>>>>>> For instance, I would like to restrict the crawling to documents
>>>>>>>>>>>>>>>>>>>> that contain the word 'sound', so I proceed as follows: *(SON)*.
>>>>>>>>>>>>>>>>>>>> The document name is in capital letters, and I noticed that the
>>>>>>>>>>>>>>>>>>>> filter didn't take it into consideration.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the windows share
>>>>>>>>>>>>>>>>>>>>> connector is by specifying information on the "Paths" tab 
>>>>>>>>>>>>>>>>>>>>> in jobs that
>>>>>>>>>>>>>>>>>>>>> crawl windows shares.  There is end-user documentation 
>>>>>>>>>>>>>>>>>>>>> both online and
>>>>>>>>>>>>>>>>>>>>> distributed with all binary distributions that describe 
>>>>>>>>>>>>>>>>>>>>> how to do this.
>>>>>>>>>>>>>>>>>>>>> Have you found it?
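>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> As for the /i question: I have not double-checked which syntax that
>>>>>>>>>>>>>>>>>>>>> tab accepts, so treat this as an assumption, but where Java regular
>>>>>>>>>>>>>>>>>>>>> expressions are accepted there is no trailing /i flag; case-insensitive
>>>>>>>>>>>>>>>>>>>>> matching uses the inline (?i) flag instead.  A plain-Java sketch (the
>>>>>>>>>>>>>>>>>>>>> pattern and file name are made up for illustration):
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     import java.util.regex.Pattern;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     public class CaseInsensitiveMatch {
>>>>>>>>>>>>>>>>>>>>>       public static void main(String[] args) {
>>>>>>>>>>>>>>>>>>>>>         // (?i) makes the whole pattern case-insensitive.
>>>>>>>>>>>>>>>>>>>>>         Pattern p = Pattern.compile("(?i).*son.*");
>>>>>>>>>>>>>>>>>>>>>         System.out.println(p.matcher("RAPPORT_SON_2017.docx").matches()); // true
>>>>>>>>>>>>>>>>>>>>>         System.out.println(p.matcher("nothing here").matches());          // false
>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>     }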
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank you for your response. I will start using zookeeper and I
>>>>>>>>>>>>>>>>>>>>>> will let you know if it works. I have another question to ask.
>>>>>>>>>>>>>>>>>>>>>> Actually, I need to apply some filters while crawling: I don't
>>>>>>>>>>>>>>>>>>>>>> want to crawl certain files and folders. Could you give me an
>>>>>>>>>>>>>>>>>>>>>> example of how to use the regex? Does the regex allow using /i
>>>>>>>>>>>>>>>>>>>>>> to ignore case?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people often
>>>>>>>>>>>>>>>>>>>>>>> have problems with getting file permissions right, and 
>>>>>>>>>>>>>>>>>>>>>>> they do not
>>>>>>>>>>>>>>>>>>>>>>> understand how to shut processes down cleanly, and 
>>>>>>>>>>>>>>>>>>>>>>> zookeeper is resilient
>>>>>>>>>>>>>>>>>>>>>>> against that.  I highly recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files into
>>>>>>>>>>>>>>>>>>>>>>> memory so you do not need huge amounts of memory.  The 
>>>>>>>>>>>>>>>>>>>>>>> default values are
>>>>>>>>>>>>>>>>>>>>>>> more than enough for 35,000 files, which is a pretty 
>>>>>>>>>>>>>>>>>>>>>>> small job for
>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want to know how zookeeper
>>>>>>>>>>>>>>>>>>>>>>>> is different from file-based sync. I also need some guidance on
>>>>>>>>>>>>>>>>>>>>>>>> how to manage my PC's memory. How many GB should I allocate for
>>>>>>>>>>>>>>>>>>>>>>>> the ManifoldCF agents process? Is 4 GB enough in order to crawl
>>>>>>>>>>>>>>>>>>>>>>>> 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason, and
>>>>>>>>>>>>>>>>>>>>>>>>> that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of file-based
>>>>>>>>>>>>>>>>>>>>>>>>> sync.
>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures after
>>>>>>>>>>>>>>>>>>>>>>>>> that.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you, Mr Karl, for your quick response. I have looked into
>>>>>>>>>>>>>>>>>>>>>>>>>> the ManifoldCF log file and extracted the following warnings:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase) Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be full.
>>>>>>>>>>>>>>>>>>>>>>>>>> Shutting down process; locks may be left dangling. 
>>>>>>>>>>>>>>>>>>>>>>>>>> You must cleanup before
>>>>>>>>>>>>>>>>>>>>>>>>>> restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> 'ES (Lowercase) Synapses' being the Elasticsearch output
>>>>>>>>>>>>>>>>>>>>>>>>>> connection. Moreover, the job uses Tika to extract metadata and
>>>>>>>>>>>>>>>>>>>>>>>>>> a file system connector as the repository connection. During the
>>>>>>>>>>>>>>>>>>>>>>>>>> job, I don't extract the content of the documents. I was
>>>>>>>>>>>>>>>>>>>>>>>>>> wondering if the issue comes from Elasticsearch?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an error that
>>>>>>>>>>>>>>>>>>>>>>>>>>> looks like it might go away on retry, but does not. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>  It can be either on
>>>>>>>>>>>>>>>>>>>>>>>>>>> the repository side or on the output side.  If you 
>>>>>>>>>>>>>>>>>>>>>>>>>>> look at the Simple
>>>>>>>>>>>>>>>>>>>>>>>>>>> History in the UI, or at the manifoldcf.log file, 
>>>>>>>>>>>>>>>>>>>>>>>>>>> you should be able to get
>>>>>>>>>>>>>>>>>>>>>>>>>>> a better sense of what went wrong.  Without further 
>>>>>>>>>>>>>>>>>>>>>>>>>>> information, I can't
>>>>>>>>>>>>>>>>>>>>>>>>>>> say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer from Société Générale in
>>>>>>>>>>>>>>>>>>>>>>>>>>>> France. I'm actually using your recent version of ManifoldCF,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.8. I'm working on an internal search engine, and for this
>>>>>>>>>>>>>>>>>>>>>>>>>>>> reason I'm using ManifoldCF to index documents on Windows
>>>>>>>>>>>>>>>>>>>>>>>>>>>> shares. I encountered a serious problem while crawling 35K
>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents. Most of the time, when ManifoldCF starts crawling a
>>>>>>>>>>>>>>>>>>>>>>>>>>>> big document (19 MB, for example), it ends the job with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> following error: repeated service interruptions - failure
>>>>>>>>>>>>>>>>>>>>>>>>>>>> processing document : software caused connection abort: socket
>>>>>>>>>>>>>>>>>>>>>>>>>>>> write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to solve this problem, please?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and Elasticsearch 2.1.0.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>
