I'm using the file-based example, and I reproduced all the changes you told me to make in it. I'll try to install zookeeper and use the zookeeper example. Will I need to do any configuration in order to run the zookeeper example?
Othman.

On Thu, 31 Aug 2017 at 17:46, Karl Wright wrote:

> Are you using the zookeeper example, or the file-based example?
>
> If these jars have all been moved, and the options.env includes them, then I have to conclude that Apache POI's pom.xml is incorrect too. It will take a while to figure out what's missing that poi-ooxml.jar needs that is not listed.
>
> Karl
>
> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki wrote:
>
>> All the dependencies you mentioned have already been added in the options.env.win file in the multiprocess-file-example directory.
>>
>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki wrote:
>>
>>> Yes, I added it in the options.env.win file. Should it be the one in the multiprocess-zk-example directory or the one in multiprocess-file-example?
>>>
>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright wrote:
>>>
>>>> It's not related at all to elasticsearch.
>>>> Karl
>>>>
>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki wrote:
>>>>
>>>>> Could it be a problem with elasticsearch's version? I'm actually using 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>>
>>>>> Othman.
>>>>>
>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki wrote:
>>>>>
>>>>>> I moved back both the jars you mentioned and a different error is showing. You will find the stack trace attached.
>>>>>>
>>>>>> Thanks,
>>>>>> Othman
>>>>>>
>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright wrote:
>>>>>>
>>>>>>> I've looked at the dependencies; you should not have moved poi-3.15.jar. Please move that back, and commons-collections4-4.1.jar too.
>>>>>>>
>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright wrote:
>>>>>>>
>>>>>>>> If you include poi.jar, then all dependencies of poi.jar must also be included. This would mean that curvesapi-1.04.jar and commons-collections4-4.1.jar should also be included.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki wrote:
>>>>>>>>
>>>>>>>>> Hi Karl,
>>>>>>>>>
>>>>>>>>> I added the two jars that you mentioned and another one: poi-3.15.jar. Unfortunately, there is another error showing. This time, it concerns Excel files. You will find the stack trace attached.
>>>>>>>>>
>>>>>>>>> Othman.
>>>>>>>>>
>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright wrote:
>>>>>>>>>
>>>>>>>>>> Hi Othman,
>>>>>>>>>>
>>>>>>>>>> Yes, this shows that the jar we moved calls back into another jar, which will also need to be moved. *That* jar has yet another dependency too.
>>>>>>>>>>
>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>
>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>
>>>>>>>>>>> You will find the stack trace attached. My apologies for the bad quality of the image; I'm doing my best to send you the stack trace, as I don't have the right to send documents outside the company.
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>
>>>>>>>>>>> Othman
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the problem is.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into the log file and saw the following error:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe another jar is missing?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have tried what you told me to do, and as you expected, the crawling resumed. How about the regular expressions? How can I write complex regular expressions in the job's Paths tab?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it works.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround you could try. Specifically:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>> (2) Move the following two files from connector-common-lib to lib:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika server transformer rather than the embedded Tika Extractor. I'm still looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary version, and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it in the attached file. For your information, the job started yesterday.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is missing.
>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, if you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For security reasons, I can't send any files from my computer. I have copied the stack trace and scanned it with my cellphone. I hope it will be helpful. Meanwhile, I have read the documentation about how to restrict the crawling and I don't think the '|' works in the specified. For instance, I would like to restrict the crawling to the documents that contain the word 'sound'. I proceed as follows: *(SON)* . The document is in capital letters and I noticed that it didn't take it into consideration.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the windows share connector is by specifying information on the "Paths" tab in jobs that crawl windows shares. There is end-user documentation both online and distributed with all binary distributions that describes how to do this. Have you found it?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response. I will start using zookeeper and I will let you know if it works. I have another question to ask. Actually, I need to apply some filters while crawling. I don't want to crawl some files and some folders. Could you give me an example of how to use the regex? Does the regex allow using /i to ignore case?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people often have problems with getting file permissions right, and they do not understand how to shut processes down cleanly, and zookeeper is resilient against that. I highly recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files into memory so you do not need huge amounts of memory. The default values are more than enough for 35,000 files, which is a pretty small job for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want to know how zookeeper is different from file-based sync. I also need guidance on how to manage my PC's memory. How many GB should I allocate for the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason, and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick response.
>>>>>>>>>>>>>>>>>>>>>>>>>>> I have looked into the ManifoldCF log file and extracted the following warnings:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase) Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be full. Shutting down process; locks may be left dangling. You must cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the elasticsearch output connection. Moreover, the job uses Tika to extract metadata and a file system as a repository connection. During the job, I don't extract the content of the documents. I was wondering if the issue comes from elasticsearch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an error that looks like it might go away on retry, but does not.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> It can be either on the repository side or on the output side. If you look at the Simple History in the UI, or at the manifoldcf.log file, you should be able to get a better sense of what went wrong. Without further information, I can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer at Société Générale in France. I'm currently using your recent version of ManifoldCF, 2.8. I'm working on an internal search engine. For this reason, I'm using ManifoldCF to index documents on Windows shares. I encountered a serious problem while crawling 35K documents. Most of the time, when ManifoldCF starts crawling a large document (19 MB, for example), it ends the job with the following error: repeated service interruptions - failure processing document : software caused connection abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to solve this problem, please?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and elasticsearch 2.1.0.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
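
A note on the java.lang.NoClassDefFoundError for org/apache/poi/POIXMLTypeLoader reported in the thread: one way to check which jar directory actually holds a class is to scan the jars for its entry. The following is only a minimal sketch and is not part of ManifoldCF. The class name FindClassInJars and the method jarsContaining are made up for illustration, and the sketch assumes it is run from a directory that contains both lib and connector-common-lib.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper (not a ManifoldCF class): lists the jars in a
// directory that contain a given class.
public class FindClassInJars {

    // Returns the sorted file names of the jars directly inside `dir`
    // that contain an entry for the fully qualified `className`.
    public static List<String> jarsContaining(File dir, String className) throws IOException {
        // "org.apache.poi.POIXMLTypeLoader" -> "org/apache/poi/POIXMLTypeLoader.class"
        String entry = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return hits; // directory is missing or unreadable
        }
        for (File f : jars) {
            try (JarFile jar = new JarFile(f)) {
                if (jar.getJarEntry(entry) != null) {
                    hits.add(f.getName());
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Default to the class named in the error from the log file.
        String className = args.length > 0 ? args[0] : "org.apache.poi.POIXMLTypeLoader";
        // Assumed layout: both directories sit under the current directory.
        for (String dirName : new String[] {"lib", "connector-common-lib"}) {
            System.out.println(dirName + ": " + jarsContaining(new File(dirName), className));
        }
    }
}
```

If the class shows up only under connector-common-lib while a jar that calls it has been moved to lib, that would be consistent with the dependency chain described above, although the sketch itself cannot confirm the cause.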
