Thanks Nick for your reply. Thanks Peter too.

> This looks like two processes are writing to the index at once. This shouldn't
> happen unless something with our locking mechanism is broken. Do you have an
> unusual setup? Are you perhaps running on NFS?

Yes, I have an unusual setup. Let me try to explain it.

* My application is a test application that runs many test cases in parallel, which generates a lot of log files. I'm using Lucy to index those log files for faster search, pagination, and summary generation.
* From my application I kick off a lucyindexer script using Open3, which is primarily responsible for indexing all the files while the tests are in progress. The output and errors of lucyindexer go to STDOUT, which is redirected to a log file.
* My application generates log files from 4 different sources. Information about newly created log files and completed files is stored in 4 different tables in our database.
* In my lucyindexer main module I use EV watchers. To monitor the tables I use EV::periodic, every 5 sec for new entries and every 10 sec for file completion; EV::stat with a 1 sec interval for file changes (in practice this behaves just like a periodic, since EV::stat won't work on NFS); and EV::IO to detect the broken pipe, so that the indexer process exits once my test application ends.
* With each watcher, when I get a new log file, it goes through the following workflow. It scans the file for a very limited set of keywords, opening the file, reading it line by line, and creating Lucy docs based on defined regular expressions. If it got the end time from the db, it inserts another special doc, an end marker, indicating the end of the file. The file is removed from my list after the end marker is added. The end marker also stores the last line number and the last seek pointer. If there is no end time for that file, it keeps a 1 sec stat watcher on it for new changes and adds Lucy docs incrementally. For every new file that comes in, I use Lucy search to check whether that file was opened previously. If I find the file name, I get its last line number and seek pointer from the end marker, delete that doc (the end marker) using the Lucy indexer's delete by query, and start reading the file for further changes. Once that file is closed again, I insert the end marker again.
  Once I get the broken pipe from my parent test application, I keep a buffer of 2 mins to insert end markers for in-process files.
* The index directory for all these files is in the same folder as the files, under the fixed name .lucyindexer/1. There are multiple files in the same folder, but it is rare (I have never seen it) that they conflict when creating docs. I say this because one version of the application is already out and generating docs. However, it has a problem: when the same file is opened again, it re-indexes it in full, which takes time and creates duplicates. That is why I tried adding a search before adding docs for those files. I could also keep them in memory, but since the list of files sometimes grows to 100k (for long-running tests), the system runs out of memory and becomes very slow.
* The indexer and the log files are on an NFS mount.
* I also observed that the EV loop sometimes ends prematurely (without break being called), but I'm not sure whether that is caused by an indexer error. No error is reported when it happens.
* In my Viewer application I run forked LucySearch processes to consolidate data from all the folders. The list of folders sometimes reaches 1000. I tried PolySearcher but did not find it faster than forking.

My Lucy library version is 0.4.2. I have asked my infra team to upgrade, which may take a month or so.

Here is what is happening in parallel most of the time:

Search->Found->delete doc->add doc->commit
add doc->commit

Thanks for reading this far. I'm open to any suggestions. I really like this framework and see a big opportunity for it in my company's internal triaging strategy, linking it with product logs for more effective results.

You guys rock!

Thanks,
Rajiv Gupta

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnho...@aevum.de]
Sent: Wednesday, December 07, 2016 4:46 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 06/12/2016 17:17, Gupta, Rajiv wrote:
> Any idea why I'm getting this error.
>
> Error Invalid path: 'seg_9i/lextemp'
> 20161205 184114 [] [event_check_for_logfile_completion_in_db][FAILED at DB
> Query to check logfile completion][Error Invalid path: 'seg_9i/lextemp'
> 20161205 184114 [] LUCY_Folder_Open_Out_IMP at
> core/Lucy/Store/Folder.c line 119
> 20161205 184114 [] S_lazy_init at core/Lucy/Index/PostingListWriter.c
> line 92
>
>
> In another log file getting different error
>
> Error rename from '<Dir>/.lucyindex/1/schema.temp' to '<Dir>
> /.lucyindex/1/schema_an.json' failed: Invalid argument
> 20161205 174146 [] LUCY_Schema_Write_IMP at core/Lucy/Plan/Schema.c
> line 429
>
> When committing the indexer object.

This looks like two processes are writing to the index at once. This shouldn't happen unless something with our locking mechanism is broken. Do you have an unusual setup? Are you perhaps running on NFS?

> In both the case I'm seeing one common pattern that time is getting skewed up
> in the STDOUT log file by 5-6 hrs before starting the process on this file.
> In actual system time is not changed.

I don't fully understand this paragraph. Can you clarify?

Nick
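
For reference, the resume step in the workflow Rajiv describes (store the last line number and seek pointer in the end marker, then continue from there when the file is opened again) can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written in Python rather than Perl, it makes no Lucy or EV calls, and the function name `read_new_lines` and the `(offset, line_no)` marker tuple are hypothetical stand-ins for whatever the real end-marker doc stores.

```python
import os
import tempfile


def read_new_lines(path, marker=None):
    """Read complete lines appended since `marker`.

    `marker` is a hypothetical (byte_offset, line_number) pair standing in
    for the values kept in the end-marker doc. Returns the new lines as
    (line_number, text) pairs plus the updated marker.
    """
    offset, line_no = marker if marker else (0, 0)
    new_lines = []
    # Binary mode keeps byte offsets exact, so seek() is reliable.
    with open(path, "rb") as fh:
        fh.seek(offset)
        while True:
            raw = fh.readline()
            if not raw.endswith(b"\n"):
                # End of file, or a line the writer has not finished yet.
                # Leave the offset at the start of it and retry next time.
                break
            offset += len(raw)
            line_no += 1
            text = raw.decode("utf-8", "replace").rstrip("\n")
            new_lines.append((line_no, text))
    return new_lines, (offset, line_no)


def demo():
    """Write a log in two steps and read it incrementally."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "test.log")
        with open(log_path, "wb") as fh:
            fh.write(b"first\nsecond\nhalf")
        lines, marker = read_new_lines(log_path)
        with open(log_path, "ab") as fh:
            fh.write(b"-done\nthird\n")
        more, marker = read_new_lines(log_path, marker)
        return lines, more, marker
```

Because the marker only advances past complete lines, a line that is still being written is neither indexed early nor indexed twice. This does not address the concurrent-writer question Nick raises; it only covers avoiding a full re-index when a file reopens.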