Thanks Nick for your reply. Thanks Peter too. 

> This looks like two processes are writing to the index at once. This
> shouldn't happen unless something with our locking mechanism is broken. Do
> you have an unusual setup? Are you perhaps running on NFS?

Yes, I have an unusual setup. Let me try to explain it.

* My application is a test application that runs many test cases in
parallel, which generates a lot of log files. I'm using Lucy to index those
log files for faster search, pagination, and summary generation.
* From my application I kick off a lucyindexer script using Open3; it is
primarily responsible for indexing all the files while the tests are in
progress. The output and errors of lucyindexer go to STDOUT, which is
redirected to a log file.
* My application generates log files from 4 different sources. Information
about newly created log files and about files that have ended (their end
times) is stored in 4 different tables in our database.
* In my lucyindexer main module I use EV watchers. To monitor the tables I
use EV::periodic watchers: one every 5 seconds for new entries and one every
10 seconds for file completion. I also use an EV::stat watcher (1 second)
for file changes (though it effectively behaves like another periodic
watcher, since EV::stat won't work on NFS), and an EV::io watcher to detect
the broken pipe so that the indexer process exits once my test application
ends. (A rough sketch of the Open3 launch and these watchers follows this
list.)
* With each watcher, when I get a new log file, it follows this workflow:
I open the file and read it line by line, scanning for a very limited set
of keywords, and create Lucy docs based on defined regular expressions. If
the end time for that file is already in the DB, I insert another special
doc, an end marker, indicating the end of the file; the end marker also
stores the last line number and the last seek pointer. The file is removed
from my list after the end marker is added. If no end time is available
yet, I keep a 1-second stat watcher on the file for new changes and add
Lucy docs incrementally. For every new incoming file I use a Lucy search to
check whether that file was opened previously; if I find the file name, I
get its last line number and seek pointer from the end marker, delete that
end-marker doc using the indexer's delete-by-query, and start reading the
file for further changes. Once the file is closed again, I re-insert the
end marker. Once the pipe from my parent test application breaks, I allow a
2-minute buffer to insert end markers for the files still in progress.
(This flow is sketched below, after the parallel sequence.)
* The index directory for all these files is under the same folder, named
.lucyindexer/1 (I fixed it). There are multiple files indexed in the same
folder, but it is rare (I have never seen it) that they conflict while
creating docs. I say this because one version of the application is already
out and is generating the docs; however, it has the problem that when the
same file is opened again it is re-indexed in full, which takes time and
creates duplicates. That is the reason I tried to insert a search before
adding docs for those files. I could also keep the file list in memory, but
since the list sometimes grows to 100k files (for long-running tests) the
system runs out of memory and becomes very slow.
* Indexer and log files are on NFS mount.
* I also observed that EV sometimes ends prematurely (without break being
called), but I'm not sure whether that is because of an indexer error. No
error is reported when that happens.
* In my viewer application I run forked Lucy searches to consolidate data
from all the folders. The list of folders sometimes reaches 1000. I tried
PolySearcher but did not find it faster than forking. (A PolySearcher
sketch follows this list for reference.)
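
To make this more concrete, here is a rough sketch of the Open3 launch and
the EV watchers described above. The script name, paths, and the
check_db_* / index_new_lines_incrementally callbacks are placeholders, not
my real code; the open3 call runs in the test application and the watchers
run inside lucyindexer:

    use strict;
    use warnings;
    use IPC::Open3 qw(open3);
    use EV;

    # Parent test application: kick off the indexer via Open3.  Its output
    # and errors go to the parent's STDOUT/STDERR, which the caller
    # redirects to a log file.
    my $pid = open3( my $child_in, '>&STDOUT', '>&STDERR',
                     'perl', 'lucyindexer.pl', '--index-dir', '.lucyindexer/1' );

    # Inside lucyindexer: the watchers, roughly as described above.
    my $new_files_w = EV::periodic 0, 5, 0, sub {
        check_db_for_new_logfiles();          # new entries, every 5s
    };
    my $completion_w = EV::periodic 0, 10, 0, sub {
        check_db_for_completed_logfiles();    # end-of-file times, every 10s
    };
    my $stat_w = EV::stat '/path/to/current.log', 1, sub {
        index_new_lines_incrementally();      # behaves like polling on NFS
    };
    # Watch the pipe from the parent; when it closes, leave the loop so the
    # indexer can flush its end markers and exit.
    my $pipe_w = EV::io \*STDIN, EV::READ, sub {
        my $n = sysread STDIN, my $buf, 4096;
        EV::break unless $n;
    };

    EV::run;

    sub check_db_for_new_logfiles       { }   # stubs standing in for DB code
    sub check_db_for_completed_logfiles { }
    sub index_new_lines_incrementally   { }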
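
And for the viewer side, this is roughly how I tried PolySearcher across
the per-test index folders before falling back to forking. The glob pattern
and the field names (file, line_no, content) are placeholders for my actual
schema:

    use strict;
    use warnings;
    use Lucy::Search::IndexSearcher;
    use Lucy::Search::PolySearcher;

    # One IndexSearcher per test folder (path pattern is a placeholder).
    my @index_dirs = glob '/testruns/*/.lucyindexer/1';
    my @searchers  = map { Lucy::Search::IndexSearcher->new( index => $_ ) }
                     @index_dirs;

    # All folders share the same schema, so borrow it from the first one.
    my $poly = Lucy::Search::PolySearcher->new(
        schema    => $searchers[0]->get_schema,
        searchers => \@searchers,
    );

    my $hits = $poly->hits( query => 'ERROR', num_wanted => 50 );
    while ( my $hit = $hits->next ) {
        printf "%s:%d %s", $hit->{file}, $hit->{line_no}, $hit->{content};
    }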

My Lucy library version is 0.4.2. I have asked my infra team to upgrade,
which may take a month or so.

Here is what is happening in parallel most of the time:

Search->Found->delete doc->add doc->commit
add doc->commit
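
In code, that sequence looks roughly like the sketch below. The index path,
file name, and field names (file, doc_type, line_no, seek_pos, content) are
placeholders for whatever my schema actually defines, so this is only an
illustration of the end-marker handling, not my production code:

    use strict;
    use warnings;
    use Lucy::Search::IndexSearcher;
    use Lucy::Search::ANDQuery;
    use Lucy::Search::TermQuery;
    use Lucy::Index::Indexer;

    my $index_path = '.lucyindexer/1';    # placeholder
    my $file_name  = 'run_42.log';        # placeholder

    # Search: has this file been indexed before?  Look for its end marker.
    my $searcher     = Lucy::Search::IndexSearcher->new( index => $index_path );
    my $marker_query = Lucy::Search::ANDQuery->new(
        children => [
            Lucy::Search::TermQuery->new( field => 'file',     term => $file_name ),
            Lucy::Search::TermQuery->new( field => 'doc_type', term => 'end_marker' ),
        ],
    );
    my $hits = $searcher->hits( query => $marker_query, num_wanted => 1 );

    my ( $last_line, $last_seek ) = ( 0, 0 );
    if ( my $marker = $hits->next ) {
        # Found: resume from the stored line number / seek pointer and
        # delete the old end marker before re-indexing.
        ( $last_line, $last_seek ) = ( $marker->{line_no}, $marker->{seek_pos} );
        my $del = Lucy::Index::Indexer->new( index => $index_path );
        $del->delete_by_query($marker_query);
        $del->commit;
    }

    # Add docs for the new lines, then re-insert the end marker and commit.
    my $indexer = Lucy::Index::Indexer->new( index => $index_path );
    open my $fh, '<', $file_name or die "open $file_name: $!";
    seek $fh, $last_seek, 0;
    my $line_no = $last_line;
    while ( my $line = <$fh> ) {
        $line_no++;
        $indexer->add_doc(
            { file => $file_name, line_no => $line_no, content => $line } );
    }
    $indexer->add_doc( {
        file     => $file_name,
        doc_type => 'end_marker',
        line_no  => $line_no,
        seek_pos => tell($fh),
    } );
    $indexer->commit;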

Thanks for reading this far. I'm open to any suggestions. I really like
this framework and see a big opportunity for my company's internal triaging
strategy by linking it with product logs for more effective results.

You guys rock!

Thanks,
Rajiv Gupta

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnho...@aevum.de] 
Sent: Wednesday, December 07, 2016 4:46 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c 
line 119

On 06/12/2016 17:17, Gupta, Rajiv wrote:
> Any idea why I'm getting this error.
>
> Error Invalid path: 'seg_9i/lextemp'
> 20161205 184114 [] [event_check_for_logfile_completion_in_db][FAILED at DB 
> Query to check logfile completion][Error Invalid path: 'seg_9i/lextemp'
> 20161205 184114 []  LUCY_Folder_Open_Out_IMP at 
> core/Lucy/Store/Folder.c line 119
> 20161205 184114 []  S_lazy_init at core/Lucy/Index/PostingListWriter.c 
> line 92
>
>
> In another log file getting different error
>
> Error rename from '<Dir>/.lucyindex/1/schema.temp' to '<Dir> 
> /.lucyindex/1/schema_an.json' failed: Invalid argument
> 20161205 174146 []  LUCY_Schema_Write_IMP at core/Lucy/Plan/Schema.c 
> line 429
>
> When committing the indexer object.

This looks like two processes are writing to the index at once. This shouldn't 
happen unless something with our locking mechanism is broken. Do you have an 
unusual setup? Are you perhaps running on NFS?

> In both the case I'm seeing one common pattern that time is getting skewed up 
> in the STDOUT log file by 5-6 hrs before starting the process on this file. 
> In actual system time is not changed.

I don't fully understand this paragraph. Can you clarify?

Nick
