I think you may not have liked the approach :(

However, I tried it and it seems to be working fine. I did 20+ big runs and 
they all seem to have gone through.

Just checking: should I use a raw copy, or is there a better way to copy 
indexes without losing any in-transit data, such as $indexer->add_index($index)?
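
For reference, a minimal sketch of the add_index route I have in mind (the 
schema class and paths below are just placeholders for my setup):

    use Lucy::Index::Indexer;

    # Open an indexer on the shared index and merge the local /tmp index into it.
    my $indexer = Lucy::Index::Indexer->new(
        schema => MySchema->new,             # hypothetical schema class
        index  => '/shared/volume/index',    # placeholder path
        create => 1,
    );
    $indexer->add_index('/tmp/local_index'); # placeholder path to the local index
    $indexer->commit;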

Thanks,
Rajiv Gupta

-----Original Message-----
From: Gupta, Rajiv [mailto:rajiv.gu...@netapp.com] 
Sent: Monday, January 02, 2017 7:47 PM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c 
line 119

Until now we were under the impression given by 
http://lucene.472066.n3.nabble.com/lucy-user-Parallel-indexing-in-same-index-dir-using-Lucy-td4160395.html
and so were avoiding any kind of parallel indexing.

Let me know your thoughts on this approach: run all indexing in parallel, 
save the indexes under /tmp (a local filesystem location), and periodically 
copy them to the shared location. The copy is needed because the servers from 
which I perform searches need access to the indexes. Insertion will happen 
only from one server, but searches can be performed from other servers using 
the indexed data. Roughly, each indexing child process would look like the 
sketch below.
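
Untested sketch; the schema fields, paths, and @log_files are placeholders 
for what the test framework actually provides:

    use Lucy::Index::Indexer;

    # Each child process writes to its own index on the local filesystem.
    my $local_index = "/tmp/index_$$";       # per-process index directory
    my $indexer     = Lucy::Index::Indexer->new(
        schema => MySchema->new,             # hypothetical schema class
        index  => $local_index,
        create => 1,
    );

    for my $log_file (@log_files) {          # @log_files comes from the test run
        open my $fh, '<', $log_file or die "Can't open $log_file: $!";
        my $content = do { local $/; <$fh> };    # slurp the log file
        $indexer->add_doc( { path => $log_file, content => $content } );
    }
    $indexer->commit;

    # A single process later merges or copies these local indexes to the
    # shared volume so the search servers can see them.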

-Rajiv

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnho...@aevum.de] 
Sent: Monday, December 19, 2016 7:09 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c 
line 119

On 19/12/2016 04:21, Gupta, Rajiv wrote:
> Rajiv>>> All parallel processes are child processes of one process and run 
> from the same host. Do you think making the host name unique with a random 
> number would help for multiple processes?

If you access an index on a shared volume only from a single host, there's 
actually no need to set a hostname at all, although it's good practice. It's 
all explained in Lucy::Docs::FileLocking:

     http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html

But you should never use different or even random `host` values on the same 
machine. This can lead to stale lock files not being deleted after a crash.
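
For illustration, a minimal sketch of passing one fixed `host` value per 
machine through an IndexManager (the schema class and path are placeholders):

    use Sys::Hostname;
    use Lucy::Index::IndexManager;
    use Lucy::Index::Indexer;

    # One fixed host string per machine -- never a random value.
    my $manager = Lucy::Index::IndexManager->new( host => hostname() );

    my $indexer = Lucy::Index::Indexer->new(
        schema  => MySchema->new,            # hypothetical schema class
        index   => '/path/to/index',         # placeholder path
        manager => $manager,
        create  => 1,
    );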

> Rajiv>>> Going to a local file system is not possible in my case. This is a 
> test framework that generates a lot of logs, I'm indexing per test run, and 
> all of these logs need to be on a shared volume for other triaging purposes.

It doesn't matter where the log files are. I'm talking about the location of 
your Lucy index directory.

> The next thing I'm going to try is creating a watcher per directory and 
> indexing all files under that directory serially. Currently I'm creating 
> watchers on all the files, and sometimes multiple files in the same directory 
> may try to get indexed at the same time. As you stated, this might be the 
> issue. I'm not sure how it will perform within the current time limits.

By Lucy's design, indexing files in parallel shouldn't cause any problems, 
especially if it all happens on a single machine. The worst thing that could 
happen are lock errors which can be addressed by changing timeouts or retrying. 
But without code to reproduce the problem, I can't tell whether it's a Lucy bug.
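
For example, something along these lines (the timeout values are arbitrary, 
and the schema class and paths are placeholders):

    use Sys::Hostname;
    use Lucy::Index::IndexManager;
    use Lucy::Index::Indexer;

    my $manager = Lucy::Index::IndexManager->new( host => hostname() );
    $manager->set_write_lock_timeout(5000);   # wait up to 5000 ms for the write lock
    $manager->set_write_lock_interval(100);   # poll for the lock every 100 ms

    my $log_text = 'log contents here';       # placeholder document content

    # Retry the indexing pass a few times if a lock error still slips through.
    for my $attempt ( 1 .. 3 ) {
        my $ok = eval {
            my $indexer = Lucy::Index::Indexer->new(
                schema  => MySchema->new,     # hypothetical schema class
                index   => '/path/to/index',  # placeholder path
                manager => $manager,
            );
            $indexer->add_doc( { content => $log_text } );   # fields per your schema
            $indexer->commit;
            1;
        };
        last if $ok;
        warn "Indexing attempt $attempt failed: $@";
        sleep 1;
    }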

If you can't provide a test case, it's a good idea to test whether the problems 
are caused by parallel indexing at all. I'd also try to move your indices to a 
local file system to see whether it makes a difference.

> Creating an IndexManager adds overhead to the search process.

You only have to use IndexManagers for searchers to avoid errors like "Stale 
NFS filehandle". If you have another way to handle such errors, there might be 
no need for IndexManagers at all. Again, see Lucy::Docs::FileLocking.
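
For reference, the read-locking pattern from that document looks roughly like 
this (the index path is a placeholder):

    use Sys::Hostname;
    use Lucy::Index::IndexManager;
    use Lucy::Index::IndexReader;
    use Lucy::Search::IndexSearcher;

    my $manager = Lucy::Index::IndexManager->new( host => hostname() );
    my $reader  = Lucy::Index::IndexReader->open(
        index   => '/path/to/index',         # placeholder path
        manager => $manager,
    );
    my $searcher = Lucy::Search::IndexSearcher->new( index => $reader );
    my $hits     = $searcher->hits( query => 'foo' );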

Nick
