Thanks, Peter. For now I'm using copydir. I haven't seen any problems so far,
except that the indexes are not available during the copy, which is expected;
for that I have put in a retry.
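
Roughly, the copy-plus-retry looks like this (a minimal sketch, with
File::Copy::Recursive::dircopy standing in for whatever "copydir" actually is,
and made-up paths):

    use File::Copy::Recursive qw(dircopy);
    use Lucy::Search::IndexSearcher;

    my $src = '/tmp/myindex';          # local index written by the indexer
    my $dst = '/shared/vol/myindex';   # shared location read by the searchers

    dircopy( $src, $dst ) or die "copy failed: $!";

    # Searches retry until the freshly copied index is readable again.
    my $searcher;
    for my $attempt ( 1 .. 5 ) {
        $searcher = eval { Lucy::Search::IndexSearcher->new( index => $dst ) };
        last if $searcher;
        sleep 2;
    }
    die "index still not readable after retries: $@" unless $searcher;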

-----Original Message-----
From: pek...@gmail.com [mailto:pek...@gmail.com] On Behalf Of Peter Karman
Sent: Wednesday, January 04, 2017 8:31 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c 
line 119

I use rsync to copy indexes from one machine to another. Copy probably works 
too.
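
Something along these lines, for example (a sketch; the hostname and paths are
placeholders):

    # Mirror the finished local index to the search host.
    system( 'rsync', '-a', '--delete',
            '/tmp/myindex/', 'searchhost:/shared/vol/myindex/' ) == 0
        or die "rsync failed: $?";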

Another approach is to have a single indexer and some kind of queue, so that
separate worker machines can push documents-to-be-indexed to the queue and the
indexer runs periodically to ingest them. Same idea, but performance may vary
depending on the number of workers and the frequency of updates.
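
A rough sketch of that pattern, assuming a simple directory-based queue of
JSON documents (the schema, field names, and paths are invented for
illustration):

    use JSON::PP qw(decode_json);
    use Lucy::Plan::Schema;
    use Lucy::Plan::FullTextType;
    use Lucy::Analysis::EasyAnalyzer;
    use Lucy::Index::Indexer;

    my $queue_dir = '/shared/queue';      # workers drop one JSON file per doc
    my $index_dir = '/shared/myindex';

    my $schema   = Lucy::Plan::Schema->new;
    my $analyzer = Lucy::Analysis::EasyAnalyzer->new( language => 'en' );
    my $type     = Lucy::Plan::FullTextType->new( analyzer => $analyzer );
    $schema->spec_field( name => $_, type => $type ) for qw( title content );

    # Run periodically (e.g. from cron): ingest whatever the workers queued.
    my $indexer = Lucy::Index::Indexer->new(
        schema => $schema,
        index  => $index_dir,
        create => 1,
    );
    my @queued = glob "$queue_dir/*.json";
    for my $file (@queued) {
        open my $fh, '<', $file or next;
        local $/;                         # slurp mode
        $indexer->add_doc( decode_json(<$fh>) );
    }
    $indexer->commit;
    unlink @queued;    # drop queue entries only after a successful commit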

On Wed, Jan 4, 2017 at 8:22 AM, Gupta, Rajiv <rajiv.gu...@netapp.com> wrote:

> I think you may not have liked the approach :(
>
> However, I tried it and it seems to be working fine. I ran 20+ big runs
> and they all went through.
>
> Just checking: should I use a raw copy, or is there a better way to copy
> indexes without losing any in-transit data, such as
> $indexer->add_index($index)?
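>
> (For context, the add_index option I meant is roughly this, as a sketch with
> made-up paths, assuming $schema is the same Schema used for the local index:
> build each run's index locally, then fold it into the shared index instead
> of copying raw files.)
>
>     use Lucy::Index::Indexer;
>
>     my $indexer = Lucy::Index::Indexer->new(
>         schema => $schema,                 # same schema as the local index
>         index  => '/shared/vol/myindex',   # made-up shared path
>     );
>     $indexer->add_index('/tmp/run123/index');   # per-run local index
>     $indexer->commit;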
>
> Thanks,
> Rajiv Gupta
>
> -----Original Message-----
> From: Gupta, Rajiv [mailto:rajiv.gu...@netapp.com]
> Sent: Monday, January 02, 2017 7:47 PM
> To: user@lucy.apache.org
> Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at 
> core/Lucy/Store/Folder.c line 119
>
> Until now we were under the impression given by
> http://lucene.472066.n3.nabble.com/lucy-user-Parallel-indexing-in-same-index-dir-using-Lucy-td4160395.html
> and so were avoiding any kind of parallel indexing.
>
> Let me know your thoughts on this approach: run all indexing in parallel,
> save the indexes at /tmp (a local filesystem location), and periodically
> copy them to the shared location. The copy is needed because the servers
> where I'm performing searches need access to the indexes. Insertion will
> happen only from one server, but searches can be performed from different
> servers using the indexed data.
>
> -Rajiv
>
> -----Original Message-----
> From: Nick Wellnhofer [mailto:wellnho...@aevum.de]
> Sent: Monday, December 19, 2016 7:09 PM
> To: user@lucy.apache.org
> Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at 
> core/Lucy/Store/Folder.c line 119
>
> On 19/12/2016 04:21, Gupta, Rajiv wrote:
> > Rajiv>>> All parallel processes are child processes of one process and
> > run on the same host. Do you think giving the host name uniqueness with
> > some random number would help for multiple processes?
>
> If you access an index on a shared volume only from a single host, 
> there's actually no need to set a hostname at all, although it's good 
> practice.
> It's all explained in Lucy::Docs::FileLocking:
>
>      http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html
>
> But you should never use different or even random `host` values on the 
> same machine. This can lead to stale lock files not being deleted 
> after a crash.
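>
> A minimal sketch of what that looks like (the index path is just an example,
> and $schema is your Schema object):
>
>     use Sys::Hostname;
>     use Lucy::Index::IndexManager;
>     use Lucy::Index::Indexer;
>
>     # One stable identifier per machine, never a random value.
>     my $manager = Lucy::Index::IndexManager->new( host => hostname() );
>     my $indexer = Lucy::Index::Indexer->new(
>         index   => '/shared/vol/myindex',
>         schema  => $schema,
>         manager => $manager,
>     );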
>
> > Rajiv>>> Going to a local file system is not possible in my case. This is
> > a test framework that generates a lot of logs, and I'm indexing per test
> > run; all these logs need to be on a shared volume for other triaging
> > purposes.
>
> It doesn't matter where the log files are. I'm talking about the 
> location of your Lucy index directory.
>
> > The next thing I'm going to try is to create a watcher per directory and
> > index all files under that directory serially. Currently I'm creating
> > watchers on all the files, and sometimes multiple files in the same
> > directory may get indexed at the same time. As you stated, this might be
> > the issue. I'm not sure how it will perform within the current time limits.
>
> By Lucy's design, indexing files in parallel shouldn't cause any problems,
> especially if it all happens on a single machine. The worst that could
> happen is lock errors, which can be addressed by changing timeouts or
> retrying. But without code to reproduce the problem, I can't tell whether
> it's a Lucy bug.
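>
> For example, something like this (a sketch; the timeout values are
> arbitrary):
>
>     use Sys::Hostname;
>     use Lucy::Index::IndexManager;
>
>     my $manager = Lucy::Index::IndexManager->new( host => hostname() );
>     $manager->set_write_lock_timeout(5000);    # ms to wait for the write lock
>     $manager->set_write_lock_interval(100);    # ms between attempts
>
>     # Pass $manager to Lucy::Index::Indexer->new( manager => $manager, ... )
>     # and wrap indexing in an eval/retry loop if it still times out.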
>
> If you can't provide a test case, it's a good idea to test whether the 
> problems are caused by parallel indexing at all. I'd also try to move 
> your indices to a local file system to see whether it makes a difference.
>
> > Creating an IndexManager adds overhead to the search process.
>
> You only have to use IndexManagers for searchers to avoid errors like
> "Stale NFS filehandle". If you have another way to handle such errors,
> there might be no need for IndexManagers at all. Again, see
> Lucy::Docs::FileLocking.
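>
> In the spirit of the FileLocking docs, that looks roughly like this
> (a sketch; the path is an example):
>
>     use Sys::Hostname;
>     use Lucy::Index::IndexManager;
>     use Lucy::Index::IndexReader;
>     use Lucy::Search::IndexSearcher;
>
>     my $manager = Lucy::Index::IndexManager->new( host => hostname() );
>     my $reader  = Lucy::Index::IndexReader->open(
>         index   => '/shared/vol/myindex',
>         manager => $manager,
>     );
>     my $searcher = Lucy::Search::IndexSearcher->new( index => $reader );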
>
> Nick
>
>


--
Peter Karman . pe...@peknet.com . http://peknet.com/
