Re: [Lucene.Net] Test case for: possible infinite loop bug in Portuguese snowball stemmer?
Here is a test case:

    string text = @"Califórnia";
    Lucene.Net.Analysis.KeywordTokenizer tokenizer = new KeywordTokenizer(new StringReader(text));
    Lucene.Net.Analysis.Snowball.SnowballFilter stemmer = new Lucene.Net.Analysis.Snowball.SnowballFilter(tokenizer, "Portuguese");
    Lucene.Net.Analysis.Token token;
    while ((token = stemmer.Next()) != null)
    {
        System.Console.WriteLine(token.TermText());
    }

This seems to go into an infinite loop: the call to stemmer.Next() never returns. I am not sure whether this is the only stemmer I am having trouble with, and it happens to us on a near-daily basis.

Thanks,
Bob

On Sep 13, 2011, at 9:37 AM, Robert Stewart wrote:

Are there any known issues with snowball stemmers (Portuguese in particular) going into an infinite loop? I have a recurring problem where IndexWriter locks up in AddDocument and never returns (it has taken up to 3 days before we realize it), requiring a manual kill of the process. From what I can tell so far, it happens only on Portuguese documents, and the stack trace when the thread is aborted is always the same. [stack trace snipped; see the original message below] Is there another list of contrib/snowball issues? I have not been able to reproduce a small test case yet, however. Have there been any such issues with stemmers in the past?

Thanks,
Bob
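To keep that repro from hanging a test runner the way the production indexer hangs, one can run the loop on a background thread and fail after a timeout. A rough sketch using the same Lucene.Net 2.9-era classes as the repro above (the 10-second limit is an arbitrary choice):

    using System;
    using System.IO;
    using System.Threading;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Snowball;

    class HangGuardedRepro
    {
        static void Main()
        {
            Thread worker = new Thread(() =>
            {
                KeywordTokenizer tokenizer = new KeywordTokenizer(new StringReader("Califórnia"));
                SnowballFilter stemmer = new SnowballFilter(tokenizer, "Portuguese");
                Token token;
                while ((token = stemmer.Next()) != null)
                    Console.WriteLine(token.TermText());
            });
            worker.IsBackground = true;   // do not keep the process alive if it hangs
            worker.Start();
            if (!worker.Join(TimeSpan.FromSeconds(10)))
                Console.WriteLine("FAIL: stemmer.Next() did not return within 10 seconds (probable infinite loop)");
        }
    }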
[Lucene.Net] possible infinite loop bug in Portuguese snowball stemmer?
Are there any known issues with snowball stemmers (Portuguese in particular) going into an infinite loop? I have a recurring problem where IndexWriter locks up in AddDocument and never returns (it has taken up to 3 days before we realize it), requiring a manual kill of the process. From what I can tell so far, it happens only on Portuguese documents, and the stack trace when the thread is aborted is always as follows:

System.Threading.ThreadAbortException: Thread was being aborted.
   at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
   at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()

System.SystemException: System.Threading.ThreadAbortException: Thread was being aborted.
   at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
   at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
   at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
   at Lucene.Net.Analysis.TokenStream.IncrementToken()
   at Lucene.Net.Index.DocInverterPerField.ProcessFields(Fieldable[] fields, Int32 count)
   at Lucene.Net.Index.DocFieldProcessorPerThread.ProcessDocument()
   at Lucene.Net.Index.DocumentsWriter.UpdateDocument(Document doc, Analyzer analyzer, Term delTerm)
   at Lucene.Net.Index.IndexWriter.AddDocument(Document doc, Analyzer analyzer)

Is there another list of contrib/snowball issues? I have not been able to reproduce a small test case yet, however. Have there been any such issues with stemmers in the past?

Thanks,
Bob
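The reflection frames in that trace are SnowballFilter dispatching to the generated stemmer's Stem() method, which suggests the loop is inside the generated Portuguese stemmer itself rather than in the filter. A quick way to test that theory is to drive the stemmer class directly; a sketch, where the SF.Snowball.Ext namespace and the SetCurrent/Stem/GetCurrent members are my recollection of the contrib Snowball port, so treat them as assumptions:

    using System;

    class StemmerProbe
    {
        static void Main()
        {
            // Drive the generated stemmer directly, bypassing SnowballFilter's
            // reflection-based dispatch, to see whether Stem() itself hangs.
            SF.Snowball.Ext.PortugueseStemmer stemmer = new SF.Snowball.Ext.PortugueseStemmer();
            stemmer.SetCurrent("Califórnia");   // the word from the repro above
            bool changed = stemmer.Stem();      // if this never returns, the loop is in the stemmer
            Console.WriteLine("{0} -> {1}", changed, stemmer.GetCurrent());
        }
    }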
[Lucene.Net] How to add document to more than one index (but only analyze once)?
Is it possible to add a document to more than one index at the same time, such that the document fields are only analyzed once? For instance, to add a document to both a master index and a smaller near-real-time index. I would like to avoid analyzing document fields more than once, but I don't see whether that is possible at all using the Lucene API.

Thanks,
Bob
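One way this might be done (a sketch I have not tried: it assumes Lucene.Net 2.9's CachingTokenFilter replays its cache after Reset(), and it uses TokenStream-based fields, which are indexed but not stored):

    using System.IO;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;

    class AnalyzeOnce
    {
        static void AddToBoth(IndexWriter masterWriter, IndexWriter nrtWriter, string text)
        {
            Analyzer analyzer = new StandardAnalyzer();
            // Analyze once; cache the tokens so the stream can be replayed.
            CachingTokenFilter cached = new CachingTokenFilter(analyzer.TokenStream("body", new StringReader(text)));

            Document masterDoc = new Document();
            masterDoc.Add(new Field("body", cached));   // pre-analyzed, TokenStream-based field
            masterWriter.AddDocument(masterDoc);        // consumes (and fills) the cache

            cached.Reset();                             // assumption: rewinds to the start of the cache

            Document nrtDoc = new Document();
            nrtDoc.Add(new Field("body", cached));
            nrtWriter.AddDocument(nrtDoc);              // replays the cache; no re-analysis
        }
    }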
Re: [Lucene.Net] How to add document to more than one index (but only analyze once)?
That sounds like a good plan. How will that affect existing merge scheduling? For the master index I use a merge factor of 2.

On Sep 9, 2011, at 11:44 AM, digy digy wrote:

How about indexing the new document(s) in memory using a RAMDirectory, then calling indexWriter.AddIndexesNoOptimize for the NRT master index?

DIGY

On Fri, Sep 9, 2011 at 5:33 PM, Robert Stewart <robert_stew...@epam.com> wrote:

Is it possible to add a document to more than one index at the same time, such that the document fields are only analyzed once? For instance, to add a document to both a master index and a smaller near-real-time index. I would like to avoid analyzing document fields more than once, but I don't see whether that is possible at all using the Lucene API.

Thanks,
Bob
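A rough sketch of that suggestion (untested; writer configuration and error handling omitted, and the analyzer/MaxFieldLength choices are illustrative). Analysis happens once while building the in-memory index; AddIndexesNoOptimize then only copies and merges the already-built segments:

    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    class RamThenMerge
    {
        static void IndexBatch(IndexWriter masterWriter, Document[] docs)
        {
            // 1. Analyze and index the batch once, entirely in memory.
            RAMDirectory ramDir = new RAMDirectory();
            IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
            foreach (Document doc in docs)
                ramWriter.AddDocument(doc);
            ramWriter.Close();

            // 2. Fold the finished segments into the master; this step copies
            //    and merges segment files, it does not re-analyze anything.
            masterWriter.AddIndexesNoOptimize(new Directory[] { ramDir });
            masterWriter.Commit();
        }
    }

The RAMDirectory can double as the near-real-time side in the meantime (open an IndexReader/IndexSearcher over it), and as far as I know AddIndexesNoOptimize honors the target writer's merge policy, so a merge factor of 2 on the master will trigger merges as copied segments accumulate.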
Re: [Lucene.Net] Lucene Steroids
I have built something similar using NTFS hard links, re-using existing local snapshot files, etc. It has been running in production for 3+ years now with more than 100 million docs, and it distributes new snapshots from the master servers every minute. It does not use rsync at all; it only leverages Lucene's unique file names: it copies only files that do not already exist on the slaves, and uses NTFS hard links to bring existing local files into the new snapshot directory. On the masters, it likewise uses NTFS hard links to create a new snapshot of the master index, and the slaves just look for new snapshot directories on the master servers. When a new directory shows up, a slave compares it against its existing local snapshot to see which files are new on the master (or have been deleted by the master), and then copies only the new files. It does not need to send any explicit commit operations, and there is no explicit communication between masters and slaves (slaves just watch a remote directory for new snapshot sub-directories). This has worked great with no problems at all. All of this was built before Solr was available on Windows. Going forward we are transitioning to Java and Solr on Linux (it is just too hard to keep up with improvements otherwise, IMO).

On Jul 6, 2011, at 8:22 PM, Guilherme Balena Versiani wrote:

Hi,

I am working on a derived work of Solr for .NET. The purpose is to obtain a solution similar to the Lucene replication available in Solr, but without the need to port all of the Solr code. There is a SnapShooter, a SnapPuller, and a SnapInstaller. The SnapShooter does similar work as the Solr script. The SnapPuller uses cwRsync to replicate the database between machines, but without storing the snapshot.current.MACHINENAME files on the master, as cwRsync does not support sync with the server. The SnapInstaller tries to substitute the Lucene database files in place -- the Lucene application should use a SteroidsFSDirectory that creates a special SteroidsFSIndexInput that permits renaming files that are in use; after that, the SnapInstaller sends a commit operation through a Windows named pipe to the application to reset its current IndexSearcher instance.

This solution has the suggestive name of Lucene Steroids, and is hosted on BitBucket.org. What is the best way to continue to distribute it? Should I continue to maintain it on BitBucket.org, or should I apply to the Lucene.NET project (I don't know how) to include it in the Contrib modules? The current code is available at http://bitbucket.org/guibv/lucene.steroids. The work is incomplete; the first stable version should be available in the next few days.

Best regards,
Guilherme Balena Versiani.
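For anyone curious what the hard-link snapshot step looks like, here is a stripped-down sketch (the production code does more bookkeeping; CreateHardLink is the Win32 API reached via P/Invoke, and the paths and naming scheme are illustrative):

    using System;
    using System.IO;
    using System.Runtime.InteropServices;

    class SnapshotViaHardLinks
    {
        [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        static extern bool CreateHardLink(string newFileName, string existingFileName, IntPtr securityAttributes);

        // Create a new snapshot directory whose files are NTFS hard links to
        // the live index files: near-instant, and no extra disk space is used
        // for files that have not changed.
        static void Snapshot(string indexDir, string snapshotRoot)
        {
            string snapDir = Path.Combine(snapshotRoot, "snapshot." + DateTime.UtcNow.ToString("yyyyMMddHHmmss"));
            Directory.CreateDirectory(snapDir);
            foreach (string file in Directory.GetFiles(indexDir))
            {
                string target = Path.Combine(snapDir, Path.GetFileName(file));
                if (!CreateHardLink(target, file, IntPtr.Zero))
                    throw new IOException("CreateHardLink failed for " + file);
            }
        }
    }

Because Lucene never rewrites a committed segment file, hard links are safe here: a slave that sees a file name it already has can link its local copy instead of copying over the network.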
[Lucene.Net] alternatives to FSDirectory for multi-threaded search performance
What are the recommended best practices for using FSDirectory vs. RAMDirectory, etc. for multi-threaded search? In a previous version of Lucene.Net (1.9) I used a modified FSDirectory implementation that kept a pool of open FileStream objects for each segment file and handed them out in round-robin fashion from the Clone() method. That way multiple threads could read most segment files in parallel, and it increased multi-threaded search performance quite a bit. My indexes are quite large (100+ million docs), so I cannot load entire segments into RAM using RAMDirectory. My question is: what is the best practice here? Is using a pool of descriptors as described above still the best idea?

Thanks,
Bob
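As a rough illustration of that 1.9-era approach (heavily simplified; the real code plugged into FSDirectory's IndexInput Clone(), which is omitted here), the pool itself is just round-robin over N read-only streams per file:

    using System.IO;
    using System.Threading;

    // A minimal round-robin pool of read-only FileStreams over one segment
    // file, so concurrent readers don't serialize on a single descriptor.
    class FileStreamPool
    {
        private readonly FileStream[] streams;
        private int next = -1;

        public FileStreamPool(string path, int size)
        {
            streams = new FileStream[size];
            for (int i = 0; i < size; i++)
                streams[i] = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
        }

        // Hand out descriptors round-robin; callers must still synchronize
        // Seek+Read on the stream they receive (one clone per search thread).
        public FileStream Acquire()
        {
            int i = Interlocked.Increment(ref next);
            return streams[((i % streams.Length) + streams.Length) % streams.Length];
        }
    }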
[Lucene.Net] Score(collector) called for each subReader - but not what I need
As I previously tried to explain, I have a custom query for some pre-cached terms, which I load into RAM in an efficient compressed form. I need this for faster searching and also for much faster faceting. So what I do is process the incoming query and replace certain sub-queries with my own CachedTermQuery objects, which extend Query. Since these are not per-segment, I only want scorer.Score(collector) called once, not once for each segment in my index. Essentially, what happens now is that a search collects the same documents N times, once for each segment. Is there any way to combine different Scorers/Collectors such that I can control when collection enumerates multiple sub-readers and when it does not? This all worked in a previous version of Lucene because enumerating the sub-indexes (segments) was pushed to a lower level inside the Lucene API; now it is elevated to a higher level.

Thanks,
Bob

On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote:

I found the problem. I have a custom query optimizer that replaces certain TermQuerys within a Boolean query with a custom Query, and this query has its own weight/scorer that retrieves matching documents from an in-memory cache (which is not Lucene-backed). But it looks like my custom hit collectors are now wrapped in a HitCollectorWrapper, which assumes Collect() needs to be called for multiple segments - so it is adding a start offset to the doc IDs that come from my custom query implementation. I looked at the new Collector class, and it seems to work the same way (it assumes it needs to set the next index reader with some offset). How can I make my custom query work with the new API, so that there is basically a single segment in RAM that my query uses, while other query clauses in the same Boolean query still use multiple Lucene segments? I am sure that is not clear, and I will try to provide more detail soon.

Thanks,
Bob

On Jun 9, 2011, at 1:48 PM, Digy wrote:

Sorry, no idea. Maybe optimizing the index with 2.9.2 can help to detect the problem.

DIGY

-----Original Message-----
From: Robert Stewart [mailto:robert_stew...@epam.com]
Sent: Thursday, June 09, 2011 8:40 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?

I tried converting the index using IndexWriter as follows: [...]
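For reference, the per-segment contract at issue: in the 2.9-style API, SetNextReader hands the collector each segment's docBase and Collect() receives segment-relative IDs, so a collector that needs index-wide IDs has to add the base back itself. A minimal sketch (method shapes per the 2.9-era Collector class):

    using Lucene.Net.Index;
    using Lucene.Net.Search;

    class GlobalIdCollector : Collector
    {
        private int docBase;

        public override void SetScorer(Scorer scorer)
        {
            // scoring not needed for this sketch
        }

        public override void SetNextReader(IndexReader reader, int docBase)
        {
            this.docBase = docBase;   // this segment's offset into the composite index
        }

        public override void Collect(int doc)
        {
            int globalDoc = docBase + doc;   // index-wide ID; always < IndexReader.MaxDoc()
            // ... record globalDoc, look up date-range fields, etc. ...
        }

        public override bool AcceptsDocsOutOfOrder()
        {
            return true;
        }
    }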
Re: [Lucene.Net] Score(collector) called for each subReader - but not what I need
No, I will try it though. Thanks.

Bob

On Jun 10, 2011, at 12:37 PM, Digy wrote:

Have you tried using Lucene.Net as-is before working on optimizing your code? There are a lot of speed improvements in it since 1.9. There is also a Faceted Search project in contrib (https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search).

DIGY

-----Original Message-----
From: Robert Stewart [mailto:robert_stew...@epam.com]
Sent: Friday, June 10, 2011 7:14 PM
To: lucene-net-...@lucene.apache.org
Subject: [Lucene.Net] Score(collector) called for each subReader - but not what I need

As I previously tried to explain, I have a custom query for some pre-cached terms, which I load into RAM in an efficient compressed form. [...]
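For what it's worth, the contrib project DIGY mentions exposes a small API along these lines. This is a sketch from memory, so treat every class and member name as an assumption and check the linked wiki page:

    using Lucene.Net.Index;
    using Lucene.Net.Search;

    class FacetedSearchSketch
    {
        static void Run(IndexReader reader)
        {
            // Facet over the "category" field; field and term names are illustrative.
            SimpleFacetedSearch sfs = new SimpleFacetedSearch(reader, new string[] { "category" });
            Query query = new TermQuery(new Term("body", "lucene"));
            SimpleFacetedSearch.Hits hits = sfs.Search(query, 10);
            foreach (SimpleFacetedSearch.HitsPerFacet facet in hits.HitsPerFacet)
                System.Console.WriteLine(facet.Name + ": " + facet.HitCount);
        }
    }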
[Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with larger values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not, is there some way I can convert it? (In production we have many indexes containing about 200 million docs, so I'd rather convert existing indexes than rebuild them.)

Thanks,
Bob
[Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with larger values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not (and I assume it is not), is there some way I can convert existing indexes? (In production we have many indexes containing about 200 million docs, so I'd much rather convert existing indexes than rebuild them.)

Thanks,
Bob
Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I tried converting the index using IndexWriter as follows:

    Lucene.Net.Index.IndexWriter writer = new IndexWriter(TestIndexPath + "_2.9", new Lucene.Net.Analysis.KeywordAnalyzer());
    writer.SetMaxBufferedDocs(2);
    writer.SetMaxMergeDocs(100);
    writer.SetMergeFactor(2);
    writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] {
        new Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) });
    writer.Commit();

That seems to work (I get what looks like a valid index directory, at least). But when I run some tests using IndexSearcher I still get the same problem (I get documents in Collect() which are larger than IndexReader.MaxDoc()). Any idea what the problem could be? BTW, this is a problem because I look up some fields (date ranges, etc.) in custom collectors which filter out documents, and that code assumes I don't get any documents larger than maxDoc.

Thanks,
Bob

On Jun 9, 2011, at 12:37 PM, Digy wrote:

One more point: some write operations using Lucene.Net 2.9.2 (add, delete, optimize, etc.) automatically upgrade your index to 2.9.2. But if your index is somehow corrupted (e.g., due to some bug in 1.9), this may result in data loss.

DIGY

-----Original Message-----
From: Robert Stewart [mailto:robert_stew...@epam.com]
Sent: Thursday, June 09, 2011 7:06 PM
To: lucene-net-...@lucene.apache.org
Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?

I have a Lucene index created with Lucene.Net 1.9. [...]
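One more diagnostic that might help rule out corruption in the converted index. This is a sketch: the awkward CheckIndex_Renamed_Method name and the lowercase Status.clean field are how I recall the 2.9-era port, so verify against your binaries:

    using System;
    using System.IO;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    class VerifyIndex
    {
        static void Main(string[] args)
        {
            // args[0]: path to the converted index directory
            FSDirectory dir = FSDirectory.Open(new DirectoryInfo(args[0]));
            CheckIndex checker = new CheckIndex(dir);
            CheckIndex.Status status = checker.CheckIndex_Renamed_Method();
            Console.WriteLine(status.clean ? "index is clean" : "index has problems");
        }
    }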
Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I found the problem. I have a custom query optimizer that replaces certain TermQuerys within a Boolean query with a custom Query, and this query has its own weight/scorer that retrieves matching documents from an in-memory cache (which is not Lucene-backed). But it looks like my custom hit collectors are now wrapped in a HitCollectorWrapper, which assumes Collect() needs to be called for multiple segments - so it is adding a start offset to the doc IDs that come from my custom query implementation. I looked at the new Collector class, and it seems to work the same way (it assumes it needs to set the next index reader with some offset). How can I make my custom query work with the new API, so that there is basically a single segment in RAM that my query uses, while other query clauses in the same Boolean query still use multiple Lucene segments? I am sure that is not clear, and I will try to provide more detail soon.

Thanks,
Bob

On Jun 9, 2011, at 1:48 PM, Digy wrote:

Sorry, no idea. Maybe optimizing the index with 2.9.2 can help to detect the problem.

DIGY

-----Original Message-----
From: Robert Stewart [mailto:robert_stew...@epam.com]
Sent: Thursday, June 09, 2011 8:40 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?

I tried converting the index using IndexWriter as follows: [...]
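One way to reconcile a RAM cache of index-wide doc IDs with the per-segment API (a sketch, not tested): compute each segment reader's starting offset once, have the custom Weight's Scorer emit only the cached IDs that fall inside the current segment shifted down by that offset, and let the Collector's docBase add the offset back, so each hit is collected exactly once. The offset map, using the 2.9 GetSequentialSubReaders()/MaxDoc() API:

    using System.Collections.Generic;
    using Lucene.Net.Index;

    static class DocBaseMap
    {
        // Map each segment reader to its starting offset (docBase) in the
        // composite index.
        public static Dictionary<IndexReader, int> Build(IndexReader topReader)
        {
            Dictionary<IndexReader, int> map = new Dictionary<IndexReader, int>();
            IndexReader[] subs = topReader.GetSequentialSubReaders();
            if (subs == null)
            {
                map[topReader] = 0;   // not composite: a single segment at base 0
                return map;
            }
            int docBase = 0;
            foreach (IndexReader sub in subs)
            {
                map[sub] = docBase;
                docBase += sub.MaxDoc();
            }
            return map;
        }
    }

With this in hand, a custom Weight built against the top-level searcher can look up the docBase for whichever segment reader is passed to Scorer(), emit cached IDs in [docBase, docBase + reader.MaxDoc()) as segment-relative values, and coexist with ordinary per-segment clauses in the same BooleanQuery.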