Re: Lucene.Net 2.1 build 3 as "Release Candidate"
I have been using it in production for a while now. It seems very solid. If I had a vote I would say let's mark it as final and move on to v2.2. Jeff On Dec 10, 2007 9:47 PM, George Aroush <[EMAIL PROTECTED]> wrote: > Hi folks, > > I just labeled Lucene.Net 2.1 build 3 as "Release Candidate" in SVN. I > also > added it to the "tags". Unless anyone has an objection, I want to label this > as "Final" and start committing the code base for 2.2 into the "trunk". > > Please let me know if you have any objection to this plan. > > Regards, > > -- George > >
Re: .NET 2.0 with Lucene.NET
We initially started running v1.4.3 with the 1.1 framework and have since migrated over to the 2.0 framework without difficulty. There are some differences down in the guts, but nothing that can't be handled in conversion. Our index environment sounds very similar to yours - highly structured and relatively high volatility. In addition, we have a quick turnaround on changes from the original data source (db) being reflected in the search index. I would suggest initially focusing your time on understanding how the file formats (compound vs. non-compound) and parameters like mergefactor and maxmergedocs affect your specific index creation and production. You may have already done this, but I found that efficiency levels changed with structural index changes, i.e. decisions about field population and settings. Depending on your available system resources, I've also noticed considerable performance degradation when an index passes a certain size threshold, i.e. 300MB on the given system I'm working with. (We break our aggregate index out to multiple individual indexes for the best mix of indexing and search performance.) Hope this helps. -- jeff r. On 4/26/06, Rob Tucker <[EMAIL PROTECTED]> wrote: > > Thanks for the quick response George, > > > > I was assuming that 1.4 was .NET 1.1 compliant, is this not the case? > Generally, I've struggled to find information about Lucene.NET support > for .NET. Is there somewhere where I can find it? > > > > We're using Lucene.NET for a slightly unusual searching implementation > where it's holding information about highly structured documents in an > environment where there is the potential for a fairly high degree of > volatility in document contents and lifetimes. The main issue we've > found with regard to efficiency is that adding and removing large numbers > of documents can take a long time; merging and optimising the index is > the particular challenge. 
I've really got a general concern that > switching to .NET 2.0 might affect this efficiency and robustness for any > number of reasons, say in changing the implementation of data > structures, file access classes or plain bugs in the framework! From > experience, I don't tend to believe the MS marketing hype, I've always > hit upgrade problems when changing OS versions, runtimes, etc., and I don't > expect .NET 2.0 to be any different. > > > > In general, we've got great results with Lucene.NET, I'm really looking > for an initial feel for what to expect with the move to 2.0. Is there > information anywhere about the required changes that you've mentioned? > > > > To be honest, I've struggled to find much information about Lucene.NET > on the ASF site, is this the main page?: > http://incubator.apache.org/projects/lucene.net.html If so, the WIKI > seems to be down. > > > > Thanks, > > > > Rob Tucker. > > > > _ > > From: George Aroush [mailto:[EMAIL PROTECTED] > Sent: 26 April 2006 13:42 > To: Rob Tucker > Subject: RE: .NET 2.0 with Lucene.NET > > > > Hi Rob, > > > > With a few changes, you can get Lucene.Net to run on .NET 2.0 -- others > have done it. As you may know, I am finishing off 1.9 which will be > .NET 1.1 compliant. After which, I will be releasing Lucene.Net 2.0 > which will be .NET 2.0 compliant. > > > > Sorry, I don't have numbers to show you if Lucene.Net is faster or more > stable under .NET 2.0. But I am curious, what are your concerns with > Lucene.Net 1.4.3 in regard to robustness and efficiency? > > > > Regards, > > > > -- George > > > > PS: Please subscribe to the Lucene.Net mailing list at ASF and post > questions like those there for the whole community to pitch in on. > > > > _ > > From: Rob Tucker [mailto:[EMAIL PROTECTED] > Sent: Wednesday, April 26, 2006 6:32 AM > To: George Aroush > Subject: .NET 2.0 with Lucene.NET > > Hi, > > > > Do you have any information about how Lucene.Net runs with .NET 2.0? 
We > have a .NET 1.1 project that we'd like to upgrade and would like to get > some initial confidence about how Lucene.NET will run. We're using 1.4 > at the moment. I'm concerned with both robustness and efficiency. > > > > Regards, > > > > Rob Tucker > > > > [EMAIL PROTECTED] > > >
Re:
Hi Ali - Please send these messages to [EMAIL PROTECTED] The dev mailing list is for Lucene.Net developers within Apache, not general developers using Lucene.Net. By parsing the expression, what input/output are you looking for? Can you provide a sample? -- j On 5/3/06, Ali Khawaja <[EMAIL PROTECTED]> wrote: Hi - Can anyone tell me if I can use Lucene as a Boolean expression parser. I need to handle the parsed tree myself. Thanks Ali
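For what it's worth, Lucene's QueryParser does hand back a tree (a BooleanQuery with nested clauses) that callers can walk themselves. As a stdlib-only illustration of the kind of tree involved -- this is not Lucene's actual parser, just a minimal Java sketch of a recursive-descent parser for AND/OR expressions with parentheses, where a prefix string stands in for a real node tree:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal recursive-descent parser for boolean expressions like
// "a AND (b OR c)". NOT Lucene's QueryParser -- just a sketch of the
// kind of parse tree Lucene hands back as a BooleanQuery, which
// callers can then walk themselves.
public class BoolParser {
    private final List<String> tokens;
    private int pos = 0;

    public BoolParser(String input) {
        tokens = new ArrayList<>(Arrays.asList(
            input.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")));
    }

    // expr := term (OR term)*
    public String parseExpr() {
        String left = parseTerm();
        while (pos < tokens.size() && tokens.get(pos).equals("OR")) {
            pos++;
            left = "OR(" + left + "," + parseTerm() + ")";
        }
        return left;
    }

    // term := factor (AND factor)*
    private String parseTerm() {
        String left = parseFactor();
        while (pos < tokens.size() && tokens.get(pos).equals("AND")) {
            pos++;
            left = "AND(" + left + "," + parseFactor() + ")";
        }
        return left;
    }

    // factor := WORD | '(' expr ')'
    private String parseFactor() {
        String t = tokens.get(pos++);
        if (t.equals("(")) {
            String inner = parseExpr();
            pos++; // consume ')'
            return inner;
        }
        return t;
    }

    public static void main(String[] args) {
        // prints AND(a,OR(b,c))
        System.out.println(new BoolParser("a AND (b OR c)").parseExpr());
    }
}
```

The same walk works on a real BooleanQuery: recurse into each clause and handle the leaves yourself.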
Re: Compression Implementation
Looking at this from a bit broader perspective, this opens up a bigger conversation. While working to implement a third-party hook-by-reflection process into the code, the .NET 2.0 framework already contains the appropriate classes to handle compression. While there's a need for .NET 1.1 compliance, doing so with a round-about method seems more like an exception approach vs. a standard approach. I don't mean to suggest that usage for the 1.1 Framework be abandoned; I'm sure there is greater 1.1 usage out in the world as opposed to 2.0. However, jumping through hoops to support 1.1 is also just a stopgap. I know there is a plan to move to the 2.0 Framework later on when the java-based Lucene project hits its 2.0 definition. Would it be worthwhile to consider a side-by-side port to the 2.0 Framework? I ported 1.4.3 to the 2.0 Framework myself last winter, and it has changed a few underlying things as well as improved several core classes. Having used the 2.0 Framework for the past 6 months, I would strongly suggest we consider this as a possible solution. Thoughts? -- j On 5/11/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Johnny, I have to keep Lucene.Net 1.9 .NET 1.1 compliant. Since .NET 1.1 doesn't have a compression API, I couldn't implement this port -- thus, I left it out. My idea on how to resolve this is to use reflection and through reflection, one can integrate a 3rd party compression into Lucene.Net 1.9. If you want to take on this part, please do and submit your code. Your effort will be more than welcome and is a path to becoming a committer for Lucene.Net. Regards, -- George Aroush -----Original Message----- From: J C [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 10, 2006 7:51 PM To: lucene-net-dev@incubator.apache.org Subject: Compression Implementation Importance: High Hello George I have found this: // {{Aroush-1.9}} for .NET 1.1, we can use reflection and ZLib? in FieldsWriter.cs. It seems that the ZIP compression is not yet implemented. 
I would like to give it a try. Please confirm. Regards Johnny _ Be the one of the first to try the NEW Windows Live Mail. http://ideas.live.com/programPage.aspx?versionId=5d21c51a-b161-4314-9b0e-491 1fb2b2e6d
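The reflection hook George describes can be sketched in a few lines. The class and method names below are hypothetical stand-ins, not the Lucene.Net code (and this is Java with java.util.zip standing in for a 3rd-party compressor), but the shape is the point: the core library holds only a class name, so there is no compile-time dependency on any compression library:

```java
import java.io.ByteArrayOutputStream;
import java.lang.reflect.Method;
import java.util.zip.Deflater;

// Sketch of the reflection hook discussed in the thread: the core
// library never references the compressor type at compile time; it
// loads a class by name and invokes compress(byte[]) reflectively.
// Class and method names here are hypothetical stand-ins.
public class ReflectionCompression {

    // Stand-in for a 3rd-party compressor (java.util.zip here).
    public static class ZlibCompressor {
        public byte[] compress(byte[] input) {
            Deflater d = new Deflater();
            d.setInput(input);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!d.finished()) {
                out.write(buf, 0, d.deflate(buf));
            }
            d.end();
            return out.toByteArray();
        }
    }

    // What the core library would do: no compile-time dependency,
    // just a class name (in practice read from configuration).
    public static byte[] compressVia(String className, byte[] data) {
        try {
            Object compressor = Class.forName(className).getDeclaredConstructor().newInstance();
            Method m = compressor.getClass().getMethod("compress", byte[].class);
            return (byte[]) m.invoke(compressor, (Object) data);
        } catch (Exception e) {
            throw new RuntimeException("compression adapter failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] input = new byte[4096];  // highly compressible zeros
        byte[] out = compressVia("ReflectionCompression$ZlibCompressor", input);
        System.out.println(input.length + " -> " + out.length + " bytes");
    }
}
```

If the configured class is missing, compression simply fails to load -- the library itself still compiles and runs, which is George's constraint.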
Re: Compression Implementation
Does "compatible" equal the ability for a Java implementation of Lucene to open/read/write to an index created in Lucene.Net? On 5/15/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Jeff, We need compression support in Lucene.Net 1.9 using .NET 1.1 otherwise 1.9 can't be declared compatible with its Java-based index. Besides, doing reflection to provide a plug-in solution to a 3rd party compression isn't hard. Eyal already asked if he can work on this part. I said yes but I have not heard back from him yet. Eyal: If you are reading this, please let us know if you are taking on this task or not. Thanks! Regards, -- George Aroush -----Original Message----- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Monday, May 15, 2006 12:32 PM To: lucene-net-dev@incubator.apache.org Subject: Re: Compression Implementation Looking at this from a bit broader perspective, this opens up a bigger conversation. While working to implement a third-party hook-by-reflection process into the code, the .NET 2.0 framework already contains the appropriate classes to handle compression. While there's a need for .NET 1.1 compliance, doing so with a round-about method seems more like an exception approach vs. a standard approach. I don't mean to suggest that usage for the 1.1 Framework be abandoned; I'm sure there is greater 1.1 usage out in the world as opposed to 2.0. However, jumping through hoops to support 1.1 is also just a stopgap. I know there is a plan to move to the 2.0 Framework later on when the java-based Lucene project hits its 2.0 definition. Would it be worthwhile to consider a side-by-side port to the 2.0 Framework? I ported 1.4.3 to the 2.0 Framework myself last winter, and it has changed a few underlying things as well as improved several core classes. Having used the 2.0 Framework for the past 6 months, I would strongly suggest we consider this as a possible solution. Thoughts? 
-- j On 5/11/06, George Aroush <[EMAIL PROTECTED]> wrote: > > Hi Johnny, > > I have to keep Lucene.Net 1.9 .NET 1.1 compliant. Since .NET 1.1 > doesn't have a compression API, I couldn't implement this port -- thus, > I left it out. > > My idea on how to resolve this is to use reflection and through > reflection, one can integrate a 3rd party compression into Lucene.Net > 1.9. If you want to take on this part, please do and submit your > code. Your effort will be more than welcome and is a path to becoming > a committer for Lucene.Net. > > Regards, > > -- George Aroush > > -----Original Message----- > From: J C [mailto:[EMAIL PROTECTED] > Sent: Wednesday, May 10, 2006 7:51 PM > To: lucene-net-dev@incubator.apache.org > Subject: Compression Implementation > Importance: High > > Hello George > > I have found this: > // {{Aroush-1.9}} for .NET 1.1, we can use reflection and ZLib? > in FieldsWriter.cs. It seems that the ZIP compression is not yet > implemented. > > I would like to give it a try. Please confirm. > > Regards > Johnny > >
Re: Compression Implementation
George - thanks for the clarification. -- j On 5/15/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Jeff, Yes, "compatible" does mean the index can be opened/read/written/etc. to whether it was created with Java or C# Lucene. This is already the case with 1.4.x and must remain so for 1.9 and forward. In fact, right now you can have two processes, one Java and another .NET Lucene, with both concurrently accessing the same index as long as they are sharing the same lock file. Regards, -- George Aroush -----Original Message----- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Monday, May 15, 2006 4:43 PM To: lucene-net-dev@incubator.apache.org Subject: Re: Compression Implementation Does "compatible" equal the ability for a Java implementation of Lucene to open/read/write to an index created in Lucene.Net? On 5/15/06, George Aroush <[EMAIL PROTECTED]> wrote: > > Hi Jeff, > > We need compression support in Lucene.Net 1.9 using .NET 1.1 otherwise > 1.9 can't be declared compatible with its Java-based index. Besides, > doing reflection to provide a plug-in solution to a 3rd party > compression isn't hard. > > Eyal already asked if he can work on this part. I said yes but I have > not heard back from him yet. > > Eyal: If you are reading this, please let us know if you are taking on > this task or not. Thanks! > > Regards, > > -- George Aroush > > -----Original Message----- > From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] > Sent: Monday, May 15, 2006 12:32 PM > To: lucene-net-dev@incubator.apache.org > Subject: Re: Compression Implementation > > Looking at this from a bit broader perspective, this opens up a bigger > conversation. > > While working to implement a third-party hook-by-reflection process > into the code, the .NET 2.0 framework already contains the appropriate > classes to handle compression. While there's a need for .NET 1.1 > compliance, doing so with a round-about method seems more like an > exception approach vs. a standard approach. 
> > I don't mean to suggest that usage for the 1.1 Framework be abandoned; > I'm sure there is greater 1.1 usage out in the world as opposed to 2.0. > However, jumping through hoops to support 1.1 is also a stopgap. > I know there is a plan to move to the 2.0 Framework later on when the > java-based Lucene project hits its 2.0 definition. > > Would it be worthwhile to consider a side-by-side port to the > 2.0 Framework? > I ported > 1.4.3 to the 2.0 Framework myself last winter, and it has changed a > few underlying things as well as improved several core classes. > Having used the 2.0 Framework for the past 6 months, I would strongly > suggest we consider this as a possible solution. > > Thoughts? > > -- j > > On 5/11/06, George Aroush <[EMAIL PROTECTED]> wrote: > > > > Hi Johnny, > > > > I have to keep Lucene.Net 1.9 .NET 1.1 compliant. Since .NET 1.1 > > doesn't have a compression API, I couldn't implement this port -- > > thus, I left it out. > > > > My idea on how to resolve this is to use reflection and through > > reflection, one can integrate a 3rd party compression into > > Lucene.Net 1.9. If you want to take on this part, please do and > > submit your code. Your effort will be more than welcome and is a > > path to becoming a committer for Lucene.Net. > > > > Regards, > > > > -- George Aroush > > > > -----Original Message----- > > From: J C [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, May 10, 2006 7:51 PM > > To: lucene-net-dev@incubator.apache.org > > Subject: Compression Implementation > > Importance: High > > > > Hello George > > > > I have found this: > > // {{Aroush-1.9}} for .NET 1.1, we can use reflection and ZLib? > > in FieldsWriter.cs. It seems that the ZIP compression is not yet > > implemented. > > > > I would like to give it a try. Please confirm. > > > > Regards > > Johnny > >
Re: Compression Implementation
I like Eyal's suggestion of keeping the adapter definition to implementing an interface. This would be initiated through a reflection call, yes? I would add that the configuration information could be driven via custom config sections, which I've done a bazillion of lately. If it would help, I'll do the code for custom configuration sections that ensure the requisite data is loaded from the config file in a structured manner. -- j On 5/15/06, Eyal Post <[EMAIL PROTECTED]> wrote: What I was thinking of doing is this: Declare an interface for compression: public interface CompressionAdapter { byte[] Compress(byte[] input); byte[] Uncompress(byte[] input); } Allow users to develop an adapter that implements this interface (i.e. SharpZLibCompressionAdapter). The user then adds the adapter class name to the app.config file and Lucene will dynamically create an instance of that adapter. This means there's no actual dependency from Lucene to any 3rd party library. If the adapter is not configured, compression will not work; if it is, it's the user's responsibility to provide the compression library and an adapter. Eyal > -----Original Message----- > From: George Aroush [mailto:[EMAIL PROTECTED] > Sent: Monday, May 15, 2006 23:48 PM > To: lucene-net-dev@incubator.apache.org > Subject: RE: Compression Implementation > > Hi Eyal, > > First thanks for taking on this task, it's much appreciated. > > The reason why I believe we need a reflection-based solution > is because the current code of Lucene.Net must remain > independent of 3rd party requirements. > For example, if you look at the Test code for Lucene.Net, you > can't compile or run it without having NUnit installed on > your machine and having the code reference the library. If > your solution has a similar requirement, then I don't > think we can accept it for 1.9. > > Reflection seems to me the only way to solve this problem. 
> > After putting together the reflection code in Lucene.Net's > code base, we still have to provide an interface which a user > must code to in order for the compression code to be in > working order and utilized by Lucene.Net. > But because this code is not a physical part of > Lucene.Net, it doesn't put any restriction on Lucene.Net to > require a 3rd party library/code to be present to use > Lucene.Net -- unless the user wants compression. > > Again, thanks for taking on this task. > > Regards, > > -- George Aroush > > -----Original Message----- > From: Eyal Post [mailto:[EMAIL PROTECTED] > Sent: Monday, May 15, 2006 4:24 PM > To: lucene-net-dev@incubator.apache.org > Subject: RE: Compression Implementation > > I'm on it. > Just wondering, why take the reflection way and not the interface way? > The interface way seems more "correct" and will also perform better. > > Eyal > > > > -----Original Message----- > > From: George Aroush [mailto:[EMAIL PROTECTED] > > Sent: Monday, May 15, 2006 21:54 PM > > To: lucene-net-dev@incubator.apache.org > > Subject: RE: Compression Implementation > > > > Hi Jeff, > > > > We need compression support in Lucene.Net 1.9 using .NET > 1.1 otherwise > > 1.9 can't be declared compatible with its Java-based > index. Besides, > > doing reflection to provide a plug-in solution to a 3rd party > > compression isn't hard. > > > > Eyal already asked if he can work on this part. I said yes > but I have > > not heard back from him yet. > > > > Eyal: If you are reading this, please let us know if you > are taking on > > this task or not. Thanks! > > > > Regards, > > > > -- George Aroush > > > > -----Original Message----- > > From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] > > Sent: Monday, May 15, 2006 12:32 PM > > To: lucene-net-dev@incubator.apache.org > > Subject: Re: Compression Implementation > > > > Looking at this from a bit broader perspective, this opens > up a bigger > > conversation. 
> > > > While working to implement a third-party hook-by-reflection process > > into the code, the .NET 2.0 framework already contains the > appropriate > > classes to handle compression. > > While there's a need for .NET 1.1 compliance, doing so with a > > round-about method seems more like an exception approach vs. > > a standard approach. > > > > I don't mean to suggest that usage for the 1.1 Framework be > abandoned; > > I'm sure there is greater 1.1 usage out in the world as opposed to > > 2.0. > > However, jumping through hoops to support 1.1 is also just a stopgap.
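Eyal's adapter design above can be translated into a runnable sketch. This is Java rather than C#, a Properties object stands in for app.config, and all names are illustrative rather than the committed Lucene.Net API -- but it shows the whole mechanism: an interface the library knows, an adapter the user supplies, and a class name read from configuration so there is no hard 3rd-party dependency:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Java sketch of the adapter design from the thread; names are
// illustrative, not the committed Lucene.Net API.
interface CompressionAdapter {
    byte[] compress(byte[] input);
    byte[] uncompress(byte[] input);
}

// The user's adapter; here it wraps java.util.zip rather than a
// 3rd-party library, but the shape is the same.
class DeflateAdapter implements CompressionAdapter {
    public byte[] compress(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }
    public byte[] uncompress(byte[] input) {
        try {
            Inflater inf = new Inflater();
            inf.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            while (!inf.finished()) out.write(buf, 0, inf.inflate(buf));
            inf.end();
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new RuntimeException("corrupt compressed data", e);
        }
    }
}

public class AdapterConfigDemo {
    // The library only knows the interface; the concrete class comes
    // from configuration, so there is no hard 3rd-party dependency.
    public static CompressionAdapter load(Properties config) {
        try {
            String cls = config.getProperty("lucene.compressionAdapter");
            return (CompressionAdapter) Class.forName(cls).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new RuntimeException("could not load compression adapter", e);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("lucene.compressionAdapter", "DeflateAdapter");
        CompressionAdapter a = load(config);
        byte[] round = a.uncompress(a.compress("searchable text".getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(round, StandardCharsets.UTF_8));  // prints "searchable text"
    }
}
```

This is also why Eyal's interface question matters less than it first appears: reflection is only used once, to construct the adapter; every subsequent call goes through the interface at full speed.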
Re: Ranking and Scoring Hits
Hi Ed - For future reference, these questions are intended to be directed to the lucene-net-user mailing list. When iterating on your Hits collection, call Hits.Score() the same way you call Hits.Doc() -- by passing it the index value (int) for your loop iteration. On 5/18/06, Ed Jones <[EMAIL PROTECTED]> wrote: Hi, I've only just downloaded Lucene.net and I've been doing some initial work on it. I've written an indexer and got a test button working to run queries to find the results. So far I like it and it's wonderfully fast. However, I'm trying to return the score for each returned result, mainly so I can tell how relevant result 1 is over result 2. I've looked through the documentation and can't find how to do this. Can anyone give me a pointer? Also, can somebody confirm that the default search results are in relevance order? Thanks Ed
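The loop Jeff describes looks like this. StubHits below is a stand-in with the same length/doc/score shape as Lucene's Hits, so the sketch runs without the Lucene library; with the real class you'd iterate identically. (And yes, by default Lucene returns Hits in descending relevance order.)

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Lucene's Hits so this sketch is self-contained; the
// real class exposes the same length()/doc(i)/score(i) pattern.
class StubHits {
    private final String[] docs = { "result one", "result two", "result three" };
    private final float[] scores = { 0.91f, 0.57f, 0.33f };  // descending relevance

    public int length() { return docs.length; }
    public String doc(int n) { return docs[n]; }
    public float score(int n) { return scores[n]; }
}

public class ScoreLoop {
    public static List<String> listWithScores(StubHits hits) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < hits.length(); i++) {
            // same loop index for the document and its score
            out.add(hits.doc(i) + " (" + hits.score(i) + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : listWithScores(new StubHits())) {
            System.out.println(line);
        }
    }
}
```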
Re: noobie question
Hi Pamela - Performance certainly changes as your index grows, and it's not even necessarily a linear progression. How you indexed your data, compression factors, compound vs. loose file format, number of indexes, etc. all play a part in affecting search performance at runtime. There are a lot of places to look for improvements. I would suggest looking at your specific indexes and see if you can break those up into smaller indexes -- this will lead you to the MultiSearcher (and, if you have multi-processor hardware, ParallelMultiSearcher). Leave your index updating operation out of the picture for the moment. Indexing can have a big impact on search performance, so take that out of the equation. After you're able to get to better runtime search performance, go back and add indexing to the mix. I can tell you from experience that most search systems with indexes of substantial size are executing indexing operations on separate systems to avoid performance impacts. Hope this helps. -- j On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: I have been developing a C# search solution for an application which has tens of millions of web pages. Most of these web pages are under 1 k. While our initial pilot was very encouraging on our tests of 1,000,000 docs, when we scaled up to 10 million subsecond searches are now taking 8-10 seconds. Where should I focus my efforts to increase search speed? Should I be using the RAMDirectory? MultiSearcher? We only have one machine right now which serves indexing and searching. TIA Pam
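What MultiSearcher does conceptually can be sketched with plain collections: run the query against each smaller index and merge the per-index hits into one score-ordered list. The documents and scores below are invented purely for illustration -- with real indexes, MultiSearcher does this merge for you:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch of multi-index searching: each partition returns
// its own hits; the merge re-sorts globally by score, which is the
// job MultiSearcher handles over real Lucene indexes.
public class MergeDemo {
    public static class Hit {
        public final String doc;
        public final double score;
        public Hit(String doc, double score) { this.doc = doc; this.score = score; }
    }

    public static List<Hit> searchAll(List<List<Hit>> partitions) {
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> partition : partitions) {
            merged.addAll(partition);
        }
        merged.sort((a, b) -> Double.compare(b.score, a.score));
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> index1 = Arrays.asList(new Hit("doc-a", 0.9), new Hit("doc-b", 0.4));
        List<Hit> index2 = Arrays.asList(new Hit("doc-c", 0.7));
        for (Hit h : searchAll(Arrays.asList(index1, index2))) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

ParallelMultiSearcher applies the same idea but queries the partitions concurrently, which is where multi-processor hardware pays off.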
Re: noobie question
The Compound file format is the default file format for the index that you create (at least in v1.4.x). When creating an index, you can specify true/false in a constructor that indicates if you wish the index file to be compacted or not. Check out http://lucene.apache.org/java/docs/fileformats.html to understand this better. When your index gets to be of significant size, the file format can become very important. Using the default compound format, searching will tend to be faster (assuming all other things equal) but index updates will be slower; with the non-compound format, searching may be slower but index updates can be faster. There are three other properties that can affect the mix as well: mergefactor, minmergedocs, and maxmergedocs. Tweaking these properties in conjunction with the file format settings grows in importance as your index size increases. Check out the thread at http://www.gossamer-threads.com/lists/lucene/java-user/11999?search_string=minmergedocs;#11999 . -- j On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: Thanks Jeff, I am a little confused by the compound vs loose file format you speak of. We are indexing html docs and indexing 10 metatags. By indexing I mean we index the body, but we also query the properties. I am not sure what the correct definition is. Are you saying that if we were merely indexing the document bodies we would be further ahead? We need to restrict our searches by date, and a few other properties, so it's really important that we be able to do these restrictions. TIA Pam On 5/19/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > Hi Pamela - > > Performance certainly changes as your index grows, and it's not even > necessarily a linear progression. How you indexed your data, compression > factors, compound vs. loose file format, number of indexes, etc. all play > a > part in affecting search performance at runtime. > > There are a lot of places to look for improvements. 
I would suggest > looking > at your specific indexes and see if you can break those up into smaller > indexes -- this will lead you to the MultiSearcher (and, if you have > multi-processor hardware, ParallelMultiSearcher). > > Leave your index updating operation out of the picture for the moment. > Indexing can have a big impact on search performance, so take that out of > the equation. After you're able to get to better runtime search > performance, go back and add indexing to the mix. I can tell you from > experience that most search systems with indexes of substantial size are > executing indexing operations on separate systems to avoid performance > impacts. > > Hope this helps. > > -- j > > > > On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > > > I have been developing a C# search solution for an application which has > > tens of millions of web pages. Most of these web pages are under 1 k. > > > > While our initial pilot was very encouraging on our tests of 1,000,000 > > docs, > > when we scaled up to 10 million subsecond searches are now taking 8-10 > > seconds. > > > > Where should I focus my efforts to increase search speed? Should I be > > using > > the RAMDirectory? MultiSearcher? > > > > We only have one machine right now which serves indexing and searching. > > > > TIA > > > > Pam > > > > > >
Re: noobie question
Yes, the merge parameters do affect indexing performance, but compactness also affects search performance as your index gets larger. As you incrementally update the index, the fragmentation effect (which the merge properties will dictate) causes performance degradation at search time. As for index size, I don't know about any hard and fast rules. We have about 7-8GB of indexes of varying structure, and those are spread out over about 40 indexes. We try to keep individual indexes below 300MB, as the operational hassles after that size seem to be more burdensome. We also use distributed searching so our indexes are allocated across multiple machines (no duplication). As a rule, we also try to stay below 2.5GB of aggregate indexes on one machine. Our indexes are a full corpus and we must search across all indexes all the time. You can structure your indexes more effectively if you don't need to search the full corpus all the time. With multiple indexes being searched collectively, you'll soon be using the MultiSearcher class. Be sure to look at MultiReader, as it makes a difference in search performance (nice caching). -- j On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: Hi Jeff A couple more questions. Don't the merge parameters determine how aggressively the index is compacted? And if so, doesn't this affect only indexing performance and not search performance? Secondly how large should each index be? Should I be partitioning the indexes, ie by date range? So one index for December 2005, one for January, etc? Or is it done by size? TIA Pam On 5/19/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > Hi Pamela - > > Performance certainly changes as your index grows, and it's not even > necessarily a linear progression. How you indexed your data, compression > factors, compound vs. loose file format, number of indexes, etc. all play > a > part in affecting search performance at runtime. > > There are a lot of places to look for improvements. 
I would suggest > looking > at your specific indexes and see if you can break those up into smaller > indexes -- this will lead you to the MultiSearcher (and, if you have > multi-processor hardware, ParallelMultiSearcher). > > Leave your index updating operation out of the picture for the moment. > Indexing can have a big impact on search performance, so take that out of > the equation. After you're able to get to better runtime search > performance, go back and add indexing to the mix. I can tell you from > experience that most search systems with indexes of substantial size are > executing indexing operations on separate systems to avoid performance > impacts. > > Hope this helps. > > -- j > > > > On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > > > I have been developing a C# search solution for an application which has > > tens of millions of web pages. Most of these web pages are under 1 k. > > > > While our initial pilot was very encouraging on our tests of 1,000,000 > > docs, > > when we scaled up to 10 million subsecond searches are now taking 8-10 > > seconds. > > > > Where should I focus my efforts to increase search speed? Should I be > > using > > the RAMDirectory? MultiSearcher? > > > > We only have one machine right now which serves indexing and searching. > > > > TIA > > > > Pam > > > > > >
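The "keep each index below a size threshold" policy from this thread (~300MB in Jeff's setup) can be sketched as a simple size-budgeted planner: assign documents to the current shard until adding one would exceed the byte budget, then start a new shard. The sizes and budget below are invented; real index size depends on your fields, file format, and merge settings:

```java
import java.util.ArrayList;
import java.util.List;

// Size-budgeted shard planner: a sketch of the policy of capping each
// individual index at a byte threshold. Sizes here are illustrative.
public class ShardPlanner {
    // docSizes[i] is the (estimated) on-disk contribution of doc i;
    // returns lists of doc ids, one list per shard.
    public static List<List<Integer>> plan(long[] docSizes, long budgetBytes) {
        List<List<Integer>> shards = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long used = 0;
        for (int i = 0; i < docSizes.length; i++) {
            if (!current.isEmpty() && used + docSizes[i] > budgetBytes) {
                shards.add(current);           // close the full shard
                current = new ArrayList<>();
                used = 0;
            }
            current.add(i);
            used += docSizes[i];
        }
        if (!current.isEmpty()) shards.add(current);
        return shards;
    }

    public static void main(String[] args) {
        long[] sizes = { 120, 100, 90, 200, 50 };
        System.out.println(plan(sizes, 300));  // [[0, 1], [2, 3], [4]]
    }
}
```

In practice the partition key is often logical (date range, region) rather than purely size-based, as the rest of the thread discusses; the size cap then acts as a secondary limit.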
Re: noobie question
Correct on our configuration, give or take a few 100 MB. :-) And we have three servers accessed simultaneously for each search. For our index, we're dealing with information that's geographically defined, so our indexes are broken up along those lines. We still monitor each index for size, but the geographic data drives our index maintenance logic. We've indexed approximately 20 MM rows of information. Our partitioning criteria serves two purposes: query efficiency and index maintainability. Depending on how your index is structured (the Lucene settings + your own document structure), these two can compete with each other to the point of being polar. Generally you'll want to find a happy medium between the two. While we have many rows of data and our index documents contain quite a few fields of data, many of them are simple data fields that aren't large (database is the data source). By contrast, if we were indexing full-on text documents, I'm sure our index would be substantially larger and we'd likely take a different approach. I did a lot of research prior to constructing our index, and with as much feedback and data that I could glean, trial-and-error proved to be the most effective manner in determining what to do and how to do it. -- j On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: OK, I'm very confused here Jeff. It sounds like what you are suggesting is that you have multiple indexes per machine, each around 300 Mbytes, which means about 2.5/.3 = 8 indexes per machine, and you have 7.5/2.5 = 3 machines in the mix. Is this correct? On what criteria do you partition your index? Date, or some other criteria, or is it merely size? I think we have indexed 1 million rows and our index is 7 Gigs. Pam On 5/19/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > Yes, the merge parameters do affect indexing performance, but > compactness > also affects search performance as your index gets larger. 
As you > incrementally update the index, the fragmentation effect (which the merge > properties will dictate) causes performance degradation at search time. > > As for index size, I don't know about any hard and fast rules. We have > about 7-8GB of indexes of varying structure, and those are spread out over > about 40 indexes. We try to keep individual indexes below 300MB, as the > operational hassles after that size seem to be more burdensome. We also > use > distributed searching so our indexes are allocated across multiple > machines > (no duplication). As a rule, we also try to stay below 2.5GB of aggregate > indexes on one machine. Our indexes are a full corpus and we must search > across all indexes all the time. You can structure your indexes more > effectively if you don't need to search the full corpus all the time. > > With multiple indexes being searched collectively, you'll soon be using > the > MultiSearcher class. Be sure to look at MultiReader, as it makes a > difference in search performance (nice caching). > > -- j > > On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > > > Hi Jeff > > > > A couple more questions. Don't the merge parameters determine how > > aggressively the index is compacted? And if so, doesn't this affect only > > indexing performance and not search performance? > > > > Secondly how large should each index be? Should I be partitioning the > > indexes, ie by date range? So one index for Decemeber 2005, one for > > January, > > etc? Or is it done by size? > > > > TIA > > > > Pam > > > > On 5/19/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > > > > > Hi Pamela - > > > > > > Performance certainly changes as your index grows, and it's not even > > > necessarily a linear progression. How you indexed your data, > > compression > > > factors, compound vs. loose file format, number of indexes, etc. all > > play > > > a > > > part in affecting search performance at runtime. 
> > > > > > There are a lot of places to look for improvements. I would suggest > > > looking > > > at your specific indexes and see if you can break those up into > smaller > > > indexes -- this will lead you to the MultiSearcher (and, if you have > > > multi-processor hardware, ParallelMultiSearcher). > > > > > > Leave your index updating operation out of the picture for the moment. > > > Indexing can have a big impact on search performance, so take that out > > of > > > the equation. After you're able to get to better runtime search > > > performance, go back and add indexing to the mix. I can tell you
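Jeff's advice above -- break a large index into smaller ones and search them collectively through MultiSearcher (or ParallelMultiSearcher on multi-processor hardware) -- can be sketched roughly like this against the 1.4-era Lucene.Net API. The index paths and field name are hypothetical, not from the thread:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Open one IndexSearcher per physical index, then search them as a unit.
Searchable[] shards = new Searchable[]
{
    new IndexSearcher(@"D:\indexes\shard-01"),   // hypothetical paths
    new IndexSearcher(@"D:\indexes\shard-02"),
    new IndexSearcher(@"D:\indexes\shard-03"),
};

// MultiSearcher merges hits across all shards; swap in
// ParallelMultiSearcher to search the shards concurrently.
MultiSearcher searcher = new MultiSearcher(shards);

Query query = QueryParser.Parse("lucene", "contents", new StandardAnalyzer());
Hits hits = searcher.Search(query);
System.Console.WriteLine("total hits: " + hits.Length());
searcher.Close();
```

A MultiSearcher result set is indistinguishable to the caller from a single-index search, which is what makes the "many small indexes" strategy transparent to the rest of the application.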
Re: noobie question
- Our index is currently 7 Gigs. I take it we should have more than 7 Gigs of RAM on our machine? Can we get any other hardware specs? IE 2, 4 procs? You can go with big RAM, but I haven't found that to be a huge boost in search perf. We run dual-proc Xeons for our search servers, as CPU has been the bottleneck. Sorts are particularly egregious when it comes to CPU load as well. Bang for the buck, the new dual-core Opterons are *amazingly* strong performers. - Each html doc we have has 10 metatags which we store. Other than date, and a 10 byte string for one of the metatags, the metatags are almost always empty. Will this degrade performance? I would not expect this to degrade your performance. - Also when you suggest we distribute our index, on what criteria do we partition? It looks like we need to optimize our IO for reads which means raid 5 or a solid state ram drive to me. Is this correct? Could we perhaps cache it in ram (file system cache) by issuing warm-up queries? The faster your disk, the better. And yes, warm-up queries are a big help. In our instance, warm-up queries need to be logically distributed to hit all the searchers. On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: Hi George Our index is currently 7 Gigs. I take it we should have more than 7 Gigs of RAM on our machine? Can we get any other hardware specs? IE 2, 4 procs? Each html doc we have has 10 metatags which we store. Other than date, and a 10 byte string for one of the metatags, the metatags are almost always empty. Will this degrade performance? Also when you suggest we distribute our index, on what criteria do we partition? It looks like we need to optimize our IO for reads which means raid 5 or a solid state ram drive to me. Is this correct? Could we perhaps cache it in ram (file system cache) by issuing warm-up queries? BTW - we will be running on the wintel platform using C#. 
TIA Pam On 5/19/06, George Aroush <[EMAIL PROTECTED]> wrote: > > Hi Pam, > > You also need to investigate your hardware configuration. Besides the > usual > of having a fast CPU and maxing out your memory, make sure you have a fast hard > drive. > > As a Lucene index grows, anything you do with Lucene becomes I/O bound, > thus > a fast hard drive is critical. Simply moving from 5400rpm to 7200rpm will > give you a noticeable difference -- switch to a fast SCSI/RAID hard drive > and > you will see even better results. And yet even better, if you distribute > your index across multiple hard-drives/partitions. > > One other thing to look for, are you storing any data in your Lucene > index? > If so, consider not doing it. The goal is to keep the index size as small > as possible to reduce I/O. > > Good luck. > > -- George Aroush > > -Original Message- > From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] > Sent: Friday, May 19, 2006 4:28 PM > To: lucene-net-dev@incubator.apache.org > Subject: Re: noobie question > > Yes, the merge parameters do affect indexing performance, but > compactness > also affects search performance as your index gets larger. As you > incrementally update the index, the fragmentation effect (which the merge > properties will dictate) causes performance degradation at search time. > > As for index size, I don't know about any hard and fast rules. We have > about 7-8GB of indexes of varying structure, and those are spread out over > about 40 indexes. We try to keep individual indexes below 300MB, as the > operational hassles after that size seem to be more burdensome. We also > use > distributed searching so our indexes are allocated across multiple > machines > (no duplication). As a rule, we also try to stay below 2.5GB of aggregate > indexes on one machine. Our indexes are a full corpus and we must search > across all indexes all the time. You can structure your indexes more > effectively if you don't need to search the full corpus all the time. 
> > With multiple indexes being searched collectively, you'll soon be using > the > MultiSearcher class. Be sure to look at MultiReader, as it makes a > difference in search performance (nice caching). > > -- j > > On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > > > Hi Jeff > > > > A couple more questions. Don't the merge parameters determine how > > aggressively the index is compacted? And if so, doesn't this affect > > only indexing performance and not search performance? > > > > Secondly how large should each index be? Should I be partitioning the > > indexes, ie by date range? So one index for December 2005, one for > > January, etc? Or is it done by size? > > > > TIA > > > > Pam > > > > On 5/19/06, Jeff Rodenburg <[E
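The warm-up queries Jeff recommends can be as simple as replaying a few representative queries at process startup, so the first real user doesn't pay for cold file-system and first-use sort caches. A rough sketch against the 1.4-era Lucene.Net API; the terms and field name are placeholders, not from the thread:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public class Warmer
{
    // Run representative queries once at startup; the results are
    // discarded -- the side effect of warming the caches is the point.
    public static void WarmUp(Searcher searcher)
    {
        string[] terms = { "common", "query", "terms", "from", "real", "traffic" };
        foreach (string term in terms)
        {
            Query q = QueryParser.Parse(term, "contents", new StandardAnalyzer());
            searcher.Search(q);
        }
    }
}
```

As Jeff notes, in a distributed setup the warm-up traffic has to be routed so that every searcher instance gets hit, not just the one a load balancer happens to pick.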
Re: noobie question
You could certainly load a 7GB index into memory, given sufficient hardware running 64-bit Windows. That said, I wouldn't suggest trying to carry a single 7GB index in a single server's memory. Keeping an index below a 2GB threshold is only treating a symptom and isn't really sustainable if your index is already in the 7GB range. The issue at hand is dealing with the indexed data as efficiently as possible. Following George's suggestion for stripping the index down, i.e. just using searchable entities, is one possible approach. In our situation, we have quite a few fields of data that would be performance hits elsewhere on our system to retrieve at search run-time, so the lesser evil is to include them in our index. It just depends on your requirements to determine what's best. Likewise, monitoring your hardware statistics for bottlenecks isn't invalid, but I doubt you'll be able to make the modifications necessary to achieve the results you'd like to see through hardware config changes alone. Based on the conversation we've had thus far and a few assumptions on my part, I doubt you'll be able to keep your search times anywhere near the thresholds you'd like to see. You can help yourself with reduced index size, tweaked hardware configurations, and indexing strategies, but there is no silver bullet here. If my experiences hold true for you, you'll end up addressing each of these areas as you look for efficiencies of scale. -- j On 5/22/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Pam and Jeff, You can't load 7GB of index into memory. A typical Windows application can't access more than 2GB of RAM -- so if a machine has 8GB and only Lucene is running, chances are that you still have a lot of real memory not being used. You need to investigate and find out why your index grew to 7GB and reduce its size. For example, are you storing any data in Lucene's index? If so, consider not doing so. Monitor your CPU and see whether it is being max'ed out or not. 
Chances are that it is, and if queries are still taking long to run then your focus should be on disk I/O. Regards, -- George Aroush -Original Message- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Saturday, May 20, 2006 11:18 AM To: lucene-net-dev@incubator.apache.org Subject: Re: noobie question - Our index is currently 7 Gigs. I take it we should have more than 7 Gigs of RAM on our machine? Can we get any other hardware specs? IE 2, 4 procs? You can go with big RAM, but I haven't found that to be a huge boost in search perf. We run dual-proc Xeons for our search servers, as CPU has been the bottleneck. Sorts are particularly egregious when it comes to CPU load as well. Bang for the buck, the new dual-core Opterons are *amazingly* strong performers. - Each html doc we have has 10 metatags which we store. Other than date, and a 10 byte string for one of the metatags, the metatags are almost always empty. Will this degrade performance? I would not expect this to degrade your performance. - Also when you suggest we distribute our index, on what criteria do we partition? It looks like we need to optimize our IO for reads which means raid 5 or a solid state ram drive to me. Is this correct? Could we perhaps cache it in ram (file system cache) by issuing warm-up queries? The faster your disk, the better. And yes, warm-up queries are a big help. In our instance, warm-up queries need to be logically distributed to hit all the searchers. On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > Hi George > > Our index is currently 7 Gigs. I take it we should have more than 7 > Gigs of RAM on our machine? Can we get any other hardware specs? IE 2, > 4 procs? > > Each html doc we have has 10 metatags which we store. Other than date, > and a 10 byte string for one of the metatags, the metatags are almost > always empty. Will this degrade performance? > > Also when you suggest we distribute our index, on what criteria do we > partition? 
It looks like we need to optimize our IO for reads which > means raid 5 or a solid state ram drive to me. Is this correct? Could > we perhaps cache it in ram (file system cache) by issuing warm-up queries? > > BTW - we will be running on the wintel platform using C#. > > TIA > > Pam > > > On 5/19/06, George Aroush <[EMAIL PROTECTED]> wrote: > > > > Hi Pam, > > > > You also need to investigate your hardware configuration. Besides > > the usual of having a fast CPU and maxing out your memory, make sure > > you have a fast hard drive. > > > > As a Lucene index grows, anything you do with Lucene becomes I/O > > bound, thus a fast hard drive is critical. Simply moving from > > 5400rpm to 7200rpm > will > > give you a noticeable difference -- switc
Re: Error during indexing process
Hi Soormash - This sounds like a corrupt index. I've seen this with an index that wasn't properly closed or an indexing update didn't complete entirely. Try using the Luke index interrogation tool (Java app) for evaluating your index and see if it's still readable. -- j On 5/22/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Soormash, I have posted your question to the mailing list. Please subscribe to the list so you can post directly. See: http://incubator.apache.org/projects/lucene.net.html for instructions on how to subscribe. Thanks. -- George Aroush -Original Message- From: Soormasher Singh [mailto:[EMAIL PROTECTED] Sent: Sunday, May 21, 2006 5:01 PM To: lucene-net-dev@incubator.apache.org Subject: Error during indexing process Hello there I'm using Lucene 1.4.3 to index database records (about 100,000 or so). Till yesterday, everything was going fine and I didn't have any problems in indexing. Today, out of nowhere, I've started getting the following error: Cannot rename segments.new to segments In some cases, it's the same error with the word 'delete' instead of rename. Sometimes the above error occurs after indexing 20,000 records, other times after 2000. I've tried using StopAnalyzer (which is what I'm using by default) and StandardAnalyzer but get the same problem with both. Any suggestions/workarounds for this. I really need some inputs on this. Thanks! soormash
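The "Cannot rename segments.new to segments" symptom Jeff diagnoses is consistent with a writer that never completed its commit or released its lock. A defensive indexing pattern that guarantees the writer is closed might look like the sketch below, against the 1.4-era Lucene.Net API; the index path and field name are hypothetical:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// false = append to an existing index rather than create a new one.
IndexWriter writer = new IndexWriter(@"D:\indexes\records",
                                     new StandardAnalyzer(),
                                     false);
try
{
    Document doc = new Document();
    doc.Add(Field.Text("contents", "row text goes here"));  // hypothetical field
    writer.AddDocument(doc);
}
finally
{
    // Always close, even when indexing throws: an abandoned writer
    // leaves the lock file and a half-written segments file behind,
    // which can later surface as rename/delete errors like the one above.
    writer.Close();
}
```

If the index is already damaged, Luke remains the right first tool: it will tell you whether the segments file and the individual segments are still readable before you decide to rebuild.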
Re: noobie question
Hi Pam - I am confused, what do you mean by storing data in my index? (George, correct me if I'm wrong here.) What George is referring to is the different manners in which data can be included in an index. Take a look at the Field class and you'll notice a series of static methods that store data in a number of ways. The static methods define four different ways to include data in an index -- Keyword, Unindexed, Unstored, and Text. These are just wrapper definitions for indexing, storing and tokenizing index information. "Indexing" means including data with a field that would be searchable. "Storing" means including data with a field for presentation. "Tokenizing" means using analyzed data with a field that's been designated as indexed (searchable). For the four static methods: Keyword - values are indexed (searchable) and stored but not tokenized Unindexed - values are stored but not indexed or tokenized Unstored - values are indexed and tokenized (searchable) but not stored Text - values are indexed, tokenized and stored In making decisions about index composition, choose the field storage method that best matches the need for your particular data field. The fewer data fields you need, the smaller the index, the better the performance. Thanks to you and Jeff for all of your help! I really appreciate it! That's why the list is here. :-) -- j On 5/23/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: Hi George I am confused, what do you mean by storing data in my index? Thanks to you and Jeff for all of your help! I really appreciate it! Pam On 5/22/06, George Aroush <[EMAIL PROTECTED]> wrote: > > Hi Pam and Jeff, > > You can't load 7Gb of index into memory. A typical Windows application > can't access more then 2Gb of RAM -- so if a machine has 8Gg and only > Lucene > is running chance are that you still have a lot of real memory not being > used. > > You need to investigate and find out why your index grew to 7Gb and reduce > it's size. 
For example, are you storing any data in Lucene's index? If > so, > consider not doing so. > > Monitor your CPU and see whether it is being max'ed out or not. Chances are > that it is, and if queries are still taking long to run then your focus > should > be on disk I/O. > > Regards, > > -- George Aroush > > > -Original Message- > From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] > Sent: Saturday, May 20, 2006 11:18 AM > To: lucene-net-dev@incubator.apache.org > Subject: Re: noobie question > > - Our index is currently 7 Gigs. I take it we should have more than 7 Gigs > of RAM on our machine? Can we get any other hardware specs? IE 2, 4 procs? > > You can go with big RAM, but I haven't found that to be a huge boost in > search perf. We run dual-proc Xeons for our search servers, as CPU has > been > the bottleneck. Sorts are particularly egregious when it comes to CPU > load > as well. Bang for the buck, the new dual-core Opterons are > *amazingly* strong performers. > > - Each html doc we have has 10 metatags which we store. Other than date, > and > a 10 byte string for one of the metatags, the metatags are almost always > empty. Will this degrade performance? > > I would not expect this to degrade your performance. > > - Also when you suggest we distribute our index, on what criteria do we > partition? It looks like we need to optimize our IO for reads which means > raid 5 or a solid state ram drive to me. Is this correct? Could we perhaps > cache it in ram (file system cache) by issuing warm-up queries? > > The faster your disk, the better. And yes, warm-up queries are a big > help. > In our instance, warm-up queries need to be logically distributed to hit > all > the searchers. > > > On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > > > Hi George > > > > Our index is currently 7 Gigs. I take it we should have more than 7 > > Gigs of RAM on our machine? Can we get any other hardware specs? IE 2, > > 4 procs? 
> > > > Each html doc we have has 10 metatags which we store. Other than date, > > and a 10 byte string for one of the metatags, the metatags are almost > > always empty. Will this degrade performance? > > > > Also when you suggest we distribute our index, on what criteria do we > > partition? It looks like we need to optimize our IO for reads which > > means raid 5 or a solid state ram drive to me. Is this correct? Could > > we perhaps cache it in ram (file system cache) by issuing warm up > queries? > > > > BTW - we will be running on the wintel platform using c#. > > > > TIA > > > > Pam > > > > >
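Jeff's four storage flavors map directly onto the 1.4-era Field factory methods. A sketch of one document using all four; the field names and values are illustrative, not from the thread:

```csharp
using Lucene.Net.Documents;

Document doc = new Document();

// Keyword: indexed and stored, but not tokenized -- IDs, dates, codes.
doc.Add(Field.Keyword("id", "row-12345"));

// UnIndexed: stored only -- display-time data you never search on.
doc.Add(Field.UnIndexed("thumbnailUrl", "http://example.com/t/12345.jpg"));

// UnStored: indexed and tokenized, but not stored -- searchable body
// text whose original lives elsewhere (e.g. the source database).
doc.Add(Field.UnStored("contents", "full text of the row goes here"));

// Text: indexed, tokenized, and stored -- searchable and displayable,
// at the cost of a larger index.
doc.Add(Field.Text("title", "Sample row title"));
```

For Pam's situation, George's advice translates to preferring UnStored over Text wherever the original data can be fetched from the database at display time: the index stays smaller and the I/O burden drops.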
Fwd: Lucene 2.0.0 release available
Below is a recent message from the Java dev list for Lucene. As it states, this is mostly a bugfix release against the 1.9 code. The development path that's been suggested is that we develop the 1.9 release on the 1.1 Framework and that we would cut over to the 2.0 Framework with the 2.0 Lucene release. I believe this is fine, but we need to begin porting the Java 2.0 release soon. The Java 1.9 release was considered complete some time last fall. The time divide between the Java release and our C# port keeps growing. Not to take away from the 1.9 efforts on the 1.1 Framework, I'm going to proceed on porting the Java 2.0 release to C# under the 2.0 Framework. If there are a substantial number of bugfixes in the 2.0 release, we should make use of that as well. Questions or comments welcome. cheers, jeff r. -- Forwarded Message -- Subject: Lucene 2.0.0 release available Date: Saturday, 27 May 2006 05:57 From: Doug Cutting <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Release 2.0.0 of Lucene is now available from: http://www.apache.org/dyn/closer.cgi/lucene/java/ This is mostly a bugfix release from release 1.9.1. Note however that deprecated 1.x features have now been removed. Any code that compiles against Lucene 1.9.1 without deprecation warnings should work without further changes with any 2.x release. The detailed change log is at: http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_0_0/CHANGES.txt Doug
Re: Lucene 2.0.0 release available
George - I hear your concern about the 1.9 release not being finished. I will take issue with you on the reason that it's so far behind the Java version. It wasn't until recently (February) that the code was posted (the Alpha version). The lists with Apache didn't come online until April. Even as such, the process of evaluating the code, finding a bug or improvement, making a suggestion and returning it to the community has really been nothing more than emailing you. I know you're busy like all the rest of us, but this process had to run directly through you for a very long time. I frankly believe that many people were very ready to jump in and get the thing rolling, but were frustrated at the process and the bottlenecks that came with it and gave up. Sour grapes over the community's response, now that you're ready for participation, is not the community's fault. However, that's not my reason for suggesting review of the 2.0 Java codebase. The fact of the matter is that the time difference between the Java release and the C# port is growing. The value in that time difference is knowledge of known issues with the prior release (1.9) and how to deal with it (fixes in 2.0). The Java mailing list has already identified bugs to be fixed with their release marked 2.0. If there are bugs in the 1.9 release of Java, chances are those same bugs will appear in the C# port. The Java community has already worked those out, and I'd like to take advantage of those improvements. Additionally, a C# port under the 2.0 Framework has significant differences in things like threading and exception handling, as well as taking advantage of performance improvements like generics. I will echo George's request to finish the 1.9 release. I'm not sure there's any value in the claim of a 1.9 release any more than a non-complete 1.9 release. Nonetheless, I've received some offers to help review the 2.0 release, and will respond to those people privately. cheers, jeff r. 
On 5/29/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Jeff and all, We must finish 1.9 before working on 2.0; otherwise, there is no guarantee that 2.0 will not end up with the same fate as 1.9. Let's face it, 1.9 has been behind its Java version release mainly because, despite my repeated calls for help to finish it off (even back at SourceForge.net), I have yet to receive any help. For 1.9, unlike the 1.3 and 1.4 releases, NO ONE has stepped up and offered to help (except recently for Eyal's compression code.) As you can tell, I am frustrated with this. Because despite not getting any help, I am getting private emails where folks tell me that they want to become a committer on ASF for Lucene.Net -- when I point them to http://incubator.apache.org/learn/newcommitters.html I don't hear back!! So please folks, let's first finish off 1.9. Take a look at the current source code and comment on the lines that I have questions on. Those are found by searching for the text "Aroush". This past weekend, I finished the port of the Test code for 1.9 and it is running. About 40% of the tests are failing, some due to bugs in the 1.9 code and the others due to bugs in the port of the Test code. In a day or two I will release code on ASF and again will be asking for help to finish off 1.9. To sum up, I don't support doing any work on 2.0 until we have 1.9 done; otherwise, not only will we have an incomplete 1.9 but 2.0 might end up like 1.9, incomplete -- and thus, we will then have two incomplete releases instead of one. 1.9 is very close to being "final" -- let's work together to finish it off and use this opportunity to become a committer on ASF. Regards, -- George Aroush -Original Message- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Saturday, May 27, 2006 1:02 PM To: lucene-net-dev@incubator.apache.org; lucene-net-user@incubator.apache.org Subject: Fwd: Lucene 2.0.0 release available Below is a recent message from the Java dev list for Lucene. 
As it states, this is mostly a bugfix release against the 1.9 code. The development path that's been suggested is that we develop the 1.9 release on the 1.1 Framework and that we would cut over to the 2.0 Framework with the 2.0 Lucene release. I believe this is fine, but we need to begin porting the Java 2.0 release soon. The Java 1.9 release was considered complete some time last fall. The time divide between the Java release and our C# port keeps growing. Not to take away from the 1.9 efforts on the 1.1 Framework, I'm going to proceed on porting the Java 2.0 release to C# under the 2.0 Framework. If there are a substantial number of bugfixes in the 2.0 release, we should make use of that as well. Questions or comments welcome. cheers, jeff r. -- Forwarded Message -- Subject: Luc
Re: Lucene 2.0.0 release available
I understand your frustration, but if the community is not reaching out to participate, then the approach needs to improve. I'm certain the ASF can help, but the logistical stuff has to be there. For example, we need the code base under version control, and the "how" of participation needs to be spelled out. I'm not an open source community guru, but my participation on other projects has certainly increased because I understood what I could do and how to go about it. Right now, our sales pitch consists of "please help" and it's not moving anyone to action. Just a suggestion, but a more granular list of what's needed to finish 1.9 might improve participation. As for my own participation, I have cycles to put into review, but not with the 1.1 Framework. I have other projects that rely on Lucene.Net and those projects use the 2.0 Framework, so strictly speaking for myself, I have an interest in that side of the equation. It doesn't help the crowd with the 1.9 release, but neither does the 1.9 release help me in my short-term needs. I've run the 1.4.3 version of Lucene.Net under both the 1.1 Framework and the 2.0 Framework, and the differences just in the Framework code are not insubstantial. It doesn't have to be an all-or-nothing, top-down directive approach. For as many people as there are on the 1.1 Framework, I've talked to plenty of others who have migrated to the 2.0 Framework. For us, the sooner we can get the latest release up and running, the better. So as not to dissuade attention from the 1.9 release, I'll keep any conversation about the 2.0 release and the 2.0 Framework off the list. cheers, jeff r. On 5/30/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Jeff and all, I was the central point because there was no one else and we needed a way to coordinate the project. With 1.3 and 1.4, when I asked for help, folks asked which CS files they could take on, and they delivered. 
For the 1.9 release (which, by the way, was first released back on May 26, 2005 -- yes, I did say "2005"), despite my repeated calls for help, none were answered. So I don't think people were ready to jump in; they just weren't around, were busy, or lost interest. I hope things will change now that Lucene.Net is at ASF, but so far that hasn't been the case, so I am disappointed. Now coming back to your suggestion of working on 2.0. If you have the cycles to review the 2.0 code base, why not put those cycles into finishing off 1.9? Anything that was fixed in Java's release of 1.9 must be fixed in the Lucene.Net 1.9 release -- in fact, I would suggest that we look at the 1.9.1 release. Besides, the Java release of 2.0 is just compliant with Java 5.0. The value for us in having a 1.9 (or 1.9.1) release is the support for .NET 1.1. Not releasing 1.9 is like Java Lucene 1.9 not supporting Java 1.3 (did I get the Java version right?!) In addition, keep in mind that Lucene.Net 1.9 isn't that far off from being "final". Thus, if we get 1.9 out, it shouldn't be hard to get 2.0 out. Best regards, -- George Aroush -Original Message- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 30, 2006 11:36 AM To: lucene-net-dev@incubator.apache.org Cc: lucene-net-user@incubator.apache.org; [EMAIL PROTECTED] Subject: Re: Lucene 2.0.0 release available George - I hear your concern about the 1.9 release not being finished. I will take issue with you on the reason that it's so far behind the Java version. It wasn't until recently (February) that the code was posted (the Alpha version). The lists with Apache didn't come online until April. Even as such, the process of evaluating the code, finding a bug or improvement, making a suggestion and returning it to the community has really been nothing more than emailing you. I know you're busy like all the rest of us, but this process had to run directly through you for a very long time. 
I frankly believe that many people were very ready to jump in and get the thing rolling, but were frustrated at the process and the bottlenecks that came with it and gave up. Sour grapes over the community's response, now that you're ready for participation, is not the community's fault. However, that's not my reason for suggesting review of the 2.0 Java codebase. The fact of the matter is that the time difference between the Java release and the C# port is growing. The value in that time difference is knowledge of known issues with the prior release (1.9) and how to deal with it (fixes in 2.0). The Java mailing list has already identified bugs to be fixed with their release marked 2.0. If there are bugs in the 1.9 release of Java, chances are those same bugs will appear in the C# port. The Java community has already worked those out, and I'd like to take advantage of those improvements. Additionally, a C# port under the 2.0 Framework has
Re: Remote searches with Lucene
Hello all - I've been watching this thread to follow the direction and thought I might be able to offer some assistance. I run a search system that involves 4 separate search servers -- 3 serving search objects via RemoteSearchable, and a 4th that serves in an index updating role. The codebase for Lucene.Net provides all the library routines one needs to provide distributed search capabilities, but does not provide facilities for distributed search operation -- nor should it. The ideas presented here are certainly possible; I've implemented a working operation without requiring the changes described here. I'm confident in our implementation; for the calendar year, our uptime/availability of search services is 99.99%. Our only outage was related to network hardware, otherwise we're sitting solid at 100%. I've been authorized to provide our operational code for distributed search under Lucene.Net to the community at large. Some of the code is customized to our operation, but for the most part it's rather generic. We started the project under Lucene v1.4.3, but the operational aspect still applies under v1.9. The system consists of a LuceneServer, which provides searchability against indexes as defined in XML configuration files. In addition, an IndexUpdateServer provides master index updating, master/slave index replication and automated index maintenance. Integration with our web site ensures the index stays available, updated and current. There's a great deal of applied knowledge and learned behavior of many of the underlying sub-system components that distributed search under Lucene.Net makes use of -- .Net remoting, garbage collection, etc. If anyone has interest, please reply. Contributing this code requires a little cleanup of our customization work, so my response may not be immediate but I would make efforts to release the code in short order. thanks, jeff r. 
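The topology Jeff describes -- each search server exporting a RemoteSearchable, with clients aggregating them -- can be sketched with .NET remoting along these lines. The port, URI name, host names, and index path are assumptions for illustration, not details of his actual setup:

```csharp
using System;
using System.Runtime.Remoting;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Channels.Tcp;
using Lucene.Net.Search;

// --- On each search server: export a local searcher over remoting ---
ChannelServices.RegisterChannel(new TcpChannel(1099));      // port is an assumption
RemoteSearchable remote =
    new RemoteSearchable(new IndexSearcher(@"D:\indexes\shard-01"));  // hypothetical path
RemotingServices.Marshal(remote, "Searchable");             // URI name is an assumption

// --- On the client: attach to each server and search them as one ---
Searchable s1 = (Searchable)Activator.GetObject(
    typeof(Searchable), "tcp://search01:1099/Searchable");  // hypothetical hosts
Searchable s2 = (Searchable)Activator.GetObject(
    typeof(Searchable), "tcp://search02:1099/Searchable");
MultiSearcher searcher = new MultiSearcher(new Searchable[] { s1, s2 });
```

The remote proxies implement the same Searchable contract as local searchers, which is why they plug straight into MultiSearcher; everything beyond this (configuration files, replication, failover) is the operational layer Jeff is offering to contribute.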
On 8/19/06, Robert Boulanger <[EMAIL PROTECTED]> wrote: Hi Elena, hi Rest, > Dear All, > > The application I am working on is intended to make use of the > distributed search capabilities of the Lucene library. While trying to > work with Lucene's RemoteSearchable class, I faced some problems > caused by the current Lucene implementation. In the following I'll try to > describe them, as well as the possible solutions I > identified. The most important question for me is whether these changes > have a chance to be integrated in the coming Lucene versions, such > that remote searches would really become feasible. I would appreciate > any feedback. Same problem for me, and I found some more issues which I explain below: > > The first problem concerns the construction of the RemoteSearchable > object. The .Net framework allows for both server and client activation > models of the remote objects. Currently, the RemoteSearchable class > possesses only one constructor that requires knowledge of a local > Searchable object: > > public RemoteSearchable(Lucene.Net.Search.Searchable local) > I just added a new constructor to RemoteSearchable public RemoteSearchable(): base() { this.local = this.local; } Not the finest method, but it works for me so far. > Since this "local" object is located on the server, knowledge of the > server's index paths is needed for its creation. However, there are at > least some scenarios where only the server, but not the client, knows > where the indexes are stored on the server side. I think this problem > could be solved by extending the RemoteSearchable class with a standard > constructor that reads the names of the indexes to be published out of > a configuration file on the server side. > My "Server" now implements a class which inherits directly from RemoteSearchable. 
In the parameterless constructor there I read the server-side config file which contains the index location, create a new IndexReader and pass it as an argument to MyBase.New(). See sample below. > 2. Bug in Term construction [snip] This whole chapter was very useful and I can confirm everything works fine from there on. But there is still a bug in FieldDocSortedHitQueue line 130 and below: I figured out that the castings are not working when the system is running in a non-English globalization context. The String in docA.fields[i], which might be for example 1.345678, is cast to 1345678.0 since the decimal sign is misinterpreted in German systems, as it seems. So the cast results in an overflow. So I changed it as follows: case SortField.SCORE: float r1 = (float)Convert.ToSingle(docA.fields[i], System.Globalization.NumberFormatInfo.InvariantInfo); float r2 = (float)Convert.ToSingle(docB.fields[i], System.Globalization.NumberFormatInfo.InvariantInfo); if (r1 > r2) c = -1; if (r1 < r2) c = 1; break; Same in line 172 and 174: float f1 = (float)Convert.ToSingle(do
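Robert's globalization fix generalizes: any string-to-number conversion in this code path should pin the culture, because on a German system the default culture reads "." as a grouping separator rather than a decimal point. A minimal illustration of the difference:

```csharp
using System;
using System.Globalization;

string score = "1.345678";

// Culture-sensitive: on a de-DE thread this would parse as 1345678,
// because '.' is treated as a thousands separator there.
// float wrong = Convert.ToSingle(score);

// Culture-invariant: parses "1.345678" the same way on every system.
float right = Convert.ToSingle(score, NumberFormatInfo.InvariantInfo);
```

Pinning InvariantInfo at every parse site, as the patched FieldDocSortedHitQueue does, keeps sort comparisons stable regardless of the server's regional settings.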
Re: Remote searches with Lucene
That's great, thanks George. Perfect place to park the code. I've received quite a few requests today, mostly off-list. I'll start prepping the code for contribution. I have some internal/proprietary things to pull out, but mostly just need to document it better so that it makes sense (different code running in different places). I'll start with this tonight, and try to get something out in the next few days. cheers, jeff r. On 8/21/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Jeff, If you want to contribute the code, I am sure many can benefit from it. I can make it part of the "contribute" code base of Lucene.Net and share it here: https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/C%23/contrib/ Regards, -- George -Original Message- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Monday, August 21, 2006 12:11 PM To: lucene-net-dev@incubator.apache.org Cc: Elena Demidova Subject: Re: Remote searches with Lucene Hello all - I've been watching this thread to follow the direction and thought I might be able to offer some assistance. I run a search system that involves 4 separate search servers -- 3 serving search objects via RemoteSearchable, and a 4th that serves in an index updating role. The codebase for Lucene.Net provides all the library routines one needs to provide distributed search capabilities, but does not provide facilities for distributed search operation -- nor should it. The ideas presented here are certainly possible; I've implemented a working operation without requiring the changes described here. I'm confident in our implementation; for the calendar year, our uptime/availability of search services is 99.99%. Our only outage was related to network hardware, otherwise we're sitting solid at 100%. I've been authorized to provide our operational code for distributed search under Lucene.Net to the community at large. Some of the code is customized to our operation, but for the most part it's rather generic. 
We started the project under Lucene v1.4.3, but the operational aspect still applies under v1.9. The system consists of a LuceneServer, which provides searchability against indexes as defined in XML configuration files. In addition, an IndexUpdateServer provides master index updating, master/slave index replication and automated index maintenance. Integration with our web site ensures the index stays available, updated and current. There's a great deal of applied knowledge and learned behavior of many of the underlying sub-system components that distributed search under Lucene.Net makes use of -- .Net remoting, garbage collection, etc. If anyone has interest, please reply. Contributing this code requires a little cleanup of our customization work, so my response may not be immediate but I would make efforts to release the code in short order. thanks, jeff r. On 8/19/06, Robert Boulanger <[EMAIL PROTECTED]> wrote: > > Hi Elena, hi Rest, > > > Dear All, > > > > The application I am working on is intended to make use of the > > distributed search capabilities of the Lucene library. While trying > > to work with the Lucene's RemoteSearchable class, I faced some > > problems cased by the current Lucene implementation. In following > > I'll try to describe them, as well as the possible ways of their > > solution, I identified. The most important question for me is, if > > these changes have a chance to be integrated in the coming Lucene > > versions, such that remote searches would really become feasible. I > > would appreciate any feedback. > > Same problem for me and I found some more issues which I explain below: > > > > > The first problem concerns the construction of the RemoteSearchable > > object. .Net framework allows for both, server and client activation > > models of the remote objects. 
Currently, RemoteSearchable class > > possesses only one constructor that requires knowledge of a local > > Searchable object: > > > > public RemoteSearchable(Lucene.Net.Search.Searchable local) > > > I just added a new constructor to RemoteSearchable public > RemoteSearchable(): base() { this.local = this.local; } > > not the fine method but for me it works so far. > > > Since this "local" object is located on the server, knowledge of the > > server's index paths is needed for its creation. However, there are > > at least some scenarios where only the server, but not the client, > > knows where the indexes are stored on the server side. I think this > > problem could be solved by extending RemoteSearchable class with a > > standard constructor that reads the names of the indexes to be > > published out of a configuration file on the server side. >
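A server-side subclass along the lines described in this thread might look like the following sketch. The config key name, use of IndexSearcher, and the assumption that RemoteSearchable is not sealed are all illustrative choices, not confirmed API; the only fixed point from the thread is that RemoteSearchable's constructor takes a Searchable:

```csharp
using System.Configuration;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical server-side wrapper: the index location is read from the
// server's own configuration, so clients never need to know the path.
public class ConfiguredRemoteSearchable : RemoteSearchable
{
    public ConfiguredRemoteSearchable()
        : base(new IndexSearcher(          // IndexSearcher implements Searchable
              IndexReader.Open(            // open the server-local index
                  ConfigurationManager.AppSettings["indexPath"])))  // "indexPath" is an assumed key
    {
    }
}
```

Registered for server activation (for example via RemotingConfiguration.RegisterWellKnownServiceType), the remoting runtime can then construct the object itself — something the single-argument constructor prevented.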
Re: Remote searches with Lucene
Just a follow-up to everyone on this topic. I received a lot of offlist mail about this, so this message has a rather wide distribution. I'm in process of modifying the code for our distributed search components so that they're generic enough for general usage and public consumption. This is taking a little of my time, but nonetheless I expect to complete it soon. As for distributing the code, it will be located in the contrib portion of the Lucene.Net repository at apache.org. There is some logistic work involved, but ideally this is moving forward. As soon as I have more information to relay, I'll pass it along to the list. cheers, jeff r. On 8/21/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: Hello all - I've been watching this thread to follow the direction and thought I might be able to offer some assistance. I run a search system that involves 4 separate search servers -- 3 serving search objects via RemoteSearchable, and a 4th that serves in an index updating role. The codebase for Lucene.Net provides all the library routines one needs to provide distributed search capabilities, but does not provide facilities for distributed search operation -- nor should it. The ideas presented here are certainly possible; I've implemented a working operation without requiring the changes described here. I'm confident in our implementation; for the calendar year, our uptime/availability of search services is 99.99%. Our only outage was related to network hardware, otherwise we're sitting solid at 100%. I've been authorized to provide our operational code for distributed search under Lucene.Net to the community at large. Some of the code is customized to our operation, but for the most part it's rather generic. We started the project under Lucene v1.4.3, but the operational aspect still applies under v1.9. The system consists of a LuceneServer, which provides searchability against indexes as defined in XML configuration files. 
In addition, an IndexUpdateServer provides master index updating, master/slave index replication and automated index maintenance. Integration with our web site ensures the index stays available, updated and current. There's a great deal of applied knowledge and learned behavior of many of the underlying sub-system components that distributed search under Lucene.Net makes use of -- .Net remoting, garbage collection, etc. If anyone has interest, please reply. Contributing this code requires a little cleanup of our customization work, so my response may not be immediate but I would make efforts to release the code in short order. thanks, jeff r. On 8/19/06, Robert Boulanger < [EMAIL PROTECTED]> wrote: > > Hi Elena, hi Rest, > > > Dear All, > > > > The application I am working on is intended to make use of the > > distributed search capabilities of the Lucene library. While trying to > > work with the Lucene's RemoteSearchable class, I faced some problems > > cased by the current Lucene implementation. In following I'll try to > > describe them, as well as the possible ways of their solution, I > > identified. The most important question for me is, if these changes > > have a chance to be integrated in the coming Lucene versions, such > > that remote searches would really become feasible. I would appreciate > > any feedback. > > Same problem for me and I found some more issues which I explain below: > > > > > The first problem concerns the construction of the RemoteSearchable > > object. .Net framework allows for both, server and client activation > > models of the remote objects. Currently, RemoteSearchable class > > possesses only one constructor that requires knowledge of a local > > Searchable object: > > > > public RemoteSearchable(Lucene.Net.Search.Searchable local) > > > I just added a new constructor to RemoteSearchable > public RemoteSearchable(): base() > { > this.local = this.local; > } > > not the fine method but for me it works so far. 
> > > Since this "local" object is located on the server, knowledge of the > > server's index paths is needed for its creation. However, there are at > > > least some scenarios where only the server, but not the client, knows > > where the indexes are stored on the server side. I think this problem > > could be solved by extending RemoteSearchable class with a standard > > constructor that reads the names of the indexes to be published out of > > a configuration file on the server side. > > > My "Server" now implements a Class which inherits directly from Remote > Searchable. > in the parameterless constructor there I read the server sided > configfile which contains the index location , create a new In
Re: Remote searches with Lucene
As promised, an update to the list. I have code ready for delivery, if I can get svn access to the contrib section. A request has been made for this but it's going nowhere, so I'm going to find another place to host the files. There's quite a bit of documentation behind this so I'm working diligently to explain how this works. If anyone has a place to hold the code until the uber-powers at apache decide to grant me access, we would greatly appreciate the assistance. cheers, jeff r. On 8/23/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: Just a follow-up to everyone on this topic. I received a lot of offlist mail about this, so this message has a rather wide distribution. I'm in process of modifying the code for our distributed search components so that they're generic enough for general usage and public consumption. This is taking a little of my time, but nonetheless I expect to complete it soon. As for distributing the code, it will be located in the contrib portion of the Lucene.Net repository at apache.org. There is some logistic work involved, but ideally this is moving forward. As soon as I have more information to relay, I'll pass it along to the list. cheers, jeff r. On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: > > Hello all - > > I've been watching this thread to follow the direction and thought I > might be able to offer some assistance. I run a search system that involves > 4 separate search servers -- 3 serving search objects via RemoteSearchable, > and a 4th that serves in an index updating role. > > The codebase for Lucene.Net provides all the library routines one needs > to provide distributed search capabilities, but does not provide facilities > for distributed search operation -- nor should it. The ideas presented here > are certainly possible; I've implemented a working operation without > requiring the changes described here. 
I'm confident in our implementation; > for the calendar year, our uptime/availability of search services is > 99.99%. Our only outage was related to network hardware, otherwise > we're sitting solid at 100%. > > I've been authorized to provide our operational code for distributed > search under Lucene.Net to the community at large. Some of the code is > customized to our operation, but for the most part it's rather generic. We > started the project under Lucene v1.4.3, but the operational aspect > still applies under v1.9. > > The system consists of a LuceneServer, which provides searchability > against indexes as defined in XML configuration files. In addition, an > IndexUpdateServer provides master index updating, master/slave index > replication and automated index maintenance. Integration with our web site > ensures the index stays available, updated and current. There's a great > deal of applied knowledge and learned behavior of many of the underlying > sub-system components that distributed search under Lucene.Net makes use > of -- .Net remoting, garbage collection, etc. > > If anyone has interest, please reply. Contributing this code requires a > little cleanup of our customization work, so my response may not be > immediate but I would make efforts to release the code in short order. > > thanks, > jeff r. > > > > > On 8/19/06, Robert Boulanger < [EMAIL PROTECTED]> wrote: > > > > Hi Elena, hi Rest, > > > > > Dear All, > > > > > > The application I am working on is intended to make use of the > > > distributed search capabilities of the Lucene library. While trying > > to > > > work with the Lucene's RemoteSearchable class, I faced some problems > > > > > cased by the current Lucene implementation. In following I'll try to > > > describe them, as well as the possible ways of their solution, I > > > identified. 
The most important question for me is, if these changes > > > have a chance to be integrated in the coming Lucene versions, such > > > that remote searches would really become feasible. I would > > appreciate > > > any feedback. > > > > Same problem for me and I found some more issues which I explain > > below: > > > > > > > > The first problem concerns the construction of the RemoteSearchable > > > object. .Net framework allows for both, server and client activation > > > models of the remote objects. Currently, RemoteSearchable class > > > possesses only one constructor that requires knowledge of a local > > > Searchable object: > > > > > > public RemoteSearchable(Lucene.Net.Search.Searchable local) > > > > > I just added a new constructor to Re
Re: Remote searches with Lucene
That's likely our only option for now. I believe George would need to do the posting; I'm not aware of anyone else with commit access. As long as the turnaround is rapid and it doesn't present an admin burden, I'm ok with it. -- j On 8/27/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: Jeff, I haven't heard anything back about your account request yet. (but the officialness of the vote is in question - so it may be a while) What about posting a .zip file to JIRA and having George or someone commit it on your behalf and submit patches from then on? Erik On Aug 26, 2006, at 10:23 PM, Jeff Rodenburg wrote: > As promised, an update to the list. > > I have code ready for delivery, if I can get svn access to the contrib > section. A request has been made for this but it's going nowhere, > so I'm > going to find another place to host the files. > > There's quite a bit of documentation behind this so I'm working > diligently > to explain how this works. If anyone has a place to hold the code > until the > uber-powers at apache decide to grant me access, we would greatly > appreciate > the assistance. > > cheers, > jeff r. > > > On 8/23/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: >> >> Just a follow-up to everyone on this topic. I received a lot of >> offlist >> mail about this, so this message has a rather wide distribution. >> >> I'm in process of modifying the code for our distributed search >> components >> so that they're generic enough for general usage and public >> consumption. >> This is taking a little of my time, but nonetheless I expect to >> complete it >> soon. >> >> As for distributing the code, it will be located in the contrib >> portion of >> the Lucene.Net repository at apache.org. There is some logistic work >> involved, but ideally this is moving forward. >> >> As soon as I have more information to relay, I'll pass it along to >> the >> list. >> >> cheers, >> jeff r. 
>> >> >> >> >> On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: >> > >> > Hello all - >> > >> > I've been watching this thread to follow the direction and >> thought I >> > might be able to offer some assistance. I run a search system >> that involves >> > 4 separate search servers -- 3 serving search objects via >> RemoteSearchable, >> > and a 4th that serves in an index updating role. >> > >> > The codebase for Lucene.Net provides all the library routines >> one needs >> > to provide distributed search capabilities, but does not provide >> facilities >> > for distributed search operation -- nor should it. The ideas >> presented here >> > are certainly possible; I've implemented a working operation >> without >> > requiring the changes described here. I'm confident in our >> implementation; >> > for the calendar year, our uptime/availability of search >> services is >> > 99.99%. Our only outage was related to network hardware, otherwise >> > we're sitting solid at 100%. >> > >> > I've been authorized to provide our operational code for >> distributed >> > search under Lucene.Net to the community at large. Some of the >> code is >> > customized to our operation, but for the most part it's rather >> generic. We >> > started the project under Lucene v1.4.3, but the operational aspect >> > still applies under v1.9. >> > >> > The system consists of a LuceneServer, which provides searchability >> > against indexes as defined in XML configuration files. In >> addition, an >> > IndexUpdateServer provides master index updating, master/slave >> index >> > replication and automated index maintenance. Integration with >> our web site >> > ensures the index stays available, updated and current. There's >> a great >> > deal of applied knowledge and learned behavior of many of the >> underlying >> > sub-system components that distributed search under Lucene.Net >> makes use >> > of -- .Net remoting, garbage collection, etc. >> > >> > If anyone has interest, please reply. 
Contributing this code >> requires a >> > little cleanup of our customization work, so my response may not be >> > immediate but I would make efforts to release the code in short >> order. >> > >> > thanks, >> > jeff r. >> > &
Re: Remote searches with Lucene
Hi Saurabh - Thanks for your offer of help. SVN or FTP is probably the best situation. I would expect some feedback and suggestions for improvement to the original code base, and I need to be able to revise it (assuming I stay the source author) in rather short order. There's been a suggestion to basically have garoush upload the code on my behalf to the contrib section at apache. If that can get turned around quickly, I might go that route. -- j On 8/26/06, Saurabh Dani <[EMAIL PROTECTED]> wrote: Hi Jeff, What type of "place to hold" are you looking at? Is simple "FTP" site enough or are you looking at some kind of SVN ? CVS? Thanks Saurabh Date: Sat, 26 Aug 2006 19:23:27 -0700 From: "Jeff Rodenburg" <[EMAIL PROTECTED]> To: lucene-net-dev@incubator.apache.org Subject: Re: Remote searches with Lucene As promised, an update to the list. I have code ready for delivery, if I can get svn access to the contrib section. A request has been made for this but it's going nowhere, so I'm going to find another place to host the files. There's quite a bit of documentation behind this so I'm working diligently to explain how this works. If anyone has a place to hold the code until the uber-powers at apache decide to grant me access, we would greatly appreciate the assistance. cheers, jeff r. On 8/23/06, Jeff Rodenburg wrote: > > Just a follow-up to everyone on this topic. I received a lot of offlist > mail about this, so this message has a rather wide distribution. > > I'm in process of modifying the code for our distributed search components > so that they're generic enough for general usage and public consumption. > This is taking a little of my time, but nonetheless I expect to complete it > soon. > > As for distributing the code, it will be located in the contrib portion of > the Lucene.Net repository at apache.org. There is some logistic work > involved, but ideally this is moving forward. > > As soon as I have more information to relay, I'll pass it along to the > list. 
> > cheers, > jeff r. > > > > > On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: > > > > Hello all - > > > > I've been watching this thread to follow the direction and thought I > > might be able to offer some assistance. I run a search system that involves > > 4 separate search servers -- 3 serving search objects via RemoteSearchable, > > and a 4th that serves in an index updating role. > > > > The codebase for Lucene.Net provides all the library routines one needs > > to provide distributed search capabilities, but does not provide facilities > > for distributed search operation -- nor should it. The ideas presented here > > are certainly possible; I've implemented a working operation without > > requiring the changes described here. I'm confident in our implementation; > > for the calendar year, our uptime/availability of search services is > > 99.99%. Our only outage was related to network hardware, otherwise > > we're sitting solid at 100%. > > > > I've been authorized to provide our operational code for distributed > > search under Lucene.Net to the community at large. Some of the code is > > customized to our operation, but for the most part it's rather generic. We > > started the project under Lucene v1.4.3, but the operational aspect > > still applies under v1.9. > > > > The system consists of a LuceneServer, which provides searchability > > against indexes as defined in XML configuration files. In addition, an > > IndexUpdateServer provides master index updating, master/slave index > > replication and automated index maintenance. Integration with our web site > > ensures the index stays available, updated and current. There's a great > > deal of applied knowledge and learned behavior of many of the underlying > > sub-system components that distributed search under Lucene.Net makes use > > of -- .Net remoting, garbage collection, etc. > > > > If anyone has interest, please reply. 
Contributing this code requires a > > little cleanup of our customization work, so my response may not be > > immediate but I would make efforts to release the code in short order. > > > > thanks, > > jeff r. > > > > > > > > > > On 8/19/06, Robert Boulanger < [EMAIL PROTECTED]> wrote: > > > > > > Hi Elena, hi Rest, > > > > > > > Dear All, > > > > > > > > The application I am working on is intended to make use of the > > > > distributed search capabilities of
Re: Remote searches with Lucene
have ASF licensed in every CS file you submit. No problem. I'll mimic the headers found in the Lucene.Net source. Let me know if it should be something different. Also, it would be nice to have an NUnit test written for it. I'll tackle this during the week. I've been updating the code to include proper comments throughout, as well as supporting documents for making it all work together. Is there a specific flavor of Nunit to look for, or is the most recent acceptable? cheers, jeff On 8/28/06, George Aroush <[EMAIL PROTECTED]> wrote: I have no problem adding the code to the SVN under contribute. Jeff: Just ZIP up the code, and submit it to the mailing list and I can do the rest. Make sure, like Erik said, to have ASF licensed in every CS file you submit. Also, it would be nice to have an NUnit test written for it. This will serve as a validation for the code as well as an example on how to use the code. Regards, -- George -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Sunday, August 27, 2006 7:36 PM To: lucene-net-dev@incubator.apache.org Subject: Re: Remote searches with Lucene I'm sure I have commit privileges also, and would be happy to apply an "svn add" for the initial dump and clean patches for a while, as long as the code is ASF licensed and George doesn't mind. It'd be better if he did so to vet it, as I'm not a .NET programmer. Erik On Aug 27, 2006, at 5:22 PM, Jeff Rodenburg wrote: > That's likely our only option for now. I believe George would need > to do > the posting; I'm not aware of anyone else with commit access. > As long as the turnaround is rapid and it doesn't present an admin > burden, > I'm ok with it. > > -- j > > > On 8/27/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: >> >> Jeff, >> >> I haven't heard anything back about your account request yet. 
(but >> the officialness of the vote is in question - so it may be a while) >> >> What about posting a .zip file to JIRA and having George or someone >> commit it on your behalf and submit patches from then on? >> >> Erik >> >> >> On Aug 26, 2006, at 10:23 PM, Jeff Rodenburg wrote: >> >> > As promised, an update to the list. >> > >> > I have code ready for delivery, if I can get svn access to the >> contrib >> > section. A request has been made for this but it's going nowhere, >> > so I'm >> > going to find another place to host the files. >> > >> > There's quite a bit of documentation behind this so I'm working >> > diligently >> > to explain how this works. If anyone has a place to hold the code >> > until the >> > uber-powers at apache decide to grant me access, we would greatly >> > appreciate >> > the assistance. >> > >> > cheers, >> > jeff r. >> > >> > >> > On 8/23/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: >> >> >> >> Just a follow-up to everyone on this topic. I received a lot of >> >> offlist >> >> mail about this, so this message has a rather wide distribution. >> >> >> >> I'm in process of modifying the code for our distributed search >> >> components >> >> so that they're generic enough for general usage and public >> >> consumption. >> >> This is taking a little of my time, but nonetheless I expect to >> >> complete it >> >> soon. >> >> >> >> As for distributing the code, it will be located in the contrib >> >> portion of >> >> the Lucene.Net repository at apache.org. There is some >> logistic work >> >> involved, but ideally this is moving forward. >> >> >> >> As soon as I have more information to relay, I'll pass it along to >> >> the >> >> list. >> >> >> >> cheers, >> >> jeff r. >> >> >> >> >> >> >> >> >> >> On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: >> >> > >> >> > Hello all - >> >> > >> >> > I've been watching this thread to follow the direction and >> >> thought I >> >> > might be able to offer some assistance. 
I run a search system >> >> that involves >> >> > 4 separate search servers -- 3 serving search objects via >> >> RemoteSearchable, >> >> > and a 4th that serves in an index updating role. >> >&g
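For reference, the header being requested is the standard ASF license notice; in the Lucene.Net sources of that era it appears at the top of each .cs file as a comment block roughly like this (the exact wording, e.g. a leading copyright line, may differ per file):

```csharp
/*
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
```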
Remote searching with Lucene - project update
All - Another update on the remote searching application code that's been mentioned in this thread. I'm near completion of the entire collection of files that are needed for this project -- libraries, applications, unit tests, and documentation. There's quite a bit to this, and thanks for everybody's patience as I assemble the code into something that's less than confusing. There are several working pieces, so I'm packaging it for consumption. I expect to have this available sometime in the next few days, barring things like my life and regular job from getting in the way. Again, I'll share an announcement to the list when I've made the files available. Thanks, jeff r. On 8/26/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: As promised, an update to the list. I have code ready for delivery, if I can get svn access to the contrib section. A request has been made for this but it's going nowhere, so I'm going to find another place to host the files. There's quite a bit of documentation behind this so I'm working diligently to explain how this works. If anyone has a place to hold the code until the uber-powers at apache decide to grant me access, we would greatly appreciate the assistance. cheers, jeff r. On 8/23/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > Just a follow-up to everyone on this topic. I received a lot of offlist > mail about this, so this message has a rather wide distribution. > > I'm in process of modifying the code for our distributed search > components so that they're generic enough for general usage and public > consumption. This is taking a little of my time, but nonetheless I expect > to complete it soon. > > As for distributing the code, it will be located in the contrib portion > of the Lucene.Net repository at apache.org . There is some logistic > work involved, but ideally this is moving forward. > > As soon as I have more information to relay, I'll pass it along to the > list. > > cheers, > jeff r. 
> > > > > On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: > > > > Hello all - > > > > I've been watching this thread to follow the direction and thought I > > might be able to offer some assistance. I run a search system that involves > > 4 separate search servers -- 3 serving search objects via RemoteSearchable, > > and a 4th that serves in an index updating role. > > > > The codebase for Lucene.Net provides all the library routines one > > needs to provide distributed search capabilities, but does not provide > > facilities for distributed search operation -- nor should it. The ideas > > presented here are certainly possible; I've implemented a working operation > > without requiring the changes described here. I'm confident in our > > implementation; for the calendar year, our uptime/availability of search > > services is 99.99%. Our only outage was related to network hardware, > > otherwise we're sitting solid at 100%. > > > > I've been authorized to provide our operational code for distributed > > search under Lucene.Net to the community at large. Some of the code > > is customized to our operation, but for the most part it's rather generic. > > We started the project under Lucene v1.4.3, but the operational aspect > > still applies under v1.9. > > > > The system consists of a LuceneServer, which provides searchability > > against indexes as defined in XML configuration files. In addition, an > > IndexUpdateServer provides master index updating, master/slave index > > replication and automated index maintenance. Integration with our web site > > ensures the index stays available, updated and current. There's a great > > deal of applied knowledge and learned behavior of many of the underlying > > sub-system components that distributed search under Lucene.Net makes > > use of -- .Net remoting, garbage collection, etc. > > > > If anyone has interest, please reply. 
Contributing this code requires > > a little cleanup of our customization work, so my response may not be > > immediate but I would make efforts to release the code in short order. > > > > thanks, > > jeff r. > > > > > > > > > > On 8/19/06, Robert Boulanger < [EMAIL PROTECTED]> wrote: > > > > > > Hi Elena, hi Rest, > > > > > > > Dear All, > > > > > > > > The application I am working on is intended to make use of the > > > > distributed search capabilities of the Lucene library. While > > > trying to > > > > work with the Lucene'
Re: Lucene.Net Indexing Large Databases
Hi George - About a year ago we had a memory leak around some issues with the 1.4.3 code. A few of us wrote some sample programs that manifested the error, but I was able to do a fair amount of sleuthing with Memprofiler ( http://memprofiler.com/). It's a pretty good tool for $100. -- j On 9/10/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Folks, Since last weekend, I have been trying to narrow down the problem to this memory leak without much luck. Does anyone have a tool (or could recommend one, without costing me $$) which hopefully shows the source of the leak? Unlike C++ code, the leak here, obviously, is due to not releasing references to temporary or real objects. The trick is finding the object. This leak can be created with this simple code:

public static void Main(System.String[] args)
{
    IndexWriter diskIndex;
    Directory directory;
    Analyzer analyzer;
    Document doc;
    int count;
    string indexDirectory;
    System.IO.FileInfo fi;

    indexDirectory = "C:\\Index.Bad";
    fi = new System.IO.FileInfo(indexDirectory);
    directory = Lucene.Net.Store.FSDirectory.GetDirectory(fi, true);
    analyzer = new SimpleAnalyzer();
    diskIndex = new IndexWriter(directory, analyzer, true);
    count = 0;
    while (count < 1)
    {
        doc = new Document();
        diskIndex.AddDocument(doc);
        count++;
    }
    diskIndex.Close();
}

This code will show a leak in 1.9, 1.9.1 and 2.0, but not 1.4.3. I also verified that it doesn't leak under the Java version of Lucene (2.0 is where I tested.) Regards, -- George -Original Message- From: George Aroush [mailto:[EMAIL PROTECTED] Sent: Friday, September 01, 2006 9:21 PM To: lucene-net-user@incubator.apache.org Subject: RE: Lucene.Net Indexing Large Databases Hi Chris, I am using 1.9.1 in production and I am not having this problem. Sorry, I don't have enough cycles to try your code on 1.9. This problem was reported on 1.4.x and was fixed. I am sure I carried it over to 1.9.x and 2.0 -- or maybe this is a new issue. I will double check when I get the cycles. 
You can get 1.4.3's source code as a ZIP from the Lucene.Net download site, which is here: https://svn.apache.org/repos/asf/incubator/lucene.net/site/download/ or you can SVN the source code from here: https://svn.apache.org/repos/asf/incubator/lucene.net/tags/ Regards, -- George Aroush -Original Message- From: Chris David [mailto:[EMAIL PROTECTED] Sent: Friday, September 01, 2006 1:46 PM To: lucene-net-user@incubator.apache.org Subject: RE: Lucene.Net Indexing Large Databases Thanks René, so it's not just me with this problem. Now where can I get a hold of this wonderful 1.4 build of Lucene? It's not listed directly on Apache's Lucene.NET page. I am anxious to see if my code actually does work. Thanks again for all your help, I really do appreciate it. Chris Snapstream Media -Original Message- From: René de Vries [mailto:[EMAIL PROTECTED] Sent: Friday, September 01, 2006 7:32 AM To: lucene-net-user@incubator.apache.org Subject: RE: Lucene.Net Indexing Large Databases Update: I didn't realize my earlier code example ran against 1.4. If I run this with the 1.9final-005 build, I am experiencing the exact same problems as Chris mentions. Memory consumption keeps growing; I had to kill it at 1.5Gb. Exact same code, but with a 1.4 version of the lucene.net DLL, and it runs along at 50Mb. René
Remote searching with Lucene - forward progress
An update on the Remote Searching project I'm bringing forward. I've completed the base code for hand-off to the community. I'm presently working through a remoting/serialization issue that's popped up recently. This appears to be something new in the Lucene 2.0 release. I'm working through that issue now, but I have no expectation of when that's resolved. Rather than release a non-working system, I'm going to resolve this problem first. Once things are working appropriately, I'll send out a release message. Thanks and if you have remoting experience and suggestions, feel free to ping me. :-) cheers, jeff r. On 9/7/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: All - Another update on the remote searching application code that's been mentioned in this thread. I'm near completion of the entire collection of files that are needed for this project -- libraries, applications, unit tests, and documentation. There's quite a bit to this, and thanks for everybody's patience as I assemble the code into something that's less than confusing. There are several working pieces, so I'm packaging it for consumption. I expect to have this available sometime in the next few days, barring things like my life and regular job from getting in the way. Again, I'll share an announcement to the list when I've made the files available. Thanks, jeff r. On 8/26/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > As promised, an update to the list. > > I have code ready for delivery, if I can get svn access to the contrib > section. A request has been made for this but it's going nowhere, so I'm > going to find another place to host the files. > > There's quite a bit of documentation behind this so I'm working > diligently to explain how this works. If anyone has a place to hold the > code until the uber-powers at apache decide to grant me access, we would > greatly appreciate the assistance. > > cheers, > jeff r. 
> > > > On 8/23/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: > > > > Just a follow-up to everyone on this topic. I received a lot of > > offlist mail about this, so this message has a rather wide distribution. > > > > I'm in process of modifying the code for our distributed search > > components so that they're generic enough for general usage and public > > consumption. This is taking a little of my time, but nonetheless I expect > > to complete it soon. > > > > As for distributing the code, it will be located in the contrib > > portion of the Lucene.Net repository at apache.org . There is some > > logistic work involved, but ideally this is moving forward. > > > > As soon as I have more information to relay, I'll pass it along to the > > list. > > > > cheers, > > jeff r. > > > > > > > > > > On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: > > > > > > Hello all - > > > > > > I've been watching this thread to follow the direction and thought I > > > might be able to offer some assistance. I run a search system that involves > > > 4 separate search servers -- 3 serving search objects via RemoteSearchable, > > > and a 4th that serves in an index updating role. > > > > > > The codebase for Lucene.Net provides all the library routines one > > > needs to provide distributed search capabilities, but does not provide > > > facilities for distributed search operation -- nor should it. The ideas > > > presented here are certainly possible; I've implemented a working operation > > > without requiring the changes described here. I'm confident in our > > > implementation; for the calendar year, our uptime/availability of search > > > services is 99.99%. Our only outage was related to network > > > hardware, otherwise we're sitting solid at 100%. > > > > > > I've been authorized to provide our operational code for distributed > > > search under Lucene.Net to the community at large. 
Some of the code > > > is customized to our operation, but for the most part it's rather generic. > > > We started the project under Lucene v1.4.3, but the operational > > > aspect still applies under v1.9. > > > > > > The system consists of a LuceneServer, which provides searchability > > > against indexes as defined in XML configuration files. In addition, an > > > IndexUpdateServer provides master index updating, master/slave index > > > replication and automated index maintenance. Integration with our web site >
Re: Remote searching with Lucene - forward progress
Hi Robert, et al. - No, I've not missed updating the list. I've been a bit busy with other things but have been working to resolve some serialization issues that are down in the core of .Net Remoting. The Lucene 2.0 codebase has been problematic inside of the remoting architecture. Rather than continue to update the list with notifications about a lack of progress, I've opted to attempt to address those issues and make an announcement when I'd reached success. So, no news for now. thanks, jeff On 12/3/06, Robert Boulanger <[EMAIL PROTECTED]> wrote: Hi Jeff, concerning the message thread below which I began in August this year, I wonder if there is any progress on your side so far. Maybe I missed something in the mailinglist (which I expect), since I was busy with other stuff, but the last note from you concerning remote search I find here was from September 13th. So, since I'm on this topic again, I just want to know whether you released anything in the past months that I'm just not seeing, or if you are still on the issue you described in your last note. thanks for replying best regards --Robert Jeff Rodenburg schrieb: > An update on the Remote Searching project I'm bringing forward. I've > completed the base code for hand-off to the community. I'm presently > working through a remoting/serialization issue that's popped up recently. > This appears to be something new in the Lucene 2.0 release. I'm working > through that issue now, but I have no expectation of when that's > resolved. > > Rather than release a non-working system, I'm going to resolve this > problem > first. Once things are working appropriately, I'll send out a release > message. > > Thanks and if you have remoting experience and suggestions, feel free to > ping me. :-) > > cheers, > jeff r. > > > On 9/7/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: >> >> All - >> >> Another update on the remote searching application code that's been >> mentioned in this thread. 
I'm near completion of the entire >> collection of >> files that are needed for this project -- libraries, applications, unit >> tests, and documentation. There's quite a bit to this, and thanks for >> everybody's patience as I assemble the code into something that's >> less than >> confusing. There are several working pieces, so I'm packaging it for >> consumption. >> >> I expect to have this available sometime in the next few days, barring >> things like my life and regular job from getting in the way. Again, >> I'll >> share an announcement to the list when I've made the files available. >> >> Thanks, >> jeff r. >> >> >> >> On 8/26/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: >> > >> > As promised, an update to the list. >> > >> > I have code ready for delivery, if I can get svn access to the contrib >> > section. A request has been made for this but it's going nowhere, >> so I'm >> > going to find another place to host the files. >> > >> > There's quite a bit of documentation behind this so I'm working >> > diligently to explain how this works. If anyone has a place to >> hold the >> > code until the uber-powers at apache decide to grant me access, we >> would >> > greatly appreciate the assistance. >> > >> > cheers, >> > jeff r. >> > >> > >> > >> > On 8/23/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: >> > > >> > > Just a follow-up to everyone on this topic. I received a lot of >> > > offlist mail about this, so this message has a rather wide >> distribution. >> > > >> > > I'm in process of modifying the code for our distributed search >> > > components so that they're generic enough for general usage and >> public >> > > consumption. This is taking a little of my time, but nonetheless >> I expect >> > > to complete it soon. >> > > >> > > As for distributing the code, it will be located in the contrib >> > > portion of the Lucene.Net repository at apache.org . There is some >> > > logistic work involved, but ideally this is moving forward. 
>> > > >> > > As soon as I have more information to relay, I'll pass it along >> to the >> > > list. >> > > >> > > cheers, >> > > jeff r. >> > > >> > > >> > > >> >
Re: Solr for .NET
Thanks Erik. Vijay - porting Solr to C# would be rather extensive, on top of the Lucene-to-Lucene.Net port. Additionally, as Solr development progresses, dependencies get built into the Solr codebase that take advantage of development progress in Java-based Lucene. Not to dissuade you from taking on the task, just be aware of some of the complexities that could underlie such an endeavor. Take a look at the SolrSharp library if you have the cycles. cheers, jeff r. On 8/2/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > Why port Solr? It is a "web service". Use Solr as-is and interface > with it through the SolrSharp API! > > <http://wiki.apache.org/solr/SolrSharp> > >Erik > > > On Aug 2, 2007, at 9:50 AM, Vijay Santhanam wrote: > > > Hi Lucenenites, > > > > Has anyone heard of any C# Solr ports? > > > > I'd like to continue coding a C# port like this in my spare time. > > > > Alternatively, if a Lucene.NET developer could give me some > > instructions on what to port, I'd be happy to contribute to > > Lucene.Net. I'm not sure where to begin. > > > > Vijay Santhanam > > B.Eng.(Soft.) > > Spectrum Wired - Software Engineer > > > > T: +61 2 4925 3266 > > F: +61 2 4925 3255 > > M: +61 407 525 087 > > W: www.spectrumwired.com > > > > Disclaimer: This email and any attached files are intended solely > > for the named addressee, are confidential and may contain legally > > privileged information. The copying or distribution of them or any > > information they contain, by anyone other than the addressee, is > > prohibited. If you have received this email in error, please let us > > know by telephone or return the email to the sender and destroy all > > copies. Thank you.
hadoop or similar C# implementation of map/reduce?
Has there ever been any discussion to port Hadoop to .NET as well? Or is anyone aware of a C# map/reduce project? thanks, j
[jira] Created: (LUCENENET-55) Documents.DateTools has issue creating a Date in StringToDate()
Documents.DateTools has issue creating a Date in StringToDate() --- Key: LUCENENET-55 URL: https://issues.apache.org/jira/browse/LUCENENET-55 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Attachments: DateTools.patch When using StringToDate(System.String dateString), it tries to create an invalid date with month and day = 0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-55) Documents.DateTools has issue creating a Date in StringToDate()
[ https://issues.apache.org/jira/browse/LUCENENET-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-55: -- Attachment: DateTools.patch This patch resolves the issue and passes nunit tests. > Documents.DateTools has issue creating a Date in StringToDate() > --- > > Key: LUCENENET-55 > URL: https://issues.apache.org/jira/browse/LUCENENET-55 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Attachments: DateTools.patch > > > When using StringToDate(System.String dateString), it tries to create an > invalid date with month and day = 0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
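For anyone hitting this before picking up the attached patch, here is a rough, hypothetical sketch of the failure mode and the kind of guard involved (this is illustrative only, not the attached DateTools.patch; the helper name and parsing are assumptions): .NET's System.DateTime constructor rejects 0 for month or day, so components that parse to 0 must default to 1.

```csharp
using System;

public class StringToDateSketch
{
    // Hypothetical helper, not the attached patch: parse a Lucene-style
    // "yyyy[MM[dd]]" string. System.DateTime is 1-based for month and day,
    // so a missing or zero component must be bumped to 1 before the
    // constructor is called, or it throws ArgumentOutOfRangeException.
    public static DateTime Parse(string s)
    {
        int year  = int.Parse(s.Substring(0, 4));
        int month = s.Length >= 6 ? int.Parse(s.Substring(4, 2)) : 0;
        int day   = s.Length >= 8 ? int.Parse(s.Substring(6, 2)) : 0;
        if (month == 0) month = 1;
        if (day == 0) day = 1;
        return new DateTime(year, month, day);
    }

    public static void Main()
    {
        Console.WriteLine(Parse("2007"));   // year-only input still yields a valid date
        Console.WriteLine(Parse("200708")); // missing day defaults to 1, not 0
    }
}
```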
[jira] Created: (LUCENENET-56) Incorrect file in TestLockFactory.RmDir()
Incorrect file in TestLockFactory.RmDir() - Key: LUCENENET-56 URL: https://issues.apache.org/jira/browse/LUCENENET-56 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Trivial When removing files, you don't need to add the path because it already exists in the filename. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-56) Incorrect file in TestLockFactory.RmDir()
[ https://issues.apache.org/jira/browse/LUCENENET-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-56: -- Attachment: TestLockFactory.patch Here is a patch to use only the full filename. A few more NUnit tests pass now. > Incorrect file in TestLockFactory.RmDir() > - > > Key: LUCENENET-56 > URL: https://issues.apache.org/jira/browse/LUCENENET-56 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Trivial > Attachments: TestLockFactory.patch > > > When removing files, you don't need to add the path because it already exists > in the filename. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-57) DocHelper in Tests not creating UTF8 Cleanly
DocHelper in Tests not creating UTF8 Cleanly Key: LUCENENET-57 URL: https://issues.apache.org/jira/browse/LUCENENET-57 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Attachments: DocHelper.patch DocHelper is used when performing unit tests. It is not encoding bytes correctly with UTF8. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-57) DocHelper in Tests not creating UTF8 Cleanly
[ https://issues.apache.org/jira/browse/LUCENENET-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-57: -- Attachment: DocHelper.patch Here is a patch that resolves the issue. > DocHelper in Tests not creating UTF8 Cleanly > > > Key: LUCENENET-57 > URL: https://issues.apache.org/jira/browse/LUCENENET-57 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Attachments: DocHelper.patch > > > DocHelper is used when performing unit tests. It is not encoding bytes > correctly with UTF8. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-58) Issue in CheckHits c# doesn't perform an Assert against a hashtable
Issue in CheckHits c# doesn't perform an Assert against a hashtable --- Key: LUCENENET-58 URL: https://issues.apache.org/jira/browse/LUCENENET-58 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor Attachments: CheckHits.patch In CheckHits.CheckHitCollector() there is an Assert against a hashtable; C# doesn't support this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-58) Issue in CheckHits c# doesn't perform an Assert against a hashtable
[ https://issues.apache.org/jira/browse/LUCENENET-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-58: -- Attachment: CheckHits.patch This patch loops through the hashtable, performing an Assert on each entry. This also fixed about 100 NUnit failures. > Issue in CheckHits c# doesn't perform an Assert against a hashtable > --- > > Key: LUCENENET-58 > URL: https://issues.apache.org/jira/browse/LUCENENET-58 > Project: Lucene.Net > Issue Type: Bug >Reporter: Jeff >Priority: Minor > Attachments: CheckHits.patch > > > In CheckHits.CheckHitCollector() there is an Assert against a hashtable; C# > doesn't support this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-58) Issue in CheckHits c# doesn't perform an Assert against a hashtable
[ https://issues.apache.org/jira/browse/LUCENENET-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-58: -- Attachment: CheckHits.patch2 Here is a new patch that fixes the problem the original patch addressed, plus another problem in this file. > Issue in CheckHits c# doesn't perform an Assert against a hashtable > --- > > Key: LUCENENET-58 > URL: https://issues.apache.org/jira/browse/LUCENENET-58 > Project: Lucene.Net > Issue Type: Bug >Reporter: Jeff >Priority: Minor > Attachments: CheckHits.patch, CheckHits.patch2 > > > In CheckHits.CheckHitCollector() there is an Assert against a hashtable; C# > doesn't support this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
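The underlying issue is easy to demonstrate: .NET's Hashtable does not override Equals, so an equality Assert on two content-identical hashtables compares references and fails, whereas Java's assertEquals on a Map compares contents. A minimal sketch of the entry-by-entry comparison the patch describes (illustrative only; the helper name is an assumption, not code from the patch):

```csharp
using System;
using System.Collections;

public class HashtableCompareDemo
{
    // Compare contents explicitly, since Hashtable.Equals is reference equality.
    public static bool SameEntries(Hashtable a, Hashtable b)
    {
        if (a.Count != b.Count) return false;
        foreach (DictionaryEntry e in a)
            if (!b.ContainsKey(e.Key) || !Equals(b[e.Key], e.Value)) return false;
        return true;
    }

    public static void Main()
    {
        Hashtable a = new Hashtable(); a["k"] = 1;
        Hashtable b = new Hashtable(); b["k"] = 1;
        Console.WriteLine(a.Equals(b));       // False: reference equality only
        Console.WriteLine(SameEntries(a, b)); // True: contents match
    }
}
```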
[jira] Created: (LUCENENET-59) QueryUtils has some invalid Asserts
QueryUtils has some invalid Asserts --- Key: LUCENENET-59 URL: https://issues.apache.org/jira/browse/LUCENENET-59 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-59) QueryUtils has some invalid Asserts
[ https://issues.apache.org/jira/browse/LUCENENET-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-59: -- Attachment: QueryUtils.patch This patch fixes NUnit failures in QueryUtils. > QueryUtils has some invalid Asserts > --- > > Key: LUCENENET-59 > URL: https://issues.apache.org/jira/browse/LUCENENET-59 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: QueryUtils.patch > > > NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query > q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-59) QueryUtils has some invalid Asserts
[ https://issues.apache.org/jira/browse/LUCENENET-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-59: -- Attachment: QueryUtils.patch2 Oops, the first patch had more than just this fix in it. QueryUtils.patch2 has only this fix. > QueryUtils has some invalid Asserts > --- > > Key: LUCENENET-59 > URL: https://issues.apache.org/jira/browse/LUCENENET-59 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: QueryUtils.patch, QueryUtils.patch2 > > > NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query > q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-61) Issue testing Backwards Compatibility
Issue testing Backwards Compatibility - Key: LUCENENET-61 URL: https://issues.apache.org/jira/browse/LUCENENET-61 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor NUnit tests fail because of non-C#-compliant tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-61) Issue testing Backwards Compatibility
[ https://issues.apache.org/jira/browse/LUCENENET-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-61: -- Attachment: TestBackwardsCompatibility.patch This patch passes all backwards-compatibility NUnit tests, removes the dependency on SupportClass, and uses the Windows directory structure. It is left disabled by default because of its dependency on SharpZipLib. > Issue testing Backwards Compatibility > - > > Key: LUCENENET-61 > URL: https://issues.apache.org/jira/browse/LUCENENET-61 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: TestBackwardsCompatibility.patch > > > NUnit tests fail because of non-C#-compliant tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-62) IndexReader.IndexExists() Fails if directory doesn't exist.
IndexReader.IndexExists() Fails if directory doesn't exist. Key: LUCENENET-62 URL: https://issues.apache.org/jira/browse/LUCENENET-62 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor There is no check to see if the directory exists before it checks for the index files; it just throws an error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-62) IndexReader.IndexExists() Fails if directory doesn't exist.
[ https://issues.apache.org/jira/browse/LUCENENET-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-62: -- Attachment: IndexReader.patch This patch checks whether the directory exists before it checks the index. If the directory doesn't exist, it returns false.

if (System.IO.Directory.Exists(directory.FullName))
{
    return SegmentInfos.GetCurrentSegmentGeneration(System.IO.Directory.GetFileSystemEntries(directory.FullName)) != -1;
}
else
{
    return false;
}

> IndexReader.IndexExists() Fails if directory doesn't exist. > > > Key: LUCENENET-62 > URL: https://issues.apache.org/jira/browse/LUCENENET-62 > Project: Lucene.Net > Issue Type: Bug >Reporter: Jeff >Priority: Minor > Attachments: IndexReader.patch > > > There is no check to see if the directory exists before it checks for the > index files; it just throws an error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-63) FieldCacheImpl tries to parse a float in f format.
FieldCacheImpl tries to parse a float in f format. --- Key: LUCENENET-63 URL: https://issues.apache.org/jira/browse/LUCENENET-63 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor Attachments: FieldCacheImpl.patch C# doesn't support F format for parsing floats. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-63) FieldCacheImpl tries to parse a float in f format.
[ https://issues.apache.org/jira/browse/LUCENENET-63?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-63: -- Attachment: FieldCacheImpl.patch This fix trims any trailing 'f' from the string before parsing:

public virtual float ParseFloat(System.String value_Renamed)
{
    return System.Single.Parse(value_Renamed.TrimEnd('f'));
}

> FieldCacheImpl tries to parse a float in f format. > --- > > Key: LUCENENET-63 > URL: https://issues.apache.org/jira/browse/LUCENENET-63 > Project: Lucene.Net > Issue Type: Bug >Reporter: Jeff >Priority: Minor > Attachments: FieldCacheImpl.patch > > > C# doesn't support F format for parsing floats. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
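A quick self-contained check of the behavior the patch works around (a sketch, separate from the patch itself): Single.Parse rejects a Java-style trailing 'f', while trimming it first succeeds.

```csharp
using System;
using System.Globalization;

public class FloatSuffixDemo
{
    public static void Main()
    {
        string javaStyle = "1.5f"; // legal float literal in Java source, not parseable by .NET
        try
        {
            float.Parse(javaStyle, CultureInfo.InvariantCulture);
            Console.WriteLine("parsed (unexpected)");
        }
        catch (FormatException)
        {
            Console.WriteLine("Single.Parse rejects the trailing 'f'");
        }
        // Trimming the suffix, as the patch does, makes the value parseable.
        float ok = float.Parse(javaStyle.TrimEnd('f'), CultureInfo.InvariantCulture);
        Console.WriteLine(ok);
    }
}
```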
[jira] Created: (LUCENENET-64) TestDateFilter incorrectly gets total milliseconds
TestDateFilter incorrectly gets total milliseconds -- Key: LUCENENET-64 URL: https://issues.apache.org/jira/browse/LUCENENET-64 Project: Lucene.Net Issue Type: Bug Reporter: Jeff When performing TestBefore, it uses milliseconds instead of total milliseconds so it fails. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-64) TestDateFilter incorrectly gets total milliseconds
[ https://issues.apache.org/jira/browse/LUCENENET-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-64: -- Attachment: TestDateFilter.patch This patch uses total milliseconds. This is obtained by the following: long now = (long)((TimeSpan)(System.DateTime.Now - System.DateTime.MinValue)).TotalMilliseconds; > TestDateFilter incorrectly gets total milliseconds > -- > > Key: LUCENENET-64 > URL: https://issues.apache.org/jira/browse/LUCENENET-64 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Attachments: TestDateFilter.patch > > > When performing TestBefore, it uses milliseconds instead of total > milliseconds so it fails. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
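The distinction behind this bug is worth spelling out: TimeSpan.Milliseconds is only the fractional millisecond component (0-999) of a span, while TotalMilliseconds is the whole span expressed in milliseconds. A minimal illustration (not taken from the patch):

```csharp
using System;

public class MillisecondsDemo
{
    public static void Main()
    {
        TimeSpan span = TimeSpan.FromSeconds(90);
        // .Milliseconds is just the millisecond component of the span; here it is 0.
        Console.WriteLine(span.Milliseconds);
        // .TotalMilliseconds is the full span in milliseconds: 90000.
        Console.WriteLine(span.TotalMilliseconds);
    }
}
```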
[jira] Commented: (LUCENENET-59) QueryUtils has some invalid Asserts
[ https://issues.apache.org/jira/browse/LUCENENET-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519761 ] Jeff commented on LUCENENET-59: --- I would agree; however, the only reason it is failing is that an ArrayList has a capacity of 4 instead of 2, which makes the GetHashCode() comparison fail. ToString() compares the full query, so I figured that would be enough. There are no extra values in these fields; the extra values are null. I will find out where these values are removed and add a TrimToSize() to clean up this ArrayList. This will make these match. I will take a look. Jeff > QueryUtils has some invalid Asserts > --- > > Key: LUCENENET-59 > URL: https://issues.apache.org/jira/browse/LUCENENET-59 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: QueryUtils.patch, QueryUtils.patch2 > > > NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query > q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-69) FSIndexInput.isFDValid() not ported correctly
[ https://issues.apache.org/jira/browse/LUCENENET-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-69: -- Attachment: FSDirectory.patch Here is the patch that resolves this issue. > FSIndexInput.isFDValid() not ported correctly > - > > Key: LUCENENET-69 > URL: https://issues.apache.org/jira/browse/LUCENENET-69 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: FSDirectory.patch > > > FSIndexInput.isFDValid() was not ported correctly because it doesn't > translate one to one. After looking into this a little more: file.getFD() is > part of the FileInputStream class in Java. This would be the base stream of > file, so if the BaseStream is null it would be invalid; if it is not null, it > would be valid. After making this change, all TestCompoundFile tests pass.
>
> public virtual bool IsFDValid()
> {
>     return (file.BaseStream != null);
>     //return true; // return file.getFD().valid(); // {{Aroush-2.1 in .NET, how do we do this?
> }
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-69) FSIndexInput.isFDValid() not ported correctly
FSIndexInput.isFDValid() not ported correctly - Key: LUCENENET-69 URL: https://issues.apache.org/jira/browse/LUCENENET-69 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor Attachments: FSDirectory.patch FSIndexInput.isFDValid() was not ported correctly because it doesn't translate one to one. After looking into this a little more: file.getFD() is part of the FileInputStream class in Java. This would be the base stream of file, so if the BaseStream is null it would be invalid; if it is not null, it would be valid. After making this change, all TestCompoundFile tests pass.

public virtual bool IsFDValid()
{
    return (file.BaseStream != null);
    //return true; // return file.getFD().valid(); // {{Aroush-2.1 in .NET, how do we do this?
}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-59) QueryUtils has some invalid Asserts
[ https://issues.apache.org/jira/browse/LUCENENET-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-59: -- Attachment: QueryUtils.patch3 Here is a new patch that only checks Query.ToString(), since you can't Assert a Query. GetHashCode() has been left in. Additional patches will trim the ArrayList higher up in the tests; once ArrayList.TrimToSize() has been performed, they pass the tests. The reason the CheckEqual and CheckUnequal hashes are failing is that when you clone an ArrayList, it sets the capacity to the length instead of keeping the original capacity. This makes the hash comparison invalid. > QueryUtils has some invalid Asserts > --- > > Key: LUCENENET-59 > URL: https://issues.apache.org/jira/browse/LUCENENET-59 > Project: Lucene.Net > Issue Type: Bug >Reporter: Jeff >Priority: Minor > Attachments: QueryUtils.patch, QueryUtils.patch2, QueryUtils.patch3 > > > NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query > q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-59) QueryUtils has some invalid Asserts
[ https://issues.apache.org/jira/browse/LUCENENET-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-59: -- Attachment: TestBoolean2.patch This patch uses TrimToSize() in the TestBoolean2 tests on an ArrayList to remove the extra values that were added when the list was created.

((System.Collections.ArrayList)current.Clauses()).TrimToSize();

> QueryUtils has some invalid Asserts > --- > > Key: LUCENENET-59 > URL: https://issues.apache.org/jira/browse/LUCENENET-59 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: QueryUtils.patch, QueryUtils.patch2, QueryUtils.patch3, > TestBoolean2.patch > > > NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query > q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
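The capacity mismatch described above can be reproduced directly: on the framework versions discussed here, ArrayList.Clone() allocates exactly Count elements, so a list created with spare capacity differs from its clone until TrimToSize() is called. A small stand-alone sketch (illustrative, not from the patches):

```csharp
using System;
using System.Collections;

public class CloneCapacityDemo
{
    public static void Main()
    {
        ArrayList list = new ArrayList(4); // explicit capacity of 4
        list.Add("a");
        list.Add("b");                     // Count is 2, Capacity stays 4

        ArrayList copy = (ArrayList)list.Clone();
        Console.WriteLine(list.Capacity);  // 4
        Console.WriteLine(copy.Capacity);  // 2: Clone() allocates exactly Count elements

        list.TrimToSize();                 // the fix applied in the tests
        Console.WriteLine(list.Capacity);  // 2, now matching the clone
    }
}
```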
[jira] Created: (LUCENENET-76) DisjunctionMaxQuery has unnecessary clone which causes it to fail unit tests
DisjunctionMaxQuery has unnecessary clone which causes it to fail unit tests Key: LUCENENET-76 URL: https://issues.apache.org/jira/browse/LUCENENET-76 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor DisjunctionMaxQuery.Clone() clones the DisjunctionMaxQuery and then the disjuncts ArrayList. Cloning the disjuncts ArrayList causes the unit tests to fail; the disjuncts are already cloned when the query is cloned, so this is not needed.

public override System.Object Clone()
{
    DisjunctionMaxQuery clone = (DisjunctionMaxQuery) base.Clone();
    //clone.disjuncts = (System.Collections.ArrayList) this.disjuncts.Clone();
    return clone;
}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-76) DisjunctionMaxQuery has unnecessary clone which causes it to fail unit tests
[ https://issues.apache.org/jira/browse/LUCENENET-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-76: -- Attachment: DisjunctionMaxQuery.patch > DisjunctionMaxQuery has unnecessary clone which causes it to fail unit tests > > > Key: LUCENENET-76 > URL: https://issues.apache.org/jira/browse/LUCENENET-76 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Priority: Minor > Attachments: DisjunctionMaxQuery.patch > > > DisjunctionMaxQuery.Clone() clones the DisjunctionMaxQuery and then the disjuncts > ArrayList. Cloning the disjuncts ArrayList causes the unit tests to > fail. The disjuncts are already cloned when the query itself is cloned, so this step is not needed. > public override System.Object Clone() > { > DisjunctionMaxQuery clone = (DisjunctionMaxQuery) base.Clone(); > //clone.disjuncts = (System.Collections.ArrayList) > this.disjuncts.Clone(); > return clone; > } -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENENET-78) TestDateFilter.patch for nunit test
[ https://issues.apache.org/jira/browse/LUCENENET-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520322 ] Jeff commented on LUCENENET-78: --- Is a tick a millisecond? I thought it was a nanosecond. LUCENENET-64 solves this problem using milliseconds. Jeff > TestDateFilter.patch for nunit test > --- > > Key: LUCENENET-78 > URL: https://issues.apache.org/jira/browse/LUCENENET-78 > Project: Lucene.Net > Issue Type: Bug > Reporter: Digy > Priority: Trivial > Attachments: TestDateFilter.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
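To the question in this comment: a .NET tick is 100 nanoseconds, so it is neither a millisecond nor a nanosecond. A small sketch of the conversion, using the framework's own `TimeSpan` constants:

```csharp
using System;

public class TickDemo
{
    public static void Main()
    {
        // One .NET tick is 100 nanoseconds, so the framework defines:
        Console.WriteLine(TimeSpan.TicksPerMillisecond); // 10000
        Console.WriteLine(TimeSpan.TicksPerSecond);      // 10000000

        // Converting a DateTime's ticks to milliseconds, as the
        // milliseconds-based fix in LUCENENET-64 would need to do:
        long nowTicks = DateTime.UtcNow.Ticks;
        long nowMillis = nowTicks / TimeSpan.TicksPerMillisecond;
        Console.WriteLine(nowMillis);
    }
}
```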
[jira] Reopened: (LUCENENET-61) Issue testing Backwards Compatibility
[ https://issues.apache.org/jira/browse/LUCENENET-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff reopened LUCENENET-61: --- This patch didn't get applied correctly. For some reason the line: entries = zipFile.Entries(); wasn't removed, but it was in the patch. > Issue testing Backwards Compatibility > - > > Key: LUCENENET-61 > URL: https://issues.apache.org/jira/browse/LUCENENET-61 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Assignee: George Aroush > Priority: Minor > Attachments: TestBackwardsCompatibility.patch > > > NUnit tests fail because of non C#-compliant tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-61) Issue testing Backwards Compatibility
[ https://issues.apache.org/jira/browse/LUCENENET-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-61: -- Attachment: TestBackwardCompatibility.patch2 This patch removes the line: entries = zipFile.Entries(); which didn't get removed in the first patch, and moves the line: Assert.Fail("Needs integration with SharpZipLib"); inside the else of the conditional compilation symbol SHARP_ZIP_LIB. > Issue testing Backwards Compatibility > - > > Key: LUCENENET-61 > URL: https://issues.apache.org/jira/browse/LUCENENET-61 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Assignee: George Aroush > Priority: Minor > Attachments: TestBackwardCompatibility.patch2, > TestBackwardsCompatibility.patch > > > NUnit tests fail because of non C#-compliant tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
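Schematically, the patch moves the failure into the `#else` branch of the SHARP_ZIP_LIB conditional compilation symbol, so the test fails loudly only when SharpZipLib support is not compiled in. The sketch below shows just the shape of that change; the method name and return strings are illustrative stand-ins, not the actual TestBackwardsCompatibility code:

```csharp
using System;

public class ConditionalCompilationDemo
{
    // Returns what the test would do, depending on whether the
    // SHARP_ZIP_LIB symbol was defined at compile time.
    public static string BackwardsCompatibilityStep()
    {
#if SHARP_ZIP_LIB
        // SharpZipLib is available: the real test would unzip the
        // legacy index here and run the compatibility checks on it.
        return "unzip legacy index and run checks";
#else
        // Symbol undefined: fail loudly instead of silently skipping,
        // mirroring Assert.Fail("Needs integration with SharpZipLib").
        return "Assert.Fail: Needs integration with SharpZipLib";
#endif
    }

    public static void Main()
    {
        Console.WriteLine(BackwardsCompatibilityStep());
    }
}
```

Compiled without defining SHARP_ZIP_LIB, the `#else` branch is the one that runs, which is the behavior the patch restores.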
[jira] Issue Comment Edited: (LUCENENET-61) Issue testing Backwards Compatibility
[ https://issues.apache.org/jira/browse/LUCENENET-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520443 ] jdell edited comment on LUCENENET-61 at 8/16/07 7:57 PM: This new patch TestBackwardCompatibility.patch2 removes the line: entries = zipFile.Entries(); which didn't get removed in the first patch, and moves the line: Assert.Fail("Needs integration with SharpZipLib"); inside the else of the conditional compilation symbol SHARP_ZIP_LIB was (Author: jdell): This patch removes the line: entries = zipFile.Entries(); which didn't get removed in the first patch, and moves the line: Assert.Fail("Needs integration with SharpZipLib"); inside the else of the conditional compilation symbol SHARP_ZIP_LIB > Issue testing Backwards Compatibility > - > > Key: LUCENENET-61 > URL: https://issues.apache.org/jira/browse/LUCENENET-61 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Assignee: George Aroush > Priority: Minor > Attachments: TestBackwardCompatibility.patch2, > TestBackwardsCompatibility.patch > > > NUnit tests fail because of non C#-compliant tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENENET-63) FieldCacheImpl tries to parse a float in f format.
[ https://issues.apache.org/jira/browse/LUCENENET-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521297 ] Jeff commented on LUCENENET-63: --- This fixes unit tests in Search/TestSort/: TestAutoSort TestMultiSort TestParallelMultiSort TestReverseSort TestSortCombos TestTypesSort My guess is that it has always been a problem and the unit tests never existed to expose it, or maybe something else has changed internally, but System.Single.Parse(string) doesn't parse a string that ends in an 'f'. If that trailing 'f' is removed, it is able to convert the string to a single (float) without any trouble. Regards, Jeff > FieldCacheImpl tries to parse a float in f format. > --- > > Key: LUCENENET-63 > URL: https://issues.apache.org/jira/browse/LUCENENET-63 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Priority: Minor > Attachments: FieldCacheImpl.patch > > > C# doesn't support F format for parsing floats. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
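The parsing difference is easy to reproduce. The sketch below strips the Java-style suffix before calling `Single.Parse`; the helper name is a hypothetical illustration, not the actual FieldCacheImpl code:

```csharp
using System;
using System.Globalization;

public class FloatSuffixDemo
{
    // Hypothetical helper: strip a Java-style 'f'/'F' suffix before
    // parsing, since System.Single.Parse rejects it.
    public static float ParseJavaFloat(string s)
    {
        return float.Parse(s.TrimEnd('f', 'F'), CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // "1.5f" is a valid Java float literal but not parseable by .NET:
        try
        {
            float.Parse("1.5f", CultureInfo.InvariantCulture);
        }
        catch (FormatException)
        {
            Console.WriteLine("Single.Parse rejects the trailing 'f'");
        }

        // With the suffix removed, the conversion succeeds:
        Console.WriteLine(ParseJavaFloat("1.5f")); // 1.5
    }
}
```

This is the same idea as the attached FieldCacheImpl.patch: remove the trailing 'f' that Java Lucene wrote before handing the string to the .NET parser.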
[jira] Commented: (LUCENENET-95) NUnit test for Search.TestDisjunctionMaxQuery.TestBooleanOptionalWithTiebreaker
[ https://issues.apache.org/jira/browse/LUCENENET-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526520 ] Jeff commented on LUCENENET-95: --- For what it's worth... I just got the latest version from SVN and this test passed for me. Windows XP Pro (SP2) on Dual Xeon 3Ghz CPU, .Net v2.0.50727, VS2005 Pro 8.0.50727.42 Jeff > Nunite test for > Search.TestDisjunctionMaxQuery.TestBooleanOptionalWithTiebreaker > > > Key: LUCENENET-95 > URL: https://issues.apache.org/jira/browse/LUCENENET-95 > Project: Lucene.Net > Issue Type: Bug >Reporter: Digy >Priority: Trivial > Attachments: TryThis.patch > > > Changing the line in TestDisjunctionMaxQuery.cs > from >public const float SCORE_COMP_THRESH = 0.f; > to >public const float SCORE_COMP_THRESH = 0.1f; > solves the problem but i am not sure if an exact match is needed or not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
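The change quoted in this issue is the standard tolerance-comparison pattern for float scores: compare within SCORE_COMP_THRESH rather than exactly. A minimal sketch of that pattern (not the actual Lucene.Net test code):

```csharp
using System;

public class ScoreThresholdDemo
{
    // A nonzero threshold, as the quoted suggestion proposes,
    // absorbs float rounding differences between platforms.
    public const float SCORE_COMP_THRESH = 0.1f;

    // Compare two scores within the tolerance rather than exactly.
    public static bool ScoresMatch(float expected, float actual)
    {
        return Math.Abs(expected - actual) <= SCORE_COMP_THRESH;
    }

    public static void Main()
    {
        Console.WriteLine(ScoresMatch(1.00f, 1.05f)); // True
        Console.WriteLine(ScoresMatch(1.00f, 1.30f)); // False
    }
}
```

With the threshold at 0.f, as in the original test, the comparison degenerates to exact float equality, which is why the reporter was unsure whether an exact match was intended.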
[jira] Created: (LUCENENET-192) Latest SVN build is about twice as slow running queries when compared to Java Lucene
Latest SVN build is about twice as slow running queries when compared to Java Lucene Key: LUCENENET-192 URL: https://issues.apache.org/jira/browse/LUCENENET-192 Project: Lucene.Net Issue Type: Improvement Environment: Visual Studio 2008 with .NET framework 3.5 Reporter: Jeff Johnson Priority: Minor I have been using the Java Luke tool for comparing query times between Java and C#, and the Java query time is consistently about twice as fast as the C# query time. The index I am testing was built in C# and contains 10 million documents. I have made sure to "warm up" the index by running the same query a few times before timing it again. One example: querying for a term that exists in every document takes about 1.3 seconds in C# and 0.6 seconds in Java. The total size of my index directory is about 1 GB. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENENET-192) Latest SVN build is about twice as slow running queries when compared to Java Lucene
[ https://issues.apache.org/jira/browse/LUCENENET-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752706#action_12752706 ] Jeff Johnson commented on LUCENENET-192: What about if you get 10M hits? > Latest SVN build is about twice as slow running queries when compared to Java > Lucene > > > Key: LUCENENET-192 > URL: https://issues.apache.org/jira/browse/LUCENENET-192 > Project: Lucene.Net > Issue Type: Improvement > Environment: Visual Studio 2008 with .NET framework 3.5 > Reporter: Jeff Johnson > Priority: Minor > > I have been using the Java Luke tool for comparing query times between Java and > C#, and the Java query time is consistently about twice as fast as the C# > query time. The index I am testing was built in C# and contains 10 million > documents. I have made sure to "warm up" the index by running the same query > a few times before timing it again. > One example: querying for a term that exists in every document takes about > 1.3 seconds in C# and 0.6 seconds in Java. The total size of my index > directory is about 1 GB. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.