Re: Lucene.Net 2.1 build 3 as "Release Candidate"
I have been using it in production for a while now. It seems very solid. If I had a vote I would say let's mark it as final and move on to v2.2. Jeff On Dec 10, 2007 9:47 PM, George Aroush <[EMAIL PROTECTED]> wrote: > Hi folks, > > I just labeled Lucene.Net 2.1 build 3 as "Release Candidate" in SVN. I > also > added it to the "tags". Unless anyone has an objection, I want to label this > as "Final" and start committing the code base for 2.2 into the "trunk". > > Please let me know if you have any objection to this plan. > > Regards, > > -- George > >
Re: .NET 2.0 with Lucene.NET
We initially started running v1.4.3 with the 1.1 framework and have since migrated over to the 2.0 framework without difficulty. There are some differences down in the guts, but nothing that can't be handled in conversion. Our index environment sounds very similar to yours - highly structured and relatively high volatility. In addition, we have a quick turnaround on changes from the original data source (db) being reflected in the search index. I would suggest initially focusing your time on understanding how the file formats (compound vs. non-compound) and parameters like mergefactor and maxmergedocs affect your specific index creation and production. You may have already done this, but I found that efficiency levels changed with structural index changes, i.e. decisions about field population and settings. Depending on your available system resources, I've also noticed considerable performance degradation when an index passes a certain size threshold, i.e. 300MB on the given system I'm working with. (We break our aggregate index out to multiple individual indexes for the best mix of indexing and search performance.) Hope this helps. -- jeff r. On 4/26/06, Rob Tucker <[EMAIL PROTECTED]> wrote: > > Thanks for the quick response George, > > > > I was assuming that 1.4 was .NET 1.1 compliant, is this not the case? > Generally, I've struggled to find information about Lucene.NET support > for .NET. Is there somewhere where I can find it? > > > > We're using Lucene.NET for a slightly unusual searching implementation > where it's holding information about highly structured documents in an > environment where there is the potential for a fairly high degree of > volatility in document contents and lifetimes. The main issue we've > found with regard to efficiency is that adding and removing large numbers > of documents can take a long time; merging and optimising the index is > the particular challenge. 
I've really got a general concern that > switching to .NET 2.0 might affect this efficiency and robustness for any > number of reasons, say in changing the implementation of data > structures, file access classes or plain bugs in the framework! From > experience, I don't tend to believe the MS marketing hype, I've always > hit upgrade problems when changing OS versions, runtimes, etc., and I don't > expect .NET 2.0 to be any different. > > > > In general, we've got great results with Lucene.NET, I'm really looking > for an initial feel for what to expect with the move to 2.0. Is there > information anywhere about the required changes that you've mentioned? > > > > To be honest, I've struggled to find much information about Lucene.NET > on the ASF site, is this the main page?: > http://incubator.apache.org/projects/lucene.net.html If so, the WIKI > seems to be down. > > > > Thanks, > > > > Rob Tucker. > > > > _ > > From: George Aroush [mailto:[EMAIL PROTECTED] > Sent: 26 April 2006 13:42 > To: Rob Tucker > Subject: RE: .NET 2.0 with Lucene.NET > > > > Hi Rob, > > > > With a few changes, you can get Lucene.Net to run on .NET 2.0 -- others > have done it. As you may know, I am finishing off 1.9 which will be > .NET 1.1 compliant. After which, I will be releasing Lucene.Net 2.0 > which will be .NET 2.0 compliant. > > > > Sorry, I don't have numbers to show you if Lucene.Net is faster or more > stable under .NET 2.0. But I am curious, what are your concerns with > Lucene.Net 1.4.3 in regard to robustness and efficiency? > > > > Regards, > > > > -- George > > > > PS: Please subscribe to the Lucene.Net mailing list at ASF and post > questions like those there for the whole community to pitch in on. > > > > _ > > From: Rob Tucker [mailto:[EMAIL PROTECTED] > Sent: Wednesday, April 26, 2006 6:32 AM > To: George Aroush > Subject: .NET 2.0 with Lucene.NET > > Hi, > > > > Do you have any information about how Lucene.Net runs with .NET 2.0? 
We > have a .NET 1.1 project that we'd like to upgrade and would like to get > some initial confidence about how Lucene.NET will run. We're using 1.4 > at the moment. I'm concerned with both robustness and efficiency. > > > > Regards, > > > > Rob Tucker > > > > [EMAIL PROTECTED] > > >
Re:
Hi Ali - Please send these messages to [EMAIL PROTECTED] The dev mailing list is for Lucene.Net developers within Apache, not general developers using Lucene.Net. By parsing the expression, what input/output are you looking for? Can you provide a sample? -- j On 5/3/06, Ali Khawaja <[EMAIL PROTECTED]> wrote: Hi - Can anyone tell me if I can use Lucene as a Boolean expression parser. I need to handle the parsed tree myself. Thanks Ali
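For what it's worth, Lucene's QueryParser does hand back a tree (a BooleanQuery with nested clauses) that callers can walk themselves. As a stdlib-only illustration of the kind of tree involved -- this is not Lucene's actual parser, just a minimal Java sketch of a recursive-descent parser for AND/OR expressions with parentheses, where a prefix string stands in for a real node tree:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal recursive-descent parser for boolean expressions like
// "a AND (b OR c)". NOT Lucene's QueryParser -- just a sketch of the
// kind of parse tree Lucene hands back as a BooleanQuery, which
// callers can then walk themselves.
public class BoolParser {
    private final List<String> tokens;
    private int pos = 0;

    public BoolParser(String input) {
        tokens = new ArrayList<>(Arrays.asList(
            input.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")));
    }

    // expr := term (OR term)*
    public String parseExpr() {
        String left = parseTerm();
        while (pos < tokens.size() && tokens.get(pos).equals("OR")) {
            pos++;
            left = "OR(" + left + "," + parseTerm() + ")";
        }
        return left;
    }

    // term := factor (AND factor)*
    private String parseTerm() {
        String left = parseFactor();
        while (pos < tokens.size() && tokens.get(pos).equals("AND")) {
            pos++;
            left = "AND(" + left + "," + parseFactor() + ")";
        }
        return left;
    }

    // factor := WORD | '(' expr ')'
    private String parseFactor() {
        String t = tokens.get(pos++);
        if (t.equals("(")) {
            String inner = parseExpr();
            pos++; // consume ')'
            return inner;
        }
        return t;
    }

    public static void main(String[] args) {
        // prints AND(a,OR(b,c))
        System.out.println(new BoolParser("a AND (b OR c)").parseExpr());
    }
}
```

The same walk works on a real BooleanQuery: recurse into each clause and handle the leaves yourself.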
Re: Compression Implementation
Looking at this from a bit broader perspective, this opens up a bigger conversation. While working to implement a third-party hook-by-reflection process into the code, the .NET 2.0 framework already contains the appropriate classes to handle compression. While there's a need for .NET 1.1 compliance, doing so with a round-about method seems more like an exception approach vs. a standard approach. I don't mean to suggest that usage for the 1.1 Framework be abandoned; I'm sure there is greater 1.1 usage out in the world as opposed to 2.0. However, jumping through hoops to support 1.1 is also just a stopgap. I know there is a plan to move to the 2.0 Framework later on when the java-based Lucene project hits its 2.0 definition. Would it be worthwhile to consider a side-by-side port to the 2.0 Framework? I ported 1.4.3 to the 2.0 Framework myself last winter, and it has changed a few underlying things as well as improved several core classes. Having used the 2.0 Framework for the past 6 months, I would strongly suggest we consider this as a possible solution. Thoughts? -- j On 5/11/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Johnny, I have to keep Lucene.Net 1.9 .NET 1.1 compliant. Since .NET 1.1 doesn't have a compression API, I couldn't implement this port -- thus, I left it out. My idea on how to resolve this is to use reflection and through reflection, one can integrate a 3rd party compression into Lucene.Net 1.9. If you want to take on this part, please do and submit your code. Your effort will be more than welcome and is a path to becoming a committer for Lucene.Net. Regards, -- George Aroush -----Original Message----- From: J C [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 10, 2006 7:51 PM To: lucene-net-dev@incubator.apache.org Subject: Compression Implementation Importance: High Hello George I have found this: // {{Aroush-1.9}} for .NET 1.1, we can use reflection and ZLib? in FieldsWriter.cs. It seems that the ZIP compression is not yet implemented. 
I would like to give it a try. Please confirm. Regards Johnny _ Be the one of the first to try the NEW Windows Live Mail. http://ideas.live.com/programPage.aspx?versionId=5d21c51a-b161-4314-9b0e-491 1fb2b2e6d
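The reflection hook George describes can be sketched in a few lines. The class and method names below are hypothetical stand-ins, not the Lucene.Net code (and this is Java with java.util.zip standing in for a 3rd-party compressor), but the shape is the point: the core library holds only a class name, so there is no compile-time dependency on any compression library:

```java
import java.io.ByteArrayOutputStream;
import java.lang.reflect.Method;
import java.util.zip.Deflater;

// Sketch of the reflection hook discussed in the thread: the core
// library never references the compressor type at compile time; it
// loads a class by name and invokes compress(byte[]) reflectively.
// Class and method names here are hypothetical stand-ins.
public class ReflectionCompression {

    // Stand-in for a 3rd-party compressor (java.util.zip here).
    public static class ZlibCompressor {
        public byte[] compress(byte[] input) {
            Deflater d = new Deflater();
            d.setInput(input);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!d.finished()) {
                out.write(buf, 0, d.deflate(buf));
            }
            d.end();
            return out.toByteArray();
        }
    }

    // What the core library would do: no compile-time dependency,
    // just a class name (in practice read from configuration).
    public static byte[] compressVia(String className, byte[] data) {
        try {
            Object compressor = Class.forName(className).getDeclaredConstructor().newInstance();
            Method m = compressor.getClass().getMethod("compress", byte[].class);
            return (byte[]) m.invoke(compressor, (Object) data);
        } catch (Exception e) {
            throw new RuntimeException("compression adapter failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] input = new byte[4096];  // highly compressible zeros
        byte[] out = compressVia("ReflectionCompression$ZlibCompressor", input);
        System.out.println(input.length + " -> " + out.length + " bytes");
    }
}
```

If the configured class is missing, compression simply fails to load -- the library itself still compiles and runs, which is George's constraint.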
Re: Compression Implementation
Does "compatible" equal the ability for a Java implementation of Lucene to open/read/write to an index created in Lucene.Net? On 5/15/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Jeff, We need compression support in Lucene.Net 1.9 using .NET 1.1 otherwise 1.9 can't be declared compatible with its Java-based index. Besides, doing reflection to provide a plug-in solution to a 3rd party compression isn't hard. Eyal already asked if he can work on this part. I said yes but I have not heard back from him yet. Eyal: If you are reading this, please let us know if you are taking on this task or not. Thanks! Regards, -- George Aroush -----Original Message----- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Monday, May 15, 2006 12:32 PM To: lucene-net-dev@incubator.apache.org Subject: Re: Compression Implementation Looking at this from a bit broader perspective, this opens up a bigger conversation. While working to implement a third-party hook-by-reflection process into the code, the .NET 2.0 framework already contains the appropriate classes to handle compression. While there's a need for .NET 1.1 compliance, doing so with a round-about method seems more like an exception approach vs. a standard approach. I don't mean to suggest that usage for the 1.1 Framework be abandoned; I'm sure there is greater 1.1 usage out in the world as opposed to 2.0. However, jumping through hoops to support 1.1 is also just a stopgap. I know there is a plan to move to the 2.0 Framework later on when the java-based Lucene project hits its 2.0 definition. Would it be worthwhile to consider a side-by-side port to the 2.0 Framework? I ported 1.4.3 to the 2.0 Framework myself last winter, and it has changed a few underlying things as well as improved several core classes. Having used the 2.0 Framework for the past 6 months, I would strongly suggest we consider this as a possible solution. Thoughts? 
-- j On 5/11/06, George Aroush <[EMAIL PROTECTED]> wrote: > > Hi Johnny, > > I have to keep Lucene.Net 1.9 .NET 1.1 compliant. Since .NET 1.1 > doesn't have a compression API, I couldn't implement this port -- thus, > I left it out. > > My idea on how to resolve this is to use reflection and through > reflection, one can integrate a 3rd party compression into Lucene.Net > 1.9. If you want to take on this part, please do and submit your > code. Your effort will be more than welcome and is a path to becoming > a committer for Lucene.Net. > > Regards, > > -- George Aroush > > -----Original Message----- > From: J C [mailto:[EMAIL PROTECTED] > Sent: Wednesday, May 10, 2006 7:51 PM > To: lucene-net-dev@incubator.apache.org > Subject: Compression Implementation > Importance: High > > Hello George > > I have found this: > // {{Aroush-1.9}} for .NET 1.1, we can use reflection and ZLib? > in FieldsWriter.cs. It seems that the ZIP compression is not yet > implemented. > > I would like to give it a try. Please confirm. > > Regards > Johnny > >
Re: Compression Implementation
George - thanks for the clarification. -- j On 5/15/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Jeff, Yes, "compatible" does mean the index can be opened/read/written/etc. to whether it was created with Java or C# Lucene. This is already the case with 1.4.x and must remain so for 1.9 and forward. In fact, right now you can have two processes, one Java and another .NET Lucene, with both concurrently accessing the same index as long as they are sharing the same lock file. Regards, -- George Aroush -----Original Message----- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Monday, May 15, 2006 4:43 PM To: lucene-net-dev@incubator.apache.org Subject: Re: Compression Implementation Does "compatible" equal the ability for a Java implementation of Lucene to open/read/write to an index created in Lucene.Net? On 5/15/06, George Aroush <[EMAIL PROTECTED]> wrote: > > Hi Jeff, > > We need compression support in Lucene.Net 1.9 using .NET 1.1 otherwise > 1.9 can't be declared compatible with its Java-based index. Besides, > doing reflection to provide a plug-in solution to a 3rd party > compression isn't hard. > > Eyal already asked if he can work on this part. I said yes but I have > not heard back from him yet. > > Eyal: If you are reading this, please let us know if you are taking on > this task or not. Thanks! > > Regards, > > -- George Aroush > > -----Original Message----- > From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] > Sent: Monday, May 15, 2006 12:32 PM > To: lucene-net-dev@incubator.apache.org > Subject: Re: Compression Implementation > > Looking at this from a bit broader perspective, this opens up a bigger > conversation. > > While working to implement a third-party hook-by-reflection process > into the code, the .NET 2.0 framework already contains the appropriate > classes to handle compression. While there's a need for .NET 1.1 > compliance, doing so with a round-about method seems more like an > exception approach vs. a standard approach. 
> > I don't mean to suggest that usage for the 1.1 Framework be abandoned; > I'm sure there is greater 1.1 usage out in the world as opposed to 2.0. > However, jumping through hoops to support 1.1 is also a stopgap. > I know there is a plan to move to the 2.0 Framework later on when the > java-based Lucene project hits its 2.0 definition. > > Would it be worthwhile to consider a side-by-side port to the > 2.0 Framework? > I ported > 1.4.3 to the 2.0 Framework myself last winter, and it has changed a > few underlying things as well as improved several core classes. > Having used the 2.0 Framework for the past 6 months, I would strongly > suggest we consider this as a possible solution. > > Thoughts? > > -- j > > On 5/11/06, George Aroush <[EMAIL PROTECTED]> wrote: > > > > Hi Johnny, > > > > I have to keep Lucene.Net 1.9 .NET 1.1 compliant. Since .NET 1.1 > > doesn't have a compression API, I couldn't implement this port -- > > thus, I left it out. > > > > My idea on how to resolve this is to use reflection and through > > reflection, one can integrate a 3rd party compression into > > Lucene.Net 1.9. If you want to take on this part, please do and > > submit your code. Your effort will be more than welcome and is a > > path to becoming a committer for Lucene.Net. > > > > Regards, > > > > -- George Aroush > > > > -----Original Message----- > > From: J C [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, May 10, 2006 7:51 PM > > To: lucene-net-dev@incubator.apache.org > > Subject: Compression Implementation > > Importance: High > > > > Hello George > > > > I have found this: > > // {{Aroush-1.9}} for .NET 1.1, we can use reflection and ZLib? > > in FieldsWriter.cs. It seems that the ZIP compression is not yet > > implemented. > > > > I would like to give it a try. Please confirm. > > > > Regards > > Johnny > >
Re: Compression Implementation
I like Eyal's suggestion of keeping the adapter definition to implementing an interface. This would be initiated through a reflection call, yes? I would add that the configuration information could be driven via custom config sections, which I've done a bazillion of lately. If it would help, I'll do the code for custom configuration sections that ensure the requisite data is loaded from the config file in a structured manner. -- j On 5/15/06, Eyal Post <[EMAIL PROTECTED]> wrote: What I was thinking of doing is this: Declare an interface for compression: public interface CompressionAdapter { byte[] Compress(byte[] input); byte[] Uncompress(byte[] input); } Allow users to develop an adapter that implements this interface (i.e. SharpZLibCompressionAdapter). The user then adds the adapter class name to the app.config file and Lucene will dynamically create an instance of that adapter. This means there's no actual dependency from Lucene to any 3rd party library. If the adapter is not configured, compression will not work; if it is, it's the user's responsibility to provide the compression library and an adapter. Eyal > -----Original Message----- > From: George Aroush [mailto:[EMAIL PROTECTED] > Sent: Monday, May 15, 2006 23:48 PM > To: lucene-net-dev@incubator.apache.org > Subject: RE: Compression Implementation > > Hi Eyal, > > First thanks for taking on this task, it's much appreciated. > > The reason why I believe we need a reflection-based solution > is because the current code of Lucene.Net must remain > independent of 3rd party requirements. > For example, if you look at the Test code for Lucene.Net, you > can't compile or run it without having NUnit installed on > your machine and having the code reference the library. If > your solution has a similar requirement, then I don't > think we can accept it for 1.9. > > Reflection seems to me the only way to solve this problem. 
> > After putting together the reflection code in Lucene.Net's > code base, we still have to provide an interface which a user > must code to in order for the compression code to be in > working order and utilized by Lucene.Net. > But because this code is not a physical part of > Lucene.Net, it doesn't put any restriction on Lucene.Net to > require a 3rd party library/code to be present to use > Lucene.Net -- unless the user wants compression. > > Again, thanks for taking on this task. > > Regards, > > -- George Aroush > > -----Original Message----- > From: Eyal Post [mailto:[EMAIL PROTECTED] > Sent: Monday, May 15, 2006 4:24 PM > To: lucene-net-dev@incubator.apache.org > Subject: RE: Compression Implementation > > I'm on it. > Just wondering, why take the reflection way and not the interface way? > The interface way seems more "correct" and will also perform better. > > Eyal > > > > -----Original Message----- > > From: George Aroush [mailto:[EMAIL PROTECTED] > > Sent: Monday, May 15, 2006 21:54 PM > > To: lucene-net-dev@incubator.apache.org > > Subject: RE: Compression Implementation > > > > Hi Jeff, > > > > We need compression support in Lucene.Net 1.9 using .NET > 1.1 otherwise > > 1.9 can't be declared compatible with its Java-based > index. Besides, > > doing reflection to provide a plug-in solution to a 3rd party > > compression isn't hard. > > > > Eyal already asked if he can work on this part. I said yes > but I have > > not heard back from him yet. > > > > Eyal: If you are reading this, please let us know if you > are taking on > > this task or not. Thanks! > > > > Regards, > > > > -- George Aroush > > > > -----Original Message----- > > From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] > > Sent: Monday, May 15, 2006 12:32 PM > > To: lucene-net-dev@incubator.apache.org > > Subject: Re: Compression Implementation > > > > Looking at this from a bit broader perspective, this opens > up a bigger > > conversation. 
> > > > While working to implement a third-party hook-by-reflection process > > into the code, the .NET 2.0 framework already contains the > appropriate > > classes to handle compression. > > While there's a need for .NET 1.1 compliance, doing so with a > > round-about method seems more like an exception approach vs. > > a standard approach. > > > > I don't mean to suggest that usage for the 1.1 Framework be > abandoned; > > I'm sure there is greater 1.1 usage out in the world as opposed to > > 2.0. > > However, jumping through hoops to support 1.1 is also just a stopgap.
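Eyal's adapter design above can be translated into a runnable sketch. This is Java rather than C#, a Properties object stands in for app.config, and all names are illustrative rather than the committed Lucene.Net API -- but it shows the whole mechanism: an interface the library knows, an adapter the user supplies, and a class name read from configuration so there is no hard 3rd-party dependency:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Java sketch of the adapter design from the thread; names are
// illustrative, not the committed Lucene.Net API.
interface CompressionAdapter {
    byte[] compress(byte[] input);
    byte[] uncompress(byte[] input);
}

// The user's adapter; here it wraps java.util.zip rather than a
// 3rd-party library, but the shape is the same.
class DeflateAdapter implements CompressionAdapter {
    public byte[] compress(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }
    public byte[] uncompress(byte[] input) {
        try {
            Inflater inf = new Inflater();
            inf.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            while (!inf.finished()) out.write(buf, 0, inf.inflate(buf));
            inf.end();
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new RuntimeException("corrupt compressed data", e);
        }
    }
}

public class AdapterConfigDemo {
    // The library only knows the interface; the concrete class comes
    // from configuration, so there is no hard 3rd-party dependency.
    public static CompressionAdapter load(Properties config) {
        try {
            String cls = config.getProperty("lucene.compressionAdapter");
            return (CompressionAdapter) Class.forName(cls).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new RuntimeException("could not load compression adapter", e);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("lucene.compressionAdapter", "DeflateAdapter");
        CompressionAdapter a = load(config);
        byte[] round = a.uncompress(a.compress("searchable text".getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(round, StandardCharsets.UTF_8));  // prints "searchable text"
    }
}
```

This is also why Eyal's interface question matters less than it first appears: reflection is only used once, to construct the adapter; every subsequent call goes through the interface at full speed.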
Re: Ranking and Scoring Hits
Hi Ed - For future reference, these questions are intended to be directed to the lucene-net-user mailing list. When iterating on your Hits collection, call Hits.Score() the same way you call Hits.Doc() -- by passing it the index value (int) for your loop iteration. On 5/18/06, Ed Jones <[EMAIL PROTECTED]> wrote: Hi, I've only just downloaded Lucene.net and I've been doing some initial work on it. I've written an indexer and got a test button working to run queries to find the results. So far I like it and it's wonderfully fast. However, I'm trying to return the score for each returned result, mainly so I can tell how relevant result 1 is over result 2. I've looked through the documentation and can't find how to do this. Can anyone give me a pointer? Also, can somebody confirm that the default search results are in relevance order? Thanks Ed
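The loop Jeff describes looks like this. StubHits below is a stand-in with the same length/doc/score shape as Lucene's Hits, so the sketch runs without the Lucene library; with the real class you'd iterate identically. (And yes, by default Lucene returns Hits in descending relevance order.)

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Lucene's Hits so this sketch is self-contained; the
// real class exposes the same length()/doc(i)/score(i) pattern.
class StubHits {
    private final String[] docs = { "result one", "result two", "result three" };
    private final float[] scores = { 0.91f, 0.57f, 0.33f };  // descending relevance

    public int length() { return docs.length; }
    public String doc(int n) { return docs[n]; }
    public float score(int n) { return scores[n]; }
}

public class ScoreLoop {
    public static List<String> listWithScores(StubHits hits) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < hits.length(); i++) {
            // same loop index for the document and its score
            out.add(hits.doc(i) + " (" + hits.score(i) + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : listWithScores(new StubHits())) {
            System.out.println(line);
        }
    }
}
```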
Re: noobie question
Hi Pamela - Performance certainly changes as your index grows, and it's not even necessarily a linear progression. How you indexed your data, compression factors, compound vs. loose file format, number of indexes, etc. all play a part in affecting search performance at runtime. There are a lot of places to look for improvements. I would suggest looking at your specific indexes and see if you can break those up into smaller indexes -- this will lead you to the MultiSearcher (and, if you have multi-processor hardware, ParallelMultiSearcher). Leave your index updating operation out of the picture for the moment. Indexing can have a big impact on search performance, so take that out of the equation. After you're able to get to better runtime search performance, go back and add indexing to the mix. I can tell you from experience that most search systems with indexes of substantial size are executing indexing operations on separate systems to avoid performance impacts. Hope this helps. -- j On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: I have been developing a C# search solution for an application which has tens of millions of web pages. Most of these web pages are under 1 k. While our initial pilot was very encouraging on our tests of 1,000,000 docs, when we scaled up to 10 million subsecond searches are now taking 8-10 seconds. Where should I focus my efforts to increase search speed? Should I be using the RAMDirectory? MultiSearcher? We only have one machine right now which serves indexing and searching. TIA Pam
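What MultiSearcher does conceptually can be sketched with plain collections: run the query against each smaller index and merge the per-index hits into one score-ordered list. The documents and scores below are invented purely for illustration -- with real indexes, MultiSearcher does this merge for you:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch of multi-index searching: each partition returns
// its own hits; the merge re-sorts globally by score, which is the
// job MultiSearcher handles over real Lucene indexes.
public class MergeDemo {
    public static class Hit {
        public final String doc;
        public final double score;
        public Hit(String doc, double score) { this.doc = doc; this.score = score; }
    }

    public static List<Hit> searchAll(List<List<Hit>> partitions) {
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> partition : partitions) {
            merged.addAll(partition);
        }
        merged.sort((a, b) -> Double.compare(b.score, a.score));
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> index1 = Arrays.asList(new Hit("doc-a", 0.9), new Hit("doc-b", 0.4));
        List<Hit> index2 = Arrays.asList(new Hit("doc-c", 0.7));
        for (Hit h : searchAll(Arrays.asList(index1, index2))) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

ParallelMultiSearcher applies the same idea but queries the partitions concurrently, which is where multi-processor hardware pays off.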
Re: noobie question
The Compound file format is the default file format for the index that you create (at least in v1.4.x). When creating an index, you can specify true/false in a constructor that indicates if you wish the index file to be compacted or not. Check out http://lucene.apache.org/java/docs/fileformats.html to understand this better. When your index gets to be of significant size, the file format can become very important. Using the default compound format, searching will tend to be faster (assuming all other things equal) but index updates will be slower; with the non-compound format, searching may be slower but index updates can be faster. There are three other properties that can affect the mix as well: mergefactor, minmergedocs, and maxmergedocs. Tweaking these properties in conjunction with the file format settings grows in importance as your index size increases. Check out the thread at http://www.gossamer-threads.com/lists/lucene/java-user/11999?search_string=minmergedocs;#11999 . -- j On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: Thanks Jeff, I am a little confused by the compound vs loose file format you speak of. We are indexing html docs and indexing 10 metatags. By indexing I mean we index the body, but we also query the properties. I am not sure what the correct definition is. Are you saying that if we were merely indexing the document bodies we would be further ahead? We need to restrict our searches by date, and a few other properties, so it's really important that we be able to do these restrictions. TIA Pam On 5/19/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > Hi Pamela - > > Performance certainly changes as your index grows, and it's not even > necessarily a linear progression. How you indexed your data, compression > factors, compound vs. loose file format, number of indexes, etc. all play > a > part in affecting search performance at runtime. > > There are a lot of places to look for improvements. 
I would suggest > looking > at your specific indexes and see if you can break those up into smaller > indexes -- this will lead you to the MultiSearcher (and, if you have > multi-processor hardware, ParallelMultiSearcher). > > Leave your index updating operation out of the picture for the moment. > Indexing can have a big impact on search performance, so take that out of > the equation. After you're able to get to better runtime search > performance, go back and add indexing to the mix. I can tell you from > experience that most search systems with indexes of substantial size are > executing indexing operations on separate systems to avoid performance > impacts. > > Hope this helps. > > -- j > > > > On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > > > I have been developing a C# search solution for an application which has > > tens of millions of web pages. Most of these web pages are under 1 k. > > > > While our initial pilot was very encouraging on our tests of 1,000,000 > > docs, > > when we scaled up to 10 million subsecond searches are now taking 8-10 > > seconds. > > > > Where should I focus my efforts to increase search speed? Should I be > > using > > the RAMDirectory? MultiSearcher? > > > > We only have one machine right now which serves indexing and searching. > > > > TIA > > > > Pam > > > > > >
Re: noobie question
Yes, the merge parameters do affect indexing performance, but compactness also affects search performance as your index gets larger. As you incrementally update the index, the fragmentation effect (which the merge properties will dictate) causes performance degradation at search time. As for index size, I don't know about any hard and fast rules. We have about 7-8GB of indexes of varying structure, and those are spread out over about 40 indexes. We try to keep individual indexes below 300MB, as the operational hassles after that size seem to be more burdensome. We also use distributed searching so our indexes are allocated across multiple machines (no duplication). As a rule, we also try to stay below 2.5GB of aggregate indexes on one machine. Our indexes are a full corpus and we must search across all indexes all the time. You can structure your indexes more effectively if you don't need to search the full corpus all the time. With multiple indexes being searched collectively, you'll soon be using the MultiSearcher class. Be sure to look at MultiReader, as it makes a difference in search performance (nice caching). -- j On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: Hi Jeff A couple more questions. Don't the merge parameters determine how aggressively the index is compacted? And if so, doesn't this affect only indexing performance and not search performance? Secondly how large should each index be? Should I be partitioning the indexes, ie by date range? So one index for December 2005, one for January, etc? Or is it done by size? TIA Pam On 5/19/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > Hi Pamela - > > Performance certainly changes as your index grows, and it's not even > necessarily a linear progression. How you indexed your data, compression > factors, compound vs. loose file format, number of indexes, etc. all play > a > part in affecting search performance at runtime. > > There are a lot of places to look for improvements. 
I would suggest > looking > at your specific indexes and see if you can break those up into smaller > indexes -- this will lead you to the MultiSearcher (and, if you have > multi-processor hardware, ParallelMultiSearcher). > > Leave your index updating operation out of the picture for the moment. > Indexing can have a big impact on search performance, so take that out of > the equation. After you're able to get to better runtime search > performance, go back and add indexing to the mix. I can tell you from > experience that most search systems with indexes of substantial size are > executing indexing operations on separate systems to avoid performance > impacts. > > Hope this helps. > > -- j > > > > On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > > > I have been developing a C# search solution for an application which has > > tens of millions of web pages. Most of these web pages are under 1 k. > > > > While our initial pilot was very encouraging on our tests of 1,000,000 > > docs, > > when we scaled up to 10 million subsecond searches are now taking 8-10 > > seconds. > > > > Where should I focus my efforts to increase search speed? Should I be > > using > > the RAMDirectory? MultiSearcher? > > > > We only have one machine right now which serves indexing and searching. > > > > TIA > > > > Pam > > > > > >
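The "keep each index below a size threshold" policy from this thread (~300MB in Jeff's setup) can be sketched as a simple size-budgeted planner: assign documents to the current shard until adding one would exceed the byte budget, then start a new shard. The sizes and budget below are invented; real index size depends on your fields, file format, and merge settings:

```java
import java.util.ArrayList;
import java.util.List;

// Size-budgeted shard planner: a sketch of the policy of capping each
// individual index at a byte threshold. Sizes here are illustrative.
public class ShardPlanner {
    // docSizes[i] is the (estimated) on-disk contribution of doc i;
    // returns lists of doc ids, one list per shard.
    public static List<List<Integer>> plan(long[] docSizes, long budgetBytes) {
        List<List<Integer>> shards = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long used = 0;
        for (int i = 0; i < docSizes.length; i++) {
            if (!current.isEmpty() && used + docSizes[i] > budgetBytes) {
                shards.add(current);           // close the full shard
                current = new ArrayList<>();
                used = 0;
            }
            current.add(i);
            used += docSizes[i];
        }
        if (!current.isEmpty()) shards.add(current);
        return shards;
    }

    public static void main(String[] args) {
        long[] sizes = { 120, 100, 90, 200, 50 };
        System.out.println(plan(sizes, 300));  // [[0, 1], [2, 3], [4]]
    }
}
```

In practice the partition key is often logical (date range, region) rather than purely size-based, as the rest of the thread discusses; the size cap then acts as a secondary limit.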
Re: noobie question
Correct on our configuration, give or take a few 100 MB. :-) And we have three servers accessed simultaneously for each search. For our index, we're dealing with information that's geographically defined, so our indexes are broken up along those lines. We still monitor each index for size, but the geographic data drives our index maintenance logic. We've indexed approximately 20 MM rows of information. Our partitioning criteria serves two purposes: query efficiency and index maintainability. Depending on how your index is structured (the Lucene settings + your own document structure), these two can compete with each other to the point of being polar. Generally you'll want to find a happy medium between the two. While we have many rows of data and our index documents contain quite a few fields of data, many of them are simple data fields that aren't large (database is the data source). By contrast, if we were indexing full-on text documents, I'm sure our index would be substantially larger and we'd likely take a different approach. I did a lot of research prior to constructing our index, and with as much feedback and data that I could glean, trial-and-error proved to be the most effective manner in determining what to do and how to do it. -- j On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: OK, I'm very confused here Jeff. It sounds like what you are suggesting is that you have multiple indexes per machine, each around 300 Mbytes, which means about 2.5/.3 = 8 indexes per machine, and you have 7.5/2.5 = 3 machines in the mix. Is this correct? On what criteria do you partition your index? Date, or some other criteria, or is it merely size? I think we have indexed 1 million rows and our index is 7 Gigs. Pam On 5/19/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > Yes, the merge parameters do affect indexing performance, but > compactness > also affects search performance as your index gets larger. 
As you > incrementally update the index, the fragmentation effect (which the merge > properties will dictate) causes performance degradation at search time. > > As for index size, I don't know about any hard and fast rules. We have > about 7-8GB of indexes of varying structure, and those are spread out over > about 40 indexes. We try to keep individual indexes below 300MB, as the > operational hassles after that size seem to be more burdensome. We also > use > distributed searching so our indexes are allocated across multiple > machines > (no duplication). As a rule, we also try to stay below 2.5GB of aggregate > indexes on one machine. Our indexes are a full corpus and we must search > across all indexes all the time. You can structure your indexes more > effectively if you don't need to search the full corpus all the time. > > With multiple indexes being searched collectively, you'll soon be using > the > MultiSearcher class. Be sure to look at MultiReader, as it makes a > difference in search performance (nice caching). > > -- j > > On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > > > Hi Jeff > > > > A couple more questions. Don't the merge parameters determine how > > aggressively the index is compacted? And if so, doesn't this affect only > > indexing performance and not search performance? > > > > Secondly how large should each index be? Should I be partitioning the > > indexes, ie by date range? So one index for Decemeber 2005, one for > > January, > > etc? Or is it done by size? > > > > TIA > > > > Pam > > > > On 5/19/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > > > > > Hi Pamela - > > > > > > Performance certainly changes as your index grows, and it's not even > > > necessarily a linear progression. How you indexed your data, > > compression > > > factors, compound vs. loose file format, number of indexes, etc. all > > play > > > a > > > part in affecting search performance at runtime. 
> > > > > > There are a lot of places to look for improvements. I would suggest > > > looking > > > at your specific indexes and see if you can break those up into > smaller > > > indexes -- this will lead you to the MultiSearcher (and, if you have > > > multi-processor hardware, ParallelMultiSearcher). > > > > > > Leave your index updating operation out of the picture for the moment. > > > Indexing can have a big impact on search performance, so take that out > > of > > > the equation. After you're able to get to better runtime search > > > performance, go back and add indexing to the mix. I can tell you
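Jeff's advice above -- break a large index into smaller ones and search them collectively through MultiSearcher (or ParallelMultiSearcher on multi-processor hardware) -- can be sketched roughly like this against the 1.4-era Lucene.Net API. The index paths and field name are hypothetical, not from the thread:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Open one IndexSearcher per physical index, then search them as a unit.
Searchable[] shards = new Searchable[]
{
    new IndexSearcher(@"D:\indexes\shard-01"),   // hypothetical paths
    new IndexSearcher(@"D:\indexes\shard-02"),
    new IndexSearcher(@"D:\indexes\shard-03"),
};

// MultiSearcher merges hits across all shards; swap in
// ParallelMultiSearcher to search the shards concurrently.
MultiSearcher searcher = new MultiSearcher(shards);

Query query = QueryParser.Parse("lucene", "contents", new StandardAnalyzer());
Hits hits = searcher.Search(query);
System.Console.WriteLine("total hits: " + hits.Length());
searcher.Close();
```

A MultiSearcher result set is indistinguishable to the caller from a single-index search, which is what makes the "many small indexes" strategy transparent to the rest of the application.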
Re: noobie question
- Our index is currently 7 Gigs. I take it we should have more than 7 Gigs of RAM on our machine? Can we get any other hardware specs? IE 2, 4 procs? You can go with big RAM, but I haven't found that to be a huge boost in search perf. We run dual-proc Xeons for our search servers, as CPU has been the bottleneck. Sorts are particularly egregious when it comes to CPU load as well. Bang for the buck, the new dual-core Opterons are *amazingly* strong performers. - Each html doc we have has 10 metatags which we store. Other than date, and a 10 byte string for one of the metatags, the metatags are almost always empty. Will this degrade performance? I would not expect this to degrade your performance. - Also when you suggest we distribute our index, on what criteria do we partition? It looks like we need to optimize our IO for reads which means raid 5 or a solid state ram drive to me. Is this correct? Could we perhaps cache it in ram (file system cache) by issuing warm-up queries? The faster your disk, the better. And yes, warm-up queries are a big help. In our instance, warm-up queries need to be logically distributed to hit all the searchers. On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: Hi George Our index is currently 7 Gigs. I take it we should have more than 7 Gigs of RAM on our machine? Can we get any other hardware specs? IE 2, 4 procs? Each html doc we have has 10 metatags which we store. Other than date, and a 10 byte string for one of the metatags, the metatags are almost always empty. Will this degrade performance? Also when you suggest we distribute our index, on what criteria do we partition? It looks like we need to optimize our IO for reads which means raid 5 or a solid state ram drive to me. Is this correct? Could we perhaps cache it in ram (file system cache) by issuing warm-up queries? BTW - we will be running on the wintel platform using C#. 
TIA Pam On 5/19/06, George Aroush <[EMAIL PROTECTED]> wrote: > > Hi Pam, > > You also need to investigate your hardware configuration. Besides the > usual > of having a fast CPU and maxing out your memory, make sure you have a fast hard > drive. > > As a Lucene index grows, anything you do with Lucene becomes I/O bound, > thus > a fast hard drive is critical. Simply moving from 5400rpm to 7200rpm will > give you a noticeable difference -- switch to a fast SCSI/RAID hard drive > and > you will see even better results. And yet even better, if you distribute > your index across multiple hard-drives/partitions. > > One other thing to look for, are you storing any data in your Lucene > index? > If so, consider not doing it. The goal is to keep the index size as small > as possible to reduce I/O. > > Good luck. > > -- George Aroush > > -Original Message- > From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] > Sent: Friday, May 19, 2006 4:28 PM > To: lucene-net-dev@incubator.apache.org > Subject: Re: noobie question > > Yes, the merge parameters do affect indexing performance, but > compactness > also affects search performance as your index gets larger. As you > incrementally update the index, the fragmentation effect (which the merge > properties will dictate) causes performance degradation at search time. > > As for index size, I don't know about any hard and fast rules. We have > about 7-8GB of indexes of varying structure, and those are spread out over > about 40 indexes. We try to keep individual indexes below 300MB, as the > operational hassles after that size seem to be more burdensome. We also > use > distributed searching so our indexes are allocated across multiple > machines > (no duplication). As a rule, we also try to stay below 2.5GB of aggregate > indexes on one machine. Our indexes are a full corpus and we must search > across all indexes all the time. You can structure your indexes more > effectively if you don't need to search the full corpus all the time. 
> > With multiple indexes being searched collectively, you'll soon be using > the > MultiSearcher class. Be sure to look at MultiReader, as it makes a > difference in search performance (nice caching). > > -- j > > On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > > > Hi Jeff > > > > A couple more questions. Don't the merge parameters determine how > > aggressively the index is compacted? And if so, doesn't this affect > > only indexing performance and not search performance? > > > > Secondly how large should each index be? Should I be partitioning the > > indexes, ie by date range? So one index for December 2005, one for > > January, etc? Or is it done by size? > > > > TIA > > > > Pam > > > > On 5/19/06, Jeff Rodenburg <[E
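The warm-up queries Jeff recommends can be as simple as replaying a few representative queries at process startup, so the first real user doesn't pay for cold file-system and first-use sort caches. A rough sketch against the 1.4-era Lucene.Net API; the terms and field name are placeholders, not from the thread:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public class Warmer
{
    // Run representative queries once at startup; the results are
    // discarded -- the side effect of warming the caches is the point.
    public static void WarmUp(Searcher searcher)
    {
        string[] terms = { "common", "query", "terms", "from", "real", "traffic" };
        foreach (string term in terms)
        {
            Query q = QueryParser.Parse(term, "contents", new StandardAnalyzer());
            searcher.Search(q);
        }
    }
}
```

As Jeff notes, in a distributed setup the warm-up traffic has to be routed so that every searcher instance gets hit, not just the one a load balancer happens to pick.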
Re: noobie question
You could certainly load a 7GB index into memory, given sufficient hardware running 64-bit Windows. That said, I wouldn't suggest trying to carry a single 7GB index in a single server's memory. Keeping an index below a 2GB threshold is only treating a symptom and isn't really sustainable if your index is already in the 7GB range. The issue at hand is dealing with the indexed data as efficiently as possible. Following George's suggestion for stripping the index down, i.e. just using searchable entities, is one possible approach. In our situation, we have quite a few fields of data that would be performance hits elsewhere on our system to retrieve at search run-time, so the lesser evil is to include them in our index. It just depends on your requirements to determine what's best. Likewise, monitoring your hardware statistics for bottlenecks isn't invalid, but I doubt you'll be able to make the modifications necessary to achieve the results you'd like to see through hardware config changes alone. Based on the conversation we've had thus far and a few assumptions on my part, I doubt you'll be able to keep your search times anywhere near the thresholds you'd like to see. You can help yourself with reduced index size, tweaked hardware configurations, and indexing strategies, but there is no silver bullet here. If my experiences hold true for you, you'll end up addressing each of these areas as you look for efficiencies of scale. -- j On 5/22/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Pam and Jeff, You can't load 7GB of index into memory. A typical Windows application can't access more than 2GB of RAM -- so if a machine has 8GB and only Lucene is running, chances are that you still have a lot of real memory not being used. You need to investigate and find out why your index grew to 7GB and reduce its size. For example, are you storing any data in Lucene's index? If so, consider not doing so. Monitor your CPU and see whether it is being max'ed out or not. 
Chances are that it is, and if queries are still taking long to run then your focus should be on disk I/O. Regards, -- George Aroush -Original Message- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Saturday, May 20, 2006 11:18 AM To: lucene-net-dev@incubator.apache.org Subject: Re: noobie question - Our index is currently 7 Gigs. I take it we should have more than 7 Gigs of RAM on our machine? Can we get any other hardware specs? IE 2, 4 procs? You can go with big RAM, but I haven't found that to be a huge boost in search perf. We run dual-proc Xeons for our search servers, as CPU has been the bottleneck. Sorts are particularly egregious when it comes to CPU load as well. Bang for the buck, the new dual-core Opterons are *amazingly* strong performers. - Each html doc we have has 10 metatags which we store. Other than date, and a 10 byte string for one of the metatags, the metatags are almost always empty. Will this degrade performance? I would not expect this to degrade your performance. - Also when you suggest we distribute our index, on what criteria do we partition? It looks like we need to optimize our IO for reads which means raid 5 or a solid state ram drive to me. Is this correct? Could we perhaps cache it in ram (file system cache) by issuing warm-up queries? The faster your disk, the better. And yes, warm-up queries are a big help. In our instance, warm-up queries need to be logically distributed to hit all the searchers. On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > Hi George > > Our index is currently 7 Gigs. I take it we should have more than 7 > Gigs of RAM on our machine? Can we get any other hardware specs? IE 2, > 4 procs? > > Each html doc we have has 10 metatags which we store. Other than date, > and a 10 byte string for one of the metatags, the metatags are almost > always empty. Will this degrade performance? > > Also when you suggest we distribute our index, on what criteria do we > partition? 
It looks like we need to optimize our IO for reads which > means raid 5 or a solid state ram drive to me. Is this correct? Could > we perhaps cache it in ram (file system cache) by issuing warm-up queries? > > BTW - we will be running on the wintel platform using C#. > > TIA > > Pam > > > On 5/19/06, George Aroush <[EMAIL PROTECTED]> wrote: > > > > Hi Pam, > > > > You also need to investigate your hardware configuration. Besides > > the usual of having a fast CPU and maxing out your memory, make sure > > you have a fast hard drive. > > > > As a Lucene index grows, anything you do with Lucene becomes I/O > > bound, thus a fast hard drive is critical. Simply moving from > > 5400rpm to 7200rpm > will > > give you a noticeable difference -- switc
Re: Error during indexing process
Hi Soormash - This sounds like a corrupt index. I've seen this with an index that wasn't properly closed or an indexing update didn't complete entirely. Try using the Luke index interrogation tool (Java app) for evaluating your index and see if it's still readable. -- j On 5/22/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Soormash, I have posted your question to the mailing list. Please subscribe to the list so you can post directly. See: http://incubator.apache.org/projects/lucene.net.html for instructions on how to subscribe. Thanks. -- George Aroush -Original Message- From: Soormasher Singh [mailto:[EMAIL PROTECTED] Sent: Sunday, May 21, 2006 5:01 PM To: lucene-net-dev@incubator.apache.org Subject: Error during indexing process Hello there I'm using Lucene 1.4.3 to index database records (about 100,000 or so). Till yesterday, everything was going fine and I didn't have any problems in indexing. Today, out of nowhere, I've started getting the following error: Cannot rename segments.new to segments In some cases, it's the same error with the word 'delete' instead of rename. Sometimes the above error occurs after indexing 20,000 records, other times after 2000. I've tried using StopAnalyzer (which is what I'm using by default) and StandardAnalyzer but get the same problem with both. Any suggestions/workarounds for this. I really need some inputs on this. Thanks! soormash
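The "Cannot rename segments.new to segments" symptom Jeff diagnoses is consistent with a writer that never completed its commit or released its lock. A defensive indexing pattern that guarantees the writer is closed might look like the sketch below, against the 1.4-era Lucene.Net API; the index path and field name are hypothetical:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// false = append to an existing index rather than create a new one.
IndexWriter writer = new IndexWriter(@"D:\indexes\records",
                                     new StandardAnalyzer(),
                                     false);
try
{
    Document doc = new Document();
    doc.Add(Field.Text("contents", "row text goes here"));  // hypothetical field
    writer.AddDocument(doc);
}
finally
{
    // Always close, even when indexing throws: an abandoned writer
    // leaves the lock file and a half-written segments file behind,
    // which can later surface as rename/delete errors like the one above.
    writer.Close();
}
```

If the index is already damaged, Luke remains the right first tool: it will tell you whether the segments file and the individual segments are still readable before you decide to rebuild.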
Re: noobie question
Hi Pam - I am confused, what do you mean by storing data in my index? (George, correct me if I'm wrong here.) What George is referring to is the different manners in which data can be included in an index. Take a look at the Field class and you'll notice a series of static methods that store data in a number of ways. The static methods define four different ways to include data in an index -- Keyword, Unindexed, Unstored, and Text. These are just wrapper definitions for indexing, storing and tokenizing index information. "Indexing" means including data with a field that would be searchable. "Storing" means including data with a field for presentation. "Tokenizing" means using analyzed data with a field that's been designated as indexed (searchable). For the four static methods: Keyword - values are indexed (searchable) and stored but not tokenized Unindexed - values are stored but not indexed or tokenized Unstored - values are indexed and tokenized (searchable) but not stored Text - values are indexed, tokenized and stored In making decisions about index composition, choose the field storage method that best matches the need for your particular data field. The fewer data fields you need, the smaller the index, the better the performance. Thanks to you and Jeff for all of your help! I really appreciate it! That's why the list is here. :-) -- j On 5/23/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: Hi George I am confused, what do you mean by storing data in my index? Thanks to you and Jeff for all of your help! I really appreciate it! Pam On 5/22/06, George Aroush <[EMAIL PROTECTED]> wrote: > > Hi Pam and Jeff, > > You can't load 7Gb of index into memory. A typical Windows application > can't access more then 2Gb of RAM -- so if a machine has 8Gg and only > Lucene > is running chance are that you still have a lot of real memory not being > used. > > You need to investigate and find out why your index grew to 7Gb and reduce > it's size. 
For example, are you storing any data in Lucene's index? If > so, > consider not doing so. > > Monitor your CPU and see whether it is being max'ed out or not. Chances are > that it is, and if queries are still taking long to run then your focus > should > be on disk I/O. > > Regards, > > -- George Aroush > > > -Original Message- > From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] > Sent: Saturday, May 20, 2006 11:18 AM > To: lucene-net-dev@incubator.apache.org > Subject: Re: noobie question > > - Our index is currently 7 Gigs. I take it we should have more than 7 Gigs > of RAM on our machine? Can we get any other hardware specs? IE 2, 4 procs? > > You can go with big RAM, but I haven't found that to be a huge boost in > search perf. We run dual-proc Xeons for our search servers, as CPU has > been > the bottleneck. Sorts are particularly egregious when it comes to CPU > load > as well. Bang for the buck, the new dual-core Opterons are > *amazingly* strong performers. > > - Each html doc we have has 10 metatags which we store. Other than date, > and > a 10 byte string for one of the metatags, the metatags are almost always > empty. Will this degrade performance? > > I would not expect this to degrade your performance. > > - Also when you suggest we distribute our index, on what criteria do we > partition? It looks like we need to optimize our IO for reads which means > raid 5 or a solid state ram drive to me. Is this correct? Could we perhaps > cache it in ram (file system cache) by issuing warm-up queries? > > The faster your disk, the better. And yes, warm-up queries are a big > help. > In our instance, warm-up queries need to be logically distributed to hit > all > the searchers. > > > On 5/19/06, Pamela Foxcroft <[EMAIL PROTECTED]> wrote: > > > > Hi George > > > > Our index is currently 7 Gigs. I take it we should have more than 7 > > Gigs of RAM on our machine? Can we get any other hardware specs? IE 2, > > 4 procs? 
> > > > Each html doc we have has 10 metatags which we store. Other than date, > > and a 10 byte string for one of the metatags, the metatags are almost > > always empty. Will this degrade performance? > > > > Also when you suggest we distribute our index, on what criteria do we > > partition? It looks like we need to optimize our IO for reads which > > means raid 5 or a solid state ram drive to me. Is this correct? Could > > we perhaps cache it in ram (file system cache) by issuing warm up > queries? > > > > BTW - we will be running on the wintel platform using c#. > > > > TIA > > > > Pam > > > > >
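Jeff's four storage flavors map directly onto the 1.4-era Field factory methods. A sketch of one document using all four; the field names and values are illustrative, not from the thread:

```csharp
using Lucene.Net.Documents;

Document doc = new Document();

// Keyword: indexed and stored, but not tokenized -- IDs, dates, codes.
doc.Add(Field.Keyword("id", "row-12345"));

// UnIndexed: stored only -- display-time data you never search on.
doc.Add(Field.UnIndexed("thumbnailUrl", "http://example.com/t/12345.jpg"));

// UnStored: indexed and tokenized, but not stored -- searchable body
// text whose original lives elsewhere (e.g. the source database).
doc.Add(Field.UnStored("contents", "full text of the row goes here"));

// Text: indexed, tokenized, and stored -- searchable and displayable,
// at the cost of a larger index.
doc.Add(Field.Text("title", "Sample row title"));
```

For Pam's situation, George's advice translates to preferring UnStored over Text wherever the original data can be fetched from the database at display time: the index stays smaller and the I/O burden drops.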
Fwd: Lucene 2.0.0 release available
Below is a recent message from the Java dev list for Lucene. As it states, this is mostly a bugfix release against the 1.9 code. The development path that's been suggested is that we develop the 1.9 release on the 1.1 Framework and that we would cut over to the 2.0 Framework with the 2.0 Lucene release. I believe this is fine, but we need to begin porting the Java 2.0 release soon. The Java 1.9 release was considered complete some time last fall. The time divide between the Java release and our C# port keeps growing. Not to take away from the 1.9 efforts on the 1.1 Framework, I'm going to proceed on porting the Java 2.0 release to C# under the 2.0 Framework. If there are a substantial number of bugfixes in the 2.0 release, we should make use of that as well. Questions or comments welcome. cheers, jeff r. -- Forwarded Message -- Subject: Lucene 2.0.0 release available Date: Saturday, 27 May 2006 05:57 From: Doug Cutting <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Release 2.0.0 of Lucene is now available from: http://www.apache.org/dyn/closer.cgi/lucene/java/ This is mostly a bugfix release from release 1.9.1. Note however that deprecated 1.x features have now been removed. Any code that compiles against Lucene 1.9.1 without deprecation warnings should work without further changes with any 2.x release. The detailed change log is at: http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_0_0/CHANGES.txt Doug
Re: Lucene 2.0.0 release available
George - I hear your concern about the 1.9 release not being finished. I will take issue with you on the reason that it's so far behind the Java version. It wasn't until recently (February) that the code was posted (the Alpha version). The lists with Apache didn't come online until April. Even as such, the process of evaluating the code, finding a bug or improvement, making a suggestion and returning it to the community has really been nothing more than emailing you. I know you're busy like all the rest of us, but this process had to run directly through you for a very long time. I frankly believe that many people were very ready to jump in and get the thing rolling, but were frustrated at the process and the bottlenecks that came with it and gave up. Sour grapes over the community's response, now that you're ready for participation, is not the community's fault. However, that's not my reason for suggesting review of the 2.0 Java codebase. The fact of the matter is that the time difference between the Java release and the C# port is growing. The value in that time difference is knowledge of known issues with the prior release (1.9) and how to deal with it (fixes in 2.0). The Java mailing list has already identified bugs to be fixed with their release marked 2.0. If there are bugs in the 1.9 release of Java, chances are those same bugs will appear in the C# port. The Java community has already worked those out, and I'd like to take advantage of those improvements. Additionally, a C# port under the 2.0 Framework has significant differences in things like threading and exception handling, as well as taking advantage of performance improvements like generics. I will echo George's request to finish the 1.9 release. I'm not sure there's any value in the claim of a 1.9 release any more than a non-complete 1.9 release. Nonetheless, I've received some offers to help review the 2.0 release, and will respond to those people privately. cheers, jeff r. 
On 5/29/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Jeff and all, We must finish 1.9 before working on 2.0; otherwise, there is no guarantee that 2.0 will not end up with the same fate as 1.9. Let's face it, 1.9 has been behind its Java version release mainly because, despite my repeated calls for help to finish it off (even back at SourceForge.net), I have yet to receive any help. For 1.9, unlike the 1.3 and 1.4 releases, NO ONE has stepped up and offered to help (except recently for Eyal's compression code.) As you can tell, I am frustrated with this. Because despite not getting any help, I am getting private emails where folks tell me that they want to become a committer on ASF for Lucene.Net -- when I point them to http://incubator.apache.org/learn/newcommitters.html I don't hear back!! So please folks, let's first finish off 1.9. Take a look at the current source code and comment on the lines that I have questions on. Those are found by searching for the text "Aroush". This past weekend, I finished the port of the Test code for 1.9 and it is running. About 40% of the tests are failing, some due to bugs in the 1.9 code and the others due to bugs in the port of the Test code. In a day or two I will release code on ASF and again will be asking for help to finish off 1.9. To sum up, I don't support doing any work on 2.0 until we have 1.9 done; otherwise, not only will we have an incomplete 1.9 but 2.0 might end up like 1.9, incomplete -- and thus, we will then have two incomplete releases instead of one. 1.9 is very close to being "final" -- let's work together to finish it off and use this opportunity to become a committer on ASF. Regards, -- George Aroush -Original Message- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Saturday, May 27, 2006 1:02 PM To: lucene-net-dev@incubator.apache.org; lucene-net-user@incubator.apache.org Subject: Fwd: Lucene 2.0.0 release available Below is a recent message from the Java dev list for Lucene. 
As it states, this is mostly a bugfix release against the 1.9 code. The development path that's been suggested is that we develop the 1.9 release on the 1.1 Framework and that we would cut over to the 2.0 Framework with the 2.0 Lucene release. I believe this is fine, but we need to begin porting the Java 2.0 release soon. The Java 1.9 release was considered complete some time last fall. The time divide between the Java release and our C# port keeps growing. Not to take away from the 1.9 efforts on the 1.1 Framework, I'm going to proceed on porting the Java 2.0 release to C# under the 2.0 Framework. If there are a substantial number of bugfixes in the 2.0 release, we should make use of that as well. Questions or comments welcome. cheers, jeff r. -- Forwarded Message -- Subject: Luc
Re: Lucene 2.0.0 release available
I understand your frustration, but if the community is not reaching out to participate, then the approach needs to improve. I'm certain the ASF can help, but the logistical stuff has to be there. For example, we need the code base under version control, and the "how" of participation needs to be spelled out. I'm not an open source community guru, but my participation on other projects has certainly increased because I understood what I could do and how to go about it. Right now, our sales pitch consists of "please help" and it's not moving anyone to action. Just a suggestion, but a more granular list of what's needed to finish 1.9 might improve participation. As for my own participation, I have cycles to put into review, but not with the 1.1 Framework. I have other projects that rely on Lucene.Net and those projects use the 2.0 Framework, so strictly speaking for myself, I have an interest in that side of the equation. It doesn't help the crowd with the 1.9 release, but neither does the 1.9 release help me in my short-term needs. I've run the 1.4.3 version of Lucene.Net under both the 1.1 Framework and the 2.0 Framework, and the differences just in the Framework code are not insubstantial. It doesn't have to be an all-or-nothing, top-down directive approach. For as many people as there are on the 1.1 Framework, I've talked to plenty of others who have migrated to the 2.0 Framework. For us, the sooner we can get the latest release up and running, the better. So as not to dissuade attention from the 1.9 release, I'll keep any conversation about the 2.0 release and the 2.0 Framework off the list. cheers, jeff r. On 5/30/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Jeff and all, I was the central point because there was no one else and we needed a way to coordinate the project. With 1.3 and 1.4, when I asked for help, folks asked which CS files they could take on, and they delivered. 
For the 1.9 release (which, by the way, was first released back on May 26, 2005 -- yes, I did say "2005"), despite my repeated calls for help, none were answered. So I don't think people were ready to jump in; they just weren't around, were busy, or lost interest. I hope things will change now that Lucene.Net is at ASF, but so far that hasn't been the case, so I am disappointed. Now coming back to your suggestion of working on 2.0. If you have the cycles to review the 2.0 code base, why not put those cycles into finishing off 1.9? Anything that was fixed in Java's release of 1.9 must be fixed in the Lucene.Net 1.9 release -- in fact, I would suggest that we look at the 1.9.1 release. Besides, the Java release of 2.0 is just compliant with Java 5.0. The value for us in having a 1.9 (or 1.9.1) release is the support for .NET 1.1. Not releasing 1.9 is like Java Lucene 1.9 not supporting Java 1.3 (did I get the Java version right?!) In addition, keep in mind that Lucene.Net 1.9 isn't that far off from being "final". Thus, if we get 1.9 out, it shouldn't be hard to get 2.0 out. Best regards, -- George Aroush -Original Message- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 30, 2006 11:36 AM To: lucene-net-dev@incubator.apache.org Cc: lucene-net-user@incubator.apache.org; [EMAIL PROTECTED] Subject: Re: Lucene 2.0.0 release available George - I hear your concern about the 1.9 release not being finished. I will take issue with you on the reason that it's so far behind the Java version. It wasn't until recently (February) that the code was posted (the Alpha version). The lists with Apache didn't come online until April. Even as such, the process of evaluating the code, finding a bug or improvement, making a suggestion and returning it to the community has really been nothing more than emailing you. I know you're busy like all the rest of us, but this process had to run directly through you for a very long time. 
I frankly believe that many people were very ready to jump in and get the thing rolling, but were frustrated at the process and the bottlenecks that came with it and gave up. Sour grapes over the community's response, now that you're ready for participation, is not the community's fault. However, that's not my reason for suggesting review of the 2.0 Java codebase. The fact of the matter is that the time difference between the Java release and the C# port is growing. The value in that time difference is knowledge of known issues with the prior release (1.9) and how to deal with it (fixes in 2.0). The Java mailing list has already identified bugs to be fixed with their release marked 2.0. If there are bugs in the 1.9 release of Java, chances are those same bugs will appear in the C# port. The Java community has already worked those out, and I'd like to take advantage of those improvements. Additionally, a C# port under the 2.0 Framework has
Re: Remote searches with Lucene
Hello all - I've been watching this thread to follow the direction and thought I might be able to offer some assistance. I run a search system that involves 4 separate search servers -- 3 serving search objects via RemoteSearchable, and a 4th that serves in an index updating role. The codebase for Lucene.Net provides all the library routines one needs to provide distributed search capabilities, but does not provide facilities for distributed search operation -- nor should it. The ideas presented here are certainly possible; I've implemented a working operation without requiring the changes described here. I'm confident in our implementation; for the calendar year, our uptime/availability of search services is 99.99%. Our only outage was related to network hardware, otherwise we're sitting solid at 100%. I've been authorized to provide our operational code for distributed search under Lucene.Net to the community at large. Some of the code is customized to our operation, but for the most part it's rather generic. We started the project under Lucene v1.4.3, but the operational aspect still applies under v1.9. The system consists of a LuceneServer, which provides searchability against indexes as defined in XML configuration files. In addition, an IndexUpdateServer provides master index updating, master/slave index replication and automated index maintenance. Integration with our web site ensures the index stays available, updated and current. There's a great deal of applied knowledge and learned behavior of many of the underlying sub-system components that distributed search under Lucene.Net makes use of -- .Net remoting, garbage collection, etc. If anyone has interest, please reply. Contributing this code requires a little cleanup of our customization work, so my response may not be immediate but I would make efforts to release the code in short order. thanks, jeff r. 
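The topology Jeff describes -- each search server exporting a RemoteSearchable, with clients aggregating them -- can be sketched with .NET remoting along these lines. The port, URI name, host names, and index path are assumptions for illustration, not details of his actual setup:

```csharp
using System;
using System.Runtime.Remoting;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Channels.Tcp;
using Lucene.Net.Search;

// --- On each search server: export a local searcher over remoting ---
ChannelServices.RegisterChannel(new TcpChannel(1099));      // port is an assumption
RemoteSearchable remote =
    new RemoteSearchable(new IndexSearcher(@"D:\indexes\shard-01"));  // hypothetical path
RemotingServices.Marshal(remote, "Searchable");             // URI name is an assumption

// --- On the client: attach to each server and search them as one ---
Searchable s1 = (Searchable)Activator.GetObject(
    typeof(Searchable), "tcp://search01:1099/Searchable");  // hypothetical hosts
Searchable s2 = (Searchable)Activator.GetObject(
    typeof(Searchable), "tcp://search02:1099/Searchable");
MultiSearcher searcher = new MultiSearcher(new Searchable[] { s1, s2 });
```

The remote proxies implement the same Searchable contract as local searchers, which is why they plug straight into MultiSearcher; everything beyond this (configuration files, replication, failover) is the operational layer Jeff is offering to contribute.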
On 8/19/06, Robert Boulanger <[EMAIL PROTECTED]> wrote: Hi Elena, hi Rest, > Dear All, > > The application I am working on is intended to make use of the > distributed search capabilities of the Lucene library. While trying to > work with Lucene's RemoteSearchable class, I faced some problems > caused by the current Lucene implementation. In the following I'll try to > describe them, as well as the possible solutions I > identified. The most important question for me is whether these changes > have a chance to be integrated in the coming Lucene versions, such > that remote searches would really become feasible. I would appreciate > any feedback. Same problem for me, and I found some more issues which I explain below: > > The first problem concerns the construction of the RemoteSearchable > object. The .Net framework allows for both server and client activation > models of the remote objects. Currently, the RemoteSearchable class > possesses only one constructor that requires knowledge of a local > Searchable object: > > public RemoteSearchable(Lucene.Net.Search.Searchable local) > I just added a new constructor to RemoteSearchable public RemoteSearchable(): base() { this.local = this.local; } Not the finest method, but it works for me so far. > Since this "local" object is located on the server, knowledge of the > server's index paths is needed for its creation. However, there are at > least some scenarios where only the server, but not the client, knows > where the indexes are stored on the server side. I think this problem > could be solved by extending the RemoteSearchable class with a standard > constructor that reads the names of the indexes to be published out of > a configuration file on the server side. > My "Server" now implements a class which inherits directly from RemoteSearchable. 
In the parameterless constructor there I read the server-side config file which contains the index location, create a new IndexReader and pass it as an argument to MyBase.New(). See sample below. > 2. Bug in Term construction [snip] This whole chapter was very useful and I can confirm everything works fine from there on. But there is still a bug in FieldDocSortedHitQueue line 130 and below: I figured out that the castings are not working when the system is running in a non-English globalization context. The String in docA.fields[i], which might be for example 1.345678, is cast to 1345678.0 since the decimal sign is misinterpreted in German systems, as it seems. So the cast results in an overflow. So I changed it as follows: case SortField.SCORE: float r1 = (float)Convert.ToSingle(docA.fields[i], System.Globalization.NumberFormatInfo.InvariantInfo); float r2 = (float)Convert.ToSingle(docB.fields[i], System.Globalization.NumberFormatInfo.InvariantInfo); if (r1 > r2) c = -1; if (r1 < r2) c = 1; break; Same in line 172 and 174: float f1 = (float)Convert.ToSingle(do
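Robert's globalization fix generalizes: any string-to-number conversion in this code path should pin the culture, because on a German system the default culture reads "." as a grouping separator rather than a decimal point. A minimal illustration of the difference:

```csharp
using System;
using System.Globalization;

string score = "1.345678";

// Culture-sensitive: on a de-DE thread this would parse as 1345678,
// because '.' is treated as a thousands separator there.
// float wrong = Convert.ToSingle(score);

// Culture-invariant: parses "1.345678" the same way on every system.
float right = Convert.ToSingle(score, NumberFormatInfo.InvariantInfo);
```

Pinning InvariantInfo at every parse site, as the patched FieldDocSortedHitQueue does, keeps sort comparisons stable regardless of the server's regional settings.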
Re: Remote searches with Lucene
That's great, thanks George. Perfect place to park the code. I've received quite a few requests today, mostly off-list. I'll start prepping the code for contribution. I have some internal/proprietary things to pull out, but mostly just need to document it better so that it makes sense (different code running in different places). I'll start with this tonight, and try to get something out in the next few days. cheers, jeff r. On 8/21/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Jeff, If you want to contribute the code, I am sure many can benefit from it. I can make it part of the "contribute" code base of Lucene.Net and share it here: https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/C%23/contrib/ Regards, -- George -Original Message- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Monday, August 21, 2006 12:11 PM To: lucene-net-dev@incubator.apache.org Cc: Elena Demidova Subject: Re: Remote searches with Lucene Hello all - I've been watching this thread to follow the direction and thought I might be able to offer some assistance. I run a search system that involves 4 separate search servers -- 3 serving search objects via RemoteSearchable, and a 4th that serves in an index updating role. The codebase for Lucene.Net provides all the library routines one needs to provide distributed search capabilities, but does not provide facilities for distributed search operation -- nor should it. The ideas presented here are certainly possible; I've implemented a working operation without requiring the changes described here. I'm confident in our implementation; for the calendar year, our uptime/availability of search services is 99.99%. Our only outage was related to network hardware, otherwise we're sitting solid at 100%. I've been authorized to provide our operational code for distributed search under Lucene.Net to the community at large. Some of the code is customized to our operation, but for the most part it's rather generic. 
We started the project under Lucene v1.4.3, but the operational aspect still applies under v1.9. The system consists of a LuceneServer, which provides searchability against indexes as defined in XML configuration files. In addition, an IndexUpdateServer provides master index updating, master/slave index replication and automated index maintenance. Integration with our web site ensures the index stays available, updated and current. There's a great deal of applied knowledge and learned behavior of many of the underlying sub-system components that distributed search under Lucene.Net makes use of -- .Net remoting, garbage collection, etc. If anyone has interest, please reply. Contributing this code requires a little cleanup of our customization work, so my response may not be immediate but I would make efforts to release the code in short order. thanks, jeff r. On 8/19/06, Robert Boulanger <[EMAIL PROTECTED]> wrote: > > Hi Elena, hi Rest, > > > Dear All, > > > > The application I am working on is intended to make use of the > > distributed search capabilities of the Lucene library. While trying > > to work with the Lucene's RemoteSearchable class, I faced some > > problems cased by the current Lucene implementation. In following > > I'll try to describe them, as well as the possible ways of their > > solution, I identified. The most important question for me is, if > > these changes have a chance to be integrated in the coming Lucene > > versions, such that remote searches would really become feasible. I > > would appreciate any feedback. > > Same problem for me and I found some more issues which I explain below: > > > > > The first problem concerns the construction of the RemoteSearchable > > object. .Net framework allows for both, server and client activation > > models of the remote objects. 
Currently, RemoteSearchable class > > possesses only one constructor that requires knowledge of a local > > Searchable object: > > > > public RemoteSearchable(Lucene.Net.Search.Searchable local) > > > I just added a new constructor to RemoteSearchable public > RemoteSearchable(): base() { this.local = this.local; } > > not the fine method but for me it works so far. > > > Since this "local" object is located on the server, knowledge of the > > server's index paths is needed for its creation. However, there are > > at least some scenarios where only the server, but not the client, > > knows where the indexes are stored on the server side. I think this > > problem could be solved by extending RemoteSearchable class with a > > standard constructor that reads the names of the indexes to be > > published out of a configuration file on the server side. >
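A server-side subclass along the lines described in this thread might look like the following sketch. The config key name, use of IndexSearcher, and the assumption that RemoteSearchable is not sealed are all illustrative choices, not confirmed API; the only fixed point from the thread is that RemoteSearchable's constructor takes a Searchable:

```csharp
using System.Configuration;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical server-side wrapper: the index location is read from the
// server's own configuration, so clients never need to know the path.
public class ConfiguredRemoteSearchable : RemoteSearchable
{
    public ConfiguredRemoteSearchable()
        : base(new IndexSearcher(          // IndexSearcher implements Searchable
              IndexReader.Open(            // open the server-local index
                  ConfigurationManager.AppSettings["indexPath"])))  // "indexPath" is an assumed key
    {
    }
}
```

Registered for server activation (for example via RemotingConfiguration.RegisterWellKnownServiceType), the remoting runtime can then construct the object itself — something the single-argument constructor prevented.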
Re: Remote searches with Lucene
Just a follow-up to everyone on this topic. I received a lot of offlist mail about this, so this message has a rather wide distribution. I'm in process of modifying the code for our distributed search components so that they're generic enough for general usage and public consumption. This is taking a little of my time, but nonetheless I expect to complete it soon. As for distributing the code, it will be located in the contrib portion of the Lucene.Net repository at apache.org. There is some logistic work involved, but ideally this is moving forward. As soon as I have more information to relay, I'll pass it along to the list. cheers, jeff r. On 8/21/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: Hello all - I've been watching this thread to follow the direction and thought I might be able to offer some assistance. I run a search system that involves 4 separate search servers -- 3 serving search objects via RemoteSearchable, and a 4th that serves in an index updating role. The codebase for Lucene.Net provides all the library routines one needs to provide distributed search capabilities, but does not provide facilities for distributed search operation -- nor should it. The ideas presented here are certainly possible; I've implemented a working operation without requiring the changes described here. I'm confident in our implementation; for the calendar year, our uptime/availability of search services is 99.99%. Our only outage was related to network hardware, otherwise we're sitting solid at 100%. I've been authorized to provide our operational code for distributed search under Lucene.Net to the community at large. Some of the code is customized to our operation, but for the most part it's rather generic. We started the project under Lucene v1.4.3, but the operational aspect still applies under v1.9. The system consists of a LuceneServer, which provides searchability against indexes as defined in XML configuration files. 
In addition, an IndexUpdateServer provides master index updating, master/slave index replication and automated index maintenance. Integration with our web site ensures the index stays available, updated and current. There's a great deal of applied knowledge and learned behavior of many of the underlying sub-system components that distributed search under Lucene.Net makes use of -- .Net remoting, garbage collection, etc. If anyone has interest, please reply. Contributing this code requires a little cleanup of our customization work, so my response may not be immediate but I would make efforts to release the code in short order. thanks, jeff r. On 8/19/06, Robert Boulanger < [EMAIL PROTECTED]> wrote: > > Hi Elena, hi Rest, > > > Dear All, > > > > The application I am working on is intended to make use of the > > distributed search capabilities of the Lucene library. While trying to > > work with the Lucene's RemoteSearchable class, I faced some problems > > cased by the current Lucene implementation. In following I'll try to > > describe them, as well as the possible ways of their solution, I > > identified. The most important question for me is, if these changes > > have a chance to be integrated in the coming Lucene versions, such > > that remote searches would really become feasible. I would appreciate > > any feedback. > > Same problem for me and I found some more issues which I explain below: > > > > > The first problem concerns the construction of the RemoteSearchable > > object. .Net framework allows for both, server and client activation > > models of the remote objects. Currently, RemoteSearchable class > > possesses only one constructor that requires knowledge of a local > > Searchable object: > > > > public RemoteSearchable(Lucene.Net.Search.Searchable local) > > > I just added a new constructor to RemoteSearchable > public RemoteSearchable(): base() > { > this.local = this.local; > } > > not the fine method but for me it works so far. 
> > > Since this "local" object is located on the server, knowledge of the > > server's index paths is needed for its creation. However, there are at > > > least some scenarios where only the server, but not the client, knows > > where the indexes are stored on the server side. I think this problem > > could be solved by extending RemoteSearchable class with a standard > > constructor that reads the names of the indexes to be published out of > > a configuration file on the server side. > > > My "Server" now implements a Class which inherits directly from Remote > Searchable. > in the parameterless constructor there I read the server sided > configfile which contains the index location , create a new In
Re: Remote searches with Lucene
As promised, an update to the list. I have code ready for delivery, if I can get svn access to the contrib section. A request has been made for this but it's going nowhere, so I'm going to find another place to host the files. There's quite a bit of documentation behind this so I'm working diligently to explain how this works. If anyone has a place to hold the code until the uber-powers at apache decide to grant me access, we would greatly appreciate the assistance. cheers, jeff r. On 8/23/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: Just a follow-up to everyone on this topic. I received a lot of offlist mail about this, so this message has a rather wide distribution. I'm in process of modifying the code for our distributed search components so that they're generic enough for general usage and public consumption. This is taking a little of my time, but nonetheless I expect to complete it soon. As for distributing the code, it will be located in the contrib portion of the Lucene.Net repository at apache.org. There is some logistic work involved, but ideally this is moving forward. As soon as I have more information to relay, I'll pass it along to the list. cheers, jeff r. On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: > > Hello all - > > I've been watching this thread to follow the direction and thought I > might be able to offer some assistance. I run a search system that involves > 4 separate search servers -- 3 serving search objects via RemoteSearchable, > and a 4th that serves in an index updating role. > > The codebase for Lucene.Net provides all the library routines one needs > to provide distributed search capabilities, but does not provide facilities > for distributed search operation -- nor should it. The ideas presented here > are certainly possible; I've implemented a working operation without > requiring the changes described here. 
I'm confident in our implementation; > for the calendar year, our uptime/availability of search services is > 99.99%. Our only outage was related to network hardware, otherwise > we're sitting solid at 100%. > > I've been authorized to provide our operational code for distributed > search under Lucene.Net to the community at large. Some of the code is > customized to our operation, but for the most part it's rather generic. We > started the project under Lucene v1.4.3, but the operational aspect > still applies under v1.9. > > The system consists of a LuceneServer, which provides searchability > against indexes as defined in XML configuration files. In addition, an > IndexUpdateServer provides master index updating, master/slave index > replication and automated index maintenance. Integration with our web site > ensures the index stays available, updated and current. There's a great > deal of applied knowledge and learned behavior of many of the underlying > sub-system components that distributed search under Lucene.Net makes use > of -- .Net remoting, garbage collection, etc. > > If anyone has interest, please reply. Contributing this code requires a > little cleanup of our customization work, so my response may not be > immediate but I would make efforts to release the code in short order. > > thanks, > jeff r. > > > > > On 8/19/06, Robert Boulanger < [EMAIL PROTECTED]> wrote: > > > > Hi Elena, hi Rest, > > > > > Dear All, > > > > > > The application I am working on is intended to make use of the > > > distributed search capabilities of the Lucene library. While trying > > to > > > work with the Lucene's RemoteSearchable class, I faced some problems > > > > > cased by the current Lucene implementation. In following I'll try to > > > describe them, as well as the possible ways of their solution, I > > > identified. 
The most important question for me is, if these changes > > > have a chance to be integrated in the coming Lucene versions, such > > > that remote searches would really become feasible. I would > > appreciate > > > any feedback. > > > > Same problem for me and I found some more issues which I explain > > below: > > > > > > > > The first problem concerns the construction of the RemoteSearchable > > > object. .Net framework allows for both, server and client activation > > > models of the remote objects. Currently, RemoteSearchable class > > > possesses only one constructor that requires knowledge of a local > > > Searchable object: > > > > > > public RemoteSearchable(Lucene.Net.Search.Searchable local) > > > > > I just added a new constructor to Re
Re: Remote searches with Lucene
That's likely our only option for now. I believe George would need to do the posting; I'm not aware of anyone else with commit access. As long as the turnaround is rapid and it doesn't present an admin burden, I'm ok with it. -- j On 8/27/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: Jeff, I haven't heard anything back about your account request yet. (but the officialness of the vote is in question - so it may be a while) What about posting a .zip file to JIRA and having George or someone commit it on your behalf and submit patches from then on? Erik On Aug 26, 2006, at 10:23 PM, Jeff Rodenburg wrote: > As promised, an update to the list. > > I have code ready for delivery, if I can get svn access to the contrib > section. A request has been made for this but it's going nowhere, > so I'm > going to find another place to host the files. > > There's quite a bit of documentation behind this so I'm working > diligently > to explain how this works. If anyone has a place to hold the code > until the > uber-powers at apache decide to grant me access, we would greatly > appreciate > the assistance. > > cheers, > jeff r. > > > On 8/23/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: >> >> Just a follow-up to everyone on this topic. I received a lot of >> offlist >> mail about this, so this message has a rather wide distribution. >> >> I'm in process of modifying the code for our distributed search >> components >> so that they're generic enough for general usage and public >> consumption. >> This is taking a little of my time, but nonetheless I expect to >> complete it >> soon. >> >> As for distributing the code, it will be located in the contrib >> portion of >> the Lucene.Net repository at apache.org. There is some logistic work >> involved, but ideally this is moving forward. >> >> As soon as I have more information to relay, I'll pass it along to >> the >> list. >> >> cheers, >> jeff r. 
>> >> >> >> >> On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: >> > >> > Hello all - >> > >> > I've been watching this thread to follow the direction and >> thought I >> > might be able to offer some assistance. I run a search system >> that involves >> > 4 separate search servers -- 3 serving search objects via >> RemoteSearchable, >> > and a 4th that serves in an index updating role. >> > >> > The codebase for Lucene.Net provides all the library routines >> one needs >> > to provide distributed search capabilities, but does not provide >> facilities >> > for distributed search operation -- nor should it. The ideas >> presented here >> > are certainly possible; I've implemented a working operation >> without >> > requiring the changes described here. I'm confident in our >> implementation; >> > for the calendar year, our uptime/availability of search >> services is >> > 99.99%. Our only outage was related to network hardware, otherwise >> > we're sitting solid at 100%. >> > >> > I've been authorized to provide our operational code for >> distributed >> > search under Lucene.Net to the community at large. Some of the >> code is >> > customized to our operation, but for the most part it's rather >> generic. We >> > started the project under Lucene v1.4.3, but the operational aspect >> > still applies under v1.9. >> > >> > The system consists of a LuceneServer, which provides searchability >> > against indexes as defined in XML configuration files. In >> addition, an >> > IndexUpdateServer provides master index updating, master/slave >> index >> > replication and automated index maintenance. Integration with >> our web site >> > ensures the index stays available, updated and current. There's >> a great >> > deal of applied knowledge and learned behavior of many of the >> underlying >> > sub-system components that distributed search under Lucene.Net >> makes use >> > of -- .Net remoting, garbage collection, etc. >> > >> > If anyone has interest, please reply. 
Contributing this code >> requires a >> > little cleanup of our customization work, so my response may not be >> > immediate but I would make efforts to release the code in short >> order. >> > >> > thanks, >> > jeff r. >> > &
Re: Remote searches with Lucene
Hi Saurabh - Thanks for your offer of help. SVN or FTP is probably the best situation. I would expect some feedback and suggestions for improvement to the original code base, and I need to be able to revise it (assuming I stay the source author) in rather short order. There's been a suggestion to basically have garoush upload the code on my behalf to the contrib section at apache. If that can get turned around quickly, I might go that route. -- j On 8/26/06, Saurabh Dani <[EMAIL PROTECTED]> wrote: Hi Jeff, What type of "place to hold" are you looking at? Is simple "FTP" site enough or are you looking at some kind of SVN ? CVS? Thanks Saurabh Date: Sat, 26 Aug 2006 19:23:27 -0700 From: "Jeff Rodenburg" <[EMAIL PROTECTED]> To: lucene-net-dev@incubator.apache.org Subject: Re: Remote searches with Lucene As promised, an update to the list. I have code ready for delivery, if I can get svn access to the contrib section. A request has been made for this but it's going nowhere, so I'm going to find another place to host the files. There's quite a bit of documentation behind this so I'm working diligently to explain how this works. If anyone has a place to hold the code until the uber-powers at apache decide to grant me access, we would greatly appreciate the assistance. cheers, jeff r. On 8/23/06, Jeff Rodenburg wrote: > > Just a follow-up to everyone on this topic. I received a lot of offlist > mail about this, so this message has a rather wide distribution. > > I'm in process of modifying the code for our distributed search components > so that they're generic enough for general usage and public consumption. > This is taking a little of my time, but nonetheless I expect to complete it > soon. > > As for distributing the code, it will be located in the contrib portion of > the Lucene.Net repository at apache.org. There is some logistic work > involved, but ideally this is moving forward. > > As soon as I have more information to relay, I'll pass it along to the > list. 
> > cheers, > jeff r. > > > > > On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: > > > > Hello all - > > > > I've been watching this thread to follow the direction and thought I > > might be able to offer some assistance. I run a search system that involves > > 4 separate search servers -- 3 serving search objects via RemoteSearchable, > > and a 4th that serves in an index updating role. > > > > The codebase for Lucene.Net provides all the library routines one needs > > to provide distributed search capabilities, but does not provide facilities > > for distributed search operation -- nor should it. The ideas presented here > > are certainly possible; I've implemented a working operation without > > requiring the changes described here. I'm confident in our implementation; > > for the calendar year, our uptime/availability of search services is > > 99.99%. Our only outage was related to network hardware, otherwise > > we're sitting solid at 100%. > > > > I've been authorized to provide our operational code for distributed > > search under Lucene.Net to the community at large. Some of the code is > > customized to our operation, but for the most part it's rather generic. We > > started the project under Lucene v1.4.3, but the operational aspect > > still applies under v1.9. > > > > The system consists of a LuceneServer, which provides searchability > > against indexes as defined in XML configuration files. In addition, an > > IndexUpdateServer provides master index updating, master/slave index > > replication and automated index maintenance. Integration with our web site > > ensures the index stays available, updated and current. There's a great > > deal of applied knowledge and learned behavior of many of the underlying > > sub-system components that distributed search under Lucene.Net makes use > > of -- .Net remoting, garbage collection, etc. > > > > If anyone has interest, please reply. 
Contributing this code requires a > > little cleanup of our customization work, so my response may not be > > immediate but I would make efforts to release the code in short order. > > > > thanks, > > jeff r. > > > > > > > > > > On 8/19/06, Robert Boulanger < [EMAIL PROTECTED]> wrote: > > > > > > Hi Elena, hi Rest, > > > > > > > Dear All, > > > > > > > > The application I am working on is intended to make use of the > > > > distributed search capabilities of
Re: Remote searches with Lucene
have ASF licensed in every CS file you submit. No problem. I'll mimic the headers found in the Lucene.Net source. Let me know if it should be something different. Also, it would be nice to have an NUnit test written for it. I'll tackle this during the week. I've been updating the code to include proper comments throughout, as well as supporting documents for making it all work together. Is there a specific flavor of Nunit to look for, or is the most recent acceptable? cheers, jeff On 8/28/06, George Aroush <[EMAIL PROTECTED]> wrote: I have no problem adding the code to the SVN under contribute. Jeff: Just ZIP up the code, and submit it to the mailing list and I can do the rest. Make sure, like Erik said, to have ASF licensed in every CS file you submit. Also, it would be nice to have an NUnit test written for it. This will serve as a validation for the code as well as an example on how to use the code. Regards, -- George -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Sunday, August 27, 2006 7:36 PM To: lucene-net-dev@incubator.apache.org Subject: Re: Remote searches with Lucene I'm sure I have commit privileges also, and would be happy to apply an "svn add" for the initial dump and clean patches for a while, as long as the code is ASF licensed and George doesn't mind. It'd be better if he did so to vet it, as I'm not a .NET programmer. Erik On Aug 27, 2006, at 5:22 PM, Jeff Rodenburg wrote: > That's likely our only option for now. I believe George would need > to do > the posting; I'm not aware of anyone else with commit access. > As long as the turnaround is rapid and it doesn't present an admin > burden, > I'm ok with it. > > -- j > > > On 8/27/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: >> >> Jeff, >> >> I haven't heard anything back about your account request yet. 
(but >> the officialness of the vote is in question - so it may be a while) >> >> What about posting a .zip file to JIRA and having George or someone >> commit it on your behalf and submit patches from then on? >> >> Erik >> >> >> On Aug 26, 2006, at 10:23 PM, Jeff Rodenburg wrote: >> >> > As promised, an update to the list. >> > >> > I have code ready for delivery, if I can get svn access to the >> contrib >> > section. A request has been made for this but it's going nowhere, >> > so I'm >> > going to find another place to host the files. >> > >> > There's quite a bit of documentation behind this so I'm working >> > diligently >> > to explain how this works. If anyone has a place to hold the code >> > until the >> > uber-powers at apache decide to grant me access, we would greatly >> > appreciate >> > the assistance. >> > >> > cheers, >> > jeff r. >> > >> > >> > On 8/23/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: >> >> >> >> Just a follow-up to everyone on this topic. I received a lot of >> >> offlist >> >> mail about this, so this message has a rather wide distribution. >> >> >> >> I'm in process of modifying the code for our distributed search >> >> components >> >> so that they're generic enough for general usage and public >> >> consumption. >> >> This is taking a little of my time, but nonetheless I expect to >> >> complete it >> >> soon. >> >> >> >> As for distributing the code, it will be located in the contrib >> >> portion of >> >> the Lucene.Net repository at apache.org. There is some >> logistic work >> >> involved, but ideally this is moving forward. >> >> >> >> As soon as I have more information to relay, I'll pass it along to >> >> the >> >> list. >> >> >> >> cheers, >> >> jeff r. >> >> >> >> >> >> >> >> >> >> On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: >> >> > >> >> > Hello all - >> >> > >> >> > I've been watching this thread to follow the direction and >> >> thought I >> >> > might be able to offer some assistance. 
I run a search system >> >> that involves >> >> > 4 separate search servers -- 3 serving search objects via >> >> RemoteSearchable, >> >> > and a 4th that serves in an index updating role. >> >&g
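For reference, the header being requested is the standard ASF license notice; in the Lucene.Net sources of that era it appears at the top of each .cs file as a comment block roughly like this (the exact wording, e.g. a leading copyright line, may differ per file):

```csharp
/*
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
```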
Remote searching with Lucene - project update
All - Another update on the remote searching application code that's been mentioned in this thread. I'm near completion of the entire collection of files that are needed for this project -- libraries, applications, unit tests, and documentation. There's quite a bit to this, and thanks for everybody's patience as I assemble the code into something that's less than confusing. There are several working pieces, so I'm packaging it for consumption. I expect to have this available sometime in the next few days, barring things like my life and regular job from getting in the way. Again, I'll share an announcement to the list when I've made the files available. Thanks, jeff r. On 8/26/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: As promised, an update to the list. I have code ready for delivery, if I can get svn access to the contrib section. A request has been made for this but it's going nowhere, so I'm going to find another place to host the files. There's quite a bit of documentation behind this so I'm working diligently to explain how this works. If anyone has a place to hold the code until the uber-powers at apache decide to grant me access, we would greatly appreciate the assistance. cheers, jeff r. On 8/23/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > Just a follow-up to everyone on this topic. I received a lot of offlist > mail about this, so this message has a rather wide distribution. > > I'm in process of modifying the code for our distributed search > components so that they're generic enough for general usage and public > consumption. This is taking a little of my time, but nonetheless I expect > to complete it soon. > > As for distributing the code, it will be located in the contrib portion > of the Lucene.Net repository at apache.org . There is some logistic > work involved, but ideally this is moving forward. > > As soon as I have more information to relay, I'll pass it along to the > list. > > cheers, > jeff r. 
> > > > > On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: > > > > Hello all - > > > > I've been watching this thread to follow the direction and thought I > > might be able to offer some assistance. I run a search system that involves > > 4 separate search servers -- 3 serving search objects via RemoteSearchable, > > and a 4th that serves in an index updating role. > > > > The codebase for Lucene.Net provides all the library routines one > > needs to provide distributed search capabilities, but does not provide > > facilities for distributed search operation -- nor should it. The ideas > > presented here are certainly possible; I've implemented a working operation > > without requiring the changes described here. I'm confident in our > > implementation; for the calendar year, our uptime/availability of search > > services is 99.99%. Our only outage was related to network hardware, > > otherwise we're sitting solid at 100%. > > > > I've been authorized to provide our operational code for distributed > > search under Lucene.Net to the community at large. Some of the code > > is customized to our operation, but for the most part it's rather generic. > > We started the project under Lucene v1.4.3, but the operational aspect > > still applies under v1.9. > > > > The system consists of a LuceneServer, which provides searchability > > against indexes as defined in XML configuration files. In addition, an > > IndexUpdateServer provides master index updating, master/slave index > > replication and automated index maintenance. Integration with our web site > > ensures the index stays available, updated and current. There's a great > > deal of applied knowledge and learned behavior of many of the underlying > > sub-system components that distributed search under Lucene.Net makes > > use of -- .Net remoting, garbage collection, etc. > > > > If anyone has interest, please reply. 
Contributing this code requires > > a little cleanup of our customization work, so my response may not be > > immediate but I would make efforts to release the code in short order. > > > > thanks, > > jeff r. > > > > > > > > > > On 8/19/06, Robert Boulanger < [EMAIL PROTECTED]> wrote: > > > > > > Hi Elena, hi Rest, > > > > > > > Dear All, > > > > > > > > The application I am working on is intended to make use of the > > > > distributed search capabilities of the Lucene library. While > > > trying to > > > > work with the Lucene'
Re: Lucene.Net Indexing Large Databases
Hi George - About a year ago we had a memory leak around some issues with the 1.4.3 code. A few of us wrote some sample programs that manifested the error, but I was able to do a fair amount of sleuthing with Memprofiler ( http://memprofiler.com/). It's a pretty good tool for $100. -- j On 9/10/06, George Aroush <[EMAIL PROTECTED]> wrote: Hi Folks, Since last weekend, I have been trying to narrow down the problem to this memory leak without much luck. Does anyone have a tool (or could recommend one, without costing me $$) which hopefully shows the source of the leak? Unlike C++ code, the leak here, obviously, is due to not releasing references to temporary or real objects. The trick is finding the object. This leak can be created with this simple code:

public static void Main(System.String[] args)
{
    IndexWriter diskIndex;
    Directory directory;
    Analyzer analyzer;
    Document doc;
    int count;
    string indexDirectory;
    System.IO.FileInfo fi;

    indexDirectory = "C:\\Index.Bad";
    fi = new System.IO.FileInfo(indexDirectory);
    directory = Lucene.Net.Store.FSDirectory.GetDirectory(fi, true);
    analyzer = new SimpleAnalyzer();
    diskIndex = new IndexWriter(directory, analyzer, true);
    count = 0;
    while (count < 1)
    {
        doc = new Document();
        diskIndex.AddDocument(doc);
        count++;
    }
    diskIndex.Close();
}

This code will show a leak in 1.9, 1.9.1 and 2.0, but not 1.4.3. I also verified that it doesn't leak under the Java version of Lucene (2.0 is where I tested.) Regards, -- George -Original Message- From: George Aroush [mailto:[EMAIL PROTECTED] Sent: Friday, September 01, 2006 9:21 PM To: lucene-net-user@incubator.apache.org Subject: RE: Lucene.Net Indexing Large Databases Hi Chris, I am using 1.9.1 in production and I am not having this problem. Sorry, I don't have enough cycles to try your code on 1.9. This problem was reported on 1.4.x and was fixed. I am sure I carried it over to 1.9.x and 2.0 -- or maybe this is a new issue. I will double check when I get the cycles. 
You can get 1.4.3's source code as a ZIP from the Lucene.Net download site, which is here: https://svn.apache.org/repos/asf/incubator/lucene.net/site/download/ or you can SVN the source code from here: https://svn.apache.org/repos/asf/incubator/lucene.net/tags/ Regards, -- George Aroush -Original Message- From: Chris David [mailto:[EMAIL PROTECTED] Sent: Friday, September 01, 2006 1:46 PM To: lucene-net-user@incubator.apache.org Subject: RE: Lucene.Net Indexing Large Databases Thanks René, so it's not just me with this problem. Now where can I get a hold of this wonderful 1.4 build of Lucene? It's not listed directly on Apache's Lucene.NET page. I am anxious to see if my code actually does work. Thanks again for all your help, I really do appreciate it. Chris Snapstream Media -Original Message- From: René de Vries [mailto:[EMAIL PROTECTED] Sent: Friday, September 01, 2006 7:32 AM To: lucene-net-user@incubator.apache.org Subject: RE: Lucene.Net Indexing Large Databases Update: I didn't realize my earlier code example ran against 1.4. If I run this with the 1.9final-005 build, I am experiencing the exact same problems as Chris mentions. Memory consumption keeps growing; I had to kill it at 1.5Gb. Exact same code, but with a 1.4 version of the lucene.net DLL, and it runs along at 50Mb. René
Remote searching with Lucene - forward progress
An update on the Remote Searching project I'm bringing forward. I've completed the base code for hand-off to the community. I'm presently working through a remoting/serialization issue that's popped up recently. This appears to be something new in the Lucene 2.0 release. I'm working through that issue now, but I have no expectation of when that's resolved. Rather than release a non-working system, I'm going to resolve this problem first. Once things are working appropriately, I'll send out a release message. Thanks and if you have remoting experience and suggestions, feel free to ping me. :-) cheers, jeff r. On 9/7/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: All - Another update on the remote searching application code that's been mentioned in this thread. I'm near completion of the entire collection of files that are needed for this project -- libraries, applications, unit tests, and documentation. There's quite a bit to this, and thanks for everybody's patience as I assemble the code into something that's less than confusing. There are several working pieces, so I'm packaging it for consumption. I expect to have this available sometime in the next few days, barring things like my life and regular job from getting in the way. Again, I'll share an announcement to the list when I've made the files available. Thanks, jeff r. On 8/26/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: > > As promised, an update to the list. > > I have code ready for delivery, if I can get svn access to the contrib > section. A request has been made for this but it's going nowhere, so I'm > going to find another place to host the files. > > There's quite a bit of documentation behind this so I'm working > diligently to explain how this works. If anyone has a place to hold the > code until the uber-powers at apache decide to grant me access, we would > greatly appreciate the assistance. > > cheers, > jeff r. 
> > > > On 8/23/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: > > > > Just a follow-up to everyone on this topic. I received a lot of > > offlist mail about this, so this message has a rather wide distribution. > > > > I'm in process of modifying the code for our distributed search > > components so that they're generic enough for general usage and public > > consumption. This is taking a little of my time, but nonetheless I expect > > to complete it soon. > > > > As for distributing the code, it will be located in the contrib > > portion of the Lucene.Net repository at apache.org . There is some > > logistic work involved, but ideally this is moving forward. > > > > As soon as I have more information to relay, I'll pass it along to the > > list. > > > > cheers, > > jeff r. > > > > > > > > > > On 8/21/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: > > > > > > Hello all - > > > > > > I've been watching this thread to follow the direction and thought I > > > might be able to offer some assistance. I run a search system that involves > > > 4 separate search servers -- 3 serving search objects via RemoteSearchable, > > > and a 4th that serves in an index updating role. > > > > > > The codebase for Lucene.Net provides all the library routines one > > > needs to provide distributed search capabilities, but does not provide > > > facilities for distributed search operation -- nor should it. The ideas > > > presented here are certainly possible; I've implemented a working operation > > > without requiring the changes described here. I'm confident in our > > > implementation; for the calendar year, our uptime/availability of search > > > services is 99.99%. Our only outage was related to network > > > hardware, otherwise we're sitting solid at 100%. > > > > > > I've been authorized to provide our operational code for distributed > > > search under Lucene.Net to the community at large. 
Some of the code > > > is customized to our operation, but for the most part it's rather generic. > > > We started the project under Lucene v1.4.3, but the operational > > > aspect still applies under v1.9. > > > > > > The system consists of a LuceneServer, which provides searchability > > > against indexes as defined in XML configuration files. In addition, an > > > IndexUpdateServer provides master index updating, master/slave index > > > replication and automated index maintenance. Integration with our web site >
Re: Remote searching with Lucene - forward progress
Hi Robert, et al. - No, I've not missed updating the list. I've been a bit busy with other things but have been working to resolve some serialization issues that are down in the core of .Net Remoting. The Lucene 2.0 codebase has been problematic inside of the remoting architecture. Rather than continue to update the list with notifications about a lack of progress, I've opted to attempt to address those issues and make an announcement when I'd reached success. So, no news for now. thanks, jeff On 12/3/06, Robert Boulanger <[EMAIL PROTECTED]> wrote: Hi Jeff, concerning the message thread below which I began in August this year, I wonder if there is any progress on your side so far. Maybe I missed something in the mailinglist (which I expect), since I was busy with other stuff, but the last note from you concerning remote search I find here was from September 13th. So, since I'm on this topic again, I just want to know whether you released anything in the past months that I'm just not seeing, or if you are still on the issue you described in your last note. thanks for replying best regards --Robert Jeff Rodenburg schrieb: > An update on the Remote Searching project I'm bringing forward. I've > completed the base code for hand-off to the community. I'm presently > working through a remoting/serialization issue that's popped up recently. > This appears to be something new in the Lucene 2.0 release. I'm working > through that issue now, but I have no expectation of when that's > resolved. > > Rather than release a non-working system, I'm going to resolve this > problem > first. Once things are working appropriately, I'll send out a release > message. > > Thanks and if you have remoting experience and suggestions, feel free to > ping me. :-) > > cheers, > jeff r. > > > On 9/7/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: >> >> All - >> >> Another update on the remote searching application code that's been >> mentioned in this thread. 
I'm near completion of the entire >> collection of >> files that are needed for this project -- libraries, applications, unit >> tests, and documentation. There's quite a bit to this, and thanks for >> everybody's patience as I assemble the code into something that's >> less than >> confusing. There are several working pieces, so I'm packaging it for >> consumption. >> >> I expect to have this available sometime in the next few days, barring >> things like my life and regular job from getting in the way. Again, >> I'll >> share an announcement to the list when I've made the files available. >> >> Thanks, >> jeff r. >> >> >> >> On 8/26/06, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: >> > >> > As promised, an update to the list. >> > >> > I have code ready for delivery, if I can get svn access to the contrib >> > section. A request has been made for this but it's going nowhere, >> so I'm >> > going to find another place to host the files. >> > >> > There's quite a bit of documentation behind this so I'm working >> > diligently to explain how this works. If anyone has a place to >> hold the >> > code until the uber-powers at apache decide to grant me access, we >> would >> > greatly appreciate the assistance. >> > >> > cheers, >> > jeff r. >> > >> > >> > >> > On 8/23/06, Jeff Rodenburg < [EMAIL PROTECTED]> wrote: >> > > >> > > Just a follow-up to everyone on this topic. I received a lot of >> > > offlist mail about this, so this message has a rather wide >> distribution. >> > > >> > > I'm in process of modifying the code for our distributed search >> > > components so that they're generic enough for general usage and >> public >> > > consumption. This is taking a little of my time, but nonetheless >> I expect >> > > to complete it soon. >> > > >> > > As for distributing the code, it will be located in the contrib >> > > portion of the Lucene.Net repository at apache.org . There is some >> > > logistic work involved, but ideally this is moving forward. 
>> > > >> > > As soon as I have more information to relay, I'll pass it along >> to the >> > > list. >> > > >> > > cheers, >> > > jeff r. >> > > >> > > >> > > >> >
Re: Solr for .NET
Thanks Erik. Vijay - porting Solr to C# would be rather extensive, on top of the Lucene-to-Lucene.Net port. Additionally, as Solr development progresses, dependencies get built into the Solr codebase that take advantage of development progress in Java-based Lucene. Not to dissuade you from taking on the task, just be aware of some of the complexities that could underlie such an endeavor. Take a look at the SolrSharp library if you have the cycles. cheers, jeff r. On 8/2/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > Why port Solr? It is a "web service". Use Solr as-is and interface > with it through the SolrSharp API! > > <http://wiki.apache.org/solr/SolrSharp> > >Erik > > > On Aug 2, 2007, at 9:50 AM, Vijay Santhanam wrote: > > > Hi Lucenenites, > > > > Has anyone heard of any C# Solr ports? > > > > I'd like to continue coding a C# port like this in my spare time. > > > > Alternatively, if a Lucene.NET developer could give me some > > instructions on what to port, I'd be happy to contribute to > > Lucene.Net. I'm not sure where to begin. > > > > Vijay Santhanam > > B.Eng.(Soft.) > > Spectrum Wired - Software Engineer > > > > T: +61 2 4925 3266 > > F: +61 2 4925 3255 > > M: +61 407 525 087 > > W: www.spectrumwired.com > > > > Disclaimer: This email and any attached files are intended solely > > for the named addressee, are confidential and may contain legally > > privileged information. The copying or distribution of them or any > > information they contain, by anyone other than the addressee, is > > prohibited. If you have received this email in error, please let us > > know by telephone or return the email to the sender and destroy all > > copies. Thank you.
hadoop or similar C# implementation of map/reduce?
Has there ever been any discussion to port Hadoop to .NET as well? Or is anyone aware of a C# map/reduce project? thanks, j
[jira] Created: (LUCENENET-55) Documents.DateTools has issue creating a Date in StringToDate()
Documents.DateTools has issue creating a Date in StringToDate() --- Key: LUCENENET-55 URL: https://issues.apache.org/jira/browse/LUCENENET-55 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Attachments: DateTools.patch When using StringToDate(System.String dateString), it tries to create an invalid date with month and day = 0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-55) Documents.DateTools has issue creating a Date in StringToDate()
[ https://issues.apache.org/jira/browse/LUCENENET-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-55: -- Attachment: DateTools.patch This patch resolves the issue and passes nunit tests. > Documents.DateTools has issue creating a Date in StringToDate() > --- > > Key: LUCENENET-55 > URL: https://issues.apache.org/jira/browse/LUCENENET-55 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Attachments: DateTools.patch > > > When using StringToDate(System.String dateString), it tries to create an > invalid date with month and day = 0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
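For anyone hitting this before picking up the attached patch, here is a rough, hypothetical sketch of the failure mode and the kind of guard involved (this is illustrative only, not the attached DateTools.patch; the helper name and parsing are assumptions): .NET's System.DateTime constructor rejects 0 for month or day, so components that parse to 0 must default to 1.

```csharp
using System;

public class StringToDateSketch
{
    // Hypothetical helper, not the attached patch: parse a Lucene-style
    // "yyyy[MM[dd]]" string. System.DateTime is 1-based for month and day,
    // so a missing or zero component must be bumped to 1 before the
    // constructor is called, or it throws ArgumentOutOfRangeException.
    public static DateTime Parse(string s)
    {
        int year  = int.Parse(s.Substring(0, 4));
        int month = s.Length >= 6 ? int.Parse(s.Substring(4, 2)) : 0;
        int day   = s.Length >= 8 ? int.Parse(s.Substring(6, 2)) : 0;
        if (month == 0) month = 1;
        if (day == 0) day = 1;
        return new DateTime(year, month, day);
    }

    public static void Main()
    {
        Console.WriteLine(Parse("2007"));   // year-only input still yields a valid date
        Console.WriteLine(Parse("200708")); // missing day defaults to 1, not 0
    }
}
```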
[jira] Created: (LUCENENET-56) Incorrect file in TestLockFactory.RmDir()
Incorrect file in TestLockFactory.RmDir() - Key: LUCENENET-56 URL: https://issues.apache.org/jira/browse/LUCENENET-56 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Trivial When removing files, you don't need to add the path because it already exists in the filename. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-56) Incorrect file in TestLockFactory.RmDir()
[ https://issues.apache.org/jira/browse/LUCENENET-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-56: -- Attachment: TestLockFactory.patch Here is a patch to use only the full filename. A few more NUnit tests pass now. > Incorrect file in TestLockFactory.RmDir() > - > > Key: LUCENENET-56 > URL: https://issues.apache.org/jira/browse/LUCENENET-56 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Trivial > Attachments: TestLockFactory.patch > > > When removing files, you don't need to add the path because it already exists > in the filename. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-57) DocHelper in Tests not creating UTF8 Cleanly
DocHelper in Tests not creating UTF8 Cleanly Key: LUCENENET-57 URL: https://issues.apache.org/jira/browse/LUCENENET-57 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Attachments: DocHelper.patch DocHelper is used when performing unit tests. It is not encoding bytes correctly with UTF8. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-57) DocHelper in Tests not creating UTF8 Cleanly
[ https://issues.apache.org/jira/browse/LUCENENET-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-57: -- Attachment: DocHelper.patch Here is a patch that resolves the issue. > DocHelper in Tests not creating UTF8 Cleanly > > > Key: LUCENENET-57 > URL: https://issues.apache.org/jira/browse/LUCENENET-57 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Attachments: DocHelper.patch > > > DocHelper is used when performing unit tests. It is not encoding bytes > correctly with UTF8. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-58) Issue in CheckHits c# doesn't perform an Assert against a hashtable
Issue in CheckHits c# doesn't perform an Assert against a hashtable --- Key: LUCENENET-58 URL: https://issues.apache.org/jira/browse/LUCENENET-58 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor Attachments: CheckHits.patch In CheckHits.CheckHitCollector() there is an Assert against a hashtable; C# doesn't support this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-58) Issue in CheckHits c# doesn't perform an Assert against a hashtable
[ https://issues.apache.org/jira/browse/LUCENENET-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-58: -- Attachment: CheckHits.patch This patch loops through the hashtable, performing an Assert on each entry. This also fixed about 100 NUnit failures. > Issue in CheckHits c# doesn't perform an Assert against a hashtable > --- > > Key: LUCENENET-58 > URL: https://issues.apache.org/jira/browse/LUCENENET-58 > Project: Lucene.Net > Issue Type: Bug >Reporter: Jeff >Priority: Minor > Attachments: CheckHits.patch > > > In CheckHits.CheckHitCollector() there is an Assert against a hashtable; C# > doesn't support this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-58) Issue in CheckHits c# doesn't perform an Assert against a hashtable
[ https://issues.apache.org/jira/browse/LUCENENET-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-58: -- Attachment: CheckHits.patch2 Here is a new patch that fixes the problem the original patch addressed, plus another problem in this file. > Issue in CheckHits c# doesn't perform an Assert against a hashtable > --- > > Key: LUCENENET-58 > URL: https://issues.apache.org/jira/browse/LUCENENET-58 > Project: Lucene.Net > Issue Type: Bug >Reporter: Jeff >Priority: Minor > Attachments: CheckHits.patch, CheckHits.patch2 > > > In CheckHits.CheckHitCollector() there is an Assert against a hashtable; C# > doesn't support this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
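The underlying issue is easy to demonstrate: .NET's Hashtable does not override Equals, so an equality Assert on two content-identical hashtables compares references and fails, whereas Java's assertEquals on a Map compares contents. A minimal sketch of the entry-by-entry comparison the patch describes (illustrative only; the helper name is an assumption, not code from the patch):

```csharp
using System;
using System.Collections;

public class HashtableCompareDemo
{
    // Compare contents explicitly, since Hashtable.Equals is reference equality.
    public static bool SameEntries(Hashtable a, Hashtable b)
    {
        if (a.Count != b.Count) return false;
        foreach (DictionaryEntry e in a)
            if (!b.ContainsKey(e.Key) || !Equals(b[e.Key], e.Value)) return false;
        return true;
    }

    public static void Main()
    {
        Hashtable a = new Hashtable(); a["k"] = 1;
        Hashtable b = new Hashtable(); b["k"] = 1;
        Console.WriteLine(a.Equals(b));       // False: reference equality only
        Console.WriteLine(SameEntries(a, b)); // True: contents match
    }
}
```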
[jira] Created: (LUCENENET-59) QueryUtils has some invalid Asserts
QueryUtils has some invalid Asserts --- Key: LUCENENET-59 URL: https://issues.apache.org/jira/browse/LUCENENET-59 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-59) QueryUtils has some invalid Asserts
[ https://issues.apache.org/jira/browse/LUCENENET-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-59: -- Attachment: QueryUtils.patch This patch fixes NUnit failures in QueryUtils. > QueryUtils has some invalid Asserts > --- > > Key: LUCENENET-59 > URL: https://issues.apache.org/jira/browse/LUCENENET-59 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: QueryUtils.patch > > > NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query > q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-59) QueryUtils has some invalid Asserts
[ https://issues.apache.org/jira/browse/LUCENENET-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-59: -- Attachment: QueryUtils.patch2 Oops, the first patch had more than just this fix in it. QueryUtils.patch2 has only this fix. > QueryUtils has some invalid Asserts > --- > > Key: LUCENENET-59 > URL: https://issues.apache.org/jira/browse/LUCENENET-59 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: QueryUtils.patch, QueryUtils.patch2 > > > NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query > q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-61) Issue testing Backwards Compatibility
Issue testing Backwards Compatibility - Key: LUCENENET-61 URL: https://issues.apache.org/jira/browse/LUCENENET-61 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor NUnit tests fail because of non-C#-compliant tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-61) Issue testing Backwards Compatibility
[ https://issues.apache.org/jira/browse/LUCENENET-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-61: -- Attachment: TestBackwardsCompatibility.patch This patch passes all backwards-compatibility NUnit tests, removes the dependency on SupportClass, and uses the Windows directory structure. It is left disabled by default because of its dependency on SharpZipLib. > Issue testing Backwards Compatibility > - > > Key: LUCENENET-61 > URL: https://issues.apache.org/jira/browse/LUCENENET-61 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: TestBackwardsCompatibility.patch > > > NUnit tests fail because of non-C#-compliant tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-62) IndexReader.IndexExists() Fails if directory doesn't exist.
IndexReader.IndexExists() Fails if directory doesn't exist. Key: LUCENENET-62 URL: https://issues.apache.org/jira/browse/LUCENENET-62 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor There is no check to see if the directory exists before it checks for the index files; it just throws an error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-62) IndexReader.IndexExists() Fails if directory doesn't exist.
[ https://issues.apache.org/jira/browse/LUCENENET-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-62: -- Attachment: IndexReader.patch This patch checks whether the directory exists before it checks the index. If the directory doesn't exist, it returns false.

if (System.IO.Directory.Exists(directory.FullName))
{
    return SegmentInfos.GetCurrentSegmentGeneration(System.IO.Directory.GetFileSystemEntries(directory.FullName)) != -1;
}
else
{
    return false;
}

> IndexReader.IndexExists() Fails if directory doesn't exist. > > > Key: LUCENENET-62 > URL: https://issues.apache.org/jira/browse/LUCENENET-62 > Project: Lucene.Net > Issue Type: Bug >Reporter: Jeff >Priority: Minor > Attachments: IndexReader.patch > > > There is no check to see if the directory exists before it checks for the > index files; it just throws an error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-63) FieldCacheImpl tries to parse a float in f format.
FieldCacheImpl tries to parse a float in f format. --- Key: LUCENENET-63 URL: https://issues.apache.org/jira/browse/LUCENENET-63 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor Attachments: FieldCacheImpl.patch C# doesn't support F format for parsing floats. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-63) FieldCacheImpl tries to parse a float in f format.
[ https://issues.apache.org/jira/browse/LUCENENET-63?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-63: -- Attachment: FieldCacheImpl.patch This fix trims any trailing 'f' from the string before parsing:

public virtual float ParseFloat(System.String value_Renamed)
{
    return System.Single.Parse(value_Renamed.TrimEnd('f'));
}

> FieldCacheImpl tries to parse a float in f format. > --- > > Key: LUCENENET-63 > URL: https://issues.apache.org/jira/browse/LUCENENET-63 > Project: Lucene.Net > Issue Type: Bug >Reporter: Jeff >Priority: Minor > Attachments: FieldCacheImpl.patch > > > C# doesn't support F format for parsing floats. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
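A quick self-contained check of the behavior the patch works around (a sketch, separate from the patch itself): Single.Parse rejects a Java-style trailing 'f', while trimming it first succeeds.

```csharp
using System;
using System.Globalization;

public class FloatSuffixDemo
{
    public static void Main()
    {
        string javaStyle = "1.5f"; // legal float literal in Java source, not parseable by .NET
        try
        {
            float.Parse(javaStyle, CultureInfo.InvariantCulture);
            Console.WriteLine("parsed (unexpected)");
        }
        catch (FormatException)
        {
            Console.WriteLine("Single.Parse rejects the trailing 'f'");
        }
        // Trimming the suffix, as the patch does, makes the value parseable.
        float ok = float.Parse(javaStyle.TrimEnd('f'), CultureInfo.InvariantCulture);
        Console.WriteLine(ok);
    }
}
```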
[jira] Created: (LUCENENET-64) TestDateFilter incorrectly gets total milliseconds
TestDateFilter incorrectly gets total milliseconds -- Key: LUCENENET-64 URL: https://issues.apache.org/jira/browse/LUCENENET-64 Project: Lucene.Net Issue Type: Bug Reporter: Jeff When performing TestBefore, it uses milliseconds instead of total milliseconds so it fails. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-64) TestDateFilter incorrectly gets total milliseconds
[ https://issues.apache.org/jira/browse/LUCENENET-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-64: -- Attachment: TestDateFilter.patch This patch uses total milliseconds. This is obtained by the following: long now = (long)((TimeSpan)(System.DateTime.Now - System.DateTime.MinValue)).TotalMilliseconds; > TestDateFilter incorrectly gets total milliseconds > -- > > Key: LUCENENET-64 > URL: https://issues.apache.org/jira/browse/LUCENENET-64 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Attachments: TestDateFilter.patch > > > When performing TestBefore, it uses milliseconds instead of total > milliseconds so it fails. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
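The distinction behind this bug is worth spelling out: TimeSpan.Milliseconds is only the fractional millisecond component (0-999) of a span, while TotalMilliseconds is the whole span expressed in milliseconds. A minimal illustration (not taken from the patch):

```csharp
using System;

public class MillisecondsDemo
{
    public static void Main()
    {
        TimeSpan span = TimeSpan.FromSeconds(90);
        // .Milliseconds is just the millisecond component of the span; here it is 0.
        Console.WriteLine(span.Milliseconds);
        // .TotalMilliseconds is the full span in milliseconds: 90000.
        Console.WriteLine(span.TotalMilliseconds);
    }
}
```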
[jira] Commented: (LUCENENET-59) QueryUtils has some invalid Asserts
[ https://issues.apache.org/jira/browse/LUCENENET-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519761 ] Jeff commented on LUCENENET-59: --- I would agree; however, the only reason it is failing is that an ArrayList has a capacity of 4 instead of 2, which makes the GetHashCode() comparison fail. ToString() compares the full query, so I figured that would be enough. There are no extra values in these fields; the extra values are null. I will find out where these values are removed and add a TrimToSize() to clean up this ArrayList. This will make these match. I will take a look. Jeff > QueryUtils has some invalid Asserts > --- > > Key: LUCENENET-59 > URL: https://issues.apache.org/jira/browse/LUCENENET-59 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: QueryUtils.patch, QueryUtils.patch2 > > > NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query > q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-69) FSIndexInput.isFDValid() not ported correctly
[ https://issues.apache.org/jira/browse/LUCENENET-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-69: -- Attachment: FSDirectory.patch Here is the patch that resolves this issue. > FSIndexInput.isFDValid() not ported correctly > - > > Key: LUCENENET-69 > URL: https://issues.apache.org/jira/browse/LUCENENET-69 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: FSDirectory.patch > > > FSIndexInput.isFDValid() was not ported correctly because it doesn't > translate one to one. After looking into this a little more: file.getFD() is > part of the FileInputStream class in Java. This would be the base stream of > file, so if the BaseStream is null it would be invalid; if it is not null, it > would be valid. After making this change, all TestCompoundFile tests pass.
>
> public virtual bool IsFDValid()
> {
>     return (file.BaseStream != null);
>     //return true; // return file.getFD().valid(); // {{Aroush-2.1 in .NET, how do we do this?
> }
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENENET-69) FSIndexInput.isFDValid() not ported correctly
FSIndexInput.isFDValid() not ported correctly - Key: LUCENENET-69 URL: https://issues.apache.org/jira/browse/LUCENENET-69 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor Attachments: FSDirectory.patch FSIndexInput.isFDValid() was not ported correctly because it doesn't translate one to one. After looking into this a little more: file.getFD() is part of the FileInputStream class in Java. This would be the base stream of file, so if the BaseStream is null it would be invalid; if it is not null, it would be valid. After making this change, all TestCompoundFile tests pass.

public virtual bool IsFDValid()
{
    return (file.BaseStream != null);
    //return true; // return file.getFD().valid(); // {{Aroush-2.1 in .NET, how do we do this?
}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-59) QueryUtils has some invalid Asserts
[ https://issues.apache.org/jira/browse/LUCENENET-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-59: -- Attachment: QueryUtils.patch3 Here is a new patch that only checks Query.ToString(), since you can't Assert a Query. GetHashCode() has been left in. Additional patches will trim the ArrayList higher up in the tests; once ArrayList.TrimToSize() has been performed, they pass the tests. The reason the CheckEqual and CheckUnequal hashes are failing is that when you clone an ArrayList, it sets the capacity to the length instead of keeping the original capacity. This makes the hash comparison invalid. > QueryUtils has some invalid Asserts > --- > > Key: LUCENENET-59 > URL: https://issues.apache.org/jira/browse/LUCENENET-59 > Project: Lucene.Net > Issue Type: Bug >Reporter: Jeff >Priority: Minor > Attachments: QueryUtils.patch, QueryUtils.patch2, QueryUtils.patch3 > > > NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query > q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-59) QueryUtils has some invalid Asserts
[ https://issues.apache.org/jira/browse/LUCENENET-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-59: -- Attachment: TestBoolean2.patch This patch uses TrimToSize() in the TestBoolean2 tests on an ArrayList to remove the extra values that were added when the list was created.

((System.Collections.ArrayList)current.Clauses()).TrimToSize();

> QueryUtils has some invalid Asserts > --- > > Key: LUCENENET-59 > URL: https://issues.apache.org/jira/browse/LUCENENET-59 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff >Priority: Minor > Attachments: QueryUtils.patch, QueryUtils.patch2, QueryUtils.patch3, > TestBoolean2.patch > > > NUnit tests are failing because of bad Asserts in CheckEqual(Query q1, Query > q2) and CheckUnequal(Query q1, Query q2). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
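The capacity mismatch described above can be reproduced directly: on the framework versions discussed here, ArrayList.Clone() allocates exactly Count elements, so a list created with spare capacity differs from its clone until TrimToSize() is called. A small stand-alone sketch (illustrative, not from the patches):

```csharp
using System;
using System.Collections;

public class CloneCapacityDemo
{
    public static void Main()
    {
        ArrayList list = new ArrayList(4); // explicit capacity of 4
        list.Add("a");
        list.Add("b");                     // Count is 2, Capacity stays 4

        ArrayList copy = (ArrayList)list.Clone();
        Console.WriteLine(list.Capacity);  // 4
        Console.WriteLine(copy.Capacity);  // 2: Clone() allocates exactly Count elements

        list.TrimToSize();                 // the fix applied in the tests
        Console.WriteLine(list.Capacity);  // 2, now matching the clone
    }
}
```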
[jira] Created: (LUCENENET-76) DisjunctionMaxQuery has unnecessary clone which causes it to fail unit tests
DisjunctionMaxQuery has unnecessary clone which causes it to fail unit tests Key: LUCENENET-76 URL: https://issues.apache.org/jira/browse/LUCENENET-76 Project: Lucene.Net Issue Type: Bug Reporter: Jeff Priority: Minor DisjunctionMaxQuery.Clone() clones the DisjunctionMaxQuery and then the disjuncts ArrayList. Cloning the disjuncts ArrayList causes the unit tests to fail; the disjuncts are already cloned when the query is cloned, so this is not needed.

public override System.Object Clone()
{
    DisjunctionMaxQuery clone = (DisjunctionMaxQuery) base.Clone();
    //clone.disjuncts = (System.Collections.ArrayList) this.disjuncts.Clone();
    return clone;
}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-76) DisjunctionMaxQuery has unnecessary clone which causes it to fail unit tests
[ https://issues.apache.org/jira/browse/LUCENENET-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-76: -- Attachment: DisjunctionMaxQuery.patch > DisjunctionMaxQuery has unnecessary clone which causes it to fail unit tests > > > Key: LUCENENET-76 > URL: https://issues.apache.org/jira/browse/LUCENENET-76 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Priority: Minor > Attachments: DisjunctionMaxQuery.patch > > > DisjunctionMaxQuery.Clone() clones the DisjunctionMaxQuery and then the disjuncts > ArrayList. Cloning the disjuncts ArrayList causes the unit tests to > fail. The disjuncts are already cloned when the query itself is cloned, so this step is not needed. > public override System.Object Clone() > { > DisjunctionMaxQuery clone = (DisjunctionMaxQuery) base.Clone(); > //clone.disjuncts = (System.Collections.ArrayList) > this.disjuncts.Clone(); > return clone; > } -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENENET-78) TestDateFilter.patch for nunit test
[ https://issues.apache.org/jira/browse/LUCENENET-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520322 ] Jeff commented on LUCENENET-78: --- Is a tick a millisecond? I thought it was a nanosecond. LUCENENET-64 solves this problem using milliseconds. Jeff > TestDateFilter.patch for nunit test > --- > > Key: LUCENENET-78 > URL: https://issues.apache.org/jira/browse/LUCENENET-78 > Project: Lucene.Net > Issue Type: Bug > Reporter: Digy > Priority: Trivial > Attachments: TestDateFilter.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
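To the question in this comment: a .NET tick is 100 nanoseconds, so it is neither a millisecond nor a nanosecond. A small sketch of the conversion, using the framework's own `TimeSpan` constants:

```csharp
using System;

public class TickDemo
{
    public static void Main()
    {
        // One .NET tick is 100 nanoseconds, so the framework defines:
        Console.WriteLine(TimeSpan.TicksPerMillisecond); // 10000
        Console.WriteLine(TimeSpan.TicksPerSecond);      // 10000000

        // Converting a DateTime's ticks to milliseconds, as the
        // milliseconds-based fix in LUCENENET-64 would need to do:
        long nowTicks = DateTime.UtcNow.Ticks;
        long nowMillis = nowTicks / TimeSpan.TicksPerMillisecond;
        Console.WriteLine(nowMillis);
    }
}
```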
[jira] Reopened: (LUCENENET-61) Issue testing Backwards Compatibility
[ https://issues.apache.org/jira/browse/LUCENENET-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff reopened LUCENENET-61: --- This patch didn't get applied correctly. For some reason the line: entries = zipFile.Entries(); wasn't removed, but it was in the patch. > Issue testing Backwards Compatibility > - > > Key: LUCENENET-61 > URL: https://issues.apache.org/jira/browse/LUCENENET-61 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Assignee: George Aroush > Priority: Minor > Attachments: TestBackwardsCompatibility.patch > > > NUnit tests fail because of non C#-compliant tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (LUCENENET-61) Issue testing Backwards Compatibility
[ https://issues.apache.org/jira/browse/LUCENENET-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff updated LUCENENET-61: -- Attachment: TestBackwardCompatibility.patch2 This patch removes the line: entries = zipFile.Entries(); which didn't get removed in the first patch, and moves the line: Assert.Fail("Needs integration with SharpZipLib"); inside the else of the conditional compilation symbol SHARP_ZIP_LIB. > Issue testing Backwards Compatibility > - > > Key: LUCENENET-61 > URL: https://issues.apache.org/jira/browse/LUCENENET-61 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Assignee: George Aroush > Priority: Minor > Attachments: TestBackwardCompatibility.patch2, > TestBackwardsCompatibility.patch > > > NUnit tests fail because of non C#-compliant tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
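Schematically, the patch moves the failure into the `#else` branch of the SHARP_ZIP_LIB conditional compilation symbol, so the test fails loudly only when SharpZipLib support is not compiled in. The sketch below shows just the shape of that change; the method name and return strings are illustrative stand-ins, not the actual TestBackwardsCompatibility code:

```csharp
using System;

public class ConditionalCompilationDemo
{
    // Returns what the test would do, depending on whether the
    // SHARP_ZIP_LIB symbol was defined at compile time.
    public static string BackwardsCompatibilityStep()
    {
#if SHARP_ZIP_LIB
        // SharpZipLib is available: the real test would unzip the
        // legacy index here and run the compatibility checks on it.
        return "unzip legacy index and run checks";
#else
        // Symbol undefined: fail loudly instead of silently skipping,
        // mirroring Assert.Fail("Needs integration with SharpZipLib").
        return "Assert.Fail: Needs integration with SharpZipLib";
#endif
    }

    public static void Main()
    {
        Console.WriteLine(BackwardsCompatibilityStep());
    }
}
```

Compiled without defining SHARP_ZIP_LIB, the `#else` branch is the one that runs, which is the behavior the patch restores.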
[jira] Issue Comment Edited: (LUCENENET-61) Issue testing Backwards Compatibility
[ https://issues.apache.org/jira/browse/LUCENENET-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520443 ] jdell edited comment on LUCENENET-61 at 8/16/07 7:57 PM: This new patch TestBackwardCompatibility.patch2 removes the line: entries = zipFile.Entries(); which didn't get removed in the first patch, and moves the line: Assert.Fail("Needs integration with SharpZipLib"); inside the else of the conditional compilation symbol SHARP_ZIP_LIB was (Author: jdell): This patch removes the line: entries = zipFile.Entries(); which didn't get removed in the first patch, and moves the line: Assert.Fail("Needs integration with SharpZipLib"); inside the else of the conditional compilation symbol SHARP_ZIP_LIB > Issue testing Backwards Compatibility > - > > Key: LUCENENET-61 > URL: https://issues.apache.org/jira/browse/LUCENENET-61 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Assignee: George Aroush > Priority: Minor > Attachments: TestBackwardCompatibility.patch2, > TestBackwardsCompatibility.patch > > > NUnit tests fail because of non C#-compliant tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENENET-63) FieldCacheImpl tries to parse a float in f format.
[ https://issues.apache.org/jira/browse/LUCENENET-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521297 ] Jeff commented on LUCENENET-63: --- This fixes unit tests in Search/TestSort/: TestAutoSort TestMultiSort TestParallelMultiSort TestReverseSort TestSortCombos TestTypesSort My guess is that it has always been a problem and the unit tests never existed to expose it, or maybe something else has changed internally, but System.Single.Parse(string) doesn't parse a string that ends in an 'f'. If that trailing 'f' is removed, it is able to convert the string to a single (float) without any trouble. Regards, Jeff > FieldCacheImpl tries to parse a float in f format. > --- > > Key: LUCENENET-63 > URL: https://issues.apache.org/jira/browse/LUCENENET-63 > Project: Lucene.Net > Issue Type: Bug > Reporter: Jeff > Priority: Minor > Attachments: FieldCacheImpl.patch > > > C# doesn't support F format for parsing floats. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
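The parsing difference is easy to reproduce. The sketch below strips the Java-style suffix before calling `Single.Parse`; the helper name is a hypothetical illustration, not the actual FieldCacheImpl code:

```csharp
using System;
using System.Globalization;

public class FloatSuffixDemo
{
    // Hypothetical helper: strip a Java-style 'f'/'F' suffix before
    // parsing, since System.Single.Parse rejects it.
    public static float ParseJavaFloat(string s)
    {
        return float.Parse(s.TrimEnd('f', 'F'), CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // "1.5f" is a valid Java float literal but not parseable by .NET:
        try
        {
            float.Parse("1.5f", CultureInfo.InvariantCulture);
        }
        catch (FormatException)
        {
            Console.WriteLine("Single.Parse rejects the trailing 'f'");
        }

        // With the suffix removed, the conversion succeeds:
        Console.WriteLine(ParseJavaFloat("1.5f")); // 1.5
    }
}
```

This is the same idea as the attached FieldCacheImpl.patch: remove the trailing 'f' that Java Lucene wrote before handing the string to the .NET parser.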
[jira] Commented: (LUCENENET-95) NUnit test for Search.TestDisjunctionMaxQuery.TestBooleanOptionalWithTiebreaker
[ https://issues.apache.org/jira/browse/LUCENENET-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526520 ] Jeff commented on LUCENENET-95: --- For what it's worth... I just got the latest version from SVN and this test passed for me. Windows XP Pro (SP2) on Dual Xeon 3Ghz CPU, .Net v2.0.50727, VS2005 Pro 8.0.50727.42 Jeff > Nunite test for > Search.TestDisjunctionMaxQuery.TestBooleanOptionalWithTiebreaker > > > Key: LUCENENET-95 > URL: https://issues.apache.org/jira/browse/LUCENENET-95 > Project: Lucene.Net > Issue Type: Bug >Reporter: Digy >Priority: Trivial > Attachments: TryThis.patch > > > Changing the line in TestDisjunctionMaxQuery.cs > from >public const float SCORE_COMP_THRESH = 0.f; > to >public const float SCORE_COMP_THRESH = 0.1f; > solves the problem but i am not sure if an exact match is needed or not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
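The change quoted in this issue is the standard tolerance-comparison pattern for float scores: compare within SCORE_COMP_THRESH rather than exactly. A minimal sketch of that pattern (not the actual Lucene.Net test code):

```csharp
using System;

public class ScoreThresholdDemo
{
    // A nonzero threshold, as the quoted suggestion proposes,
    // absorbs float rounding differences between platforms.
    public const float SCORE_COMP_THRESH = 0.1f;

    // Compare two scores within the tolerance rather than exactly.
    public static bool ScoresMatch(float expected, float actual)
    {
        return Math.Abs(expected - actual) <= SCORE_COMP_THRESH;
    }

    public static void Main()
    {
        Console.WriteLine(ScoresMatch(1.00f, 1.05f)); // True
        Console.WriteLine(ScoresMatch(1.00f, 1.30f)); // False
    }
}
```

With the threshold at 0.f, as in the original test, the comparison degenerates to exact float equality, which is why the reporter was unsure whether an exact match was intended.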
[jira] Created: (LUCENENET-192) Latest SVN build is about twice as slow running queries when compared to Java Lucene
Latest SVN build is about twice as slow running queries when compared to Java Lucene Key: LUCENENET-192 URL: https://issues.apache.org/jira/browse/LUCENENET-192 Project: Lucene.Net Issue Type: Improvement Environment: Visual Studio 2008 with .NET framework 3.5 Reporter: Jeff Johnson Priority: Minor I have been using the Java Luke tool for comparing query times between Java and C#, and the Java query time is consistently about twice as fast as the C# query time. The index I am testing was built in C# and contains 10 million documents. I have made sure to "warm up" the index by running the same query a few times before timing it again. One example: querying for a term that exists in every document takes about 1.3 seconds in C# and 0.6 seconds in Java. The total size of my index directory is about 1 GB. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENENET-192) Latest SVN build is about twice as slow running queries when compared to Java Lucene
[ https://issues.apache.org/jira/browse/LUCENENET-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752706#action_12752706 ] Jeff Johnson commented on LUCENENET-192: What about if you get 10M hits? > Latest SVN build is about twice as slow running queries when compared to Java > Lucene > > > Key: LUCENENET-192 > URL: https://issues.apache.org/jira/browse/LUCENENET-192 > Project: Lucene.Net > Issue Type: Improvement > Environment: Visual Studio 2008 with .NET framework 3.5 > Reporter: Jeff Johnson > Priority: Minor > > I have been using the Java Luke tool for comparing query times between Java and > C#, and the Java query time is consistently about twice as fast as the C# > query time. The index I am testing was built in C# and contains 10 million > documents. I have made sure to "warm up" the index by running the same query > a few times before timing it again. > One example: querying for a term that exists in every document takes about > 1.3 seconds in C# and 0.6 seconds in Java. The total size of my index > directory is about 1 GB. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.