Re: C# API for Solr

2007-04-01 Thread Maarten . De . Vilder
Well, I think there will be a lot of people who will be very happy with
this C# client.

grts,m 




"Jeff Rodenburg" <[EMAIL PROTECTED]> 
31/03/2007 18:00
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
C# API for Solr






We built our first search system architecture around Lucene.Net back in 2005
and continued to make modifications through 2006.  We quickly learned that
search management is so much more than query algorithms and indexing
choices.  We were not readily prepared for the operational overhead that our
Lucene-based search required: always-on availability, fast response times,
batch and real-time updates, etc.

Fast forward to 2007.  Our front-end is Microsoft-based, but we needed to
support parallel development on non-Microsoft architecture, and thus needed
a cross-platform search system.  Hello Solr!  We've transitioned our search
system to Solr with a Linux/Tomcat back-end, and it's been a champ.  We now
use Solr not only for standard keyword search, but also to drive queries for
lots of different content sections on our site.  Solr has moved beyond
mission critical in our operation.

As we've proceeded, we've built out a nice C# client library to abstract the
interaction from C# to Solr.  It's mostly generic and designed for
extensibility.  With a few modifications, this could be a stand-alone
library that works for others.
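
For the curious, such an abstraction can be fairly small.  Here is a minimal,
hypothetical sketch -- not the library described above; the class and method
names are invented for illustration -- of a C# call against Solr's standard
/select handler, assuming the default XML response format:

    using System;
    using System.Net;
    using System.Xml;

    public class SolrSearcher
    {
        private readonly string baseUrl;   // e.g. "http://localhost:8983/solr"

        public SolrSearcher(string baseUrl)
        {
            this.baseUrl = baseUrl;
        }

        public XmlDocument Query(string q, int start, int rows)
        {
            // Build a /select request and hand back the parsed XML response.
            string url = baseUrl + "/select?q=" + Uri.EscapeDataString(q)
                       + "&start=" + start + "&rows=" + rows;
            using (WebClient client = new WebClient())
            {
                XmlDocument doc = new XmlDocument();
                doc.LoadXml(client.DownloadString(url));
                return doc;
            }
        }
    }

    // Usage: read the hit count from the standard <result numFound="..."> node.
    //   XmlDocument rsp = new SolrSearcher("http://localhost:8983/solr").Query("ipod", 0, 10);
    //   Console.WriteLine(rsp.SelectSingleNode("/response/result").Attributes["numFound"].Value);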

I have clearance from the organization to contribute our library to the
community if there's interest.  I'd first like to gauge the interest of
everyone before doing so; please reply if you do.

cheers,
jeff r.



Re: Question: how to config memory with SolrPerformanceFactor

2007-04-01 Thread James liu

Thank you Chris.

I just want to know how to manage master and slave hardware -- for example,
RAM and disk size.



2007/3/31, Chris Hostetter <[EMAIL PROTECTED]>:



: if we mv index.tmp$$ index, is it truly deleted?

It's not truly deleted until no running processes have the file handles
open anymore.

: if we notify solr to open a new searcher, solr just redirects to the new index?

That will cause Solr to close the existing file handles and open new ones,
so then the files will be deleted.

: Do old indexes cost memory and hard disk space?

If they are open, then yes, they cost memory -- they cost disk until truly
deleted.

: if it is not cached in memory,, does it means we no warry about
OutOfMemory
: when index file increase.

I don't understand your question.

: if it cached in memory, how to limit it? use autowarmCount?

autowarmCount is a Solr cache option ... Solr caches are for very specific
things -- they have no control over how much memory Lucene uses for the
bulk of your index.

Changing autowarmCount actually has very little to do with the amount
of total memory Solr uses -- the *size* of each of your caches is a much
more significant factor in how much RAM is used by the process.
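
To make the distinction concrete, this is the relevant piece of
solrconfig.xml (the values below are purely illustrative): size bounds how
many entries a cache may hold, while autowarmCount only controls how many
entries are copied over from the old cache when a new searcher is opened.
Neither setting limits the memory Lucene itself uses for the index:

    <!-- illustrative values only -->
    <filterCache
        class="solr.LRUCache"
        size="512"
        initialSize="512"
        autowarmCount="256"/>

    <queryResultCache
        class="solr.LRUCache"
        size="512"
        initialSize="512"
        autowarmCount="256"/>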



-Hoss





--
regards
jl


Re: C# API for Solr

2007-04-01 Thread Jeff Rodenburg

Ryan - I'm working on cleanup to release this thing for the world to enjoy.

-- j

On 3/31/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:


Yes yes!


On 3/31/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:
> We built our first search system architecture around Lucene.Net back in
> 2005 and continued to make modifications through 2006.  We quickly learned
> that search management is so much more than query algorithms and indexing
> choices.  We were not readily prepared for the operational overhead that
> our Lucene-based search required: always-on availability, fast response
> times, batch and real-time updates, etc.
>
> Fast forward to 2007.  Our front-end is Microsoft-based, but we needed to
> support parallel development on non-Microsoft architecture, and thus needed
> a cross-platform search system.  Hello Solr!  We've transitioned our search
> system to Solr with a Linux/Tomcat back-end, and it's been a champ.  We now
> use Solr not only for standard keyword search, but also to drive queries
> for lots of different content sections on our site.  Solr has moved beyond
> mission critical in our operation.
>
> As we've proceeded, we've built out a nice C# client library to abstract
> the interaction from C# to Solr.  It's mostly generic and designed for
> extensibility.  With a few modifications, this could be a stand-alone
> library that works for others.
>
> I have clearance from the organization to contribute our library to the
> community if there's interest.  I'd first like to gauge the interest of
> everyone before doing so; please reply if you do.
>
> cheers,
> jeff r.
>



Re: C# API for Solr

2007-04-01 Thread Paul Borgermans

I agree fully; for me, the PHP client API in the works should include a
good set of unit tests.

The channel dilemma also shows up, as in the PHP ecosystem there are at
least two major ones to hook up.

Personally I would go for eZ components, as this already provides a good
framework with unit tests.  But I also think having something generic (and
that supports PHP 4.x) that makes its way into the Solr client APIs
independently is good to increase the adoption of Solr anyhow.

Just a few early Monday morning thoughts ...
Paul



On 4/1/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:


What would make things consistent for the client APIs is a prescribed set of
implementations for a Solr release.  For example, executing searches with
these parameters, support for facets requires those parameters, updates
should be called in this manner, etc.  For lack of a better term, a
loosely-coupled interface definition.  Those requirements could then be
versioned, and the various APIs could advertise themselves as Solr 1.0
compliant, Solr 1.1 compliant, and so on.  The Solr release dictates the
requirements for compliance; the API maintainer is responsible for meeting
those requirements.  This would also be handy when certain features are
deprecated, i.e. when the /update URL is changed.

Regarding C#, this would be easy enough to implement.  There are common
community methods for building/compilation, test libraries, and help
documentation, so doing things consistently with Erik and the solrb library
works for C# as well (and I assume most other languages).

-- j


On 3/31/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> On a related note: We've still never really figured out how to deal with
> integrating compilation or testing for client code into our main and build
> system -- or for that matter how we should distribute them when we do our
> next release, so if you have any suggestions regarding your C# client by
> all means speak up ... in the mean time we can do the same thing Erik
> started with solrb and flare: an isolated build system that makes sense to
> the people who understand that language and rely on the community to catch
> any changes to Solr that might break clients.
>
> -Hoss
>
>



Re: Re[2]: Update doc boost and field boost w/o reposting?

2007-04-01 Thread Ryan McKinley

Yes, it is Solr only - and it will have a normal HTTP interface, no Lucene.

But as I said, the catch is that *all* fields must be stored, not only
the ones you want to change.  Solr will pull the document out of the
index, modify it, and put it back - it can only pull out stored fields,
so you must store everything...
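
In the meantime, the only way to change a document boost is to re-post the
whole document with the new boost on the <doc> element of the XML update
message, then commit.  A rough, hypothetical C# sketch of that re-post (the
BoostUpdater class, the field names, and the boost value are all made up
for illustration):

    using System;
    using System.Net;

    public class BoostUpdater
    {
        public static void Repost(string updateUrl)
        {
            // Every field shown must be re-supplied, either from stored
            // fields or from the original source system.
            string xml =
                "<add>" +
                "<doc boost=\"2.0\">" +                          // new document boost
                "<field name=\"id\">DOC-123</field>" +
                "<field name=\"title\">Example title</field>" +
                "<field name=\"body\">Example body text</field>" +
                "</doc>" +
                "</add>";

            using (WebClient client = new WebClient())
            {
                // Re-adding a document with the same uniqueKey replaces the old one.
                client.Headers["Content-Type"] = "text/xml";
                client.UploadString(updateUrl, xml);

                // Commit so searchers see the change.
                client.Headers["Content-Type"] = "text/xml";
                client.UploadString(updateUrl, "<commit/>");
            }
        }
    }

    // Usage:  BoostUpdater.Repost("http://localhost:8983/solr/update");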



On 4/1/07, Jack L <[EMAIL PROTECTED]> wrote:

Hello Ryan,

Interesting. So it's solr only and does not require new Lucene
functionality? I think storing the fields that may change in
the future is ok.

--
Best regards,
Jack

Saturday, March 31, 2007, 5:50:20 PM, you wrote:

> Lucene does not have any way to modify existing fields, so solr can't
> do it either...  (document boosts are stored as part of the field)

> In http://issues.apache.org/jira/browse/SOLR-139, I'm working on a
> convenience function to let the client modify an existing solr
> document - the one catch is that all fields must be stored so they can
> be extracted and re-indexed on the server side.  This may not help
> your performance concern, but it may be easier to deal with.

> ryan


> On 3/31/07, Jack L <[EMAIL PROTECTED]> wrote:
>> I understand that I'm supposed to delete the old record and
>> re-post in order to update a document. But in many cases,
>> it takes time to extract data (from a database, etc.) and all
>> I want to change is the document boost. I wonder if it's possible
>> to adjust the document boost without deleting and re-posting
>> a whole document?
>>
>> --
>> Best regards,
>> Jack
>>
>>




Re[2]: Update doc boost and field boost w/o reposting?

2007-04-01 Thread Jack L
Hello Ryan,

Interesting. So it's solr only and does not require new Lucene
functionality? I think storing the fields that may change in
the future is ok.

-- 
Best regards,
Jack

Saturday, March 31, 2007, 5:50:20 PM, you wrote:

> Lucene does not have any way to modify existing fields, so solr can't
> do it either...  (document boosts are stored as part of the field)

> In http://issues.apache.org/jira/browse/SOLR-139, I'm working on a
> convenience function to let the client modify an existing solr
> document - the one catch is that all fields must be stored so they can
> be extracted and re-indexed on the server side.  This may not help
> your performance concern, but it may be easier to deal with.

> ryan


> On 3/31/07, Jack L <[EMAIL PROTECTED]> wrote:
>> I understand that I'm supposed to delete the old record and
>> re-post in order to update a document. But in many cases,
>> it takes time to extract data (from a database, etc.) and all
>> I want to change is the document boost. I wonder if it's possible
>> to adjust the document boost without deleting and re-posting
>> a whole document?
>>
>> --
>> Best regards,
>> Jack
>>
>>



Re: C# API for Solr

2007-04-01 Thread Jeff Rodenburg

What would make things consistent for the client APIs is a prescribed set of
implementations for a Solr release.  For example, executing searches with
these parameters, support for facets requires those parameters, updates
should be called in this manner, etc.  For lack of a better term, a
loosely-coupled interface definition.  Those requirements could then be
versioned, and the various APIs could advertise themselves as Solr 1.0
compliant, Solr 1.1 compliant, and so on.  The Solr release dictates the
requirements for compliance; the API maintainer is responsible for meeting
those requirements.  This would also be handy when certain features are
deprecated, i.e. when the /update URL is changed.

Regarding C#, this would be easy enough to implement.  There are common
community methods for building/compilation, test libraries, and help
documentation, so doing things consistently with Erik and the solrb library
works for C# as well (and I assume most other languages).
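
As a purely illustrative sketch of what such a compliance contract could
look like in C# (the interface and member names are invented; nothing like
this exists today):

    using System.Collections;

    // Hypothetical "Solr 1.1 compliant" client contract.  A client library
    // implementing this interface would advertise ComplianceVersion = "1.1".
    public interface ISolrClient11
    {
        // Search with the parameter set the 1.1 contract prescribes
        // (query, paging, facet fields, sort), returning raw documents.
        IList Search(string query, int start, int rows,
                     string[] facetFields, string sortSpec);

        // Updates must be posted in the manner the contract prescribes.
        void Add(IDictionary fields);
        void Delete(string uniqueKey);
        void Commit();
        void Optimize();

        // The Solr release level this implementation claims to support.
        string ComplianceVersion { get; }
    }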

-- j


On 3/31/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:



On a related note: We've still never really figured out how to deal with
integrating compilation or testing for client code into our main and build
system -- or for that matter how we should distribute them when we do our
next release, so if you have any suggestions regarding your C# client by
all means speak up ... in the mean time we can do the same thing Erik
started with solrb and flare: an isolated build system that makes sense to
the people who understand that language and rely on the community to catch
any changes to Solr that might break clients.

-Hoss




Troubleshooting java heap out-of-memory

2007-04-01 Thread Jeff Rodenburg

I've read through the list entries here, the Lucene list, and the wiki docs,
and am not resolving a major pain point for us.  We've been trying to
determine what could possibly cause us to hit this in our given environment,
and are hoping more eyes on this issue can help.

Our scenario: 150MB index, 14 documents, read/write servers in place
using standard replication.  Running Tomcat 5.5.17 on Redhat Enterprise
Linux 4.  Java configured to start with -Xmx1024m.  We encounter java heap
out-of-memory issues on the read server at staggered times, but usually once
every 48 hours.  Search request load is roughly 2 searches every 3 seconds,
with some spikes here or there.  We are using facets: 3 are based on type
integer, one is based on type string.  We are using sorts: 1 is based on
type sint, 2 are based on type date.  Caching is disabled.  Solr bits are
also from September 2006.

Is there anything in that configuration that we should interrogate?

thanks,
j