Re: Instructables on solr

2007-04-05 Thread Ryan McKinley


Can you elaborate on running SOLR-20 with a hibernate-solr auto link?  You 
mean you listen to Hibernate events and use them to keep the index served by Solr in sync 
with the DB?



I built a HibernateEventWatcher modeled after the compass framework
that automatically gets notified on insert/update/delete.  Anything
saved that the SolrDocumentBuilder knows what to do with gets sent
to solr automatically.  This way the solr index stays in sync with the
database without any explicit work.

A first pass at this is here:
http://solrstuff.org/svn/solrj-hibernate/src/org/apache/solr/client/solrj/hibernate/SolrSync.java

lots of that changed in the production code... when SOLR-20
stabilizes, I'll put the good bits back in and hopefully post it in a
'contrib' section.
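The dispatch logic can be sketched roughly like this. This is only an illustration of the idea, not the solrj-hibernate code: the `SolrDocumentBuilder` interface and all method names here are invented stand-ins, and a real watcher would implement Hibernate's post-insert/update/delete listener interfaces instead of exposing a plain `onSave` method.

```java
import java.util.ArrayList;
import java.util.List;

public class HibernateEventWatcher {

    /** Stand-in for the builder that knows how to turn some entities into Solr docs. */
    public interface SolrDocumentBuilder {
        boolean canHandle(Object entity);
        String build(Object entity); // returns the doc to send (XML, in 2007-era Solr)
    }

    private final SolrDocumentBuilder builder;
    private final List<String> pending = new ArrayList<String>();

    public HibernateEventWatcher(SolrDocumentBuilder builder) {
        this.builder = builder;
    }

    /** Would be called from the Hibernate post-insert/post-update callbacks. */
    public void onSave(Object entity) {
        // Anything the builder knows what to do with gets queued for Solr;
        // everything else is ignored, so the index stays in sync implicitly.
        if (builder.canHandle(entity)) {
            pending.add(builder.build(entity));
        }
    }

    /** Hand the queued docs to whatever actually posts them to Solr. */
    public List<String> drainPending() {
        List<String> out = new ArrayList<String>(pending);
        pending.clear();
        return out;
    }
}
```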



Also, pooling for 30 seconds on the client side... - are you referring to
keeping data cached in the Solr client for 30 seconds and every 30 seconds
sending it to Solr for indexing?



We are currently running with a single (no replication or load
balancing) solr server, with multiple webapps pointing to it.  Rather
than managing commit timing on the client side, we have autoCommit set
to 1 second.  That way the multiple webapps can't start overlapping commits.
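A 1-second autoCommit like that is configured in solrconfig.xml along these lines. This is a sketch from memory of that era's example config; the exact element names and defaults may differ in your Solr version:

```xml
<!-- illustrative only: check your version's example solrconfig.xml -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>1000</maxTime> <!-- commit at most once per second -->
  </autoCommit>
</updateHandler>
```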

Since commit flushes the caches and forces the searchers to reopen,
we want to do it as little as possible.  The 1-second autoCommit is
required for instant access to uploaded images, but not for stuff that
can take a bit longer...  for the other stuff the client keeps a queue
(I called it pooling) of documents to send.  Every 30 seconds, it sends
them to solr in a bulk update.  That time could be longer; I found 30
seconds was the minimum that avoided multiple unnecessary commits for
our usage patterns.
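That client-side queue can be sketched like this. It is a rough illustration, not Ryan's production code: `SolrSender` is a stand-in for whatever actually posts the bulk update, and the 30-second interval is the one from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BulkUpdateQueue {

    /** Stand-in for the code that posts a bulk update to Solr. */
    public interface SolrSender {
        void sendBulk(List<String> docs);
    }

    private final List<String> queue = new ArrayList<String>();
    private final SolrSender sender;

    public BulkUpdateQueue(SolrSender sender) {
        this.sender = sender;
    }

    /** Queue a non-urgent document instead of sending it immediately. */
    public synchronized void add(String doc) {
        queue.add(doc);
    }

    /** One flush: everything queued so far goes out as a single bulk update. */
    public synchronized void flush() {
        if (!queue.isEmpty()) {
            sender.sendBulk(new ArrayList<String>(queue));
            queue.clear();
        }
    }

    /** Schedule a flush every 30 seconds, so Solr sees one commit per batch. */
    public ScheduledExecutorService start() {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(new Runnable() {
            public void run() { flush(); }
        }, 30, 30, TimeUnit.SECONDS);
        return ses;
    }
}
```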



If so, why not index continuously, either in real-time or in some background
thread that feeds off of a to-index queue?



yes, we have a background thread that queues changes and sends them all at once.


ryan


Re: Does solr support Multi index and return by score and datetime

2007-04-05 Thread James liu

Has anyone had a problem like this, and how did you solve it?




2007/4/5, James liu [EMAIL PROTECTED]:




2007/4/5, Mike Klaas [EMAIL PROTECTED]:

 On 4/4/07, James liu [EMAIL PROTECTED] wrote:

I think it is part of full-text search.
 
  I think querying slaves and combining results by score should be part of
 solr.
 
  I find it http://dev.lucene-ws.net/wiki/MultiIndexOperations
  but i wanna use solr and i like it.
 
  Now i wanna find a good method to solve it by using solr and less
  coding.(More code will cost more time to write and test.)

 I agree that it would be an excellent addition to Solr, but it is a
 major undertaking, and so I wouldn't wait around for it if it is
 important to you.  Solr devs have code to write and test too :).

 If your document
 distribution is uniformly random, then the norms converge to
 approximately equal values anyway.
   
I don't know it.
 
  I don't know why u say document distribution. Does it mean if i
 write code
  independently, i will consider it?

 One of the complexities of querying multiple remote Solr/lucene
 instances is that the scores are not directly comparable, as the term
 idf scores will be different.  However, in practical situations, this
 can be glossed over.

 This is the basic algorithm for single-pass querying of multiple solr
 slaves.  Say you want results N to N+M (e.g. 10 to 20).

 1. query each solr instance independently for N+M documents for the
 given query.  This should be done asynchronously (or you could spawn a
 thread per server).
 2. wait for all responses (or for a certain timeout)
 3. put all returned documents into an array, and reverse sort by score
 4. select documents [N, N+M) from this array.

 This is a relatively simple task.  It gets more complicated once
 multiple passes, idf compensation, deduplication, etc. are added.

 -Mike
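Steps 3 and 4 of Mike's algorithm boil down to a small merge routine. The sketch below is illustrative: `ScoredDoc` and the `merge` helper are invented names, and real code would build the per-shard lists from the Solr responses gathered in steps 1 and 2.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ShardMerge {

    /** Minimal stand-in for one hit returned by a shard. */
    public static class ScoredDoc {
        public final String id;
        public final float score;
        public ScoredDoc(String id, float score) { this.id = id; this.score = score; }
    }

    /** Pool all shard hits, reverse-sort by score, and keep window [n, n+m). */
    public static List<ScoredDoc> merge(List<List<ScoredDoc>> perShard, int n, int m) {
        List<ScoredDoc> pool = new ArrayList<ScoredDoc>();
        for (List<ScoredDoc> hits : perShard) {
            pool.addAll(hits);
        }
        Collections.sort(pool, new Comparator<ScoredDoc>() {
            public int compare(ScoredDoc a, ScoredDoc b) {
                return Float.compare(b.score, a.score); // descending by score
            }
        });
        int from = Math.min(n, pool.size());
        int to = Math.min(n + m, pool.size());
        return pool.subList(from, to);
    }
}
```

As Mike notes, this glosses over idf differences between shards: scores from different instances are treated as directly comparable, which is only approximately true.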


Thks Mike.

I find it more complicated than I thought.

Is this the only way to solve my problem?

I have a project with 100GB of data, and now I have 3-4 servers for solr.






--
regards
jl





--
regards
jl


Re: Does solr support Multi index and return by score and datetime

2007-04-05 Thread James liu

2007/4/5, Otis Gospodnetic [EMAIL PROTECTED]:


James,

It looks like people already answered your questions.
Split your big index.
Put it on multiple servers.
Put Solr on each of those servers.
Write an application that searches multiple Solr instances in parallel.
Get N results from each, combine them, order by score.



How do I cache the results? I worry that it will cache a lot of data.


As far as I know, this is the best you can do with what is available from

Solr today.
For anything else, you'll have to roll up your sleeves and dig into the
code.

Good luck!

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share



--
regards
jl


Re: Instructables on solr

2007-04-05 Thread Ryan McKinley

On 4/4/07, James liu [EMAIL PROTECTED] wrote:

I want to know how you handle a big index, since it seems you have one.



As far as lucene is concerned, we have a relatively small index.
~300K docs  (and growing!)

I haven't even needed to tune things much - it is mostly the default
settings from the example solrconfig.xml, with cache sizes bumped up.
Performance has been fine and load average rarely breaks 1.0.


Re: Instructables on solr

2007-04-05 Thread Bertrand Delacretaz

On 4/4/07, Ryan McKinley [EMAIL PROTECTED] wrote:


...We have been running solr for months as a band-aid, this release
integrates solr deeply...


Awesome - thanks for sharing this!

If you don't mind, it'd be cool to add some info to
http://wiki.apache.org/solr/PublicServers

-Bertrand


Re: Find docs close to a date

2007-04-05 Thread Chris Hostetter
: My docs have a date field and I need to find the two docs with
: a date which is closest to 2007-03-25T17:22:00Z.
:
: I use the following two queries to accomplish the task.
:
: date:{* TO 2007-03-25T17:22:00Z};date desc&start=0&rows=1
: date:{2007-03-25T17:22:00Z TO *};date asc&start=0&rows=1
:
: However I need to make a lot of queries like these. I'm wondering if
: these kinds of queries are expensive. Are there better alternatives for my
: task?

off the top of my head, i can't think of any better way to do what you are
doing out of the box with Solr ... if you wanted to write a bit of
custom java code, a FunctionQuery ValueSource that made a bell curve
around a particular value would be a very cool/clean/reusable solution
to this problem.

One thing to watch out for is that this gives you the doc with the latest
date prior to your input, and the doc with the earliest date after your
input -- which is not exactly the two docs with a date which is closest
to ... the two docs with the closest dates might have dates both below
the input, or both above the input.
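The bell-curve idea reduces to a per-document function that peaks at the target date and falls off symmetrically, the kind of value a custom FunctionQuery ValueSource could compute. The Gaussian shape and the width parameter below are my own choice for illustration, not something from this thread:

```java
public class DateProximity {

    /**
     * Returns 1.0 when docMillis equals targetMillis, falling toward 0 as
     * the distance grows; widthMillis controls how fast the curve drops.
     * A real Solr integration would wrap this in a ValueSource so documents
     * could be sorted or boosted by proximity to the target date.
     */
    public static double value(long docMillis, long targetMillis, double widthMillis) {
        double d = (docMillis - targetMillis) / widthMillis;
        return Math.exp(-0.5 * d * d);
    }
}
```

Sorting by this value descending would naturally interleave the closest docs from both sides of the target, avoiding the two-query asymmetry described above.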




-Hoss



Re: Does solr support Multi index and return by score and datetime

2007-04-05 Thread Otis Gospodnetic
How to cache results?
Put them in a cache like memcached, for example, keyed off of the query
(keys can't exceed 250 bytes in the case of memcached, so you'll want to pack
that query, perhaps using its MD5 as the cache key).
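A minimal sketch of that keying scheme, using the JDK's MessageDigest (the class and method names here are mine, not from any Solr client):

```java
import java.security.MessageDigest;

public class QueryCacheKey {

    /** Hash the raw query so the memcached key stays within the 250-byte limit. */
    public static String md5Key(String query) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(query.getBytes("UTF-8"));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString(); // always 32 hex chars, well under 250 bytes
        } catch (Exception e) {
            throw new RuntimeException(e); // MD5 and UTF-8 are always available
        }
    }
}
```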

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share





Re: Does solr support Multi index and return by score and datetime

2007-04-05 Thread James liu

2007/4/5, Otis Gospodnetic [EMAIL PROTECTED]:


How to cache results?
Put them in a cache like memcached, for example, keyed off of query (can't
exceed 250 bytes in the case of memcached, so you'll want to pack that
query, perhaps use its MD5 as the cache key)



Yes, I use memcached and the key is the MD5 of the query. Thanks for your advice.
I decreased the number of documents because RAM is only 1GB.

I think the master will use tomcat with 20 solr instances, and slave A and
slave B will have 10 solr instances each.
The web server uses lighttpd+php+memcached.

That is my design, but not yet tested. Maybe you can share your experience.



--
regards
jl


SEVERE: Error filterStart

2007-04-05 Thread Andrew Nagy
Hello, I downloaded the latest nightly snapshot of Solr and replaced my 
existing war with the new one.  Once I restarted tomcat, I get this error:


SEVERE: Error filterStart
Apr 5, 2007 10:11:28 AM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr] startup failed due to previous errors

Any ideas as to what is causing this?  I deleted my index to start with
a clean slate, but I did not change any of my config files. Do I need to
update these, or are they backwards compatible?


Thanks!
Andrew



Re: Find docs close to a date

2007-04-05 Thread nick19701


Chris Hostetter wrote:
 
 off the top of my head, i can't think of any better way to do what you are
 doing out of the box with Solr ... if you wanted to write a bit of
 custom java code, a FunctionQuery ValueSource that made a bell curve
 around a particular value would be a very cool/clean/reusable solution
 to this problem.

 One thing to watch out for is that this gives you the doc with the latest
 date prior to your input, and the doc with the earliest date after your
 input -- which is not exactly the two docs with a date which is closest
 to ... the two docs with the closest dates might have dates both below
 the input, or both above the input.
 
 

Yes, I'm looking for the doc with the latest date prior to my input,
and the doc with the earliest date after my input, not the two docs
with dates closest to my input.

One application of this is determining an email's signature.
A person's email signature may change over time, so I only want
to compare an email with the emails immediately before or after
it.

I haven't written any java code yet. But maybe someday I will
try for this one.

-- 
View this message in context: 
http://www.nabble.com/Find-docs-close-to-a-date-tf3507295.html#a9858867
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SEVERE: Error filterStart

2007-04-05 Thread Chris Hostetter

: SEVERE: Error filterStart
: Apr 5, 2007 10:11:28 AM org.apache.catalina.core.StandardContext start
: SEVERE: Context [/solr] startup failed due to previous errors

no clue at all ... the string filterStart doesn't appear anywhere in the
solr code base at all as far as i can see.

is it possible that's a Tomcat error message relating to a problem with a
ServletFilter? (possibly the DispatchFilter in Solr) are there any earlier
messages that look suspicious?





-Hoss



Re: SEVERE: Error filterStart

2007-04-05 Thread Walter Underwood
This does seem to be a Tomcat config problem. Start with this search
to find other e-mail strings on this:

  http://www.google.com/search?q=SEVERE%3A+Error+filterStart

wunder

On 4/5/07 11:43 AM, Chris Hostetter [EMAIL PROTECTED] wrote:

 
 : SEVERE: Error filterStart
 : Apr 5, 2007 10:11:28 AM org.apache.catalina.core.StandardContext start
 : SEVERE: Context [/solr] startup failed due to previous errors
 
 no clue at all ... the string filterStart doesn't appear anywhere in the
 solr code base at all as far as i can see.
 
 is it possible that's a Tomcat error message relating to a problem with a
 ServletFilter? (possibly the DispatchFilter in Solr) are there any earlier
 messages that look suspicious?
 
 
 
 
 
 -Hoss
 



Post in JSON format?

2007-04-05 Thread Jack L
Hello solr-user,

Query result in JSON format is really convenient, especially for
Python clients. Is there any plan to allow posting in JSON format?

-- 
Best regards,
Jack



Re: Post in JSON format?

2007-04-05 Thread Ryan McKinley

Everything is in place to make it an easy task.  A CSV update handler
was recently committed, and a JSON loader should be a relatively
straightforward task.  But I don't think anyone is working on it
yet...


On 4/5/07, Jack L [EMAIL PROTECTED] wrote:

Hello solr-user,

Query result in JSON format is really convenient, especially for
Python clients. Is there any plan to allow posting in JSON format?

--
Best regards,
Jack




Re: C# API for Solr

2007-04-05 Thread Jeff Rodenburg

I'm working on it right now.  The library is largely done, but I need to add
some documentation and a few examples for usage.

No promises, but I hope to have something available in the next few days.

-- j

On 4/5/07, Mike Austin [EMAIL PROTECTED] wrote:


I would be very interested in this. Any idea on when this will be
available?

Thanks

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, April 02, 2007 1:44 AM
To: solr-user@lucene.apache.org
Subject: Re: C# API for Solr


Well, I think there will be a lot of people who will be very happy with
this C# client.

grts,m




Jeff Rodenburg [EMAIL PROTECTED]
31/03/2007 18:00
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
C# API for Solr






We built our first search system architecture around Lucene.Net back in 2005
and continued to make modifications through 2006.  We quickly learned that
search management is so much more than query algorithms and indexing
choices.  We were not readily prepared for the operational overhead that our
Lucene-based search required: always-on availability, fast response times,
batch and real-time updates, etc.

Fast forward to 2007.  Our front-end is Microsoft-based, but we needed to
support parallel development on non-Microsoft architecture, and thus needed
a cross-platform search system.  Hello Solr!  We've transitioned our search
system to Solr with a Linux/Tomcat back-end, and it's been a champ.  We now
use solr not only for standard keyword search, but also to drive queries for
lots of different content sections on our site.  Solr has moved beyond
mission critical in our operation.

As we've proceeded, we've built out a nice C# client library to abstract the
interaction from C# to Solr.  It's mostly generic and designed for
extensibility.  With a few modifications, this could be a stand-alone library
that works for others.

I have clearance from the organization to contribute our library to the
community if there's interest.  I'd first like to gauge the interest of
everyone before doing so; please reply if you do.

cheers,
jeff r.