Re: Instructables on solr
Can you elaborate on running SOLR-20 with a hibernate-solr auto link? You mean you listen to Hibernate events and use them to keep the index served by Solr in sync with the DB?

I built a HibernateEventWatcher modeled after the Compass framework that automatically gets notified on insert/update/delete. Anything saved that the SolrDocumentBuilder knows what to do with gets sent to Solr automatically. This way the Solr index stays in sync with the SQL database without any explicit work. A first pass at this is here: http://solrstuff.org/svn/solrj-hibernate/src/org/apache/solr/client/solrj/hibernate/SolrSync.java -- lots of that changed in the production code. When SOLR-20 stabilizes, I'll put the good bits back in and hopefully post it in a 'contrib' section.

Also, pooling for 30 seconds on the client side... - are you referring to keeping data cached in the Solr client for 30 seconds and sending it to Solr for indexing every 30 seconds?

We are currently running a single Solr server (no replication or load balancing) with multiple webapps pointing to it. Rather than manage commit timing on the client side, we have autoCommit set to 1 second, so the multiple webapps can't start overlapping commits. Since a commit flushes the caches and forces the searchers to reopen, we want to do it as little as possible. The 1-second autoCommit is required for instant access to uploaded images, but not for stuff that can take a bit longer. For the other stuff, the client keeps a queue (I called it pooling) of things to send; every 30 seconds it sends them to Solr in a bulk update. That interval could be longer -- 30 seconds was the minimum time that avoided multiple unnecessary commits for our usage patterns.

If so, why not index continuously, either in real-time or in some background thread that feeds off of a to-index queue?

Yes, we have a background thread that queues changes and sends them all at once.

ryan
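The client-side queue Ryan describes (accumulate updates, flush to Solr in one bulk request every 30 seconds) can be sketched roughly like this. This is a hypothetical `BatchingUpdateQueue`, not his production code; the actual Solr call is abstracted behind a `Consumer` so the sketch stays self-contained -- a real client would POST the batch as one `<add>` request:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of the 30-second client-side queue. Documents
// accumulate in `pending` and are handed to `sender` as one batch,
// either on a fixed schedule (start()) or on demand (flushNow()).
public class BatchingUpdateQueue<T> {
    private final List<T> pending = new ArrayList<>();
    private final Consumer<List<T>> sender;  // e.g. a bulk update POST to Solr
    private final long intervalMillis;       // e.g. 30_000 for 30 seconds
    private ScheduledExecutorService scheduler;

    public BatchingUpdateQueue(Consumer<List<T>> sender, long intervalMillis) {
        this.sender = sender;
        this.intervalMillis = intervalMillis;
    }

    /** Queue a document for the next bulk update. */
    public synchronized void add(T doc) {
        pending.add(doc);
    }

    public synchronized int pendingCount() {
        return pending.size();
    }

    /** Send everything queued so far as one bulk update. */
    public synchronized void flushNow() {
        if (pending.isEmpty()) return;  // skip empty flushes: no needless commit
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        sender.accept(batch);
    }

    /** Flush on a fixed schedule, like the 30-second interval in the post. */
    public void start() {
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::flushNow,
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        if (scheduler != null) scheduler.shutdown();
    }
}
```

Keeping the flush interval as a constructor parameter matches the observation in the post that 30 seconds was simply the shortest interval that avoided redundant commits for one particular usage pattern.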
Re: Does solr support Multi index and return by score and datetime
Anyone have a problem like this, and how did you solve it?

2007/4/5, James liu [EMAIL PROTECTED]: 2007/4/5, Mike Klaas [EMAIL PROTECTED]:

On 4/4/07, James liu [EMAIL PROTECTED] wrote: I think it is part of full-text search. I think querying slaves and combining results by score should be part of Solr. I found http://dev.lucene-ws.net/wiki/MultiIndexOperations but I want to use Solr, and I like it. Now I want to find a good way to solve this using Solr with less coding. (More code will cost more time to write and test.)

I agree that it would be an excellent addition to Solr, but it is a major undertaking, and so I wouldn't wait around for it if it is important to you. Solr devs have code to write and test too :). If your document distribution is uniformly random, then the norms converge to approximately equal values anyway.

I don't understand why you say document distribution. Does it mean that if I write the code independently, I need to consider it?

One of the complexities of querying multiple remote Solr/Lucene instances is that the scores are not directly comparable, as the term idf values will be different. However, in practical situations, this can be glossed over. This is the basic algorithm for single-pass querying of multiple Solr slaves. Say you want results N to N+M (e.g. 10 to 20).

1. Query each Solr instance independently for N+M documents for the given query. This should be done asynchronously (or you could spawn a thread per server).
2. Wait for all responses (or for a certain timeout).
3. Put all returned documents into an array, and reverse-sort by score.
4. Select documents [N, N+M) from this array.

This is a relatively simple task. It gets more complicated once multiple passes, idf compensation, deduplication, etc. are added. -Mike

Thanks, Mike. I find it more complicated than I thought. Is this the only way to solve my problem? I have a project with 100 GB of data, and I have 3-4 servers for Solr.

-- regards jl
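Mike's four steps can be sketched as follows. This is a minimal merge over already-fetched per-shard results: the asynchronous fan-out of step 1 and the timeout of step 2 are left out, and `Hit` and `ShardMerge` are illustrative names, not Solr APIs:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the single-pass merge: each inner list is the top N+M
// hits already fetched from one Solr slave for the same query.
public class ShardMerge {
    public record Hit(String id, float score) {}

    /** Merge per-shard top-(n+m) results; return the global hits [n, n+m). */
    public static List<Hit> merge(List<List<Hit>> shardResults, int n, int m) {
        // step 3: pool everything and reverse-sort by score
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shard : shardResults) {
            all.addAll(shard);
        }
        all.sort(Comparator.comparingDouble(Hit::score).reversed());

        // step 4: take the requested window, clamped to what we actually have
        int from = Math.min(n, all.size());
        int to = Math.min(n + m, all.size());
        return all.subList(from, to);
    }
}
```

Note that this glosses over exactly what Mike warns about: the scores from different shards are only comparable to the extent that idf statistics are similar across shards.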
Re: Does solr support Multi index and return by score and datetime
2007/4/5, Otis Gospodnetic [EMAIL PROTECTED]:

James, it looks like people already answered your questions. Split your big index. Put it on multiple servers. Put Solr on each of those servers. Write an application that searches multiple Solr instances in parallel. Get N results from each, combine them, order by score.

How do I cache the results? I'm hesitant, since it would mean caching a lot of data.

As far as I know, this is the best you can do with what is available from Solr today. For anything else, you'll have to roll up your sleeves and dig into the code. Good luck!

Otis

Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message -----
From: James liu [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Thursday, April 5, 2007 1:18:30 AM
Subject: Re: Does solr support Multi index and return by score and datetime
[earlier quoted thread trimmed]

-- regards jl
Re: Instructables on solr
On 4/4/07, James liu [EMAIL PROTECTED] wrote: I want to know how you handle a big index -- it seems you have one.

As far as Lucene is concerned, we have a relatively small index: ~300K docs (and growing!). I haven't even needed to tune things much -- it is mostly default settings from the example solrconfig.xml with cache sizes bumped up. The performance has been fine, and load average rarely breaks 1.0.
Re: Instructables on solr
On 4/4/07, Ryan McKinley [EMAIL PROTECTED] wrote: ...We have been running solr for months as a band-aid, this release integrates solr deeply... Awesome - thanks for sharing this! If you don't mind, it'd be cool to add some info to http://wiki.apache.org/solr/PublicServers -Bertrand
Re: Find docs close to a date
: My docs have a date field and I need to find the two docs with
: a date which is closest to 2007-03-25T17:22:00Z.
:
: I use the following two queries to accomplish the task.
:
: date:{* TO 2007-03-25T17:22:00Z};date desc&start=0&rows=1
: date:{2007-03-25T17:22:00Z TO *};date asc&start=0&rows=1
:
: However I need to make a lot of queries like these. I'm wondering if
: these kinds of queries are expensive. Are there better alternatives for my
: task?

Off the top of my head, I can't think of any better way to do what you are doing out of the box with Solr ... if you wanted to write a bit of custom Java code, a FunctionQuery ValueSource that made a bell curve around a particular value would be a very cool/clean/reusable solution to this problem.

One thing to watch out for is that this gives you the doc with the latest date prior to your input, and the doc with the earliest date after your input -- which is not exactly "the two docs with a date which is closest to ...": the two docs with the closest dates might both be below the input, or both above it.

-Hoss
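The bell-curve idea Hoss suggests boils down to math like the following: score each document by the Gaussian proximity of its date (as epoch millis) to the target date. This is plain Java illustrating the scoring function only, not an actual Lucene ValueSource implementation; `DateProximity` and `sigmaMillis` are hypothetical names:

```java
// Hypothetical sketch: a bell-curve proximity score around a target date.
// A doc exactly at the target scores 1.0; the score decays toward 0 the
// further the doc's date is from the target. sigmaMillis sets curve width.
public class DateProximity {
    public static double score(long docMillis, long targetMillis, double sigmaMillis) {
        double d = (docMillis - targetMillis) / sigmaMillis;
        return Math.exp(-0.5 * d * d);  // standard Gaussian shape
    }
}
```

Sorting by this score descending naturally returns the overall closest documents, regardless of whether they fall before or after the target -- which sidesteps the caveat Hoss raises about the two range queries.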
Re: Does solr support Multi index and return by score and datetime
How to cache results? Put them in a cache like memcached, for example, keyed off of the query (keys can't exceed 250 bytes in the case of memcached, so you'll want to pack that query -- perhaps use its MD5 as the cache key).

Otis

Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message -----
From: James liu [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Thursday, April 5, 2007 1:57:07 AM
Subject: Re: Does solr support Multi index and return by score and datetime
[earlier quoted thread trimmed]

-- regards jl
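The MD5 cache-key trick Otis describes might look like this in Java, using only the standard library; `QueryCacheKey` is an illustrative name, and the memcached get/set calls themselves are omitted:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: memcached keys are limited to 250 bytes, so hash the
// (possibly very long) query string and use the 32-char MD5 hex
// digest as the cache key instead.
public class QueryCacheKey {
    public static String md5Key(String query) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(query.getBytes(StandardCharsets.UTF_8));
            // zero-pad so the key is always the full 32 hex characters
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

One caveat worth knowing: distinct queries that happen to normalize differently (parameter order, whitespace) hash to different keys, so normalizing the query string before hashing improves the hit rate.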
Re: Does solr support Multi index and return by score and datetime
2007/4/5, Otis Gospodnetic [EMAIL PROTECTED]:

How to cache results? Put them in a cache like memcached, for example, keyed off of the query (keys can't exceed 250 bytes in the case of memcached, so you'll want to pack that query, perhaps use its MD5 as the cache key).

Yes, I use memcached, and the key is the MD5 of the query. Thanks for your advice.

I decreased the document count because RAM is only 1 GB. I'm thinking the master uses Tomcat with 20 Solr instances, and slave A and slave B each run 10 Solr instances. The web server uses lighttpd + PHP + memcached. That is my design, but it's untested. Maybe you can share your experience.

[earlier quoted thread trimmed]

-- regards jl
SEVERE: Error filterStart
Hello, I downloaded the latest nightly snapshot of Solr and replaced my existing war with the new one. Once I restarted Tomcat, I get this error:

SEVERE: Error filterStart
Apr 5, 2007 10:11:28 AM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr] startup failed due to previous errors

Any ideas as to what is causing this? I deleted my index to start with a clean slate, but I did not change any of my config files -- do I need to update these, or are they backwards compatible? Thanks! Andrew
Re: Find docs close to a date
Chris Hostetter wrote:

Off the top of my head, I can't think of any better way to do what you are doing out of the box with Solr ... if you wanted to write a bit of custom Java code, a FunctionQuery ValueSource that made a bell curve around a particular value would be a very cool/clean/reusable solution to this problem. One thing to watch out for is that this gives you the doc with the latest date prior to your input, and the doc with the earliest date after your input -- which is not exactly "the two docs with a date which is closest to ...": the two docs with the closest dates might both be below the input, or both above it.

Yes, I'm looking for the doc with the latest date prior to my input and the doc with the earliest date after my input, not the two docs with dates closest to my input. One application of this is determining an email's signature. A person's email signature may change over time, so I only want to compare an email with the emails immediately before or after it. I haven't written any Java code yet, but maybe someday I will try this one.

-- View this message in context: http://www.nabble.com/Find-docs-close-to-a-date-tf3507295.html#a9858867 Sent from the Solr - User mailing list archive at Nabble.com.
Re: SEVERE: Error filterStart
: SEVERE: Error filterStart
: Apr 5, 2007 10:11:28 AM org.apache.catalina.core.StandardContext start
: SEVERE: Context [/solr] startup failed due to previous errors

No clue at all ... the string "filterStart" doesn't appear anywhere in the Solr code base as far as I can see. Is it possible that's a Tomcat error message relating to a problem with a ServletFilter (possibly the DispatchFilter in Solr)? Are there any earlier messages that look suspicious?

-Hoss
Re: SEVERE: Error filterStart
This does seem to be a Tomcat config problem. Start with this search to find other e-mail threads on this: http://www.google.com/search?q=SEVERE%3A+Error+filterStart

wunder

On 4/5/07 11:43 AM, Chris Hostetter [EMAIL PROTECTED] wrote: No clue at all ... the string "filterStart" doesn't appear anywhere in the Solr code base as far as I can see. Is it possible that's a Tomcat error message relating to a problem with a ServletFilter (possibly the DispatchFilter in Solr)? Are there any earlier messages that look suspicious? -Hoss
Post in JSON format?
Hello solr-user, Query result in JSON format is really convenient, especially for Python clients. Is there any plan to allow posting in JSON format? -- Best regards, Jack
Re: Post in JSON format?
Everything is in place to make this an easy task. A CSV update handler was recently committed; a JSON loader should be relatively straightforward. But I don't think anyone is working on it yet...

On 4/5/07, Jack L [EMAIL PROTECTED] wrote: Hello solr-user, Query results in JSON format are really convenient, especially for Python clients. Is there any plan to allow posting in JSON format? -- Best regards, Jack
Re: C# API for Solr
I'm working on it right now. The library is largely done, but I need to add some documentation and a few examples for usage. No promises, but I hope to have something available in the next few days. -- j

On 4/5/07, Mike Austin [EMAIL PROTECTED] wrote: I would be very interested in this. Any idea on when this will be available? Thanks

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, April 02, 2007 1:44 AM
To: solr-user@lucene.apache.org
Subject: Re: C# API for Solr

Well, I think there will be a lot of people who will be very happy with this C# client. grts, m

Jeff Rodenburg [EMAIL PROTECTED] 31/03/2007 18:00 Please respond to solr-user@lucene.apache.org To solr-user@lucene.apache.org Subject: C# API for Solr

We built our first search system architecture around Lucene.Net back in 2005 and continued to make modifications through 2006. We quickly learned that search management is so much more than query algorithms and indexing choices. We were not readily prepared for the operational overhead that our Lucene-based search required: always-on availability, fast response times, batch and real-time updates, etc.

Fast forward to 2007. Our front-end is Microsoft-based, but we needed to support parallel development on non-Microsoft architecture, and thus needed a cross-platform search system. Hello Solr! We've transitioned our search system to Solr with a Linux/Tomcat back-end, and it's been a champ. We now use Solr not only for standard keyword search, but also to drive queries for lots of different content sections on our site. Solr has moved beyond mission critical in our operation.

As we've proceeded, we've built out a nice C# client library to abstract the interaction from C# to Solr. It's mostly generic and designed for extensibility. With a few modifications, this could be a stand-alone library that works for others. I have clearance from the organization to contribute our library to the community if there's interest.
I'd first like to gauge the interest of everyone before doing so; please reply if you do. cheers, jeff r.