Re: Spell Check Handler

2007-10-11 Thread scott.tabar
Hoss, I had a feeling someone would be quoting Yonik's Law of Patches! ;-) For now, this is done. I created the changes, created JavaDoc comments on the various settings and their expected output, created a JUnit test for the SpellCheckerRequestHandler which tests various components of the

implemented StandardRequestHandler to show top-results per facet-value. Is this the fastest way?

2007-10-11 Thread Britske
Since the title of my original post may not have been so clear, here is a repost. //Geert-Jan Britske wrote: First of all, I just wanted to say that I just started working with Solr and really like the results I'm getting from Solr (in terms of performance, flexibility) as well as the good

Tomcat Solr Problem

2007-10-11 Thread Nishant Soni
Unable to get Solr up in Tomcat. I'm getting the following log: INFO: Using JNDI solr.home: E:/test/workspace/reviewGist/solr/home Oct 11, 2007 1:48:13 PM org.apache.solr.core.Config setInstanceDir INFO: Solr home set to 'E:/test/workspace/reviewGist/solr/home/' Oct 11, 2007 1:48:13 PM

Re: getting number of stored documents via rest api

2007-10-11 Thread Stefan Rinner
On Oct 10, 2007, at 6:49 PM, Chris Hostetter wrote: : I think search for *:* is the optimal code to do it. I don't think you can : do anything faster. FYI: getting the data from the xml returned by stats.jsp is definitely faster in the case where you really want all docs. if you want the

Re: Spell Check Handler

2007-10-11 Thread climbingrose
Hi all, I've been so busy the last few days that I haven't replied to this email. I modified SpellCheckerHandler a while ago to include support for multiword queries. To be honest, I didn't have time to write unit tests for the code. However, I deployed it in a production environment and it has been

Re: Spell Check Handler

2007-10-11 Thread climbingrose
Just to clarify this line of code: String[] suggestions = spellChecker.suggestSimilar(termText, numSug, req.getSearcher().getReader(), restrictToField, true); I only return suggestions if they are more popular than termText. You probably need to use code in Scott's patch to make this behaviour
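
The "only return suggestions if they are more popular" rule can be sketched outside Lucene as a plain frequency filter. This is a minimal illustration, not the SpellChecker implementation: the `doc_freq` table, the candidate list, and the function name are all hypothetical, and treating ties as "not more popular" is an assumption.

```python
def more_popular_suggestions(term, candidates, doc_freq, num_sug):
    """Keep candidates that occur in more documents than `term` (hypothetical helper)."""
    term_freq = doc_freq.get(term, 0)
    # Strictly greater: a candidate with the same frequency is dropped (assumption)
    better = [c for c in candidates if doc_freq.get(c, 0) > term_freq]
    # Most frequent first; the sort is stable, so ties keep candidate order
    better.sort(key=lambda c: doc_freq[c], reverse=True)
    return better[:num_sug]

# Made-up document frequencies for illustration only
doc_freq = {"solr": 120, "sold": 15, "sole": 3, "soar": 15}
print(more_popular_suggestions("sold", ["solr", "sole", "soar"], doc_freq, 5))
```

With a filter like this, a correctly spelled but rare term can still receive suggestions, which is why the behaviour may need to be configurable.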

Re: Availability Issues

2007-10-11 Thread Norberto Meijome
On Tue, 9 Oct 2007 10:12:51 -0400 David Whalen [EMAIL PROTECTED] wrote: So, how would you build it if you could? Here are the specs: a) the index needs to hold at least 25 million articles b) the index is constantly updated at a rate of 10,000 articles per minute c) we need to have

Re: index size

2007-10-11 Thread Ravish Bhagdev
Hi All, I'm facing similar problem. I want to index entire document as a field. But I also want to be able to retrieve snippets (like Google/Nutch return in results page below the links). To achieve this I have to keep the document field to stored right? When I do this my index becomes huge 10

Re: WebException (ServerProtocolViolation) with SolrSharp

2007-10-11 Thread Filipe Correia
Jeff, Thanks! Your suggestion worked, instead of invoking ToString() on float values I've used ToString's other signature, which takes an IFormatProvider: CultureInfo MyCulture = CultureInfo.InvariantCulture; this.Add(new IndexFieldValue(weight, weight.ToString(MyCulture.NumberFormat)));

Re: getting number of stored documents via rest api

2007-10-11 Thread Walter Underwood
This even works if you request 0 results. --wunder On 10/11/07 1:56 AM, Stefan Rinner [EMAIL PROTECTED] wrote: On Oct 10, 2007, at 6:49 PM, Chris Hostetter wrote: : I think search for *:* is the optimal code to do it. I don't think you can : do anything faster. FYI: getting the data
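
A rough sketch of the `*:*` approach with zero rows, for readers who want to try it from a script. The host and port are taken from a URL quoted elsewhere in this listing; the `/solr/select` path, the function names, and the sample response are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def count_url(base="http://localhost:8983/solr/select"):
    # q=*:* matches every document; rows=0 asks for no document bodies
    return base + "?" + urlencode({"q": "*:*", "rows": 0})

def num_found(xml_text):
    # The total hit count is the numFound attribute of the <result> element
    result = ET.fromstring(xml_text).find("result")
    return int(result.get("numFound"))

# Hand-written sample response, trimmed to the part that matters
sample = '<response><result name="response" numFound="1234" start="0"/></response>'
print(count_url())
print(num_found(sample))
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the body to `num_found` would complete the round trip; that step is left out so the sketch runs without a server.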

Re: Different search results for (german) singular/plural searches - looking for a solution

2007-10-11 Thread Martin Grotzke
Hi Daniel, thanx for your suggestions, being able to export a large synonyms.txt sounds very well! Thx cheers, Martin On Wed, 2007-10-10 at 23:38 +0200, Daniel Naber wrote: On Wednesday 10 October 2007 12:00, Martin Grotzke wrote: Basically I see two options: stemming and the usage of

Re: Different search results for (german) singular/plural searches - looking for a solution

2007-10-11 Thread Thomas Traeger
Martin Grotzke wrote: Try the SnowballPorterFilterFactory with German2 as language attribute first and use synonyms for combined words i.e. Herrenhose = Herren, Hose. so you use a combined approach? Yes, we define the relevant parts of compounded words (keywords only) as synonyms

Re: Internal Server Error and waitSearcher=false for commit/optimize

2007-10-11 Thread Yonik Seeley
On 10/10/07, Jason Rennie [EMAIL PROTECTED] wrote: We're using solr 1.2 and a nightly build of the solrj client code. We very occasionally see things like this: org.apache.solr.client.solrj.SolrServerException: Error executing query at

Re: index size

2007-10-11 Thread Kevin Lewandowski
To achieve this I have to keep the document field to stored right? Yes, the field needs to be stored to return snippets. When I do this my index becomes huge 10 GB index, cause I have 10K docs but each is very lengthy HTML. Is there any better solution? Why is index created by nutch so

Re: Spell Check Handler

2007-10-11 Thread Matthew Runo
Where does the index come from in the first place? Do we have to enter the words, or are they entered as documents enter the SOLR index? I'd love to be able to use my own documents as the spell check index of correctly spelled words.

Re: WebException (ServerProtocolViolation) with SolrSharp

2007-10-11 Thread Jeff Rodenburg
Good to know, I think this needs to be a configurable value in the library (overridable, at a minimum.) What's outstanding for me on this is understanding the Solr side of the equation, and whether culture variance comes into play. What makes this even more interesting/confusing is how culture

Re: Internal Server Error and waitSearcher=false for commit/optimize

2007-10-11 Thread Jason Rennie
Many thanks for your reply, Yonik. On 10/11/07, Yonik Seeley [EMAIL PROTECTED] wrote: Caused by: org.apache.solr.common.SolrException: Internal Server Error Is there a longer stack trace somewhere concerning the internal server error? I shouldn't have bothered you with this. We've

Re: showing results per facet-value efficiently

2007-10-11 Thread Mike Klaas
On 10-Oct-07, at 4:16 AM, Britske wrote: However, I realized that for calculating the count for each of the facetvalues, the original standardrequesthandler already loops the doclist to check for matches. Therefore my implementation actually does double work, since it gets doclists for

Re: showing results per facet-value efficiently

2007-10-11 Thread Britske
yup that clarifies things a lot, thanks. Mike Klaas wrote: On 10-Oct-07, at 4:16 AM, Britske wrote: However, I realized that for calculating the count for each of the facetvalues, the original standardrequesthandler already loops the doclist to check for matches. Therefore my

quickie: do facetfields use same cached items in field cache as FQ-param?

2007-10-11 Thread Britske
say I have the following (partial) querystring: ...facet=true&facet.field=country field 'country' is not tokenized, not multi-valued, and not boolean, so the field-cache approach is used. Moreover, the following (partial) querystring is used as well: ..fq=country:france do these queries share
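
For reference, the two partial query strings in this question can be assembled with a standard URL encoder, which makes the parameter boundaries explicit. This is only a sketch: the `q` value is a placeholder that does not come from the post.

```python
from urllib.parse import urlencode, parse_qsl

# A list of pairs (not a dict) because fq may appear more than once
params = [
    ("q", "*:*"),                # placeholder main query, an assumption
    ("facet", "true"),
    ("facet.field", "country"),
    ("fq", "country:france"),
]
# Leave ':' and '*' unescaped so the string stays readable
query_string = urlencode(params, safe=":*")
print(query_string)
```

Each parameter is joined with `&`, so `facet=true` and `facet.field=country` are two separate parameters.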

doubled/halved performance?

2007-10-11 Thread Mike Klaas
I'm seeing some interesting behaviour when doing benchmarks of query and facet performance. Note that the query cache is disabled, and the index is entirely in the OS disk cache. filterCache is fully primed. Often when repeatedly measuring the same query, I'll see pretty consistent

Re: doubled/halved performance?

2007-10-11 Thread Yonik Seeley
On 10/11/07, Mike Klaas [EMAIL PROTECTED] wrote: I'm seeing some interesting behaviour when doing benchmarks of query and facet performance. Note that the query cache is disabled, and the index is entirely in the OS disk cache. filterCache is fully primed. Often when repeatedly measuring

Instant deletes without committing

2007-10-11 Thread BrendanD
Hi, Is it possible to send a command to have Solr flush the deletesPending documents without doing a commit? I know there's a setting in solrconfig.xml for setting a threshold value, but I'd like to somehow kick it off on demand. We need this to be able to remove merchants from our product

Re: doubled/halved performance?

2007-10-11 Thread Mike Klaas
On 11-Oct-07, at 2:37 PM, Yonik Seeley wrote: On 10/11/07, Mike Klaas [EMAIL PROTECTED] wrote: I'm seeing some interesting behaviour when doing benchmarks of query and facet performance. Note that the query cache is disabled, and the index is entirely in the OS disk cache. filterCache is

Re: Instant deletes without committing

2007-10-11 Thread Mike Klaas
On 11-Oct-07, at 2:47 PM, BrendanD wrote: Hi, Is it possible to send a command to have Solr flush the deletesPending documents without doing a commit? I know there's a setting in solrconfig.xml for setting a threshold value, but I'd like to somehow kick it off on demand. We need this to be

Fwd: solr, snippets and stored field in nutch...

2007-10-11 Thread Ravish Bhagdev
Hey guys, Checkout this thread I opened on nutch mailing list. Looks like Solr can benefit from reusing Nutch's segment based storage strategy for efficiency in returning snippets, summaries etc without using Lucene stored fields? Was this considered before? Ravish -- Forwarded

Re: Instant deletes without committing

2007-10-11 Thread BrendanD
Yes, we have some huge performance issues with non-cached queries. So doing a commit is very expensive for us. We have our autowarm count for our filterCache and queryResultCache both set to 4096. But I don't think that's near high enough. We did have it as high as 16384 before, but it took over

Re: solr, snippets and stored field in nutch...

2007-10-11 Thread Mike Klaas
First, it should be noted that I am not an expert in Nutch's architecture. I do think I understand what is being said there, however. Nutch is a distributed web search engine, and uses Lucene as an indexing component. It is free to use external data structures to store data, and can store

Field name filter

2007-10-11 Thread Debra
When searching data, I need to process field names similarly to processing terms: lower-case the field name so that field names are not case sensitive (all field names are lower case when indexed), and have synonyms for field names. Example: if the user types article:abc or content:abc, in both cases it
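
One way to picture this is a preprocessing step applied to the raw query string before it reaches the query parser. The sketch below is an assumption-laden illustration, not a Solr feature: the synonym table is hypothetical (the post does not say which field the aliases should map to), and the regex ignores quoted phrases and values that themselves contain colons.

```python
import re

# Hypothetical synonym table: alias field name -> canonical field name
FIELD_SYNONYMS = {"article": "content"}

# A field prefix is an identifier immediately followed by a colon
FIELD_PREFIX = re.compile(r"\b([A-Za-z_]\w*):")

def normalize_field_names(query, synonyms=FIELD_SYNONYMS):
    def fix(match):
        name = match.group(1).lower()          # field names are case-insensitive
        return synonyms.get(name, name) + ":"  # map aliases to the canonical name
    return FIELD_PREFIX.sub(fix, query)

print(normalize_field_names("Article:abc CONTENT:def Title:x"))
```

Only the field prefix is rewritten; the term after the colon is left untouched.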

Add fields to query when processing

2007-10-11 Thread Debra
How can I add a field name to a query dynamically? Example: If user types in stock replace it with quantity:[1 TO *]
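
A client-side rewrite is one possible way to do this before the query is sent to Solr. The sketch below assumes the keyword is the phrase "in stock" (the post could also be read as the single word "stock"); the rewrite table and function name are hypothetical.

```python
import re

# Hypothetical rewrite table: user keyword -> query clause
REWRITES = {"in stock": "quantity:[1 TO *]"}

def rewrite_keywords(query, rewrites=REWRITES):
    for phrase, clause in rewrites.items():
        # Word boundaries stop "in stock" from matching inside "in stockings"
        pattern = r"\b" + re.escape(phrase) + r"\b"
        # A function replacement inserts the clause literally
        query = re.sub(pattern, lambda m, c=clause: c, query, flags=re.IGNORECASE)
    return query

print(rewrite_keywords("ipod In Stock"))
```

Doing the same inside Solr would mean a custom request handler or query parser, which this sketch does not attempt.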

Re: solr, snippets and stored field in nutch...

2007-10-11 Thread Mike Klaas
On 11-Oct-07, at 4:34 PM, Ravish Bhagdev wrote: Hi Mike, Thanks for your reply :) I am not an expert of either! But, I understand that Nutch stores contents albeit in a separate data structure (they call segment as discussed in the thread), but what I meant was that this seems like much more

Re: quickie: do facetfields use same cached items in field cache as FQ-param?

2007-10-11 Thread Chris Hostetter
: ..fq=country:france : : do these queries share cached items in the fieldcache? (in this example: : country:france) or do they somehow live as separate entities in the cache? : The latter would explain my fieldcache having evictions at the moment. FieldCache can't have evictions. it's a

query syntax performance difference?

2007-10-11 Thread BrendanD
Hi, Is there a difference in the performance for the following 2 variations on query syntax? The first query was a response from Solr by using a single fq parameter in the URL. The second query was a response from Solr by using separate fq parameter in the URL, one for each field. str name=fq

autowarm static queries

2007-10-11 Thread BrendanD
Hi, I have the following query that I've found in my production logs: INFO: /select/

Re: autowarm static queries

2007-10-11 Thread Mike Klaas
On 11-Oct-07, at 6:47 PM, BrendanD wrote: Hi, I have the following query that I've found in my production logs: INFO: /select/ rows=0&start=0&f.category_id.facet.limit=-1&facet=true&facet.field=category_id&fq=product_is_active:true&fq=product_status_code:complete&fq=

Re: getting number of stored documents via rest api

2007-10-11 Thread Erik Hatcher
Another route to getting the number of documents is to get it from the LukeRequestHandler: http://localhost:8983/solr/admin/luke?numTerms=0 (numTerms=0 to get the fastest response possible) Erik On Oct 10, 2007, at 10:19 AM, Stefan Rinner wrote: Hi for some tests I need to know
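
A small sketch of reading the document count from the Luke response. The URL is the one given above; the response layout (a `numDocs` integer inside an `index` list) is an assumption from memory and should be checked against a live server, and the sample XML is hand-written.

```python
import xml.etree.ElementTree as ET

LUKE_URL = "http://localhost:8983/solr/admin/luke?numTerms=0"

def num_docs(luke_xml):
    root = ET.fromstring(luke_xml)
    # Assumed layout: <lst name="index"><int name="numDocs">...</int></lst>
    node = root.find("./lst[@name='index']/int[@name='numDocs']")
    return int(node.text)

# Hand-written sample, trimmed to the assumed fields
sample = (
    "<response><lst name='index'>"
    "<int name='numDocs'>1234</int><int name='maxDoc'>1300</int>"
    "</lst></response>"
)
print(num_docs(sample))
```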

Re: autowarm static queries

2007-10-11 Thread BrendanD
Mike Klaas wrote: On 11-Oct-07, at 6:47 PM, BrendanD wrote: Hi, I have the following query that I've found in my production logs: INFO: /select/ rows=0&start=0&f.category_id.facet.limit=-1&facet=true&facet.field=category_id&fq=product_is_active:true&fq=product_status_code:complete&fq=

Re: autowarm static queries

2007-10-11 Thread Mike Klaas
On 11-Oct-07, at 7:38 PM, BrendanD wrote: Unfortunately pretty much ALL of our fields are multi-valued. A product can exist in multiple categories, be sold by multiple merchants, and have multiple attributes with multiple attribute values assigned to it. E.g. an iPod in a special Gifts