Hoss,
I had a feeling someone would be quoting Yonik's Law of Patches! ;-)
For now, this is done.
I created the changes, created JavaDoc comments on the various settings
and their expected output, created a JUnit test for the
SpellCheckerRequestHandler
which tests various components of the
Since the title of my original post may not have been so clear, here is a
repost.
//Geert-Jan
Britske wrote:
First of all, I just wanted to say that I just started working with Solr
and really like the results I'm getting from Solr (in terms of
performance, flexibility) as well as the good
I'm unable to get Solr up in Tomcat. I'm getting the following log:
INFO: Using JNDI solr.home: E:/test/workspace/reviewGist/solr/home
Oct 11, 2007 1:48:13 PM org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to 'E:/test/workspace/reviewGist/solr/home/'
Oct 11, 2007 1:48:13 PM
On Oct 10, 2007, at 6:49 PM, Chris Hostetter wrote:
: I think search for *:* is the optimal code to do it. I don't think you can
: do anything faster.
FYI: getting the data from the xml returned by stats.jsp is definitely
faster in the case where you really want all docs.
if you want the
Hi all,
I've been so busy the last few days that I haven't replied to this email. I
modified SpellCheckerHandler a while ago to include support for multiword
queries. To be honest, I didn't have time to write unit tests for the code.
However, I deployed it in a production environment and it has been
Just to clarify this line of code:
String[] suggestions = spellChecker.suggestSimilar(termText, numSug,
req.getSearcher().getReader(), restrictToField, true);
I only return suggestions if they are more popular than termText. You
probably need to use code in Scott's patch to make this behaviour
On Tue, 9 Oct 2007 10:12:51 -0400
David Whalen [EMAIL PROTECTED] wrote:
So, how would you build it if you could? Here are the specs:
a) the index needs to hold at least 25 million articles
b) the index is constantly updated at a rate of 10,000 articles
per minute
c) we need to have
Hi All,
I'm facing a similar problem. I want to index the entire document as a
field. But I also want to be able to retrieve snippets (like
Google/Nutch return in the results page below the links).
To achieve this I have to keep the document field stored, right?
When I do this my index becomes huge 10
Jeff,
Thanks! Your suggestion worked. Instead of invoking ToString() on
float values, I've used ToString's other signature, which takes an
IFormatProvider:
CultureInfo MyCulture = CultureInfo.InvariantCulture;
this.Add(new IndexFieldValue(weight,
weight.ToString(MyCulture.NumberFormat)));
This even works if you request 0 results. --wunder
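As a rough illustration of the rows=0 approach discussed in this thread, the sketch below builds a `*:*` request that asks for zero rows and reads the total from the `numFound` attribute of the XML response. The base URL, the helper names, and the sample response are assumptions for illustration; no request is actually sent.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical base URL; adjust to your own Solr install.
SOLR_SELECT = "http://localhost:8983/solr/select"

def count_query_url(base=SOLR_SELECT):
    """Build a match-all query that returns no documents, only the count."""
    return base + "?" + urlencode({"q": "*:*", "rows": 0})

def parse_num_found(xml_text):
    """Read numFound from the <result> element of a Solr XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    return int(result.get("numFound"))

# Illustrative sample shaped like a Solr XML response (not real output).
SAMPLE = (
    '<response>'
    '<lst name="responseHeader"><int name="status">0</int></lst>'
    '<result name="response" numFound="1234" start="0"/>'
    '</response>'
)

print(count_query_url())
print(parse_num_found(SAMPLE))
```

Because no documents are returned, the response stays small no matter how large the index is.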
On 10/11/07 1:56 AM, Stefan Rinner [EMAIL PROTECTED] wrote:
On Oct 10, 2007, at 6:49 PM, Chris Hostetter wrote:
: I think search for *:* is the optimal code to do it. I don't think you can
: do anything faster.
FYI: getting the data
Hi Daniel,
thanks for your suggestions; being able to export a large synonyms.txt
sounds very good!
Thanks and cheers,
Martin
On Wed, 2007-10-10 at 23:38 +0200, Daniel Naber wrote:
On Wednesday 10 October 2007 12:00, Martin Grotzke wrote:
Basically I see two options: stemming and the usage of
Martin Grotzke wrote:
Try the SnowballPorterFilterFactory with German2 as the language attribute
first and use synonyms for combined words, e.g. Herrenhose = Herren,
Hose.
so you use a combined approach?
Yes, we define the relevant parts of compound words (keywords only) as
synonyms
On 10/10/07, Jason Rennie [EMAIL PROTECTED] wrote:
We're using solr 1.2 and a nightly build of the solrj client code. We very
occasionally see things like this:
org.apache.solr.client.solrj.SolrServerException: Error executing query
at
To achieve this I have to keep the document field stored, right?
Yes, the field needs to be stored to return snippets.
When I do this my index becomes huge 10 GB index, cause I have 10K
docs but each is very lengthy HTML. Is there any better solution?
Why is index created by nutch so
Where does the index come from in the first place? Do we have to
enter the words, or are they entered as documents enter the SOLR index?
I'd love to be able to use my own documents as the spell check index
of correctly spelled words.
Good to know. I think this needs to be a configurable value in the library
(overridable, at a minimum).
What's outstanding for me on this is understanding the Solr side of the
equation, and whether culture variance comes into play. What makes this
even more interesting/confusing is how culture
Many thanks for your reply, Yonik.
On 10/11/07, Yonik Seeley [EMAIL PROTECTED] wrote:
Caused by: org.apache.solr.common.SolrException: Internal Server Error
Is there a longer stack trace somewhere concerning the internal server
error?
I shouldn't have bothered you with this. We've
On 10-Oct-07, at 4:16 AM, Britske wrote:
However, I realized that for calculating the count for each of the
facetvalues, the original standardrequesthandler already loops the
doclist
to check for matches. Therefore my implementation actually does
double work,
since it gets doclists for
yup that clarifies things a lot, thanks.
Mike Klaas wrote:
On 10-Oct-07, at 4:16 AM, Britske wrote:
However, I realized that for calculating the count for each of the
facetvalues, the original standardrequesthandler already loops the
doclist
to check for matches. Therefore my
Say I have the following (partial)
query string: ...facet=true&facet.field=country
field 'country' is not tokenized, not multi-valued, and not boolean, so the
field-cache approach is used.
Moreover, the following (partial) query string is used as well:
..fq=country:france
do these queries share
I'm seeing some interesting behaviour when doing benchmarks of query
and facet performance. Note that the query cache is disabled, and
the index is entirely in the OS disk cache. filterCache is fully
primed.
Often when repeatedly measuring the same query, I'll see pretty
consistent
On 10/11/07, Mike Klaas [EMAIL PROTECTED] wrote:
I'm seeing some interesting behaviour when doing benchmarks of query
and facet performance. Note that the query cache is disabled, and
the index is entirely in the OS disk cache. filterCache is fully
primed.
Often when repeatedly measuring
Hi,
Is it possible to send a command to have Solr flush the deletesPending
documents without doing a commit? I know there's a setting in solrconfig.xml
for setting a threshold value, but I'd like to somehow kick it off on
demand. We need this to be able to remove merchants from our product
On 11-Oct-07, at 2:37 PM, Yonik Seeley wrote:
On 10/11/07, Mike Klaas [EMAIL PROTECTED] wrote:
I'm seeing some interesting behaviour when doing benchmarks of query
and facet performance. Note that the query cache is disabled, and
the index is entirely in the OS disk cache. filterCache is
On 11-Oct-07, at 2:47 PM, BrendanD wrote:
Hi,
Is it possible to send a command to have Solr flush the deletesPending
documents without doing a commit? I know there's a setting in
solrconfig.xml
for setting a threshold value, but I'd like to somehow kick it off on
demand. We need this to be
Hey guys,
Check out this thread I opened on the Nutch mailing list. It looks like Solr
could benefit from reusing Nutch's segment-based storage strategy for
efficiency in returning snippets, summaries, etc. without using Lucene
stored fields.
Was this considered before?
Ravish
-- Forwarded
Yes, we have some huge performance issues with non-cached queries, so doing a
commit is very expensive for us. We have our autowarm count for our
filterCache and queryResultCache both set to 4096, but I don't think that's
nearly high enough. We did have it as high as 16384 before, but it took over
First, it should be noted that I am not an expert in Nutch's
architecture. I do think I understand what is being said there, however.
Nutch is a distributed web search engine, and uses Lucene as an
indexing component. It is free to use external data structures to
store data, and can store
When searching data, I need to process field names much like terms are
processed:
lower-case the field name so that field names are not case sensitive (all
field names are lower case when indexed), and
have synonyms for field names. For example, if the user types article:abc or
content:abc, in both cases it
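One way to approach the field-name handling described above is to rewrite the query string on the client before it reaches Solr. This is only a sketch: the alias table, the canonical field name `content`, and the function name are hypothetical, and it only recognizes a field prefix at the start of a whitespace-separated token.

```python
import re

# Hypothetical alias table: both names are assumed to resolve to one
# indexed field, here called "content".
FIELD_ALIASES = {"article": "content", "content": "content"}

# Matches a field prefix such as "Article:" at the start of a token.
FIELD_PREFIX = re.compile(r"(?<!\S)([A-Za-z_][A-Za-z0-9_]*):")

def normalize_field_names(query, aliases=FIELD_ALIASES):
    """Lower-case field names and map aliases to their canonical field."""
    def replace(match):
        name = match.group(1).lower()
        return aliases.get(name, name) + ":"
    return FIELD_PREFIX.sub(replace, query)

print(normalize_field_names("Article:abc TITLE:xyz"))
```

Term text is left untouched, so only the part before the colon is lower-cased or replaced.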
How can I add a field name to a query dynamically?
Example: If the user types "in stock", replace it with quantity:[1 TO *]
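A minimal client-side sketch of this kind of rewrite, assuming the substitution happens before the query is sent to Solr. The rule table and function name are hypothetical; only the "in stock" mapping comes from the question itself.

```python
import re

# Hypothetical phrase-to-clause table; only the "in stock" rule comes
# from the question above.
PHRASE_RULES = {"in stock": "quantity:[1 TO *]"}

def expand_phrases(query, rules=PHRASE_RULES):
    """Replace known phrases in a raw user query with field clauses."""
    for phrase, clause in rules.items():
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        # A function replacement inserts the clause literally, with no
        # backslash or group-reference processing.
        query = pattern.sub(lambda _m, c=clause: c, query)
    return query

print(expand_phrases("ipod in stock"))
```

The word boundaries keep the rule from firing inside longer words, and matching is case-insensitive.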
On 11-Oct-07, at 4:34 PM, Ravish Bhagdev wrote:
Hi Mike,
Thanks for your reply :)
I am not an expert in either! But I understand that Nutch stores
contents, albeit in a separate data structure (which they call a segment, as
discussed in the thread), but what I meant was that this seems like
much more
: ..fq=country:france
:
: do these queries share cached items in the fieldcache? (in this example:
: country:france) or do they somehow live as separate entities in the cache?
: The latter would explain my fieldcache having evictions at the moment.
FieldCache can't have evictions. It's a
Hi,
Is there a difference in performance between the following 2 variations on
query syntax? The first response from Solr came from using a single fq
parameter in the URL. The second response came from using
separate fq parameters in the URL, one for each field.
<str name="fq">
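To make the two variations concrete, here is a sketch that builds both query strings. The filter clauses are borrowed from a logged query elsewhere on this page; the `q=*:*` value, the AND-joined form of the single filter, and the function names are assumptions.

```python
from urllib.parse import urlencode, parse_qs

# Filter clauses borrowed from a logged query on this page; how the
# single-fq form combined them is an assumption.
FILTERS = ["product_is_active:true", "product_status_code:complete"]

def single_fq(filters):
    """One fq parameter holding all clauses joined with AND."""
    return urlencode({"q": "*:*", "fq": " AND ".join(filters)})

def separate_fq(filters):
    """One fq parameter per clause (doseq=True repeats the key)."""
    return urlencode({"q": "*:*", "fq": filters}, doseq=True)

print(single_fq(FILTERS))
print(separate_fq(FILTERS))
```

Solr caches each fq parameter as its own filterCache entry, so the second form allows an individual filter to be reused by other queries that share it.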
Hi,
I have the following query that I've found in my production logs:
INFO: /select/
On 11-Oct-07, at 6:47 PM, BrendanD wrote:
Hi,
I have the following query that I've found in my production logs:
INFO: /select/
rows=0&start=0&f.category_id.facet.limit=-1&facet=true&facet.field=category_id&fq=product_is_active:true&fq=product_status_code:complete&fq=
Another route to getting the number of documents is to get it from
the LukeRequestHandler:
http://localhost:8983/solr/admin/luke?numTerms=0 (numTerms=0 to get
the fastest response possible)
Erik
On Oct 10, 2007, at 10:19 AM, Stefan Rinner wrote:
Hi
for some tests I need to know
Mike Klaas wrote:
On 11-Oct-07, at 6:47 PM, BrendanD wrote:
Hi,
I have the following query that I've found in my production logs:
INFO: /select/
rows=0&start=0&f.category_id.facet.limit=-1&facet=true&facet.field=category_id&fq=product_is_active:true&fq=product_status_code:complete&fq=
On 11-Oct-07, at 7:38 PM, BrendanD wrote:
Unfortunately pretty much ALL of our fields are multi-valued. A
product can
exist in multiple categories, be sold by multiple merchants, and have
multiple attributes with multiple attribute values assigned to it.
E.g. an
iPod in a special Gifts