Re: spellcheck: issues

2008-10-10 Thread Jason Rennie
; } if (freq a.freq) { return -1; } return 0; } I could see you opening a JIRA issue in Lucene against the SC to make it so that the sorting could be overridden/pluggable. A patch to do so would be even better ;-) Cheers, Grant -- Jason Rennie Head of Machine Learning
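A rough illustration of what "pluggable" sorting could look like: the ordering is passed in as a `java.util.Comparator` instead of being hard-coded in a `compareTo`. The `Suggestion` class, its fields, and both comparators are hypothetical stand-ins invented for this sketch; they are not Lucene or Solr classes, and the numbers in `main` (other than the 834 quoted for chanel elsewhere in the thread) are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SuggestionSortSketch {
    /** Hypothetical suggestion record; NOT a Lucene or Solr class. */
    static final class Suggestion {
        final String word;
        final float score; // similarity to the misspelled token, higher is closer
        final int freq;    // how often the word occurs in the index

        Suggestion(String word, float score, int freq) {
            this.word = word;
            this.score = score;
            this.freq = freq;
        }
    }

    /** Best score first; ties broken by higher frequency. */
    static final Comparator<Suggestion> SCORE_THEN_FREQ = (x, y) -> {
        int byScore = Float.compare(y.score, x.score);
        return byScore != 0 ? byScore : Integer.compare(y.freq, x.freq);
    };

    /** Alternative ordering a caller might plug in: most frequent first. */
    static final Comparator<Suggestion> FREQ_ONLY =
            (x, y) -> Integer.compare(y.freq, x.freq);

    /** Returns the suggested words ranked by whichever ordering is supplied. */
    static List<String> rank(List<Suggestion> in, Comparator<Suggestion> order) {
        List<Suggestion> copy = new ArrayList<>(in); // leave the input untouched
        copy.sort(order);
        List<String> words = new ArrayList<>();
        for (Suggestion s : copy) {
            words.add(s.word);
        }
        return words;
    }

    public static void main(String[] args) {
        List<Suggestion> in = Arrays.asList(
                new Suggestion("chant", 0.8f, 10),    // illustrative values
                new Suggestion("chanel", 0.8f, 834));
        System.out.println(rank(in, SCORE_THEN_FREQ));
        System.out.println(rank(in, FREQ_ONLY));
    }
}
```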

Re: spellcheck: issues

2008-10-08 Thread Jason Rennie
Hi Grant, Here are solr config files (attached) and java code (included below) to recreate the test case. Jason List<Pair<String, Integer>> terms = new ArrayList<Pair<String, Integer>>(); terms.add(new Pair<String, Integer>("chanel", 834)); terms.add(new Pair<String, Integer>("chant",

Re: spellcheck: issues

2008-10-08 Thread Jason Rennie
On Wed, Oct 8, 2008 at 1:24 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: Token: chane OMP: false Oct 8, 2008 1:19:56 PM org.apache.solr.core.SolrCore execute INFO: [spell] webapp=null path=/select

Re: spellcheck: issues

2008-10-08 Thread Jason Rennie
, as it is a bit more sophisticated than Levenshtein when it comes to scoring. I just tried J-W and *yes* it seems to do a much better job! I'd certainly vote for that becoming the default :) Thanks for all the help! Much appreciated. Jason -- Jason Rennie Head of Machine Learning Technologies
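For context on why plain edit distance gives little to go on here, a minimal sketch of the textbook Levenshtein distance follows. It is a generic dynamic-programming version written for this listing, not Lucene's implementation, and whatever normalization the spellchecker applies on top is not shown. The class and method names are assumptions. With the token chane from the thread, both chanel and chant come out at distance 1, so something else (such as term frequency) has to break the tie.

```java
final class LevenshteinSketch {
    /**
     * Textbook edit distance: the minimum number of single-character
     * insertions, deletions, and substitutions turning a into b.
     */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("chane", "chanel"));
        System.out.println(distance("chane", "chant"));
    }
}
```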

Re: spellcheck: issues

2008-10-08 Thread Jason Rennie
On Wed, Oct 8, 2008 at 3:31 PM, Jason Rennie [EMAIL PROTECTED] wrote: I just tried J-W and *yes* it seems to do a much better job! I'd certainly vote for that becoming the default :) Ack! I did some more testing and J-W results started to get weird (including suggesting courses for coursets

Re: spellcheck: issues

2008-10-07 Thread Jason Rennie
On Tue, Oct 7, 2008 at 11:56 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: Is there any way you can write up a small test case? This definitely sounds like a bug. I tried adding single word documents according to the top ten suggestions and frequencies for chanl. I.e. I created a fresh index,

Re: spellcheck: issues

2008-10-07 Thread Jason Rennie
to reproduce it and see what's going on. On Oct 7, 2008, at 2:18 PM, Jason Rennie wrote: On Tue, Oct 7, 2008 at 11:56 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: Is there any way you can write up a small test case? This definitely sounds like a bug. I tried adding single word

Re: Transitioning from Solr 1.2 to Solr 1.3

2008-10-06 Thread Jason Rennie
but haven't found a way to do this. Mike Tedesco -- Jason Rennie Head of Machine Learning Technologies, StyleFeeder http://www.stylefeeder.com/ Samantha's blog pictures: http://samanthalyrarennie.blogspot.com/

Re: required keyword in all a document

2008-10-06 Thread Jason Rennie
) AND (k1_en:flag^100 OR k2_en:flag^10 OR k3_en:flag) AND (k1_en:french^100 OR k2_en:french^10 OR k3_en:french) Is there a better/simpler way to do this? Thx in advance! -- klessou
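The query quoted above repeats one boosted clause per required keyword, so it can at least be generated rather than typed by hand. The sketch below does only that, using the field names and boosts from the quoted query; the class and method names are made up for illustration, it does not escape query-syntax characters, and it is not presented as the answer given in the thread.

```java
final class BoostedKeywordQuerySketch {
    /** One parenthesized clause: the keyword in each field, with the quoted boosts. */
    static String clause(String keyword) {
        // Assumes keyword contains no Lucene query-syntax characters (no escaping here).
        return "(k1_en:" + keyword + "^100 OR k2_en:" + keyword + "^10 OR k3_en:" + keyword + ")";
    }

    /** ANDs one clause per keyword, so every keyword must match somewhere. */
    static String requireAll(String... keywords) {
        StringBuilder sb = new StringBuilder();
        for (String k : keywords) {
            if (sb.length() > 0) {
                sb.append(" AND ");
            }
            sb.append(clause(k));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(requireAll("flag", "french"));
    }
}
```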

spellcheck: issues

2008-10-06 Thread Jason Rennie
I've noticed a few issues with spellcheck as I've been testing it out for use on our site... 1. Rebuild breaks requests - I'm using rebuildOnCommit ATM. If a commit is going on and files are being rebuilt in the spellcheck data dir, spellcheck requests yield bogus answers. I.e. I can

Re: spellcheck: issues

2008-10-06 Thread Jason Rennie
, Jason Rennie [EMAIL PROTECTED] wrote: I've noticed a few issues with spellcheck as I've been testing it out for use on our site... 1. Rebuild breaks requests - I'm using rebuildOnCommit ATM. If a commit is going on and files are being rebuilt in the spellcheck data dir, spellcheck

Re: using spellcheckcomponent via solrj

2008-10-03 Thread Jason Rennie
be going on here? Thanks, Jason On Wed, Sep 24, 2008 at 4:22 PM, Jason Rennie [EMAIL PROTECTED] wrote: On Wed, Sep 24, 2008 at 4:07 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: Just mimic the configuration for the spellCheckCompRH in the handler that you use for querying. Sounds even better

Re: How to tokenize/analyze docs for the spellchecker - at indexing and query time

2008-10-03 Thread Jason Rennie
Hi Martin, I'm a relative newbie to solr, have been playing with the spellcheck component and seem to have it working. I certainly can't explain what all is going on, but with any luck, I can help you get the spellchecker up-and-running. Additional replies in-lined below. On Wed, Oct 1, 2008

Re: spellcheck: buildOnOptimize?

2008-09-30 Thread Jason Rennie
On Fri, Sep 26, 2008 at 9:33 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Jason, can you please open a jira issue to add this feature? Done. https://issues.apache.org/jira/browse/SOLR-795 Jason

spellcheck: buildOnOptimize?

2008-09-25 Thread Jason Rennie
I see that there's an option to automatically rebuild the spelling index on a commit. That's a nice feature that we'll consider using, but we run commits every few thousand document updates, which would yield ~100 spelling index rebuilds a day. OTOH, we run an optimize about once/day which seems

using spellcheckcomponent via solrj

2008-09-24 Thread Jason Rennie
I've got SpellCheckComponent working on my index using queries like so: /solr/spellCheckCompRH?q=shart&spellcheck.q=shart&spellcheck=true&qt=sfdismax But, I haven't had any luck getting solrj to produce such queries. I can't find any way to change the url from /solr/select to
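To make the shape of that request explicit, here is a small sketch that assembles the same path and parameters (`q`, `spellcheck.q`, `spellcheck`, `qt`) with plain JDK classes. It deliberately does not use solrj, so it says nothing about how solrj exposes these settings; the class and method names are assumptions made for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class SpellcheckUrlSketch {
    /** Appends URL-encoded key=value pairs to a handler path, in insertion order. */
    static String build(String handlerPath, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(handlerPath);
        char sep = '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(sep)
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                sep = '&'; // every pair after the first is joined with '&'
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM; rethrow just in case.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    /** Rebuilds the request quoted in the message. */
    static String example() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "shart");
        params.put("spellcheck.q", "shart");
        params.put("spellcheck", "true");
        params.put("qt", "sfdismax");
        return build("/solr/spellCheckCompRH", params);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```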

Re: using spellcheckcomponent via solrj

2008-09-24 Thread Jason Rennie
On Wed, Sep 24, 2008 at 4:07 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: Just mimic the configuration for the spellCheckCompRH in the handler that you use for querying. Sounds even better. Let me make sure I'm reading you correctly. Is the idea to add lines like this to the requestHandler

Re: What's the bottleneck?

2008-09-12 Thread Jason Rennie
Thanks for all the replies! Mike: we're not using pf. Our qf is always status:0. The status field is 0 for all good docs (90%+) and some other integer for any docs we don't want returned. Jeyrl: federated search is definitely something we'll consider. On Fri, Sep 12, 2008 at 8:39 AM, Grant

Re: Question on how index works - runs out of disk space!

2008-09-11 Thread Jason Rennie
are easy to make via the solrj client we use. Though, for one of our indexes, we perform all of the updates offline and run an optimize before putting the index into production. Hope this helps. Cheers, Jason -- Jason Rennie Head of Machine Learning Technologies, StyleFeeder http

What's the bottleneck?

2008-09-11 Thread Jason Rennie
, is there anything we could do to easily trim-down computation time (besides removing common words from the query)? Jason

Re: What's the bottleneck?

2008-09-11 Thread Jason Rennie
On Thu, Sep 11, 2008 at 11:54 AM, Mark Miller [EMAIL PROTECTED] wrote: What kind of traffic are you getting when it takes seconds? 1 request? 12? I'd estimate concurrency around 3, though the speed doesn't change much when we run the same query on a server with zero traffic. Jason

Re: What's the bottleneck?

2008-09-11 Thread Jason Rennie
On Thu, Sep 11, 2008 at 1:29 PM, [EMAIL PROTECTED] wrote: what is your index configuration??? Not sure what you mean. We're using 1.2, though we've tested with a recent nightly and didn't see a significant change in performance... What is your average size from the returned fields ???

Re: Index partioning

2008-09-10 Thread Jason Rennie
was discussing ... there's some good info on the wiki about the various options (they each have their trade offs to consider) http://wiki.apache.org/solr/MultipleIndexes -Hoss -- Jason Rennie Head of Machine Learning Technologies, StyleFeeder http://www.stylefeeder.com/ Samantha's

Re: Question on how index works - runs out of disk space!

2008-09-10 Thread Jason Rennie
use. The write lock is disabled on dev with lock-type = single. I am not sure if this matters. -Sundar

Re: Less aggressive stemmer?

2008-08-22 Thread Jason Rennie
Kevin & Guillaume, Many thanks for the pointers. It sounds like one of these two solutions will fit our needs. Cheers, Jason On Thu, Aug 21, 2008 at 5:33 PM, Guillaume Smet [EMAIL PROTECTED] wrote: On Thu, Aug 21, 2008 at 11:23 PM, Jason Rennie [EMAIL PROTECTED] wrote: Is there an option

Re: How to boost the score higher in case user query matches entire field value than just some words within a field

2008-08-21 Thread Jason Rennie
Doc2: cordless drill battery Doc3: cordless drill charger Searching for prodname:cordless drill will hit all three documents. So how can I make Doc1 score higher than the other two? BTW, I am using solr1.2. thanks! -Simon -- Jason Rennie Head of Machine Learning Technologies, StyleFeeder

Less aggressive stemmer?

2008-08-21 Thread Jason Rennie
Is there an option to perform less aggressive stemming in solr? We're using the Porter stemmer. I see that there is an option for Snowball, but my understanding is that Snowball is a refinement of Porter rather than something radically different. I think we'd be best off with something very

Re: Administrative questions

2008-08-14 Thread Jason Rennie
On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman [EMAIL PROTECTED] wrote: Duh. I should have thought of that. I'm a big fan of djbdns so I'm quite familiar with daemontools. Thanks! :) My pleasure. Was nice to hear recently that DJB is moving toward more flexible licensing terms. For

Re: Administrative questions

2008-08-13 Thread Jason Rennie
for a production environment. A bit tricky to set, but solid once you have it in place. http://cr.yp.to/daemontools.html Jason

Re: concurrent optimize and update

2008-08-12 Thread Jason Rennie
On Mon, Aug 11, 2008 at 6:41 PM, Yonik Seeley [EMAIL PROTECTED] wrote: It's safe... the adds will block until the commit or optimize has finished. By block, do you mean that the update connection(s) will be held open? Our optimizes take many minutes to complete. I'm thinking that this could

dismax bq

2008-08-05 Thread Jason Rennie
I'd like to be able to specify query term weights/boosts, which it sounds like bq was created for. I think my understanding from the wiki is a bit rough, so I'm hoping I might be able to get some questions answered here. Any thoughts/comments are much appreciated. I initially tried simply

Re: diversity in results

2008-08-04 Thread Jason Rennie
Thanks for the pointers. Looks interesting, at least as a starting point for something more sophisticated. Cheers, Jason On Mon, Aug 4, 2008 at 4:38 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: See https://issues.apache.org/jira/browse/SOLR-236 and

Re: diversity in results

2008-08-04 Thread Jason Rennie
, but I would use the mlt handler on the first result and remove all the ones that appear in both the MLT and query response. B

pf nixes fl

2008-07-22 Thread Jason Rennie
Just tried adding a pf field to my request handler. When I did this, solr returned all document fields for each doc (no score) instead of returning the fields specified in fl. Bug? Feature? Anyone know what the reason for this behavior is? I'm using solr 1.2. Thanks, Jason

Re: pf nixes fl

2008-07-22 Thread Jason Rennie
returns all document fields, no score field. Jason On Tue, Jul 22, 2008 at 2:55 PM, Mike Klaas [EMAIL PROTECTED] wrote: On 22-Jul-08, at 11:53 AM, Jason Rennie wrote: Just tried adding a pf field to my request handler. When I did this, solr returned all document fields for each doc

Re: pf nixes fl

2008-07-22 Thread Jason Rennie
Doh! I mistakenly changed the request handler from dismax to standard. Ignore me... Jason On Tue, Jul 22, 2008 at 2:59 PM, Jason Rennie [EMAIL PROTECTED] wrote: I'm using solrj and all I did was add a pf entry to solrconfig.xml. I don't think it could be an ampersand issue... Here's

Re: Internal Server Error and waitSearcher=false for commit/optimize

2007-10-11 Thread Jason Rennie
thread, so this option would not affect operations. In case you're curious, we use solr as the search engine for www.stylefeeder.com. It has served us very well so far, handling over 3000 queries/day. Thanks, Jason -- Jason Rennie Head of Machine Learning Technologies, StyleFeeder http

Internal Server Error and waitSearcher=false for commit/optimize

2007-10-10 Thread Jason Rennie
Hello, We're using solr 1.2 and a nightly build of the solrj client code. We very occasionally see things like this: org.apache.solr.client.solrj.SolrServerException: Error executing query at org.apache.solr.client.solrj.request.QueryRequest.process( QueryRequest.java:86) at