prefix matching

2009-04-23 Thread Tom Morton
Hi all,
  I'm trying to use prefixes to match similar strings to a query string.  I
have the following field type:

  <fieldtype name="prefix" stored="true" indexed="true"
             class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
              maxGramSize="10"/>
    </analyzer>
  </fieldtype>

field:
   <field name="wordPrefix" type="prefix" indexed="true" stored="true"/>

copyField:
   <copyField source="word" dest="wordPrefix"/>

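If I apply this to the indexed string "ipod shuffle" and the query string
"shufle" (missing an f), I get matching terms for sh, shu, and shuf:

Index Analyzer
  StandardTokenizerFactory:  ipod shuffle
  LowerCaseFilterFactory:    ipod shuffle
  StopFilterFactory:         ipod shuffle
  EdgeNGramFilterFactory:    ip ipo ipod sh shu shuf shuff shuffl shuffle

Query Analyzer
  StandardTokenizerFactory:  shufle
  LowerCaseFilterFactory:    shufle
  StopFilterFactory:         shufle
  EdgeNGramFilterFactory:    sh shu shuf shufl shufle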
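
To make the overlap concrete, here is a minimal Python sketch of the edge
n-gram step. It only approximates the analyzer chain above: a whitespace split
stands in for the standard tokenizer, the stop filter is left out, and the
function names are made up for illustration.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the leading-edge n-grams of a token, shortest first."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=2, max_gram=10):
    """Lowercase, split on whitespace, and expand each token into edge n-grams."""
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


indexed = analyze("ipod shuffle")
query = analyze("shufle")

# Grams produced on the query side that also exist on the index side.
shared = [gram for gram in query if gram in indexed]

print("indexed:", indexed)
print("query:  ", query)
print("shared: ", shared)
```

With minGramSize=2 and maxGramSize=10 the shared grams are the ones listed
above; the query-side grams that include the misspelling have no counterpart
in the index.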
However, when I query for shufle I get no results:

http://localhost:8983/solr/select?q=wordPrefix%3Ashufle&fl=wordPrefix&qt=standard&debugQuery=on

<lst name="debug">
<str name="rawquerystring">wordPrefix:shufle</str>
<str name="querystring">wordPrefix:shufle</str>
<str name="parsedquery">
PhraseQuery(wordPrefix:"sh hu uf fl le shu huf ufl fle shuf hufl ufle shufl
hufle shufle")
</str>
<str name="parsedquery_toString">
wordPrefix:"sh hu uf fl le shu huf ufl fle shuf hufl ufle shufl hufle
shufle"
</str>
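
One way to read this debug output: the parser has put every query-side gram
into a single PhraseQuery, and a phrase can only match if each of its terms is
present in the field. The rough Python check below uses the terms copied from
the analysis and debug output in this message and ignores positions entirely,
so it is a sketch of the term-presence condition only, not of Lucene's actual
phrase matching. The variable names are illustrative.

```python
# Terms the index analyzer produced for "ipod shuffle".
indexed_terms = {"ip", "ipo", "ipod", "sh", "shu", "shuf",
                 "shuff", "shuffl", "shuffle"}

# Terms listed inside the PhraseQuery in the debug output.
phrase_terms = ("sh hu uf fl le shu huf ufl fle shuf hufl ufle "
                "shufl hufle shufle").split()

# Phrase terms that never occur in the indexed field.
missing = [term for term in phrase_terms if term not in indexed_terms]

# A phrase needs all of its terms; an OR of terms needs only one.
phrase_can_match = not missing
any_term_matches = any(term in indexed_terms for term in phrase_terms)

print("missing from index:", missing)
print("phrase can match:  ", phrase_can_match)
print("any term matches:  ", any_term_matches)
```

Under this simplified view the phrase cannot match because several of its
terms are absent from the index, even though some individual terms would
match on their own.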

This post suggests that I need to set the position increment for my token
filter, but I'm not sure how to do that, or whether it's possible.

http://www.lucidimagination.com/search/document/bc643c39f0b6e423/queryparser_and_ngrams#629b39ea39aa9cd4

Thoughts?  Thanks...Tom


number of matching documents incorrect during postOptimize

2008-08-11 Thread Tom Morton
Hi all,
   I'm trying to check that an import using the dataImportHandler was clean
before I take a snapshot of the index to be pulled via snappuller to query
nodes.  One of the checks I do is to verify that a certain minimum number of
documents is returned for a query.  I do this in a script that I'm calling
via the postOptimize hook.  However, after a full import, the numFound
results from the query are not accurate until after the postOptimize code
completes, so my checks fail.

Glancing at the code, this looks non-trivial to fix, as the hook call is
pretty deep in the call stack:
org.apache.solr.handler.dataimport.DataImporter.doFullImport execute
eventually calls
org.apache.solr.update.UpdateHandler.callPostOptimizeCallbacks

One option would be to spawn and background a new job to check the status
with an initial sleep to wait for the postOptimize that spawned it to
finish.  This is pretty ugly and could lead to some race conditions but will
probably work.
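
A minimal Python sketch of that workaround, under some assumptions: the select
URL, the use of wt=json, and all function names and timings are illustrative,
not anything Solr provides. It sleeps once to let the hook finish and then
polls a few times, rather than relying on a single fixed sleep, which narrows
the race window but does not remove it.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_num_found(select_url, query):
    """Ask Solr for zero rows and return only the numFound count."""
    params = urllib.parse.urlencode({"q": query, "rows": 0, "wt": "json"})
    with urllib.request.urlopen(f"{select_url}?{params}") as response:
        return json.load(response)["response"]["numFound"]


def wait_for_min_docs(fetch, minimum, initial_delay=30.0, interval=10.0,
                      attempts=6, sleep=time.sleep):
    """Sleep first, then poll fetch() until it reports at least `minimum`."""
    sleep(initial_delay)
    for attempt in range(attempts):
        if fetch() >= minimum:
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False


# Demo with a stand-in fetcher: the count becomes correct on the third poll.
counts = iter([0, 0, 1500])
ok = wait_for_min_docs(lambda: next(counts), minimum=1000,
                       initial_delay=0, interval=0, sleep=lambda seconds: None)
print("import looks clean:", ok)
```

In real use the fetcher would be something like
lambda: fetch_num_found("http://localhost:8983/solr/select", "*:*"), and the
postOptimize script would start this check in the background and return
immediately, taking the snapshot only if the check succeeds.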

Any better recommendations on how to achieve this functionality?

Thanks...Tom


1.3 DisMax and MoreLikeThis

2008-06-04 Thread Tom Morton
Hi,
   I wanted to use the new dismax support for more like this described in
SOLR-295 https://issues.apache.org/jira/browse/SOLR-295 but can't even get
the new syntax for dismax to work (described in
SOLR-281 https://issues.apache.org/jira/browse/SOLR-281).
Any ideas if this functionality works?

Here's the relevant part of my solr config,

  <requestHandler name="/genre" class="solr.StandardRequestHandler"
                  defType="dismax">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">
        relatedExact^2 genre^0.5
      </str>
      <int name="ps">100</int>
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>

Example query:
http://localhost:13280/solr/genre?indent=on&version=2.2&q=terrence+howard&start=0&rows=10&fl=*%2Cscore&wt=standard&debugQuery=on&explainOther=&hl.fl=

Debug output: (I would expect to see dismax scoring)

<str name="Contributor8843">
11.151003 = (MATCH) sum of:
  6.925395 = (MATCH) weight(name:terrence in 63941), product of:
    0.7880709 = queryWeight(name:terrence), product of:
      10.0431795 = idf(docFreq=234, numDocs=1988249)
      0.07846827 = queryNorm
    8.787782 = (MATCH) fieldWeight(name:terrence in 63941), product of:
      1.0 = tf(termFreq(name:terrence)=1)
      10.0431795 = idf(docFreq=234, numDocs=1988249)
      0.875 = fieldNorm(field=name, doc=63941)
  4.2256074 = (MATCH) weight(name:howard in 63941), product of:
    0.6155844 = queryWeight(name:howard), product of:
      7.84501 = idf(docFreq=2116, numDocs=1988249)
      0.07846827 = queryNorm
    6.8643837 = (MATCH) fieldWeight(name:howard in 63941), product of:
      1.0 = tf(termFreq(name:howard)=1)
      7.84501 = idf(docFreq=2116, numDocs=1988249)
      0.875 = fieldNorm(field=name, doc=63941)


Here's my build info:
Solr Specification Version: 1.2.2008.06.02.15.21.48
Solr Implementation Version: 1.3-dev 662524M - tsmorton - 2008-06-02 15:21:48

Is this feature now broken or does it look like my config is wrong?

Thanks...Tom


Re: 1.3 DisMax and MoreLikeThis

2008-06-04 Thread Tom Morton
Hi,
  Thanks Yonik.  That fixed that.  It would be useful to change one of the
existing dismax query types in the default solrconfig.xml to use this new
syntax (especially since DisMaxRequestHandler is being deprecated).
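
Since defType is an ordinary parameter, it can presumably also be sent with
the request itself instead of living in the handler defaults. A small Python
sketch that builds such a URL follows; the helper name is made up, and the
host, port, and field boosts are simply the ones from this thread.

```python
from urllib.parse import urlencode


def dismax_url(base_url, user_query, **extra_params):
    """Build a select URL that asks for the dismax parser via defType."""
    params = {"q": user_query, "defType": "dismax", "debugQuery": "on"}
    params.update(extra_params)  # e.g. qf, tie, ps
    return f"{base_url}?{urlencode(params)}"


url = dismax_url("http://localhost:13280/solr/genre", "terrence howard",
                 qf="relatedExact^2 genre^0.5")
print(url)
```

Putting the parameter in the defaults section keeps client URLs short;
passing it per request is handy for trying out the parser before changing
solrconfig.xml.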

Thanks again...Tom

On Wed, Jun 4, 2008 at 11:19 AM, Yonik Seeley [EMAIL PROTECTED] wrote:

> On Wed, Jun 4, 2008 at 11:11 AM, Tom Morton [EMAIL PROTECTED] wrote:
> > I wanted to use the new dismax support for more like this described in
> > SOLR-295 https://issues.apache.org/jira/browse/SOLR-295 but can't even
> > get the new syntax for dismax to work (described in
> > SOLR-281 https://issues.apache.org/jira/browse/SOLR-281).
> > Any ideas if this functionality works?
> >
> > Here's the relevant part of my solr config,
> >
> >   <requestHandler name="/genre" class="solr.StandardRequestHandler"
> >                   defType="dismax">
>
> defType is just another parameter and should appear in the defaults
> section below.
> -Yonik
>
> >     <lst name="defaults">
> >       <str name="echoParams">explicit</str>
> >       <float name="tie">0.01</float>
> >       <str name="qf">
> >         relatedExact^2 genre^0.5
> >       </str>
> >       <int name="ps">100</int>
> >       <str name="q.alt">*:*</str>
> >     </lst>
> >   </requestHandler>
 



Boost support for MoreLikeThis fields

2008-06-04 Thread Tom Morton
Hi,
   SOLR-295 https://issues.apache.org/jira/browse/SOLR-295 mentions boost
support for MoreLikeThis and then seems to have been subsumed by
SOLR-281 https://issues.apache.org/jira/browse/SOLR-281.
To be clear, I'm talking about boosts for the mlt.fl fields and how they are
ranked rather than for the seeding query.  Has this feature gotten any
attention?

Thanks...Tom