Re: UIMA Error

2011-02-06 Thread Darx Oman
Hi,
How do I apply the AlchemyAPIAnnotator?
Will this help me with the NamedEntityExtractionAnnotator?
Thanks a lot, Tommaso, for your time.


Re: keepword file with phrases

2011-02-06 Thread lee carroll
Hi Bill,

quoting in the synonyms file did not produce the correct expansion :-(

Looking at Chris's comments now

cheers

lee

On 5 February 2011 23:38, Bill Bell billnb...@gmail.com wrote:

 OK that makes sense.

 If you double quote the synonyms file will that help for white space?

 Bill


 On 2/5/11 4:37 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

 
 : You need to switch the order. Do synonyms and expansion first, then
 : shingles..
 
 except then he would be building shingles out of all the permutations of
 words in his synonyms -- including the multi-word synonyms.  I don't
 *think* that's what he wants based on his example (but I may be wrong)
 
 : Have you tried using analysis.jsp ?
 
 he already mentioned he has, in his original mail, and that's how he can
 tell it's not working.
 
 lee: based on your followup post about seeing problems in the synonyms
 output, I suspect the problem you are having is with how the
 SynonymFilter
 parses the synonyms file -- by default it assumes it should split on
 certain characters to create multi-word synonyms -- but in your case the
 tokens you are feeding the synonym filter (the output of your shingle
 filter) really do have whitespace in them
 
 there is a tokenizerFactory option that Koji added a while back to the
 SynonymFilterFactory that lets you specify the classname of a
 TokenizerFactory to use when parsing the synonym rules -- that may be what
 you need to get your synonyms with spaces in them (so they work properly
 with your shingles)
 
 (assuming of course that i really understand your problem)
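A sketch of what that tokenizerFactory option might look like in schema.xml (the field type name and synonyms filename are placeholders, not from this thread):

```xml
<!-- schema.xml sketch; names are placeholders. tokenizerFactory makes
     the synonym-file parser use KeywordTokenizerFactory, so each rule
     is kept whole and may contain whitespace, matching the output of
     the shingle filter that runs before it. -->
<fieldType name="text_shingled" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```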
 
 
 -Hoss





Re: HTTP ERROR 400 undefined field: *

2011-02-06 Thread Erick Erickson
I *think* that there was a post a while ago saying that if you were
using trunk or the 3_x branch, one of the recent changes required
re-indexing, but don't quote me on that.
Have you tried that?

Best
Erick

On Fri, Feb 4, 2011 at 2:04 PM, Jed Glazner jglaz...@beyondoblivion.com wrote:

 Sorry for the lack of details.

 It's all clear in my head.. :)

 We checked out the head revision from the 3.x branch a few weeks ago (
 https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). We
 picked up r1058326.

 We upgraded from a previous checkout (r960098). I am using our customized
 schema.xml and the solrconfig.xml from the old revision with the new
 checkout.

 After upgrading I just copied the data folders from each core into the new
 checkout (hoping I wouldn't have to re-index the content, as this takes
 days).  Everything seems to work fine, except that now I can't get the score
 to return.

 The stack trace is attached.  I also saw this warning in the logs; not sure
 exactly what it's talking about:

 Feb 3, 2011 8:14:10 PM org.apache.solr.core.Config getLuceneVersion
 WARNING: the luceneMatchVersion is not specified, defaulting to LUCENE_24
 emulation. You should at some point declare and reindex to at least 3.0,
 because 2.4 emulation is deprecated and will be removed in 4.0. This
 parameter will be mandatory in 4.0.
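That warning is silenced by declaring the version explicitly near the top of solrconfig.xml; a sketch, where the exact constant (LUCENE_31 here) is an assumption and must match your checkout:

```xml
<!-- near the top of solrconfig.xml; the constant to use depends on
     your checkout (LUCENE_31 here is an assumption) -->
<luceneMatchVersion>LUCENE_31</luceneMatchVersion>
```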

 Here is my request handler. The actual fields here are different from what
 is in mine, but I'm a little uncomfortable publishing how our company's
 search service works to the world:

 <requestHandler name="standard" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <str name="defType">edismax</str>
     <bool name="tv">true</bool>
     <!-- standard fields to query on -->
     <str name="qf">field_a^2 field_b^2 field_c^4</str>

     <!-- automatic phrase boosting! -->
     <str name="pf">field_d^10</str>

     <!-- boost function -->
     <!--
       we'll comment this out for now because we're passing it to
       solr as a parameter. Once we finalize the exact function we
       should move it here and take it out of the query string.
     -->
     <!-- <str name="bf">log(linear(field_e,0.001,1))^10</str> -->
     <str name="tie">0.1</str>
   </lst>
   <arr name="last-components">
     <str>tvComponent</str>
   </arr>
 </requestHandler>

 Anyway, hopefully this is enough info; let me know if you need more.

 Jed.






 On 02/03/2011 10:29 PM, Chris Hostetter wrote:

 : I was working on a checkout of the 3.x branch from about 6 months ago.
 : Everything was working pretty well, but we decided that we should update
 and
 : get what was at the head.  However after upgrading, I am now getting
 this

 FWIW: please be specific.  Head of what? The 3x branch? Or trunk?  What
 revision in svn does that correspond to? (the svnversion command will
 tell you)

 : HTTP ERROR 400 undefined field: *
 :
 : If I clear the fl parameter (default is set to *, score) then it works
 fine
 : with one big problem, no score data.  If I try and set fl=score I get
 the same
 : error except it says undefined field: score?!
 :
 : This works great in the older version, what changed?  I've googled for
 about
 : an hour now and I can't seem to find anything.

 i can't reproduce this using either trunk (r1067044) or 3x (r1067045)

 all of these queries work just fine...

http://localhost:8983/solr/select/?q=*
 http://localhost:8983/solr/select/?q=solr&fl=*,score
 http://localhost:8983/solr/select/?q=solr&fl=score
http://localhost:8983/solr/select/?q=solr

 ...you'll have to provide us with a *lot* more details to help understand
 why you might be getting an error (like: what your configs look like, what
 the request looks like, what the full stack trace of your error is in the
 logs, etc...)




 -Hoss





Re: AND operator and dismax request handler

2011-02-06 Thread Erick Erickson
Try attaching debugQuery=on to your queries. The results will show
you exactly what the query is after it gets parsed and the difference
should stand out.
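As an illustration, a tiny sketch of tacking debugQuery=on onto a request (host, core path, and query values here are hypothetical):

```python
from urllib.parse import urlencode

# Hypothetical host and parameters; debugQuery=on asks Solr to return
# the parsed query and per-document score explanations in the response.
params = {
    "q": "water treatment plant",
    "qt": "dismax",
    "debugQuery": "on",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```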

About dismax: try looking at the mm (minimum 'should' match) parameter;
that might do what you're looking for. Or think about edismax if you're on
trunk or 3_x...
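A minimal sketch of the mm idea, assuming it goes in the handler's defaults block in solrconfig.xml:

```xml
<!-- mm=100% requires every query term to match, approximating a
     default AND between terms without editing the query itself -->
<str name="mm">100%</str>
```

With edismax, explicit AND/OR/NOT operators in the query string are also honored.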

Best
Erick

On Sat, Feb 5, 2011 at 9:47 AM, Bagesh Sharma mail.bag...@gmail.com wrote:


 Hi friends, please suggest how I can set the query operator to AND for
 the dismax request handler case.

 My problem is that I am searching for the string "water treatment plant"
 using the dismax request handler. The query formed is of this type:


 http://localhost:8884/solr/select/?q=water+treatment+plant&q.alt=*:*&start=0&rows=5&sort=score%20desc&qt=dismax&omitHeader=true

 My handling for dismax request handler in solrConfig.xml is -

 <requestHandler name="dismax" class="solr.DisMaxRequestHandler"
                 default="true">
   <lst name="defaults">
     <str name="facet">true</str>
     <str name="echoParams">explicit</str>
     <float name="tie">0.2</float>

     <str name="qf">
       TDR_SUBIND_SUBTDR_SHORT^3
       TDR_SUBIND_SUBTDR_DETAILS^2
       TDR_SUBIND_COMP_NAME^1.5
       TDR_SUBIND_LOC_STATE^3
       TDR_SUBIND_PROD_NAMES^2.5
       TDR_SUBIND_LOC_CITY^3
       TDR_SUBIND_LOC_ZIP^2.5
       TDR_SUBIND_NAME^1.5
       TDR_SUBIND_TENDER_NO^1
     </str>

     <str name="pf">
       TDR_SUBIND_SUBTDR_SHORT^15
       TDR_SUBIND_SUBTDR_DETAILS^10
       TDR_SUBIND_COMP_NAME^20
     </str>

     <str name="qs">1</str>
     <int name="ps">0</int>
     <str name="mm">20%</str>
   </lst>
 </requestHandler>


 In the final parsed query it is like

 +((TDR_SUBIND_PROD_NAMES:water^2.5 | TDR_SUBIND_LOC_ZIP:water^2.5 |
 TDR_SUBIND_COMP_NAME:water^1.5 | TDR_SUBIND_TENDER_NO:water |
 TDR_SUBIND_SUBTDR_SHORT:water^3.0 | TDR_SUBIND_SUBTDR_DETAILS:water^2.0 |
 TDR_SUBIND_LOC_CITY:water^3.0 | TDR_SUBIND_LOC_STATE:water^3.0 |
 TDR_SUBIND_NAME:water^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:treatment^2.5 |
 TDR_SUBIND_LOC_ZIP:treatment^2.5 | TDR_SUBIND_COMP_NAME:treatment^1.5 |
 TDR_SUBIND_TENDER_NO:treatment | TDR_SUBIND_SUBTDR_SHORT:treatment^3.0 |
 TDR_SUBIND_SUBTDR_DETAILS:treatment^2.0 | TDR_SUBIND_LOC_CITY:treatment^3.0
 | TDR_SUBIND_LOC_STATE:treatment^3.0 | TDR_SUBIND_NAME:treatment^1.5)~0.2
 (TDR_SUBIND_PROD_NAMES:plant^2.5 | TDR_SUBIND_LOC_ZIP:plant^2.5 |
 TDR_SUBIND_COMP_NAME:plant^1.5 | TDR_SUBIND_TENDER_NO:plant |
 TDR_SUBIND_SUBTDR_SHORT:plant^3.0 | TDR_SUBIND_SUBTDR_DETAILS:plant^2.0 |
 TDR_SUBIND_LOC_CITY:plant^3.0 | TDR_SUBIND_LOC_STATE:plant^3.0 |
 TDR_SUBIND_NAME:plant^1.5)~0.2) (TDR_SUBIND_SUBTDR_DETAILS:"water treatment
 plant"^10.0 | TDR_SUBIND_COMP_NAME:"water treatment plant"^20.0 |
 TDR_SUBIND_SUBTDR_SHORT:"water treatment plant"^15.0)~0.2



 Now it gives me results if any of the words from the text "water
 treatment plant" is found. I think the OR operator is working here, which
 finally combines the results.

 Now I want only those results in which the complete text
 "water treatment plant" matches.

 1. I do not want to make any change to the dismax handler in
 solrConfig.xml. If possible, then suggest any other handler to deal with
 it.

 2. Is the OR operator really being applied in the query? Basically, when I
 query like this:

 q=%2Bwater%2Btreatment%2Bplant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score
 desc,TDR_SUBIND_SUBTDR_OPEN_DATE
 asc&omitHeader=true&debugQuery=true&qt=dismax

 OR


 q=water+AND+treatment+AND+plant&q.alt=*:*&q.op=AND&start=0&rows=5&sort=score
 desc,TDR_SUBIND_SUBTDR_OPEN_DATE
 asc&omitHeader=true&debugQuery=true&qt=dismax


 Then it gives different results. Can you explain the difference
 between the above two queries?

 Please advise me on full-text search for "water treatment plant".

 Thanks for your response.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/AND-operator-and-dismax-request-handler-tp2431391p2431391.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Separating Index Reader and Writer

2011-02-06 Thread Isan Fulia
Hi all,
I have set up two indexes, one for reading (R) and the other for writing
(W). Index R refers to the same data dir as W (defined in solrconfig via
dataDir).
To make sure the R index sees the indexed documents of W, I am firing an
empty commit on R.
With this, I am getting a performance improvement compared to using the
same index for reading and writing.
Can anyone help me understand why this performance improvement is taking
place even though both indexes are pointing to the same data directory?

-- 
Thanks & Regards,
Isan Fulia.


Re: Optimize searches; business is progressing with my Solr site

2011-02-06 Thread Erick Erickson
What does debugQuery=on give you? Second, what optimizations are you doing?
What shows up in the analysis page? Does your admin page show the terms in
your copyField that you expect?

Best
Erick

On Sun, Feb 6, 2011 at 2:03 AM, Dennis Gearon gear...@sbcglobal.net wrote:

 Thanks to LOTS of information from you guys, my site is up and working.
 It's
 only an API now, I need to work on my OWN front end, LOL!

 I have my second customer. My general purpose repository API is very useful
 I'm
 finding. I will soon be in the business of optimizing the search engine
 part.


 For example: I have a copy field that has the words 'boogie woogie
 ballroom' on
 lots of records. I cannot find those records using
 'boogie/boogi/boog', or the woogie versions of those, but I can with
 ballroom.
 For my VERY first lesson in search optimization: what might be causing
 that, and where on the Solr site can I read about this?

 All the best on a Sunday, guys and gals.

  Dennis Gearon


 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better
 idea to learn from others’ mistakes, so you do not have to make them
 yourself.
 from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


 EARTH has a Right To Life,
 otherwise we all die.



Re: Separating Index Reader and Writer

2011-02-06 Thread Peter Sturge
Hi,

We use this scenario in production, where we have one write-only Solr
instance and one read-only instance pointing at the same data.
We do this so we can optimize caching etc. for each instance for
writes/reads. The main performance gain is in cache warming and
associated parameters.
For your index W, it's worth turning off cache warming altogether, so
commits aren't slowed down by warming.
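A sketch of what turning warming off on the write side might look like in solrconfig.xml (cache classes and sizes here are placeholders; the relevant part is autowarmCount="0"):

```xml
<!-- write-side solrconfig.xml sketch; classes/sizes are placeholders.
     autowarmCount="0" disables warming, so opening a new searcher
     after a commit does not replay cached entries. -->
<filterCache class="solr.FastLRUCache" size="512"
             initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512"
                  initialSize="512" autowarmCount="0"/>
```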

Peter


On Sun, Feb 6, 2011 at 3:25 PM, Isan Fulia isan.fu...@germinait.com wrote:
 Hi all,
 I have setup two indexes one for reading(R) and other for writing(W).Index R
 refers to the same data dir of W (defined in solrconfig via dataDir).
 To make sure the R index sees the indexed documents of W , i am firing an
 empty commit on R.
 With this , I am getting performance improvement as compared to using the
 same index for reading and writing .
 Can anyone help me in knowing why this performance improvement is taking
 place even though both the indexeses are pointing to the same data
 directory.

 --
 Thanks & Regards,
 Isan Fulia.



Re: Separating Index Reader and Writer

2011-02-06 Thread Isan Fulia
Hi Peter,
Can you elaborate a little on how the performance gain comes from cache
warming? I am getting a good improvement in search time.

On 6 February 2011 23:29, Peter Sturge peter.stu...@gmail.com wrote:

 Hi,

 We use this scenario in production where we have one write-only Solr
 instance and 1 read-only, pointing to the same data.
 We do this so we can optimize caching/etc. for each instance for
 write/read. The main performance gain is in cache warming and
 associated parameters.
 For your Index W, it's worth turning off cache warming altogether, so
 commits aren't slowed down by warming.

 Peter






-- 
Thanks & Regards,
Isan Fulia.


Re: Separating Index Reader and Writer

2011-02-06 Thread Em

Hi Peter,

I must jump into this discussion: from a logical point of view, what you are
saying only makes sense if both instances do not run on the same machine, or
at least not on the same drive.

When both run on the same machine and the same drive, the overall memory
used should be equal, plus I do not understand why this setup should affect
cache warming etc., since the process of rewarming should be the same.

Well, my knowledge about the internals is not very deep. But from a purely
logical point of view, to me, the same thing is happening as if I did it in
a single Solr instance. So what is the difference; what am I overlooking?

Another thing: while W is committing and writing to the index, is there any
inconsistency in R, or is there none because W is writing a new segment and
so nothing is different for R until the commit has finished?
Are there problems while optimizing an index?

How do you inform R about the finished commit?

Thank you for your explanation, it's a really interesting topic!

Regards,
Em

Peter Sturge-2 wrote:
 
 Hi,
 
 We use this scenario in production where we have one write-only Solr
 instance and 1 read-only, pointing to the same data.
 We do this so we can optimize caching/etc. for each instance for
 write/read. The main performance gain is in cache warming and
 associated parameters.
 For your Index W, it's worth turning off cache warming altogether, so
 commits aren't slowed down by warming.
 
 Peter
 
 
 
 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Separating-Index-Reader-and-Writer-tp2437666p2438730.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Optimize searches; business is progressing with my Solr site

2011-02-06 Thread Dennis Gearon
Hmmm, my default distance for geospatial was excluding the results, I
believe. I have to check to see if I was actually looking at the desired
return result for 'ballroom' alone. Maybe I wasn't.

But I saw a lot to learn when I applied the techniques you gave me. Thank you 
:-)

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.





From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Sun, February 6, 2011 8:21:15 AM
Subject: Re: Optimize searches; business is progressing with my Solr site

What does debugQuery=on give you? Second, what optimizations are you doing?
What shows up in the analysis page? Does your admin page show the terms in
your copyField that you expect?

Best
Erick

On Sun, Feb 6, 2011 at 2:03 AM, Dennis Gearon gear...@sbcglobal.net wrote:

 Thanks to LOTS of information from you guys, my site is up and working.
 It's
 only an API now, I need to work on my OWN front end, LOL!

 I have my second customer. My general purpose repository API is very useful
 I'm
 finding. I will soon be in the business of optimizing the search engine
 part.


 For example. I have a copy field that has the words, 'boogie woogie
 ballroom' on
 lots of records in the copy field. I cannot find those records using
 'boogie/boogi/boog', or the woogie versions of those, but I can with
 ballroom.
 For my VERY first lesson in optimization of search, what might be causing
 that,
 and where are the places to read on the Solr site on this?

 All the best on a Sunday, guys and gals.

  Dennis Gearon


 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better
 idea to learn from others’ mistakes, so you do not have to make them
 yourself.
 from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


 EARTH has a Right To Life,
 otherwise we all die.



Re: HTTP ERROR 400 undefined field: *

2011-02-06 Thread Otis Gospodnetic
Yup, here it is, warning about needing to reindex:

http://twitter.com/#!/lucene/status/28694113180192768

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Erick Erickson erickerick...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Sun, February 6, 2011 9:43:00 AM
 Subject: Re: HTTP ERROR 400 undefined field: *
 
 I *think* that there was a post a while ago saying that if you were
 using trunk or the 3_x branch, one of the recent changes required
 re-indexing, but don't quote me on that.
 Have you tried that?
 
 Best
 Erick
 


Re: AND operator and dismax request handler

2011-02-06 Thread Grijesh

Hi Bagesh,

I think Hossman and Erick have given you the path you can choose
to find the desired result.
Try setting mm to 0 so that dismax honors your AND, OR and NOT operators.

Thanx:
Grijesh
Lucid Imagination Inc.

On Sat, Feb 5, 2011 at 8:17 PM, Bagesh Sharma [via Lucene]
ml-node+2431391-1089615873-85...@n3.nabble.com wrote:


 




-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/AND-operator-and-dismax-request-handler-tp2431391p2441363.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance optimization of Proximity/Wildcard searches

2011-02-06 Thread Salman Akram
Only a couple of thousand documents are added daily, so the old OS cache
should still be useful since the old documents remain the same, right?

Also, can you please comment on my other thread related to Term Vectors?
Thanks!

On Sat, Feb 5, 2011 at 8:40 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:

 Yes, the OS cache mostly remains (obviously index files that are no longer
 around are going to remain in the OS cache for a while, but will be
 useless and gradually replaced by new index files).
 How long warmup takes is not relevant here; what matters is what queries
 you use to warm up the index and how much you auto-warm the caches.
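For illustration, explicit warm-up queries can be registered with a QuerySenderListener in solrconfig.xml; a sketch, where the query shown is a placeholder:

```xml
<!-- solrconfig.xml sketch; the warm-up query is a placeholder.
     QuerySenderListener replays these queries whenever a new searcher
     is opened, populating the caches before user queries arrive. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some common query</str>
      <str name="sort">score desc</str>
    </lst>
  </arr>
</listener>
```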

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message 
  From: Salman Akram salman.ak...@northbaysolutions.net
  To: solr-user@lucene.apache.org
  Sent: Sat, February 5, 2011 4:06:54 AM
  Subject: Re: Performance optimization of Proximity/Wildcard searches
 
  Correct me if I am wrong.
 
  A commit to the index flushes the SOLR caches, but of course the OS cache
  would still be useful? If an index is updated every hour, then a warm-up
  that takes less than 5 mins should be more than enough, right?
 
  On Sat, Feb 5, 2011 at 7:42 AM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com
wrote:
 
  Salman,
 
  Warming up may be useful if your caches are getting decent hit ratios.
  Plus, you are warming up the OS cache when you warm up.
 
  Otis
  ----
   
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
   Lucene ecosystem  search :: http://search-lucene.com/
  
  
  
  - Original Message -
  From: Salman Akram salman.ak...@northbaysolutions.net
  To: solr-user@lucene.apache.org
  Sent: Fri, February 4, 2011 3:33:41 PM
  Subject: Re: Performance optimization of Proximity/Wildcard searches

  I know, so we are not really using it for regular warm-ups (in any case
  the index is updated on an hourly basis). Just tried a few times to
  compare results. The issue is I am not even sure if warming up is useful
  for such regular updates.
   
   
   
  On Fri, Feb 4, 2011 at 5:16 PM, Otis Gospodnetic
  otis_gospodne...@yahoo.com wrote:

    Salman,

    I only skimmed your email, but wanted to say that this part sounds a
    little suspicious:

      Our warm up script currently executes all distinct queries in our
      logs having count > 5. It was run yesterday (with all the indexing
      update every

    It sounds like this will make warmup take a long time, assuming you
    have more than a handful of distinct queries in your logs.

    Otis
    ----
    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/
 

    - Original Message -
    From: Salman Akram salman.ak...@northbaysolutions.net
    To: solr-user@lucene.apache.org; t...@statsbiblioteket.dk
    Sent: Tue, January 25, 2011 6:32:48 AM
    Subject: Re: Performance optimization of Proximity/Wildcard searches

    By warmed index do you only mean warming the SOLR cache or the OS
    cache? As I said, our index is updated every hour, so I am not sure
    how much the SOLR cache would be helpful, but the OS cache should
    still be helpful, right?

    I haven't compared the results with a proper script, but from manual
    testing here are some of the observations.

    'Recent' queries which are in cache of course return immediately
    (only if they are exactly the same - even if they took 3-4 mins the
    first time). I will need to test how many recent queries stay in
    cache, but still this would work only for very common queries. Users
    can run different queries and I want at least those to be at an
    'acceptable' level (5-10 secs) even if not very fast.

    Our warm up script currently executes all distinct queries in our
    logs having count > 5. It was run yesterday (with all the indexing
    updates every hour after that) and today when I executed some of the
    same queries again their time seemed a little less (around 15-20%);
    I am not sure if this means anything. However, their time is still
    not acceptable.

    What do you think is the best way to compare results? First run all
    the warm up queries and then execute the same ones randomly and
    compare?

    We are using a Windows server; would it make a big difference if we
    moved to Linux? Our load is not high but some queries are really
    complex.

    Also, I was hoping to move to SSD last, after trying out all software
    options. Is that an agreed fact that on large indexes (which don't
    fit in RAM)