Is it possible to apply index-time synonyms just for a section of the index

2009-06-25 Thread anuvenk

I've posted a few questions on synonyms before, finally understood how they
work, and settled on index-time synonyms. They seem to work much better than
query-time synonyms. But now at my work they have a special request: they
want certain synonyms to be applied only to certain sections of the index.
For example, we have legal FAQs, forms, etc., and we have attorneys in our
index.
Take the following synonyms, for example:
california,san diego
florida,miami
For a search on 'real estate san diego', it makes sense to return all FAQs and
forms for 'california' in the index, but it doesn't make sense to return a real
estate attorney elsewhere in California (like Burbank); attorney results
should be restricted to San Diego.
To be more clear, I want to return all California FAQs and forms for
'real estate san diego', but not all California attorneys. That
means I should index the FAQs and forms with the state/city synonym mappings
above, but not the attorneys.
I could index all the other resources (FAQs, forms) first with these
synonyms, then remove the synonyms and index the attorneys. But that wouldn't
work well in my case, because we have a scheduler that runs every night to
index any new resources from our database.
Can someone suggest a good solution for this?




-- 
View this message in context: 
http://www.nabble.com/Is-it-possible-to-apply-index-time-synonyms-just-for-a-section-of-the-index-tp24209490p24209490.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is it possible to apply index-time synonyms just for a section of the index

2009-06-25 Thread anuvenk

That's right. Simple. I can very well do that. Why didn't I think of it?
Thanks.

rswart wrote:
 
 What is stopping you from defining different field types for FAQs and
 attorneys? One with index-time synonyms and one without.
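
rswart's suggestion could be sketched in schema.xml roughly as follows. This is a minimal sketch, not a tested configuration: the type names text_syn and text_nosyn, the field names faq_text and attorney_text, and the choice of tokenizer and filters are assumptions made up for illustration. Only the synonym filter and the synonyms file come from the discussion.

```xml
<!-- Hypothetical field type WITH index-time synonyms (FAQs, forms) -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Synonyms are applied only while indexing this field type -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field type WITHOUT synonyms (attorneys) -->
<fieldType name="text_nosyn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumed field names; each resource type is indexed into its own field -->
<field name="faq_text" type="text_syn" indexed="true" stored="true"/>
<field name="attorney_text" type="text_nosyn" indexed="true" stored="true"/>
```

With a layout like this, the nightly indexer would presumably write each document's text into one of the two fields depending on its resource type, and searches would query both fields, so no synonyms need to be added or removed between indexing runs.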
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Is-it-possible-to-apply-index-time-synonyms-just-for-a-section-of-the-index-tp24209490p24210788.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: synonyms

2009-06-03 Thread anuvenk

I happened to revisit this post that I started a long time back. I'm still
using the same query-time synonyms. Now I want to map cities to
states in the synonyms, and I continue to have this issue with multi-word
synonyms. Could you please explain again what you did to overcome this
issue? I didn't quite understand what HIER_FAMILIY_01 and SYN_FAMILY_01
are. Thanks.

lorenzo zhak wrote:
 
 Hi,
 
 I had to deal with this kind of side effect with multi-word
 synonyms.
 We installed Solr on a project that uses synonyms extensively: a big
 list that could sometimes bring up a wrong match like the one
 noticed by Anuvenk.
 For instance, with
 
 dui => drunk driving defense
  or
 dui,drunk driving defense,drunk driving law
 
 a query for dui matches both the dui => drunk driving defense rule and
 the dui,drunk driving defense,drunk driving law rule.
 
 In order to prevent this kind of behavior, I gave every synonym
 family (that is, a single line in the file) a unique identifier,
 so the list looks like:
 
 dui => HIER_FAMILIY_01
 drunk driving defense => HIER_FAMILIY_01
 SYN_FAMILY_01, dui,drunk driving defense,drunk driving law
 
 I also set the synonyms filter at index time with expand=false, and at
 query time with expand=false.
 
 This way, the matched synonyms (multi-word or single-word) in
 documents are replaced with their family identifier, and not with all the
 possibilities. Indexing with expand=true would add words to documents
 that could be matched on their own, ignoring the fact that they belong to
 a multi-word expression, and this could end up as a wrong match
 (a mix of synonyms) at query time.
 
 So a query for dui will be changed by the synonym
 filter at query time to HIER_FAMILIY_01 or SYN_FAMILY_01, and
 documents that contain only single words like drunk, driving or
 law will not be matched, since only a document with the phrase drunk
 driving law would have been indexed with SYN_FAMILY_01.
 
 The approach worked pretty well on our project and we did not notice
 any side effects on searches; it only removes matched documents
 that were noise from the synonym-mix issue.
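
The analyzer setup described above might look something like the following in schema.xml. This is only a sketch under assumptions: the field type name text_family and the tokenizer and lowercase filter are illustrative choices, not taken from the message. What the message does state is that the synonym filter runs with expand=false at both index and query time. With expand=false, Solr reduces every term on a comma-separated line to the first entry on that line, which is why the family identifier is listed first.

```xml
<!-- Hypothetical field type: synonym families collapsed to one identifier -->
<fieldType name="text_family" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand=false: matched terms are replaced, not expanded -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same setting at query time, so both sides map to the same identifier -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```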
 
 I think it could be useful to add this kind of approach to the Solr
 synonyms filter section of the wiki,
 
 Cheers
 
 Laurent
 
 
 On Dec 2, 2007 3:41 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:
 Hi (changing to solr-user list)

 Yes it is, especially if the terms left of => are multi-word.  Check
 out the Wiki, one page there explains this nicely.

 Otis
 -
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

 - Original Message 
 From: anuvenk anuvenkat...@hotmail.com
 To: solr-...@lucene.apache.org
 Sent: Saturday, December 1, 2007 1:21:49 AM
 Subject: Re: synonyms


 Ideally, would it be a good idea to pass the index data through the
 synonyms filter while indexing?
 Also, say I have this mapping
 dui => drunk driving defense
  or
 dui,drunk driving defense,drunk driving law

 Will matches for dui also bring up matches for drunk driving law (the
 whole phrase), or will it also bring up all matches for 'drunk',
 'driving', 'law'?



 Yonik Seeley wrote:
 
  On Nov 30, 2007 5:39 PM, anuvenk anuvenkat...@hotmail.com wrote:
  Should data be re-indexed every time synonyms like
  word1,word2
  or
  word1 => word2
 
  are added to synonyms.txt?
 
  Yes, if it changes the index (if it's used in the index analyzer as
  opposed to just the query analyzer).
 
  -Yonik
 
 

 --
 View this message in context:
  http://www.nabble.com/synonyms-tf4925232.html#a14100346
 Sent from the Solr - Dev mailing list archive at Nabble.com.





 
 

-- 
View this message in context: 
http://www.nabble.com/Re%3A-synonyms-tp14116132p23860862.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is there Downside to a huge synonyms file?

2009-06-03 Thread anuvenk

I tried adding some city-to-state mappings in the synonyms file. I'm using
the dismax handler for phrase matching. As I add more and more city-to-state
mappings, I end up with zero results for state-based searches.
Eg:
 ca,california,los angeles
 ca,california,san diego
 ca,california,san francisco
 ca,california,burbank
and so on.
Now a city-based search returns a few other California results, but a
state-based search like dui california returns zero results.
I checked the parsedquery_toString and I see no 'OR', although the default
operator is 'OR' in the schema. It looks like it's trying to find matches for
all those cities, as they are mapped to 'california', and hence returns zero
results. How do I force dismax to use 'OR' and not 'AND', even though the
schema has 'OR'?
Or is this how dismax works? Can someone explain how to overcome this
problem?
Here is my custom request handler that extends dismax:

<requestHandler name="qfacet" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">name^2.0 text^0.8</str>
    <!-- until 3 all should match; 4 - 3 should match; 5 - 4 should match;
         6 - 5 should match; above 6 - 90% match -->
    <str name="mm">3&lt;-1 4&lt;-1 5&lt;-1 6&lt;90%</str>
    <str name="pf">
      text^0.8 name^2.0
    </str>
    <int name="qs">4</int>
    <int name="ps">4</int>
    <str name="fl">
      *,score
    </str>
  </lst>
  <lst name="invariants">
    <!--<str name="facet.field">resourceType</str>
    <str name="facet.field">category</str>
    <str name="facet.field">stateName</str>-->
    <str name="facet.sort">false</str>
    <int name="facet.mincount">1</int>
  </lst>
</requestHandler>

Thanks.



Otis Gospodnetic wrote:
 
 
 Hello,
 
 300K is a pretty small index.  I wouldn't worry about the number of
 synonyms unless you are turning a single term into dozens of ORed terms.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Is-there-Downside-to-a-huge-synonyms-file--tp23842527p23861631.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is there Downside to a huge synonyms file?

2009-06-03 Thread anuvenk

A small addition to my earlier post. I wonder if it's because of the 'mm'
param, which requires that for search phrases of up to 3 words, all the words
must match. If I alter this now, I'd get irrelevant results for a
lot of popular 1-, 2- and 3-word search terms. How can I solve this?


Re: Dismax handler phrase matching question

2009-06-02 Thread anuvenk

I have to search over multiple fields, so passing everything in 'q' might
not be neat. Can something be done with facet.query to accomplish this?
I'm already using the facet parameters. I'm not familiar with Java, so I'm
not sure whether a function query could be used to accomplish this. Any
other thoughts?


Shalin Shekhar Mangar wrote:
 
 On Tue, Jun 2, 2009 at 12:53 AM, anuvenk anuvenkat...@hotmail.com wrote:
 

 title  state

 dui faq1   california
 dui faq2   florida
 dui faq3   federal

 Now I want to be able to return federal results irrespective of the
 state.
 For example dui california should return all federal results for 'dui'
 also
 along with california results.

 
 Perhaps you just need to create your query in such a way that both match?
 
 q=title:(dui california) state:(dui california) state:federal
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/Dismax-handler-phrase-matching-question-tp23820340p23840154.html
Sent from the Solr - User mailing list archive at Nabble.com.



Is there Downside to a huge synonyms file?

2009-06-02 Thread anuvenk

In my index I have legal FAQs, forms, legal videos, etc., with a state field
for each resource.
Now if I search for real estate san diego, I want to be able to return other
'california' results, i.e. results from San Francisco.
I have the following fields in the index:

title                            state       description...
real estate san diego example 1  california  some description
real estate carlsbad example 2   california  some desc

So when I search for real estate san francisco, since there is no match, I
want to return the other real estate results in California
instead of returning none, because sometimes they might be searching for a
real estate form and the city probably doesn't matter.

I have two things in mind. One is adding a synonym mapping
san diego, california
carlsbad, california
san francisco, california

(which probably isn't the best way)
hoping that a search for san francisco real estate would map san francisco to
california and hence return the other two California results

OR

adding the mapping of city to state in the index itself, like:

title                       state       city                                description...
real estate san diego eg 1  california  carlsbad, san francisco, san diego  some description
real estate carlsbad eg 2   california  carlsbad, san francisco, san diego  some description

Which of the above two is better? Does a huge synonym file affect
performance? Or is there an even better way? I'm sure there is, but I can't
put my finger on it yet. I'm not familiar with Java either.
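
For the second option, the index layout could be sketched in schema.xml along these lines. This is a rough sketch only: the field types "text" and "string" are assumed to be defined elsewhere in the schema (as they are in the stock example schema), and the choice of which fields are tokenized is an assumption, not something stated in the post.

```xml
<!-- Assumed field definitions for the title/state/city/description layout -->
<field name="title"       type="text"   indexed="true" stored="true"/>
<field name="state"       type="string" indexed="true" stored="true"/>
<!-- multiValued so each document can list several cities as separate values -->
<field name="city"        type="text"   indexed="true" stored="true" multiValued="true"/>
<field name="description" type="text"   indexed="true" stored="true"/>
```

Under this layout, a search for a city would match every document whose city field lists it, at the cost of re-indexing documents whenever a state's city list changes.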

-- 
View this message in context: 
http://www.nabble.com/Is-there-Downside-to-a-huge-synonyms-file--tp23842527p23842527.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is there Downside to a huge synonyms file?

2009-06-02 Thread anuvenk

I'm using query-time synonyms. I have more fields in my index, though; this is
just a sample of the data from my index. Yes, we don't have millions
of documents. It could be around 300,000 and might increase in future. The
reason I'm using query-time synonyms is the nature of my data: I
can't re-index the data every time I add or remove a synonym. But for this
particular requirement, is it best to have index-time synonyms, given the
multi-word nature of the synonyms? Again, if I add more cities to the synonym
file, I can't be re-indexing all the data over and over again.




-- 
View this message in context: 
http://www.nabble.com/Is-there-Downside-to-a-huge-synonyms-file--tp23842527p23844761.html
Sent from the Solr - User mailing list archive at Nabble.com.



Dismax handler phrase matching question

2009-06-01 Thread anuvenk

Hello,

   I'm using the dismax handler for phrase matching. I have a few legal
resources in my index in the following format, for example:

title      state

dui faq1   california
dui faq2   florida
dui faq3   federal

Now I want to be able to return federal results irrespective of the state.
For example, dui california should return all federal results for 'dui'
along with the California results. I was thinking of a synonym mapping from
each state name to 'federal'
(i.e. california,federal
florida,federal
maine,federal
etc.
)
Is there a better way, though?
-- 
View this message in context: 
http://www.nabble.com/Dismax-handler-phrase-matching-question-tp23820340p23820340.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question about Query Phrase Slop (qs) in dismax

2008-11-23 Thread anuvenk

Somebody please help clear this doubt. What more could I do with the dismax
handler so that docs that don't have 'word1', 'word2', 'word3', etc. from
a search phrase within 5 words of one another do not come up in the
results?



-- 
View this message in context: 
http://www.nabble.com/Question-about-Query-Phrase-Slop-%28qs%29-in-dismax-tp20643003p20648109.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Please Help !! Question about Query Phrase Slop (qs) in dismax

2008-11-23 Thread anuvenk

Thanks for the response. My current ps setting works great for most
search terms. But take this typical example: north dakota 1031 exchange
lawyers. We don't have any relevant docs in the index, yet Solr returns
an irrelevant doc just because it found 'lawyer', 'exchange', 'north' and
'dakota' somewhere. I thought it would be great if there were a way to not
return any results when the words are not within close proximity.

Yonik Seeley wrote:
 
 On Sun, Nov 23, 2008 at 11:51 PM, anuvenk [EMAIL PROTECTED]
 wrote:
 Please help, someone... I've been waiting for an answer for the last couple
 of days and no one seems to be helping out here. I searched the wiki and
 this forum but couldn't find an answer. I know that if ps is set to 5,
 words within 5 words of one another receive a boost in score. But is
 there a way to not return results that have the search terms more than 5
 words apart?
 
 Not with dismax.  I'm not sure why it's a problem, given that with
 enough boost you should be able to ensure that all of the results with
 a slop less than 5 appear before other results.
 Anyway, if you want to restrict results to those with a slop of 5, use
 the standard query parser with an explicit sloppy phrase query:
 
 "north dakota 1031 exchange lawyers"~5
 
 -Yonik
 
 
 Typical example: north dakota 1031 exchange lawyers
 My first result is absolutely irrelevant. It returned a North Dakota doc,
 but one that had an occurrence of 'attorney' somewhere and an occurrence of
 'exchange' (not related to 1031 exchange, though). They were not within 5
 words of one another. My guys have been hammering me regarding this
 relevancy issue.
 Please help, someone.



 
 

-- 
View this message in context: 
http://www.nabble.com/Please-Help-%21%21-Question-about-Query-Phrase-Slop-%28qs%29-in-dismax-tp20643003p20655014.html
Sent from the Solr - User mailing list archive at Nabble.com.



Question about Query Phrase Slop (qs) in dismax

2008-11-22 Thread anuvenk

From the Solr wiki, it sounded like if qs is set to 5, for example, and the
search term is 'child custody', only docs with 'child' and 'custody' within 5
words of one another would be returned in the results. Is this correct? If so,
it doesn't seem to be working for me. I see docs with 'child' and 'custody'
more than 5 words apart (excluding stop words), which results
in a bad user experience, as those docs are not very relevant. What more could
I do to improve the quality of the results?
-- 
View this message in context: 
http://www.nabble.com/Question-about-Query-Phrase-Slop-%28qs%29-in-dismax-tp20643003p20643003.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question about dismax 'mm' - give boost to searches by location

2008-11-21 Thread anuvenk

Since I didn't receive any response, I think my question wasn't very clear.
If the phrase has 4 words (last will and testament florida; 'and' will be
removed by the stop word filter), right now Solr matches docs with at least 3
out of those 4 words. So what's happening is that last will and testament docs
from all states are returned, although the user specifically asked for a
Florida will. I don't want to alter 'mm' either, because it's working fine for
other searches. Just for search terms with a 'location', I want to
match all words. Is there an easy way to do this? Someone, please?


-- 
View this message in context: 
http://www.nabble.com/Question-about-dismax-%27mm%27---give-boost-to-searches-by-location-tp20606730p20628404.html
Sent from the Solr - User mailing list archive at Nabble.com.



Question about dismax 'mm' - give boost to searches by location

2008-11-20 Thread anuvenk

I use the dismax handler for my phrase matching, and I have 'mm' set
this way:
Up to 3 words, match all
up to 4, match 3, and so on
It's been working fine, but for certain phrases like 'san diego drunk driving
defense attorney', it brings up DUI attorneys for other cities first,
because the way I've set 'mm', it's trying to match docs with any 4 words
from the search phrase. In such cases, how do I make Solr return the San
Diego listing first? I don't want to make phrase matching stricter either
(i.e. I don't want to change the current 'mm' configuration).
Is there any way to solve this?
-- 
View this message in context: 
http://www.nabble.com/Question-about-dismax-%27mm%27---give-boost-to-searches-by-location-tp20606730p20606730.html
Sent from the Solr - User mailing list archive at Nabble.com.



solr sorting question

2008-05-27 Thread anuvenk

A question about sorting with Solr. I want to group results in a certain sort
order so I can split them and display them in tabs easily.
I want to have a custom sort order instead of sort=cat asc, score
desc.
With the sort above, categories are grouped in ascending order. But I
want certain categories to come up first in the sort order; I don't want
them grouped in ascending order. Could anyone please shed some light on how to
do it? Is it possible?
-- 
View this message in context: 
http://www.nabble.com/solr-sorting-question-tp17498596p17498596.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheckhandler

2008-01-26 Thread anuvenk

Thanks a lot for clearing my doubts. Would you know if the Solr wiki is up to
date with the documentation for the new features that are being added? I
rely totally on the Solr wiki documentation for my project. If you can,
please send me the files you mentioned and I'll be happy to test them. I
appreciate your help!

scott.tabar wrote:
 
 Anuvenk,
 
 Sorry for this Third email, but I was reading your question below and I
 think it warrants yet another reply.
 
 Just some background from my focus and involvement, and hence the
 generation of the JavaDocs.  I was primarily interested in having a Solr
 based spell checker that behaved more like a traditional spell checker. 
 In my application, when I generated the input into Solr for inclusion in
 the spell checker indexer, I was only interested in single words and not
 multi-word sets.  My intention was to send multiple words to the handler
 and have it return details on each word as it stands independently when
 the parameter multiWords was set, otherwise it was to use all input words
 as a single check against the handler.  As such, in my original efforts, I
 had no multiple words in a single term, as you were asking below.  That is
 not to say it is not possible, but I just wanted to let you know the
 original focus of my work.
 
 I did look a little closer at the JavaDocs and it looks like they have
 been updated from what I originally generated.  So perhaps they may be up
 to date?
 
 One thing I would like to point out is that I put some effort into
 creating a test case for the SpellCheckerRequestHandler.  If it still
 exists (I have not checked the head for a long time) then it would be a
 good starting point to do some simple testing with limited data sets of
 your own.  Just make a copy of it, and then feed in multi-word terms and
 see how it responds to the different settings.  This will also allow you
 to play around with the configuration settings in the schema and
 solrconfig files without impacting your actual Solr instance and the turn
 around time could be in the seconds and not minutes with each alteration
 of a new test.  
 
 The locations in svn and file names of the unit tests that I created were:
   /test/test-files/solr/conf/schema-spellchecker.xml
   /test/test-files/solr/conf/solrconfig-spellchecker.xml
   /test/org/apache/solr/handler/SpellCheckerRequestHandlerTest.java
 
 If these do not exist in svn currently, let me know and I can pass
 along the contents and you can recreate them locally to test with.
 
   Best of luck,
 Scott Tabar
 
  anuvenk [EMAIL PROTECTED] wrote: 
 
 Thanks. But I'm looking at this example
 http://.../spellchecker?indent=on&onlyMorePopular=true&accuracy=.6&suggestionCount=20&q=facial+salophosphoprotein
 on
 http://lucene.apache.org/solr/api/org/apache/solr/handler/SpellCheckerRequestHandler.html
 It seems to return results (well, in the example) both with and without
 extendedResults=true. Does that mean 'facial salophosphoprotein' was a
 single term in the index?
 
 
 hossman wrote:
 
 : 
 : I did try with the latest nightly build and followed the steps outlined in
 : http://wiki.apache.org/solr/SpellCheckerRequestHandler
 : with regards to creating new catchall field 'spell' of type 'spell' and
 : copied my text fields to 'spell' at index time.
 : Still q=grapics returns 'graphics'
 : but q=grapics card returns nothing.
 : But the same queries return the correct spelling with string fieldtypes.
 : Any fix available?
 
 I don't think Otis was suggesting any specific fix was available in the
 nightly builds, i believe he was just addressing specifically that if there
 was a bug someone committed a fix for you didn't need to wait for 1.3 --
 you can test it now using the nightly builds.
 
 That said: I don't see any currently open or recently resolved bugs
 related to spellchecking and multiple words ... i believe (but i'm not
 100% positive) that multi word spell correction will work, as long as
 your dictionary contains those multiple words as individual terms
 
 ie: if you want "graphics card" to be a suggestion for "grapics card" then
 you need to use a termSourceField in which "graphics card" is a single
 term (either because it is untokenized, or maybe because you use a
 word-based ngram tokenfilter, etc...)
 
 alternately, if you want to get "graphics asdfghjk" as a suggestion for
 "grapics asdfghjk" (even though asdfghjk isn't in your index at all),
 hitting the spellcorrection handler for each input word individually is
 probably your best bet.
 
 
 :  You don't need to wait for 1.3 to be released - you can simply use a
 :  recent nightly build.
 
 
 -Hoss
 
 
 
 
 -- 
 View this message in context:
 http://www.nabble.com/spellcheckhandler-tp14627712p15100704.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/spellcheckhandler-tp14627712p15115105.html
Sent from the Solr - User mailing list archive

Re: spellcheckhandler

2008-01-25 Thread anuvenk

Thanks. But I'm looking at this example
http://.../spellchecker?indent=on&onlyMorePopular=true&accuracy=.6&suggestionCount=20&q=facial+salophosphoprotein
on
http://lucene.apache.org/solr/api/org/apache/solr/handler/SpellCheckerRequestHandler.html
It seems to return results (well, in the example) both with and without
extendedResults=true. Does that mean 'facial salophosphoprotein' was a
single term in the index?


hossman wrote:
 
 : 
 : I did try with the latest nightly build and followed the steps outlined
 in
 : http://wiki.apache.org/solr/SpellCheckerRequestHandler
 : with regards to creating new catchall field 'spell' of type 'spell' and
 : copied my text fields to 'spell' at index time.
 : Still q=grapics returns 'graphics'
 : but q=grapics card returns nothing.
 : But the same queries return the correct spelling with string fieldtypes.
 : Any fix available? 
 
 I don't think Otis was suggesting any specific fix was available in the
 nightly builds, i believe he was just addressing specifically that if there
 was a bug someone committed a fix for you didn't need to wait for 1.3 --
 you can test it now using the nightly builds.
 
 That said: I don't see any currently open or recently resolved bugs
 related to spellchecking and multiple words ... i believe (but i'm not
 100% positive) that multi word spell correction will work, as long as
 your dictionary contains those multiple words as individual terms
 
 ie: if you want "graphics card" to be a suggestion for "grapics card" then
 you need to use a termSourceField in which "graphics card" is a single
 term (either because it is untokenized, or maybe because you use a
 word-based ngram tokenfilter, etc...)
 
 alternately, if you want to get "graphics asdfghjk" as a suggestion for
 "grapics asdfghjk" (even though asdfghjk isn't in your index at all),
 hitting the spellcorrection handler for each input word individually is
 probably your best bet.
 
 
 :  You don't need to wait for 1.3 to be released - you can simply use a
 :  recent nightly build.
 
 
 -Hoss
 
 
 

-- 
View this message in context: 
http://www.nabble.com/spellcheckhandler-tp14627712p15100704.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Spell Check Handler

2008-01-25 Thread anuvenk

I followed your instructions exactly, but I still have trouble with
multi-word queries.
For example, q=grapics returns 'graphics'
but q=grapics card returns nothing.
I even tried the latest nightly build, but that didn't solve the problem. Is
any solution available?
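
For reference, the workaround suggested in the spellcheckhandler thread
(hitting the handler for each input word individually) could be sketched
roughly as below. This is only an illustration in Python, not Solr code:
correct_query, fake_suggest and the FAKE_SUGGESTIONS table are made-up names,
and fake_suggest stands in for a real single-word request to the spellchecker
handler.

```python
from typing import Callable, Optional

def correct_query(query: str, suggest: Callable[[str], Optional[str]]) -> str:
    """Spell-check a multi-word query one word at a time.

    `suggest` is any function that returns the best suggestion for a single
    word, or None when it has nothing to offer.
    """
    corrected = []
    for word in query.split():
        best = suggest(word)
        # Keep the original word when there is no suggestion for it.
        corrected.append(best if best else word)
    return " ".join(corrected)

# Hypothetical stand-in for one spellchecker request per word.
FAKE_SUGGESTIONS = {
    "grapics": "graphics",
    "chater": "chapter",
    "bakrupcy": "bankruptcy",
}

def fake_suggest(word: str) -> Optional[str]:
    return FAKE_SUGGESTIONS.get(word.lower())

print(correct_query("grapics card", fake_suggest))
```

Words the dictionary does not know (like 'card' or '13' here) are passed
through unchanged, which is the behaviour described for "grapics asdfghjk".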

scott.tabar wrote:
 
 Matthew,
 
 Thanks for the question.  The answer is that they come from your own
 indexes so the dictionary is based upon the actual words that are already
 stored in Solr.  This makes sense; if the spell checker is suggesting a
 word that is not in the Solr index, then it will not help the user find
 what they are looking for.
 
 You can control which fields in Solr can feed the spell checker.  Also you
 can have more than one spell checker that is focused on a specific
 subjects.
 
 The following example of a SpellCheckerRequestHandler is based upon the
 one I created for the test case.  You need to add this to your
 solrconfig.xml file.  You can view the whole thing within the Solr source
 code once it is committed into the main stream.  The path is:
 /src/test/test-files/solr/conf/solrconfig-spellchecker.xml and
 schema-spellchecker.xml in the same directory.
 
   <!-- SpellCheckerRequestHandler takes in a word (or several words) as the
        value of the q parameter and returns a list of alternative spelling
        suggestions.  If invoked with a ...cmd=rebuild, it will rebuild the
        spellchecker index.
   -->
   <requestHandler name="spellchecker"
                   class="solr.SpellCheckerRequestHandler" startup="lazy">
     <!-- default values for query parameters -->
     <lst name="defaults">
       <int name="suggestionCount">20</int>
       <float name="accuracy">0.60</float>
     </lst>

     <!-- Main init params for handler -->

     <!-- The directory where your SpellChecker Index should live.   -->
     <!-- May be absolute, or relative to the Solr dataDir directory. -->
     <!-- If this option is not specified, a RAM directory will be used -->
     <str name="spellcheckerIndexDir">spell</str>

     <!-- the field in your schema that you want to be able to build -->
     <!-- your spell index on. This should be a field that uses a very -->
     <!-- simple FieldType without a lot of Analysis (ie: string) -->
     <str name="termSourceField">spell</str>

   </requestHandler>
 
 Some comments:
   - The termSourceField should be a field you have defined within your
 solr schema file.  See notes below about the use of this field.
   - The spellcheckerIndexDir is the name of the directory that contains
 the spellchecker indexes.  In my example, I used spell, and it will be at
 the same level as data and conf.  You can name it whatever you would like
 to.
   - if you use the name of /spellchecker the url will be more RESTful
   - if you need to have more than one spell checker in use at a time, then
 you will need to change the name, spellcheckerIndexDir, and
 termSourceField
   - If you have more than one spell checker hitting the same index
 directory, then when you rebuild the index through one of the handlers the
 other handlers will not know it has been reindexed.  To resolve this
 issue, you may have to restart Solr.  
 
 
 The following components are from the schema-spellchecker.xml file:
 
   <fieldType name="spellText" class="solr.TextField"
              positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.StandardFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
               ignoreCase="true" expand="true"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.StandardFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>
 
 
   <field name="spell" type="spellText" indexed="true" stored="true" />
 
 
 
 Some comments on Schema items above:
   - The fieldType must be contained within the types
   - The spellText content can be named whatever you want
   - The spellText fieldType should not be too aggressive on stemming or
 modifying the contents of the field
   - Could use string instead of the defined fieldType of spellText, but it
 does not have to be that restrictive
 
   - The field spellText needs to be within the fields group with your
 other defined fields
   - You could always use the copyField to copy another field's
 content into your spell field:
   <copyField source="misc" dest="spell"/>
 
 
 Some notes on the name of the handler:
   - If you precede the name with / you can use the following url instead
 of the second one:
   - using the name of /spellchecker
  http://yourSolrSite/solr/spellchecker?q=sialophosphoprotein 
   - using the name of 

Re: Is it possible to add synonyms run time?

2008-01-25 Thread anuvenk

Here is what it means by injecting at query time:

This is the text field definition i have in my schema

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory" />
    <!-- <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

and a catchall field of type 'text'
<field name="text" type="text" indexed="true" stored="false"
       multiValued="true"/>

and I use copyField to copy the important fields that need to be searched
during phrase matching into the 'text' field at index time.

The default search field in my case is this catchall field:
<defaultSearchField>text</defaultSearchField>


You can see that I've commented out the synonym filter at index time but
included it at query time.

Just add your synonyms to the synonyms.txt file and they'll be taken into
account at query time. You can check that in the parsedquery_toString
output using your admin tool (localhost:8983/solr/admin/form.jsp)

As has been discussed here, injecting at index time helps find more
matches, except that every time you add synonyms you'll have to re-index
your data. That wasn't ideal in my case as my index is huge, so I had to do
it only at query time, but that sometimes poses problems.

I have a couple of unanswered questions; if you know the answers, please
help me.

http://www.nabble.com/Index-time-synonyms-td15073889.html
http://www.nabble.com/solr-synonyms-behaviour-td15051211.html

Ravish Bhagdev wrote:
 
 I see, thanks a lot for this, makes things clear now.
 
 So just to make sure I understand this bit, by injecting synonyms at
 query time you mean basically adding terms implicitly to keywords
 behind the scenes before passing it to solr?  Or is there are more
 conventional method or interface that is being suggested?
 
 Thanks for all the help!
 
 Ravish
 
 On Jan 25, 2008 3:59 PM, Erick Erickson [EMAIL PROTECTED] wrote:
 To me, it's really a question of where the work should be done given your
 problem space. Injecting synonyms at index time allows the queries to be
 simpler/faster. Injecting the synonyms at query time gets complex but is
 more flexible.

 As always, it's a time/space tradeoff. If you're willing to pay the space
 penalty for increased query speed, inject at index time. Otherwise
 you can inject at query time.

 And the query-time injection performance hit may not be trivial.
 Consider,
 for instance, span queries. Do you want to pay the price at query time
 for,
 say a BooleanQuery that is composed of 5 SpanQueries where each
 term in each SpanQuery consists of several OR clauses because of
 synonym injection? Perhaps you do and perhaps you don't. It all depends
 upon what your data looks like and what your performance criteria are.

 And you can do other tricks. Consider rather than indexing all the terms,
 only index the canonical term. That is, consider hit and the synonyms
 strike, popular, punch. you could index hit for any of the 4
 terms,
 then do the same substitution for your query. Which would make your
 index smaller *and* your queries faster.
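
The canonical-term trick described above can be illustrated with a toy
inverted index. This is a minimal sketch in Python, not how Solr implements
it: CANONICAL, canonicalize, build_index and search are hypothetical names,
and the sample documents are invented.

```python
from typing import Dict, List, Set

# Every synonym maps to one canonical term (taken from the example above).
CANONICAL: Dict[str, str] = {
    "hit": "hit",
    "strike": "hit",
    "popular": "hit",
    "punch": "hit",
}

def canonicalize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and replace synonyms by the canonical term."""
    return [CANONICAL.get(tok, tok) for tok in text.lower().split()]

def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Index only canonical terms, so the index holds one entry per synonym group."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for tok in canonicalize(text):
            index.setdefault(tok, set()).add(doc_id)
    return index

def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Apply the same substitution to the query, then AND the term postings."""
    result = None
    for tok in canonicalize(query):
        ids = index.get(tok, set())
        result = set(ids) if result is None else result & ids
    return result or set()

docs = {1: "a punch line", 2: "a popular song", 3: "nothing here"}
index = build_index(docs)
print(search(index, "strike"))
```

Because the same substitution runs at index time and at query time, a query
for any word in the group matches documents containing any other word in the
group, without expanding the query into several OR clauses.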

 But you're right. Injecting synonyms at index time really requires a
 fixed
 synonym list that doesn't vary by user. So if you want synonym
 lists on a per-user basis, you're probably going to have to inject
 synonyms
 at query time.

 Best
 Erick


 On Jan 25, 2008 9:46 AM, Ravish Bhagdev [EMAIL PROTECTED] wrote:

  Yes, I'm fairly new as well.
 
  So do you mean adding words to the query effectively doing an or
  between synonymous terms?  That sounds simple way of doing it, if this
  works, what makes indexing with synonyms useful?
 
  Ravish
 
  On Jan 25, 2008 2:42 PM, Jon Lehto [EMAIL PROTECTED] wrote:
   Hi Ravish,
  
   You may want to think about the synonym dictionary as being a tool on
  the 

Index time synonyms

2008-01-24 Thread anuvenk

I have a hard time understanding the synonym behaviour, especially because I
don't have the synonym filter at index time.

If I have this synonym at index time

Alternative Sentence,Probation before Judgement,Pretrial Diversion

do all occurrences of 'alternative sentence' also get indexed as 'probation
judgement' and 'pretrial diversion'?
Or does it do this weird grouping
(alternative probation pretrial)(sentence diversion)judgement

so that all occurrences of 'alternative' will be indexed as 'sentence' and
'diversion'? Then what about the word 'judgement'?
Could someone please help me understand this? I have another question related
to synonyms posted here
http://www.nabble.com/solr-synonyms-behaviour-td15051211.html
Please help with that too.
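
To make the second possibility concrete, here is a simplified model of how
stacking the synonym phrases by token position would produce that grouping.
It is a sketch in Python of the idea only, not Solr's actual filter code; the
overlay function is a made-up name, and treating 'before' as a stop word that
is removed after the stacking is an assumption.

```python
from typing import List, Set

def overlay(phrases: List[str], stopwords: Set[str]) -> List[List[str]]:
    """Stack the tokens of each phrase by position, then drop stop words.

    Position 0 collects the first word of every phrase, position 1 the
    second word, and so on (a simplified model of positional overlay).
    """
    slots: List[List[str]] = []
    for phrase in phrases:
        for pos, tok in enumerate(phrase.lower().split()):
            while len(slots) <= pos:
                slots.append([])
            slots[pos].append(tok)
    # Assumption: the stop filter runs after the synonym filter, so a
    # removed stop word leaves its position behind.
    return [[t for t in slot if t not in stopwords] for slot in slots]

groups = overlay(
    ["Alternative Sentence", "Probation before Judgement", "Pretrial Diversion"],
    stopwords={"before"},
)
print(groups)
```

Under this model 'judgement' ends up alone in the third position because it
is the only third word among the three phrases.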


-- 
View this message in context: 
http://www.nabble.com/Index-time-synonyms-tp15073889p15073889.html
Sent from the Solr - User mailing list archive at Nabble.com.



solr synonyms behaviour

2008-01-23 Thread anuvenk

I need to understand this synonym behaviour

I have this synonym
divorce mediation,alternative dispute resolution

so when i do a debug this is the parsedquery_tostring i see:
(((text:divorc^0.8 | name:divorc^2.0)~0.01 (text:mediat^0.8 |
name:mediat^2.0)~0.01)~2) (text:(divorc altern) (disput mediat)
resolut~5^0.8 | name:(divorc altern) (disput mediat) resolut~5^2.0)~0.01

I understand how it's grouping the synonyms like this:
(divorc altern) (disput mediat) resolut

What I don't understand is how it does the matching.

Does it mean it will find all matches with either of the words (divorc
altern), either of the words (disput mediat), and/or resolut?

I have the synonym filter only at query time because I can't re-index data (or
a portion of the data) every time I add a synonym, plus a couple of other
reasons.

Could someone please explain how the matching works in this case? Thanks.

-- 
View this message in context: 
http://www.nabble.com/solr-synonyms-behaviour-tp15051211p15051211.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheckhandler

2008-01-23 Thread anuvenk

I did try with the latest nightly build and followed the steps outlined in
http://wiki.apache.org/solr/SpellCheckerRequestHandler
with regards to creating new catchall field 'spell' of type 'spell' and
copied my text fields to 'spell' at index time.
Still q=grapics returns 'graphics'
but q=grapics card returns nothing.
But the same queries return the correct spelling with string fieldtypes.
Any fix available? 

Otis Gospodnetic wrote:
 
 You don't need to wait for 1.3 to be released - you can simply use a
 recent nightly build.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 - Original Message 
 From: anuvenk [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, January 21, 2008 12:35:52 AM
 Subject: Re: spellcheckhandler
 
 
 I followed the steps outlined in
 http://wiki.apache.org/solr/SpellCheckerRequestHandler
 with regards to setting up of the schema with a new field 'spell' and
 copying other fields to this 'spell' field at index time.
 It works fine with single word queries but doesn't return anything for
 multi-word queries. I read previous posts where this has been discussed. I
 read that some of the active members are in the process of releasing patches
 that fix this problem. I'm actually trying to implement this spell check
 in the production set up. Is it absolutely not possible to get spell check
 results back for multi-word queries? Should I wait for the 1.3 release? If
 there is any other option, please educate me. In case a patch was already
 released, how do I add it to the current 1.2 version that I'm using?
 -- 
 View this message in context:
  http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/spellcheckhandler-tp14627712p15051336.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheckhandler

2008-01-22 Thread anuvenk


I did try with the latest nightly build. The problem still exists. 
I tested with the example data that comes with solr package.
1)with termsourcefield set to 'word' which is string fieldtype
q=iped nano   returns   'ipod nano' which is good

2) with termsourcefield set to 'spell' (which is the catchall field of
'spell' fieldtype according to the tutorial 
http://wiki.apache.org/solr/SpellCheckerRequestHandler
that has my text fields copied in to it at index time)
q=grapics returns 'graphics' 
but q=grapics card returns nothing.

Not sure if i'm missing something. Please help!!


Otis Gospodnetic wrote:
 
 You don't need to wait for 1.3 to be released - you can simply use a
 recent nightly build.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 - Original Message 
 From: anuvenk [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, January 21, 2008 12:35:52 AM
 Subject: Re: spellcheckhandler
 
 
 I followed the steps outlined in
 http://wiki.apache.org/solr/SpellCheckerRequestHandler
 with regards to setting up of the schema with a new field 'spell' and
 copying other fields to this 'spell' field at index time.
 It works fine with single word queries but doesn't return anything for
 multi-word queries. I read previous posts where this has been discussed. I
 read that some of the active members are in the process of releasing patches
 that fix this problem. I'm actually trying to implement this spell check
 in the production set up. Is it absolutely not possible to get spell check
 results back for multi-word queries? Should I wait for the 1.3 release? If
 there is any other option, please educate me. In case a patch was already
 released, how do I add it to the current 1.2 version that I'm using?
 -- 
 View this message in context:
  http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 
 



-- 
View this message in context: 
http://www.nabble.com/spellcheckhandler-tp14627712p15025889.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheckhandler

2008-01-22 Thread anuvenk

I did try with the latest nightly build and followed the steps outlined in 
http://wiki.apache.org/solr/SpellCheckerRequestHandler
with regards to creating new catchall field 'spell' of type 'spell' and
copied my text fields to 'spell' at index time.
Still q=grapics returns 'graphics'
but q=grapics card returns nothing.
But the same queries return the correct spelling with string fieldtypes.
Any fix available?

Otis Gospodnetic wrote:
 
 You don't need to wait for 1.3 to be released - you can simply use a
 recent nightly build.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 - Original Message 
 From: anuvenk [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, January 21, 2008 12:35:52 AM
 Subject: Re: spellcheckhandler
 
 
 I followed the steps outlined in
 http://wiki.apache.org/solr/SpellCheckerRequestHandler
 with regards to setting up of the schema with a new field 'spell' and
 copying other fields to this 'spell' field at index time.
 It works fine with single word queries but doesn't return anything for
 multi-word queries. I read previous posts where this has been discussed. I
 read that some of the active members are in the process of releasing patches
 that fix this problem. I'm actually trying to implement this spell check
 in the production set up. Is it absolutely not possible to get spell check
 results back for multi-word queries? Should I wait for the 1.3 release? If
 there is any other option, please educate me. In case a patch was already
 released, how do I add it to the current 1.2 version that I'm using?
 -- 
 View this message in context:
  http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/spellcheckhandler-tp14627712p15026217.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheckhandler

2008-01-21 Thread anuvenk

I did try with the latest nightly build. The problem still exists. 
I tested with the example data that comes with solr package.
1)with termsourcefield set to 'word' which is string fieldtype
q=iped nano   returns   'ipod nano' which is good

2) with termsourcefield set to 'spell' (which is the catchall field of
'spell' fieldtype according to the tutorial 
http://wiki.apache.org/solr/SpellCheckerRequestHandler
that has my text fields copied in to it at index time)
q=grapics returns 'graphics' which is good
but q=grapics card returns nothing.

Not sure if i'm missing something. Please help!!


Otis Gospodnetic wrote:
 
 You don't need to wait for 1.3 to be released - you can simply use a
 recent nightly build.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 - Original Message 
 From: anuvenk [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, January 21, 2008 12:35:52 AM
 Subject: Re: spellcheckhandler
 
 
 I followed the steps outlined in
 http://wiki.apache.org/solr/SpellCheckerRequestHandler
 with regards to setting up of the schema with a new field 'spell' and
 copying other fields to this 'spell' field at index time.
 It works fine with single word queries but doesn't return anything for
 multi-word queries. I read previous posts where this has been discussed. I
 read that some of the active members are in the process of releasing patches
 that fix this problem. I'm actually trying to implement this spell check
 in the production set up. Is it absolutely not possible to get spell check
 results back for multi-word queries? Should I wait for the 1.3 release? If
 there is any other option, please educate me. In case a patch was already
 released, how do I add it to the current 1.2 version that I'm using?
 -- 
 View this message in context:
  http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/spellcheckhandler-tp14627712p15002379.html
Sent from the Solr - User mailing list archive at Nabble.com.



solr 1.3

2008-01-20 Thread anuvenk

When will this be released? Where can I find the list of
improvements/enhancements in 1.3, if it's been documented already?
-- 
View this message in context: 
http://www.nabble.com/solr-1.3-tp14989395p14989395.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.3

2008-01-20 Thread anuvenk

Thanks. Would this be the latest code from the trunk that you mentioned?
http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip


climbingrose wrote:
 
 I don't think they (Solr developers) have a time frame for 1.3 release.
 However, I've been using the latest code from the trunk and I can tell you
 it's quite stable. The only problem is the documentation sometimes doesn't
 cover latest changes in the code. You'll probably have to dig into the code
 itself or post a question here and many people will be happy to help you.
 
 On Jan 21, 2008 12:07 PM, anuvenk [EMAIL PROTECTED] wrote:
 

 when will this be released? where can i find the list of
 improvements/enhancements in 1.3 if its been documented already?
 --
 View this message in context:
 http://www.nabble.com/solr-1.3-tp14989395p14989395.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Regards,
 
 Cuong Hoang
 
 

-- 
View this message in context: 
http://www.nabble.com/solr-1.3-tp14989395p14989689.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.3

2008-01-20 Thread anuvenk

Could you please let me know where I can get it?

climbingrose wrote:
 
 I'm using code pulled directly from Subversion.
 
 On Jan 21, 2008 12:34 PM, anuvenk [EMAIL PROTECTED] wrote:
 

 Thanks. Would this be the latest code from the trunk that you mentioned?
 http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip


 climbingrose wrote:
 
  I don't think they (Solr developers) have a time frame for 1.3 release.
  However, I've been using the latest code from the trunk and I can tell you
  it's quite stable. The only problem is the documentation sometimes doesn't
  cover latest changes in the code. You'll probably have to dig into the code
  itself or post a question here and many people will be happy to help you.
 
  On Jan 21, 2008 12:07 PM, anuvenk [EMAIL PROTECTED] wrote:
 
 
  when will this be released? where can i find the list of
  improvements/enhancements in 1.3 if its been documented already?
  --
  View this message in context:
  http://www.nabble.com/solr-1.3-tp14989395p14989395.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
  --
  Regards,
 
  Cuong Hoang
 
 

 --
 View this message in context:
 http://www.nabble.com/solr-1.3-tp14989395p14989689.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Regards,
 
 Cuong Hoang
 
 

-- 
View this message in context: 
http://www.nabble.com/solr-1.3-tp14989395p14989802.html
Sent from the Solr - User mailing list archive at Nabble.com.



Term vector

2008-01-20 Thread anuvenk

What are term vectors? How do they help with MoreLikeThis (mlt)?
-- 
View this message in context: 
http://www.nabble.com/Term-vector-tp14990408p14990408.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Update the index

2008-01-20 Thread anuvenk

http://wiki.apache.org/solr/UpdateXmlMessages
Is this what you are looking for? Index the document again and it should
overwrite the older one with the same id.
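
As a rough illustration, the add message for re-indexing a document can be
built like this. It is a sketch in Python under some assumptions: that 'id'
is the uniqueKey field in your schema, and that the field names and values
below (doc-42, name) are invented examples. build_add_message is a made-up
helper that only builds the XML; it does not send anything to Solr.

```python
import xml.etree.ElementTree as ET
from typing import Dict

def build_add_message(fields: Dict[str, str]) -> str:
    """Build an <add><doc>...</doc></add> update message as a string."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Re-sending a document with an existing id replaces the old version,
# assuming 'id' is the uniqueKey in the schema.
message = build_add_message({"id": "doc-42", "name": "updated title"})
print(message)
```

The message then has to be posted to the update handler and followed by a
commit, as described on the wiki page above, before the new version shows up
in searches.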

Gavin-39 wrote:
 
 Hi,
   Can someone point me to a location that describes how to update an
 already indexed document? I was thinking there is an update tag
 explained somewhere but can't find it.
 
 Thanks,
 -- 
 Gavin Selvaratnam,
 Project Leader
 
 hSenid Mobile Solutions
 Phone: +94-11-2446623/4 
 Fax: +94-11-2307579 
 
 Web: http://www.hSenidMobile.com 
  
 Make it happen
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Update-the-index-tp14991443p14991551.html
Sent from the Solr - User mailing list archive at Nabble.com.



spell check component

2008-01-19 Thread anuvenk

Is it possible to add a spell check component so I don't have to issue a
separate request to Solr to do the spell checking? Sorry if this question is
naive; I'm just learning to use Solr.

<searchComponent name="spellcheck"
    class="org.apache.solr.handler.component.spellcheckComponent" />

and add it to the search handler like this

<arr name="spellcheck-components">
  <str>spellcheck</str>
</arr>

What would the name of the spell check component be?

-- 
View this message in context: 
http://www.nabble.com/spell-check-component-tp14973651p14973651.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheckhandler

2008-01-19 Thread anuvenk

I was going to do this:
create a new field (the termSourceField) called 'spell'
<field name="spell" type="spell" indexed="true" stored="false"
       multiValued="true"/>
of type 'spell'
<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

copy my 'name' and 'body' fields to this 'spell' field at index time
<copyField source="name" dest="spell"/>
<copyField source="body" dest="spell"/>

But like you had mentioned, the tutorial says we have to use it on a field
that's not tokenized. So how do I use my tokenized fields 'body' and 'name'
to build my spell index?

And how can I use it effectively for spell checking on multi-word queries?


anuvenk wrote:
 
 Is it possible to implement something like this with the spellcheckhandler?
 
 Like how Google does it:
 
 say I search for 'chater 13 bakrupcy',
 
 it should be able to display this:
 
 did you search for 'chapter 13 bankruptcy'
 
 Has someone been able to do this?
 




phrase slop param in dismax handler

2008-01-05 Thread anuvenk

How does adding a phrase slop in the handler help?
I tried ps=25 along with some pf values. I assumed it means this: for a
search term such as 'child custody battle', documents that have the words
'child', 'custody' and 'battle' within 25 words of one another will rank
high. Is that correct?



what are tf,idf,fieldNorm,queryNorm.?

2008-01-05 Thread anuvenk

I understand tf means term frequency. For example, if the search term is
'chapter 7', does tf mean how frequently 'chapter 7' occurs in the docs? Does
it take into account the total number of words in a doc to determine
frequency? Also, what are idf, fieldNorm and queryNorm? I am trying to
understand how Solr calculates the score.
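
For reference, here is a rough sketch of the individual factors, assuming the default formulas of Lucene's classic DefaultSimilarity (the scoring model Solr relied on at the time). The function names and sample numbers are only illustrative, and the real fieldNorm also folds in index-time boosts and is stored in a lossy one-byte encoding, which this sketch ignores.

```python
import math

def tf(freq):
    # Term frequency factor: square root of how often the term
    # (or phrase, for a phrase query) occurs in the field of one document.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: terms found in fewer documents score higher.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    # Length part of fieldNorm: shorter fields score higher.
    return 1.0 / math.sqrt(num_terms)

def query_norm(sum_of_squared_weights):
    # Normalizes query weights so scores from different queries are
    # comparable; it does not change the ranking within one query.
    return 1.0 / math.sqrt(sum_of_squared_weights)

# A term that occurs 4 times in a 100-term field:
print(tf(4), length_norm(100))
```

Under these assumptions tf itself does not look at the document length; the length is accounted for separately through fieldNorm.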



parsedquery_ToString

2008-01-04 Thread anuvenk

Is the parsedquery_toString the query that is passed to Solr after all the
tokenizing and analysis of the query?
For the search term 'chapter 7' I have this parsedquery_toString:
<str name="parsedquery_toString">
+(text:"(bankruptci chap 7) (7 chapter chap) 7 bankruptci"^0.8 |
((name:bankruptci name:chap)^2.0))~0.01 (text:"(bankruptci chap 7) (7
chapter chap) 7 bankruptci"~50^0.8 | ((name:bankruptci name:chap)^2.0))~0.01
</str>

I have these synonyms:
chap 7 => bankruptcy
chapter => bankruptcy
chap => chapter
chapter 7 => bankruptcy
bankrupcy => bankruptcy
chap,7,chap7,chapter 7,chapter 7 bankruptcy,chap 7

But I have some trouble understanding how it builds this
parsedquery_toString.

Can someone explain? If I can understand this, I'll be able to debug better,
analyze why I don't get the expected results for some of the search terms,
and work out what changes I could make to the associated synonyms.



spellcheckhandler

2008-01-04 Thread anuvenk

Is it possible to implement something like this with the spellcheckhandler?

Like how Google does it:

say I search for 'chater 13 bakrupcy',

it should be able to display this:

did you search for 'chapter 13 bankruptcy'

Has someone been able to do this?
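
One way to approach the multi-word case is to correct each word on its own and then join the results back into a single suggestion. The sketch below only illustrates that idea: it uses Python's difflib as a stand-in for the spell checker's suggestion lookup, and the function name, the vocabulary list and the 0.8 cutoff are all assumptions made for the example, not anything provided by Solr.

```python
import difflib

def did_you_mean(query, vocabulary, cutoff=0.8):
    """Correct each word separately, then rebuild the whole query."""
    corrected = []
    for word in query.split():
        if word in vocabulary:
            # Known word (including numbers such as '13'): keep it as is.
            corrected.append(word)
            continue
        # Closest vocabulary entry, if any is similar enough.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)

# Hypothetical vocabulary; in practice it would come from the indexed terms.
VOCAB = ["chapter", "bankruptcy", "custody", "child", "13", "7"]

print(did_you_mean("chater 13 bakrupcy", VOCAB))
```

With a spell check request handler, the same loop would send one request per word and substitute the top suggestion wherever one comes back.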



solr results debugging

2008-01-04 Thread anuvenk

I've been using the Solr admin form with debug=true to do some in-depth
analysis of some results. Could someone explain how to make sense of
this? This is the debugging info for the first result I got.


10.201284 = (MATCH) sum of:
  6.2467875 = (MATCH) max plus 0.01 times others of:
    6.236769 = (MATCH) weight(text:"(probat trust live inherit) testament"^0.8 in 48784), product of:
      0.7070911 = queryWeight(text:"(probat trust live inherit) testament"^0.8), product of:
        0.8 = boost
        18.032305 = idf(text:"(probat trust live inherit) testament"^0.8)
        0.049015578 = queryNorm
      8.820319 = (MATCH) fieldWeight(text:"(probat trust live inherit) testament"^0.8 in 48784), product of:
        2.236068 = tf(phraseFreq=5.0)
        18.032305 = idf(text:"(probat trust live inherit)
.

and it continues some more..
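
The numbers in this output can be checked by hand, since each "product of" line is the product of the lines nested beneath it. The sketch below redoes that arithmetic with the values quoted above. The variable names are mine, and the `dismax` helper is an assumption about what "max plus 0.01 times others" means: the best sub-score plus the tie value times the remaining ones.

```python
import math

# Values copied from the explain output.
boost = 0.8
idf = 18.032305
query_norm = 0.049015578
field_weight = 8.820319      # reported fieldWeight; its fieldNorm line is cut off

# tf(phraseFreq=5.0) is the square root of the phrase frequency.
tf = math.sqrt(5.0)

# queryWeight = boost * idf * queryNorm
query_weight = boost * idf * query_norm

# weight(...) = queryWeight * fieldWeight
clause_score = query_weight * field_weight

def dismax(scores, tie):
    # "max plus <tie> times others": best score plus tie times the rest.
    best = max(scores)
    return best + tie * (sum(scores) - best)

print(round(tf, 6), round(query_weight, 7), round(clause_score, 6))
```

The products agree with the reported 0.7070911 and 6.236769 to within rounding, so the 6.2467875 one level up is that 6.236769 plus 0.01 times the score of the other field clause, which is in the part of the output not shown here.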

search query: will

synonyms that i have: will, living will, last will and testament, living
trust, inheritance,probate

here is my request handler:

(portion of it)

<str name="echoParams">explicit</str>
<float name="tie">0.01</float>
<str name="qf">text^0.8 name^2.0</str>
<!-- until 3 all should match; 4 - 3 shld match; 5 - 4 shld match; 6 - 5
shld match; above 6 - 90% match -->
<str name="mm">3&lt;-1 4&lt;-1 5&lt;-1 6&lt;90%</str>
<str name="pf">
text^0.8 name^2.0
</str>
<int name="ps">50</int>
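
To double-check the mm comment in the handler above, here is a small sketch of how a conditional min-should-match spec can be evaluated. It is written from my understanding of the dismax rules (each "N<value" applies when the query has more than N optional clauses, negative values mean "all but that many", percentages are rounded down); the function name is made up for the example.

```python
def min_should_match(spec, clause_count):
    """Number of optional clauses that must match for a given mm spec."""
    result = clause_count
    spec = spec.strip()
    if "<" in spec:
        # Conditional spec: walk the "N<value" parts in order.
        for part in spec.split():
            upper, sub_spec = part.split("<", 1)
            if clause_count <= int(upper):
                return result
            result = min_should_match(sub_spec, clause_count)
        return result
    if spec.endswith("%"):
        raw = clause_count * int(spec[:-1]) / 100
        # int() truncates toward zero; negative percentages mean "all but".
        result = clause_count + int(raw) if raw < 0 else int(raw)
    else:
        value = int(spec)
        result = clause_count + value if value < 0 else value
    return min(clause_count, max(result, 0))

MM = "3<-1 4<-1 5<-1 6<90%"   # the mm value above, with &lt; decoded
for n in range(1, 9):
    print(n, min_should_match(MM, n))
```

For 1 to 7 clauses this gives 1, 2, 3, 3, 4, 5 and 6 required matches, which lines up with the comment in the handler.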



solr word delimiter

2008-01-04 Thread anuvenk

I have the word delimiter filter factory in the text field definition at both
index and query time.
But it has some negative effects on search terms like 'h1-b visa'.
It splits 'h1-b' into three tokens: h, 1, b. If I understand correctly, Solr
then looks for matches for 'h', '1' and 'b' separately because they are three
different tokens. This gives some undesired results: docs that have 'h'
somewhere, '1' somewhere and 'b' somewhere. How can I solve this problem?
I tried adding a synonym like h1-b => h1b visa
It does filter out some results, but I'm trying to find a global solution
rather than adding synonyms for all kinds of immigration forms like i-94,
k-1 etc.
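
To make the splitting concrete, here is a simplified imitation of what the word delimiter filter does to these tokens. It is only an approximation written for illustration (the real filter has more rules, such as splitting on case changes), and both function names are made up.

```python
import re

def word_delimiter_split(token):
    # Keep runs of letters and runs of digits as separate parts; any other
    # character, and any letter/digit boundary, acts as a split point.
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

def catenate_all(token):
    # All the parts joined back together into one token.
    return "".join(word_delimiter_split(token))

for form in ["h1-b", "i-94", "k-1"]:
    print(form, word_delimiter_split(form), catenate_all(form))
```

If the filter in your version supports the catenate options (catenateWords, catenateNumbers, catenateAll), turning on the joined form at index time would give each of these form names a single token ('h1b', 'i94', 'k1') without a synonym entry per form.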