Re: Synonyms relationships

2018-10-31 Thread Doug Turnbull

Synonyms in Solr are really a kind of "programmer's" tool, useful for
mapping terms to other terms. This need not correspond to linguistic
notions of synonymy or hypernymy/hyponymy.

That being said, there are probably half a dozen approaches for modeling
these kinds of taxonomical relationships in Solr on top of synonyms.

Here are some resources / techniques we use at OpenSource Connections for
clients:
https://www.youtube.com/watch?v=90F30PS-884
https://opensourceconnections.com/blog/2017/11/21/solr-synonyms-mea-culpa/
https://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/

(last one is ES, but same ideas apply...)
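
As a rough illustration of one such approach (a sketch only, not taken from the resources above): a directional `=>` rule applied at index time makes a narrower term also searchable under its broader terms, while a plain comma-separated line treats the terms as equivalent. The field type name `text_taxonomy`, the file name `taxonomy_synonyms.txt` and the sample terms are assumptions for illustration.

```xml
<!--
  Hypothetical taxonomy_synonyms.txt, kept next to the schema:

    tv, television                  (equivalent to: either term matches the other)
    poodle => poodle, dog, animal   (narrower than: poodle is also indexed as dog and animal)

  Rules are not chained, so each narrower term lists all of its broader terms.
-->
<fieldType name="text_taxonomy" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Expansion happens only while indexing, so a query for "dog" finds
         poodle documents, but a query for "poodle" does not find every dog. -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="taxonomy_synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Keeping the synonym filter out of the query analyzer is what makes the relationship one-directional; putting it in both analyzers would turn every rule back into a plain equivalence.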

Best,
-Doug

On Wed, Oct 31, 2018 at 6:20 AM Nicolas Paris wrote:

> Hi
>
> Does Solr provide a way to describe synonym relationships such as
> "equivalent to", "narrower than", "broader than"?
>
> It turns out both PostgreSQL and Oracle do, but I can't find any related
> information in the documentation.
>
> This is useful for choosing whether or not to generalize the search terms.
>
> Thanks ,
>
>
> --
> nicolas
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug

Re: synonyms for Solr Cloud -

2018-09-30 Thread Zheng Lin Edwin Yeo

You can keep the synonyms text file in the same config folder as the rest
of your files, such as solrconfig.xml, that you push to SolrCloud.
When you push the config to SolrCloud, the synonyms text file is pushed
along with it.
In your schema (schema.xml or managed-schema), add the SynonymFilterFactory
to the field type's analyzer as usual.
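
A minimal sketch of what this could look like, assuming the uploaded config directory holds `synonyms.txt` beside the schema file. The field type name `text_syn` is made up, and newer Solr releases generally use `solr.SynonymGraphFilterFactory` in place of `solr.SynonymFilterFactory`.

```xml
<!-- Assumed config directory layout, uploaded to ZooKeeper as one unit:
       conf/solrconfig.xml
       conf/managed-schema   (or schema.xml)
       conf/synonyms.txt     (comma-separated synonyms, one group per line)
-->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- The file name is resolved relative to the config directory. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After a changed file is uploaded, the collection has to be reloaded before the new synonyms take effect.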

Regards,
Edwin

On Wed, 19 Sep 2018 at 11:58, Rathor, Piyush (US - Philadelphia) <
prat...@deloitte.com> wrote:

> Hi All,
>
>
>
> How can we add a synonyms text file to solr cloud. I have a text file with
> comma separated synonyms.

Re: synonyms question

2018-07-18 Thread ennio

Vicenzo,

Thank you for the tip. I restarted Solr and it worked.

-Ennio



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: synonyms question

2018-07-17 Thread Vincenzo D'Amore

Have you reloaded the core (or restarted Solr) after the change in the synonyms 
file?

Ciao,
Vincenzo

--
mobile: 3498513251
skype: free.dev

Re: synonyms question

2018-07-17 Thread ennio

No, not using SolrCloud.




Re: synonyms question

2018-07-17 Thread Vincenzo D'Amore

Ennio, do you know if you have SolrCloud?

On Tue, Jul 17, 2018 at 7:19 PM ennio  wrote:

> Erick,
>
> I'm invoking the synonym at query time.
>
> Here is my fieldType definition.
>
> <fieldType ... positionIncrementGap="100">
>   <analyzer ...>
>     ...
>     <filter ... ignoreCase="true" words="lang/stopwords_en.txt"/>
>     ...
>     <filter ... protected="protwords.txt"/>
>     ...
>   </analyzer>
>   <analyzer ...>
>     ...
>     <filter ... synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter ... ignoreCase="true" words="lang/stopwords_en.txt"/>
>     ...
>     <filter ... protected="protwords.txt"/>
>     ...
>   </analyzer>
> </fieldType>
>


-- 
Vincenzo D'Amore

Re: synonyms question

2018-07-17 Thread ennio

Erick,

I'm invoking the synonym at query time.

Here is my fieldType definition.

[fieldType XML not preserved in the archive]

Re: synonyms question

2018-07-17 Thread Andrea Gazzarini


Hi Ennio,
could you please share:

 * your configuration (specifically the field type declaration in your
   schema)
 * the query (please add debug=true) and the corresponding query response

Best,
Andrea

On 17/07/18 17:35, Ennio Bozzetti wrote:

I'm trying to get my synonyms to work, but for this one keyword I cannot get it 
to work.

I added the following to my synonyms file.

fiber,fibre

But when I search for fiber or fibre it does not work.

Fiber is the American English spelling and Fibre is the British English 
spelling.

My field type is set to text_en; would that be why?

Thanks,

Ennio Bozzetti
Senior Web Programmer
THORLABS
(973) 300-2561
www.thorlabs.com

Re: synonyms question

2018-07-17 Thread Erick Erickson

You have to look at the analysis chain for text_en. Is the synonym
factory being invoked? If so at indexing time or query time? If
indexing time, did you have the synonym defined when you indexed the
data originally? If in cloud mode did you push the configs to
Zookeeper and reload the collection before indexing and/or querying?

The admin UI>>collection>>analysis page is very helpful.

Best,
Erick


RE: Synonyms in query time, configured as managed resouces

2016-12-26 Thread Daniel Moura

(now in the correct thread... sorry)

Hi Erick,

My use case is quite simple. I was using the following configuration:

But since my client asked to add some synonyms, I switched to managed
resources. Now I just want to make sure that the following configuration is
correct:

What do you think? Is this well defined in terms of configuration?
I tested it without errors.
But how can I know (or make sure) that my managed synonyms are being used at
query time? How can I test it? How can I validate it?
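
For reference, a managed synonym filter at query time is usually declared along the following lines. This is a sketch, not the configuration from this message (which the archive did not preserve): the field type name `text_managed` and the resource name `english` are assumptions, and older releases use `solr.ManagedSynonymFilterFactory` instead.

```xml
<fieldType name="text_managed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "english" names the managed resource; its mappings are edited over REST at
         /solr/<collection>/schema/analysis/synonyms/english instead of in a file. -->
    <filter class="solr.ManagedSynonymGraphFilterFactory" managed="english"/>
  </analyzer>
</fieldType>
```

Changes made through the REST endpoint only take effect after the core or collection is reloaded. To check that they are applied, analyze a sample term against the field on the admin Analysis page, or run a query with debug=true and look for the synonym in the parsed query.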

Regards,
Daniel Moura

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 26 de dezembro de 2016 16:01
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Synonyms in query time, configured as managed resouces

What happens when you test it? Are you getting some kind of error?

Best,
Erick

On Mon, Dec 26, 2016 at 7:19 AM, Daniel Moura <daniel.mo...@novabase.pt> wrote:
> Hi all!
>
> I will need your help guys.
>
> We now need to know whether the following definition is correct for having
> synonyms at query time, configured as managed resources, for the watson_text
> type, which is the type used for indexed fields.
>
> <fieldType ... class="com.ibm.watson.hector.plugins.fieldtype.WatsonTextField"
>     omitNorms="false" omitTermFreqAndPositions="false" indexed="true"
>     termOffsets="true" stored="true" termPositions="true" termVectors="true">
>   <analyzer ...>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     ...
>     <filter ... protected="lang/protwords_en.txt"/>
>     ...
>   </analyzer>
>   <analyzer ...>
>     ...
>     <filter ... ignoreCase="true"/>
>     ...
>     <filter ... protected="lang/protwords_en.txt"/>
>     ...
>   </analyzer>
> </fieldType>
>
> Thank you, I look forward to your answer as soon as possible.
>
> Cheers,
>
> DM
>

Re: Synonyms in query time, configured as managed resouces

2016-12-26 Thread Erick Erickson

What happens when you test it? Are you getting some kind of error?

Best,
Erick




RE: Synonyms in Search Results and More Accurate Matches

2015-12-01 Thread Markus Jelsma

Hello - it looks like you have synonyms enabled at query time, which is fine, 
but also means TF*IDF stats are different for tbrush and toothbrush, causing 
this order to be the way it is. There is no solution available in Solr right 
now that would boost user-entered terms over expanded synonyms but you can try 
this [1] plugin. You can solve the second problem by using phrase boosting in 
dismax, see the pf parameter.

[1]: https://github.com/healthonnet/hon-lucene-synonyms
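
The `pf` suggestion could look like this in solrconfig.xml. A sketch only: the handler name `/select` is Solr's usual default, `mm` is taken from the question, and the field name `title` and the boost and slop values are assumptions.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Query fields: searched term by term. -->
    <str name="qf">title</str>
    <!-- Phrase fields: documents in which the query terms appear together as a
         phrase get an extra boost, e.g. "Phillips Sonicare toothbrush". -->
    <str name="pf">title^10</str>
    <!-- Phrase slop: how far apart the terms may be and still earn the pf boost. -->
    <str name="ps">2</str>
    <str name="mm">1</str>
  </lst>
</requestHandler>
```

With such defaults, a document containing the whole phrase should rank above documents that match only some of the terms.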

M.

-Original message-
> From:Brian Narsi 
> Sent: Tuesday 1st December 2015 2:36
> To: solr-user@lucene.apache.org
> Subject: Synonyms in Search Results and More Accurate Matches
> 
> I am using edismax with mm=1 and qs=6
> 
> I have a field type with synonyms attached to it.
> A sample synonym is:
> 
> toothbrush tbrush
> 
> For the following data:
> 
> 1) Phillips toothbrush
> 
> 2) Oral-B tbrush
> 
> 3) Phillips Sonicare toothbrush
> 
> 
> If a user searches for
> 
> q = tbrush
> 
> I am getting
> 
> 1), 3), 2)
> 
> i.e. the one with exact synonym match is last
> 
> If a user searches for
> 
> q = toothbrush
> 
> Result is
> 
> 2), 1), 3)
> 
> i.e. one with exact match is not first
> 
> 
> So
> 
> 1) I need the exact matches to be first and then the other matches (because
> of being a synonym)
> 
> 2) If a user searches for Phillips Sonicare toothbrush; I need the results
> in the following order 3), 1), 2) i.e. most relevant results should be
> higher
> 
> What changes do I need to make in order to accomplish this?
> 
> Thanks in advance,
>

Re: Synonyms in Search Results and More Accurate Matches

2015-12-01 Thread Erik Hatcher

One technique that works well is to use copyField to end up with two indexed 
fields, one with synonyms, one without.  Then you can qf=title^5 
title_with_synonyms^1 with edismax and weight the “exacter” field higher than 
one with synonyms.
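
In schema terms, the two-field setup might look like the following sketch. The field names come from the `qf` above; the field type names `text_exact` and `text_synonyms` are placeholders for one analysis chain without a synonym filter and one with it.

```xml
<!-- Fragment to place inside <schema>. -->

<!-- Analysis chain without synonyms: only the literal terms are indexed. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Same chain plus index-time synonym expansion. -->
<fieldType name="text_synonyms" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title" type="text_exact" indexed="true" stored="true"/>
<field name="title_with_synonyms" type="text_synonyms" indexed="true" stored="false"/>

<!-- The raw title text is copied, then analyzed by the destination field's own chain. -->
<copyField source="title" dest="title_with_synonyms"/>
```

Queried with `defType=edismax` and `qf=title^5 title_with_synonyms^1`, a literal match in `title` should then outscore a match that exists only because of a synonym.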

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 




Re: Synonyms in Search Results and More Accurate Matches

2015-12-01 Thread Jack Krupansky

Index-time synonym expansion maximizes recall (not missing any documents,
especially partial matches), but minimizes precision and relevancy - you
are unable to select or boost exact matches. Ditto for ngrams.

As Erik indicates, use edismax with separate fields for precision (exact
matches) and recall (even the most remote partial match, to avoid missing
any documents), with a much higher boost on the exact-match field.


-- Jack Krupansky


Re: Synonyms in Search Results and More Accurate Matches

2015-12-01 Thread Brian Narsi

I do not have synonyms enabled at query time. Below is my fieldtype:

[fieldType definition not preserved in the archive]


Re: Synonyms within FQ

2015-06-01 Thread John Blythe

after further investigation it looks like the synonym i was testing against
was only associated with one of their multiple divisions (despite being the
most common name for them!). it looks like this may clear the issue up, but
thanks anyway!

-- 
*John Blythe*
Product Manager  Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Mon, Jun 1, 2015 at 8:33 AM, John Blythe j...@curvolabs.com wrote:

> morning everyone,
>
> i'm attempting to find related documents based on a manufacturer's
> competitor. as such i'm querying against the 'description' field with
> manufacturer1's product description but running a filter query with
> manufacturer2's name against the 'mfgname' field.
>
> one of the ways that we help boost our document finding is with a synonym
> dictionary for manufacturer names. many of the larger players have multiple
> divisions, have absorbed smaller companies, etc. so we need all of their
> potential names to map to our record.
>
> i may be wrong, but from my initial testing it doesn't seem to be applying
> to a fq. is there any way of doing this?
>
> thanks-

Re: Synonyms within FQ

2015-06-01 Thread John Blythe

Thanks Erick!


Re: Synonyms within FQ

2015-06-01 Thread Erick Erickson

For future reference, fq clauses are parsed just like the q clause;
they can be arbitrarily complex.

Best,
Erick


Re: Synonyms Search using solr

2014-10-24 Thread Walter Underwood

Use the SynonymFilterFactory in the indexer part of your analyzer chain. 

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Oct 23, 2014, at 11:04 PM, Danesh Kuruppu dknkuru...@gmail.com wrote:

> Hi all,
>
> I need to implement synonym search using Solr. What is the best possible
> way of doing this? Is there any documentation to follow?
>
> Thanks
> Danesh

Re: Synonyms - 20th and 20

2014-06-19 Thread Erick Erickson

You almost certainly have WordDelimiterFilterFactory in your analysis
chain after the synonym insertion. Its _job_ is to split on
letter/non-letter transitions, which is where the extra 20 comes from.

The admin/analysis page is your friend.
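
One way to keep tokens like `20th` intact is to stop the delimiter filter from splitting at letter/digit boundaries. A sketch, assuming a chain roughly like the one described; the tokenizer and the other attribute values are illustrative, not taken from the poster's schema.

```xml
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <!-- splitOnNumerics="0" disables splitting on letter/digit transitions,
       so "20th" is no longer broken into "20" and "th". -->
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" splitOnNumerics="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

Alternatively, the filter's `protected` attribute can point to a file listing tokens that must never be split.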

Best,
Erick


Re: Synonyms - 20th and 20

2014-06-18 Thread Diego Fernandez

What tokenizer and filters are you using?

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics


- Original Message -
> I have a synonyms.txt file which has
> 20th,twentieth
>
> Once I apply the synonym, I see 20th, twentieth and 20 for 20th.
> Does anyone know where 20 comes from? How can I have only 20th and
> twentieth?
>
> Thanks,
>
> Jae

Re: Synonyms and spellings

2014-01-28 Thread Alexei Martchenko

2) There are some synonym lists on the web, they aren't always complete but
I keep analyzing fields and tokens in order to polish my synonyms. And I
like to use tools like http://www.visualthesaurus.com/ to aid me.

Hope this helps :-)


alexei martchenko


2014-01-28 rashmi maheshwari maheshwari.ras...@gmail.com

> Hi,
>
> Question 1) Why do we use the spellings file under the Solr core conf
> folder? What spellings do we enter in it?
>
> Question 2) Implementing all synonyms is a tough thing. Where could I
> get a list of as many synonyms as we see in Google search?
>
> --
> Rashmi
> Be the change that you want to see in this world!
> www.minnal.zor.org
> disha.resolve.at
> www.artofliving.org

Re: Synonyms and spellings

2014-01-28 Thread rashmi maheshwari

Thanks for the quick response, Alexei.

I will check this link to prepare a synonym list.



Re: synonyms and term position

2013-10-09 Thread Furkan KAMACI

Could you send a screenshot of the admin Analysis page when analyzing
those words?


2013/10/9 Alvaro Cabrerizo topor...@gmail.com

> Hi:
>
> I'm in the process of upgrading Solr from 1.4 to 4.4 and I'm having a
> problem using SynonymFilterFactory within the filter chain
> SynonymFilterFactory, StopFilterFactory.
>
> I have configured synonyms.txt to expand the word AIO as: all-in-one. Well,
> when using Solr 1.4 I get the following result (term positions) when
> analysing the string "one aio two".
>
> Solr 1.4 after synonym:
>
> term position | 1   | 2   | 3  | 4   | 5
> term text     | one | all | in | one | two
>
> Solr 1.4 after stop filter (the "in" term is deleted and the terms "all"
> and "one" are consecutive):
>
> term position | 1   | 2   | 4   | 5
> term text     | one | all | one | two
>
> But when using Solr 4.4 I get:
>
> Solr 4.4 after synonym:
>
> term position | 1   | 2   | 3  | 4   | 3
> term text     | one | all | in | one | two
>
> Solr 4.4 after stop filter ("in" is deleted and the term "two" is now close
> to "all"):
>
> term position | 1   | 2   | 4   | 3
> term text     | one | all | one | two
>
> The problem is that the second word, "two", is in position 3 in Solr 4.4, so
> when I search for "aio" I get results in Solr 1.4 but find nothing using
> Solr 4. Is there any option in Solr 4 that imitates the Solr 1.4
> behavior?
>
> Regards.
>
> Please find attached the fieldtype configuration.
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
>            autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt" />
>   </analyzer>
> </fieldType>

Re: synonyms and term position

2013-10-09 Thread Alvaro Cabrerizo

Sure,

Find attached the screenshots with almost all of the analysis (don't worry
about the lowercase filter and the Porter stemmer).

Regards.





Re: synonyms and term position

2013-10-09 Thread Furkan KAMACI

Does "two" have "in" and "one" as synonyms?


2013/10/9 Furkan KAMACI furkankam...@gmail.com

 Does two has a synonym of in and one?


 2013/10/9 Alvaro Cabrerizo topor...@gmail.com

 Sure,

 Find attached the screenshots with almost all the analysis, (dont worry
 about the lowercase and the porter stemmer)

 Regards.




 On Wed, Oct 9, 2013 at 10:17 AM, Furkan KAMACI furkankam...@gmail.comwrote:

 Could you send screenshot of  admin Analysis page when trying to analyze
 that words?


 2013/10/9 Alvaro Cabrerizo topor...@gmail.com

  Hi:
 
  I'm involved in a process o upgrade solr from 1.4 to 4.4 and I'm
 having a
  problem using SynonymFilterFactory within the process chain
  SynonymFilterFactory, StopFilterFactory .
 
  I have configured synonyms.txt to expand the word AIO as: all-in-one.
 Well,
  when using solr 1.4 I get the following result (term position) when
  analysing the string one aio two.
 
  Solr 1.4 after synonym:
 
  term position |1 | 2 |3 |4 |5
  term text |one| all |in |one |two
 
  Solr 1.4 after stopfilter (in term is deleted and terms all and
 one
  are consecutive)
 
  term position |1 | 2 |4 |5
  term text |one| all |one |two
 
 
 
  But when using solr4.4 I get:
 
  Solr 4.4 after synonym:
 
  term position |1 | 2 |3 |4 |3
  term text |one| all |in |one |two
 
  Solr 4.4 after stop (in is deleted and the term two is now close to
  all :
 
  term position |1 | 2 |4 |3
  term text |one| all |one |two
 
 
 
  The problem is that the second word two is in position 3 in solr4.4
 so
  when I try to search aio, in solr1.4 I get results, but find nothing
 using
  Solr4. Is there any option to configure solr4 that imitates solr1.4
  behavior.
 
 
  Regards.
 
 
 
 
  Please find attached the fieldType configuration.

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
             autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="1"
              catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"
              protected="protwords.txt"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="0"
              catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"
              protected="protwords.txt"/>
    </analyzer>
  </fieldType>
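
The position tables in the quoted message can be reproduced with a toy model. The sketch below is not Solr code; it only assumes that a stop filter with enablePositionIncrements=true drops a token and leaves every surviving token at its original position. The names stop_filter, solr14 and solr44 are made up for the illustration, and the input positions are copied from the "after synonym" tables.

```python
from typing import List, Set, Tuple

Token = Tuple[int, str]  # (position, term text)


def stop_filter(tokens: List[Token], stopwords: Set[str]) -> List[Token]:
    """Drop stop words but keep every surviving token's original position.

    This mimics the reported effect of enablePositionIncrements=true:
    removing a token leaves a gap instead of renumbering what follows.
    """
    return [(pos, term) for pos, term in tokens if term not in stopwords]


# Positions copied from the "after synonym" tables in the message.
solr14 = [(1, "one"), (2, "all"), (3, "in"), (4, "one"), (5, "two")]
solr44 = [(1, "one"), (2, "all"), (3, "in"), (4, "one"), (3, "two")]

print(stop_filter(solr14, {"in"}))
print(stop_filter(solr44, {"in"}))
```

In this model the stop filter never moves "two"; the difference between the two versions is already present in the synonym filter output.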

Re: synonyms and term position

2013-10-09 Thread Alvaro Cabrerizo

No, it has no synonyms.


On Wed, Oct 9, 2013 at 10:48 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 Does "two" have "in" and "one" as synonyms?



Re: synonyms and term position

2013-10-09 Thread Alvaro Cabrerizo

The synonyms.txt file defines the following associations:

AIO => All in one
aio => all-in-one

Regards.


On Wed, Oct 9, 2013 at 11:05 AM, Alvaro Cabrerizo topor...@gmail.com wrote:

 No, it has no synonyms.



Re: synonyms not working

2013-09-11 Thread Erick Erickson

Attach debug=query to your URL and inspect the parsed
query; you should see the substitutions if you're
configured correctly. Multi-word synonyms at query time
have the "getting through the query parser" problem.
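
As a minimal sketch of the first suggestion, the snippet below builds a request URL with debug=query appended. The host, core name (collection1) and field name (title_en) are assumptions for the example, not values from this thread; only the debug=query parameter comes from the advice above.

```python
from urllib.parse import urlencode


def debug_query_url(base_url: str, query: str) -> str:
    """Return a select URL with debug=query appended to the parameters."""
    params = {"q": query, "debug": "query"}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical host, core name, and field name; adjust to your installation.
url = debug_query_url("http://localhost:8983/solr/collection1/select",
                      "title_en:car")
print(url)
```

Fetching that URL and reading the debug part of the response should show whether the synonym substitutions made it into the parsed query.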

Best
Erick


On Wed, Sep 11, 2013 at 11:04 AM, cheops m.schm...@mediaskill.de wrote:

 Hi,
 I'm using Solr 4.4 and trying to use different synonyms for different
 field types:

 <fieldType name="text_general" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


 I have the same field type for English (name="text_general_en" and
 synonyms="synonyms_en.txt").
 The first field type works fine: my synonyms are processed and the result is
 as expected. But the English version doesn't seem to work. I can find the
 original English words, but the synonyms are not processed.
 PS: Yes, I know using synonyms at query time is not a good idea :-) ... but
 I can't change it here.

 Any help would be appreciated!

 Thank you.

 Best regards
 Marcus



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/synonyms-not-working-tp4089318.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms not working

2013-09-11 Thread cheops

Thanks for your help. I managed to solve the problem in the meantime!
I used
<analyzer type="query_en">
...which is wrong; it must be
<analyzer type="query">
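
A mistake like this can be caught before deployment with a small check over the schema file. The sketch below is only an illustration: the function name and the sample schema are made up, and the set of accepted type values ("index", "query", "multiterm") is an assumption that should be checked against the documentation for the Solr version in use.

```python
import xml.etree.ElementTree as ET

# Assumed set of accepted values; check the documentation for your Solr version.
ALLOWED_ANALYZER_TYPES = {"index", "query", "multiterm"}


def unexpected_analyzer_types(schema_xml: str) -> list:
    """Return (fieldType name, analyzer type) pairs whose type looks wrong."""
    root = ET.fromstring(schema_xml)
    problems = []
    for field_type in root.iter("fieldType"):
        for analyzer in field_type.findall("analyzer"):
            kind = analyzer.get("type")
            # An analyzer without a type attribute is left alone here.
            if kind is not None and kind not in ALLOWED_ANALYZER_TYPES:
                problems.append((field_type.get("name"), kind))
    return problems


sample = """
<schema>
  <fieldType name="text_general_en" class="solr.TextField">
    <analyzer type="index"/>
    <analyzer type="query_en"/>
  </fieldType>
</schema>
"""
print(unexpected_analyzer_types(sample))
```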






Re: Synonyms with wildcard search

2013-07-30 Thread Jack Krupansky

Sorry, but Solr synonym processing does not know about wildcards, so it is 
bypassed when a wildcard is present.


Technically, it could probably be enhanced to support them, at least for 
some common special cases such as yours, but that prospect won't help you 
right now.


Your best bet is to preprocess your queries in your application layer and 
perform the mapping there.
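
A minimal sketch of such an application-layer preprocessor, assuming a one-way mapping like the colour/color entry from the question. The SYNONYMS table and the function name are hypothetical, and the regular expression only handles a plain word followed by a trailing asterisk.

```python
import re

# Hypothetical one-way mapping, mirroring the colour/color entry.
SYNONYMS = {"colour": "color"}

# A bare term followed by a trailing wildcard, e.g. "colour*".
_WILDCARD_TERM = re.compile(r"\b(\w+)\*")


def rewrite_wildcards(query: str) -> str:
    """Replace the stem of each trailing-wildcard term with its mapped form."""
    def replace(match):
        stem = match.group(1)
        return SYNONYMS.get(stem.lower(), stem) + "*"
    return _WILDCARD_TERM.sub(replace, query)


print(rewrite_wildcards("colour* paint"))
```

The rewritten string is what the application would then send to Solr in place of the user's original query.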


-- Jack Krupansky

-Original Message- 
From: Sandeep Gupta

Sent: Tuesday, July 30, 2013 5:22 AM
To: solr-user@lucene.apache.org
Subject: Synonyms with wildcard search

Hello All,

I want to know whether it is possible to query a word that has a
synonym together with a wildcard.

For example: I have one field of type text_en (the default fieldType
in 4.3.1), and the synonym.txt file has this entry:
colour = color

Now when I run a full-text search for colour* (with the wildcard), the
search result does not return words such as colorology (whereas if I
search for color*, I do get this word).

Any suggestions on how I can achieve this, or is it not possible?

Thanks
Sandeep

Re: Synonyms problem

2013-04-03 Thread Shawn Heisey

On 3/29/2013 12:14 PM, Plamen Mihaylov wrote:
 Can I ask you another question: I have Magento + Solr and have a
 requirement to create an admin magento module, where I can add/remove
 synonyms dynamically. Is this possible? I searched google but it seems not
 possible.

If you change the synonym list that you are using in your index analyzer
chain, you must rebuild your entire index.  If you don't, the updated
synonyms will only affect newly added records.  This is because the
index analyzer is only applied at index time.

Thanks,
Shawn

Re: Synonyms problem

2013-03-29 Thread Thomas Krämer | ontopica

Hi Plamen

You should set expand to true during

<analyzer type="index">

<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
  ignoreCase="true" expand="true"/>
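
To illustrate what the expand setting changes, here is a toy model of one comma-separated synonyms line. It is not Solr's implementation: the function name is made up, multi-word entries are treated as single strings, and lowercasing stands in for ignoreCase. It reflects the usual reading of the option, where expand=true maps every term to the whole group and expand=false reduces every term to the first entry.

```python
def synonym_map(line: str, expand: bool) -> dict:
    """Build a term -> replacement-list map from one comma-separated line."""
    terms = [t.strip().lower() for t in line.split(",") if t.strip()]
    if expand:
        # Every term maps to the full list of equivalents.
        return {t: list(terms) for t in terms}
    # Every term is reduced to the first entry in the list.
    return {t: [terms[0]] for t in terms}


print(synonym_map("NY, New York", expand=True))
print(synonym_map("NY, New York", expand=False))
```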


...

Greetings,

Thomas

Am 29.03.2013 17:16, schrieb Plamen Mihaylov:
 Hey guys,
 
 I have the following problem: I have a website with sports players, and I
 use Solr to index their data. I have defined synonyms like: NY, New York.
 When I search for New York, 145 results are found, but when I search
 for NY, 142 results are found. Why is there a difference, and how can I fix
 this?
 
 Configuration snippets:
 
 synonyms.txt
 
 ...
 NY, New York
 ...
 
 --
 schema.xml
 
 ...
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
             ignoreCase="true" expand="false"/> -->

     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PhoneticFilterFactory"
             encoder="DoubleMetaphone" inject="true"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/>
     <!-- <filter class="solr.SnowballPorterFilterFactory"
             language="English"/> -->
   </analyzer>
   <analyzer type="query">
     <filter class="solr.SynonymFilterFactory"
             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>

     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="0" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <!-- <filter class="solr.EnglishPorterFilterFactory"
             protected="protwords.txt"/> -->
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="letterstops.txt" enablePositionIncrements="true"/>
   </analyzer>
 </fieldType>
 
 
 Thanks in advance.
 Plamen
 


-- 

ontopica GmbH
Prinz-Albert-Str. 2b
53113 Bonn
Germany
fon: +49-228-227229-22
fax: +49-228-227229-77
web: http://www.ontopica.de
ontopica GmbH
Sitz der Gesellschaft: Bonn

Geschäftsführung: Thomas Krämer, Christoph Okpue
Handelsregister: Amtsgericht Bonn, HRB 17852

Re: Synonyms problem

2013-03-29 Thread Walter Underwood

Also, all the filters need to be after the tokenizer. There are two synonym 
filters specified, one before the tokenizer and one after.

I'm surprised that works at all. Shouldn't that be a fatal error when loading the 
config?

wunder


--
Walter Underwood
wun...@wunderwood.org

Re: Synonyms problem

2013-03-29 Thread Steve Rowe

The XPath expressions used to collect the charFilter sequence, the tokenizer, 
and the token filter sequence are evaluated independently of each other - see 
line #244 through #251:

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_2_0/solr/core/src/java/org/apache/solr/schema/FieldTypePluginLoader.java?view=markup#l232
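
The effect of evaluating the lookups independently can be shown with a toy loader. This is not the Java code linked above; it is a Python sketch with a made-up function name that collects the tokenizer and the filters through two separate queries, so a filter written before the tokenizer simply ends up in the filter list and no ordering error is raised.

```python
import xml.etree.ElementTree as ET

ANALYZER = """
<analyzer type="index">
  <filter class="solr.SynonymFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
"""


def load_chain(analyzer_xml: str):
    """Collect the tokenizer and the filters with two separate lookups."""
    root = ET.fromstring(analyzer_xml)
    tokenizer = root.find("./tokenizer").get("class")
    filters = [f.get("class") for f in root.findall("./filter")]
    return tokenizer, filters


print(load_chain(ANALYZER))
```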

Steve

On Mar 29, 2013, at 12:37 PM, Walter Underwood wun...@wunderwood.org wrote:

 Also, all the filters need to be after the tokenizer. There are two synonym 
 filters specified, one before the tokenizer and one after.
 
 I'm surprised that works at all. Shouldn't that be fatal error when loading 
 the config?
 
 wunder
 

Re: Synonyms problem

2013-03-29 Thread Plamen Mihaylov

Guys,

This is a commented line where expand is false. I moved the synonym filter
after tokenizer, but the result is the same.

Actual configuration:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory"
            encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <!-- <filter class="solr.SnowballPorterFilterFactory"
            language="English"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="letterstops.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

2013/3/29 Walter Underwood wun...@wunderwood.org

 Also, all the filters need to be after the tokenizer. There are two
 synonym filters specified, one before the tokenizer and one after.

 I'm surprised that works at all. Shouldn't that be fatal error when
 loading the config?

 wunder


Re: Synonyms problem

2013-03-29 Thread Walter Underwood

There are several problems with this config.

Indexing uses the phonetic filter, but query does not. This almost guarantees 
that nothing will match. Numbers could match, if the filter passes them.

Query time has two stopword filters with different lists. Indexing only has 
one. This isn't fatal, but it is pretty weird. Is letterstops.txt trying to do 
the same thing as the length filter? If so, use the length filter in both 
places, or not at all. Deleting all single characters is a bad idea. You'll 
never find Vitamin C.
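
The Vitamin C point is easy to see with a toy length filter. The sketch below is not Solr's LengthFilterFactory; it is a made-up Python function that uses the same min=2 and max=100 bounds as the configuration and assumes whitespace tokenization and lowercasing have already happened.

```python
def length_filter(tokens, min_len=2, max_len=100):
    """Keep only tokens whose length is within [min_len, max_len]."""
    return [t for t in tokens if min_len <= len(t) <= max_len]


print(length_filter("vitamin c".split()))
```

With these bounds the single-letter token is dropped, so the field keeps "vitamin" but nothing for "c".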

The same synonyms are used at index and query time, which is unnecessary. Only 
use synonyms at index time unless you really know what you are doing and have a 
special need.

wunder


Re: Synonyms problem

2013-03-29 Thread Plamen Mihaylov

Thank you a lot, Walter. I removed most of the filters and now it returns
the same number of results. It looks simply this way:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Can I ask you another question: I have Magento + Solr and have a
requirement to create an admin magento module, where I can add/remove
synonyms dynamically. Is this possible? I searched google but it seems not
possible.

Regards
Plamen

2013/3/29 Walter Underwood wun...@wunderwood.org

 There are several problems with this config.

 Indexing uses the phonetic filter, but query does not. This almost
 guarantees that nothing will match. Numbers could match, if the filter
 passes them.

 Query time has two stopword filters with different lists. Indexing only
 has one. This isn't fatal, but it is pretty weird. Is letterstops.txt
 trying to do the same thing as the length filter? If so, use the length
 filter both place. Or not at all. Deleting single all single characters is
 a bad idea. You'll never find Vitamin C.

 The same synonyms are used at index and query time, which is unnecessary.
 Only use synonyms at index time unless you really know what you are doing
 and have a special need.

 wunder

 On Mar 29, 2013, at 9:53 AM, Plamen Mihaylov wrote:

  Guys,
 
  This is a commented line where expand is false. I moved the synonym
 filter
  after tokenizer, but the result is the same.
 
  Actual configuration:
 
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PhoneticFilterFactory"
             encoder="DoubleMetaphone" inject="true"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/>
     <!-- <filter class="solr.SnowballPorterFilterFactory"
             language="English"/> -->
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="0" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <!-- <filter class="solr.EnglishPorterFilterFactory"
             protected="protwords.txt"/> -->
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="letterstops.txt" enablePositionIncrements="true"/>
   </analyzer>
 </fieldType>
 
  2013/3/29 Walter Underwood wun...@wunderwood.org
 
  Also, all the filters need to be after the tokenizer. There are two
  synonym filters specified, one before the tokenizer and one after.
 
  I'm surprised that works at all. Shouldn't that be a fatal error when
  loading the config?
 
  wunder
 
  On Mar 29, 2013, at 9:33 AM, Thomas Krämer | ontopica wrote:
 
  Hi Plamen
 
  You should set expand to true during
 
  <analyzer type="index">
  
  <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
  ignoreCase="true" expand="true"/>
 
 
  ...
 
  Greetings,
 
  Thomas
 
  Am 29.03.2013 17:16, schrieb Plamen Mihaylov:
  Hey guys,
 
  I have the following problem: I have a website with sports players, and
  I am using Solr to index their data. I have defined synonyms like: NY, New
  York.
  When I search for New York there are 145 results found, but when I
  search for NY there are 142 results found. Why is there a difference, and
  how can I fix this?
 
  Configuration snippets:
 
  synonyms.txt
 
  ...

Re: Synonyms and trailing wildcard

2013-01-15 Thread Jack Krupansky

It's certainly true that wildcard suppresses the synonym filter since it is 
not multi-term aware.


Other than implementing your own version of the synonym filter that was 
multi-term aware and interpreted wildcards, you may have to do your own 
preprocessor.


Or, you could do index-time synonyms, so that bill, billy, will, 
willy, and william were all indexed at the same location. Then the bil* 
wildcard would match william, since bill is also indexed at the same 
location.
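
The "do your own preprocessor" option could look roughly like the sketch below. It is only an illustration: the NICKNAMES table and the function name are hypothetical stand-ins for whatever nicknames.txt contains, and the output is a plain query string that would still be sent to Solr unchanged. The idea is to strip the trailing wildcard, look the bare stem up in the nickname table, and OR any exact nickname matches with the original wildcard term.

```python
# Hypothetical nickname table; in the thread this data lives in nicknames.txt.
NICKNAMES = {
    "bill": ["billie", "william"],
    "william": ["bill", "billie"],
}


def expand_trailing_wildcard(term, nicknames=NICKNAMES):
    """Rewrite 'stem*' so that exact nickname matches are ORed with the wildcard."""
    if not term.endswith("*"):
        return term
    stem = term[:-1].lower()
    synonyms = nicknames.get(stem)
    if not synonyms:
        # The stem itself is not a known nickname: keep the plain wildcard.
        return stem + "*"
    return "(" + " OR ".join([stem + "*"] + synonyms) + ")"


if __name__ == "__main__":
    print(expand_trailing_wildcard("Bill*"))
    print(expand_trailing_wildcard("Bil*"))
```

With this table, Bill* picks up the nicknames of bill, while Bil* stays a plain wildcard because bil has no entry, which matches the two cases in the original question.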


-- Jack Krupansky

-Original Message- 
From: Roberto Isaac Gonzalez

Sent: Tuesday, January 15, 2013 3:10 PM
To: solr-user@lucene.apache.org
Subject: Synonyms and trailing wildcard

Hi

I'm working on adding nicknames capability to our system. It's basically a
synonym mapping stored in a nicknames.txt file that uses the SynonymFilter
framework.

In one of our search boxes (used for lookups), we automatically append a
trailing wildcard.

There's one use case we're dealing with which is expanding synonyms even if
there's a trailing wildcard.

i.e. Q: Bill*
Expected Results: Bill, Billie, William

Q: Bil*
Expected Results: Bill, so no synonym expansion.

Basically, for synonym expansion, we want to treat the token as if it
didn't contain the trailing wildcard and we also *don't* want to expand the
wildcard before doing the synonym matches.

We tried using the multiterm analysis chain but by definition that expects
one token *in* and one token *out*
(org.apache.solr.schema.TextField.analyzeMultiTerm()) so it throws an
exception.

I'm looking for options about implementing this scenario and some of the
options I've explored are:

1. Use the multiterm analysis chain and allow Synonym expansion, so one
token in and multiple tokens out.
2. Iterate ourselves and see if the multiterm analysis chain returns more
than one token, if it does, then remove the SynonymFilter from the analysis
chain, something similar to ExtendedDismaxQParser.shouldRemoveStopFilter().
3. ExtendedDismaxQParser.preProcessUserQuery() to OR the non-wildcarded
term.

What do you guys think?


Best Regards,
Roberto Gonzalez

Re: Synonyms Phrase not working

2012-10-02 Thread Bernd Fehling

Hi,

Because your search for /?q=produto_nome:"lubrificante intimo" is
a phrase search, it is handled differently.

Your other search gets the synonyms, but the last synonym is a multi-word 
synonym and not a phrase:
... produto_nome:lubrificante intimo) ))

See also:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Regards
Bernd


Am 01.10.2012 19:02, schrieb Gustav:
 Hello Everyone,
 I'm having a problem using the SynonymFilterFactory in a query analyzer.
 
 That's my synonyms.txt file:
 
 sexo => Preservativo, vaselina, viagra, lubrificante intimo
 
 And that is the fieldtype in which it is implemented:
 
 <fieldType class="solr.TextField" name="produto_nome_synonyms"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
             enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.NGramFilterFactory" maxGramSize="25"
             minGramSize="1"/>
   </analyzer>
 
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SynonymFilterFactory" expand="true"
             ignoreCase="true" synonyms="synonyms.txt"
             tokenizerFactory="KeywordTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
             enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
   </analyzer>
 </fieldType>
 
 The problem here is:
 When I search for /?q=produto_nome:lubrificante intimo Solr returns 8
 documents, which match because of the n-gram filter factory, but when I
 search for /?q=produto_nome:sexo Solr brings no results.
 I was expecting the same result as /?q=lubrificante intimo, as configured
 in the synonyms.
 
 I turned on debugQuery=true and got the following parsedquery:
 
 <str name="parsedquery">
 +DisjunctionMaxQuery(((produto_nome:preservativo produto_nome:vaselina
 produto_nome:viagra produto_nome:lubrificante intimo) ))
 </str>
 
 I don't understand why it brings no results.
 Any ideas?
 
 
 
 
 

-- 
*
Bernd Fehling                Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)           LibTec - Bibliothekstechnologie
Universitätsstr. 25          und Wissensmanagement
33615 Bielefeld
Tel. +49 521 106-4060        bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*

Re: Synonyms Phrase not working

2012-10-02 Thread Mikhail Khludnev

Gustav,

AFAIK, multi-word synonyms are one of the weak points of Lucene/Solr. I'm
going to propose a solution approach at the forthcoming Eurocon:
http://www.apachecon.eu/schedule/presentation/18/ . You are welcome!



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com

Re: Synonyms and hyphens

2012-07-10 Thread Chris Hostetter


Which version of Solr are you using?

: Terms with embedded special characters are treated as phrases with spaces in
: place of the special characters. So, gb-mb is treated as if you had enclosed
: the term in quotes.

take a look at the autoGeneratePhraseQueries option on your field type ... 
depending on the version attribute of your <schema/>, it may be 
defaulting to true.

Setting it to false should cause it to treat gb and mb as distinct 
terms.



-Hoss

Re: Synonyms and Regions Taxonomy

2012-07-05 Thread Tri Cao

I don't think there's a synonym file for this use case. I am not even sure if
synonym is the right way to handle it.

I think the better way to improve recall is to mark up your documents with
a hidden field containing the geographic relations. For example, before
indexing, you can add a field to all documents containing South America,
something like: South America is a subcontinent that consists of the
countries Brazil, Chile, Argentina, …

This data can come from various sources, such as wikipedia, wordnet, etc.
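
A rough sketch of that kind of pre-indexing markup is below. Everything in it is an assumption for illustration: the REGIONS table, the field names text and regions_hidden, and the simple substring matching. It tags a document that mentions a member country with the region name, which is the direction that serves the original question (a search for South America should find documents about Brazil); the reverse mapping could be added the same way.

```python
# Hypothetical region table; real data might come from Wikipedia or WordNet.
REGIONS = {
    "South America": ["Brazil", "Chile", "Argentina"],
}


def add_region_field(doc, text_field="text", region_field="regions_hidden"):
    """Return a copy of doc with a hidden field naming the regions it touches."""
    text = doc.get(text_field, "").lower()
    matched = sorted(
        region
        for region, members in REGIONS.items()
        if any(member.lower() in text for member in members)
    )
    enriched = dict(doc)
    if matched:
        # Only add the field when at least one region applies.
        enriched[region_field] = matched
    return enriched


if __name__ == "__main__":
    print(add_region_field({"id": "1", "text": "Coffee exports from Brazil rose."}))
```

The hidden field would then be included in the fields searched at query time, so no synonym file is involved at all.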


On Jul 5, 2012, at 4:12 AM, Stephen Lacy wrote:

 When a user types in South America they want to be able to see documents
 containing Brazil, Chile etc.
 Now, I have already thrown together a list of countries and continents;
 however, I'm a little more ambitious.
 I would like to get a lot more regions, such as American states or
 former members of the USSR...
 Are there ready-made synonym files or taxonomies in a different format?
 Are synonyms the best way of achieving this? Perhaps there is a better way?
 Any pitfalls or advice on this subject from someone who has done this
 before would be appreciated.
 Thanks
 
 Stephen

Re: Synonyms and hyphens

2012-07-04 Thread Alireza Salimi

Hi,

Does anybody know why hyphen '-' and q.op=AND causes such a big difference
between the two queries? I thought hyphens are removed by StandardTokenizer
which means theoretically the two queries should be the same!

Thanks

On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.com wrote:

 Hi,

 I'm not sure if anybody has experienced this behavior before or not.
 I noticed that 'hyphen' plays a very important role here.
 I used Solr's default example directory.

 http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND
 results in parsedquery:+name:gb +name:gib +name:gigabyte
 +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes,

 While searching 
 http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND
 results in parsedquery:+(name:gb name:gib name:gigabyte
 name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes),

 If you look at the first query (the one with the hyphen), you can see
 that the parsed result is totally different. I know that hyphens are
 special characters in Solr, but there's no way that the first query
 returns any entry, because it's asking for ALL synonyms.

 Am I missing something here?

 Thanks


 --
 Alireza Salimi
 Java EE Developer





-- 
Alireza Salimi
Java EE Developer

Re: Synonyms and hyphens

2012-07-04 Thread Jack Krupansky

Terms with embedded special characters are treated as phrases with spaces in 
place of the special characters. So, gb-mb is treated as if you had 
enclosed the term in quotes.


-- Jack Krupansky
-Original Message- 
From: Alireza Salimi

Sent: Wednesday, July 04, 2012 6:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

Hi,

Does anybody know why hyphen '-' and q.op=AND causes such a big difference
between the two queries? I thought hyphens are removed by StandardTokenizer
which means theoretically the two queries should be the same!

Thanks


Re: Synonyms and hyphens

2012-07-04 Thread Alireza Salimi

Wow, I didn't know that. Is there a way to disable this feature? I mean, is
it something coming from the Analyzer?

On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky j...@basetechnology.com wrote:

 Terms with embedded special characters are treated as phrases with spaces
 in place of the special characters. So, gb-mb is treated as if you had
 enclosed the term in quotes.

 -- Jack Krupansky



-- 
Alireza Salimi
Java EE Developer

Re: Synonyms and hyphens

2012-07-04 Thread Jack Krupansky

There is one other detail that should clarify the situation. At query time, 
the query parser itself is breaking your query into space-delimited terms, 
and only calling the analyzer for each of those terms, each of which will be 
treated as if it were a quoted phrase. So it doesn't matter whether it is the 
standard analyzer or word delimiter filter or other filter that is breaking 
up the compound term.


And the default query operator only applies to the terms as the query 
parser parsed them, not for the sub-terms of a compound term like CD-ROM or 
gb-mb.


-- Jack Krupansky

-Original Message- 
From: Alireza Salimi

Sent: Wednesday, July 04, 2012 12:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

Wow, I didn't know that. Is there a way to disable this feature? I mean, is
it something coming from the Analyzer?


Re: Synonyms and hyphens

2012-07-04 Thread Alireza Salimi

OK, so how can I prevent this behavior?
As you can see the parsed query is very different in these two cases.

On Wed, Jul 4, 2012 at 1:37 PM, Jack Krupansky j...@basetechnology.com wrote:

 There is one other detail that should clarify the situation. At query
 time, the query parser itself is breaking your query into space-delimited
 terms, and only calling the analyzer for each of those terms, each of which
 will be treated as if a quoted phrase. So it doesn't matter whether it is
 the standard analyzer or word delimiter filter or other filter that is
 breaking up the compound term.

 And the default query operator only applies to the terms as the query
 parser parsed them, not for the sub-terms of a compound term like CD-ROM or
 gb-mb.


 -- Jack Krupansky





-- 
Alireza Salimi
Java EE Developer

Re: Synonyms and hyphens

2012-07-04 Thread Jack Krupansky

You could pre-process your queries to convert hyphen and other special 
characters to spaces.
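
A minimal sketch of such a pre-processing step, assuming it runs in the client before the query string is sent to Solr. The function name and the choice of separator characters (hyphen and slash) are assumptions; only separators that sit between two word characters are replaced, so a leading '-' used as the prohibit operator is left alone.

```python
import re

# Hyphens or slashes that sit between two word characters, e.g. gb-mb, CD-ROM.
_INNER_SEPARATORS = re.compile(r"(?<=\w)[-/]+(?=\w)")


def split_compound_terms(user_query):
    """Replace separators inside compound terms with a single space."""
    return _INNER_SEPARATORS.sub(" ", user_query)


if __name__ == "__main__":
    print(split_compound_terms("name:(gb-mb)"))
    print(split_compound_terms("-draft CD-ROM"))
```

After this rewrite the query parser sees gb and mb as two separate terms, so q.op applies between them instead of the analyzer turning the compound term into one phrase.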


-- Jack Krupansky

-Original Message- 
From: Alireza Salimi

Sent: Wednesday, July 04, 2012 12:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

OK, so how can I prevent this behavior?
As you can see the parsed query is very different in these two cases.


RE: synonyms

2012-05-03 Thread Noordeen, Roxy

Jack,
I am also using synonyms on the query side, but so far I can only get single 
words to work; multi-word synonyms are not working for me. I didn't want to use 
synonyms during indexing, to avoid reindexing.

Is there a way for Solr to support multi-word synonyms?
Ex: 
John Cena, John, Cena
Or
Triple H, DX, tripleh, hhh.

Thanks
Roxy


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Wednesday, May 02, 2012 8:53 PM
To: solr-user@lucene.apache.org
Subject: Re: synonyms

There are lots of different strategies for dealing with synonyms, depending 
on what exactly is most important and what exactly you are willing to 
tolerate.

In your latest example, you seem to be using string fields, which is 
somewhat different from the text synonyms we talk about in Solr. You can 
certainly have multiple string fields, or even a multi-valued string field 
to store variations on selected categories of terms. That works well when 
you have a well-defined number of categories. So, you can have a user query 
go against a combination of normal text fields and these category string 
fields.

If that is sufficient for your application, great.

-- Jack Krupansky

-Original Message- 
From: Carlos Andres Garcia
Sent: Wednesday, May 02, 2012 6:57 PM
To: solr-user@lucene.apache.org
Subject: RE: synonyms

Thanks for your answers. Now I have another question: if I develop a
filter to replace the current synonym filter, I understand that this
process would run at index time, because there are a lot of known problems
at query time. If so, how should I create my index?

For example:
I have two synonyms, Camp Nou and Cataluña, for barcelona in the database.


Option 1)
At index time, create 2 records like this:

<doc>
   <field>barcelona</field>
   <field>Camp Nou</field>
...
</doc>

and

<doc>
   <field>barcelona</field>
   <field>Cataluña</field>
...
</doc>

Option 2)

Or create only one record like this:

<doc>
   <field>barcelona</field>
   <field>Camp Nou,Cataluña</field>
...
</doc>


With option 1, I can search by Camp Nou and by Cataluña, but when I search
for barcelona Solr returns 2 records, and that is an error because there is
only one barcelona.

With option 2, I have to search with wildcards, for example *Camp Nou* or
*Cataluña*, and Solr would return one record; likewise, searching for
barcelona would return one record, which is good. But I want to know
whether this is the better option, or whether Solr has another feature that
can solve this in a better way.

Re: synonyms

2012-05-03 Thread Jack Krupansky

If query-side multi-term synonyms are important to your application, your 
best bet may be to implement a preprocessor that expands them to an OR 
sequence of phrases before submitting the query to Solr. That would also 
give you an opportunity to boost a preferred synonym.


For example, a user query of

abc John Cena xyz

would be preprocessed and sent to Solr as

abc ("John Cena" OR "Cena John") xyz

You could also consider using phrase slop to handle simple name reversal:

abc "John Cena"~1 xyz

That would also allow a middle initial or name, for example. But, you need 
to consider whether you really want that. The simple phrases give you 
explicit control.


If the synonym is in a phrase, you might need to consider re-generating the 
entire phrase:


abc "def John Cena uvw" xyz

to

abc ("def John Cena uvw" OR "def Cena John uvw") xyz

As a side note, the query parser in LucidWorks Enterprise (and LucidWorks 
Cloud) does support multi-term synonyms at query time for normal text 
fields, but it does so by bypassing the processing of the Solr synonym 
filter and simply using the synonym file to preprocess the query terms 
before completing the term analysis. But, that won't do you any good if you 
are not using the Lucid products.
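
The preprocessor idea can be sketched as below. This is an illustration under stated assumptions, not how any Solr or Lucid component works: SYNONYM_GROUPS and both function names are hypothetical, the input is assumed to be a plain user query without existing quotes or operators, and matching is a case-insensitive whole-word search for the known phrases.

```python
import re

# Hypothetical synonym groups; each group lists interchangeable phrases.
SYNONYM_GROUPS = [
    ["john cena", "cena john"],
    ["triple h", "dx", "tripleh", "hhh"],
]


def _quote(phrase):
    """Quote multi-word phrases so Solr treats them as phrase queries."""
    return '"%s"' % phrase if " " in phrase else phrase


def expand_multiword_synonyms(user_query, groups=SYNONYM_GROUPS):
    """Replace each known phrase with an OR group of all its synonyms."""
    lookup = {phrase: group for group in groups for phrase in group}
    # Longest phrases first, so multi-word entries win over shorter ones.
    phrases = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        group = lookup[match.group(1).lower()]
        return "(" + " OR ".join(_quote(p) for p in group) + ")"

    return pattern.sub(replace, user_query)


if __name__ == "__main__":
    print(expand_multiword_synonyms("abc John Cena xyz"))
```

A boost for the preferred synonym could be appended inside replace(), for example by adding ^2 to the first entry of each group.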


-- Jack Krupansky

-Original Message- 
From: Noordeen, Roxy

Sent: Thursday, May 03, 2012 9:08 AM
To: solr-user@lucene.apache.org
Subject: RE: synonyms

Jack,
I am also using synonyms on the query side, but so far I can only get 
single words to work; multi-word synonyms are not working for me. I didn't 
want to use synonyms during indexing, to avoid reindexing.


Is there a way for Solr to support multi-word synonyms?
Ex:
John Cena, John, Cena
Or
Triple H, DX, tripleh, hhh.

Thanks
Roxy



Re: synonyms

2012-05-02 Thread Jack Krupansky

I'm not sure I completely follow, but are you simply saying that you want to 
have a synonym filter that reads the synonym table from a database rather 
than the current text file? If so, sure, you could develop a replacement for 
the current synonym filter which loads its table from a database, but you 
would have to develop that code yourself (or get some assistance doing it.)


If that is not what you are trying to do, please explain in a little more 
detail.


-- Jack Krupansky

-Original Message- 
From: Carlos Andres Garcia

Sent: Wednesday, May 02, 2012 4:31 PM
To: solr-user@lucene.apache.org
Subject: synonyms

Hello everybody,

I have a question about synonyms in Solr. In our company we are 
looking for a solution that resolves synonyms from a database and not from a 
text file as SynonymFilterFactory does.


The idea is to save all the synonyms in the database and index them so they 
are ready at query time, but we haven't found a database-based solution.


Another idea is to create a plugin that extends SynonymFilterFactory, but I 
don't know if this is possible.


I hope someone can help me.

regards,

Carlos Andrés García García

RE: synonyms

2012-05-02 Thread Noordeen, Roxy

Another solution is to write a script to read the database and create the 
synonyms.txt file, dump the file to solr and reload the core.
This gives you the custom synonym solution.
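
A minimal sketch of such an export script, assuming a table named synonyms with columns term and synonym (both hypothetical) and using SQLite only so the example is self-contained; any database driver with the same query would do. It writes one comma-separated group per line, which is the equivalent-synonyms format that synonyms.txt accepts.

```python
import sqlite3


def export_synonyms(conn, path):
    """Write one comma-separated synonym group per line and return the group count."""
    rows = conn.execute(
        "SELECT term, synonym FROM synonyms ORDER BY term, synonym"
    ).fetchall()
    groups = {}
    for term, synonym in rows:
        groups.setdefault(term, []).append(synonym)
    with open(path, "w", encoding="utf-8") as out:
        for term, synonyms in groups.items():
            out.write(", ".join([term] + synonyms) + "\n")
    return len(groups)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE synonyms (term TEXT, synonym TEXT)")
    conn.executemany(
        "INSERT INTO synonyms VALUES (?, ?)",
        [("barcelona", "Camp Nou"), ("barcelona", "Cataluña")],
    )
    print(export_synonyms(conn, "synonyms.txt"))
```

After the file is copied into the core's conf directory, the core still has to be reloaded (for example through the CoreAdmin RELOAD action), and index-time synonyms only take effect for documents indexed after that.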

RE: synonyms

2012-05-02 Thread Carlos Andres Garcia

Thanks for your answers. Now I have another question: if I develop a filter
to replace the current synonym filter, I understand that this processing
would happen at index time, because query-time synonyms have a lot of known
problems. If so, how should I build my index?

For example:
I have two synonyms, Camp Nou and Cataluña, for barcelona in the database.


Option 1)
At index time, create 2 records like this:

<doc>
   <field>barcelona</field>
   <field>Camp Nou</field>
...
</doc>

and

<doc>
   <field>barcelona</field>
   <field>Cataluña</field>
...
</doc>

Option 2)

Or create only one record like this:

<doc>
   <field>barcelona</field>
   <field>Camp Nou,Cataluña</field>
...
</doc>


With option 1, I can search by Camp Nou and by Cataluña, but when I search
by barcelona Solr returns 2 records, which is wrong because there is only
one barcelona.

With option 2, I have to search with wildcards, for example *Camp Nou* or
*Cataluña*, and Solr would return one record; likewise, searching by
barcelona would return one record, which is good. But I want to know
whether this is the better option, or whether Solr has another feature
that handles this in a better way.

Re: synonyms

2012-05-02 Thread Jack Krupansky

There are lots of different strategies for dealing with synonyms, depending
on what exactly is most important and what exactly you are willing to
tolerate.


In your latest example, you seem to be using string fields, which is
somewhat different from the text synonyms we talk about in Solr. You can
certainly have multiple string fields, or even a multi-valued string field,
to store variations on selected categories of terms. That works well when
you have a well-defined number of categories. So, you can have a user query
go against a combination of normal text fields and these category string
fields.
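
As a rough illustration of that idea (a sketch only: the field names text and
place_alias are hypothetical, and the document is shown as a plain dict rather
than through any Solr client), one record can carry all variations in a
multi-valued string field, and the query can be OR-ed across both kinds of
field:

```python
# One record for barcelona; place_alias is an assumed multi-valued string field.
doc = {"id": "1", "name": "barcelona", "place_alias": ["Camp Nou", "Cataluña"]}


def build_query(user_input, text_field="text", alias_field="place_alias"):
    """OR a phrase match on the text field with an exact match on the alias field."""
    phrase = user_input.replace('"', '\\"')  # escape embedded double quotes
    return f'{text_field}:"{phrase}" OR {alias_field}:"{phrase}"'
```

Because the string field is not analyzed, the alias clause only matches the
stored value exactly, including case.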


If that is sufficient for your application, great.

-- Jack Krupansky

Re: synonyms

2012-05-02 Thread Sohail Aboobaker

I think a regular sync of the database table with the synonym text file is
the simplest of the solutions. It lets you use Solr natively without any
customization, and updating the synonyms file with entries from the database
is not a very complicated operation.

Re: Synonyms file in solr

2012-04-25 Thread Lee Carroll

Your examples are not synonyms, so I don't think synonyms.txt by itself
is going to work.
This sounds like tagging using a taxonomy. Values written to the field
storing this taxonomy could be like:

livingthing/animal/cat [doc about cats]
livingthing/animal/dog [doc about dogs]
livingthing/animal [doc about animals in general]
livingthing/animal/cat  livingthing/animal/dog [doc about cats and dogs]

If you need a free-text search solution rather than a metadata field
search as above, you will need to pre-process your docs, looking for
entities in your taxonomy and replacing the entity tokens with the above
taxonomic tokens, perhaps placing these into a specialist field for
searching. A Solr analysis chain which mimics such pre-processing may
get you some mileage, something like:

copyfield content -> taxoKeywords

taxoKeywords field analysis

tokenise
lowercase
minimal stem (I'm sure there is one; minimal English stem, I think it's called)
keepwords [cat dog animal livingthing]
synonym replacement [livingthing/animal/cat - cat,
livingthing/animal/dog - dog, etc]

I'd go for pre-processing outside of Solr, but the keepwords / synonyms
approach might work for you.
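
The pre-processing step outside of Solr could look roughly like this. It is a
sketch under assumptions: the TAXONOMY table, the function name and the crude
plural handling are illustrative, and multi-word entities such as
"living thing" or "baby shark" are not handled.

```python
import re

# Hypothetical taxonomy: entity token -> taxonomic token.
TAXONOMY = {
    "cat": "livingthing/animal/cat",
    "dog": "livingthing/animal/dog",
    "animal": "livingthing/animal",
}


def taxo_keywords(text, taxonomy=TAXONOMY):
    """Return the taxonomic tokens for entities mentioned in text, in order."""
    found = []
    for token in re.findall(r"[a-z]+", text.lower()):
        stem = token[:-1] if token.endswith("s") else token  # crude plural stem
        path = taxonomy.get(stem)
        if path and path not in found:
            found.append(path)
    return found
```

The returned tokens would be written to the taxoKeywords field at index time;
a search for a broader term then becomes a match on the shorter path.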

cheers lee c

On 23 April 2012 09:34, Guys paul.albare...@gmail.com wrote:
 I have some problems with the synonyms file; it seems I can't make it work
 the way I want.

 Here is an example:

 I have these words: cat, animal, dog, living thing, baby shark

 If I search for animal OR animals, I'd like to get the results for cat,
 animal, dog and baby shark, as well as their plurals cats, dogs, animals and
 baby sharks.

 If I search for cat, I only want the results with cat or cats. Same for dog.

 If I search for living thing, I want the results with living thing, living
 things, animal or animals. So no dogs, cats...

 So the words are in a hierarchy: living thing(s) -> animal(s) -> [dog(s),
 cat(s), baby shark(s)]

 I've tried a lot of things but I can't get the results I want, and I really
 need your help :-(


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Synonyms-file-in-solr-tp3931838p3931838.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms with dashes '-'

2011-12-06 Thread Erick Erickson

Details matter. Your analysis chain on the field may well
be the issue.

Look at the terms in the field (admin/schema browser).
Look at debugQuery=on to see how the query is parsed
Look at the admin/analysis page to see the effects of the analysis chain.

You might review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Mon, Dec 5, 2011 at 11:08 AM, Zoran | Bax-shop.nl
zoran.bi...@bax-shop.nl wrote:
 Hello,

 When I add a synonym to synonyms.txt it works fine. For example:
 foo => bar (when searching for foo, bar is also found)

 But this won't work (assume bar-bar is indexed somewhere):
 foo => bar-bar

 What should I do to enable searching for synonyms with dashes in them?

 Thank you,

 Zoran

Re: Synonyms 1 fetching 2001, how to avoid

2011-11-25 Thread Erick Erickson

Please review:
http://wiki.apache.org/solr/UsingMailingLists

You haven't shown the relevant parts of your configs.
You haven't shown the queries you're using, with debugQuery=on
You haven't shown the input
You haven't explained why you think synonyms have anything
 to do with the problem.

So it's really hard to say much of anything.

Best
Erick

On Wed, Nov 23, 2011 at 6:30 PM, RaviWhy raveend...@yahoo.com wrote:
 Hi,

 I am searching on movie titles, with a synonyms text file that maps 1,one.

 With this, when I search for '1' I expect '1 in kind', but I am
 getting results with titles like '2001: My year'.

 I am using a query-time analyzer with

 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" expand="true" />

 I am going to try with expand=false. Is there anything else I need to look at?


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Synonyms-1-fetching-2001-how-to-avoid-tp3532398p3532398.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Synonyms problem

2011-09-07 Thread Ahmet Arslan

Simply put, multi-word synonyms are best applied at index time.

As explained here: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
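
Why the query-time expansion in the quoted message fails can be sketched with
a simplified model (an illustration only, not Solr's actual code): the filter
overlays the tokenized alternatives position by position, so the query no
longer knows which tokens belonged to the same phrase.

```python
from itertools import zip_longest


def stack_by_position(alternatives):
    """Overlay tokenized alternatives position by position, dropping gaps."""
    columns = zip_longest(*alternatives)
    return [[tok for tok in col if tok is not None] for col in columns]


def as_query(field, positions):
    """Render each position as a parenthesised group of alternatives."""
    groups = " ".join("(" + " ".join(p) + ")" for p in positions)
    return f"{field}:{groups}"


alternatives = [
    ["high", "school", "lissabon"],
    ["high", "school", "barcelona"],
    ["university", "of", "applied", "science"],
]
positions = stack_by_position(alternatives)
```

Rendering positions with as_query("schools", positions) gives the same shape
as the debugQuery output in the quoted message: one group per position, mixing
tokens from all three phrases.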

--- On Wed, 9/7/11, roySolr royrutten1...@gmail.com wrote:

 From: roySolr royrutten1...@gmail.com
 Subject: Synonyms problem
 To: solr-user@lucene.apache.org
 Date: Wednesday, September 7, 2011, 1:46 PM
 Hello,

 I have some problems with synonyms. Here are some examples to describe
 the problem:

 Data:

 High school Lissabon
 High school Barcelona
 University of applied science

 When a user searches for IFD I want all of these results back, so I want to
 use this synonym at query time:

 IFD => high school lissabon, high school barcelona, University of applied
 science

 The data is stored in the field schools.

 Schools type looks like this:

 <fieldType name="schools" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="\s|,|-" />
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="\s|,|-" />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="false"/>
   </analyzer>
 </fieldType>

 As you can see, I use a pattern tokenizer which splits on whitespace. When
 I use the synonyms at query time, the analysis page shows me this:

 high       | school | lissabon  | science
 high       | school | barcelona |
 university | of     | applied   |

 When I search for IFD I get no results. I found this in debugQuery:

 schools:(high high university) (school school of) (lissaban barcelona
 applied) (science)

 This shows the problem: Solr tries a lot of combinations, but not the
 right one. I thought I could escape the whitespace in the synonyms
 (High\ school\ Lissabon). Then the analysis page shows me better results:

 High school Lissabon
 High school Barcelona
 University of applied science

 Then Solr searches for high school Lissabon, but in my index it is tokenized
 on whitespace, so still no results.

 I'm stuck, can someone help me?

 Thanks
 R

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Synonyms-problem-tp3316287p3316287.html
 Sent from the Solr - User mailing list archive at
 Nabble.com.

RE: Synonyms Not Working when using SRC DEST

2011-09-07 Thread Jaeger, Jay - DOT

 I have a very huge schema spanning up to 10K lines , if I use query time it
 will be huge hit for me because one term will be mapped to multiple terms .
 similar in the case of allergy

I think maybe you mean synonym file, rather than the schema?  I doubt that the 
number of lines matters all that much, though undoubtedly some.  I expect that 
Solr loads that synonym file into some kind of hash map, rather than searching 
it linearly -- though I have not looked at the code for that.

 I replace allergy during the index with doctors , So it shouldn't be part of
 the document ?

Yes indeed, doctors would be in the index, and would give you a hit on that 
document when searched.  But because your synonym file specifies replacement, 
that means that allergy is *NOT* part of the index, hence, when you searched on 
allergy, you got no results.

As far as synonym expansion being a huge hit, no, not really, I think.  
Besides, if you are not getting what you want or need, speed becomes pretty 
much irrelevant.  We did some performance testing:  modest single server (i.e., 
a laptop running Windows XP with only 2GB total memory available), pretty much 
configured out of the box with jetty, except that we added waffle 
authentication.  The data was names, addresses and the like (not text) -- 7+ 
million rows, with considerable synonym expansion:  200 first name synonyms, 
433 last name synonyms, expanded at both index time and search time.

We then did a search test driven from those same synonyms files, by randomly 
picking out a name from the first and last name list, the idea being that most 
likely names did have some synonyms.

Under Solr 3.1, once the OS file system cache got some entries in there,
running with 8 concurrent client search threads sending HTTP search requests
(done in Perl), we averaged about 0.50 seconds per request, or over 55,000
searches per hour.
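
As a quick sanity check of those figures (assuming each of the 8 client
threads sends its next request as soon as the previous one returns, which the
message does not state), the arithmetic is:

```python
def searches_per_hour(threads, seconds_per_request):
    """Closed-loop throughput: each thread issues requests back to back."""
    return threads / seconds_per_request * 3600


# 8 threads at 0.5 s per request is 16 requests per second.
estimate = searches_per_hour(8, 0.5)
```

That upper bound of 57,600 per hour is consistent with the "over 55,000"
reported above.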

JRJ

RE: Synonyms Not Working when using SRC DEST

2011-09-07 Thread Jaeger, Jay - DOT

Also, just to make one thing a bit more clear: you can specify two
different kinds of entries in synonym files.  See
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
(solr.SynonymFilterFactory)


One is replacement, where the words before the => are *replaced* by the right
hand side, i.e., the words on the left hand side disappear.  This is what you
are currently doing according to your original message:

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit



The other is equivalence, where each term is expanded into the entire list, if
you do the following, with expand set to true:

#Equivalent synonyms may be separated with commas and give
#no explicit mapping.  In this case the mapping behavior will
#be taken from the expand parameter in the schema.  This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos



So, if instead of:

allergy test => Doctors, Doctors-Medical, PHYSICIANS, Physicians & Surgeons

You specified

allergy test => allergy test, Doctors, Doctors-Medical, PHYSICIANS, Physicians & Surgeons

Or

allergy test, Doctors, Doctors-Medical, PHYSICIANS, Physicians & Surgeons

with expand set to true, then you might get the behavior you desire:
"allergy test" would get indexed, along with Doctors and all of the rest.
The difference is that in the second case, any of those terms (e.g.
Doctors) would also get indexed as "allergy test", which might not be what you
desire, in which case the first one would do what you want.

I expect that all you really need to do is:

allergy test => allergy test, Doctors, Doctors-Medical, PHYSICIANS, Physicians & Surgeons

to solve your problem.
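
The two kinds of entry can be modelled in a few lines. This is a simplified
sketch of the documented behavior (explicit mappings are written with => in
synonyms.txt), not the filter's real parser; parse_rule is a made-up name.

```python
def parse_rule(line, expand=True):
    """Map each input phrase of one synonyms.txt line to its output phrases."""
    def phrases(part):
        return [p.strip().lower() for p in part.split(",") if p.strip()]

    if "=>" in line:
        # Explicit mapping: the left side is replaced by the right side,
        # regardless of expand.
        lhs, rhs = line.split("=>", 1)
        inputs, outputs = phrases(lhs), phrases(rhs)
    else:
        inputs = phrases(line)
        # expand=True: every term maps to the whole group;
        # expand=False: every term is reduced to the first one.
        outputs = inputs if expand else inputs[:1]
    return {term: list(outputs) for term in inputs}
```

With the rule "allergy test => Doctors" the phrase allergy test maps only to
doctors; repeating it on the right-hand side keeps it in the output as well.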

JRJ

Re: Synonyms Not Working when using SRC DEST

2011-09-07 Thread balaji

 So, if instead of:

 allergy test => Doctors, Doctors-Medical, PHYSICIANS, Physicians & Surgeons

 You specified


 allergy test => allergy test, Doctors, Doctors-Medical, PHYSICIANS,
 Physicians & Surgeons


I followed the above approach, allergy test => allergy test, Doctors,
Doctors-Medical, PHYSICIANS, Physicians & Surgeons, and it works as
expected. Thanks for making it clearer.

Thanks
Balaji


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Synonyms-Not-Working-when-using-SRC-DEST-tp3313862p3316691.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Synonyms Not Working when using SRC DEST

2011-09-06 Thread Chris Hostetter


: *allergy test => Doctors, Doctors-Medical, PHYSICIANS, Physicians &
: Surgeons
..
: <analyzer type="index">
...
: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
: ignoreCase="true" expand="true"/>
...
: But when I do a search for allergy , I get 0 results

You've configured your field so that any time the terms "allergy"
and "test" appear in sequence in a field value you index, those terms are
removed and replaced by new terms (Doctors, Doctors-Medical, etc...)

So if the term "allergy" only appears in the source text followed by the
term "test" then it will never actually be indexed in your document, so a
search for it will never match.
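
A small simulation makes this visible. It is a sketch, not the real filter
(which stacks alternatives at overlapping positions rather than flattening
them); apply_rules and the sample text are invented for illustration.

```python
def apply_rules(tokens, rules):
    """Greedy longest-match replacement of token sequences (flattened output)."""
    out, i = [], 0
    max_len = max((len(k) for k in rules), default=1)
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])  # matched sequence is replaced
                i += n
                break
        else:
            out.append(tokens[i])  # no rule matched at this position
            i += 1
    return out


replace_only = {("allergy", "test"): ["doctors", "physicians"]}
keep_original = {("allergy", "test"): ["allergy", "test", "doctors", "physicians"]}
indexed = apply_rules("free allergy test today".split(), replace_only)
```

With replace_only the indexed terms no longer contain allergy, so a search for
it cannot match; with keep_original the original terms survive alongside the
added ones.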

You can see this exact behavior in the screen shot you posted of the 
analysis tool...

: http://lucene.472066.n3.nabble.com/file/n3313862/Screenshot-1.png 

...after the synonym filter, the term allergy is not in your indexed
terms.

: when i change the synonym file to a comma separated I am able to see the
: results

because when using a comma instead of => you are saying if any of these
term sequences exist, expand it to *all* of these term sequences.

Please note the docs on SynonymFilter, particularly the examples...

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

-Hoss

Re: Synonyms Not Working when using SRC DEST

2011-09-06 Thread balaji

Hi Chris

The terms Doctors, Doctors-Medical are all present in my document body,
title fields etc., but Allergy Test is not. So what I am doing in the synonym
file is: if a user searches for allergy test, bring me results that match
Doctors etc., i.e.
explicit mappings match any token sequence on the LHS of => and replace
with all alternatives on the RHS.

So when I search for allergy test it should map to doctors and
bring me results, but it is not mapping. Is there any way I can make it
work?

Hope it clarifies 


Thanks
Balaji



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Synonyms-Not-Working-when-using-SRC-DEST-tp3313862p3314222.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Synonyms Not Working when using SRC DEST

2011-09-06 Thread Jaeger, Jay - DOT

It won't work given your current schema.  To get the desired results, you would 
need to expand your synonyms at both index AND query time.  Right now your 
schema seems to specify it only at index time.

So, as the other respondent indicated, currently you replace allergy with the 
other list when indexing, and since allergy is not replaced during query, it 
gets no hits.

It almost sounds like a case where you could consider synonym expansion only at 
query time, rather than at index time (though that is usually not advisable for 
reasons discussed on the Wiki).  Then Allergy would get expanded during a 
search, and hit the documents with Doctors, etc.

JRJ

Re: Synonyms Not Working when using SRC DEST

2011-09-06 Thread balaji

 It won't work given your current schema.  To get the desired results, you
 would need to expand your synonyms at both index AND query time.  Right now
 your schema seems to specify it only at index time.


I have a very large schema spanning up to 10K lines. If I expand at query
time it will be a huge hit for me, because one term will be mapped to
multiple terms, as in the case of allergy.

I don't want to go with comma-separated entries, as they will give
some erroneous results; moreover, allergy and doctors are not equivalent
terms to be listed together with commas.



 So, as the other respondent indicated, currently you replace allergy with
 the other list when indexing, and since allergy is not replaced during
 query, it gets no hits.


I replace allergy with doctors at index time, so shouldn't it be part of
the document?


Thanks
Balaji


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Synonyms-Not-Working-when-using-SRC-DEST-tp3313862p3315287.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms vs replacements

2011-08-29 Thread Erick Erickson

See here about the multi-word problem:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

As for the rest, it's a tradeoff (surprise, surprise, surprise <G>).

You're right, expanding at index time leads to a somewhat
larger index, but less complex queries. And if you change
your synonyms file, you need to re-index from scratch.

Expanding at query time lets you keep your synonyms up to
date. But the queries are more complex and somewhat
slower...

Which is better depends (tm), so pick your poison. One
strategy is to expand at index time, and *also* expand
at query time, but with a different synonym file. The idea
is that your query-time synonym file is the set of terms that
you want to add to your index-time expansion next
time you can re-index from scratch. Then periodically you
merge your query-time syns into your index-time syns, re-index
from scratch and empty your query-time syns. Rinse, repeat.
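
The periodic merge step could be as small as this. It is a sketch with made-up
names that treats each file as a list of lines and only removes exact
duplicates.

```python
def merge_synonym_files(index_lines, query_lines):
    """Fold query-time rules into the index-time list and empty the former."""
    merged = list(index_lines)
    seen = {line.strip() for line in index_lines}
    for line in query_lines:
        rule = line.strip()
        # Skip blanks, comments and rules already present at index time.
        if rule and not rule.startswith("#") and rule not in seen:
            merged.append(rule)
            seen.add(rule)
    return merged, []  # new index-time rules, emptied query-time rules
```

After writing both files back, the re-index from scratch picks up the merged
rules.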

So, there isn't really a right answer. Personally I prefer to
expand at index time, but that's largely a preference.

Best
Erick

On Fri, Aug 26, 2011 at 4:52 PM, Robert Petersen rober...@buy.com wrote:
 Hello all,

 Which is better?  Say you add an index-time synonym between nunchuck
 and nunchuk; then both words will be in the document and both will be
 searchable.  I can get the exact same behavior by putting in an index-time
 replacement of nunchuck => nunchuk and a search-time replacement of the
 same.

 I figured the replacement strategy keeps the index size slightly
 smaller by only having the one term in the index, but the synonym
 strategy only requires you to update the master, not the slave farm, and
 requires slightly less work for the searchers during a user query.  Are
 there any other considerations I should be aware of?

 Thanks

 BTW nunchuk is the correct spelling.  :)

Re: synonyms problem

2011-06-06 Thread Erick Erickson

What does call synonym methods in Java mean? That is, what are
you trying to accomplish and from where?

Best
Erick

On Sun, Jun 5, 2011 at 9:48 PM, deniz denizdurmu...@gmail.com wrote:
 well i have changed it into text... but still confused about how to use
 synonyms...

 and also I want to know how to call synonym methods in java... i have tried
 to use synonymmap and some other similar things but nothing happens...
 anyone can give me a sample or a website that i can find examples about solr
 in java?

 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3028353.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms problem

2011-06-06 Thread deniz

well i was trying to say that; i have changed the config files for synonyms
and so on but nothing happens so i thought i needed to do something in java
code too... i was trying to ask about that...

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3032666.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms problem

2011-06-05 Thread deniz

well i have changed it into text... but still confused about how to use
synonyms... 

and also I want to know how to call synonym methods in java... i have tried
to use synonymmap and some other similar things but nothing happens...
anyone can give me a sample or a website that i can find examples about solr
in java?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3028353.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms problem

2011-06-02 Thread Gora Mohanty

On Thu, Jun 2, 2011 at 11:58 AM, deniz denizdurmu...@gmail.com wrote:
 Hi all,

 here is a piece from my solfconfig:
[...]
 but somehow synonyms are not read... I mean there is no match when i use a
 word in the synonym file... any ideas?
[...]

Please provide further details, e.g., is your field in schema.xml using
this fieldType, one example line from the synonyms.txt file, how are
you searching, what results you expect to get, and what are the actual
results.

Also, while this is not the issue here, normally the fieldType
string is a non-analyzed field, and one would normally use
a different fieldType, e.g., text for data that are to be analyzed.

Regards,
Gora

Re: synonyms problem

2011-06-02 Thread lee carroll

Deniz,

It looks like you are missing an index analyzer? Or have you removed
that for brevity?

lee c

On 2 June 2011 10:41, Gora Mohanty g...@mimirtech.com wrote:
 On Thu, Jun 2, 2011 at 11:58 AM, deniz denizdurmu...@gmail.com wrote:
 Hi all,

 here is a piece from my solfconfig:
 [...]
 but somehow synonyms are not read... I mean there is no match when i use a
 word in the synonym file... any ideas?
 [...]

 Please provide further details, e.g., is your field in schema.xml using
 this fieldType, one example line from the synonyms.txt file, how are
 you searching, what results you expect to get, and what are the actual
 results.

 Also, while this is not the issue here, normally the fieldType
 string is a non-analyzed field, and one would normally use
 a different fieldType, e.g., text for data that are to be analyzed.

 Regards,
 Gora

Re: synonyms problem

2011-06-02 Thread lee carroll

Oh, and it's a string field; change this to a text field if you need analysis:

class=solr.StrField

lee c

On 2 June 2011 11:45, lee carroll lee.a.carr...@googlemail.com wrote:
 Deniz,

 it looks like you are missing an index anlayzer ? or have you removed
 that for brevity ?

 lee c

 On 2 June 2011 10:41, Gora Mohanty g...@mimirtech.com wrote:
 On Thu, Jun 2, 2011 at 11:58 AM, deniz denizdurmu...@gmail.com wrote:
 Hi all,

 here is a piece from my solfconfig:
 [...]
 but somehow synonyms are not read... I mean there is no match when i use a
 word in the synonym file... any ideas?
 [...]

 Please provide further details, e.g., is your field in schema.xml using
 this fieldType, one example line from the synonyms.txt file, how are
 you searching, what results you expect to get, and what are the actual
 results.

 Also, while this is not the issue here, normally the fieldType
 string is a non-analyzed field, and one would normally use
 a different fieldType, e.g., text for data that are to be analyzed.

 Regards,
 Gora

Re: synonyms problem

2011-06-02 Thread François Schiettecatte

Are you sure solr.StrField is the way to go with this? solr.StrField stores the 
entire text verbatim and I am pretty sure skips any analysis. Perhaps you 
should use solr.TextField instead.

François

On Jun 2, 2011, at 2:28 AM, deniz wrote:

 Hi all,

 here is a piece from my solrconfig:

 <fieldType name="string" class="solr.StrField" sortMissingLast="true"
            omitNorms="true">
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
   </analyzer>
 </fieldType>


 but somehow synonyms are not read... I mean there is no match when I use a
 word in the synonym file... any ideas?
 
 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3014006.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms problem

2011-06-02 Thread deniz

Oh, thank you for reminding me about the string and text issue... I will change
it ASAP... and about the index analyzer, I just removed it for brevity...

i will try again and if it fails will post here again...

thank you so much

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3018185.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Synonyms valid only in specific categories of data

2011-06-01 Thread lee carroll

I don't think you can assign a synonyms file dynamically to a field.
You would need to create a separate field for each language / category
combination, each referencing its own synonyms file. That would
be a lot of fields.



On 1 June 2011 09:59, Spyros Kapnissis ska...@yahoo.com wrote:
 Hello to all,


 I have a collection of text phrases in more than 20 languages that I'm indexing
 in Solr. Each phrase belongs to one of about 30 different phrase categories. I
 have specified different fields for each language and added a synonym filter at
 query time. I would however like the synonym filter to take into account the
 category as well. So, a specific synonym should be valid and used only in one or
 more categories per language. (The category is indexed in another field.)

 Is this somehow possible in the current SynonymFilterFactory implementation?

 Hope it makes sense.

 Thank you,
 Spyros

Re: Synonyms valid only in specific categories of data

2011-06-01 Thread Spyros Kapnissis

Yes that would probably be a lot of fields.. I guess a way would be to extend 
the SynonymFilter and change the format of the synonyms.txt file to take the 
categories into account. 


Thanks again for your answer.





Re: Synonyms: whitespace problem

2011-03-30 Thread royr

Thanks, it works!


Re: Synonyms: whitespace problem

2011-03-25 Thread Ahmet Arslan

 I have a problem with the synonyms. SOLR strips the
 synonyms on white space.
 An example:
 
 manchester united, reds, manunited
 
 My index looks like this:
 
 manchester
 united
 red
 manunited
 
 i want this:
 manchester united
 red
 manunited


You can escape the whitespace with a backslash:

manchester\ united, reds, manunited
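
To make the effect of the escape concrete, here is a small Python sketch. It is only a toy model of the idea (entries separated by commas, tokens separated by whitespace, a backslash making the next character literal) and not Solr's actual parser; the function name parse_entries is made up for illustration.

```python
def parse_entries(line):
    """Return one token list per comma-separated entry of a synonyms line.

    A backslash makes the next character literal, so an escaped space
    stays inside a single token instead of splitting the entry there.
    """
    entries, tokens, current, escaped = [], [], [], False
    for ch in line:
        if escaped:
            current.append(ch)          # keep the escaped character as-is
            escaped = False
        elif ch == "\\":
            escaped = True              # the next character is literal
        elif ch == "," or ch.isspace():
            if current:                 # close the token being built
                tokens.append("".join(current))
                current = []
            if ch == ",":               # a comma also closes the entry
                entries.append(tokens)
                tokens = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    entries.append(tokens)
    return entries


unescaped = parse_entries("manchester united, reds, manunited")
escaped = parse_entries(r"manchester\ united, reds, manunited")
```

In this model the unescaped line gives a two-token first entry (manchester, united), while the escaped line gives the single token "manchester united".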

Re: Synonyms question

2011-03-08 Thread Jan Høydahl

http://lmgtfy.com/?q=solr+synonym

(First hit gives many examples)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8. mars 2011, at 10.06, Darx Oman wrote:

 Hi guys
 
 How to put this in synonyms.txt
 
 US
 
 USA
 
 United States of America

Re: synonyms file, and example cases

2011-01-25 Thread Stefan Matheis

Cam,

the examples with the provided inline-documentation should help you, no?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

The backslash \ in that context looks like an escape character, to keep
the => from being interpreted as the mapping operator.
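
As a rough illustration of the two kinds of lines in that file, here is a Python sketch that tells an explicit mapping (left => right) from a plain comma-separated group, while leaving escaped \=> and \, alone. It is a simplified model under my own assumptions, not the real SynonymFilterFactory parser; split_unescaped, unescape and parse_rule are hypothetical helper names.

```python
def split_unescaped(text, sep):
    """Split text on sep, ignoring any separator that follows a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])   # keep the escape pair for later
            i += 2
        elif text.startswith(sep, i):
            parts.append("".join(current))
            current = []
            i += len(sep)
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def unescape(term):
    """Drop each backslash, keep the character after it, trim outer spaces."""
    out, i = [], 0
    while i < len(term):
        if term[i] == "\\" and i + 1 < len(term):
            out.append(term[i + 1])
            i += 2
        else:
            out.append(term[i])
            i += 1
    return "".join(out).strip()


def parse_rule(line):
    """Classify a line as ("map", lhs, rhs) or ("group", terms)."""
    sides = split_unescaped(line, "=>")
    if len(sides) == 2:
        lhs = [unescape(t) for t in split_unescaped(sides[0], ",")]
        rhs = [unescape(t) for t in split_unescaped(sides[1], ",")]
        return ("map", lhs, rhs)
    return ("group", [unescape(t) for t in split_unescaped(line, ",")])
```

With this model, a line such as a\=>a => b\=>b is read as a mapping from the literal term "a=>a" to the literal term "b=>b", and fooaaa,baraaa,bazaaa is read as one group of three equivalent terms.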

Regards
Stefan

On Tue, Jan 25, 2011 at 2:31 AM, Cam Bazz camb...@gmail.com wrote:

 Hello,

 I have been looking at the solr synonym file that was an example, I
 did not understand some notation:

 aaa => aaaa

 bbb => bbbb1 bbbb2

 ccc => cccc1,cccc2

 a\=>a => b\=>b

 a\,a => b\,b

 fooaaa,baraaa,bazaaa

 The first one says search for aaaa when the query is aaa. Am I correct?
 The second one finds bbbb1 bbbb2 when the query is bbb.
 The third one finds cccc1 or cccc2 when the query is ccc.

 The fourth and fifth ones I have not understood.

 The last one, I assume, is a group: a bidirectional mapping between
 fooaaa,baraaa,bazaaa

 I am especially interested in this last one: if I write aaa,bbb, will it
 find aaa and bbb when either aaa or bbb is queried?

 Am I correct in those assumptions?

 Best regards,
 C.B.

Re: Synonyms at index time

2011-01-11 Thread Grant Ingersoll


On Jan 10, 2011, at 10:57 PM, TxCSguy wrote:

 
 Hi,
 
 I'm not sure if this question is better posted in Solr - User or Solr - Dev,
 but I'll start here.
 
 I'm interested to find some documentation that describes in detail how
 synonym expansion is handled at index time.  
 http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/
 This  article explains what the index looks like for three example
 documents.  However, I'm looking for some documentation about what the index
 (the inverted index) looks like when synonyms are thrown into the mix.  

Synonyms are injected by the token filter and appear as any other tokens.  
Usually they are at the same position as the original word.  Try using Solr's 
Analysis tool (via the admin) to see what it looks like.
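
A toy Python sketch of what that looks like in an inverted index, assuming single-word synonyms and plain whitespace tokenization (both simplifications; analyze and inverted_index are made-up names, not Lucene APIs). The injected synonym is recorded at the same position as the word it was derived from:

```python
def analyze(text, synonyms):
    """Return (position, token) pairs; a synonym shares its source word's position."""
    out = []
    for position, word in enumerate(text.lower().split(), start=1):
        out.append((position, word))
        for syn in synonyms.get(word, []):
            out.append((position, syn))     # same position, i.e. increment 0
    return out


def inverted_index(docs, synonyms):
    """Map each token to the (doc_id, position) pairs where it occurs."""
    index = {}
    for doc_id, text in docs.items():
        for position, token in analyze(text, synonyms):
            index.setdefault(token, []).append((doc_id, position))
    return index


index = inverted_index({1: "Buy a TV"}, {"tv": ["television"]})
```

Here both "tv" and "television" end up with the posting (doc 1, position 3), which is why a phrase query containing either word can still line up with its neighbours.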


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search

Re: synonyms database

2010-12-23 Thread lee carroll

Hi ramzesua,
Synonym lists will often be application specific and will of course be
language specific. Given this, I don't think you can talk about a generic
Solr synonym list; it just won't be very helpful in lots of cases.

What are you hoping to achieve with your synonyms for your app?




On 23 December 2010 11:50, ramzesua michaelnaza...@gmail.com wrote:


 Hi all. Where can I get synonyms database for Solr?

Re: synonyms not working with copyfield

2010-05-17 Thread Chris Hostetter


: fields during indexing. However, my search interface is just a text
: box like Google and I need to take the query and return only those
: documents that match ALL terms in the query and if I am going to take

as mentioned previously in this thread: this is exactly what the dismax 
QParser was designed for.
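
For reference, a sketch of the request parameters such a query might use, built in Python. defType, qf and mm are standard dismax parameters; the field name "title" is a made-up second field for illustration ("sngr" is the field from this thread), and mm=100% is my assumption for expressing "all terms must match":

```python
from urllib.parse import urlencode


def dismax_params(user_query, fields):
    """Build a query string that searches several fields and requires all terms."""
    return urlencode({
        "defType": "dismax",        # use the dismax query parser
        "q": user_query,            # raw text from the search box
        "qf": " ".join(fields),     # fields to search, each with its own analysis
        "mm": "100%",               # minimum-should-match: every term must match
    })


params = dismax_params("bob dylan", ["sngr", "title"])
```

Each term may match in any of the qf fields, so every field keeps its own analysis chain while the query as a whole still behaves like an AND.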

-Hoss

Re: synonyms not working with copyfield

2010-05-13 Thread Gary

Hi Surajit
I'm not sure if this is any help, but I had a similar problem with stop
words: they were not working with dismax queries. To cut a long story short, it
seems that all the queried fields need to be configured with stopwords.

Maybe the same applies to the synonyms configuration, in which case your
copyField destination should be defined with a type that is configured with the
SynonymFilterFactory, just like person_name.

You can find some guidance here:

http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/

Gary

Re: synonyms not working with copyfield

2010-05-13 Thread Ahmet Arslan

 I have indexed person names in solr using synonym expansion
 and am getting a
 match when I explicitly use that field in my query
 (name:query). However,
 when I copy that field into another field using copyfield
 and search on that
 field, I don't get a match. Below are excerpts from
 schema.txt. I am new to
 Solr and appreciate any help! Thanks.
 
 Surajit
 
 <fieldType name="person_name" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="0" catenateWords="1"
             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="person-synonyms.txt" ignoreCase="true" expand="true"/>
   </analyzer>
 </fieldType>

 <field name="sngr" type="person_name" multiValued="true" indexed="true"
        stored="true" required="false"/>

 <field name="text" type="text" indexed="true" stored="true" multiValued="true"/>

 <copyField source="sngr" dest="text"/>

copyField just copies the raw text, i.e., before any analysis. Do you have a
<filter class="solr.SynonymFilterFactory" synonyms="person-synonyms.txt"
ignoreCase="true" expand="true"/> in your text fieldType definition?

Re: synonyms not working with copyfield

2010-05-13 Thread surajit


Thanks much! I added a synonym filter to the copyfield and it started working
which is good, but the different fields that I copy into the copyfield need
different analysis and I no longer am able to do that. I can, of course,
search against the individual fields instead of the copyfield, but I want to
return a match only if ALL terms in the query are matched in the overall
document (as in an AND) and if I search against individual fields I am not
sure of an easy way to figure out if all terms matched in the overall
document. Any ideas?

surajit

Re: synonyms not working with copyfield

2010-05-13 Thread Sachin


Take a look at the DisMaxRequestHandler:

http://wiki.apache.org/solr/DisMaxRequestHandler

 


 

 


Re: synonyms not working with copyfield

2010-05-13 Thread Chris Hostetter

: which is good, but the different fields that I copy into the copyfield need
: different analysis and I no longer am able to do that. I can, of course,

Fundamentally, Solr can only apply a single analysis chain to all of
the text in a given field -- regardless of where it may be copied from.
If it didn't, there would be no way to get matches at query time.

The query analysis has to make sense for the index analysis, so it has
to be consistent.
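
A toy Python model of why the two sides have to agree (this is not Solr code; make_analyzer and matches are illustrative names, and the "analysis" here is just lowercasing plus single-word synonym injection):

```python
def make_analyzer(synonyms=None, lowercase=True):
    """Build a tiny analyzer: optional lowercasing plus synonym injection."""
    synonyms = synonyms or {}

    def analyze(text):
        tokens = set()
        for word in text.split():
            word = word.lower() if lowercase else word
            tokens.add(word)
            tokens.update(synonyms.get(word, []))
        return tokens

    return analyze


def matches(index_analyzer, query_analyzer, doc_text, query_text):
    """True if every analyzed query token is among the indexed tokens."""
    indexed = index_analyzer(doc_text)
    return all(token in indexed for token in query_analyzer(query_text))


index_time = make_analyzer(synonyms={"bob": ["robert"]})
consistent_query = make_analyzer()                  # lowercases, like the index
inconsistent_query = make_analyzer(lowercase=False)

hit = matches(index_time, consistent_query, "Bob Dylan", "Robert")
miss = matches(index_time, inconsistent_query, "Bob Dylan", "Robert")
```

The consistent query analyzer produces "robert", which is in the index; the inconsistent one produces "Robert", which was never indexed in that form, so the same document is missed.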



-Hoss

Re: synonyms not working with copyfield

2010-05-13 Thread surajit

Understood and I can work with that limitation by using separate
fields during indexing. However, my search interface is just a text
box like Google and I need to take the query and return only those
documents that match ALL terms in the query and if I am going to take
the query and match it against each field (separately), how do I get
back documents matching all user terms? One soln I can think of is to
take all the field-specific analysis out of solr and do it as a
pre-process step, but want to make sure there isn't an alternative
within Solr.

surajit


Re: synonyms not working with copyfield

2010-05-13 Thread Nick Martin

Hi,

You could use a copyField against all fields and then AND the query terms
given. Quite restrictive but all terms would then have to be present to match.
I'm still a relative newbie to Solr so perhaps I'm horribly wrong.

Cheers

Nick


Re: synonyms problem

2010-03-22 Thread Armando Ota


Have you tried increasing memory size ?

we had some out of memory problems when we used default memory size ..

Kind regards

Armando

michaelnazaruk wrote:

Hi all! I have a little problem with synonyms:
when I set my synonyms.txt file such as:
aberrant=>abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
it's all right! But if I set this file such as
aberrant,abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
I get an exception saying there is not enough memory

Re: synonyms problem

2010-03-22 Thread Lance Norskog

How large is the document, and how often does 'aberrant' appear in it?
Are the other words also in the document?

What is the full analysis stack? There might be interactions between
the SynonymFilter and other filters.

What does the admin/analysis.jsp page show? Does it throw OutOfMemory also?

Does stemming turn two of the terms into the same term?







-- 
Lance Norskog
goks...@gmail.com

RE: Synonyms from Database

2010-01-11 Thread Peter A. Kirk

You could try to take the code for SynonymFilterFactory as a starting point, 
and adapt it to obtain the synonym configuration from another source than a 
text file.

But I'm not sure what you mean by checking for synonyms at query time. As I 
understand it, Solr works like that anyway - depending on how you configure it. 
The only difference between your new SynonymFilterFactory and Solr's default 
would be where it obtains the synonym configuration from.

You can get Solr to re-read the configuration by issuing a reload command. 
See http://wiki.apache.org/solr/CoreAdmin#RELOAD.
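
As a sketch, the reload request described on that wiki page can be built like this in Python. The base URL and the core name "core0" are placeholders for whatever your installation uses; the snippet only builds the URL and does not send it:

```python
from urllib.parse import urlencode


def core_reload_url(base_url, core_name):
    """Build the CoreAdmin RELOAD URL for one core."""
    query = urlencode({"action": "RELOAD", "core": core_name})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


url = core_reload_url("http://localhost:8983/solr", "core0")
```

Issuing a GET against that URL (for example with urllib.request.urlopen) is what triggers the reload, so it makes sense to call it only after the new synonym configuration is in place.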

Med venlig hilsen / Best regards

Peter Kirk
E-mail: mailto:p...@alpha-solutions.dk


-Original Message-
From: Ravi Gidwani [mailto:ravi.gidw...@gmail.com] 
Sent: 10. januar 2010 16:20
To: solr-user@lucene.apache.org
Subject: Synonyms from Database

Hi :
 Is there any work done in providing synonyms from a database instead of
synonyms.txt file ? Idea is to have a dictionary in DB that can be enhanced
on the fly in the application. This can then be used at query time to check
for synonyms.

I know I am not putting thoughts to the performance implications of this
approach, but will love to hear about others thoughts.

~Ravi.


Re: Synonyms from Database

2010-01-11 Thread Ravi Gidwani

Thanks all for your replies.

By "query time" I meant that, as I understand Solr (and I may be
wrong here), I can add synonyms.txt to the query analyzer as follows:

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>

My understanding is that even if the document (at index time) has the
word mathematics, and my synonyms.txt file has:

mathematics=>math,maths,

a query for math will match mathematics, since synonyms.txt is referenced
in the query analyzer. So I was curious about a database approach along
similar lines.

I get the point of the performance, and I think that is a big NO NO for this
approach. But the idea was to allow changing the synonyms on the fly (more
like adaptive synonyms) and improve the hits.

I guess the only way (as Otis suggested) is to rewrite the file and reload
configuration (as Peter suggested). This might be a performance hit (rewrite
the file) and reload, but I guess still much better than the reading from DB
?
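
A minimal sketch of the "dump the DB to a file" step, using Python's sqlite3 as a stand-in database. The table name synonyms and its columns group_id and term are assumptions made up for this example; terms that themselves contain commas would additionally need backslash escaping, which is left out here:

```python
import sqlite3


def dump_synonyms(conn, path):
    """Write one comma-separated line per synonym group; return the group count."""
    rows = conn.execute(
        "SELECT group_id, term FROM synonyms ORDER BY group_id, term"
    ).fetchall()
    groups = {}
    for group_id, term in rows:
        groups.setdefault(group_id, []).append(term)
    with open(path, "w", encoding="utf-8") as handle:
        for terms in groups.values():
            handle.write(",".join(terms) + "\n")
    return len(groups)
```

Run periodically (or whenever the dictionary changes) and followed by a core reload, this keeps query-time lookups entirely in memory; the database is only touched when the file is regenerated.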

Thanks again for your comments.

~Ravi.


2010/1/10 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 On Sun, Jan 10, 2010 at 1:04 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
  Ravi,
 
  I think if your synonyms were in a DB, it would be trivial to
 periodically dump them into a text file Solr expects.  You wouldn't want to
 hit the DB to look up synonyms at query time...
 Why query time. Can it not be done at startup time ?
 
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
 
 
 
  - Original Message 
  From: Ravi Gidwani ravi.gidw...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Sat, January 9, 2010 10:20:18 PM
  Subject: Synonyms from Database
 
  Hi :
   Is there any work done in providing synonyms from a database
 instead of
  synonyms.txt file ? Idea is to have a dictionary in DB that can be
 enhanced
  on the fly in the application. This can then be used at query time to
 check
  for synonyms.
 
  I know I am not putting thoughts to the performance implications of this
  approach, but will love to hear about others thoughts.
 
  ~Ravi.
 
 



 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com

RE: Synonyms from Database

2010-01-11 Thread Peter A. Kirk

Hi - I don't think you'll see a performance hit using a DB for your synonym 
configuration as opposed to a text file. 

The configuration is only done once (at startup) - or when you reload. You 
won't be reloading every minute, will you? After reading the configuration, the 
synonyms are available to Solr via the SynonymFilter object (at least as I 
understand it from looking at the code).

The reload feature actually sounds quite neat - it will reload in the 
background, and switch in the newly read configuration when it's ready - so 
hopefully no down-time waiting for configuration.

Med venlig hilsen / Best regards

Peter Kirk
E-mail: mailto:p...@alpha-solutions.dk



Re: Synonyms from Database

2010-01-11 Thread Erik Hatcher



On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote:
The reload feature actually sounds quite neat - it will reload in  
the background, and switch in the newly read configuration when  
it's ready - so hopefully no down-time waiting for configuration.


Correct me if I'm wrong, but I don't think that it's true about a  
reload working in the background.  While a core is reloading (and  
warming), it is unavailable for search.  right?  I think you have to  
create a new core, and then swap to keep things alive constantly.


Erik

Re: Synonyms from Database

2010-01-11 Thread Shalin Shekhar Mangar

On Mon, Jan 11, 2010 at 4:15 PM, Erik Hatcher erik.hatc...@gmail.com wrote:


 On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote:

 The reload feature actually sounds quite neat - it will reload in the
 background, and switch in the newly read configuration when it's ready -
 so hopefully no down-time waiting for configuration.


 Correct me if I'm wrong, but I don't think that it's true about a reload
 working in the background.  While a core is reloading (and warming), it is
 unavailable for search.  right?  I think you have to create a new core, and
 then swap to keep things alive constantly.


Core reload swaps the old core with a new core on the same configuration
files with no downtime. See CoreContainer#reload.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Synonyms from Database

2010-01-11 Thread Erik Hatcher



On Jan 11, 2010, at 5:50 AM, Shalin Shekhar Mangar wrote:

On Mon, Jan 11, 2010 at 4:15 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote:

The reload feature actually sounds quite neat - it will reload in the
background, and switch in the newly read configuration when it's ready -
so hopefully no down-time waiting for configuration.

Correct me if I'm wrong, [me saying something wrong]

Core reload swaps the old core with a new core on the same configuration
files with no downtime. See CoreContainer#reload.


Sweet!  Thanks for the correction.

Erik
