Re: Query regarding solr plugin.

2011-04-25 Thread rajini maski
Erick,

Thanks. It was actually a copy mistake. Anyway, I redid all of the
steps mentioned below. I had given the class name as
<filter class="pointcross.orchSynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

I did it again, this time following a slightly different set of steps from this link:
http://help.eclipse.org/helios/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-32.htm


1) Created a new package in the src folder: org.apache.pointcross.synonym. This
contains the class Synonym.java.

2) Right-clicked the same package and selected Export > Java > JAR file,
selected the path for the package, and clicked Finish.

3) This created the jar file in the specified location. Then, in cmd, ran
jar -tfv on it; the output was:

:\Apps\Rajani Eclipse\Solr141_jar> jar -tfv org.apache.pointcross.synonym.Synonym.jar
   25 Mon Apr 25 11:32:12 GMT+05:30 2011 META-INF/MANIFEST.MF
  383 Thu Apr 14 16:36:00 GMT+05:30 2011 .project
 2261 Fri Apr 22 16:26:12 GMT+05:30 2011 .classpath
 1017 Thu Apr 21 16:34:20 GMT+05:30 2011 jarLog.jardesc

4) Now placed the same jar file in the solr home/lib folder. In solrconfig.xml,
enabled <lib dir="./lib" />, and in the schema:
<filter class="synonym.Synonym"
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

5) Restarted Tomcat: http://localhost:8097/finding1

Error SEVERE: org.apache.solr.common.SolrException: Error loading class
'pointcross.synonym.Synonym'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388)
at
org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:84)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:835)
at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58)


I am basically trying to make this jar's functionality available to Solr. Please
let me know the mistake here.

Rajani




On Fri, Apr 22, 2011 at 6:29 PM, Erick Erickson erickerick...@gmail.com wrote:

 First, I appreciate your writeup of the problem; it's very helpful when people
 take the time to put in the details.

 I can't reconcile these two things:

 {{{<filter class="org.apache.pco.search.orchSynonymFilterFactory"
 synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

 as org.apache.solr.common.SolrException: Error loading class
 'pointcross.orchSynonymFilterFactory' at}}}

 This seems to indicate that your config file is really looking for
 pointcross.orchSynonymFilterFactory rather than
 org.apache.pco.search.orchSynonymFilterFactory.

 Do you perhaps have another definition in your config using
 pointcross.orchSynonymFilterFactory?

 Try running jar -tfv <your jar file> to see what classes
 are actually defined in the jar in the solr lib directory. Perhaps
 it's not what you expect (perhaps Eclipse did something
 unexpected).

 Given the anomaly above (the error reported doesn't correspond to
 the class you defined) I'd also look to see if you have any old
 jars lying around that you somehow get to first.

 Finally, is there any chance that your
 pointcross.orchSynonymFilterFactory
 is a dependency of org.apache.pco.search.orchSynonymFilterFactory? In
 which case Solr may be finding
 org.apache.pco.search.orchSynonymFilterFactory
 but failing to load a dependency (which would have to be put in the lib
 or the jar).

 Hope that helps
 Erick



 On Fri, Apr 22, 2011 at 3:00 AM, rajini maski rajinima...@gmail.com
 wrote:
  One doubt regarding adding the solr plugin.
 
 
   I have a new java file created that includes few changes in
  SynonymFilterFactory.java. I want this java file to be added to solr
  instance.
 
  I created a package: org.apache.pco.search
  This includes the class OrchSynonymFilterFactory, which extends
  BaseTokenFilterFactory and implements ResourceLoaderAware.
 
  Packages included: import org.apache.solr.analysis.*;
 
  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.solr.common.ResourceLoader;
  import org.apache.solr.common.util.StrUtils;
  import org.apache.solr.util.plugin.ResourceLoaderAware;
 
  import java.io.File;
  import java.io.IOException;
  import java.io.Reader;
  import java.io.StringReader;
  import java.util.ArrayList;
  import java.util.List;
 
 
  I exported this java file in Eclipse, selecting File > Export, chose the
  package org.apache.pco.search and OrchSynonymFilterFactory.java,
  and generated the jar file org.apache.pco.orchSynonymFilterFactory.jar.
 
  This jar file was placed in the /lib folder of the solr home instance.
  Changes in solr config: <lib dir="./lib" />
 
  Now I want to add this in the schema fieldtype for the synonym filter as

  <filter class="org.apache.pco.search.orchSynonymFilterFactory"
  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
 
  But I am not able to do it. It has an error
  as 

Re: Suggester with multi terms

2011-04-25 Thread Em
blocky,

Shingles should be your way.
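
For the archives, a shingle-based field type might look roughly like this in
schema.xml (a hedged sketch; the type name text_shingle is made up, and the
attributes should be checked against the ShingleFilterFactory documentation
for your Solr version):

```xml
<!-- Sketch: a field type that emits multi-term shingles ("new york",
     "york city") alongside single terms; the name is illustrative. -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- maxShingleSize bounds how many adjacent terms are joined;
         outputUnigrams="true" keeps the single terms too -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```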

Regards,
Em

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggester-with-multi-terms-tp2859547p2860419.html
Sent from the Solr - User mailing list archive at Nabble.com.


Different Cluster Results on Different Servers, with same SOLR setup

2011-04-25 Thread Pawan Darira
Hi

I have the same Solr 1.4 setup on two different servers, one for production and
one for staging. My production server gives the proper clusters, but the staging
server gives wrong clusters. The problem is with date-related clusters only.

I have checked all the configuration and setup; everything seems fine. I am
creating the index through DIH.

P.S. My application and Solr setup are similar on staging and production.

Please suggest a solution.

-- 
Thanks,
Pawan Darira


Re: Query regarding solr plugin.

2011-04-25 Thread Erick Erickson
Looking at things more carefully, it may be one of your dependent classes
that's not being found.

A couple of things to try.

1) When you do a jar -tfv <your jar>, you should see
output like:
 1183 Sun Jun 06 01:31:14 EDT 2010
org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class
and your filter statement may need the whole path, as in this example:
<filter class="org.apache.lucene.analysis.sinks.TokenTypeSink"/> (note, this
is just an example of the pathing; this class has nothing to do with
your filter)...

2) But I'm guessing your path is actually OK, because I'd expect to see a
class-not-found error. So my guess is that your class depends on
other jars that
aren't packaged up in your jar; if you find which ones they are and copy them
to your lib directory you'll be OK. Or your code is throwing an error
on load. Or
something like that...

3) To try to understand what's up, I'd back up a step. Make a really
stupid class
that doesn't do anything except derive from BaseTokenFilterFactory and see if
you can load that. If you can, then your process is OK and you need to
find out what classes your new filter depends on. If you still can't, then we can
see what else we can come up with...
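
A do-nothing factory along those lines might look roughly like this (a sketch
against the Solr 1.4-era analysis API that this thread appears to use; the
package and class names are made up for illustration):

```java
package com.example.test; // hypothetical package name

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

// Minimal stub: registers like a real filter factory but passes the
// token stream through unchanged. If Solr can load this, the jar
// packaging and classpath are fine.
public class StubFilterFactory extends BaseTokenFilterFactory {
  public TokenStream create(TokenStream input) {
    return input; // no-op
  }
}
```

Declared in the schema as <filter class="com.example.test.StubFilterFactory"/>;
if that loads without the SolrException, the remaining problem is likely a
missing dependency of the real filter.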

Best
Erick

On Mon, Apr 25, 2011 at 2:34 AM, rajini maski rajinima...@gmail.com wrote:
 Erick,

 Thanks. It was actually a copy mistake. Anyway, I redid all of the
 steps mentioned below. I had given the class name as
 <filter class="pointcross.orchSynonymFilterFactory"
 synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

 I did it again, this time following a slightly different set of steps from this link:
 http://help.eclipse.org/helios/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-32.htm


 1) Created a new package in the src folder: org.apache.pointcross.synonym. This
 contains the class Synonym.java.

 2) Right-clicked the same package and selected Export > Java > JAR file,
 selected the path for the package, and clicked Finish.

 3) This created the jar file in the specified location. Then, in cmd, ran
 jar -tfv on it; the output was:

 :\Apps\Rajani Eclipse\Solr141_jar> jar -tfv org.apache.pointcross.synonym.Synonym.jar
   25 Mon Apr 25 11:32:12 GMT+05:30 2011 META-INF/MANIFEST.MF
   383 Thu Apr 14 16:36:00 GMT+05:30 2011 .project
  2261 Fri Apr 22 16:26:12 GMT+05:30 2011 .classpath
  1017 Thu Apr 21 16:34:20 GMT+05:30 2011 jarLog.jardesc

 4) Now placed the same jar file in the solr home/lib folder. In solrconfig.xml,
 enabled <lib dir="./lib" />, and in the schema:
 <filter class="synonym.Synonym"
 synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

 5) Restarted Tomcat: http://localhost:8097/finding1

 Error SEVERE: org.apache.solr.common.SolrException: Error loading class
 'pointcross.synonym.Synonym'
 at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
 at
 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388)
 at
 org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:84)
 at
 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
 at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:835)
 at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58)


 I am basically trying to make this jar's functionality available to Solr. Please
 let me know the mistake here.

 Rajani




 On Fri, Apr 22, 2011 at 6:29 PM, Erick Erickson
 erickerick...@gmail.com wrote:

 First, I appreciate your writeup of the problem; it's very helpful when people
 take the time to put in the details.

 I can't reconcile these two things:

 {{{<filter class="org.apache.pco.search.orchSynonymFilterFactory"
 synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

 as org.apache.solr.common.SolrException: Error loading class
 'pointcross.orchSynonymFilterFactory' at}}}

 This seems to indicate that your config file is really looking for
 pointcross.orchSynonymFilterFactory rather than
 org.apache.pco.search.orchSynonymFilterFactory.

 Do you perhaps have another definition in your config
 pointcross.orchSynonymFilterFactory?

 Try running jar -tfv <your jar file> to see what classes
 are actually defined in the file in the solr lib directory. Perhaps
 it's not what you expect (Perhaps Eclipse did something
 unexpected).

 Given the anomaly above (the error reported doesn't correspond to
 the class you defined) I'd also look to see if you have any old
 jars lying around that you somehow get to first.

 Finally, is there any chance that your
 pointcross.orchSynonymFilterFactory
 is a dependency of org.apache.pco.search.orchSynonymFilterFactory? In
 which case Solr may be finding
 org.apache.pco.search.orchSynonymFilterFactory
 but failing to load a dependency (that would have to be put in the lib
 or the jar).

 Hope that helps
 Erick



 On Fri, Apr 22, 2011 at 3:00 AM, rajini maski rajinima...@gmail.com
 wrote:
  One doubt regarding adding the solr plugin.
 
 
           I have a new java file created that 

Re: Different Cluster Results on Different Servers, with same SOLR setup

2011-04-25 Thread Erick Erickson
There's not much information to go on here. You haven't stated the
problem in a way that people unfamiliar with your setup can understand it. What
is the error you're getting? Show us the configurations, please.

You might want to review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Mon, Apr 25, 2011 at 4:56 AM, Pawan Darira pawan.dar...@gmail.com wrote:
 Hi

 I have the same Solr 1.4 setup on two different servers, one for production and
 one for staging. My production server gives the proper clusters, but the staging
 server gives wrong clusters. The problem is with date-related clusters only.

 I have checked all the configuration and setup; everything seems fine. I am
 creating the index through DIH.

 P.S. My application and Solr setup are similar on staging and production.

 Please suggest a solution.

 --
 Thanks,
 Pawan Darira



Re: Unable to load EntityProcessor implementation for entity:16865747177753

2011-04-25 Thread vrpar...@gmail.com
Thanks firdous_kind86.

I replaced TikaEntityProcessor with XPathEntityProcessor and it works fine.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unable-to-load-EntityProcessor-implementation-for-entity-16865747177753-tp2846513p2861229.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to concatenate two nodes of xml with xpathentityprocessor

2011-04-25 Thread vrpar...@gmail.com
Hello,

I am using XPathEntityProcessor to index xml files.

Below is my xml file:

<Full>
   <Customer name="a" id="1" .. other attributes ..>CustomerA</Customer>
   <Customer name="b" id="2" .. other attributes ..>ThisB</Customer>
   <Customer name="c" id="3" .. other attributes ..>AnyC</Customer>
</Full>

Now I want to concatenate at index time so that when I search it gives results
like the below:

CDATA with the id attribute, like <str id="1">CustomerA</str><str
id="2">ThisB</str> or something like that.

Is it possible with RegexTransformer or TemplateTransformer? I googled a
little for both but could not get an exact/useful solution.
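
For what it's worth, this kind of concatenation is usually sketched with
TemplateTransformer in the DIH config; a hedged example (the entity name, url,
and XPath expressions are illustrative and would need adjusting to the real
document and data source):

```xml
<!-- Sketch only: assumes XPathEntityProcessor over the file above -->
<entity name="customer"
        processor="XPathEntityProcessor"
        url="customers.xml"
        forEach="/Full/Customer"
        transformer="TemplateTransformer">
  <field column="id"   xpath="/Full/Customer/@id"/>
  <field column="name" xpath="/Full/Customer/@name"/>
  <field column="text" xpath="/Full/Customer"/>
  <!-- TemplateTransformer concatenates the two resolved columns -->
  <field column="combined" template="${customer.name}${customer.text}"/>
</entity>
```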

Thanks

Vishal Parekh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-concatenate-two-nodes-of-xml-with-xpathentityprocessor-tp2861260p2861260.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MoreLikeThis

2011-04-25 Thread Brian Lamb
It finds something under match but just nothing under response. I tried
turning on debugQuery=on but I did not see anything that jumped out at me as
a bug or anything. Is there some kind of threshold setting that I can tinker
with to see if that is the problem?

On Sun, Apr 24, 2011 at 2:37 AM, Grant Ingersoll gsing...@apache.org wrote:


 On Apr 21, 2011, at 8:46 PM, Brian Lamb wrote:

  Hi all,
 
  I have an mlt search set up on my site with over 2 million records in the
  index. Normally, my results look like:
 
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">204</int>
    </lst>
    <result name="match" numFound="41750" start="0">
      <doc>
        <str name="title">Some result.</str>
      </doc>
    </result>
    <result name="response" numFound="130872" start="0">
      <doc>
        <str name="title">A similar result</str>
      </doc>
      ...
    </result>
  </response>
 
  And there are 100 results under response. However, in some cases, there
 are
  no results under response. Why is this the case and is there anything I
  can do about it?

 Is it because it couldn't find anything?  Or are you thinking there is a
 bug?  You might try adding debugQuery=true and see what gets parsed, etc.
 and then try running that query.


 
  Here is my mlt configuration:
 
  <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
    <lst name="defaults">
      <str name="mlt.fl">title,score</str>
      <int name="mlt.mindf">1</int>
      <int name="rows">100</int>
      <str name="fl">*,score</str>
    </lst>
  </requestHandler>
 
  And here is the URL I use to get results:
  http://localhost:8983/solr/mlt/?q=title:Some random title
 
  Any help on this matter would be greatly appreciated. Thanks!

 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem docs using Solr/Lucene:
 http://www.lucidimagination.com/search




RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-25 Thread Robert Petersen
Sorry, that was supposed to be just another way to say the same thing...
OK look here is my current situation.  Even with preserveOriginal and
concatAll set, I am still getting an even odder result.

I set up sku=218078624 with title="Beanbag AppleTV Friction Dash Mount
for GPS" and indexed it in dev.

The search and index analyzer stacks are the same.  When I do the search
sku:218078624 title:AppleTV in the solr admin page I get zero results,
but when I do the search sku:218078624 title:appletv I get one
result.  This is the opposite of what was happening before I
added the preserveOriginal setting.  In the analysis page I plug in
that title and term, and it looks to me like it should match... which is
why I started asking about term positions and such.  I don't understand
why I don't get a hit in both cases.  It is so weird.



-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
Seeley
Sent: Friday, April 22, 2011 5:55 PM
To: Robert Petersen
Cc: solr-user@lucene.apache.org
Subject: Re: term position question from analyzer stack for
WordDelimiterFilterFactory

On Fri, Apr 22, 2011 at 8:24 PM, Robert Petersen rober...@buy.com
wrote:
 I can repeatedly demonstrate this in my dev environment, where I get
 entirely different results searching for AppleTV vs. appletv

You originally said "I cannot get a match between AppleTV on the
indexing side and appletv on the search side."
Getting different numbers of results or different results is slightly
different.

For example, if there were a document with Apple TV in it, then a
query of AppleTV would match that doc, but a query of appletv
would not.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-25 Thread Yonik Seeley
On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen rober...@buy.com wrote:
 The search and index analyzer stack are the same.

Ahhh, they should not be!
Using both generate and catenate in WDF at query time is a no-no.
Same reason you can't have multi-word synonyms at query time:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

I'd recommend going back to the WDF settings in the solr example
server as a starting point.
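
For reference, the query-side WDF settings in the 1.4 example schema look
roughly like this (quoted from memory as a sketch, so verify against the
example schema.xml that shipped with your version):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
```

The index-side analyzer differs in using catenateWords="1" and
catenateNumbers="1", which is what makes query-time catenation unnecessary.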


-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Good protwords.txt ?

2011-04-25 Thread Otis Gospodnetic
Hi,

Are there any good / comprehensive examples of protwords.txt for English?
Or good stemdict.txt examples that work with StemmerOverrideFilterFactory?

Would be good to have a good example to include in Solr distribution...

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



RE: Solr - Multi Term highlighting issue

2011-04-25 Thread Ramanathapuram, Rajesh
Hi Robert, 

Thanks for your help. 

This looks much closer to my issue (maybe not). Unfortunately, I can't
switch to Solr version 3.1 yet.
I hope to revisit and update this post when I do.

Thanks

thanks  regards,
Rajesh Ramana 
Enterprise Applications, Turner Broadcasting System, Inc.
404.878.7474 


-Original Message-
From: Ramanathapuram, Rajesh [mailto:rajesh.ramanathapu...@turner.com] 
Sent: Sunday, April 24, 2011 1:58 AM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Solr - Multi Term highlighting issue

I think I am using version 1.4; I'll try to review the link you provided
later today.

Rajesh Ramana




On Apr 24, 2011, at 12:52 AM, Robert Muir rcm...@gmail.com wrote:

 On Sat, Apr 23, 2011 at 11:36 PM, Ramanathapuram, Rajesh 
 rajesh.ramanathapu...@turner.com wrote:
 What is really weird is that if I search for srchterm1 and srchterm2
 separately, the results come up fine. If I search for multiple terms,
 this issue seems to happen when the terms are separated by html tags
 and special characters like ') / \' etc...
 
 
 What version of Solr are you using? Because you are saying the issue
 only happens when terms involve special characters, it's possible it
 could be this bug: https://issues.apache.org/jira/browse/LUCENE-2874,
 with the overlapping terms being created by the WordDelimiterFilter.
 
 This is fixed in 3.1.


Re: Good protwords.txt ?

2011-04-25 Thread Robert Muir
On Mon, Apr 25, 2011 at 2:05 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi,

 Are there any good / comprehensive examples of protwords.txt for English?
 Or good stemdict.txt examples that work with StemmerOverrideFilterFactory?

 Would be good to have a good example to include in Solr distribution...


I brought this up a while ago (as I am probably more than 50-60% done
with all of this via 2+2lemma.txt) and there was no interest:

http://www.lucidimagination.com/search/document/180c90276e589d68/solr_example_synonyms_file


Automatic synonyms for multiple variations of a word

2011-04-25 Thread Otis Gospodnetic
Hi,

How do people handle cases where synonyms are used and there are multiple
versions of the original word that really need to point to the same set of
synonyms?

For example:
Consider singular and plural of the word responsibility.  One might have 
synonyms defined like this:

  responsibility, obligation, duty

But the plural "responsibilities" is not in there, and thus it will not get
expanded to the synonyms above! That's a problem.

Sure, one could change the synonyms file to look like this:

  responsibility, responsibilities, obligation, duty

But that means somebody needs to think of all variations of the word! 

Is there something one can do to get all variations of the word to map to the
same synonyms without having to explicitly specify all of them?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-25 Thread Robert Petersen
Aha!  I knew something must be awry, but when I looked at the analysis
page output, it sure looked like it should match.  :)

OK, here is the query-side WDF that finally works; I just turned
everything off.  (yay)  First I tried completely removing WDF from
the query-side analyzer stack, but that didn't work.  So anyway, I suppose
I should turn off the catenateAll plus the preserveOriginal settings,
reindex, and see if I still get a match, huh?  (P.S. Thank you very much
for the help!)

  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="0"
          generateNumberParts="0"
          catenateWords="0"
          catenateNumbers="0"
          catenateAll="0"
          preserveOriginal="0"
  />



-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
Seeley
Sent: Monday, April 25, 2011 9:24 AM
To: solr-user@lucene.apache.org
Subject: Re: term position question from analyzer stack for
WordDelimiterFilterFactory

On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen rober...@buy.com
wrote:
 The search and index analyzer stack are the same.

Ahhh, they should not be!
Using both generate and catenate in WDF at query time is a no-no.
Same reason you can't have multi-word synonyms at query time:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

I'd recommend going back to the WDF settings in the solr example
server as a starting point.


-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Lucene Rev Stump the Chump

2011-04-25 Thread Grant Ingersoll
Hey everyone,

As you no doubt know by now, Lucene Revolution, the second annual Lucene/Solr
conference sponsored by Lucid Imagination, is happening out in San Francisco at
the end of May.  There are a lot of really great talks and speakers from across
the spectrum (check out lucenerevolution.org if you haven't already) on how
people tackled and solved tough problems across the Lucene/Solr space.

Now, it's time for _your_ toughest, most challenging Solr/Lucene questions.
Back by popular demand at this year's Revolution conference, I'll be on the hot
seat for "Stump the Chump!", where I'll spontaneously field Solr/Lucene
questions I've never seen before in front of hundreds of people.

But in order for it to be a success, we need your questions/problems/challenges.
Please email a description of your Lucene/Solr problem to
i...@lucenerevolution.org (don't reply here, as I don't want to see it ahead of
time).

You can read more details online at http://bit.ly/stump-grant  

Even if you won't be able to make it to San Francisco, please send in any good 
questions you would be interested to see me tackle under the spotlight.  We'll 
record the session on video and post it online shortly after the conference 
(we're exploring a webcast -- still TBD). 

Grant

Re: Multi-word Solr Synonym issue

2011-04-25 Thread Chris Hostetter

: Subject: Multi-word Solr Synonym issue
: In-Reply-To: banlktikq66d40+dprrdyihshsjhdmxs...@mail.gmail.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Negative OR in fq field not working as expected

2011-04-25 Thread Simon Wistow
I have a field 'type' that has several values. If it's type 'foo' then 
it also has a field 'restriction_id'.

What I want is a filter query which says either it's not a 'foo' or, if
it is, then it has the restriction '1'.

I expect two matches - one of type 'bar' and one of type 'foo' 

Neither

 fq=(-type:foo OR restriction_id:1)
 fq={!dismax q.op=OR}-type:foo restriction_id:1

produce any results.

 fq=restriction_id:1

gets the 'foo' typed result.

 fq=type:bar 

get the 'bar' typed result.

Either of these

  fq=type:[* TO *] OR (type:foo AND restriction_id:1)
  fq=type:(bar OR quux OR fleeg) OR restriction_id:1

do work but are very, very slow to the point of unusability (our indexes 
are pretty large).

Searching around, it seems like other people have experienced similar
issues and the answer has been "Lucene just doesn't work like that":

"When dealing with Lucene people are strongly encouraged to think in
terms of MUST, MUST_NOT and SHOULD (which are represented in the query
parser as the prefixes +, - and the default) instead of in terms of
AND, OR, and NOT ... Lucene's Boolean Queries (and thus Lucene's
QueryParser) is not a strict Boolean Logic system, so it's best not to
try and think of it like one."

  http://wiki.apache.org/lucene-java/BooleanQuerySyntax

Am I just out of luck? Might edismax help here?

Simon








Re: Negative OR in fq field not working as expected

2011-04-25 Thread Jonathan Rochkind
The solr 'lucene' query parser (which is what's being used there, in an fq)
sometimes has trouble with pure negative clauses in an OR.

Even though it can handle pure negative queries like -type:foo, it
has trouble with a pure negative clause in an OR like you are doing, at least
in 1.4.1; I don't know if it's been improved in 3.1.  I _think_ you have
a case it has trouble with.


This is what I do instead, to rewrite the query to mean the same thing 
but not give the lucene query parser trouble:


fq=( (*:* AND -type:foo) OR restriction_id:1)

*:* means everything, so (*:* AND -type:foo) means the same thing as
just -type:foo, but gets around the lucene query parser's troubles.


So that might work for you.

Dismax has even worse problems with pure negatives, with no easy way to
get around them, so switching to dismax is probably not helpful there.


On 4/25/2011 4:27 PM, Simon Wistow wrote:

I have a field 'type' that has several values. If it's type 'foo' then
it also has a field 'restriction_id'.

What I want is a filter query which says either it's not a 'foo' or if
it is then it has the restriction '1'

I expect two matches - one of type 'bar' and one of type 'foo'

Neither

  fq=(-type:foo OR restriction_id:1)
  fq={!dismax q.op=OR}-type:foo restriction_id:1

produce any results.

  fq=restriction_id:1

gets the 'foo' typed result.

  fq=type:bar

get the 'bar' typed result.

Either of these

   fq=type:[* TO *] OR (type:foo AND restriction_id:1)
   fq=type:(bar OR quux OR fleeg) OR restriction_id:1

do work but are very, very slow to the point of unusability (our indexes
are pretty large).

Searching round it seems like other people have experienced similar
issues and the answer has been Lucene just doesn't work like that

When dealing with Lucene people are strongly encouraged to think in
terms of MUST, MUST_NOT and SHOULD (which are represented in the query
parser as the prefixes +, - and the default) instead of in terms of
AND, OR, and NOT ... Lucene's Boolean Queries (and thus Lucene's
QueryParser) is not a strict Boolean Logic system, so it's best not to
try and think of it like one.

   http://wiki.apache.org/lucene-java/BooleanQuerySyntax

Am I just out of luck? Might edismax help here?

Simon









Re: Negative OR in fq field not working as expected

2011-04-25 Thread Simon Wistow
On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:
 This is what I do instead, to rewrite the query to mean the same thing but 
 not give the lucene query parser trouble:
 
 fq=( (*:* AND -type:foo) OR restriction_id:1)
 
 *:* means everything, so (*:* AND -type:foo) means the same thing as 
 just -type:foo, but can get around the lucene query parsers troubles.
 
 So that might work for you.

Thanks for confirming my suspicions.

Unfortunately I've tried that as well and, whilst it works,
it's also unbelievably slow (~30s query time).

Would writing my own Query Parser help here?

Simon






Re: normalizing the score

2011-04-25 Thread Chris Hostetter


: All I found was: 
http://search.lucidimagination.com/search/document/9d06882d97db5c59/a_question_about_solr_score
: 
: where Hoss suggests to normalize depending on the maxScore.

to be clear, i do not (nor have i ever) suggested that someone normalize
based on maxScore.

my point there was that when people *insist* on providing some sort of
normalization, the maxScore is always available if they want to use it

: I am not comfortable with that since, at least, I want that a search for 
: the wombats in a directory of mathematical concepts, and display that 
: all scores are pretty bad and not display 1.0 for matches that are only 
: on the word the.

the crux of the problem is in deciding what you want to normalize relative
to -- the ideal solution is to normalize relative to the maximum *possible*
score for *any* query against your corpus, but that's not something that's
generally feasible to do (and based on experiments i tried once, it didn't
seem like it would be very useful anyway)

: It seems that the strategy would be to normalize by maxScore if the maxScore
: is bigger than 1.0.
: Can you confirm that?
: Isn't there going to be similar edge cases as above?
: 
: I remember a time where Lucene results' score were always normalized. 
: That seems to be not in SOLR, or?

once upon a time, lucene's most beginner-friendly api did provide
normalized scores, using the approach you described (divide by max score
if max score is greater than 1.0) and it had all of the problems you might
expect -- but some people liked it because they had an irrational dislike
of scores greater than 1.

Solr has never supported those pseudo-normalized scores, and lucene's java
API eventually got rid of them.

-Hoss


Re: Negative OR in fq field not working as expected

2011-04-25 Thread Yonik Seeley
On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow si...@thegestalt.org wrote:
 On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:
 This is what I do instead, to rewrite the query to mean the same thing but
 not give the lucene query parser trouble:

 fq=( (*:* AND -type:foo) OR restriction_id:1)

 *:* means everything, so (*:* AND -type:foo) means the same thing as
 just -type:foo, but can get around the lucene query parsers troubles.

 So that might work for you.

 Thanks for confirming my suspicions.

 Unfortunately I've tried that as well and, whilst it works
 it's also unbelievably slow (~30s query time).

It really shouldn't be that slow... how many documents are in your
index, and how many match -type:foo?

bq. Would writing my own Query Parser help here?

Nope.  That's just syntax.

If filters of the form ((*:* AND -type:foo) OR restriction_id:1)
are much slower (to the point where it causes you problems) and
filters of the form (-type:foo OR restriction_id:1)
are fast, then you could index the negation of the type field as well
(if you know all the types).

For instance, in a doc, index two type fields:
type:bar
type_not:foo

Or if type is multi-valued, you could index both foo and NOT_foo in
the same field.

Then you could express the filter as type_not:foo OR restriction_id:1
or
type:NOT_foo OR restriction_id:1
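This indexing-side workaround can be sketched like this (illustrative Python run before documents are sent to Solr; the KNOWN_TYPES set is a made-up example, and the trick only works if you really do know all the types):

```python
KNOWN_TYPES = {"foo", "bar", "baz"}  # assumption: the complete set of type values

def add_negated_type(doc):
    """Add a type_not field holding every known type the doc does NOT have,
    so 'not foo' can be expressed as the fast positive filter type_not:foo."""
    types = set(doc.get("type", []))
    doc["type_not"] = sorted(KNOWN_TYPES - types)
    return doc
```

A doc with type:bar then gets type_not values [baz, foo], and a filter like type_not:foo OR restriction_id:1 avoids the pure-negative clause entirely.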

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: normalizing the score

2011-04-25 Thread Paul Libbrecht
Thanks for the clarification, Hoss,

that is a helpful explanation.
I am still unsure how it is ever possible to display score bars, for which you 
need some normalization... but that's for another day.

I feel that indicating match quality is still somehow a science that has not 
blossomed yet.
Sorting by score is, however, in very good shape.

paul


Le 25 avr. 2011 à 22:53, Chris Hostetter a écrit :

 
 
 : All I found was: 
 http://search.lucidimagination.com/search/document/9d06882d97db5c59/a_question_about_solr_score
 : 
 : where Hoss suggests to normalize depending on the maxScore.
 
 to be clear, i do not (nor have i ever) suggested that someone normalize 
 based on maxScore.
 
 my point there was that when people *insist* on providing some sort of 
 normalization, the maxScore is always available if they want to use it
 
 : I am not comfortable with that since, at least, I want a search for 
 : "the wombats" in a directory of mathematical concepts to display that 
 : all scores are pretty bad, and not display 1.0 for matches that are only 
 : on the word "the".
 
 the crux of the problem is in deciding what you want to normalize relative 
 to -- the ideal solution is to normalize relative the maximum *possible* 
 score for *any* query against your corpus, but that's not something that's 
 generally feasible to do (and based on experiments i tried once, it didn't 
 seem like it would be very useful anyway)
 
 : It seems that the strategy would be to normalize by maxScore if the 
 maxScore is bigger than 1.0.
 : Can you confirm that?
 : Isn't there going to be similar edge cases as above?
 : 
 : I remember a time where Lucene results' score were always normalized. 
 : That seems to be not in SOLR, or?
 
 once upon a time, lucene's most beginner-friendly api did provide 
 normalized scores, using the approach you described (divide by max score 
 if max score is greater than 1.0) and it had all of the problems you might 
 expect -- but some people liked it because they had an irrational dislike 
 for scores greater than 1.
 
 Solr has never supported those pseudo-normalized scores, and lucene's java 
 API eventually got rid of them.
 
 -Hoss



Re: Negative OR in fq field not working as expected

2011-04-25 Thread Jonathan Rochkind

Yeah, I do the (*:* AND -type:foo) OR something:else

thing on my own pretty big index, and it's not slow at all.  At least no 
slower than doing any other X OR Y where X and Y both include lots of 
results.


Pre-warming the field cache for, in this case, the 'type' field may 
help. Same as it would if 'X' were just type:bar (not negated), where 
type:bar matched about the same number of documents as -type:foo 
does in your case.  In general, there's nothing special that should make 
that slow; it's a pretty ordinary query, really. Just using weird syntax 
to get around lucene query parser issues.


[Obligatory mention: This may have nothing to do with your issue, but I 
have found occasions where not having enough RAM allocated to Solr 1.4.1 
can make things terribly slow, even though there is no OutOfMemory error 
or other error in the logs. Especially if you are doing faceting and/or 
StatsComponent.  Exacerbated if you are using the default JVM GC 
strategies instead of picking one of the concurrent strategies.]


On 4/25/2011 5:02 PM, Yonik Seeley wrote:

On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow si...@thegestalt.org  wrote:

On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said:

This is what I do instead, to rewrite the query to mean the same thing but
not give the lucene query parser trouble:

fq=( (*:* AND -type:foo) OR restriction_id:1)

*:* means everything, so (*:* AND -type:foo) means the same thing as
just -type:foo, but can get around the lucene query parser's troubles.

So that might work for you.

Thanks for confirming my suspicions.

Unfortunately I've tried that as well and, whilst it works
it's also unbelievably slow (~30s query time).

It really shouldn't be that slow... how many documents are in your
index, and how many match -type:foo?

bq. Would writing my own Query Parser help here?

Nope.  That's just syntax.

If filters of the form ( (*:* AND -type:foo) OR restriction_id:1)
are much slower (to the point where it causes you problems) and
filters of the form
(-type:foo) OR restriction_id:1
are fast, then you could index the negation of the type field as well
(if you know all the types)

For instance, in a doc, index two type fields:
type:bar
type_not:foo

Or if type is multi-valued, you could index both foo and NOT_foo in
the same field.

Then you could express the filter as type_not:foo OR restriction_id:1
or
type:NOT_foo OR restriction_id:1

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco



Re: Negative OR in fq field not working as expected

2011-04-25 Thread Simon Wistow
On Mon, Apr 25, 2011 at 05:02:12PM -0400, Yonik Seeley said:
 It really shouldn't be that slow... how many documents are in your
 index, and how many match -type:foo?

Total number of docs is 161,000,000

 type:foo   39,000,000
-type:foo  122,200,000
 type:bar   90,000,000

We're aware it's large and we're in the process of splitting the index 
up, but I was just hoping that there was a workaround I could use in 
order to reclaim some performance.






Re: Reloading synonyms.txt without downtime

2011-04-25 Thread Chris Hostetter

: Apparently, when one RELOADs a core, the synonyms file is not reloaded.  Is 
: this the expected behaviour?  Is it the desired behaviour?

this is not expected, nor is it desired (by me) nor can i reproduce the 
problem you are talking about.

steps i attempted to reproduce:

1) started the example (on trunk)

2) loaded the analysis.jsp page, changed the field pulldown to "type" and 
entered "text" for the type name.  entered "bbbfoo" in the "Field value 
(Query)" box, and hit the button.

3) verified that synonym filter produced ar as a query time synonym.

4) edited the example synonyms.txt file to add bbbxxx to the list of 
synonyms for bbbfoo

5) hit this url: 
http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1

6) went back to the analysis.jsp page and hit the button again.

7) verified that the results changed, and now bbbxxx was produced as 
well.
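Step 5 is easy to script; a minimal sketch of building the CoreAdmin RELOAD URL (host, port, and core name are the example's defaults, so adjust for your setup):

```python
def reload_url(host="localhost", port=8983, core="collection1"):
    """Build the CoreAdmin RELOAD URL from step 5; fetch it with any HTTP
    client (e.g. urllib.request.urlopen) to trigger the core reload."""
    return ("http://%s:%d/solr/admin/cores?action=RELOAD&core=%s"
            % (host, port, core))
```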

If you are seeing situations where after a core reload you do *not* see 
changes to the synonyms.txt file, then either there is an edge case bug, 
or perhaps you aren't changing what you think?

providing more details about your setup and steps to reproduce would be 
helpful.

: Issue https://issues.apache.org/jira/browse/SOLR-1307 mentions this a bit, 
: but doesn't go into a lot of depth.

I don't understand this sentence ... that issue is a feature request for a 
(new) general way for plugins to re-init themselves (or some aspect of 
their config) without requiring an entire core reload. i don't see any 
comments in that issue (other than the one where you mention this thread) 
suggesting that a core reload doesn't currently cause synonyms to reload 
... if you can be specific about what you mean that would be helpful.

-Hoss


Problems with Spellchecker in 3.1

2011-04-25 Thread Bob Sandiford
Oops.  Sorry.  I'm hijacking my own thread to put a real Subject in place...

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 


 -Original Message-
 From: Bob Sandiford
 Sent: Monday, April 25, 2011 5:34 PM
 To: solr-user@lucene.apache.org
 Subject:
 
 Hi, all.
 
 We're having some troubles with the Solr Spellcheck Response.  We're
 running version 3.1.
 
 Overview:  If we search for something really ugly like:  
 "kljhklsdjahfkljsdhf book rck"
 
 then when we get back the response, there's a suggestions list for
 'rck', but no suggestions list for the other two words.  For 'book',
 that's fine, because it is 'spelled correctly' (i.e. we got hits on the
 word) and there shouldn't be any suggestions.  For the ugly thing,
 though, there aren't any hits.
 
 The problem is that when we're handling the result, we can't tell the
 difference between no suggestions for a 'correctly spelled' term, and
 no suggestions for something that's odd like this.
 
 (Now - this is happening with searches that aren't as obviously garbage
 - this was just to illustrate the point).
 
 Our setup:
 We're running multiple shards, which may be part of the issue.  For
 example, 'book' might be found in one of the shards, but not another.
 
 I don't *think* this has anything to do with our schema, since it's
 really how the Search Suggestions are being returned to us.
 
 What we'd really like to see is the response coming back with an
 indication that a word wasn't found / had no suggestions.  We've hacked
 around in the code a little bit to do this, but were wondering if
 anyone has come across this, and what approaches you've taken.
 
 Here's the xml we're getting back from the search:
 
 
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">56</int>
  <lst name="params">
    <str name="spellcheck">true</str>
    <str name="facet">true</str>
    <str name="sort">score desc, RELEVANCE_SORT_nsort desc</str>
    <str name="shards.qt">spellcheckedStandard</str>
    <str name="hl.mergeContiguous">true</str>
    <str name="facet.limit">1000</str>
    <str name="hl">true</str>
    <str name="fl">ELECTRONIC_ACCESS_display ISBN_display TITLE_boost
      FORMAT_display score MEDIA_TYPE_display AUTHOR_boost LOCALURL_display
      UPC_display id DOC_ID_display CHILD_SITE_display DS_EC
      PRIMARY_AUTHOR_boost PRIMARY_TITLE_boost DS_ID TOPIC_display
      ASSET_NAME_display OCLC_display</str>
    <str name="shards">localhost:8983/solr/SD_ILS/,localhost:8983/solr/SD_ASSET/</str>
    <arr name="facet.field">
      <str>AUTHOR_facet</str>
      <str>FORMAT_facet</str>
      <str>LANGUAGE_facet</str>
      <str>PUBDATE_nfacet</str>
      <str>SUBJECT_facet</str>
      <str>ABCDEF_cfacet</str>
    </arr>
    <str name="qt">spellcheckedStandard</str>
    <arr name="fq">
      <str>ACCESS_LEVEL_nfacet:0</str>
      <str>CLEARANCE_nfacet:0</str>
      <str>NEED_TO_KNOWS_facet:@@EMPTY@@</str>
      <str>CITIZENSHIPS_facet:@@EMPTY@@</str>
      <str>RESTRICTIONS_facet:@@EMPTY@@</str>
    </arr>
    <str name="facet.mincount">1</str>
    <str name="indent">true</str>
    <str name="hl.fl">*</str>
    <str name="rows">12</str>
    <str name="hl.snippets">5</str>
    <str name="start">0</str>
    <str name="q">TITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^200.0
      OR PRIMARY_AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^100.0 OR
      DOC_TEXT:"kljhklsdjahfkljsdhf book rck"~100^2 OR
      PRIMARY_TITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^1000.0 OR
      AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^20.0 OR
      textFuzzy:kljhklsdjahfkljsdhf~0.7 AND textFuzzy:book~0.7 AND
      textFuzzy:rck~0.7</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0" maxScore="0.0"/>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="AUTHOR_facet"/>
    <lst name="FORMAT_facet"/>
    <lst name="LANGUAGE_facet"/>
    <lst name="PUBDATE_nfacet"/>
    <lst name="SUBJECT_facet"/>
    <lst name="ABCDEF_cfacet"/>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
<lst name="highlighting"/>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="rck">
      <int name="numFound">5</int>
      <int name="startOffset">362</int>
      <int name="endOffset">365</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">rock</str>
          <int name="freq">24000</int>
        </lst>
        <lst>
          <str name="word">rick</str>
          <int name="freq">6048</int>
        </lst>
        <lst>
          <str name="word">rack</str>
          <int name="freq">84</int>
        </lst>
        <lst>
          <str name="word">reck</str>
          <int name="freq">78</int>
        </lst>
        <lst>
          <str name="word">ruck</str>
          <int name="freq">30</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
  </lst>
</lst>
</response>
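One way to see the ambiguity on the client side is to collect which terms actually have suggestion lists in the response; anything absent is ambiguous, since it may be correctly spelled or it may simply have produced no suggestions (a sketch in Python, assuming a response shaped like the XML above):

```python
import xml.etree.ElementTree as ET

def classify_terms(xml_text, query_terms):
    """Map each query term to 'has suggestions' if the spellcheck response
    contains a suggestion list for it, else 'ambiguous' (correctly spelled
    OR unknown with no suggestions; the response doesn't say which)."""
    root = ET.fromstring(xml_text)
    suggested = set()
    for lst in root.iter("lst"):
        if lst.get("name") == "suggestions":
            for child in lst.findall("lst"):
                suggested.add(child.get("name"))
    return {t: ("has suggestions" if t in suggested else "ambiguous")
            for t in query_terms}
```

Applied to the response above, only 'rck' comes back with suggestions, and 'book' and the garbage term are indistinguishable, which is exactly the problem described.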
 
 
 
 Thanks!
 
 Bob Sandiford | Lead Software Engineer | SirsiDynix
 P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
 www.sirsidynix.com




Scaling Search with Big Data/Hadoop and Solr now available at Lucene Revolution

2011-04-25 Thread Jay Hill
I've worked with a lot of different Solr implementations, and one area that
is emerging more and more is using Solr in combination with other big data
solutions. My company, Lucid Imagination, has added a two-day course to our
upcoming Lucene Revolution conference, Scaling Search with Big Data and
Solr, that covers Hadoop  Solr, on May 23-24 - it'll be at Lucene
Revolution in San Francisco (the conference is on May 25-26 -- see
lucenerevolution.org).

Description: The class covers Hadoop from the ground up, including
MapReduce, the Hadoop Distributed File System (HDFS), cluster management,
etc., before continuing on to connect it to Solr. Students will study common
use cases for generating search indexes from big data, typical patterns for
the data processing workflow, and how to make it all work reliably at scale.
We will explore in-depth an example of processing 1 billion records to
create a faceted Solr search solution.

This course will be presented on May 23 and 24 at the Lucene Revolution
conference in San Francisco (the conference is on May 25-26 -- see
lucenerevolution.org). Details here:
http://lucenerevolution.org/training#solr-scaling

I've been asked by a lot of Solr users whether Lucid offers anything like
this, so I know there is a lot of interest out there.

-Jay


solr sorting on multiple conditions, please help

2011-04-25 Thread James Lin
Hi Folks,

I got a problem on solr sorting as below:

sort=query({!v='area_id:78153'}) desc, score desc

What I want to achieve is: sort first by whether there is a match on area_id,
then by the actual score.

The problem is that area_id is multi-valued, and the results I am getting are
not sorted by the actual score even though they all match area_id 78153.

I am getting results like this

Area 2, score 0.21
Area 3, score 0.38
Area 4, score 0.23

but the result should be like this

Area 3, score 0.38
Area 4, score 0.23
Area 2, score 0.21
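The ordering described is a two-key sort: first whether the document matched the area, then score descending within each group. A client-side sketch (the area_match and score fields here are illustrative, not actual Solr response fields):

```python
def rank(results):
    """Sort area matches first, then by score descending within each group."""
    return sorted(results, key=lambda r: (-int(r["area_match"]), -r["score"]))

docs = [
    {"name": "Area 2", "area_match": True, "score": 0.21},
    {"name": "Area 3", "area_match": True, "score": 0.38},
    {"name": "Area 4", "area_match": True, "score": 0.23},
]
# All three match the area, so the score tie-break orders them as desired.
```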

Thanks heaps in advance.

Regards

James


Re: Good protwords.txt ?

2011-04-25 Thread Otis Gospodnetic
Hi Robert,

That's some old thread from 1969 - that's before my time! :)

I'm not sure what 2+2lemma.txt is... aha, I see it on 
http://wordlist.sourceforge.net/12dicts-readme-r5.html -- a headword + N related 
words.  I don't think this will help me tame the overly aggressive Porter 
stemmer, although your sample stemmer corrections for textTight, the 
plural-only stemmer (via StemmerOverrideFilter), look good and like something 
that *would* help me tame Porter.

errata  erratum
news  news
radii  radius
cavalrymen  cavalryman
...

Is the full dictionary you've built available anywhere for download?

Thanks,
Otis
P.S.
I saw that the thread at http://search-lucene.com/m/jeWPi1X3FVw started a debate 
over what to include by default, concerns over performance, etc. -- I'd say it's 
better to include things like the above and comment them out (if we are afraid of 
poor performance out of the box or some such) than not to provide them at all.

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Robert Muir rcm...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Mon, April 25, 2011 2:20:45 PM
 Subject: Re: Good protwords.txt ?
 
 On Mon, Apr 25, 2011 at 2:05 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com  wrote:
  Hi,
 
  Are there any good / comprehensive examples of protwords.txt for English?
  Or good stemdict.txt examples that work with StemmerOverrideFilterFactory?
 
  Would be good to have a good example to include in the Solr distribution...
 
 
 I brought this up a while ago (as I am probably more than 50-60% done
 with all of this via 2+2lemma.txt) and there was no interest:
 
http://www.lucidimagination.com/search/document/180c90276e589d68/solr_example_synonyms_file
 


Re: Automatic synonyms for multiple variations of a word

2011-04-25 Thread Otis Gospodnetic
Hi Otis & Robert,

 - Original Message 


 How do people handle cases where synonyms are used and there are multiple 
 versions of the original word that really need to point to the same set of 
 synonyms?
 
 For example:
 Consider singular and plural of the word responsibility.  One might have 
 synonyms defined like this:
 
   responsibility, obligation, duty
 
 But the plural responsibilities is not in there, and thus it will not get 
 expanded to the synonyms above! That's a problem.
 
 Sure, one could change the synonyms file to look like this:
 
   responsibility, responsibilities, obligation, duty
 
 But that means somebody needs to think of all variations of the word!

Yes, that seems to be the case now, as it was in 2008:
http://search-lucene.com/m/gLwUCV0qU02&subj=Re+Synonyms+and+stemming+revisited
http://search-lucene.com/m/7lqdp1ldrqx (Hoss replied, but I think that 
suggestion doesn't actually work)

 Is there something one can do to get all variations of the word to map to the 
 same synonyms without having to explicitly specify all variations of the word?

I think this is where Robert's 2+2lemma pointer may help, because the 2+2lemma 
list contains records where a headword is followed by a list of other 
variations of the word.  The way I think this would help is by simply taking 
that list and turning it into the synonyms file format, and then merging in the 
actual synonyms.

For example, if I have the word responsibility, then from 2+2lemma I should be 
able to get that responsibilities is one of the variants of responsibility. 
I should then be able to take those two words and stick them in the synonyms 
file like this:

  responsibility, responsibilities

And then append actual synonyms to that:

  responsibility, responsibilities, obligation, duty

But I may then need to actually expand synonyms themselves, too (again using 
data from 2+2lemma):

  responsibility, responsibilities, obligation, obligations, duty, duties
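The merging step can be sketched as follows (illustrative Python; the LEMMAS dict is a made-up stand-in for data parsed out of a 2+2lemma-style word list):

```python
LEMMAS = {  # hypothetical lemma data: headword -> other variants
    "responsibility": ["responsibilities"],
    "obligation": ["obligations"],
    "duty": ["duties"],
}

def expand_synonym_line(words):
    """Expand each word on a synonyms.txt line with its known variants."""
    out = []
    for w in words:
        out.append(w)
        out.extend(LEMMAS.get(w, []))
    return out
```

Feeding it the original line responsibility, obligation, duty yields exactly the fully expanded line above.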


I haven't tried this yet.  Just theorizing and hoping for feedback.

Does this sound about right?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: Automatic synonyms for multiple variations of a word

2011-04-25 Thread Lance Norskog
This has come up with stemming: you can stem your synonym list with
the FieldAnalyzer Solr http call, then save the final chewed-up terms
as a new synonym file. You then use that one in the analyzer stack
below the stemmer filter.
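A sketch of that pipeline (illustrative Python; toy_stem is a stand-in for the field's real stemmer, rigged here to mimic Porter-style output for a few suffixes only):

```python
RULES = [("ibility", ""), ("ation", ""), ("y", "i")]  # toy suffix rules only

def toy_stem(word):
    """Stand-in for the field's actual stemmer; in practice you'd capture
    the real output from Solr's analysis request for each word."""
    for suffix, repl in RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + repl
    return word

def stem_synonym_line(words):
    """Stem every synonym so the resulting file can be used below the
    stemmer filter in the analyzer stack, as suggested above."""
    return [toy_stem(w) for w in words]
```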

On Mon, Apr 25, 2011 at 9:15 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi Otis  Robert,

  - Original Message 


 How do people handle cases where synonyms are used and there are  multiple
 version of the original word that really need to point to the same  set of
 synonyms?

 For example:
 Consider singular and plural of the  word responsibility.  One might have
 synonyms defined like  this:

   responsibility, obligation, duty

 But the plural  responsibilities is not in there, and thus it will not get
 expanded to the  synonyms above! That's a problem.

 Sure, one could change the synonyms  file to look like this:

   responsibility, responsibilities,  obligation, duty

 But that means somebody needs to think of all variations  of the word!

 Yes, that seems to be the case now, as it was in 2008:
 http://search-lucene.com/m/gLwUCV0qU02subj=Re+Synonyms+and+stemming+revisited
 http://search-lucene.com/m/7lqdp1ldrqx (Hoss replied, but I think that
 suggestion doesn't actually work)

 Is there something one can do to get all variations of the word to map to
 the same synonyms without having to explicitly specify all variations of
 the word?

 I think this is where Robert's 2+2lemma pointer may help, because the 2+2lemma
 list contains records where a headword is followed by a list of other
 variations of the word.  The way I think this would help is by simply taking
 that list and turning it into the synonyms file format, and then merging in
 the actual synonyms.

 For example, if I have the word responsibility, then from 2+2lemma I should
 be able to get that responsibilities is one of the variants of responsibility.
 I should then be able to take those two words and stick them in the synonyms
 file like this:

  responsibility, responsibilities

 And then append actual synonyms to that:

  responsibility, responsibilities, obligation, duty

 But I may then need to actually expand synonyms themselves, too (again using
 data from 2+2lemma):

  responsibility, responsibilities, obligation, obligations, duty, duties


 I haven't tried this yet.  Just theorizing and hoping for feedback.

 Does this sound about right?

 Thanks,
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/





-- 
Lance Norskog
goks...@gmail.com


Re: Automatic synonyms for multiple variations of a word

2011-04-25 Thread Otis Gospodnetic
Right, instead of this in synonyms file:

  responsibility, obligation, duty

 
I could stem each of the above words/synonyms and have something like this in 
synonyms file:

  respons, oblig, duti

But somehow this feels bad (well, so does sticking word variations in what's 
supposed to be a synonyms file), partly because it means that the person adding 
new synonyms would need to know what they stem to (or always check it against 
Solr before editing the file).

I've never seen anyone actually use such a synonyms file in production, have 
you?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Lance Norskog goks...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, April 26, 2011 12:20:05 AM
 Subject: Re: Automatic synonyms for multiple variations of a word
 
 This has come up with stemming: you can stem your synonym list with
 the  FieldAnalyzer Solr http call, then save the final chewed-up terms
 as a new  synonym file. You then use that one in the analyzer stack
 below the stemmer  filter.
 
 On Mon, Apr 25, 2011 at 9:15 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com  wrote:
  [earlier quoted message trimmed]
 
 
 
 -- 
 Lance  Norskog
 goks...@gmail.com
 


Re: Query regarding solr plugin.

2011-04-25 Thread rajini maski
Thanks Erick. I have added my replies to the points you mentioned. I am
going wrong somewhere. Do I need to combine both the jars or something?
If yes, how do I do that? I don't have much knowledge of Java and jar files.
Please guide me here.

A couple of things to try.

1> when you do a 'jar -tfv yourjar', you should see
output like:
 1183 Sun Jun 06 01:31:14 EDT 2010
org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class
and your filter statement may need the whole path, in this example...
<filter class="org.apache.lucene.analysis.sinks.TokenTypeSink"/> (note, this
is just an example of the pathing; this class has nothing to do with
your filter)...

I could see this output..

2> But I'm guessing your path is actually OK, because I'd expect to be
seeing a
class not found error. So my guess is that your class depends on
other jars that
aren't packaged up in your jar and if you find which ones they are and copy
them
to your lib directory you'll be OK. Or your code is throwing an error
on load. Or
something like that...

There is a jar, apache-solr-core-1.4.1.jar, which has the
BaseTokenFilterFactory class and the SynonymFilterFactory class. I made the
changes in the second class file and created it as new. Now I created a jar of
that java file and placed it in solr home/lib, and also placed the
apache-solr-core-1.4.1.jar file in the lib folder of solr home.  [solr home -
c:\orch\search\solr  lib path - c:\orch\search\solr\lib]

3> to try to understand what's up, I'd back up a step. Make a really
stupid class that doesn't do anything except derive from
BaseTokenFilterFactory and see if you can load that. If you can, then your
process is OK and you need to find out what classes your new filter depends
on. If you still can't, then we can see what else we can come up with..


I am perhaps doing the same. In the SynonymFilterFactory class, there is a
parseRules function which takes a delimiter as one of its input parameters.
There I changed the comma ',' to the tilde symbol '~', and that's it.


Regards,
Rajani


On Mon, Apr 25, 2011 at 6:26 PM, Erick Erickson erickerick...@gmail.com wrote:

 Looking at things more carefully, it may be one of your dependent classes
 that's not being found.

 A couple of things to try.

 1> when you do a 'jar -tfv yourjar', you should see
 output like:
  1183 Sun Jun 06 01:31:14 EDT 2010
 org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class
 and your filter statement may need the whole path, in this example...
 <filter class="org.apache.lucene.analysis.sinks.TokenTypeSink"/> (note, this
 is just an example of the pathing; this class has nothing to do with
 your filter)...

 2> But I'm guessing your path is actually OK, because I'd expect to be
 seeing a
 class not found error. So my guess is that your class depends on
 other jars that
 aren't packaged up in your jar and if you find which ones they are and copy
 them
 to your lib directory you'll be OK. Or your code is throwing an error
 on load. Or
 something like that...

 3> to try to understand what's up, I'd back up a step. Make a really
 stupid class that doesn't do anything except derive from
 BaseTokenFilterFactory and see if you can load that. If you can, then your
 process is OK and you need to find out what classes your new filter depends
 on. If you still can't, then we can see what else we can come up with..

 Best
 Erick

 On Mon, Apr 25, 2011 at 2:34 AM, rajini maski rajinima...@gmail.com
 wrote:
  Erick ,
  *
  *
  * Thanks.* It was actually a copy mistake. Anyways i did a redo of all
 the
  below mentioned steps. I had given class name as
  <filter class="pointcross.orchSynonymFilterFactory"
  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
 
  I did it again now following few different steps following this link :
 
 http://help.eclipse.org/helios/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-32.htm
 
 
  1 ) Created new package in src folder .
 *org.apache.pointcross.synonym*.This
  is having class Synonym.java
 
  2) Now did a right click on same package and selected export option -> Java
  tab -> JAR File -> selected the path for package -> finish
 
  3) This created jar file in specified location. Now followed in cmd  ,
 jar
  tfv
  org.apache.pointcross.synonym. the following was desc in cmd.
 
  :\Apps\Rajani Eclipse\Solr141_jar>jar -tfv org.apache.pointcross.synonym.Synonym.jar
   25 Mon Apr 25 11:32:12 GMT+05:30 2011 META-INF/MANIFEST.MF
   383 Thu Apr 14 16:36:00 GMT+05:30 2011 .project
   2261 Fri Apr 22 16:26:12 GMT+05:30 2011 .classpath
   1017 Thu Apr 21 16:34:20 GMT+05:30 2011 jarLog.jardesc
 
  4) Now placed same jar file in solr home/lib folder. Solrconfig.xml
   enabled <lib dir="./lib" /> and in schema <filter class="synonym.Synonym"
  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
 
  5) Restart tomcat : http://localhost:8097/finding1
 
  Error SEVERE: org.apache.solr.common.SolrException: Error loading class
  'pointcross.synonym.Synonym'
  at