Re: LSA Implementation

2007-11-28 Thread Eswar K
Lance,

It does cover European languages, but pretty much nothing on Asian languages
(CJK).

- Eswar

On Nov 28, 2007 1:51 AM, Norskog, Lance [EMAIL PROTECTED] wrote:

 WordNet itself is English-only. There are various ontology projects for
 it.

 http://www.globalwordnet.org/ is a separate world language database
 project. I found it at the bottom of the WordNet wikipedia page. Thanks
 for starting me on the search!

 Lance

 -Original Message-
 From: Eswar K [mailto:[EMAIL PROTECTED]
 Sent: Monday, November 26, 2007 6:50 PM
 To: solr-user@lucene.apache.org
 Subject: Re: LSA Implementation

 The languages also include CJK :) among others.

 - Eswar

 On Nov 27, 2007 8:16 AM, Norskog, Lance [EMAIL PROTECTED] wrote:

  The WordNet project at Princeton (USA) is a large database of
 synonyms.
  If you're only working in English this might be useful instead of
  running your own analyses.
 
  http://en.wikipedia.org/wiki/WordNet
  http://wordnet.princeton.edu/
 
  Lance
 
  -Original Message-
  From: Eswar K [mailto:[EMAIL PROTECTED]
  Sent: Monday, November 26, 2007 6:34 PM
  To: solr-user@lucene.apache.org
  Subject: Re: LSA Implementation
 
  In addition to recording which keywords a document contains, the
  method examines the document collection as a whole, to see which other
  documents contain some of those same words. The algorithm considers
  documents that have many words in common to be semantically close, and
  ones with few words in common to be semantically distant. This simple
  method correlates surprisingly well with how a human being, looking at
  content, might classify a document collection. Although the algorithm
  doesn't understand anything about what the words *mean*, the patterns
  it notices can make it seem astonishingly intelligent.
 
  When you search such an index, the search engine looks at the
  similarity values it has calculated for every content word, and
  returns the documents that it thinks best fit the query. Because two
  documents may be semantically very close even if they do not share a
  particular keyword, this algorithm will often return relevant
  documents that don't contain the keyword at all, where a plain
  keyword search would fail for lack of an exact match.
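
  For the curious, a toy sketch (an editorial illustration, not from the
  thread) of the term-overlap intuition described above: score two
  documents by the cosine of their term-frequency vectors. Real LSA goes
  further and factors the term-document matrix with SVD; this shows only
  the raw "shared words = close" step.

  import java.util.HashMap;
  import java.util.Map;

  public class CosineDemo {
      // naive term-frequency vector: whitespace tokens, lowercased
      static Map<String, Integer> tf(String text) {
          Map<String, Integer> v = new HashMap<String, Integer>();
          for (String w : text.toLowerCase().split("\\s+")) {
              Integer c = v.get(w);
              v.put(w, c == null ? 1 : c + 1);
          }
          return v;
      }

      // cosine similarity of two sparse term vectors
      static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
          double dot = 0, na = 0, nb = 0;
          for (Map.Entry<String, Integer> e : a.entrySet()) {
              Integer bv = b.get(e.getKey());
              if (bv != null) dot += e.getValue() * bv;
              na += e.getValue() * e.getValue();
          }
          for (int v : b.values()) nb += (double) v * v;
          return dot == 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
      }

      public static void main(String[] args) {
          System.out.println(cosine(tf("latent semantic analysis of text"),
                                    tf("semantic analysis finds latent concepts")));
      }
  }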
 
  - Eswar
 
  On Nov 27, 2007 7:51 AM, Marvin Humphrey [EMAIL PROTECTED]
 wrote:
 
  
   On Nov 26, 2007, at 6:06 PM, Eswar K wrote:
  
We essentially are looking at having an implementation for doing
search which can return documents having conceptually similar
words without necessarily having the original word searched for.
  
   Very challenging.  Say someone searches for "LSA" and hits an
   archived version of the mail you sent to this list.  "LSA" is a
   reasonably discriminating term.  But so is "Eswar".
 
   If you knew that the original term was "LSA", then you might look
   for documents near it in term vector space.  But if you don't know
   the original term, only the content of the document, how do you know
   whether you should look for docs near "lsa" or "eswar"?
  
   Marvin Humphrey
   Rectangular Research
   http://www.rectangular.com/
  
  
  
 



Re: Combining SOLR and JAMon to monitor query execution times from a browser

2007-11-28 Thread Siegfried Goeschl

Hi Norberto,

JAMon is all about aggregating statistical data and displaying the 
information in a web browser - the main beauty is that it is easy to 
define what you are monitoring, such as queries on domain objects per customer.
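
As a concrete sketch (an editorial illustration, not Siegfried's actual
code; the label scheme is made up, but MonitorFactory.start/stop is the
standard JAMon idiom):

import com.jamonapi.Monitor;
import com.jamonapi.MonitorFactory;

public class QueryTiming {
    // hypothetical label scheme: "query.<customer>.<domainObject>"
    public void timedQuery(String customer, String domainObject, Runnable query) {
        Monitor mon = MonitorFactory.start("query." + customer + "." + domainObject);
        try {
            query.run();  // execute the Solr query
        } finally {
            mon.stop();   // JAMon aggregates hits and avg/min/max times per label
        }
    }
}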


Cheers,

Siegfried Goeschl

Norberto Meijome wrote:

On Tue, 27 Nov 2007 18:18:16 +0100
Siegfried Goeschl [EMAIL PROTECTED] wrote:

  

Hi folks,

working on a closed source project for an IP-concerned company is not 
always fun ... we combined SOLR with JAMon 
(http://jamonapi.sourceforge.net/) to keep an eye on the query times, and 
this might be of general interest


+) JAMon comes with a ready-to-use ServletFilter
+) we extended this implementation to keep track of the queries issued by a 
customer and the requested domain objects, e.g. artist, album, track
+) this allows us to keep track of the execution times and their 
distribution, so we can quickly find long-running queries from a web 
browser without having access to the access.log
+) a small presentation can be found at 
http://people.apache.org/~sgoeschl/presentations/jamon-20070717.pdf

+) if it is of general interest I can rewrite the code as a contribution



Thanks Siegfried,

I am further interested in plugging this information into something like Nagios, Cacti, Zenoss, bigsister, Openview or your monitoring system of choice, but I haven't had much time to look into this yet. How does JAMon compare to JMX (http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/)?


cheers,
B

_
{Beto|Norberto|Numard} Meijome

There are no stupid questions, but there are a LOT of inquisitive idiots.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


  


SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently

2007-11-28 Thread Daniel Alheiros
Hi

I experienced a very unpleasant problem recently, when my search indexing
adaptor was changed to add some new fields. The problem is that my schema
didn't follow those changes (the new fields weren't added to it), and after
that SOLR was silently ignoring all documents I sent.

Neither the SOLR Java client nor the SOLR server returned an error code or
log message. On the server side nothing was logged, and the client received
a standard success response.

Why did my documents not get indexed, with the new fields simply ignored?
That is what I think it was supposed to do.

Please let me know your thoughts.

Regards,
Daniel 





Re: Memory use with sorting problem

2007-11-28 Thread Chris Laux
Just wanted to add the solution to this problem, in case someone finds
the matching description in the archives (see below).

By reducing the granularity of the timestamp field (stored as slong)
from seconds to minutes, the number of unique values was reduced by an
order of magnitude (there are about 500,000 minutes in a year), and hence
the memory use was also reduced.
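
A sketch of that fix (an editorial illustration, not Chris's actual code):
truncate the epoch-seconds value to minutes before indexing. A year has
roughly 31.5 million seconds but only about 525,000 minutes, so the sort
cache holds about 60x fewer distinct terms.

public class TimestampGranularity {
    // truncate a Unix timestamp in seconds to minute granularity
    static long toMinutes(long epochSeconds) {
        return epochSeconds / 60L;
    }

    public static void main(String[] args) {
        long nowSeconds = System.currentTimeMillis() / 1000L;
        System.out.println(toMinutes(nowSeconds));  // value indexed in "created"
    }
}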

Chris


Chris Laux wrote:
 Hi again,
 
 in the meantime I discovered the use of jmap (I'm not a Java programmer)
 and found that all the memory was being used up by String and char[]
 objects.
 
 The Lucene docs have the following to say on sorting memory use:
 
 "For String fields, the cache is larger: in addition to the above
 array, the value of every term in the field is kept in memory. If there
 are many unique terms in the field, this could be quite large."
 
 (http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Sort.html)
 
 I am sorting on the slong schema type, which is of course stored as a
 string. The above quote seems to indicate that it is possible for a
 field not to be a string for the purposes of the sort, while I took it
 from LiA that everything is a string to Lucene.
 
 What can I do to make sure the additional memory is not used for every
 unique term, i.e. how can I keep the slong from being a String field?
 
 Cheers,
 Chris
 
 
 Chris Laux wrote:
 Hi all,

 I've been struggling with this problem for over a month now, and
 although memory issues have been discussed often, I don't seem to be
 able to find a fitting solution.

 The index is only 1.5 GB, but memory use quickly fills the 1 GB heap
 maximum on a 2 GB machine. This works fine until auto-warming starts.
 Switching the latter off altogether is unattractive, as it leads to
 response times of up to 30 s. When auto-warming starts, I
 get this error:

 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.QueryResultKey
 @e0b93139:java.lang.OutOfMemoryError: Java heap space

 Now when I reduce the size of caches (to a fraction of the default
 settings) and number of warming Searchers (to 2), memory use is not
 reduced and the problem stays. Only deactivating auto-warming will help.
 When I set the heap size limit higher (and go into swap space), all the
 extra memory seems to be used up right away, independently from
 auto-warming.

 This all seems to be closely connected to sorting by a numerical field,
 as switching this off does make memory use a lot more friendly.

 Is it normal to need that much memory for such a small index?

 I suspect the problem is in Lucene, would it be better to post on their
 list?

 Does anyone know a better way of getting the sorting done?

 Thanks in advance for your help,

 Chris


 This is the field setup in schema.xml:

 <field name="id" type="long" stored="true" required="true"
        multiValued="false"/>
 <field name="user-id" type="long" stored="true" required="true"
        multiValued="false"/>
 <field name="text" type="text" indexed="true" multiValued="false"/>
 <field name="created" type="slong" indexed="true" multiValued="false"/>

 And this is a sample query:

 select/?q=solr&start=0&rows=20&sort=created+desc


 



Re: SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently

2007-11-28 Thread Ravish Bhagdev
Yup, I do remember that happening to me before.

Is this intentionally so?

Ravish

On Nov 28, 2007 1:41 PM, Daniel Alheiros [EMAIL PROTECTED] wrote:
 Hi

 I experienced a very unpleasant problem recently, when my search indexing
 adaptor was changed to add some new fields. The problem is that my schema
 didn't follow those changes (the new fields weren't added to it), and after
 that SOLR was silently ignoring all documents I sent.

 Neither the SOLR Java client nor the SOLR server returned an error code or
 log message. On the server side nothing was logged, and the client received
 a standard success response.

 Why did my documents not get indexed, with the new fields simply ignored?
 That is what I think it was supposed to do.

 Please let me know your thoughts.

 Regards,
 Daniel






Re: SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently

2007-11-28 Thread Erik Hatcher


On Nov 28, 2007, at 8:41 AM, Daniel Alheiros wrote:
I experienced a very unpleasant problem recently, when my search
indexing adaptor was changed to add some new fields. The problem is that
my schema didn't follow those changes (the new fields weren't added to
it), and after that SOLR was silently ignoring all documents I sent.


Is your schema perhaps configured to ignore undefined fields?

Erik



Re: CJK Analyzers for Solr

2007-11-28 Thread Walter Underwood
With Ultraseek, we switched to a dictionary-based segmenter for Chinese
because the N-gram highlighting wasn't acceptable to our Chinese customers.

I guess it is something to check for each application.
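
To make the trade-off concrete, a small sketch (editorial, assuming
Lucene's contrib CJKAnalyzer and the 2.x-era TokenStream API) showing
the overlapping 2-grams it emits; highlighting those fragments is what
tends to look wrong to native readers:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;

public class BigramDemo {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new CJKAnalyzer();
        TokenStream ts = analyzer.tokenStream("text", new StringReader("中文分词"));
        // prints the overlapping bigrams: 中文, 文分, 分词
        for (Token t = ts.next(); t != null; t = ts.next()) {
            System.out.println(t.termText());
        }
    }
}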

wunder

On 11/27/07 10:46 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 For what it's worth, I worked on indexing and searching a *massive* pile of
 data, a good portion of which was in CJ and some K.  The n-gram approach was
 used for all 3 languages, and the quality of search results, including
 highlighting, was evaluated and okay-ed by native speakers of these languages.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 - Original Message 
 From: Walter Underwood [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Tuesday, November 27, 2007 2:41:38 PM
 Subject: Re: CJK Analyzers for Solr
 
 Dictionaries are surprisingly expensive to build and maintain and
 bi-gram is surprisingly effective for Chinese. See this paper:
 
http://citeseer.ist.psu.edu/kwok97comparing.html
 
 I expect that n-gram indexing would be less effective for Japanese
 because it is an inflected language. Korean is even harder. It might
 work to break Korean into the phonetic subparts and use n-gram on
 those.
 
 You should not do term highlighting with any of the n-gram methods.
 The relevance can be very good, but the highlighting just looks dumb.
 
 wunder
 
 On 11/27/07 8:54 AM, Eswar K [EMAIL PROTECTED] wrote:
 
 Is there any specific reason why the CJK analyzers in Solr were chosen
 to be n-gram based, instead of morphological analyzers of the kind
 Google is understood to use, which are considered more effective than
 the n-gram ones?
 
 Regards,
 Eswar
 
 
 
 On Nov 27, 2007 7:57 AM, Eswar K [EMAIL PROTECTED] wrote:
 
 thanks james...
 
 How much time does it take to index 18m docs?
 
 - Eswar
 
 
 On Nov 27, 2007 7:43 AM, James liu [EMAIL PROTECTED]  wrote:
 
 i don't use the HYLANDA analyzer.
 
 i use je-analyzer and index at least 18m docs.
 
 i'm sorry, i only use a chinese analyzer.
 
 
 On Nov 27, 2007 10:01 AM, Eswar K [EMAIL PROTECTED] wrote:
 
 What is the performance of these CJK analyzers (the one in lucene and
 hylanda)?
 We would potentially be indexing millions of documents.
 
 James,
 
 We would have a look at hylanda too. What about japanese and korean
 analyzers, any recommendations?
 
 - Eswar
 
 On Nov 27, 2007 7:21 AM, James liu [EMAIL PROTECTED]
  wrote:
 
 I don't think NGram is a good method for Chinese.
 
 CJKAnalyzer of Lucene is 2-Gram.
 
 Eswar K:
  if it is a chinese analyzer, i recommend hylanda (www.hylanda.com); it
 is the best chinese analyzer, and it is not free.
  if u want a free chinese analyzer, maybe u can try je-analyzer. it
 has some problems when you use it.
 
 
 
 On Nov 27, 2007 5:56 AM, Otis Gospodnetic 
 [EMAIL PROTECTED]
 wrote:
 
 Eswar,
 
 We've used the NGram stuff that exists in Lucene's contrib/analyzers
 instead of CJK.  Doesn't that allow you to do everything that the
 Chinese and CJK analyzers do?  It's been a few months since I've looked
 at the Chinese and CJK Analyzers, so I could be off.
 
 Otis
 
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 - Original Message 
 From: Eswar K [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, November 26, 2007 8:30:52 AM
 Subject: CJK Analyzers for Solr
 
 Hi,
 
 Does Solr come with language analyzers for CJK? If not, can you please
 direct me to some good CJK analyzers?
 
 Regards,
 Eswar
 
 
 
 
 
 
 --
 regards
 jl
 
 
 
 
 
 --
 regards
 jl
 
 
 
 
 
 
 



query parsing wildcards

2007-11-28 Thread Charles Hornberger
I'm confused by some behavior I'm seeing in Solr (I'm using 1.2.0). I
have a field named "description", declared with the following
fieldType:

<fieldType name="textTightUnstemmed" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

The problem I'm having is that when I search for description:deck*, I
get the results I expect; when I search for description:Deck*, I get
nothing. I want both queries to return the same result set. (I'm using
the standard request handler.)

Interestingly, when I search for description:Deck from the web
interface, the debug output shows that the query term is converted to
lowercase:

<str name="rawquerystring">description:Deck</str>
<str name="querystring">description:Deck</str>
<str name="parsedquery">description:deck</str>
<str name="parsedquery_toString">description:deck</str>

... but when I search for description:Deck*, it shows that it is not:

<str name="rawquerystring">description:Deck*</str>
<str name="querystring">description:Deck*</str>
<str name="parsedquery">description:Deck*</str>
<str name="parsedquery_toString">description:Deck*</str>

What am I doing wrong here?

Also, when I use the Field Analysis tool for description:Deck*, it
shows the following (sorry for the bad copy/paste):

Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  position 1 | text "Deck*" | type word | start,end 0,5
org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
expand=false, ignoreCase=true}
  position 1 | text "Deck*" | type word | start,end 0,5
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
ignoreCase=true}
  position 1 | text "Deck*" | type word | start,end 0,5
org.apache.solr.analysis.WordDelimiterFilterFactory
{generateNumberParts=0, catenateWords=1, generateWordParts=0,
catenateAll=0, catenateNumbers=1}
  position 1 | text "Deck" | type word | start,end 0,4
org.apache.solr.analysis.LowerCaseFilterFactory {}
  position 1 | text "deck" | type word | start,end 0,4
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
  position 1 | text "deck" | type word | start,end 0,4

Thanks,
Charlie


Re: SOLR / Tomcat JNDI Settings

2007-11-28 Thread vis

Thanks a lot Hossman; this solved it for me.

Essential for me was to understand that I had to create a solr.xml file in
tomcat_home\conf\Catalina\localhost
(see an example below in the quote).

The docBase should point to the .war file somewhere on my system.
The value attribute of the <Environment name="solr/home" value="..." />
element should point to a directory where tomcat can create the lucene/solr
index files.  That home directory should also contain the conf directory
from the example in the solr distribution.

And that was it.
 

hossman wrote:
 
 <!--
  An example of declaring a specific tomcat context file that
  points at our solr.war (anywhere we want it) and a Solr Home
  directory (anywhere we want it) using JNDI.
 
  We could have multiple context files like this, with different
  names (and different Solr Home settings) to support multiple
  indexes on one box.
 -->
 <Context
   docBase="/var/tmp/ac-demo/apache-solr-1.2.0/dist/apache-solr-1.2.0.war"
   debug="0"
   crossContext="true">
 
   <Environment name="solr/home"
      value="/var/tmp/ac-demo/books-solr-home/"
      type="java.lang.String"
      override="true" />
 </Context>
 

-- 
View this message in context: 
http://www.nabble.com/Tomcat-JNDI-Settings-tf4753435.html#a14001375
Sent from the Solr - User mailing list archive at Nabble.com.



Re: query parsing wildcards

2007-11-28 Thread Charles Hornberger
I should have Googled better. It seems that my question has been asked
and answered already, and not just once:

  http://www.nabble.com/Using-wildcard-with-accented-words-tf4673239.html
  
http://groups.google.com/group/acts_as_solr/browse_thread/thread/42920dc2dcc5fa88

On Nov 28, 2007 9:42 AM, Charles Hornberger
[EMAIL PROTECTED] wrote:
 I'm confused by some behavior I'm seeing in Solr (I'm using 1.2.0). I
 have a field named "description", declared with the following
 fieldType:

 <fieldType name="textTightUnstemmed" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="0" generateNumberParts="0" catenateWords="1"
             catenateNumbers="1" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

 The problem I'm having is that when I search for description:deck*, I
 get the results I expect; when I search for description:Deck*, I get
 nothing. I want both queries to return the same result set. (I'm using
 the standard request handler.)

 Interestingly, when I search for description:Deck from the web
 interface, the debug output shows that the query term is converted to
 lowercase:

 <str name="rawquerystring">description:Deck</str>
 <str name="querystring">description:Deck</str>
 <str name="parsedquery">description:deck</str>
 <str name="parsedquery_toString">description:deck</str>

 ... but when I search for description:Deck*, it shows that it is not:

 <str name="rawquerystring">description:Deck*</str>
 <str name="querystring">description:Deck*</str>
 <str name="parsedquery">description:Deck*</str>
 <str name="parsedquery_toString">description:Deck*</str>

 What am I doing wrong here?

 Also, when I use the Field Analysis tool for description:Deck*, it
 shows the following (sorry for the bad copy/paste):

 Query Analyzer
 org.apache.solr.analysis.WhitespaceTokenizerFactory {}
   position 1 | text "Deck*" | type word | start,end 0,5
 org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
 expand=false, ignoreCase=true}
   position 1 | text "Deck*" | type word | start,end 0,5
 org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
 ignoreCase=true}
   position 1 | text "Deck*" | type word | start,end 0,5
 org.apache.solr.analysis.WordDelimiterFilterFactory
 {generateNumberParts=0, catenateWords=1, generateWordParts=0,
 catenateAll=0, catenateNumbers=1}
   position 1 | text "Deck" | type word | start,end 0,4
 org.apache.solr.analysis.LowerCaseFilterFactory {}
   position 1 | text "deck" | type word | start,end 0,4
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
   position 1 | text "deck" | type word | start,end 0,4

 Thanks,
 Charlie



Re: SOLR 1.2 - Updates sent containing fields that are not on the Schema fail silently

2007-11-28 Thread Chris Hostetter

: I didn't know that trick.

erik is refering to this in the example schema.xml...

   <!-- uncomment the following to ignore any fields that don't already match 
an existing field name or dynamic field, rather than reporting them as an 
error.  alternately, change the type="ignored" to some other type, e.g. 
"text", if you want unknown fields indexed and/or stored by default --> 
   <!-- <dynamicField name="*" type="ignored" /> -->

...but it sounds like you are having some other problem ... you said that 
when you POST your documents with extra fields you get a 200 
response but the documents aren't getting indexed at all, correct?

that is not supposed to happen, Solr should be generating an error.  can 
you give us more info on your setup: what does your schema.xml look like, 
what does your update code look like (you said you were using SolrJ, i 
believe?), and what does Solr log when these updates happen, etc...



-Hoss



Re: query parsing wildcards

2007-11-28 Thread Chris Hostetter

: I should have Googled better. It seems that my question has been asked
: and answered already, and not just once:

right, wildcard and prefix queries aren't analyzed by the query 
parser (there's more on the why of this in the Lucene-Java FAQ).

To clarify one other part of your question

:  Also, when I use the Field Analysis tool for description:Deck*, it
:  shows the following (sorry for the bad copy/paste):

the analysis tool only shows you the analysis portion of 
indexing/querying ... it knows nothing about which query parser you are 
using, so it doesn't know anything about any special query parser 
characters (like *).  The output it gave you shows what the 
standard request handler would have done if you'd used the standard 
request handler to search for...
 description:Deck*
or:  description:Deck\*

(where the * character is 'escaped')



-Hoss



RequestHandler shared resources

2007-11-28 Thread Grant Ingersoll
I have an object that I would like to share between two or more  
RequestHandlers.  One request handler will be responsible for the  
object and the other I would like to handle information requests about  
what the object is doing.  Thus, I need to share the object between  
the handlers.  Short of using a static, does anyone have any  
recommended way of doing this?  In a pure servlet, I could use the  
ServletContext.  Or am I missing something?


Thanks,
Grant


Re: RequestHandler shared resources

2007-11-28 Thread Ryan McKinley

Grant Ingersoll wrote:
I have an object that I would like to share between two or more 
RequestHandlers.  One request handler will be responsible for the object 
and the other I would like to handle information requests about what the 
object is doing.  Thus, I need to share the object between the 
handlers.  Short of using a static, does anyone have any recommended way 
of doing this?  In a pure servlet, I could use the ServletContext.  Or 
am I missing something?




RequestHandlers can know about each other by asking SolrCore

core.getRequestHandler( "myhandler" )

If you are using 1.3-dev, make the RequestHandler implement 
SolrCoreAware and then inform( SolrCore ) will be called *after* 
everything is initialized.


is that what you need?

ryan
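
A minimal sketch of the approach Ryan describes, assuming the 1.3-dev
SolrCoreAware interface and RequestHandlerBase (the handler name
"myhandler" and the boilerplate info methods are illustrative):

import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.request.SolrRequestHandler;
import org.apache.solr.util.plugin.SolrCoreAware;

public class StatusHandler extends RequestHandlerBase implements SolrCoreAware {
    private SolrRequestHandler owner;  // the handler that owns the shared object

    public void inform(SolrCore core) {
        // called after all handlers are initialized, so this lookup is safe
        owner = core.getRequestHandler("myhandler");
    }

    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
            throws Exception {
        // hypothetical: report something about the owning handler
        rsp.add("owner", owner.getClass().getName());
    }

    public String getDescription() { return "status of a shared resource"; }
    public String getSourceId()    { return ""; }
    public String getSource()      { return ""; }
    public String getVersion()     { return ""; }
}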


Re: RequestHandler shared resources

2007-11-28 Thread Grant Ingersoll
Yeah, I think that would work.  Actually, I should be able to get all  
the request handlers and then look for instances of the req handlers  
that I need.


Thanks!

-Grant

On Nov 28, 2007, at 4:42 PM, Ryan McKinley wrote:


Grant Ingersoll wrote:
I have an object that I would like to share between two or more  
RequestHandlers.  One request handler will be responsible for the  
object and the other I would like to handle information requests  
about what the object is doing.  Thus, I need to share the object  
between the handlers.  Short of using a static, does anyone have  
any recommended way of doing this?  In a pure servlet, I could use  
the ServletContext.  Or am I missing something?


RequestHandlers can know about each other by asking SolrCore

   core.getRequestHandler( "myhandler" )

If you are using 1.3-dev, make the RequestHandler implement  
SolrCoreAware and then inform( SolrCore ) will be called *after*  
everything is initialized.


is that what you need?

ryan





Re: RequestHandler shared resources

2007-11-28 Thread Chris Hostetter
: Yeah, I think that would work.  Actually, I should be able to get all the
: request handlers and then look for instances of the req handlers that I need.

or configure reqHandler B with the name of reqHandler A that owns the 
resource so it knows who to ask.


-Hoss



LowerCaseFilterFactory and spellchecker

2007-11-28 Thread Rob Casson
think i'm just doing something wrong...

was experimenting with the spellcheck handler with the nightly
checkout from 11-28; seems my spellchecking is case-sensitive, even
tho i think i'm adding the LowerCaseFilterFactory to both the index
and query analyzers.

here's a brief rundown of my testing steps.

from schema.xml:

<fieldtype name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

<field name="title" type="text" indexed="true" stored="true"
       multiValued="true"/>
<field name="spelling" type="spell" indexed="true" stored="stored"
       multiValued="true"/>

<copyField source="title" dest="spelling"/>



from solrconfig.xml:

<requestHandler name="spellchecker"
                class="solr.SpellCheckerRequestHandler" startup="lazy">
  <lst name="defaults">
    <int name="suggestionCount">1</int>
    <float name="accuracy">0.5</float>
  </lst>
  <str name="spellcheckerIndexDir">spell</str>
  <str name="termSourceField">spelling</str>
</requestHandler>



adding the doc:

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml'
--data-binary '<add><doc><field name="title">Thorne</field></doc></add>'
curl http://localhost:8983/solr/update -H 'Content-Type: text/xml'
--data-binary '<optimize />'



building the spellchecker:

http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker&cmd=rebuild



querying the spellchecker:

results from http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">1</int>
</lst>
<str name="words">Thorne</str>
<str name="exist">false</str>
<arr name="suggestions">
 <str>thorne</str>
</arr>
</response>

results from http://localhost:8983/solr/select/?q=thorne&qt=spellchecker

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">2</int>
</lst>
<str name="words">thorne</str>
<str name="exist">true</str>
<arr name="suggestions"/>
</response>


any pointers as to what i'm doing wrong, misinterpreting?  i suspect
i'm just doing something bone-headed in the analyzer sections...

thanks as always,

rob casson
miami university libraries


Re: LowerCaseFilterFactory and spellchecker

2007-11-28 Thread Rob Casson
lance,

thanks for the quick reply ... looks like 'thorne' is getting added to
the dictionary, as it comes up as a suggestion for 'Thorne'

i could certainly just lowercase in my client, but just confirming
that i'm not just screwing it up in the first place :)

thanks again,
rc

On Nov 28, 2007 8:11 PM, Norskog, Lance [EMAIL PROTECTED] wrote:
 There are a few parameters for limiting what words are added to the
 dictionary.  You might be trimming out 'thorne'. See this page:

 http://wiki.apache.org/solr/SpellCheckerRequestHandler


 -Original Message-
 From: Rob Casson [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, November 28, 2007 4:25 PM
 To: solr-user@lucene.apache.org
 Subject: LowerCaseFilterFactory and spellchecker

 think i'm just doing something wrong...

 was experimenting with the spellcheck handler with the nightly checkout
 from 11-28; seems my spellchecking is case-sensitive, even tho i think
 i'm adding the LowerCaseFilterFactory to both the index and query
 analyzers.

 here's a brief rundown of my testing steps.

 from schema.xml:

 <fieldtype name="spell" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StandardFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StandardFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldtype>

 <field name="title" type="text" indexed="true" stored="true"
        multiValued="true"/>
 <field name="spelling" type="spell" indexed="true" stored="stored"
        multiValued="true"/>

 <copyField source="title" dest="spelling"/>

 

 from solrconfig.xml:

 <requestHandler name="spellchecker"
                 class="solr.SpellCheckerRequestHandler" startup="lazy">
   <lst name="defaults">
     <int name="suggestionCount">1</int>
     <float name="accuracy">0.5</float>
   </lst>
   <str name="spellcheckerIndexDir">spell</str>
   <str name="termSourceField">spelling</str>
 </requestHandler>

 

 adding the doc:

 curl http://localhost:8983/solr/update -H 'Content-Type: text/xml'
 --data-binary '<add><doc><field name="title">Thorne</field></doc></add>'
 curl http://localhost:8983/solr/update -H 'Content-Type: text/xml'
 --data-binary '<optimize />'

 

 building the spellchecker:

 http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker&cmd=rebuild

 

 querying the spellchecker:

 results from http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
 </lst>
 <str name="words">Thorne</str>
 <str name="exist">false</str>
 <arr name="suggestions">
  <str>thorne</str>
 </arr>
 </response>

 results from http://localhost:8983/solr/select/?q=thorne&qt=spellchecker

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">2</int>
 </lst>
 <str name="words">thorne</str>
 <str name="exist">true</str>
 <arr name="suggestions"/>
 </response>


 any pointers as to what i'm doing wrong, misinterpreting?  i suspect i'm
 just doing something bone-headed in the analyzer sections...

 thanks as always,

 rob casson
 miami university libraries



RE: LowerCaseFilterFactory and spellchecker

2007-11-28 Thread Norskog, Lance
Oops, sorry, didn't think that through.

The query to the spellchecker is not filtered through the field query
definition. You have to do your own lower-case transformation when you
do the query.  This is a simple thing to resolve. But I'm working with
international alphabets, and I would like 'protege' and 'protégé' (with
both e's accented) to match. The ISOLatin1 filter does this in indexing
& querying. But I have to rip off the code and use it in my app to
preprocess words for spell-checks.
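
A sketch of that client-side preprocessing (an editorial illustration,
assuming Lucene's ISOLatin1AccentFilter and the 2.x-era token API; exact
package locations may differ by version):

import java.io.StringReader;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class SpellQueryNormalizer {
    // lowercase and strip accents the same way the index side does
    static String normalize(String word) throws Exception {
        TokenStream ts = new ISOLatin1AccentFilter(
                new LowerCaseFilter(
                        new WhitespaceTokenizer(new StringReader(word))));
        Token t = ts.next();
        return t == null ? word : t.termText();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(normalize("Protégé"));  // -> protege
    }
}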

Lance

-Original Message-
From: Rob Casson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 28, 2007 5:16 PM
To: solr-user@lucene.apache.org
Subject: Re: LowerCaseFilterFactory and spellchecker

lance,

thanks for the quick reply ... looks like 'thorne' is getting added to
the dictionary, as it comes up as a suggestion for 'Thorne'

i could certainly just lowercase in my client, but just confirming that
i'm not just screwing it up in the first place :)

thanks again,
rc

On Nov 28, 2007 8:11 PM, Norskog, Lance [EMAIL PROTECTED] wrote:
 There are a few parameters for limiting what words are added to the 
 dictionary.  You might be trimming out 'thorne'. See this page:

 http://wiki.apache.org/solr/SpellCheckerRequestHandler


 -Original Message-
 From: Rob Casson [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, November 28, 2007 4:25 PM
 To: solr-user@lucene.apache.org
 Subject: LowerCaseFilterFactory and spellchecker

 think i'm just doing something wrong...

 was experimenting with the spellcheck handler with the nightly 
 checkout from 11-28; seems my spellchecking is case-sensitive, even 
 tho i think i'm adding the LowerCaseFilterFactory to both the index 
 and query analyzers.

 here's a brief rundown of my testing steps.

 from schema.xml:

 <fieldtype name="spell" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StandardFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StandardFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldtype>

 <field name="title" type="text" indexed="true" stored="true"
        multiValued="true"/>
 <field name="spelling" type="spell" indexed="true" stored="stored"
        multiValued="true"/>

 <copyField source="title" dest="spelling"/>

 

 from solrconfig.xml:

 <requestHandler name="spellchecker"
                 class="solr.SpellCheckerRequestHandler" startup="lazy">
   <lst name="defaults">
     <int name="suggestionCount">1</int>
     <float name="accuracy">0.5</float>
   </lst>
   <str name="spellcheckerIndexDir">spell</str>
   <str name="termSourceField">spelling</str>
 </requestHandler>

 

 adding the doc:

 curl http://localhost:8983/solr/update -H 'Content-Type: text/xml'
 --data-binary '<add><doc><field name="title">Thorne</field></doc></add>'
 curl http://localhost:8983/solr/update -H 'Content-Type: text/xml'
 --data-binary '<optimize />'

 

 building the spellchecker:

 http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker&cmd=rebuild

 

 querying the spellchecker:

 results from http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
 </lst>
 <str name="words">Thorne</str>
 <str name="exist">false</str>
 <arr name="suggestions">
  <str>thorne</str>
 </arr>
 </response>

 results from http://localhost:8983/solr/select/?q=thorne&qt=spellchecker

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">2</int>
 </lst>
 <str name="words">thorne</str>
 <str name="exist">true</str>
 <arr name="suggestions"/>
 </response>


 any pointers as to what i'm doing wrong, misinterpreting?  i suspect
 i'm just doing something bone-headed in the analyzer sections...

 thanks as always,

 rob casson
 miami university libraries



RE: LowerCaseFilterFactory and spellchecker

2007-11-28 Thread Norskog, Lance
There are a few parameters for limiting what words are added to the
dictionary.  You might be trimming out 'thorne'. See this page:

http://wiki.apache.org/solr/SpellCheckerRequestHandler

-Original Message-
From: Rob Casson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 28, 2007 4:25 PM
To: solr-user@lucene.apache.org
Subject: LowerCaseFilterFactory and spellchecker

think i'm just doing something wrong...

was experimenting with the spellcheck handler with the nightly checkout
from 11-28; seems my spellchecking is case-sensitive, even tho i think
i'm adding the LowerCaseFilterFactory to both the index and query
analyzers.

here's a brief rundown of my testing steps.

from schema.xml:

<fieldtype name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

<field name="title" type="text" indexed="true" stored="true"
       multiValued="true"/>
<field name="spelling" type="spell" indexed="true" stored="stored"
       multiValued="true"/>

<copyField source="title" dest="spelling"/>



from solrconfig.xml:

<requestHandler name="spellchecker"
                class="solr.SpellCheckerRequestHandler" startup="lazy">
  <lst name="defaults">
    <int name="suggestionCount">1</int>
    <float name="accuracy">0.5</float>
  </lst>
  <str name="spellcheckerIndexDir">spell</str>
  <str name="termSourceField">spelling</str>
</requestHandler>



adding the doc:

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml'
--data-binary '<add><doc><field name="title">Thorne</field></doc></add>'
curl http://localhost:8983/solr/update -H 'Content-Type: text/xml'
--data-binary '<optimize />'



building the spellchecker:

http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker&cmd=rebuild



querying the spellchecker:

results from http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">1</int>
</lst>
<str name="words">Thorne</str>
<str name="exist">false</str>
<arr name="suggestions">
 <str>thorne</str>
</arr>
</response>

results from http://localhost:8983/solr/select/?q=thorne&qt=spellchecker

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">2</int>
</lst>
<str name="words">thorne</str>
<str name="exist">true</str>
<arr name="suggestions"/>
</response>


any pointers as to what i'm doing wrong, misinterpreting?  i suspect i'm
just doing something bone-headed in the analyzer sections...

thanks as always,

rob casson
miami university libraries


Re: LowerCaseFilterFactory and spellchecker

2007-11-28 Thread John Stewart
Rob,

Let's say it worked as you want it to in the first place.  If the
query is for "Thurne", wouldn't you get "thorne" (lower-case 't') as the
suggestion?  This may look weird for proper names.

jds


Schema class configuration syntax

2007-11-28 Thread Norskog, Lance
Hi-
 
What is the <filter> element in an <analyzer> element that will load
this class:
 
org.apache.lucene.analysis.cn.ChineseFilter
 
This did not work:
 
 <filter class="org.apache.lucene.analysis.cn.ChineseFilter" />

This is in Solr 1.2.
 
Thanks,
 
Lance Norskog
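
(Editorial note, not an answer from the thread: the likely reason is that
Solr's <filter> element expects a TokenFilterFactory, not a raw Lucene
TokenFilter, so a thin wrapper factory along these lines would be needed,
with the schema then referencing the fully qualified factory class
instead of the raw filter class.)

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.ChineseFilter;
import org.apache.solr.analysis.BaseTokenFilterFactory;

// hypothetical wrapper exposing ChineseFilter to Solr's analyzer config
public class ChineseFilterFactory extends BaseTokenFilterFactory {
    public TokenStream create(TokenStream input) {
        return new ChineseFilter(input);
    }
}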