why is query so slow

2009-03-17 Thread pcurila

hello,

I created an index with 1.5M docs. When I post a query without facets it
returns in a moment.
When I post a query with one facet it takes 14s.

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">14263</int>
  <lst name="params">
    <str name="facet">true</str>
    <str name="indent">on</str>
    <str name="facet.mincount">1</str>
    <str name="start">0</str>
    <str name="q">zamok</str>
    <str name="facet.limit">-1</str>
    <str name="facet.field">wasCreatedBy_fct</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
  </lst>
</lst>
<result name="response" numFound="16055" start="0">


When I add a filter that returns only one doc, it takes the same time:


<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">13249</int>
  <lst name="params">
    <str name="facet">true</str>
    <str name="indent">on</str>
    <str name="facet.mincount">1</str>
    <str name="start">0</str>
    <str name="q">zamok</str>
    <str name="facet.limit">-1</str>
    <str name="facet.field">wasCreatedBy_fct</str>
    <str name="fq">wasCreatedBy_fct=Martin Benka</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">

Why?
Can anybody explain what I am doing wrong and how to speed up the response
time?

Peter



Re: why is query so slow

2009-03-17 Thread Erik Hatcher
How many terms are in the wasCreatedBy_fct field?   How is that field  
and its type configured?


Solr 1.3?  Or trunk?  Trunk contains massive faceting speed  
improvements.
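
(One way to check the term count, as an aside: the stock LukeRequestHandler
reports per-field statistics, including the number of distinct terms, e.g.

  http://localhost:8983/solr/admin/luke?fl=wasCreatedBy_fct&numTerms=10

assuming it is registered at the default /admin/luke path; the host and port
shown are just the example defaults.)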


Erik







Re: why is query so slow

2009-03-17 Thread Toby Cole

Peter,
	If possible try running a 1.4 snapshot of Solr; the faceting
improvements are quite remarkable.
However, if you can't run unreleased code, it might be an idea to try
reducing the number of unique terms (try indexing surnames only?).

Toby.




Toby Cole
Software Engineer

Semantico
E: toby.c...@semantico.com
W: www.semantico.com



Re: why is query so slow

2009-03-17 Thread pcurila

I am using 1.3

 How many terms are in the wasCreatedBy_fct field? How is that field
 and its type configured?

The field contains author names, and there are lots of them.

Here is the type configuration:

<fieldType name="facet" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="wasCreatedBy_fct" type="facet" indexed="true" stored="true"
  multiValued="true"/>







Lemmatisation search in Solr

2009-03-17 Thread dabboo

Hi,

I am implementing lemmatisation in Solr, which means that if a user looks for
Mouse then it should display results for both Mouse and Mice. I understand
that this is a kind of context-aware search. I thought of using synonyms for
this, but then synonyms.txt would end up with a huge number of records and
would keep growing.

Please suggest how I can implement it in some other way.

Thanks,
Amit Garg



Special Characters search in solr

2009-03-17 Thread dabboo

Hi,

I am searching with a query string which contains special characters like
è in it. For example, if I search for tèst then it should return all the
results which contain tèst, test, etc. There are other special characters
as well.

I have updated the server.xml file of my Tomcat server and included UTF-8 as
the encoding type in the server entry, but it is still not working.

Please suggest.

Thanks,
Amit Garg



stop word search

2009-03-17 Thread revas
Hi,

I have a query like this

content:the AND user_id:5

which means: return all docs of user id 5 which have the word 'the' in
content. Since 'the' is a stop word, this query executes as just user_id:5
in spite of the AND clause, whereas the expected result here is: since there
are no results for 'the', no results should be returned.

Am I missing anything here?

Regards


Re: Indexing the directory

2009-03-17 Thread Eric Pugh

Victor,

I'd recommend looking at the tutorial at http://lucene.apache.org/solr/tutorial.html
and using the list for more specific questions.  Also, there is a list
of companies (as well as mine!) that offer Solr support at http://wiki.apache.org/solr/Support
that eTrade can contract with for in-depth support.


Eric Pugh

On Mar 16, 2009, at 6:25 PM, Huang, Zijian(Victor) wrote:




Hi, all:
   I am new to Solr. Can anyone please tell me what I need to do to index
some text files in a local directory?

Thanks

Victor




-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal






Re: Lemmatisation search in Solr

2009-03-17 Thread Grant Ingersoll
Have you looked for any open source lemmatizers?  I didn't find any in  
a quick search, but there probably are some out there.


Also, is there a particular reason you are after lemmatization instead  
of stemming?  Maybe a light stemmer plus synonyms might suffice?
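
For example, a minimal sketch of that combination, assuming the stock
SynonymFilterFactory and a hand-maintained mapping file (the entries shown
are illustrative, not a shipped list):

  # synonyms.txt -- map irregular forms onto one another
  mouse, mice
  goose, geese

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>

A light stemmer (e.g. solr.EnglishPorterFilterFactory with a protwords.txt
for exceptions) would then cover the regular plurals, so the synonym file
only has to carry the irregular cases.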







Re: Special Characters search in solr

2009-03-17 Thread Grant Ingersoll
You will need to create a field that handles the accents in order to  
do this.  Start by looking at the ISOLatin1AccentFilter.
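
For example, a minimal sketch using the stock analysis factories (the field
type name is made up for illustration):

  <fieldType name="text_accents" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With the accent filter applied at both index and query time (a single
<analyzer> covers both), tèst and test normalize to the same token, so
either spelling should match.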


-Grant






Re: Lemmatisation search in Solr

2009-03-17 Thread dabboo

Stemming and synonyms are working fine in the application, but they are
working individually. I guess I will need to add the values to synonyms.txt
to achieve this. Am I right?

Actually it's a project requirement to implement lemmatisation. I also
looked for a lemmatiser but couldn't find any.

Thanks,
Amit







Re: Special Characters search in solr

2009-03-17 Thread dabboo

This is the entry in schema.xml:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
  omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!--<tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>-->
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
      synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         enablePositionIncrements=true ensures that a 'gap' is left to
         allow for accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
      ignoreCase="true"
      words="stopwords.txt"
      enablePositionIncrements="true"
    />
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="1"
      catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
      protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <analyzer class="org.apache.lucene.analysis.ru.RussianAnalyzer"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
      ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="0"
      catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
      protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <!--<analyzer class="org.apache.lucene.analysis.ru.RussianAnalyzer"/>-->
    <filter class="solr.ShingleFilterFactory" outputUnigrams="true"
      outputUnigramIfNoNgram="true" maxShingleSize="99"/>
  </analyzer>
</fieldType>



dabboo wrote:

 I have added this filter factory in my schema.xml also, but still it is
 not working. I am sorry, but I didn't get how to create the field to
 handle the accents.

 Please help.
 
 




Re: stop word search

2009-03-17 Thread Erick Erickson
Well, by definition, using an analyzer that removes stopwords
*should* do this at query time. This assumes that you used
an analyzer that removed stopwords at index and query time.
The stopwords are not in the index.

You can get the behavior you expect by using an analyzer at
query time that does NOT remove stopwords, and one at
indexing time that *does* remove stopwords. But I'm having a
hard time imagining that this would result in a good user experience.

I mean, any time you had a stopword in the query where the
stopword was required, no results would be returned. Which would
be hard to explain to a user...

What is it you're trying to accomplish?

Best
Erick
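
(For illustration, a minimal sketch of that split, assuming the stock
StopFilterFactory; the type name is made up:

  <fieldType name="text_stopatindex" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Since 'the' is stripped at index time, a required query-time term 'the' can
never match -- exactly the no-results behavior described above.)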






Re: Special Characters search in solr

2009-03-17 Thread Erick Erickson
Did you reindex after you incorporated the ISOLatin... filter?
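
(Also worth checking, as an aside: for accented characters in the query
string to reach Solr intact, Tomcat's HTTP connector itself needs the
URIEncoding attribute in server.xml; the port shown is just the common
default:

  <Connector port="8080" URIEncoding="UTF-8" ... />

Setting an encoding elsewhere in the server entry does not change how query
string parameters are decoded.)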




Re: Solr search with Auto Spellchecker

2009-03-17 Thread Shyamsunder Reddy
I have the same question in mind. How can I configure the same standard
request handler to handle the spell check for a given query?
I mean, instead of calling
http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=globl for
spell checking, the following query request
should take care of both querying and spell checking:
http://localhost:8983/solr/select?q=globl


--- On Wed, 3/11/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

From: Shalin Shekhar Mangar shalinman...@gmail.com
Subject: Re: Solr search with Auto Spellchecker
To: solr-user@lucene.apache.org
Date: Wednesday, March 11, 2009, 9:33 AM

On Wed, Mar 11, 2009 at 7:00 PM, Narayanan, Karthikeyan 
karthikeyan.naraya...@gs.com wrote:

 Is it possible to get the search results from the spell-corrected word in a
 single solr search query?  Like, I search for the word globl and the
 correct spelling is global. The query should return results matching
 the word global.  Would appreciate any ideas.


No, you'll need to make two queries.

-- 
Regards,
Shalin Shekhar Mangar.



  

Re: How to use spell checker

2009-03-17 Thread Shyamsunder Reddy
How can I configure the same standard request handler to handle the spell
check for a given query? I mean, instead of calling
http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=elepents for
spell checking, the following query request
should take care of both querying and spell checking:
http://localhost:8983/solr/select?q=elepents

Thanks


--- On Tue, 3/3/09, Grant Ingersoll gsing...@apache.org wrote:

From: Grant Ingersoll gsing...@apache.org
Subject: Re: How to use spell checker
To: solr-user@lucene.apache.org
Date: Tuesday, March 3, 2009, 2:03 PM

See http://wiki.apache.org/solr/SpellCheckComponent


On Mar 3, 2009, at 1:23 AM, dabboo wrote:

 
 Hi,
 
 I am trying to implement the spell check feature in Solr with Lucene. For
 e.g., if any record contains elephants and a user enters elepents, even
 then it should return the results with the correct spelling, i.e.
 elephants.

 Please suggest.

 Thanks,
 Amit Garg
 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search




  

spellchecker: returning results even with misspelt words

2009-03-17 Thread Ingo Renner

Hi all,

I'd like to achieve the following:

When searching for e.g. two words, one of them being spelt correctly and
the other one misspelt, I'd like to receive results for the correct
word but would still like to get spelling suggestions for the wrong
word.

Currently, when I search for misspelt words I get suggestions, but no
results at all, although there would be results when searching for the
correct word only.

Hope you understand what I want to achieve, as it's a little hard to
explain.



all the best
Ingo

--
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2





Re: Solr search with Auto Spellchecker

2009-03-17 Thread Ingo Renner




In your solrconfig.xml, add the spellcheck component to your standard
request handler:


<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!--
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    -->
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
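
(With that in place, a single request like the following should both query
and spell-check, since spellcheck.q falls back to q; this assumes the
component itself has been configured/built as described on the
SpellCheckComponent wiki page:

  http://localhost:8983/solr/select?q=globl&spellcheck=true
)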


best
Ingo

--
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2





Solr: delta-import, help needed

2009-03-17 Thread Giovanni De Stefano
Hello all,

I have a table TEST in an Oracle DB with the following columns: URI
(varchar), CONTENT (varchar), CREATION_TIME (date).

The primary key both in the DB and Solr is URI.

Here is my data-config.xml:

<dataConfig>
  <dataSource
    driver="oracle.jdbc.driver.OracleDriver"
    url="jdbc:oracle:thin:@localhost:1521/XE"
    user="username"
    password="password"
  />
  <document name="Test">
    <entity
      name="test_item"
      pk="URI"
      query="select URI,CONTENT from TEST"
      deltaQuery="select URI,CONTENT from TEST where
        TO_CHAR(CREATION_TIME,'YYYY-MM-DD HH:MI:SS') &gt; '${dataimporter.last_index_time}'"
    >
      <field column="URI" name="uri"/>
      <field column="CONTENT" name="content"/>
    </entity>
  </document>
</dataConfig>

The problem is that any time I perform a delta-import, the index keeps being
populated as if new documents were added. In other words, I am not able to
UPDATE an existing document or REMOVE a document that is no longer in the
DB.

What am I missing? How should I specify my deltaQuery?

Thanks a lot in advance!

Giovanni


Re: Is optimize always a commit?

2009-03-17 Thread sunnyfr

Hi 

How can I commit without optimizing?
Right now I've got this:  start
commit(optimize=true,waitFlush=false,waitSearcher=true)
but I don't want to optimize, otherwise my replication will pull the full
index folder every time.

Thanks a lot guys for ur help,
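
(For illustration, a hedged sketch: a plain commit can be issued as an
update message, independently of optimize; the URL assumes the stock update
handler on the default example port:

  curl http://localhost:8983/solr/update -H 'Content-type: text/xml' \
       --data-binary '<commit/>'

Posting <optimize/> is what triggers the full segment rewrite, so sending
<commit/> alone should leave most segment files unchanged for rsync.)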



ryantxu wrote:
 
 yes.  optimize also commits
 
 Maximilian Hütter wrote:
 Hi,
 
 maybe this is a stupid question, but is a optimize always a commit?
 In the log it looks like it:
 
 start commit(optimize=true,waitFlush=false,waitSearcher=true)
 
 I just wanted to be sure.
 
 Best regards,
 
 Max
 
 
 
 
 




Re: spellchecker: returning results even with misspelt words

2009-03-17 Thread Shyamsunder Reddy
I think if you use spellcheck.collate=true, you will still receive the
results for the correct word and a suggestion for the wrong word.

I have a name field (which is first name + last name) configured for spell
check. I have a name entry: GUY SHUMAKER. I am trying to find person names
where either 'GUY' or 'SHUMAKER' or both are spelled wrong.

1. Last name spelled wrong as 'SHAMAKER'

http://localhost:8090/solr/select?q=NAME:GUY%20SHAMAKER&fq=TYPE:PERSON&spellcheck=true&spellcheck.collate=true

It returns all results that match 'GUY' and a spelling suggestion for
'SHAMAKER' as 'SHUMAKER'.

2. First name spelled wrong as 'GYY'

http://localhost:8090/solr/select?q=NAME:GYY
SHUMAKER&fq=TYPE:PERSON&spellcheck=true&spellcheck.collate=true

It returns NO results, a spelling suggestion for 'GYY' as 'GUY', and a
collation: <str name="collation">NAME:guy SHUMAKER</str>

Note: but here I expected results that match SHUMAKER.

3. Both first name and last name spelled wrong as: GYY SHAMAKER
http://localhost:8090/solr/select?q=NAME:GYY%20SHAMAKER&fq=TYPE:PERSON&spellcheck=true&spellcheck.collate=true

Here no results, but received suggestions for both words and a collation.

Is it similar to your scenario?

Also, why are NO results returned for case 2?







  

Re: stemming (maybe?) question

2009-03-17 Thread Jon Drukman

Yonik Seeley wrote:

Not sure... I just took the stock Solr example, and it worked fine.

I inserted o'meara into example/exampledocs/solr.xml:
  <field name="features">Advanced o'meara Full-Text Search
Capabilities using Lucene</field>

then indexed everything:  ./post.sh *.xml

Then queried in various ways:
q=o'meara
q=omeara
q=o%20meara

All of the queries found the solr doc.


I grabbed the original example schema.xml and made my username field use
the following definition:

<fieldType name="text_user" class="solr.TextField"
  positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="1"
      catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
      synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="0"
      catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


I removed the stopwords and Porter stuff because I don't want that for
proper names.


seems to work fine now, thanks!
-jsd-



Re: optimize an index as fast as possible

2009-03-17 Thread Marc Sturlese

Thanks Mark, that really did the job! The speed loss in update time is more
than compensated at optimizing time!

Now I am trying to do another test... but I'm not sure if Lucene has this
option; I am using Lucene 2.9-dev.

As I am working with a 3G index and always have to optimize (as I said before,
I tried not optimizing to send my index via rsync faster, but the speed loss
serving requests in the slaves was huge), I wonder if it's possible to do
block optimizing (I have just invented the word). The example would be...
I have a 3G index, optimized. I start executing updates to the index... would
it be possible to keep doing optimizes just on the newly created segments? So
I would still have the 3G index and would be building another big index from
the segments created by the updates. This way I would have to send via
rsync to the slaves just the new block (supposing the slaves already had the
3G index because I would have sent it before).
Is there any way to do something similar to that?
This has come to my mind because I have to ship the index to the slaves as
often as possible, and optimizing the index into just one block makes
the rsync job take a long time.

Thanks in advance


markrmiller wrote:
 
 Marc Sturlese wrote:
 Hey there,
 I am creating an index of 3G... it's fast indexing but optimization takes
 about 10 min. I need to optimize it every time I update as if I don't do
 that, search requests will be much slower.
 Wich parameter configuration would be the best to make optimize as fast
 as
 possible (I don't mind to use a lot of memory, at least for testing, if I
 can speed up the process).
 Actually I am using for the IndexWriter:

 <ramBufferSizeMB>1024</ramBufferSizeMB>
 <maxMergeDocs>2147483647</maxMergeDocs>
 <maxFieldLength>10000</maxFieldLength>
 <writeLockTimeout>1000</writeLockTimeout>
 <commitLockTimeout>10000</commitLockTimeout>
 <mergeFactor>10</mergeFactor>
 Am I missing any important parameter to do that job?
 Thanks in advance

   
 How about using a merge factor of 2? This way you are pretty much always 
 optimized (old large segment, new small segment at most) - you pay a bit 
 in update speed, but I've found it to be very reasonable for many 
 applications.
 
 -- 
 - Mark
 
 http://www.lucidimagination.com
 
 
 
 
 




Re: Shard Query Problem

2009-03-17 Thread Chris Hostetter

: here is the whole file, if it helps

as i said before, i don't know much about the inner workings of 
distributed search, but nothing about your config seems odd to me.  it 
seems like it should work fine.

a wild shot in the dark: instead of using a requestHandler named
"standard" and URLs that start with
"http://lca2-s5-pc04:8080/solr/select?", try registering a handler name
starting with a slash (ie: <requestHandler name="/foo" ...>) and then
explicitly refer to that handler name in your URL
("http://lca2-s5-pc04:8080/solr/foo?shards=...")

This suggestion is based on the supposition that *maybe* the legacy support
for /select and the qt param doesn't play nicely with distributed
searching ... but as i said, this is really just a wild guess.
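
(For illustration, a minimal sketch of such a registration in solrconfig.xml;
the name "/foo" is just a placeholder:

  <requestHandler name="/foo" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
  </requestHandler>

which would then be addressed directly as /solr/foo?shards=... instead of
going through /select with the qt param.)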


: <?xml version="1.0" encoding="UTF-8" ?>
: 
: <config>
:   <!-- Set this to 'false' if you want solr to continue working after it has
:        encountered an severe configuration error.  In a production
:        environment, you may want solr to keep working even if one handler is
:        mis-configured.
:
:        You may also set this to false using by setting the system property:
:          -Dsolr.abortOnConfigurationError=false
:   -->
:   <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
:
:   <!-- Used to specify an alternate directory to hold all index data
:        other than the default ./data under the Solr home.
:        If replication is in use, this should match the replication
:        configuration. -->
:   <dataDir>${solr.data.dir:./solr/data}</dataDir>
:
:   <indexDefaults>
:     <!-- Values here affect all index writers and act as a default unless
:          overridden. -->
:     <useCompoundFile>false</useCompoundFile>
:
:     <mergeFactor>10</mergeFactor>
:     <!--
:       If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will
:       flush based on whichever limit is hit first.
:     -->
:     <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
:     <!-- Tell Lucene when to flush documents to disk.
:          Giving Lucene more memory for indexing means faster indexing at the
:          cost of more RAM.
:
:          If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene
:          will flush based on whichever limit is hit first.
:     -->
:     <ramBufferSizeMB>32</ramBufferSizeMB>
:     <maxMergeDocs>2147483647</maxMergeDocs>
:     <maxFieldLength>10000</maxFieldLength>
:     <writeLockTimeout>1000</writeLockTimeout>
:     <commitLockTimeout>10000</commitLockTimeout>
:
:     <!--
:       Expert: Turn on Lucene's auto commit capability.
:       This causes intermediate segment flushes to write a new lucene
:       index descriptor, enabling it to be opened by an external
:       IndexReader.
:       NOTE: Despite the name, this value does not have any relation to
:       Solr's autoCommit functionality
:     -->
:     <!--<luceneAutoCommit>false</luceneAutoCommit>-->
:     <!--
:       Expert:
:       The Merge Policy in Lucene controls how merging is handled by Lucene.
:       The default in 2.3 is the LogByteSizeMergePolicy, previous
:       versions used LogDocMergePolicy.
:
:       LogByteSizeMergePolicy chooses segments to merge based on their size.
:       The Lucene 2.2 default, LogDocMergePolicy chose when
:       to merge based on number of documents
:
:       Other implementations of MergePolicy must have a no-argument
:       constructor
:     -->
:     <!--<mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>-->
:
:     <!--
:       Expert:
:       The Merge Scheduler in Lucene controls how merges are performed.  The
:       ConcurrentMergeScheduler (Lucene 2.3 default) can perform merges in
:       the background using separate threads.  The SerialMergeScheduler
:       (Lucene 2.2 default) does not.
:     -->
:     <!--<mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>-->
:
:     <!--
:       This option specifies which Lucene LockFactory implementation to use.
:
:       single = SingleInstanceLockFactory - suggested for a read-only index
:                or when there is no possibility of another process trying
:                to modify the index.
:       native = NativeFSLockFactory
:       simple = SimpleFSLockFactory
:
:       (For backwards compatibility with Solr 1.2, 'simple'

Solr SpellChecker configuration for multiple fields at the same time

2009-03-17 Thread Shyamsunder Reddy
My advanced search option allows users to search three different fields at
the same time. The fields are first name, last name, and org name. Now I have
to add a spell checking feature for these fields.

When a wrong spelling is entered for each of these fields, like first name:
jahn, last name: smath, and org name: bpple,

the search result should return a suggestion (collation) like firstname:john
AND lastname:smith AND orgname:apple.

What is the best approach to implement spell checking for these three
different fields?

1. Build a single directory for all fields by copying them into a 'spell'
field:

    schema.xml configuration:

    <!-- Setup simple analysis for spell checking -->
    <fieldType name="textSpell" class="solr.TextField"
      positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="FIRST_NAME" type="text" indexed="true" stored="true"/>
    <field name="LAST_NAME" type="text" indexed="true" stored="true"/>
    <field name="ORG_NAME" type="text" indexed="true" stored="true"
      required="true"/>
    <field name="spell" type="textSpell" indexed="true" stored="true"
      multiValued="true"/>

    <copyField source="FIRST_NAME" dest="spell"/>
    <copyField source="LAST_NAME" dest="spell"/>
    <copyField source="ORG_NAME" dest="spell"/>

    solrconfig.xml configuration:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="spellcheckIndexDir">./spellchecker</str>
      </lst>
    </searchComponent>

Now the queries:
1a.
URL/select?q=FIRST_NAME:jahn&LAST_NAME:smath&ORG_NAME:bpple&spellcheck=true

The spell check searches against the './spellchecker' dictionary and returns
the suggestions FIRST_NAME:john, LAST_NAME:smith, and ORG_NAME:apple. Works
as expected.

1b. URL/select?q=LAST_NAME:jahn&spellcheck=true
The spell check searches against the './spellchecker' dictionary and returns
the suggestion 'john' for LAST_NAME.
But there is no last name 'john' for the field LAST_NAME, so the subsequent
search returns NO results, which is not acceptable.

So, this approach seems to be wrong for me.

2. Build a separate directory for each field.

    schema.xml configuration:

    <!-- Setup simple analysis for spell checking -->
    <fieldType name="textSpell" class="solr.TextField"
      positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="FIRST_NAME" type="text" indexed="true" stored="true"/>
    <field name="LAST_NAME" type="text" indexed="true" stored="true"/>
    <field name="ORG_NAME" type="text" indexed="true" stored="true"
      required="true"/>
    <field name="spell_fname" type="textSpell" indexed="true" stored="true"
      multiValued="true"/>
    <field name="spell_lname" type="textSpell" indexed="true" stored="true"
      multiValued="true"/>
    <field name="spell_org_name" type="textSpell" indexed="true" stored="true"
      multiValued="true"/>

    <copyField source="FIRST_NAME" dest="spell_fname"/>
    <copyField source="LAST_NAME" dest="spell_lname"/>
    <copyField source="ORG_NAME" dest="spell_org_name"/>

    solrconfig.xml configuration:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
        <str name="name">firstname</str>
        <str name="field">spell_fname</str>
        <str name="spellcheckIndexDir">./fname_spellchecker</str>
      </lst>
      <lst name="spellchecker">
        <str name="name">lastname</str>
        <str name="field">spell_lname</str>
        <str name="spellcheckIndexDir">./lname_spellchecker</str>
      </lst>
      <lst name="spellchecker">
        <str name="name">oname</str>
        <str name="field">spell_org_name</str>
        <str name="spellcheckIndexDir">./orgname_spellchecker</str>
      </lst>
    </searchComponent>
    
Now the queries:
2a.
URL/select?q=FIRST_NAME:jahn&LAST_NAME:smath&ORG_NAME:bpple&spellcheck=true

How can I say in the query that different dictionaries should be searched for
different fields, i.e. FIRST_NAME in fname_spellchecker, LAST_NAME in
lname_spellchecker, and ORG_NAME in orgname_spellchecker?

Or can I make the spell checker store the field names and their values?

Please discuss my approaches and suggest a solution.
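
(As a hedged aside: SpellCheckComponent takes a spellcheck.dictionary
parameter that names one of the configured spellcheckers, e.g.

  URL/select?q=FIRST_NAME:jahn&spellcheck=true&spellcheck.dictionary=firstname

but it selects a single dictionary per request, so routing each field to its
own dictionary inside one query would not work out of the box.)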



  

NPE creating EmbeddedSolrServer

2009-03-17 Thread Alexandre Rafalovitch
Hello,

I am trying to create a basic single-core embedded Solr instance. I
figured out how to set up a single-core instance and got (I believe)
all the files in the right places. However, I am unable to run trivial code
without an exception:

SolrServer solr = new EmbeddedSolrServer(
    new CoreContainer(
        "D:\\Projects\\FutureTerm\\apache-solr-1.3.0\\futureterm",
        new File("D:\\Projects\\FutureTerm\\apache-solr-1.3.0\\futureterm\\solr.xml")),
    "core");

The exception (with context) is:

WARNING: No queryConverter defined, using default converter
Mar 17, 2009 6:15:01 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to searc...@b02928 main
Mar 17, 2009 6:15:01 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:147)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1228)
at 
org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:50)
at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1034)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Mar 17, 2009 6:15:02 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=null path=null
params={start=0&q=fast_warm&rows=10} status=500 QTime=47

I am not sure where to look further. The source code (at my level of
knowledge) is not very helpful.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/


Re: Indexing issue

2009-03-17 Thread Chris Hostetter

: I have two cores in different machines which are referring to the same data 
directory.

this isn't really considered a supported configuration ... both solr
instances are going to try and own the directory for updating, and
unless you do something special to ensure only one has control you are
going to have problems...

: below error.   HTTP Status 500 - java.io.FileNotFoundException: 
: \\SolrShare\CollectionSet2\English\Auction\Auction0\index\_c.fdt (The 
: system cannot find the file specified) java.lang.RuntimeException: 
: java.io.FileNotFoundException: 
: \\SolrShare\CollectionSet2\English\Auction\Auction0\index\_c.fdt (The 

...like this.  one core is mucking with the files in a way the other core 
doesn't know about.

: I have changed lockType to simple and none, but still no luck…
: Could you please correct me if I am doing wrong?

none isn't going to help you -- it's just going to make the problem
worse (two misconfigured instances of Solr in the same JVM could corrupt
each other with lockType=none).

simple is only going to help you on some filesystems -- since you said
these two solr instances are running on different machines, that implies
NFS (or something like it), and SimpleFSLockFactory doesn't work reliably
in those cases.

If you want to get something like this working, you'll probably need 
to setup your own network based lockType (instead of relying on the 
filesystem)


-Hoss


Re: Relevancy and date sorting in same search?

2009-03-17 Thread Chris Hostetter

: I'm trying to think of a way to use both relevancy and date sorting in
: the same search. If documents are recent (say published within the last
: 2 years), I want to use all of the boost functions, BQ parameters, and
: normal Lucene scoring functions, but for documents older than two years,
: I don't want any of those scores to apply - only a sort by date.

Yonik recently committed some code that makes it possible to express a
function query string that refers to an arbitrary param name which is
evaluated as a query, with the scores for each document used
as a ValueSource input for the function.  I imagine you could combine
this with a custom function that returns the value from one ValueSource if
it's low enough (the date field) and otherwise returns the value from an
alternate ValueSource (the query)


-Hoss



Re: multicore setup with tomcat

2009-03-17 Thread Chris Hostetter

You haven't really given us a lot of information to work with...

what shows up in your logs?
what did you name the context fragment file?
where did you put the context fragment file?
where did you put the multicore directory?

sharing *exact* directory lisings and the *exact* commands you've 
executed is much more likely to help people understand what you're seeing.

For example: the SolrTomcat wiki page shows an exact set of shell commands 
to install solr and tomcat on linux or cygwin and get it running against a 
simple example ... if you can provide a similar set commands showing 
*exactly* what you've done, people might be able to spot the problem (or 
try the steps themselve and reproduce the problem)

http://wiki.apache.org/solr/SolrTomcat

: Date: Mon, 9 Mar 2009 14:55:47 +0530

: Hi,
: 
: I am trying to do a multicore set up..
: 
: I added the following from the 1.3 solr download to new dir called multicore
: 
: core0 ,core1,solr.xml and solr.war
: 
: in the tomcat context fragment i have defined as
: 
: <Context docBase="c:/multicore/solr.war" debug="0" crossContext="true">
:    <Environment name="solr/home" type="java.lang.String" value="C:\multicore"
: override="true" />
: </Context>
: http://localhost:8080/multicore/admin
: http://localhost:8080/multicore/admin/core0
: 
: The above 2 URLs give me a resource not found error
: 
: the solr.xml is the default one from the download.
: 
: Please tell me as to what needs to be changed to make this work in tomcat
: 
: Regards
: Sujatha
: 



-Hoss



Re: multicore file path

2009-03-17 Thread Chris Hostetter

This is a feature of the ShowFileRequestHandler -- it doesn't let people
browse files outside of the conf directory.

I suppose this behavior could be made configurable (right now the only
config option is "hidden" for excluding specific files ... we could have
an option to allow files that would normally be hidden)


: Date: Mon, 9 Mar 2009 07:33:48 -0400
: From: Gargate, Siddharth sgarg...@ptc.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: multicore file path
: 
: I am trying out multicore environment with single schema and solrconfig
: file. Below is the folder structure 
:  
: Solr/
:conf/
:  schema.xml
:  solrconfig.xml
:core0/
:data/
:core1/
:data/
:tomcat/
: 
: The solrhome property is set in tomcat as -Dsolr.solr.home=../..
: 
: And the solr.xml file is
: 
: <solr persistent='true'>
: <cores adminPath='/admin/cores'>
:   <core name='core0' instanceDir='core0/'
: config='../../conf/solrconfig.xml' schema='../../conf/schema.xml'/>
:   <core name='core1' instanceDir='core1/'
: config='../../conf/solrconfig.xml' schema='../../conf/schema.xml'/>
: </cores>
: </solr>
: 
: Everything is working fine, but when I try to access schema file from
: admin UI I am getting following error
: http://localhost:8080/solr/core0/admin/file/?file=../../conf/schema.xml
:  HTTP Status 403 - Invalid path: ../../conf/schema.xml
:  description Access to the specified resource (Invalid path:
: ../../conf/schema.xml) has been forbidden.
: 
: 



-Hoss



NPE in MultiSegmentReader$MultiTermDocs.doc

2009-03-17 Thread Comron Sattari
I've recently upgraded to Solr 1.3 using Lucene 2.4. One of the reasons I
upgraded was because of the nicer SearchComponent architecture that let me
add a needed feature to the default request handler. Simply put, I needed to
filter a query based on some additional parameters. So I subclassed
QueryComponent and replaced the line

    rb.setQuery( parser.getQuery() );

with one that wrapped the parsed query in a FilteredQuery:

    Query query = parser.getQuery();
    String arguments = params.get("param-name");
    if (arguments != null) {
        query = new FilteredQuery(query, new MyCustomFilter(arguments));
    }
    rb.setQuery(query);

The filter class I used can be seen here: http://privatepaste.com/021ZH27tKG.
And is nearly verbatim from the Lucene in Action book, when describing a way
to do security filtering.

This seems to work fine, although I'm getting some strange behavior when
exercising this code through some unit tests from my Rails app. Sometimes I
get an NPE when doing the filtering.

  at org.apache.lucene.index.MultiSegmentReader$MultiTermDocs.doc(MultiSegmentReader.java:533)
  at MyQueryComponent$MyCustomFilter.bits(Unknown Source)
  at org.apache.lucene.search.Filter.getDocIdSet(Filter.java:49)
  at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:105)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
  at org.apache.lucene.search.Searcher.search(Searcher.java:126)

After some detective work I decided the problem had to do with an empty
index: the termDocs iterator has 0 elements to iterate over and was
throwing this error. Since there is no size() method or analogous method on
the TermDocs iterator, I decided to use docFreq(term), as you can see in the
source code. But this didn't solve my problem either. This error is being
thrown even when docFreq(term) returns more than 0 documents. I can't for the
life of me figure out why this iterator's doc() method is throwing an NPE.
(Well, I can deduce that the current member is null, but I don't know why.)
Is the index corrupted? I can see records in the index that should match my
Term through the solr /admin/ interface, and docFreq(term) returns a number
> 0. Yet this NPE keeps showing up.
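
(As a hedged aside on the old TermDocs contract: doc() is only defined while
the most recent next() call has returned true, so the canonical bits() loop
from that generation of the API looks like the following; the field and value
names are placeholders:

  BitSet bits = new BitSet(reader.maxDoc());
  TermDocs termDocs = reader.termDocs(new Term("field", value));
  while (termDocs.next()) {
      bits.set(termDocs.doc());  // safe: next() just returned true
  }

Calling doc() before the first next(), or after next() has returned false,
is undefined and can surface exactly as an NPE inside MultiTermDocs.)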

Any help or guidance would be appreciated. If you need to see more source
I'd be happy to provide it, but I'm sure thats all the relevant stuff.

(I cross posted this to both Solr and Lucene lists.)

Comron


Re: multicore setup with tomcat

2009-03-17 Thread Cheng Zhang

Below is my setup:

<Context docBase="/home/zhangyongjiang/applications/solr/solr.war" debug="0"
  crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
     value="/home/zhangyongjiang/applications/solr" override="false" />
</Context>

Then under /home/zhangyongjiang/applications/solr, I have solr.xml as below:

<solr persistent="true" sharedLib="lib">
 <cores adminPath="/admin/cores">
  <core name="core1" instanceDir="core1" />
  <core name="core2" instanceDir="core2" />
  <core name="core3" instanceDir="core3" />
  <core name="core4" instanceDir="core4" />
 </cores>
</solr>

Under /home/zhangyongjiang/applications/solr, I created the core1/, core2/,
core3/, and core4/ subdirectories.


hope it helps.







Re: How to correctly boost results in Solr Dismax query

2009-03-17 Thread Chris Hostetter

: bq works only with q.alt query and not with q queries. So, in your case you
: would be using qf parameter for field boosting, you will have to give both
: the fields in qf parameter i.e. both title and media.

FWIW: that statement is false.  the boost query (bq) is added to the
query regardless of whether q or q.alt is ultimately used.

if you turn on debugQuery=true and look at your resulting query string,
you can see exactly what the resulting query is (parsedQuery)

Using the example setup, compare the output from these examples...

http://localhost:8983/solr/select/?q.alt=baz&q=solr&defType=dismax&qf=name+cat&bq=foo&debugQuery=true
http://localhost:8983/solr/select/?q.alt=solr&q=&defType=dismax&qf=name+cat&bq=foo&debugQuery=true


-Hoss



Re: How to correctly boost results in Solr Dismax query

2009-03-17 Thread Chris Hostetter

: Is not particularly helpful. I tried adding adding a bq argument to my
: search: 
: 
: bq=media:DVD^2
: 
: (yes, this is an index of films!) but I find when I start adding more
: and more:
: 
: bq=media:DVD^2bq=media:BLU-RAY^1.5
: 
: I find the negative results - e.g. films that are DVD but are not
: BLU-RAY get negatively affected in their score. In the end it all seems

that shouldn't be happening ... the outermost BooleanQuery (that the
main q and all of the bq queries are added to) has its
coordFactor disabled, so documents aren't penalized for not matching bq
clauses.

What you may be seeing is that the raw numeric score values you see
getting returned by Solr are lower for documents that match DVD when you add
the BLU-RAY bq ... that's totally possible because *absolute* scores from
one query can't be compared to scores from another query -- what's important
is that the *relative* order of scores from doc1 and doc2 stays consistent
(ie: the score for a doc matching DVD might go down when you add the
BLU-RAY bq, but the scores for *all* documents not matching BLU-RAY should
go down some)

The important thing to look for is:
  1) are DVD docs sorting higher than they would without the DVD bq?
  2) are BLU-RAY docs sorting higher than they would without the BLU-RAY bq?
  3) are two docs that are equivalent except for a DVD/BLU-RAY distinction
     sorting such that the BLU-RAY doc comes first?

...the answers to all of those should be yes.  if you're seeing otherwise,
please post the query toStrings for both queries, and the score
explanations for the docs in question against both queries.




-Hoss



Re: fl wildcards

2009-03-17 Thread Chris Hostetter

FWIW: there has been a lot of discussion around how wildcards should work
in various params that involve field names in the past: search the
archives for glob or globbing and you'll find several.

: That makes sense, since hl.fl probably can get away with calculating in the
: writer, and not as part of the core. However, I really need wildcard (or
: globbing) support for field lists as part of the common query parameter fl.
: Again, if someone can just point me to where the Solr core is using the
: contents of the fl param, I am happy to implement this, if only locally for my
: purposes.

It's complicated... the SolrQueryResponse has a setReturnFields method
where the RequestHandler can specify which fields should be returned, and
the ResponseWriters use that when outputting DocList (the writer fetches the
Document by internal id) ... but with the addition of distributed
searching there are now also SolrDocumentList objects, and whoever puts the
SolrDocumentList in the response decides which fields to populate.

if you grep the code base for CommonParams.FL and setReturnFields you 
should find all of the various touch points.

if you're really interested in pursuing this, some brainstorming on
how to deal with field globs in a very general and robust way was
discussed about a year ago, and i posted notes on the wiki...

   http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams

...but no one has actively pursued it enough to figure out what the real 
ramifications/benefits could be.



-Hoss



Re: Compound word search (maybe DisMaxQueryPaser problem)

2009-03-17 Thread Chris Hostetter

: My original assumption for the DisMax Handler was, that it will just take the
: original query string and pass it to every field in its fieldlist using the
: fields configured analyzer stack. Maybe in the end add some stuff for the
: special options and so ... and then send the query to lucene. Can you explain
: why this approach was not choosen?

because then it wouldn't be the DisMaxRequestHandler.

seriously: the point of dismax is to build up a DisjunctionMaxQuery for
each chunk in the query string and populate those DisjunctionMaxQueries
with the Queries produced by analyzing that chunk against each field in
the qf -- then all of the DisjunctionMaxQueries are grouped into a
BooleanQuery with a minNrShouldMatch.

if you look at the query toString from debugQuery (using a non trivial qf 
param and a q string containing more then one chunk) you can see what i 
mean.  your example shows it pretty well actaully...

:  :  :  ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)

the point is to build those DisjunctionMaxQueries -- so that each chunk 
only contributes significantly based on the highest scoring field that 
chunk appears in ... in your example, someone typing blue tooth can get a 
match when a doc matches blue in one field and tooth in another -- that 
wouldn't be possible with the approach you describe.  The Query structure 
also means that a doc where tooth appears in both the category and name 
fields but blue doesn't appear at all won't score as high as a doc that 
matches blue in category and tooth in name (although you have to look 
at the score explanations to really see what I mean by that)
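
(To make that concrete, here is roughly the same structure built by hand 
against the Lucene API -- a sketch, assuming already-analyzed terms:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.*;

  // one DisjunctionMaxQuery per chunk; 0.1f is the tie breaker
  DisjunctionMaxQuery blue = new DisjunctionMaxQuery(0.1f);
  blue.add(new TermQuery(new Term("category", "blue")));
  blue.add(new TermQuery(new Term("name", "blue")));

  DisjunctionMaxQuery tooth = new DisjunctionMaxQuery(0.1f);
  tooth.add(new TermQuery(new Term("category", "tooth")));
  tooth.add(new TermQuery(new Term("name", "tooth")));

  // group the chunks; coord disabled; mm expressed as minNrShouldMatch
  BooleanQuery main = new BooleanQuery(true);
  main.add(blue, BooleanClause.Occur.SHOULD);
  main.add(tooth, BooleanClause.Occur.SHOULD);
  main.setMinimumNumberShouldMatch(2);

...scoring-wise, each DisjunctionMaxQuery takes the max of its clauses, 
plus the tie breaker times the rest, which is what produces the per-field 
max behavior described above.)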


There are certainly a lot of improvements that could be made to dismax ... 
more customization in terms of how the query string is parsed before 
building up the DisjunctionMaxQueries and calling the individual field 
analyzers would certainly be one way it could improve ... but so far no 
one has attempted anything like that.




-Hoss



Re: muticore setup with tomcat

2009-03-17 Thread Chris Hostetter


: below is my setup, 
: 
: <Context docBase="/home/zhangyongjiang/applications/solr/solr.war" debug="0" crossContext="true">
:    <Environment name="solr/home" type="java.lang.String" value="/home/zhangyongjiang/applications/solr" override="false" />
: </Context>

you provided that information before, but you still haven't answered 
most of the questions I asked you...

: You haven't really given us a lot of information to work with...
: 
: what shows up in your logs?
: what did you name the context fragment file?
: where did you put the context fragment file?
: where did you put the multicore directory?

...the answer to that last question is the only new piece of information 
you provided.

My other comments also still hold true...

: sharing *exact* directory lisings and the *exact* commands you've 
: executed is much more likely to help people understand what you're seeing.

...please cut and paste directory listings of the directories in question, 
cut/paste how you are running tomcat, which directory you are running 
tomcat in, what log messages you are getting, etc...

: For example: the SolrTomcat wiki page shows an exact set of shell commands 
: to install solr and tomcat on linux or cygwin and get it running against a 
: simple example ... if you can provide a similar set of commands showing 
: *exactly* what you've done, people might be able to spot the problem (or 
: try the steps themselves and reproduce the problem)
: 
: http://wiki.apache.org/solr/SolrTomcat



-Hoss



Re: Solr 1.4: filter documens using fields

2009-03-17 Thread Chris Hostetter

: I'm using StandardRequestHandler and I wanted to filter results by two fields
: in order to avoid duplicate results (in this case the documents are very
: similar, with differences in fields that are not returned in a query
: response).
...
: I managed to do the filtering in the client, but then the paging doesn't work
: as it should (some pages may contain more duplicated results than others).
: Is there a way (query or other RequestHandler) to do this?

not at the moment, but some people have been working on trying to solve 
the broader problem in an efficient way...

https://issues.apache.org/jira/browse/SOLR-236



-Hoss



Re: Operators and Minimum Match with Dismax handler

2009-03-17 Thread Chris Hostetter
: I have an index which we are setting the default operator to AND.
: Am I right in saying that using the dismax handler, the default operator in
: the schema file is effectively ignored? (This is the conclusion I've made
: from testing myself)

correct.

: The issue I have with this, is that if I want to include an OR in my phrase,
: these are effectively getting ignored. The parser is still trying to match
: 100% of the search terms
: 
: e.g. 'lucene OR query' still only finds matches for 'lucene AND query'
: the parsed query is: +(((drug_name:lucen) (drug_name:queri))~2) ()

correct.  dismax isn't designed to be used that way (it's a fluke of 
the implementation that using "AND" works at all)

: Does anyone have any advice as to how I could deal with this kind of
: problem?

I would set your mm to something smaller and let your users use + when 
they want to make something required.  If you really want to support the 
AND/OR/NOT style of syntax ... don't use dismax: that type of syntax is 
what the standard parser is for.
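
(Concretely, loosening mm is a one-line change in the dismax handler's 
defaults in solrconfig.xml -- the values here are just an example, tune 
to taste:

  <str name="mm">2&lt;-1 5&lt;80%</str>

...which reads: up to 2 terms, all must match; 3 to 5 terms, all but one; 
more than 5, 80%.)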


-Hoss



Re: Custom handler that forwards a request to another core

2009-03-17 Thread Chris Hostetter

: My problem was that the XMLResponseWriter is using the searcher of the
: original request to get the matching documents (in the method writeDocList
: of the class XMLWriter). Since the DocList contains ids from the index of the
: second core, they were not valid in the index of the core receiving the
: request.

correct.  to deal with this type of problem in distributed search, the 
SolrDocumentList class was introduced -- if you call getSearcher() on your 
LocalSolrQueryRequest you can use that to build up a SolrDocumentList from 
the DocList, and then add that to your response.

BTW...

:  public void handleRequestBody(SolrQueryRequest request, SolrQueryResponse
:  response)
...
:  request = new LocalSolrQueryRequest(coreToRequest, params);
:  
:  SolrRequestHandler mlt = coreToRequest.getRequestHandler(/mlt);
:  coreToRequest.execute(mlt, request, response);

this doesn't look safe ... SolrQueryRequest objects need to be closed 
when they're finished, and you aren't doing that here.  As a result, the 
searcher ref obtained for the life of the request will never be released.
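
A safer shape (a sketch based on the code quoted above):

  SolrQueryRequest req = new LocalSolrQueryRequest(coreToRequest, params);
  try {
    SolrRequestHandler mlt = coreToRequest.getRequestHandler("/mlt");
    coreToRequest.execute(mlt, req, response);
    // build the SolrDocumentList here, while the searcher is still valid
  } finally {
    req.close();  // releases the searcher ref tied to this request
  }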


-Hoss



Re: com.ctc.wstx.exc.WstxLazyException exception while passing the text content of a word doc to SOLR

2009-03-17 Thread Chris Hostetter

: I am using the Apache POI parser to parse a Word doc and extract the text
: content. Then I am passing the text content to SOLR. The Word document has
: many pictures, graphs and tables. But when I am passing the content to SOLR,
: it fails. Here is the exception trace.
: 
: 09:31:04,516 ERROR [STDERR] Mar 14, 2009 9:31:04 AM
: org.apache.solr.common.SolrException log
: SEVERE: [com.ctc.wstx.exc.WstxLazyException]
: com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x7) not a valid XML character
:  at [row,col {unknown-source}]: [40,18]

the error string is fairly self-explanatory: on line 40, column 18 you 
have a character that isn't legal in XML (0x7)

(not all UTF-8 characters are legal in XML)

If you search the Solr archives for "Illegal character" you'll find lots 
of discussion about this and how to deal with it in general.
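
(The usual client-side fix is to strip the control characters XML 1.0 
doesn't allow before building the add document -- a one-liner sketch in 
Java, not POI-specific:

  // XML 1.0 forbids control characters other than 0x9, 0xA and 0xD
  String clean = raw.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", "");

...apply it to the extracted text before it goes anywhere near the XML.)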

You might also want to check out this comment pointing out some advantages 
in using Tika instead of using POI directly...

https://issues.apache.org/jira/browse/LUCENE-1559?#action_12681347

...lastly, you might want to check out this plugin and do all the hard 
work server-side...

http://wiki.apache.org/solr/ExtractingRequestHandler




-Hoss



RE: Operators and Minimum Match with Dismax handler

2009-03-17 Thread Dean Missikowski (Consultant), CLSA
I'm using dismax with the default operator set to AND, and don't use
Minimum Match (commented out in solrconfig.xml), meaning 100% of the
terms must match.  Then in my application logic I use a regex that
checks if the query contains " OR ", and if it does I add mm=1 to the
Solr request to effectively turn the query into an OR. This trick
doesn't work for complex boolean queries but works for a simple "xxx OR
yyy".
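
Roughly (a sketch of the check described above, not our exact production
code; solrQuery here stands in for whatever builds your request params):

  // if the user typed a bare OR anywhere, relax mm so any one term can match
  boolean hasOr = userQuery.matches(".*\\bOR\\b.*");
  if (hasOr) {
    solrQuery.set("mm", "1");
  }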

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: 18/03/2009 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Operators and Minimum Match with Dismax handler

: I have an index which we are setting the default operator to AND.
: Am I right in saying that using the dismax handler, the default
operator in
: the schema file is effectively ignored? (This is the conclusion I've
made
: from testing myself)

correct.

: The issue I have with this, is that if I want to include an OR in my
phrase,
: these are effectively getting ignored. The parser is still trying to
match
: 100% of the search terms
: 
: e.g. 'lucene OR query' still only finds matches for 'lucene AND query'
: the parsed query is: +(((drug_name:lucen) (drug_name:queri))~2) ()

correct.  dismax isn't designed to be used that way (it's a fluke of 
the implementation that using "AND" works at all)

: Does anyone have any advice as to how I could deal with this kind of
: problem?

I would set your mm to something smaller and let your users use + when
they want to make something required.  If you really want to support the
AND/OR/NOT style of syntax ... don't use dismax: that type of syntax is 
what the standard parser is for.


-Hoss






Re: Phrase slop / Proximity search

2009-03-17 Thread Chris Hostetter

: Can I set the phrase slop value on the standard request handler? I want it
: to be configurable in the solrconfig.xml file.

if you mean when a user enters a query like...

+fieldA:"some phrase" +(fieldB:true fieldC:1234)

...you want to be able to control what slop value gets used for "some 
phrase" then at the moment the only way to configure that is to put it in 
the query string...

+fieldA:"some phrase"~3 +(fieldB:true fieldC:1234)

...it's the kind of thing that could be set as a property on the query 
parser, but no one has implemented that.  (patches welcome!)
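
(Under the hood the hook already exists in Lucene's QueryParser, so such a 
patch would mostly be configuration plumbing -- a sketch of the relevant 
call, with a made-up slop value; parse throws ParseException:

  QueryParser qp = new QueryParser("fieldA", analyzer);
  qp.setPhraseSlop(3);  // default slop for quoted phrases with no explicit ~N
  Query q = qp.parse("+fieldA:\"some phrase\" +(fieldB:true fieldC:1234)");

...an explicit ~N in the query string still overrides the default.)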




-Hoss



Re: More replication questions

2009-03-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Mar 18, 2009 at 12:34 AM, Vauthrin, Laurent
laurent.vauth...@disney.com wrote:
 Hello,



 I have a couple of questions relating to replication in Solr.  As far as
 I understand it, the replication approach for both 1.3 and 1.4 involves
 having the slaves poll the master for updates to the index.  We're
 curious to know if it's possible to have a more dynamic/quicker way to
 propagate updates.



 1.       Is there a built-in mechanism for pushing out
 updates(/inserts/deletes) received by the master to the slaves?
The pull mechanism in 1.4 can be good enough. The 'pollInterval' can
be as small as 1 sec, so you will get the updates within a second.
Isn't that good enough?
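
(For reference, the relevant bit of the slave's solrconfig.xml -- the
masterUrl is of course site-specific:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:00:01</str>
    </lst>
  </requestHandler>
)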

 2.       Is it discouraged to post updates to multiple Solr instances?
 (all instances can receive updates and fulfill query requests)
This is prone to serious errors; all the Solr instances may not stay in sync.

 3.       If that sort of capability is not supported, why was it not
 implemented this way?  (So that we don't repeat any mistakes)
A push-based replication is in the cards, but the implementation is not
trivial. In Solr, commits are already expensive, so a second's delay may
be alright.

 4.       Has anyone else on the list attempted to do this?  The intent
 here is to achieve optimal performance while have the freshest data
 possible if that's possible.



 Thanks,
 Laurent





-- 
--Noble Paul


Re: CJKAnalyzer and Chinese Text sort

2009-03-17 Thread Sachin
Created SOLR-1073 in JIRA with the class file:
https://issues.apache.org/jira/browse/SOLR-1073

-- Original Message --
From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Subject: Re: CJKAnalyzer and Chinese Text sort
Date: Mon, 16 Mar 2009 21:34:09 -0700 (PDT)


: Thanks Hoss for your comments! I don't mind submitting it as a patch, 
: shall I create an issue in Jira and submit the patch with that? Also, I 

yep, just attach the patch file.

: didn't modify the core Solr for locale-based sorting; I just created a 
: jar file with the class file & copied it over to the lib 
: folder. As part of the patch, shall I add it to the core Solr code-base 
: (users who want to use this don't need anything extra to do) or add it 
: as a contrib module (they need to compile it as a jar and copy it over to 
: the lib folder)?

go ahead and attach what you've got (Yonik's Law of Patches) but I'm 
guessing it would probably make sense if these changes ultimately became 
part of the core StrField ... there shouldn't be any downside (as long as 
it doesn't adversely affect performance for people that don't want to 
use the feature)

:   http://wiki.apache.org/solr/HowToContribute


-Hoss



