Re: Problem with words that are almost similar

2009-12-18 Thread Steinar Asbjørnsen
On 17 Dec 2009, at 13:48, Shalin Shekhar Mangar wrote:

 2009/12/17 Steinar Asbjørnsen steinar...@gmail.com
 
 On 17 Dec 2009, at 12:42, Shalin Shekhar Mangar wrote:
 
 
 
 For specific cases like this, you can add the word to a file and specify it
 in schema, for example:
 
 <filter class="solr.SnowballPorterFilterFactory" language="English"
         protected="protwords.txt"/>
 
 Ty Shalin.
 
 This is my schema.xml file:
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory"
             protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory"
             protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>
 
 I added restaurant and restaurering to protwords.txt, restarted Tomcat, but
 no dice.
 Do I need to use the SnowballPorterFilterFactory?
 And do I need to reindex the documents?
 
 
 Actually EnglishPorterFilterFactory is the same as
 SnowballPorterFilterFactory with language=English. Both will work. You
 will need to re-index the documents.
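 
 To illustrate why a re-index is needed, here is a toy sketch in Python. It is
 not Solr's Porter stemmer: the suffix list and all function names are made up
 for illustration. The point is only that terms are stemmed when a document is
 indexed, so documents indexed before the protwords.txt change keep their old
 stemmed terms until they are fed again.

```python
# Toy illustration only: NOT Solr's Porter stemmer, just a crude suffix
# stripper with a protected-word list (all names here are hypothetical).
SUFFIXES = ("ering", "ant", "ing", "s")


def stem(token, protected=frozenset()):
    """Return the token unchanged if protected, else strip one known suffix."""
    if token in protected:
        return token
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_index(docs, protected=frozenset()):
    """Map each stemmed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(stem(token, protected), set()).add(doc_id)
    return index


def search(index, word, protected=frozenset()):
    """Stem the query word the same way and look it up in the index."""
    return sorted(index.get(stem(word.lower(), protected), set()))


docs = {1: "restaurant", 2: "restaurering"}
protected = frozenset({"restaurant", "restaurering"})

old_index = build_index(docs)             # built before the protwords change
new_index = build_index(docs, protected)  # rebuilt after the change
```

 In this toy model both words collapse to the same term in old_index, so a
 search for either one returns both documents. Only new_index, built with the
 protected list in place, keeps them apart.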

What I've done so far is to add both restaurant and restaurering to
protwords.txt.
I've also re-fed a single document (with the keyword restaurering) to check
that it no longer appears in a search result for restaurant.
Do I have to re-feed every document in the index?
Or do I need to restart so that Solr re-reads the protwords.txt file (this is
on a different installation (prod) than the one I restarted earlier (dev))?

Steinar

Problem with words that are almost similar

2009-12-17 Thread Steinar Asbjørnsen
Hi all.

I have a delicate problem with two words that are spelled rather similarly
but have completely different meanings.
The actual words are restaurant (as in restaurant) and restaurering (as in
restoration).

Solr seems to think these words are similar enough to present hits on both of
them in the same search result.
Obviously this is not desirable.

Is there a way to take care of such specific cases without disabling Solr
functionality for stemming and/or plurals?
Or would I need to disable stemming to make this special case disappear?

I'm using the dismax query handler.
The field I'm querying against is of type text.

Any help is appreciated :)

Regards,
Steinar

Re: Problem with words that are almost similar

2009-12-17 Thread Steinar Asbjørnsen
On 17 Dec 2009, at 12:42, Shalin Shekhar Mangar wrote:

 2009/12/17 Steinar Asbjørnsen steinar...@gmail.com
 
 Hi all.
 
 I have a delicate problem with two words that are spelled rather similarly
 but have completely different meanings.
 The actual words are restaurant (as in restaurant) and restaurering (as in
 restoration).
 
 Solr seems to think these words are similar enough to present hits on both
 of them in the same search result.
 Obviously this is not desirable.
 
 Is there a way to take care of such specific cases without disabling Solr
 functionality for stemming and/or plurals?
 Or would I need to disable stemming to make this special case disappear?
 
 
 For specific cases like this, you can add the word to a file and specify it
 in schema, for example:
 
 <filter class="solr.SnowballPorterFilterFactory" language="English"
         protected="protwords.txt"/>

Ty Shalin.

This is my schema.xml file:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I added restaurant and restaurering to protwords.txt, restarted Tomcat, but no 
dice.
Do I need to use the SnowballPorterFilterFactory?
And do I need to reindex the documents?

Steinar

New solr-driven site

2009-10-29 Thread Steinar Asbjørnsen

Hi all.

Just wanted to inform you that a new Solr-driven website is up and
running, and to say thanks to you guys for helping out.

A little info:
Bedriftsøket.no (http://www.bedriftsoket.no/) is a Norwegian company
catalogue using Solr and SolrNet for all search/faceting
functionality. The main index, of three in total, contains ~800k
documents.

I just added the site to http://wiki.apache.org/solr/PublicServers.

I did this project as a consultant, and I really hope I can introduce
Solr to future customers as well.


Regards,
Steinar

copyField from multiple fields into one

2009-10-26 Thread Steinar Asbjørnsen

Hi all.

I'm currently working on setting up spelling suggestion functionality.
What I'd like is to put the values of two fields (keyword and name)
into the spell field.

Something like (from schema.xml):
<field name="spell" type="textSpell" indexed="true" stored="true"
       multiValued="true"/>

...
<copyField source="keyword" dest="spell"/>
<copyField source="name" dest="spell"/>

As far as I can see, I only get suggestions from the keyword field, and
not from the name field.

So my question is:
Is it possible to copy both keyword and name into the spell field?

Thanks,
Steinar



Re: Only one usage of each socket address error

2009-10-01 Thread Steinar Asbjørnsen

Hi.

This situation is still bugging me.
I thought I had it fixed yesterday, but no...

It seems this goes for both deleting and adding, but I'll explain
the delete situation here:
When I'm deleting documents (~5k) from an index, I get an error message
saying
"Only one usage of each socket address (protocol/network address/port)
is normally permitted 127.0.0.1:8983".

I've tried both delete by id and delete by query, and both give me
the same error.
The commands that give me the error message are solr.Delete(id) and
solr.Delete(new SolrQuery("id:" + id)).

The command is issued with SolrNet, and I'm not sure if this is
SolrNet or Solr related.

I cannot find anything that helps me out in the catalina log.
Are there any other logs that should be checked?

I'm grateful for any pointers :)

Thanks,
Steinar

Fwd: Only one usage of each socket address error

2009-09-29 Thread Steinar Asbjørnsen
Seems like the post in the SolrNet group
(http://groups.google.com/group/solrnet/browse_thread/thread/7e3034b626d3e82d?pli=1)
helped me get through.

Thank you, solr-users, for helping out too!

Steinar

Forwarded message:

From: Steinar Asbjørnsen steinar...@gmail.com
Date: 28 September 2009 17:07:15 GMT+02:00
To: solr-user@lucene.apache.org
Subject: Re: Only one usage of each socket address error

I'm using the add(MyObject) command in a foreach loop to add
my objects to the index.

In the catalina log I cannot see anything that helps me out.
It stops at:
28.sep.2009 08:58:40 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[12345]} 0 187
28.sep.2009 08:58:40 org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/update params={} status=0 QTime=187
Which indicates nothing is wrong.

Are there any other logs that should be checked?

What it seems like to me at the moment is that the foreach is
passing objects (documents) to Solr faster than Solr can add them to
the index. As in, I'm eventually running out of connections (to
Solr?) or something.
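
If the feed really is opening connections faster than they can be released,
one possible mitigation is to send documents in batches, so that far fewer
HTTP requests are made. A minimal sketch in Python, assuming a hypothetical
send_batch callable that posts one batch of documents to Solr (it is not a
SolrNet or Solr API, and the batch size of 100 is an arbitrary choice):

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def feed(docs: Iterable[T],
         send_batch: Callable[[List[T]], None],
         batch_size: int = 100) -> int:
    """Send docs in batches via send_batch; return the number of requests."""
    requests = 0
    for batch in batched(docs, batch_size):
        send_batch(batch)  # hypothetical: one HTTP request per batch
        requests += 1
    return requests
```

With 3,000 documents and a batch size of 100, this makes 30 calls to
send_batch instead of 3,000 single-document requests.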


I'm running another incremental update with other objects where
the foreach isn't quite as fast. This job has added over 100k
documents without failing, and is still going, whereas the problematic
job fails after ~3k.

What I've learned through the day, though, is that the index where my
feed is failing is actually redundant.

I.e., I'm off the hook for now.

Still, I'd like to figure out what's going wrong.

Steinar

There's nothing in that output that indicates something we can help  
with over in solr-user land.  What is the call you're making to  
Solr?  Did Solr log anything anomalous?


Erik


Re: Only one usage of each socket address error

2009-09-28 Thread Steinar Asbjørnsen
I just posted to the SolrNet group since I have the exact same(?)
problem.
Hope I'm not being rude posting here as well (since the SolrNet group
doesn't seem as active as this mailing list).

The problem occurs when I'm running a (self-made) incremental feed of
an index.

My post:
[snip]
What's happening is that I get this error message (in VS):
A first chance exception of type
'SolrNet.Exceptions.SolrConnectionException' occurred in SolrNet.DLL
And the web browser (which I use to start the feed) says:
System.Data.SqlClient.SqlException: Timeout expired.  The timeout
period elapsed prior to completion of the operation or the server is
not responding.
At the time of writing my index contains 15k docs, and lacks ~700k
docs that the incremental feed should take care of adding to the
index.
The error message appears after 3k docs are added, and before 4k
docs are added.
I'm committing every 1,000 docs (i % 1000 == 0).
In addition, autocommit is set to:
<autoCommit>
  <maxDocs>1</maxDocs>
</autoCommit>
More info:
From schema.xml:
<field name="id" type="text" indexed="true" stored="true"
       required="true" />
<field name="name" type="string" indexed="true" stored="true"
       required="false" />
I'm fetching data from a (remote) SQL Server 2008, using sqljdbc4.jar.
And Solr is running on a local Tomcat installation.
SolrNet version: 0.2.3.0
Solr Specification Version: 1.3.0.2009.08.29.08.05.39

[/snip]
Any suggestions on how to fix this would be much appreciated.

Regards,
Steinar