Is there an issue with hyphens in SpellChecker with StandardTokenizer?
I am getting an error using the SpellChecker component with the query "another-test":

java.lang.StringIndexOutOfBoundsException: String index out of range: -7

This appears to be related to this issue, https://issues.apache.org/jira/browse/SOLR-1630, which has been marked as fixed. My configuration and the test case that follows appear to reproduce the error I am seeing. Both "another" and "test" get changed into tokens with start and end offsets of 0 and 12.

    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>

    spellcheck=true&spellcheck.collate=true

Is this an issue with my configuration/test, or is there an issue with the SpellingQueryConverter? Is there a recommended workaround, such as the WhitespaceTokenizer mentioned in the issue comments? Thank you for your help.

    package org.apache.solr.spelling;

    import static org.junit.Assert.assertTrue;

    import java.util.Collection;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.util.Version;
    import org.apache.solr.common.util.NamedList;
    import org.junit.Test;

    public class SimpleQueryConverterTest {
        @Test
        public void testSimpleQueryConversion() {
            SpellingQueryConverter converter = new SpellingQueryConverter();
            converter.init(new NamedList());
            converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
            String original = "another-test";
            Collection<Token> tokens = converter.convert(original);
            assertTrue("Token offsets do not match",
                    isOffsetCorrect(original, tokens));
        }

        private boolean isOffsetCorrect(String s, Collection<Token> tokens) {
            for (Token token : tokens) {
                int start = token.startOffset();
                int end = token.endOffset();
                if (!s.substring(start, end).equals(token.toString()))
                    return false;
            }
            return true;
        }
    }
Re: Is there an issue with hyphens in SpellChecker with StandardTokenizer?
Hi Steve, I was using branch 3.5. I will try this on the tip of branch_3x too. Thanks.

On Thu, Dec 15, 2011 at 4:14 PM, Steven A Rowe <sar...@syr.edu> wrote:

Hi Brandon,

When I add the following to SpellingQueryConverterTest.java on the tip of branch_3x (will be released as Solr 3.6), the test succeeds:

    @Test
    public void testStandardAnalyzerWithHyphen() {
        SpellingQueryConverter converter = new SpellingQueryConverter();
        converter.init(new NamedList());
        converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
        String original = "another-test";
        Collection<Token> tokens = converter.convert(original);
        assertTrue("tokens is null and it shouldn't be", tokens != null);
        assertEquals("tokens Size: " + tokens.size() + " is not 2",
                2, tokens.size());
        assertTrue("Token offsets do not match",
                isOffsetCorrect(original, tokens));
    }

What version of Solr/Lucene are you using?

Steve

-----Original Message-----
From: Brandon Fish [mailto:brandon.j.f...@gmail.com]
Sent: Thursday, December 15, 2011 3:08 PM
To: solr-user@lucene.apache.org
Subject: Is there an issue with hyphens in SpellChecker with StandardTokenizer?
Re: Is there an issue with hyphens in SpellChecker with StandardTokenizer?
Yes, the branch_3x code works for me as well. The addition of the OffsetAttribute probably corrected this issue. I will either switch to the WhitespaceAnalyzer, patch my distribution, or wait for 3.6 to resolve this. Thanks.

On Thu, Dec 15, 2011 at 4:17 PM, Brandon Fish <brandon.j.f...@gmail.com> wrote:

Hi Steve, I was using branch 3.5. I will try this on the tip of branch_3x too. Thanks.

On Thu, Dec 15, 2011 at 4:14 PM, Steven A Rowe <sar...@syr.edu> wrote:
Re: How to check if replication is running
Hi Yury,

You could try checking the details command of the replication handler:

    http://slave_host:port/solr/replication?command=details

which has information such as isReplicating. You could also look at the script attached to this issue, which shows a thorough check of a slave's replication status; that check could be polled to trigger a restart if there is an error.

Brandon

2011/9/16 Yury Kats <yuryk...@yahoo.com>

Let's say I'm forcing a replication of a core using the fetchindex command. No new content is being added to the master. I can check whether replication has finished by periodically querying master and slave for their indexversion and comparing the two. But what's the best way to check if replication is actually happening and hasn't been dropped? If, for example, there was a network outage between the master and the slave, I want to re-start replication.

Thanks,
Yury
Re: How to check if replication is running
Adding missing link to the issue I mentioned: https://issues.apache.org/jira/browse/SOLR-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851462#action_12851462 2011/9/16 Yury Kats yuryk...@yahoo.com Let's say I'm forcing a replication of a core using fetchindex command. No new content is being added to the master. I can check whether replication has finished by periodically querying master and slave for their indexversion and comparing the two. But what's the best way to check if replication is actually happening and hasn't been dropped, if for example, there was a network outage between master and the slave, in which case, I want to re-start replication. Thanks, Yury
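The polling approach described above can be sketched as follows. This is a hypothetical helper, not Solr code: it assumes the details response is XML of the usual Solr `<str name="...">value</str>` shape containing an isReplicating flag; in practice you would fetch the document from http://slave_host:port/solr/replication?command=details before parsing it.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ReplicationStatusCheck {
    // Scan a replication "details" response for the isReplicating flag.
    // The XML shape is an assumption based on Solr's usual response format.
    static boolean isReplicating(String detailsXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(detailsXml.getBytes("UTF-8")));
        NodeList strs = doc.getElementsByTagName("str");
        for (int i = 0; i < strs.getLength(); i++) {
            Element e = (Element) strs.item(i);
            if ("isReplicating".equals(e.getAttribute("name"))) {
                return Boolean.parseBoolean(e.getTextContent());
            }
        }
        return false; // flag absent: assume no replication in progress
    }
}
```

A monitoring loop could call this every few seconds and restart replication (via a fetchindex command) when the flag stays false while the slave's indexversion still lags the master's.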
Re: Data Import from a Queue
Let me provide some more details to the question:

I was unable to find any example implementations where individual documents (one document per message) are read from a message queue (like ActiveMQ or RabbitMQ) and then added to Solr via SolrJ, an HTTP POST, or another method. Does anyone know of any available examples of this type of import?

If no examples exist, what would be a recommended commit strategy for performance? My best guess would be to have a queue per core and commit once the queue is empty. Thanks.

On Mon, Jul 18, 2011 at 6:52 PM, Erick Erickson <erickerick...@gmail.com> wrote:

This is a really cryptic problem statement. You might want to review: http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Fri, Jul 15, 2011 at 1:52 PM, Brandon Fish <brandon.j.f...@gmail.com> wrote:

Does anyone know of any existing examples of importing data from a queue into Solr? Thank you.
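The commit strategy guessed at above (batch documents, commit when the queue drains) can be sketched without committing to a particular broker. Everything here is an assumption for illustration: the Indexer interface stands in for whatever client you use (SolrJ, raw HTTP POST, etc.), and the in-memory BlockingQueue stands in for the ActiveMQ/RabbitMQ consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueImporter {
    // Placeholder for the real indexing client (e.g. SolrJ's add/commit).
    interface Indexer {
        void add(List<String> docs);
        void commit();
    }

    // Drain the queue into batches; commit when a batch fills up or the
    // queue runs empty, so commits stay rare while the backlog is large.
    static void drain(BlockingQueue<String> queue, Indexer indexer,
                      int maxBatch) throws InterruptedException {
        List<String> batch = new ArrayList<String>();
        String doc;
        // A null poll result after the timeout means the queue is empty.
        while ((doc = queue.poll(100, TimeUnit.MILLISECONDS)) != null) {
            batch.add(doc);
            if (batch.size() >= maxBatch) {
                indexer.add(batch);
                indexer.commit();
                batch.clear();
            }
        }
        if (!batch.isEmpty()) { // queue empty: flush the tail and commit
            indexer.add(batch);
            indexer.commit();
        }
    }
}
```

With one such drain loop per core (one queue per core, as suggested above), each core gets its own commit cadence and a slow core cannot stall the others.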
Data Import from a Queue
Does anyone know of any existing examples of importing data from a queue into Solr? Thank you.
Re: Server Restart Required for Schema Changes After Document Delete All?
I'm not having any issues. I was just asking to see if any backward-incompatible changes exist that would require a server restart. Thanks.

2011/6/27 Tomás Fernández Löbbe <tomasflo...@gmail.com>

This should work with dynamic fields too. Are you having any problems with it?

On Thu, Jun 23, 2011 at 3:14 PM, Brandon Fish <brandon.j.f...@gmail.com> wrote:

Are there any schema changes that would cause problems with the following procedure from the FAQ (http://wiki.apache.org/solr/FAQ#How_can_I_rebuild_my_index_from_scratch_if_I_change_my_schema.3F)?

1. Use the match-all-docs query in a delete-by-query command before shutting down Solr: <delete><query>*:*</query></delete>
2. Reload the core
3. Re-index your data

Would this work when dynamic fields are removed?
Server Restart Required for Schema Changes After Document Delete All?
Are there any schema changes that would cause problems with the following procedure from the FAQ (http://wiki.apache.org/solr/FAQ#How_can_I_rebuild_my_index_from_scratch_if_I_change_my_schema.3F)?

1. Use the match-all-docs query in a delete-by-query command before shutting down Solr: <delete><query>*:*</query></delete>
2. Reload the core
3. Re-index your data

Would this work when dynamic fields are removed?
Modifying Configuration from a Browser
Does anyone have any examples of modifying a configuration file, like elevate.xml, from a browser? Is there an API that would help with this? If nothing exists for this, I am considering implementing something that would change the elevate.xml file and then reload the core. Or is there a better approach to dynamic configuration? Thank you.
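The rewrite-then-reload idea described above can be sketched roughly as follows. This is only an assumed approach, not an existing Solr API: the buildElevateXml helper is hypothetical, the elevate.xml shape follows the QueryElevationComponent's documented query/doc format, and the reload URL shown in the comment assumes a typical multi-core admin setup.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ElevateConfigWriter {
    // Build an elevate.xml body for a map of query text -> elevated doc ids,
    // following the <elevate><query text=...><doc id=.../></query> format.
    static String buildElevateXml(Map<String, List<String>> elevations) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<elevate>\n");
        for (Map.Entry<String, List<String>> e : elevations.entrySet()) {
            sb.append("  <query text=\"").append(e.getKey()).append("\">\n");
            for (String id : e.getValue()) {
                sb.append("    <doc id=\"").append(id).append("\"/>\n");
            }
            sb.append("  </query>\n");
        }
        sb.append("</elevate>\n");
        return sb.toString();
    }
    // After writing this into the core's conf directory, a core reload
    // (e.g. GET /solr/admin/cores?action=RELOAD&core=<name>) would make the
    // QueryElevationComponent pick up the new file.
}
```

A small servlet or admin page could collect the query/doc pairs from a browser form, call a helper like this, write the file, and then issue the reload request.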