Re: Requesting to add into a Contributor Group
done. let us know if you have any problems. On Sat, May 4, 2013 at 10:12 AM, Krunal jariwalakru...@gmail.com wrote: Dear Sir, Kindly add me to the contributor group so that I can contribute to the Solr wiki. My email id: jariwalakru...@gmail.com Login name: Krunal Specific changes I would like to make to begin with: - Correct the link for Ajax Solr at http://wiki.apache.org/solr/SolrJS, which is wrong; the correct link is https://github.com/evolvingweb/ajax-solr/wiki - Add our company details at http://wiki.apache.org/solr/Support We offer Solr integration services on the .NET platform at Xcellence-IT. A business division of ours, nopAccelerate, offers a Solr integration plugin for nopCommerce along with other nopCommerce performance optimization services. We have been working with Solr for the last year and will be happy to contribute back by helping the community keep the wiki up to date. If this is not allowed, kindly let us know and I will send you our company details so you can make the changes instead. Thanks, awaiting your response. Krunal *Krunal Jariwala* *Cell:* +91-98251-07747 *Best time to call:* 9am to 7pm (IST) GMT +5.30
Re: Why is SolrCloud doing a full copy of the index?
Second the thanks. Erick On Sat, May 4, 2013 at 6:08 PM, Lance Norskog goks...@gmail.com wrote: Great! Thank you very much Shawn. On 05/04/2013 10:55 AM, Shawn Heisey wrote: On 5/4/2013 11:45 AM, Shawn Heisey wrote: Advance warning: this is a long reply. I have condensed some relevant performance problem information into the following wiki page: http://wiki.apache.org/solr/SolrPerformanceProblems Anyone who has additional information for this page, feel free to add it. I hope I haven't made too many mistakes! Thanks, Shawn
Re: How to get solr synonyms in result set.
Sure, you can specify a separate synonyms list at query time: just define separate index-time and query-time analysis chains, one with the synonym filter factory and one without. Be aware that index time and query time have some different characteristics, especially around multi-word synonyms; see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory Best Erick On Sun, May 5, 2013 at 12:23 AM, varun srivastava varunmail...@gmail.com wrote: Hi, The synonyms list is used at index time, so I don't think you can pass a list at query time and make it work. On Fri, May 3, 2013 at 11:53 PM, Suneel Pandey pandey.sun...@gmail.com wrote: Hi, I want to get a specific list of Solr synonym terms at query time in the result set, based on filter criteria. I have implemented synonyms in a .txt file. Thanks - Regards, Suneel Pandey Sr. Software Developer -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-solr-synonyms-in-result-set-tp4060796.html Sent from the Solr - User mailing list archive at Nabble.com.
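Erick's suggestion above can be sketched in schema.xml. This is a hypothetical field type (the name text_syn and the file name synonyms.txt are placeholders, not from this thread) showing a query-time-only synonym chain:

```xml
<!-- Sketch: synonyms applied only at query time.
     The index-time chain omits SynonymFilterFactory entirely. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

As Erick notes, multi-word synonyms behave differently at query time, so test any such entries in synonyms.txt before relying on this layout.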
Re: Why is SolrCloud doing a full copy of the index?
Thanks for the replies. It is really appreciated. Based on the replies, it seems like upgrading to the latest version of Solr will probably resolve this issue. We also update quite frequently - every 5 minutes. We will try setting this to a higher interval and see if that helps. We will also try increasing the servlet timeout and see if that resolves the issue. Among the other suggestions, we already tried increasing the zkClientTimeout from 15 seconds to 30 seconds, but that didn't seem to help. What do you recommend as a good value to try? A few more details about our system: we are running this on a system with 16GB of RAM. We are using a 64-bit server and SSD disks. Also, since we are already using 4.0 in our production environment with the aforementioned 3-server setup, how should we go about upgrading to the latest version (4.3)? Do we need to do a full reindex of our data, or is the index compatible between these versions? We will try out the suggestions and will post later if any of them help us resolve the issue. Again, thanks for the reply. -- View this message in context: http://lucene.472066.n3.nabble.com/Why-is-SolrCloud-doing-a-full-copy-of-the-index-tp4060800p4060897.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: custom tokenizer error
I didn't notice any call to the reset method for your base tokenizer. Is there any reason that you didn't just use char filters to replace the colons and periods with spaces? -- Jack Krupansky -Original Message- From: Sarita Nair Sent: Friday, May 03, 2013 2:43 PM To: solr-user@lucene.apache.org Subject: custom tokenizer error I am using a custom Tokenizer, as part of an analysis chain, for a Solr (4.2.1) field. On trying to index, Solr throws a NullPointerException. The unit tests for the custom tokenizer work fine. Any ideas as to what I am missing/doing incorrectly will be appreciated. Here is the relevant schema.xml excerpt: <fieldType name="negated" class="solr.TextField" omitNorms="true"> <analyzer type="index"> <tokenizer class="some.other.solr.analysis.EmbeddedPunctuationTokenizer$Factory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPossessiveFilterFactory"/> </analyzer> </fieldType> Here are the relevant pieces of the Tokenizer: /** * Intercepts each token produced by {@link StandardTokenizer#incrementToken()} * and checks for the presence of a colon or period. If found, splits the token * on the punctuation mark and adjusts the term and offset attributes of the * underlying {@link TokenStream} to create additional tokens. * * */ public class EmbeddedPunctuationTokenizer extends Tokenizer { private static final Pattern PUNCTUATION_SYMBOLS = Pattern.compile("[:.]"); private StandardTokenizer baseTokenizer; private CharTermAttribute termAttr; private OffsetAttribute offsetAttr; private /*@Nullable*/ String tokenAfterPunctuation = null; private int currentOffset = 0; public EmbeddedPunctuationTokenizer(final Reader reader) { super(reader); baseTokenizer = new StandardTokenizer(Version.MINIMUM_LUCENE_VERSION, reader); // Two TokenStreams are in play here: the one underlying the current // instance and the one underlying the StandardTokenizer. The attribute // instances must be associated with both. 
termAttr = baseTokenizer.addAttribute(CharTermAttribute.class); offsetAttr = baseTokenizer.addAttribute(OffsetAttribute.class); this.addAttributeImpl((CharTermAttributeImpl)termAttr); this.addAttributeImpl((OffsetAttributeImpl)offsetAttr); } @Override public void end() throws IOException { baseTokenizer.end(); super.end(); } @Override public void close() throws IOException { baseTokenizer.close(); super.close(); } @Override public void reset() throws IOException { super.reset(); baseTokenizer.reset(); currentOffset = 0; tokenAfterPunctuation = null; } @Override public final boolean incrementToken() throws IOException { clearAttributes(); if (tokenAfterPunctuation != null) { // Do not advance the underlying TokenStream if the previous call // found an embedded punctuation mark and set aside the substring // that follows it. Set the attributes instead from the substring, // bearing in mind that the substring could contain more embedded // punctuation marks. adjustAttributes(tokenAfterPunctuation); } else if (baseTokenizer.incrementToken()) { // No remaining substring from a token with embedded punctuation: save // the starting offset reported by the base tokenizer as the current // offset, then proceed with the analysis of token it returned. currentOffset = offsetAttr.startOffset(); adjustAttributes(termAttr.toString()); } else { // No more tokens in the underlying token stream: return false return false; } return true; } private void adjustAttributes(final String token) { Matcher m = PUNCTUATION_SYMBOLS.matcher(token); if (m.find()) { int index = m.start(); offsetAttr.setOffset(currentOffset, currentOffset + index); termAttr.copyBuffer(token.toCharArray(), 0, index); tokenAfterPunctuation = token.substring(index + 1); // Given that the incoming token had an embedded punctuation mark, // the starting offset for the substring following the punctuation // mark will be 1 beyond the end of the current token, which is the // substring preceding embedded punctuation mark. 
currentOffset = offsetAttr.endOffset() + 1; } else if (tokenAfterPunctuation != null) { // Last remaining substring following a previously detected embedded // punctuation mark: adjust attributes based on its values. int length = tokenAfterPunctuation.length(); termAttr.copyBuffer(tokenAfterPunctuation.toCharArray(), 0, length); offsetAttr.setOffset(currentOffset, currentOffset + length); tokenAfterPunctuation = null; } // Implied else: neither is true so attributes from base tokenizer need // no adjustments. } } } Solr throws the following error, in the 'else if' block of #incrementToken 2013-04-29 14:19:48,920 [http-thread-pool-8080(3)] ERROR org.apache.solr.core.SolrCore - java.lang.NullPointerException at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923) at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133)
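Jack's char-filter suggestion would replace the custom tokenizer entirely. A minimal sketch, assuming the goal is simply to split tokens on colons and periods (the fieldType name comes from the schema excerpt above; the exact chain is an assumption, not code from this thread):

```xml
<!-- Sketch: map ':' and '.' to spaces before tokenization, so the stock
     StandardTokenizer splits the pieces -- no custom tokenizer needed.
     Because the replacement has the same length as the match, character
     offsets into the original text are preserved for highlighting. -->
<fieldType name="negated" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[:.]" replacement=" "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
</fieldType>
```

One caveat: this rewrites every colon and period, including those inside numbers like 3.14, so it is only equivalent to the custom tokenizer if that behavior is acceptable.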
zookeeper errors
Hi, I'm running a Solr 4.2.1 cloud with an external three-node Zookeeper 2.4.5 setup. I'm seeing a lot of these errors in the zookeeper logs: 2013-05-05 15:06:22,863 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception Also some of these: NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x3e22c637f5063f due to java.io.IOException: Connection reset by peer or NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x3e22c637f506a1 due to java.io.IOException: Connection timed out We've had problems with nodes dropping out of collections during indexing. I'm assuming these are related? Is there some sort of socket tuning I need to do on the Solr side to keep these connections going? Thanks for any input anybody might be able to provide, Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game
Re: disaster recovery scenarios for solr cloud and zookeeper
From the wiki: SolrCloud can continue to serve results without interruption as long as at least one server hosts every shard. You can demonstrate this by judiciously shutting down various instances and looking for results. If you have killed all of the servers for a particular shard, requests to other servers will result in a 503 error. To return just the documents that are available in the shards that are still alive (and avoid the error), add the following query parameter: shards.tolerant=true That doesn't completely answer your question, but it is an important part of the puzzle. -- Jack Krupansky -Original Message- From: Dennis Haller Sent: Friday, May 03, 2013 3:21 PM To: solr-user@lucene.apache.org Subject: disaster recovery scenarios for solr cloud and zookeeper Hi, Solr 4.x is architected with a dependency on ZooKeeper, and ZooKeeper is expected to have very high (perfect?) availability. With 3 or 5 ZooKeeper nodes, it is possible to manage ZooKeeper maintenance and online availability to be close to 100%. But what is the worst case for Solr if, for some unanticipated reason, all ZooKeeper nodes go offline? Could someone comment on a couple of possible scenarios in which all ZK nodes are offline? What would happen to Solr, and what would be needed to recover in each case? 1) brief interruption, say 2 minutes, 2) longer downtime, say 60 min Thanks Dennis
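If tolerant behavior is always wanted rather than passed per request, the parameter can be set as a handler default in solrconfig.xml. A sketch, assuming the standard /select search handler:

```xml
<!-- Sketch: make shards.tolerant the default for every request to this
     handler, so partial results are returned when a whole shard is down. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards.tolerant">true</str>
  </lst>
</requestHandler>
```

Individual requests can still override the default by passing shards.tolerant=false explicitly.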
Re: disaster recovery scenarios for solr cloud and zookeeper
When Solr loses its connection to ZooKeeper, updates will start being rejected. Read requests will continue as normal. This is regardless of how long ZooKeeper is down. So it's pretty simple - when you lose the ability to talk to ZK, everything keeps working based on the most recent cluster state - except that updates are blocked and you cannot add new nodes to the cluster. You are essentially in steady state. The ZK clients will continue trying to reconnect, so that when ZK comes back, updates will start being accepted again and new nodes may join the cluster. - Mark On May 3, 2013, at 3:21 PM, Dennis Haller dhal...@talenttech.com wrote: Hi, Solr 4.x is architected with a dependency on ZooKeeper, and ZooKeeper is expected to have very high (perfect?) availability. With 3 or 5 ZooKeeper nodes, it is possible to manage ZooKeeper maintenance and online availability to be close to 100%. But what is the worst case for Solr if, for some unanticipated reason, all ZooKeeper nodes go offline? Could someone comment on a couple of possible scenarios in which all ZK nodes are offline? What would happen to Solr, and what would be needed to recover in each case? 1) brief interruption, say 2 minutes, 2) longer downtime, say 60 min Thanks Dennis
Re: zookeeper errors
It sounds like you probably need to raise the default 15 sec zk client timeout. We have it default to a fairly aggressive setting. For high load envs, you may have to bring it to 30 or 45 seconds. Also, be sure you are not using a stop the world gc collector - CMS (the concurrent low pause collector) is best. - Mark On May 5, 2013, at 1:50 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi, I'm running a Solr 4.2.1 cloud with an external three-node Zookeeper 2.4.5 setup. I'm seeing a lot of these errors in the zookeeper logs: 2013-05-05 15:06:22,863 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception Also some of these: NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x3e22c637f5063f due to java.io.IOException: Connection reset by peer or NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x3e22c637f506a1 due to java.io.IOException: Connection timed out We've had problems with nodes dropping out of collections during indexing. I'm assuming these are related? Is there some sort of socket tuning I need to do on the Solr side to keep these connections going? Thanks for any input anybody might be able to provide, Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game
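For reference, in a Solr 4.x-style solr.xml the timeout Mark mentions is the zkClientTimeout attribute on the cores element. A sketch raising it to 30 seconds (the surrounding attributes and core name are illustrative, not taken from this thread):

```xml
<!-- Sketch: raise the ZooKeeper client session timeout to 30s.
     It can also be overridden at startup with -DzkClientTimeout=30000. -->
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="${host:}" hostPort="${jetty.port:8983}"
         zkClientTimeout="${zkClientTimeout:30000}">
    <core name="collection1" instanceDir="collection1"/>
  </cores>
</solr>
```

Note that the ZooKeeper ensemble caps the negotiated session timeout (by default at 20x its tickTime), so a large value here only takes effect if the servers allow it.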
Re: zookeeper errors
Mark, I'm definitely using CMS, so I'll look into the zk client timeout. Thanks! Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Sun, May 5, 2013 at 2:21 PM, Mark Miller markrmil...@gmail.com wrote: It sounds like you probably need to raise the default 15 sec zk client timeout. We have it default to a fairly aggressive setting. For high load envs, you may have to bring it to 30 or 45 seconds. Also, be sure you are not using a stop the world gc collector - CMS (the concurrent low pause collector) is best. - Mark On May 5, 2013, at 1:50 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi, I'm running a Solr 4.2.1 cloud with an external three-node Zookeeper 2.4.5 setup. I'm seeing a lot of these errors in the zookeeper logs: 2013-05-05 15:06:22,863 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception Also some of these: NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x3e22c637f5063f due to java.io.IOException: Connection reset by peer or NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x3e22c637f506a1 due to java.io.IOException: Connection timed out We've had problems with nodes dropping out of collections during indexing. I'm assuming these are related? Is there some sort of socket tuning I need to do on the Solr side to keep these connections going? Thanks for any input anybody might be able to provide, Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game
Re: disaster recovery scenarios for solr cloud and zookeeper
Is soul retrieval possible when ZooKeeper is down? -- Jack Krupansky -Original Message- From: Mark Miller Sent: Sunday, May 05, 2013 2:19 PM To: solr-user@lucene.apache.org Subject: Re: disaster recovery scenarios for solr cloud and zookeeper When Solr loses its connection to ZooKeeper, updates will start being rejected. Read requests will continue as normal. This is regardless of how long ZooKeeper is down. So it's pretty simple - when you lose the ability to talk to ZK, everything keeps working based on the most recent cluster state - except that updates are blocked and you cannot add new nodes to the cluster. You are essentially in steady state. The ZK clients will continue trying to reconnect, so that when ZK comes back, updates will start being accepted again and new nodes may join the cluster. - Mark On May 3, 2013, at 3:21 PM, Dennis Haller dhal...@talenttech.com wrote: Hi, Solr 4.x is architected with a dependency on ZooKeeper, and ZooKeeper is expected to have very high (perfect?) availability. With 3 or 5 ZooKeeper nodes, it is possible to manage ZooKeeper maintenance and online availability to be close to 100%. But what is the worst case for Solr if, for some unanticipated reason, all ZooKeeper nodes go offline? Could someone comment on a couple of possible scenarios in which all ZK nodes are offline? What would happen to Solr, and what would be needed to recover in each case? 1) brief interruption, say 2 minutes, 2) longer downtime, say 60 min Thanks Dennis
Re: Why is SolrCloud doing a full copy of the index?
Advance warning: this is a long reply. Awesome Shawn. Thanks!
iterate through each document in Solr
Dear Solr Users, Does anyone know the best way to iterate through each document in a Solr index with a billion entries? I tried using select?q=*:*&start=xx&rows=500 to get 500 docs each time and then changing the start value, but it got very slow after getting through about 10 million docs. Thanks, Ming-
Re: iterate through each document in Solr
On 5/5/13 7:48 PM, Mingfeng Yang wrote: Dear Solr Users, Does anyone know the best way to iterate through each document in a Solr index with a billion entries? I tried using select?q=*:*&start=xx&rows=500 to get 500 docs each time and then changing the start value, but it got very slow after getting through about 10 million docs. Thanks, Ming- You need to use a unique and stable sort key and page through documents by that key. For example, if you have a unique key, retrieve documents ordered by the unique key, and for each batch request only documents whose key is greater than the max(key) of the previous batch. -Mike
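Mike's approach - deep paging by a unique sort key instead of a growing start offset - can be sketched as follows. This is an illustrative simulation in Python rather than a SolrJ client: fetch_batch stands in for a real Solr request along the lines of select?q=*:*&fq=key:{LAST TO *]&sort=key asc&rows=500, and the function and field names are made up for the example.

```python
# Keyset ("deep") pagination sketch: instead of start=N, each request
# filters on key > last-seen-key and keeps the offset at zero.

def fetch_batch(index, last_key, batch_size):
    """Simulate one request: sort=key asc, rows=batch_size, key > last_key."""
    matches = [doc for doc in index
               if last_key is None or doc["key"] > last_key]
    matches.sort(key=lambda d: d["key"])
    return matches[:batch_size]

def iterate_all(index, batch_size=500):
    """Yield every document exactly once; only the key filter advances."""
    last_key = None
    while True:
        batch = fetch_batch(index, last_key, batch_size)
        if not batch:
            return
        for doc in batch:
            yield doc
        last_key = batch[-1]["key"]
```

Against a real index this keeps every request cheap: the server never has to collect and skip the first ten million sorted documents the way a start=10000000 request does.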
Re: How to get solr synonyms in result set.
Hi Suneel, After discovering that only query-time synonyms work with Solr, I found a good article on the pros and cons of query-time and index-time synonyms. It may help you: http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ Regards Varun On Sun, May 5, 2013 at 9:20 AM, Erick Erickson erickerick...@gmail.com wrote: Sure, you can specify a separate synonyms list at query time: just define separate index-time and query-time analysis chains, one with the synonym filter factory and one without. Be aware that index time and query time have some different characteristics, especially around multi-word synonyms; see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory Best Erick On Sun, May 5, 2013 at 12:23 AM, varun srivastava varunmail...@gmail.com wrote: Hi, The synonyms list is used at index time, so I don't think you can pass a list at query time and make it work. On Fri, May 3, 2013 at 11:53 PM, Suneel Pandey pandey.sun...@gmail.com wrote: Hi, I want to get a specific list of Solr synonym terms at query time in the result set, based on filter criteria. I have implemented synonyms in a .txt file. Thanks - Regards, Suneel Pandey Sr. Software Developer -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-solr-synonyms-in-result-set-tp4060796.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to get solr synonyms in result set.
There is no way to identify *which* synonyms were triggered in your search output. You could implement a synonyms search component that looks in the stored values of configured fields for synonyms and adds another block of XML to the output. This could be a useful component. Upayavira On Mon, May 6, 2013, at 05:23 AM, varun srivastava wrote: Hi Suneel, After discovering that only query-time synonyms work with Solr, I found a good article on the pros and cons of query-time and index-time synonyms. It may help you: http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ Regards Varun On Sun, May 5, 2013 at 9:20 AM, Erick Erickson erickerick...@gmail.com wrote: Sure, you can specify a separate synonyms list at query time: just define separate index-time and query-time analysis chains, one with the synonym filter factory and one without. Be aware that index time and query time have some different characteristics, especially around multi-word synonyms; see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory Best Erick On Sun, May 5, 2013 at 12:23 AM, varun srivastava varunmail...@gmail.com wrote: Hi, The synonyms list is used at index time, so I don't think you can pass a list at query time and make it work. On Fri, May 3, 2013 at 11:53 PM, Suneel Pandey pandey.sun...@gmail.com wrote: Hi, I want to get a specific list of Solr synonym terms at query time in the result set, based on filter criteria. I have implemented synonyms in a .txt file. Thanks - Regards, Suneel Pandey Sr. Software Developer -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-solr-synonyms-in-result-set-tp4060796.html Sent from the Solr - User mailing list archive at Nabble.com.