RE: How to make SolrCloud more elastic
Toke, Thanks for your reply. Yes, I believe I will be working with a write-once archive. However, my understanding is that all shards are defined up front, with the option to split later. Can you describe, or point me to documentation on, how to create shards one at a time? Thanks, Matt

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Wednesday, February 11, 2015 11:47 PM
To: solr-user@lucene.apache.org
Subject: Re: How to make SolrCloud more elastic

On Wed, 2015-02-11 at 21:32 +0100, Matt Kuiper wrote:
I am starting a new project and one of the requirements is that Solr must scale to handle increasing load (both search performance and index size). [...] Before I got too deep, I wondered if anyone has any tips or warnings on these approaches, or has scaled Solr in a different manner.

If your corpus only contains static content (e.g. log files or a write-once archive), you can create shards one at a time and optimize them. This lowers requirements for your searchers.

- Toke Eskildsen, State and University Library, Denmark
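For reference: out of the box, the Collections API can add shards one at a time, but only to a collection created with the implicit router. A minimal sketch - host, collection and shard names are placeholders, not from this thread:

  # Create the collection with the implicit router and a first shard:
  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=archive&router.name=implicit&shards=shard-2015-01&replicationFactor=1'

  # Later, add further shards one at a time:
  curl 'http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=archive&shard=shard-2015-02'

With the implicit router the client must say where each document goes, e.g. by sending a _route_ parameter (or configuring router.field) at index time.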
RE: How to make SolrCloud more elastic
Matt Kuiper [matt.kui...@issinc.com] wrote: Thanks for your reply. Yes, I believe I will be working with a write once archive. However, my understanding is that all shards are defined up front, with the option to split later.

Our situation might be a bit special, as a few minutes of downtime - preferably at off-peak hours - now and then is acceptable. We basically maintain a SolrCloud with static shards and use a completely separate builder to generate new shards, one at a time. When the builder has finished a shard, we add it to the cloud the hard way (re-configuration and restarting, hence the downtime). There's a description at https://sbdevel.wordpress.com/net-archive-search/

To avoid too much ZooKeeper hassle, we have a bunch of empty shards, ready to be switched with newly built ones. We have contemplated making the shard under construction part of the SolrCloud, but have yet to experiment with that setup.

Static shards, optimized down to a single segment and using DocValues for faceting, is a very potent mix: A Solr serving a non-static index needs more memory, as it must be capable of handling more than one version of the index open at a time, plus the indexing itself. Faceting on many unique values is more efficient with a single segment, as there is no need for an internal structure mapping the terms between the segments.

- Toke Eskildsen
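The two ingredients Toke names look roughly like this in practice; the field name, core name and host below are illustrative, not from the thread:

  <!-- schema.xml: a facet field backed by DocValues -->
  <field name="domain" type="string" indexed="true" stored="false" docValues="true"/>

  # After a static shard is fully built, force-merge it down to one segment:
  curl 'http://localhost:8983/solr/archive-shard1/update?optimize=true&maxSegments=1'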
RE: How to make SolrCloud more elastic
Thanks Alex. Per your recommendation I checked out the presentation and it was very informative. While my problem space will not reach the scale addressed in this talk, some of the topics may be helpful - those being the improvements to shard splitting and the new 'migrate' API. Thanks, Matt

Matt Kuiper - Software Engineer
Intelligent Software Solutions
p. 719.452.7721 | matt.kui...@issinc.com
www.issinc.com | LinkedIn: intelligent-software-solutions

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Wednesday, February 11, 2015 2:31 PM
To: solr-user
Subject: Re: How to make SolrCloud more elastic

Did you have a look at the presentations from the recent Solr Revolution? E.g. https://www.youtube.com/watch?v=nxRROble76A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP

Regards, Alex.
Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 11 February 2015 at 15:32, Matt Kuiper matt.kui...@issinc.com wrote:
I am starting a new project and one of the requirements is that Solr must scale to handle increasing load (both search performance and index size). My understanding is that one way to address search performance is by adding more replicas. I am more concerned about handling a growing index size. I have already been given some good input on this topic and am considering a shard splitting approach, but am more focused on a rebalancing approach that includes defining many shards up front and then moving these existing shards on to new Solr servers as needed. Plan to experiment with this approach first. Before I got too deep, I wondered if anyone has any tips or warnings on these approaches, or has scaled Solr in a different manner. Thanks, Matt
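Both features mentioned here are Collections API calls; minimal sketches with placeholder names (SPLITSHARD divides one shard into two halves in place, MIGRATE moves all documents sharing a route key into another collection):

  curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1'

  curl 'http://localhost:8983/solr/admin/collections?action=MIGRATE&collection=mycoll&split.key=customerA!&target.collection=mycoll2'

The parent shard keeps serving queries while a split runs; once the two sub-shards are active, the parent can be deleted.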
Re: Multi-tenancy and guarantee of service per application (tenant)
Not really, not 100%, if tenants share the same hardware and there is no isolation through things like containers (in which case they don't share the same SolrCloud cluster, really).

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Feb 12, 2015 at 11:17 AM, Victor Rondel rondelvic...@gmail.com wrote:
Hi everyone, I am wondering about multi-tenancy and guarantee of service in SolrCloud: *Multi-tenant cluster*: Is there a way to *guarantee a level of service* / capacity planning for *each tenant* using the cluster (its *own collections*)? Thanks,
Re: 43sec commit duration - blocked by index merge events?
If you are using Solr and SPM for Solr, you can check a report that shows the # of files in an index, and the report that shows you the max docs - num docs delta. If you see the # of files drop during a commit, that's a merge. If you see a big delta change, that's probably a merge, too. You could also jstack or kill -3 the JVM and see where it's spending its time, to give you some ideas of what's going on inside. HTH.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com wrote:
Hello, During a load test I noticed a commit that took 43 seconds to complete (a client hard commit). Is this to be expected? What's causing it? I have a pair of machines hosting a 128M docs collection (8 shards, replication factor=2). Could it be merges? In Lucene, merges happen async of commit statements, but reading Solr's doc for the Update Handler https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig it sounds like hard commits do wait for merges to occur: *The tradeoff is that a soft commit gives you faster visibility because it's not waiting for background merges to finish.* Thanks.
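The thread-dump suggestion in shell form (the pid is the Solr JVM's process id; both commands are safe on a live JVM):

  # write a thread dump to a file:
  jstack -l <pid> > solr-threads.txt

  # or have the JVM print the dump to its own stdout log:
  kill -3 <pid>

Taking two or three dumps a few seconds apart during the slow commit makes it easier to spot where the time goes.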
Re: Index directory containing only segments.gen
Well, I don't know if I'm being helpful but here goes. My clusterstate.json actually has no leader for the shard in question. I have 2 nodes as recovery_failed and one as down. No leaders there. I've not used the core admin or Collections API to create anything. Everything was set up using the - now deprecated - <cores> and <core> tags inside solr.xml. Also, the index directories are different, since I ended up copying the index from the one node that still had it to the other two and restarting again.
Re: Multi-tenancy and guarantee of service per application (tenant)
There are two main, distinct forms of multi-tenancy:

1. The service provider controls the app and the Solr server, and the app is carefully coded to isolate the data and load of the various tenants, such as adding a filter query with the tenant ID and throttling requests in an app server.

2. Each tenant has their own app, and the service provider controls the Solr server but has no control over the app or load.

The first is supported by Solr. The second is not, other than the service provider spinning up separate instances of Solr on separate physical servers.

-- Jack Krupansky

On Thu, Feb 12, 2015 at 1:30 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
Not really, not 100%, if tenants share the same hardware and there is no isolation through things like containers (in which case they don't share the same SolrCloud cluster, really). Otis [...]
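The filter-query isolation in form 1 is a one-liner on every request the app issues; the tenant field name here is an assumption:

  curl 'http://localhost:8983/solr/products/select?q=laptop&fq=tenant_id:acme'

Because fq results are cached separately from the main query, a per-tenant filter is also cheap once warm - but note this isolates data only, not load.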
Re: Index directory containing only segments.gen
From the logs I've got one instance failing as described in my first comment, and the other two failing during PeerSync recovery when trying to communicate with the server that was missing the segments_* files. The exception follows:

org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://server:host/solr/core
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
        at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:157)
        at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.http.client.ClientProtocolException
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
        ... 10 more
Caused by: org.apache.http.ProtocolException: Invalid header: ,code=500}
        at org.apache.http.impl.io.AbstractMessageParser.parseHeaders(AbstractMessageParser.java:232)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:267)
        at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
        at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:713)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:518)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        ... 13 more
Re: Solrcloud performance issues
Hi Vijay,

We're working on SOLR-6816 ... would love for you to be a test site for any improvements we make ;-)

Curious if you've experimented with changing the mergeFactor to a higher value, such as 25, and what happens if you set soft auto-commits to something lower, like 15 seconds? Also, make sure your indexing clients are not sending hard commits as well, i.e. just rely on auto-commits.

re: When the number of replicas increases the bulk indexing time increases almost exponentially ... ugh ... I'm wondering what your CPU utilization / thread counts are? The leader sends updates to all replicas in parallel, so it shouldn't be a huge impact whether you're doing 1 replica or 15 (probably a little more overhead with 15, but not exponential for sure) ... what are threads waiting on when this huge slowdown occurs? jstack -l PID should give you some idea.

Lastly, do you have GC logging enabled, and have you ruled out GC pauses causing the big slowdown?

On Thu, Feb 12, 2015 at 4:07 PM, Vijay Sekhri sekhrivi...@gmail.com wrote: [...]
RE: How to make SolrCloud more elastic
Otis, Thanks for your reply. I see your point about too many shards and search efficiency. I also agree that I need to get a better handle on customer requirements and expected loads. Initially I figured that with the shard splitting option, I would need to double my Solr nodes every time I split (as I would want to split every shard within the collection). Whereas actually only the number of shards would double, and then I would have the opportunity to rebalance the shards over the existing Solr nodes plus a number of new nodes that makes sense at the time. This may be preferable to defining many micro-shards up front. The time-based collections may be an option for this project. I am not familiar with query routing; can you point me to any documentation on how this might be implemented? Thanks, Matt

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Wednesday, February 11, 2015 9:13 PM
To: solr-user@lucene.apache.org
Subject: Re: How to make SolrCloud more elastic

Hi Matt,

You could create extra shards up front, but if your queries are fanned out to all of them, you can run into situations where there are too many concurrent queries per node, causing lots of context switching and ultimately being less efficient than if you had fewer shards. So while this is an approach to take, I'd personally first try to run tests to see how much a single node can handle in terms of volume, expected query rates, and target latency, and then use monitoring/alerting/whatever-helps tools to keep an eye on the cluster, so that when you start approaching the target limits you are ready with additional nodes and shard splitting if needed.

Of course, if your data and queries are such that newer documents are queried more, you should look into time-based collections... and if your queries can only query a subset of data, you should look into query routing.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On Wed, Feb 11, 2015 at 3:32 PM, Matt Kuiper matt.kui...@issinc.com wrote: [...]
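On the query-routing question: with the default compositeId router, documents indexed with a route prefix in their id land on a predictable shard subset, and queries can be restricted to it. A minimal sketch (ids and names are illustrative):

  # index with a route key baked into the id:
  #   id = "tenantA!doc42"
  # query only the shard(s) owning that key:
  curl 'http://localhost:8983/solr/mycoll/select?q=widgets&_route_=tenantA!'

This only helps when a query can legitimately be limited to one key's slice of the data, which is the subset case Otis describes.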
Re: Solrcloud performance issues
Hi,

Did you say you have 150 servers in this cluster? And 10 shards for just 90M docs? If so, 150 hosts sounds like too much for all the other numbers I see here.

I'd love to see some metrics here, e.g. what happens with disk IO around those commits? How about GC time/size info? Are the JVM memory pools full-ish, and is the CPU jumping like crazy? Can you share more info to give us a more complete picture of your system? SPM for Solr http://sematext.com/spm/ will help if you don't already capture these types of things.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Feb 12, 2015 at 11:07 AM, Vijay Sekhri sekhrivi...@gmail.com wrote: [...]
Re: Index directory containing only segments.gen
So after adding some docs to the index (and committing) with those two nodes active, do segment files magically appear?

My _guess_ is that there's something radically wrong with how you set up the collection. Did you by any chance use the core admin API to create the cores? That can lead to interesting results if you don't get everything just right. For instance, if you point the data dir for all three nodes at the same directory...

What does your clusterstate.json file look like?

Best, Erick

On Thu, Feb 12, 2015 at 8:30 AM, Zisis Tachtsidis zist...@runbox.com wrote: [...]
Re: American British Dictionary for Solr
Dinesh See this: http://wordlist.aspell.net/varcon/ You will need to do some work to convert to a SOLR friendly format though. Cheers François On Feb 12, 2015, at 12:22 AM, dinesh naik dineshkumarn...@gmail.com wrote: Hi , We are looking for a dictionary to support American/British English synonym. Could you please let us know what all dictionaries are available ? -- Best Regards, Dinesh Naik
Re: ApacheCon 2015 at Austin, TX
Hi, Looks like I'll be there. So if you want to discuss luke / lucene / solr, will be happy to de-virtualize. Dmitry On Mon, Jan 12, 2015 at 6:32 PM, CP Mishra mishr...@gmail.com wrote: Hi, I am planning to attend ApacheCon 2015 at Austin, TX (Apr 13-16th) and wondering if there will be lucene/solr sessions in it. Anyone else planning to attend? Thanks, CP -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: SASL with zkcli.sh
: I'm trying to start a SolrCloud cluster with a kerberized Zookeeper. I'm not
: sure if it is possible, I have a Hadoop Cluster with an already running
: zookeeper and I do not think running two zoo in parallel would be the wise
: choice.
: Is there a way to use SASL with SolrCloud ?

Work has been done along these lines, but it won't be available until Solr 5.1...

https://issues.apache.org/jira/browse/SOLR-6915

...but you could certainly start experimenting with it using the 5x branch. i don't know of any docs on how to use it yet -- but check the svn commits for details on what the test configs look like.

-Hoss
http://www.lucidworks.com/
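The Solr-side work aside, the ZooKeeper client's SASL setup is standard JAAS; a sketch of a jaas.conf with placeholder paths and principal (the ZooKeeper client reads the "Client" section):

  Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/etc/security/keytabs/solr.keytab"
    principal="solr/host.example.com@EXAMPLE.COM";
  };

  # point the JVM running zkcli.sh / Solr at it:
  -Djava.security.auth.login.config=/path/to/jaas.conf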
Re: creating a new collection fails as SearchHandler can't be found
Hi, it was jars copied into a solr-zk-cli directory to allow easy running of the Solr ZK command-line client. Well, I think that is what fixed Tomcat! I've also tried Jetty with a clean Solr home and that also works, and seems a much cleaner way of running multiple instances (probably more to do with rubbish Tomcat sysadmin skills on my part than anything else). Anyway, cheers for the help.

On 11 February 2015 at 22:38, Chris Hostetter hossman_luc...@fucit.org wrote:

: The collection fails to be created (shard_replica dir and data and index
: across the servers get created but collection creation fails)
:
: The full log is appended below. I thought it should be a straight forward
: class not found problem but I just can't seem to fix this (few hours now).
: I've even placed all the libs from solr.war into a directory and referenced
: these from the solrconfig. You can see these being loaded in the below log.

what exactly did the log errors look like *BEFORE* you started copying libs around?

as things stand right now, you've got at least 2 (possibly 3, i'm not certain) copies of every Solr class in your classpath -- which can cause a lot more classloader type errors (including ClassNotFound) than it will ever solve, because of how the classloader hierarchy works -- if you have two copies of ClassX loaded by two different classloaders, and ClassX refers to SearchHandler but SearchHandler is not available in the same ClassLoader that loaded ClassX, all sorts of not-fun ClassNotFound type exceptions can happen.

Go back to a clean install, w/ a single solr.war file, and no manually copied jars anywhere, and see if you still get problems with SearchHandler when you create a collection. if you do, then take another step back and try a single (tomcat) solr node setup (no solrcloud) with an instanceDir already in place with a single SolrCore using the example configs included in Solr 4.10.3 ... if that *STILL* doesn't work, and you still get ClassNotFound errors, then something is jacked up with your tomcat setup/classpath (unless of course you can still reproduce the same problem with the same example configs using the Solr-provided jetty -- in that case, the problem gets a lot more interesting and your logs from *that* would be helpful to diagnose it)

: Any help would be appreciated.
:
: Cheers Lee C
:
: INFO - 2015-02-11 19:22:42.494; org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null path=/admin/collections params={numShards=4&name=hotelPackage&replicationFactor=1&action=CREATE} status=0 QTime=3519
: INFO - 2015-02-11 19:22:42.594; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 4)
: INFO - 2015-02-11 19:33:52.275; org.apache.solr.handler.admin.CollectionsHandler; Creating Collection : numShards=4&name=hotelPackage&replicationFactor=1&action=CREATE
: INFO - 2015-02-11 19:33:52.280; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
: INFO - 2015-02-11 19:33:52.282; org.apache.solr.cloud.OverseerCollectionProcessor; Overseer Collection Processor: Get the message id:/overseer/collection-queue-work/qn-78 message:{"operation":"createcollection","fromApi":"true","name":"hotelPackage","replicationFactor":"1","numShards":"4"}
: WARN - 2015-02-11 19:33:52.283; org.apache.solr.cloud.OverseerCollectionProcessor; OverseerCollectionProcessor.processMessage : createcollection , {"operation":"createcollection","fromApi":"true","name":"hotelPackage","replicationFactor":"1","numShards":"4"}
: INFO - 2015-02-11 19:33:52.284; org.apache.solr.cloud.OverseerCollectionProcessor; Only one config set found in zk - using it:hotelPackageConf
: INFO - 2015-02-11 19:33:52.285; org.apache.solr.cloud.OverseerCollectionProcessor; creating collections conf node /collections/hotelPackage
: INFO - 2015-02-11 19:33:52.285; org.apache.solr.common.cloud.SolrZkClient; makePath: /collections/hotelPackage
: INFO - 2015-02-11 19:33:52.297; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
: INFO - 2015-02-11 19:33:52.300; org.apache.solr.cloud.Overseer$ClusterStateUpdater; building a new collection: hotelPackage
: INFO - 2015-02-11 19:33:52.300; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Create collection hotelPackage with shards [shard1, shard2, shard3, shard4]
: INFO - 2015-02-11 19:33:52.313; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes
Re: Index directory containing only segments.gen
OK, I think this is the root of your problem:

bq: Everything was setup using the - now deprecated - tags cores and core inside solr.xml

There are a bunch of ways this could go wrong. I'm pretty sure you have something that would take quite a while to untangle, so unless you have a _very_ good reason for making this work, I'd blow everything away.

First stop Zookeeper and all your Solr instances. If you're using an external Zookeeper, shut it off and 'rm -rf /tmp/zookeeper'. If using embedded, you can remove zoo_data under your SOLR_HOME. Completely remove all of your cores, as in 'rm -rf corename' on all the nodes. Nuke the entries in solr.xml and use the cloud-friendly solr.xml; I'd just copy the one in '...4.10/solr/example/solr'. You get the idea. Or you can brute-force this and just remove all of Solr and re-install 4.10.3.

OK, now use the Collections API to create your collection, see:
https://cwiki.apache.org/confluence/display/solr/Collections+API
(don't forget to push your configs to Zookeeper first) and go from there.

Note that when I suggest you uninstall/reinstall Solr, it's simply because that's conceptually easier; of course you can spend some time untangling things with your existing setup, but I really question whether it's worth the effort.

Best, Erick

On Thu, Feb 12, 2015 at 11:53 AM, Zisis Tachtsidis zist...@runbox.com wrote: [...]
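Erick's "push configs, then create" sequence, as a sketch (ZooKeeper address, config and collection names are placeholders):

  # 1. upload the config set to ZooKeeper:
  ./zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /path/to/conf -confname myconf

  # 2. create the collection against that config via the Collections API:
  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=1&replicationFactor=3&collection.configName=myconf'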
Re: Multi words query
I am using the Ruby gem rsolr and simply querying the collection with this query:

  response = solr.get 'select', :params => {
    :q    => query,
    :fl   => 'id,title,description,body',
    :rows => 10
  }
  response["response"]["docs"].each { |doc| puts doc["id"] }

I created a text field to copy all the fields to, and the query handler requests this field.

rgds,
Batch updates and separate update request processor chain for atomic document updates
Hi, we're using a SolrJ client which either adds (or overwrites) existing documents or updates some meta-data fields of existing documents. Our default update request processor chain is configured with a processor for language detection. To avoid setting a wrong language, we're using a different chain without that processor for partial updates by setting the update.chain request parameter. Our SolrJ client uses batch indexing and I'm wondering if I can use a single update request with both document additions and partial updates. Is it possible to specify the update.chain parameter per document? Currently our client sends two requests, one for partial updates and one for document additions, removals and the commit. This works fine, I'm just wondering if there's a better way. (We could of course move language detection from the update request processor chain to the SolrJ client.) Regards, Andreas
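A note that may save experimentation: update.chain is a request-level parameter - it selects one processor chain for everything in that request - so a single batch cannot mix chains per document, and the two-request split is the straightforward way. The partial-update request, as a raw sketch (core and chain names are assumptions, not from the message):

  curl 'http://localhost:8983/solr/mycoll/update?update.chain=no-langid' \
    -H 'Content-Type: application/json' \
    -d '[{"id":"doc1","status":{"set":"archived"}}]'

In SolrJ the same thing is the UpdateRequest the client already sends, with update.chain set as a request parameter.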
RE: American/British Dictionary for solr-4.10.2
There are no dictionaries that sum up all possible conjugations; using a heuristics-based normalizer would be more appropriate. There are nevertheless some good sources to start:

Contains lots of useful spelling issues, incl. British/American/Canadian/Australian:
http://grammarist.com/spelling

Very useful:
http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences#Acronyms_and_abbreviations

A handy list:
http://www.avko.org/free/reference/british-vs-american-spelling.html

There are some more lists, but it seems the other one's tab is no longer open! Good luck

-----Original message-----
From: dinesh naik dineshkumarn...@gmail.com
Sent: Thursday 12th February 2015 7:17
To: solr-user@lucene.apache.org
Subject: American /British Dictionary for solr-4.10.2

Hi, What are the dictionaries available for Solr 4.10.2? We are looking for a dictionary to support American/British English synonyms.
--
Best Regards, Dinesh Naik
variation on boosting recent documents gives exception
Since my field to measure recency is not a date field but a string field (with only year-numbers in it), I tried a variation on the suggested boost function for recent documents: recip(sub(2015,min(sortyear,2015)),1,10,10) But this gives an exception when used in a boost or bf parameter. I guess the reason is that all the mathematics doesn't work with a string field even if it only contains numbers. Am I right with this guess? And if so, is there a function I can use to change the type to something numeric? Or are there other problems with my function? Another related question: as you can see the current year (2015) is hard coded. Is there an easy way to get the current year within the function? Messing around with NOW looks very complicated. -Michael
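The guess is right on both counts: function queries like sub() and min() need a numeric field, and there is no built-in function that yields the current year (NOW-based date math works on date fields only, so with plain year numbers the 2015 has to be substituted by whatever builds the query). A sketch of one way out, with illustrative names - copy the string years into an int field and reference that:

  <!-- schema.xml: keep the string field, add a numeric shadow copy -->
  <field name="sortyear_i" type="int" indexed="true" stored="false"/>
  <copyField source="sortyear" dest="sortyear_i"/>

  # after reindexing, the original function works against the int field:
  ...&boost=recip(sub(2015,min(sortyear_i,2015)),1,10,10)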
Solr suggest is related to second letter, not to initial letter
Hello Everyone,

What I want from the Solr suggester is that, once a second word is typed after the first, the suggestions for the second word are related to the first word. Instead, the second word is completed independently, just like the first one. Example:

http://localhost:8983/solr/vitringez/suggest?q=facet_suggest_data:%22adidas+s%22

adidas s

response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="s">
        <int name="numFound">1</int>
        <int name="startOffset">27</int>
        <int name="endOffset">28</int>
        <arr name="suggestion">
          <str>samsung</str>
        </arr>
      </lst>
      <lst name="collation">
        <str name="collationQuery">facet_suggest_data:adidas samsung</str>
        <int name="hits">0</int>
        <lst name="misspellingsAndCorrections">
          <str name="adidas">adidas</str>
          <str name="s">samsung</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>

The terms 'adidas' and 'samsung' occur in separate documents; there is no document that contains both. How can I solve that problem?

schema.xml:

<fieldType name="suggestions_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="facet_suggest_data" type="suggestions_type" indexed="true" multiValued="true" stored="false" omitNorms="true"/>

Best
Re: Collations are not working fine.
Hi James Dyer,

I did the same as you told me - used WordBreakSolrSpellChecker instead of shingles. But collations are still not working. For instance, I tried to get the collation gone with the wind by searching gone wthh thes wint on field=gram_ci, but didn't succeed. I am getting the suggestions of wtth as *with*, thes as *the*, wint as *wind*. Also, I have documents which contain gone with the wind 167 times. I don't know whether I am missing something or not. Please check my Solr configuration below:

*URL:* localhost:8983/solr/wikingram/spell?q=gram_ci:gone wthh thes wint&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

*schema.xml:*

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true" multiValued="false"/>

<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
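One thing that stands out in the handler config, if it is complete: only the default dictionary is requested, so the wordbreak spellchecker is never consulted (and note it is configured on field gram, while queries go against gram_ci). Solr accepts multiple spellcheck.dictionary parameters and combines the checkers; a sketch of the relevant defaults:

  <lst name="defaults">
    ...
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    ...
  </lst>

Pairing a DirectSolrSpellChecker with a WordBreakSolrSpellChecker this way is the usual setup for handling both misspellings and split/joined words.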
Need autocomplete on whole phrase for multiple words.
Hi Solr experts,

I need autocomplete on the whole phrase for multiple words. When I type *br*, the results are brad and brad pitt, but I need only brad pitt to come back. I'm using ShingleFilterFactory + the terms component for the autocomplete feature. The query is something like:

http://localhost:8080/solr/actors/terms?terms.fl=content_autosuggest&omitHeader=true&terms.sort=index&indent=true&wt=json&json.nl=map&terms.prefix=bra

Below is my schema configuration:

<field name="actors" type="text_auto" indexed="true" stored="true" multiValued="true"/>

<fieldType class="solr.TextField" name="text_auto">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2" outputUnigrams="true" outputUnigramsIfNoShingles="false" tokenSeparator=" " fillerToken="_"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement="" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2" outputUnigrams="false" outputUnigramsIfNoShingles="false" tokenSeparator=" " fillerToken="_"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement="" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Thanks & Regards, Vamshi
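The terms component walks the raw indexed terms, and outputUnigrams="true" in the *index-time* analyzer above is what puts the lone token brad into the terms dictionary next to the shingle brad pitt. A sketch of the shingle filter with unigrams suppressed (assumption: outputUnigramsIfNoShingles="true" is then needed so single-word names still get indexed):

  <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2"
          outputUnigrams="false" outputUnigramsIfNoShingles="true"
          tokenSeparator=" " fillerToken="_"/>

After reindexing with this in the index analyzer, terms.prefix=bra should only match the two-word shingles.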
Use cases - Production examples: data, queries, cluster hardware and conf, and statistics
Hi everyone,

I am considering moving one or several Solr clusters to production. Although Solr's documentation and community are *great*, I am quite surprised not to find any *complete use-case story* stretching from application needs and data considerations to hardware ones.

Indeed, I understand why what/how-much hardware / configuration / sharding questions are systematically answered with both it depends and test. But then, what about a few complete descriptions, out of so many elasticsearch users, from data use case to cluster internals, along with a few performance and node stats?

So here are the questions, before moving to production: Are there any *complete* use cases around? Could you share some? By complete I mean including *at least some* of the following:

1. *Application needs and scope*
2. *Indexing data indications*: data volume, documents mapping, documents / indexes volume
3. *Searching data indications*: different applications, queries, use of facets - filters - pivot facets, concurrent indexing
4. *Cluster hardware*: machines' hardware (RAM, disks/SSD - DAS-JBOD/SAN/NAS), JVM heap / OS cache, number of machines, back-office network
5. *Cluster configuration*: one or several indexes, sharding, replication, master nodes, data nodes, use of over-sharding at start-up, use of re-indexing
6. *Benchmarks*: query response times, QPS, with or without concurrent indexing, memory heap sweet spot, node stats

For those interested, here is the (not *complete*) best-among-very-few example I've stumbled upon so far, with hardware and query description:
http://fr.slideshare.net/charliejuggler/lucene-solrlondonug-meetup28nov2014-solr-es-performance
SASL with zkcli.sh
Hello, I'm trying to start a SolrCloud cluster with a kerberized ZooKeeper. I'm not sure if it is possible. I have a Hadoop cluster with an already-running ZooKeeper, and I do not think running two ZooKeepers in parallel would be the wise choice. Is there a way to use SASL with SolrCloud? Thank you, Simon M.
Re: Analytics Component not working Solr-5.0
Can somebody help? Has anyone used the analytics component here?
Re: Stopwords in shingles suggester
With more and more people starting to use the Suggester, it seems that enablePositionIncrements for StopFilterFactory is still needed. Not sure why it is being removed in Solr 5, but is there a way to keep the functionality beyond Lucene 4.3? Or can this feature be reinstated?
Multi-tenancy and guarantee of service per application (tenant)
Hi everyone, I am wondering about multi-tenancy and guarantee of service in SolrCloud: *Multi-tenant cluster*: Is there a way to *guarantee a level of service* / capacity planning for *each tenant* using the cluster (its *own collections*)? Thanks,
Solrcloud performance issues
Hi Erick,

We have the following configuration of our Solr cloud:
1. 10 shards
2. 15 replicas per shard
3. 9 GB of index size per shard
4. a total of around 90 million documents
5. 2 collections, viz. search1 serving live traffic and search2 for indexing. We swap collections when indexing finishes.
6. On 150 hosts we have 2 JVMs running, one for the search1 collection and the other for search2
7. Each JVM has 12 GB of heap assigned to it, while the host has 50 GB in total
8. Each host has 16 processors
9. Linux XXX 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
10. We have two ways to index data (see the config sketch after this message):
    1. Bulk indexing: all 90 million docs pumped in from 14 parallel processes (on 14 different client hosts). This is done on the collection that is not serving live traffic.
    2. Incremental indexing: only delta changes (ranging from 100K to 5 million) every two hours. This is done on the collection also serving live traffic.
11. The request-per-second count on the live collection is around 300 TPS
12. The hard commit setting is every 30 seconds with open searcher false, and the soft commit setting is every 15 minutes. We have tried a lot of different settings here, BTW.

Now we have two issues with indexing.

1) Solr just could not keep up with bulk indexing when replicas are also active. We concluded this by changing the number of replicas to just 2, then to 4 and then to 15. When the number of replicas increases, the bulk indexing time increases almost exponentially. We seem to have encountered the same issue reported here: https://issues.apache.org/jira/browse/SOLR-6816 It gets to a point where even indexing 100 docs would take the Solr cluster 300 seconds. It would start off indexing 100 docs in 55 milliseconds, slowly increase over time, and within an hour and a half just could not keep up. We have a workaround for this, i.e. we stop all the replicas, do the bulk indexing and bring all the replicas up one by one. This sort of defeats the purpose of SolrCloud, but we can still work with this workaround. We can do this because bulk indexing happens on the collection that is not serving live traffic. However, we would love to have a solution from SolrCloud itself, like asking it to stop replication and start it again via an API at the end of indexing.

2) This issue is related to soft commit with incremental indexing. When we do incremental indexing, it is done on the same collection serving live traffic with 300 requests per second throughput. Everything is fine except whenever the soft commit happens. Each time the soft commit (autoSoftCommit in solrconfig.xml) happens, which BTW happens almost at the same time throughout the cluster, there is a spike in the response times and throughput decreases almost to 150 TPS. The spike continues for 2 minutes and then it happens again at the exact interval of the soft commit. We have monitored the logs and found a direct correlation between when the soft commit happens and when the response time tanks. Now the latter issue is quite disturbing, because the collection is serving live traffic and we cannot sustain these periodic degradations. We have played around with different soft commit settings: intervals ranging from 2 minutes to 30 minutes; auto-warming half the cache, auto-warming the full cache, auto-warming only 10%; doing warm-up queries on every new searcher, doing NO warm-up queries on every new searcher. All the different settings yield the same results: as and when the soft commit happens, the response time tanks and throughput decreases. The difference is almost 50% in response times and 50% in throughput.

Our workaround for this is to also do the incremental delta indexing on the collection not serving live traffic and swap when it is done. As you can see, this also defeats the purpose of SolrCloud. We cannot do bulk indexing because the replicas cannot keep up, and we cannot do incremental indexing because of soft commit performance. Is there a way to make the cluster not do the soft commit all at the same time, or is there a way to make the soft commit not cause this degradation? We are open to any ideas at this time now.
--
* Vijay Sekhri *
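For reference, the commit policy described in point 12 corresponds to this solrconfig.xml block (values taken from the message):

  <autoCommit>
    <maxTime>30000</maxTime>            <!-- 30 s hard commit -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>900000</maxTime>           <!-- 15 min soft commit -->
  </autoSoftCommit>

One note on the "all at the same time" observation: the autoSoftCommit timer runs per core, counted from the first uncommitted update, so when all replicas receive the same update stream their soft commits naturally line up across the cluster.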
Re: Possible to dump clusterstate, system stats into solr log?
Jim: Not that I know of. I'm guessing that accessing ZK directly and dumping from there is also not possible? Best, Erick On Wed, Feb 11, 2015 at 10:47 AM, Jim.Musil jim.mu...@target.com wrote: Hi, Is it possible to periodically dump the cluster state contents (or system diagnostics) into the main solr log file? We have many security protocols in place that prevents us from running diagnostic requests directly to the solr boxes, but we do have access to the shipped logs. Thanks! Jim
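If the boxes are allowed to run Solr's bundled scripts, one workaround is a cron job that pulls the state straight from ZooKeeper and appends it to a file the log shipper already picks up; a sketch with a placeholder ZK address (in 4.x the script lives under example/scripts/cloud-scripts):

  ./zkcli.sh -zkhost zk1:2181 -cmd get /clusterstate.json >> /var/log/solr/clusterstate.log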
Index directory containing only segments.gen
I'm using SolrCloud 4.10.3 and the current setup is simple, using 3 nodes with 1 shard. After a rolling restart of the Solr cluster I've ended up with 2 failing nodes reporting the following:

org.apache.solr.servlet.SolrDispatchFilter null:org.apache.solr.common.SolrException: SolrCore 'core' is not available due to init failure: Error opening new searcher
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1574)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1686)
        at org.apache.solr.core.SolrCore.init(SolrCore.java:853)
        ... 8 more
Caused by: java.nio.file.NoSuchFileException: /path/to/index/segments_1
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
        at java.nio.channels.FileChannel.open(FileChannel.java:287)
        at java.nio.channels.FileChannel.open(FileChannel.java:334)
        at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:196)
        at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:113)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:341)
        at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:450)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:792)
        at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:77)
        at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
        at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1537)
        ... 10 more

Checking the index directory of each node, I found out that only *segments.gen* was inside. What I could not determine is how I ended up with this single file. Looking at the logs, I could not find anything related. The 3rd node had its index intact. Has anyone else encountered something similar?
Re: American/British Dictionary for solr-4.10.2
You are looking for this sort of thing?

elevator, lift
blueberry, whortleberry, bilberry
rutabega, swede
hood, bonnet
convertible top, hood
trunk, boot
daycare, preschool, nursery, playgroup
arugula, rocket
sidewalk, pavement
sweater, jumper
kerosene, paraffin
paraffin, wax
pants, trousers
underwear, pants

I did this once, as a demonstration to ship with Ultraseek. I dug through web resources manually and typed up a list. Some of the terms are domain-specific. There are big sets of automobile and railroad terms which are different.

http://en.wikipedia.org/wiki/Comparison_of_American_and_British_English
https://www.englishclub.com/vocabulary/british-american.htm
http://resources.woodlands-junior.kent.sch.uk/customs/questions/americanbritish.html
http://www.oxforddictionaries.com/us/words/british-and-american-terms
http://www.englisch-hilfen.de/en/words/be-ae.htm
http://www.englisch-hilfen.de/en/words/be-ae2.htm

This can cause confusing search results, because there are domain-specific terms that mean different things in the two dialects: hood, pants, paraffin, swede, rocket, etc. One of our customers made rocket propulsion systems. They were confused that “rocket fuel” suggested “arugula”.

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Feb 12, 2015, at 12:19 AM, Markus Jelsma markus.jel...@openindex.io wrote: [...]
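Lists like Walter's drop straight into a Solr synonyms file; a sketch with an assumed file name, applied at query time with expand="true" so a search in either dialect matches both:

  # en-gb-us.txt - one comma-separated group per line
  elevator, lift
  sidewalk, pavement
  sweater, jumper

  <!-- in the field's query analyzer -->
  <filter class="solr.SynonymFilterFactory" synonyms="en-gb-us.txt" ignoreCase="true" expand="true"/>

Walter's caveat applies: domain-ambiguous pairs (hood, pants, rocket, ...) are safer left out or handled per domain.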