Re: High cpu and gc time when performing optimization.
Heap: start small and increase as necessary. Leave as much RAM as possible for the FS cache; don't give it to the JVM until it starts crying. SPM for Solr will help you see when Solr and the JVM are starting to hurt. Otis > On Jul 12, 2016, at 11:45, Jason wrote: > > I'm using optimize because it's an option for fast search. > Our index updates once or more per week. > If I don't use optimize, many index files will be kept. > Are there any performance issues in that case? > > And I'm wondering about the relation between index file size and heap size. > In the case of running as a master server that only updates the index, > is there any guide for heap size, including Xmx, NewSize, MaxNewSize, etc.? > > > > Yonik Seeley wrote >> Optimize is a very expensive operation. It involves reading the >> entire index and merging and rewriting it as a single segment. >> If you find it too expensive, do it less often, or don't do it at all. >> It's an optional operation. >> >> -Yonik >> >> >> On Mon, Jul 11, 2016 at 10:19 PM, Jason > >> hialooha@ > >> wrote: >>> hi, all. >>> >>> I'm running a Solr instance with two cores, and the JVM max heap is 32G. >>> The core index sizes are 68G and 61G respectively. >>> I always run optimize after updating the index. >>> BTW, last week the document update completed, but CPU during the optimize phase was >>> very high. >>> I think that is because of long GC time. >>> How should I solve this problem? >>> Any ideas are welcome. >>> thanks, >>> >>> >>> >>> -- >>> View this message in context: >>> http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html >>> Sent from the Solr - User mailing list archive at Nabble.com. > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286796.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing logs in Solr
You can ship Solr logs to Logsene or any other log management service and not worry too much about their storage/size. Otis > On Jun 5, 2016, at 02:08, Anil wrote: > > Hi, > > I would like to index logs using Solr to enable search on them in our application. > > The problem would be index and stored size, as log file sizes would go up to > terabytes. > > Is there any way to use the highlight feature without storing? > > I found the following link where Benedetti Alessandro mentioned a custom > highlighter on a url field. > > http://lucene.472066.n3.nabble.com/Highlighting-for-non-stored-fields-td1773015.html > > Any ideas would be helpful. Thanks. > > Cheers, > Anil
Re: Best way to track cumulative GC pauses in Solr
Hi Tom, SPM for Solr should be helpful here. See http://sematext.com/spm Otis > On Nov 13, 2015, at 10:00, Tom Evans wrote: > > Hi all > > We have some issues with our Solr servers spending too much time > paused doing GC. From turning on gc debug, and extracting numbers from > the GC log, we're getting an idea of just how much of a problem. > > I'm currently doing this in a hacky, inefficient way: > > grep -h 'Total time for which application threads were stopped:' solr_gc* \ > | awk '($11 > 0.3) { print $1, $11 }' \ > | sed 's#:.*:##' \ > | sort -n \ > | sum_by_date.py > > (Yes, I really am using sed, grep and awk all in one line. Just wrong :) > > The "sum_by_date.py" program simply adds up all the values with the > same first column, and remembers the largest value seen. This is > giving me the cumulative GC time for extended pauses (over 0.5s), and > the maximum pause seen in a given time period (hourly), eg: > > 2015-11-13T11 119.124037 2.203569 > 2015-11-13T12 184.683309 3.156565 > 2015-11-13T13 65.934526 1.978202 > 2015-11-13T14 63.970378 1.411700 > > > This is fine for seeing that we have a problem. However, really I need > to get this into our monitoring systems - we use munin. I'm > struggling to work out the best way to extract this information for > our monitoring systems, and I think this might be my naivety about > Java, and working out what should be logged. > > I've turned on JMX debugging, and looking at the different beans > available using jconsole, but I'm drowning in information. What would > be the best thing to monitor? > > Ideally, like the stats above, I'd like to know the cumulative time > spent paused in GC since the last poll, and the longest GC pause that > we see. munin polls every 5 minutes, are there suitable counters > exposed by JMX that it could extract? > > Thanks in advance > > Tom
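The sum_by_date.py script itself is not shown in the thread. Going only by Tom's description (add up the values that share a first column and remember the largest one), a minimal sketch could look like the following. The function name and the assumption of whitespace-separated "key value" input lines are illustrative, not taken from his script.

```python
from collections import OrderedDict


def sum_by_key(lines):
    """Return {key: (total, largest)} for 'key value' lines, in first-seen order."""
    totals = OrderedDict()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        key, value = parts[0], float(parts[1])
        total, largest = totals.get(key, (0.0, 0.0))
        # Accumulate the sum and keep the largest single value seen for this key.
        totals[key] = (total + value, max(largest, value))
    return totals


def format_rows(totals):
    """Render one 'key total largest' row per key, like the hourly table above."""
    return ["%s %.6f %.6f" % (key, total, largest)
            for key, (total, largest) in totals.items()]
```

At the end of the pipeline this would be fed from standard input, for example by passing sys.stdin to sum_by_key and printing each row from format_rows.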
Re: Best strategy for logging security
Logstash is open-source and free. At some point Sematext contributed Solr connector/output to Logstash. Here are some numbers about Logstash (and rsyslog, which is also an option, though it doesn't have Solr output): http://blog.sematext.com/2015/05/18/tuning-elasticsearch-indexing-pipeline-for-logs/ If you are new to Logstash, this is a good one: http://blog.sematext.com/2013/12/19/getting-started-with-logstash/ Note: Solr was mentioned as the destination for logs here, but it's not the only option. You can send your logs to other systems and services, including off-site ones, those that also archive your old logs for audit or other purposes, have more than just basic log search functionality, etc. HTH Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Mon, Jun 1, 2015 at 4:47 PM, Vishal Swaroop vishal@gmail.com wrote: Thanks Rajesh... just trying to figure out if *logstash *is opensource and free ? On Mon, Jun 1, 2015 at 2:13 PM, Rajesh Hazari rajeshhaz...@gmail.com wrote: Logging : Just use logstash to a parse your logs for all collection and logstash forwarder and lumberjack at your solr replicas in your solr cloud to send the log events to you central logstash server and send it to back to solr (either the same or different instance) to a different collection. The default log4j.properties that comes with solr dist can log core name with each query log. Security: suggest you to go through this wiki https://wiki.apache.org/solr/SolrSecurity *Thanks,* *Rajesh,* *(mobile) : 8328789519.* On Mon, Jun 1, 2015 at 11:20 AM, Vishal Swaroop vishal@gmail.com wrote: It will be great if you can provide your valuable inputs on strategy for logging security... Thanks a lot in advance... Logging : - Is there a way to implement logging for each cores separately. - What will be the best strategy to log every query details (like source IP, search query, etc.) 
at some point we will need monthly reports for analysis. Securing SOLR : - We need to implement SOLR security from client as well as server side... requests will be performed via web app as well as other server side apps e.g. curl... Please suggest about the best approach we can follow... link to any documentation will also help. Environment : SOLR 4.7 configured on Tomcat 7 (Linux)
Re: Solr Performance with Ram size variation
Hi, Because you went over 31-32 GB heap you lost the benefit of compressed pointers and even though you gave the JVM more memory the GC may have had to work harder. This is a relatively well educated guess, which you can confirm if you run tests and look at GC counts, times, JVM heap memory pool utilization, etc. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Apr 17, 2015 at 10:14 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote: Hi, As per this article, the linux machine is preferred to have 1.5 times RAM with respect to index size. So, to verify this, I tried testing the solr performance in different volumes of RAM allocation keeping other configuration (i.e Solid State Drives, 8 core processor, 64-Bit) to be same in both the cases. I am using solr 4.8.1 with tomcat server. https://wiki.apache.org/solr/SolrPerformanceProblems 1) Initially, the linux machine had 32 GB RAM, out of which I allocated 14GB to solr. export CATALINA_OPTS="-Xms2048m -Xmx14336m -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:./logs/info_error/tomcat_gcdetails.log" The average search time for 1000 queries was 300ms. 2) After that, RAM was increased to 68 GB, out of which I allocated 40GB to Solr. Now, on a strange note, the average search time for the same set of queries was 3000ms. Now, after this, I reduced solr allocated RAM to 25GB on 68GB machine. But, still the search time was higher as compared to first case. What am I missing? Please suggest.
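To make the compressed-pointers point concrete, here is a small sketch that pulls the -Xmx value out of a JVM options string and flags heaps at or above roughly 32 GB. The exact cutoff depends on the JVM and its settings, so the 32 GiB constant is an assumption, and both function names are made up for illustration.

```python
import re

# Assumption: about 32 GiB is the commonly cited ceiling for compressed
# object pointers; the real cutoff varies by JVM version and flags.
COMPRESSED_OOPS_LIMIT_BYTES = 32 * 1024 ** 3

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def max_heap_bytes(jvm_opts):
    """Return the -Xmx value in bytes, or None if no -Xmx flag is present."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", jvm_opts)
    if match is None:
        return None
    number, unit = int(match.group(1)), match.group(2).lower()
    return number * _UNITS.get(unit, 1)


def likely_loses_compressed_oops(jvm_opts):
    """True when the configured max heap is at or above the assumed limit."""
    heap = max_heap_bytes(jvm_opts)
    return heap is not None and heap >= COMPRESSED_OOPS_LIMIT_BYTES
```

With the settings from this thread, -Xmx14336m stays under the limit and a 40 GB heap does not. The 25 GB heap is also under it yet was still slower, so this check alone would not explain that case.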
Re: Measuring QPS
Hi Daniel, See SPM http://sematext.com/spm/, which will give you QPS and a bunch of other Solr, JVM, and OS metrics, along with alerting, anomaly detection, and not-yet-announced transaction tracing https://sematext.atlassian.net/wiki/display/PUBSPM/Transactions+Tracing. It has percentiles Wunder mentions. I see others mentioned JMeter. We use SPM with JMeter pretty regularly when helping clients with Solr performance issues. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Apr 3, 2015 at 11:37 AM, Davis, Daniel (NIH/NLM) [C] daniel.da...@nih.gov wrote: I wanted to gather QPS for our production Solr instances, but I was surprised that the Admin UI did not contain this information. We are running a mix of versions, but mostly 4.10 at this point. We are not using SolrCloud at present; that's part of why I'm checking - I want to validate the size of our existing setup and what sort of SolrCloud setup would be needed to centralize several of them. What is the best way to gather QPS information? What is the best way to add information like this to the Admin UI, if I decide to take that step? Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and Communications Systems, National Library of Medicine, NIH
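For a rough do-it-yourself number, QPS can be derived from two readings of a cumulative request counter taken some seconds apart. The sketch below assumes such a counter is available per request handler (for instance via JMX or the handler statistics); how it is fetched is left out, and the function name is hypothetical.

```python
def queries_per_second(requests_before, requests_after, seconds_elapsed):
    """Average QPS between two readings of a cumulative request counter."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    delta = requests_after - requests_before
    if delta < 0:
        # The counter went backwards, e.g. after a restart: count from zero.
        delta = requests_after
    return delta / float(seconds_elapsed)
```

This yields an average over the polling interval only; it says nothing about percentiles or short bursts, which is where a monitoring tool or a JMeter run gives a fuller picture.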
Re: Best way to monitor Solr regarding crashes
Hi Michael, SPM - http://sematext.com/spm will help. It can monitor all Solr and JVM metrics and alert you when their values cross thresholds or become abnormal. In your case I'd first look at the JVM metrics - memory pools and their utilization. Heartbeat alert will notify you when your server(s) become unresponsive without you having to ping them. Solr logs will also likely have clues. Otis On Mar 28, 2015, at 09:45, Michael Bakonyi m.bako...@civit.de wrote: Hi, we were using Solr for about 3 months without problems until a few days ago it crashed one time and we don't know why. After a restart everything was fine again but we want to be better prepared the next time this could happen. So I'd like to know what's the best way to monitor a single Solr-instance and what logging-configuration you think is useful for this kind of monitoring. Maybe there's a possibility to automatically restart Solr after it crashed + to see in detail in the logs what happened right before the crash ..? Can you give me any hints? We're using Tomcat 6.X with Solr 4.8.X Cheers, Michael
Re: Solr Monitoring - Stored Stats?
Matt, SPM will give you all that out of the box with alerts, anomaly detection etc. See http://sematext.com/spm Otis On Mar 25, 2015, at 11:26, Matt Kuiper matt.kui...@issinc.com wrote: Hello, I am familiar with the JMX points that Solr exposes to allow for monitoring of statistics like QPS, numdocs, Average Query Time... I am wondering if there is a way to configure Solr to automatically store the value of these stats over time (for a given time interval), and then allow a user to query a stat over a time range. So for the QPS stat, the query might return a set that includes the QPS value for each hour in the time range specified. Thanks, Matt
Re: How To Remove an Alert
Hi, I think this may have been for Sematext SPM http://sematext.com/spm/ for Solr monitoring and Jack got our help a few hours ago. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Mon, Mar 23, 2015 at 7:46 PM, Erick Erickson erickerick...@gmail.com wrote: What product? What alert? This doesn't sound like straight Solr. There is zero context here to help us help you... Please review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Mon, Mar 23, 2015 at 1:37 PM, jack.met...@hp.com st.comm.c...@gmail.com wrote: Hello, I have a problem I just created an alert but I set the threshold too low. Is there a way to edit or remove the alert.
Re: backport Heliosearch features to Solr
Hi Yonik, Now that you joined Cloudera, why not everything? Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Sun, Mar 1, 2015 at 4:50 PM, Yonik Seeley ysee...@gmail.com wrote: As many of you know, I've been doing some work in the experimental heliosearch fork of Solr over the past year. I think it's time to bring some more of those changes back. So here's a poll: Which Heliosearch features do you think should be brought back to Apache Solr? http://bit.ly/1E7wi1Q (link to google form) -Yonik
Re: how to debug solr performance degradation
Lots of suggestions here already. +1 for those JVM params from Boogie and for looking at JMX. Rebecca, try SPM http://sematext.com/spm (will look at JMX for you, among other things), it may save you time figuring out JVM/heap/memory/performance issues. If you can't tell what's slow via SPM, we can have a look at your metrics (charts are sharable) and may be able to help you faster than guessing. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Wed, Feb 25, 2015 at 4:27 PM, Erick Erickson erickerick...@gmail.com wrote: Before diving in too deeply, try attaching debug=timing to the query. Near the bottom of the response there'll be a list of the time taken by each _component_. So there'll be separate entries for query, highlighting, etc. This may not show any surprises, you might be spending all your time scoring. But it's worth doing as a check and might save you from going down some dead-ends. I mean if your query winds up spending 80% of its time in the highlighter you know where to start looking.. Best, Erick On Wed, Feb 25, 2015 at 12:01 PM, Boogie Shafer boogie.sha...@proquest.com wrote: rebecca, you probably need to dig into your queries, but if you want to force/preload the index into memory you could try doing something like cat `find /path/to/solr/index` > /dev/null if you haven't already reviewed the following, you might take a look here https://wiki.apache.org/solr/SolrPerformanceProblems perhaps going back to a very vanilla/default solr configuration and building back up from that baseline to better isolate which specific setting might be impacting your environment From: Tang, Rebecca rebecca.t...@ucsf.edu Sent: Wednesday, February 25, 2015 11:44 To: solr-user@lucene.apache.org Subject: RE: how to debug solr performance degradation Sorry, I should have been more specific. I was referring to the solr admin UI page. 
Today we started up an AWS instance with 240 G of memory to see whether fitting all of our index (183G) in memory, with enough left for the JVM, could improve the performance. I attached the admin UI screen shot with the email. The top bar is "Physical Memory" and we have 240.24 GB, but only 4% (9.52 GB) is used. The next bar is Swap Space and it's at 0.00 MB. The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G. My understanding is that when Solr starts up, it reserves some memory for the JVM, and then it tries to use up as much of the remaining physical memory as possible. And I used to see the physical memory at anywhere between 70% to 90+%. Is this understanding correct? And now, even with 240G of memory, our index is performing at 10 - 20 seconds for a query. Granted that our queries have fq's and highlighting and faceting, I think with a machine this powerful I should be able to get the queries executed under 5 seconds. This is what we send to Solr: q=(phillip%20morris) wt=json start=0 rows=50 facet=true facet.mincount=0 facet.pivot=industry,collection_facet facet.pivot=availability_facet,availabilitystatus_facet facet.field=dddate fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20begin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20sheet%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%22%20OR%20dt%3A%22tab%20sheet%22)) facet.field=dt_facet facet.field=brd_facet facet.field=dg_facet hl=true hl.simple.pre=%3Ch1%3E hl.simple.post=%3C%2Fh1%3E hl.requireFieldMatch=false hl.preserveMulti=true hl.fl=ot,ti f.ot.hl.fragsize=300 f.ot.hl.alternateField=ot f.ot.hl.maxAlternateFieldLength=300 f.ti.hl.fragsize=300 f.ti.hl.alternateField=ti f.ti.hl.maxAlternateFieldLength=300 fq={!collapse%20field=signature} expand=true 
sort=score+desc,availability_facet+asc My guess is that it's performing so badly because it's only using 4% of the memory? And searches require disk access. Rebecca From: Shawn Heisey [apa...@elyograg.org] Sent: Tuesday, February 24, 2015 5:23 PM To: solr-user@lucene.apache.org Subject: Re: how to debug solr performance degradation On 2/24/2015 5:45 PM, Tang, Rebecca wrote: We gave the machine 180G mem to see if it improves performance. However, after we increased the memory, Solr started using only 5% of the physical memory. It has always used 90-something%. What could be causing solr to not grab all the physical memory (grabbing so little of the physical memory)? I would like to know what memory numbers in which program you are looking at, and why you
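Erick's debug=timing suggestion in this thread can be turned into a quick ranking of components by time spent. The sketch below works on a hand-written dictionary shaped the way the timing section is assumed to look in a JSON response (a phase such as "process" holding one entry per component, each with a "time" in milliseconds). The sample numbers are invented and the function name is illustrative.

```python
def slowest_components(debug_timing, phase="process"):
    """Return (component, milliseconds) pairs for one phase, slowest first."""
    phase_entries = debug_timing.get(phase, {})
    timings = [
        (name, entry["time"])
        for name, entry in phase_entries.items()
        # The phase also carries its own total under "time" (a number, not a
        # dict), which this check skips.
        if isinstance(entry, dict) and "time" in entry
    ]
    return sorted(timings, key=lambda pair: pair[1], reverse=True)


# Hand-written sample for illustration only, not output from a real server.
sample_timing = {
    "time": 1250.0,
    "prepare": {"time": 5.0, "query": {"time": 5.0}},
    "process": {
        "time": 1245.0,
        "query": {"time": 200.0},
        "facet": {"time": 45.0},
        "highlight": {"time": 1000.0},
    },
}
```

If the highlighter dominates the list, as in the invented sample, that is the place to start looking before changing memory settings.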
Re: Confirm Solr index corruption
Hi, It sounds like Solr simply could not index some docs. The index is not corrupt, it's just that indexing was failing while disk was full. You'll need to re-send/re-add/re-index the missing docs (or simply all of them if you don't know which ones are missing). Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Wed, Feb 18, 2015 at 1:52 AM, Thomas Mathew mothas.tho...@gmail.com wrote: Hi All, I use Solr 4.4.0 in a master-slave configuration. Last week, the master server ran out of disk (logs got too big too quick due to a bug in our system). Because of this, we weren't able to add new docs to an index. The first thing I did was to delete a few old log files to free up disk space (later I moved the other logs to free up disk). The index is working fine even after this fiasco. The next day, a colleague of mine pointed out that we may be missing a few documents in the index. I suspect the above scenario may have broken the index. I ran the checkIndex against this index. It didn't mention of any corruption though. Right now, the index has about 25k docs. I haven't optimized this index in a while, and there are about 4000 deleted-docs. How can I confirm if we lost anything? If we've lost docs, is there a way to recover it? Thanks in advance!! Regards Thomas
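One way to find out which documents are missing is to compare the IDs the source system believes were sent with the IDs actually present in the index. This is only a sketch under the assumption that both lists can be exported (at about 25k documents they fit in memory easily); the function name is made up and the export step is not shown.

```python
def missing_ids(expected_ids, indexed_ids):
    """Return IDs that the source system has but the index does not, sorted."""
    return sorted(set(expected_ids) - set(indexed_ids))
```

The result is the list of documents to re-send; if no list of expected IDs exists, re-indexing everything remains the fallback.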
Re: 43sec commit duration - blocked by index merge events?
Check http://search-lucene.com/?q=commit+wait+blockfc_type=mail+_hash_+user e.g. http://search-lucene.com/m/QTPa7Sqx81 Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Feb 13, 2015 at 8:50 AM, Gili Nachum gilinac...@gmail.com wrote: Thanks Otis, can you confirm that a commit call will wait for merges to complete before returning? On Thu, Feb 12, 2015 at 8:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: If you are using Solr and SPM for Solr, you can check a report that shows the # of files in an index and the report that shows you the max docs-num docs delta. If you see the # of files drop during a commit, that's a merge. If you see a big delta change, that's probably a merge, too. You could also jstack or kill -3 the JVM and see where it's spending its time to give you some ideas what's going on inside. HTH. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com wrote: Hello, During a load test I noticed a commit that took 43 seconds to complete (client hard complete). Is this to be expected? What's causing it? I have a pair of machines hosting a 128M docs collection (8 shards, replication factor=2). Could it be merges? In Lucene merges happen async of commit statements, but reading Solr's doc for Update Hanlder https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig it sounds like hard commits do wait for merges to occur: * The tradeoff is that a soft commit gives you faster visibility because it's not waiting for background merges to finish.* Thanks.
Re: How to make SolrCloud more elastic
Hi Matt, See: http://search-lucene.com/?q=query+routingfc_project=Solr https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Thu, Feb 12, 2015 at 2:09 PM, Matt Kuiper matt.kui...@issinc.com wrote: Otis, Thanks for your reply. I see your point about too many shards and search efficiency. I also agree that I need to get a better handle on customer requirements and expected loads. Initially I figured that with the shard splitting option, I would need to double my Solr nodes every time I split (as I would want to split every shard within the collection). Where actually only the number of shards would double, and then I would have the opportunity to rebalance the shards over the existing Solr nodes plus a number of new nodes that make sense at the time. This may be preferable to defining many micro shards up front. The time-base collections may be an option for this project. I am not familiar with query routing, can you point me to any documentation on how this might be implemented? Thanks, Matt -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Wednesday, February 11, 2015 9:13 PM To: solr-user@lucene.apache.org Subject: Re: How to make SolrCloud more elastic Hi Matt, You could create extra shards up front, but if your queries are fanned out to all of them, you can run into situations where there are too many concurrent queries per node causing lots of content switching and ultimately being less efficient than if you had fewer shards. 
So while this is an approach to take, I'd personally first try to run tests to see how much a single node can handle in terms of volume, expected query rates, and target latency, and then use monitoring/alerting/whatever-helps tools to keep an eye on the cluster so that when you start approaching the target limits you are ready with additional nodes and shard splitting if needed. Of course, if your data and queries are such that newer documents are queries more, you should look into time-based collections... and if your queries can only query a subset of data you should look into query routing. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Wed, Feb 11, 2015 at 3:32 PM, Matt Kuiper matt.kui...@issinc.com wrote: I am starting a new project and one of the requirements is that Solr must scale to handle increasing load (both search performance and index size). My understanding is that one way to address search performance is by adding more replicas. I am more concerned about handling a growing index size. I have already been given some good input on this topic and am considering a shard splitting approach, but am more focused on a rebalancing approach that includes defining many shards up front and then moving these existing shards on to new Solr servers as needed. Plan to experiment with this approach first. Before I got too deep, I wondered if anyone has any tips or warnings on these approaches, or has scaled Solr in a different manner. Thanks, Matt
Re: Solr scoring confusion
Hi Scott, Try optimizing after reindexing and this should go away. It has to do with updated/deleted docs participating in score computation. Otis On Feb 13, 2015, at 18:29, Scott Johnson sjohn...@dag.com wrote: We are getting inconsistent scoring results in Solr. It works about 95% of the time, where a search on one term returns the results which equal exactly that one term at the top, and results with multiple terms that also contain that one term are returned lower. Occasionally, however, if a subset of the data has been re-indexed (the same data just added to the index again) then the results will be slightly off, for example the data from the earlier index will get a higher score than it should, until we re-index all the data. Our assumption here is that setting omitNorms to false, then indexing the data, then searching, should result in scores where the data with an exact match has a higher score. We usually see this but not always. Is something added to the score besides the value that is being searched that we are not understanding? Thanks. Scott Johnson Data Advantage Group, Inc. 604 Mission Street San Francisco, CA 94105 Office: +1.415.947.0400 x204 Fax: +1.415.947.0401 http://www.dag.com/
Re: Multy-tenancy and quarantee of service per application (tenant)
Not really, not 100%, if tenants share the same hardware and there is no isolation through things like containers (in which case they don't share the same SolrCloud cluster, really). Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Thu, Feb 12, 2015 at 11:17 AM, Victor Rondel rondelvic...@gmail.com wrote: Hi everyone, I am wondering about multy-tenancy and garantee of service in SolrCloud : *Multy-tenant cluster* : Is there a way to *guarantee a level of service* / capacity planning for *each tenant* using the cluster (its *own collections*) ? Thanks,
Re: 43sec commit duration - blocked by index merge events?
If you are using Solr and SPM for Solr, you can check a report that shows the # of files in an index and the report that shows you the max docs-num docs delta. If you see the # of files drop during a commit, that's a merge. If you see a big delta change, that's probably a merge, too. You could also jstack or kill -3 the JVM and see where it's spending its time to give you some ideas what's going on inside. HTH. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com wrote: Hello, During a load test I noticed a commit that took 43 seconds to complete (client hard complete). Is this to be expected? What's causing it? I have a pair of machines hosting a 128M docs collection (8 shards, replication factor=2). Could it be merges? In Lucene merges happen async of commit statements, but reading Solr's doc for Update Hanlder https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig it sounds like hard commits do wait for merges to occur: * The tradeoff is that a soft commit gives you faster visibility because it's not waiting for background merges to finish.* Thanks.
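The "max docs minus num docs" delta mentioned here is the number of deleted documents still held in segments, and it shrinks when a merge drops them. A small sketch for spotting such drops in a series of samples follows. The sample format (label, maxDoc, numDocs) and the function name are assumptions for illustration; a drop is a hint of a merge, not proof.

```python
def likely_merge_points(samples):
    """Return labels of samples where (maxDoc - numDocs) fell versus the previous one.

    samples is a list of (label, max_doc, num_docs) tuples in time order.
    """
    points = []
    previous_delta = None
    for label, max_doc, num_docs in samples:
        delta = max_doc - num_docs  # deleted docs not yet merged away
        if previous_delta is not None and delta < previous_delta:
            points.append(label)
        previous_delta = delta
    return points
```

Lining these labels up against the timestamps of slow commits shows whether the two coincide.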
Re: Solrcloud performance issues
Hi, Did you say you have 150 servers in this cluster? And 10 shards for just 90M docs? If so, that 150 hosts sounds like too much for all other numbers I see here. I'd love to see some metrics here. e.g. what happens with disk IO around those commits? How about GC time/size info? Are JVM memory pools full-ish and is the CPU jumping like crazy? Can you share more info to give us a more complete picture of your system? SPM for Solr http://sematext.com/spm/ will help if you don't already capture these types of things. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Thu, Feb 12, 2015 at 11:07 AM, Vijay Sekhri sekhrivi...@gmail.com wrote: Hi Erick, We have following configuration of our solr cloud 1. 10 Shards 2. 15 replicas per shard 3. 9 GB of index size per shard 4. a total of around 90 mil documents 5. 2 collection viz search1 serving live traffic and search 2 for indexing. We swap collection when indexing finishes 6. On 150 hosts we have 2 JVMs running one for search1 collection and other for search2 collection 7. Each jvm has 12 GB of heap assigned to it while the host has 50GB in total 8. Each host has 16 processors 9. Linux XXX 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux 10. We have two ways to index data. 1. Bulk indexing . All 90 million docs pumped in from 14 parallel process (on 14 different client hosts). This is done on collection that is not serving live traffic 2. Incremental indexing . Only delta changes (Range from 100K to 5 Mil) every two hours. This is done on collection also serving live traffic 11. The request per second count on live collection is around 300 TPS 12. Hard commit setting is every 30 second with open searcher false and soft commit setting is every 15 minutes . We have tried a lot of different setting here BTW. 
Now we have two issues with indexing 1) Solr just could not keep up with the bulk indexing when replicas are also active. We have concluded this by changing the number of replicas to just 2 , to 4 and then to 15. When the number of replicas increases the bulk indexing time increase almost exponentially We seem to have encountered the same issue reported here https://issues.apache.org/jira/browse/SOLR-6816 It gets to a point that even to index 100 docs the solr cluster would take 300 second. It would start of indexing 100 docs in 55 millisecond and slowly increase over time and within hour and a half just could not keep up. We have a workaround for this and i.e we stop all the replicas , do the bulk indexing and bring all the replicas up one by one . This sort of defeats the purpose of solr cloud but we can still work with this workaround. We can do this because , bulk indexing happen on the collection that is not serving live traffic. However we would love to have a solution from the solr cloud itself like ask it to stop replication and start via an API at the end of indexing. 2) This issues is related to soft commit with incremental indexing . When we do incremental indexing, it is done on the same collection serving live traffic with 300 request per second throughput. Everything is fine except whenever the soft commit happens. Each time soft commit (autosoftcommit in sorlconfig.xml) happens which BTW happens almost at the same time throughout the cluster , there is a spike in the response times and throughput decreases almost to 150 tps. The spike continues for 2 minutes and then it happens again at the exact interval when the soft commit happens. We have monitored the logs and found a direct co relation when the soft commit happens and when the response time tanks. Now the latter issue is quite disturbing , because it is serving live traffic and we cannot sustain these periodic degradation. We have played around with different soft commit setting . 
Interval ranging from 2 minutes to 30 minutes . Auto warming half cache , auto warming full cache, auto warming only 10 %. Doing warm up queries on every new searcher , doing NONE warm up queries on every new searching and all the different setting yields the same results . As and when soft commit happens the response time tanks and throughput deceases. The difference is almost 50 % in response times and 50 % in throughput Our workaround for this solution is to also do incremental delta indexing on the collection not serving live traffic and swap when it is done. As you can see that this also defeats the purpose of solr cloud . We cannot do bulk indexing because replicas cannot keeps up and we cannot do incremental indexing because of soft commit performance. Is there a way to make the cluster not do soft commit all at the same time or is there a way to make soft commit not cause this degradation ? We are open
Re: Solr 4.10.x on Oracle Java 1.8.x ?
Bok Jakov, We've been running Solr with Java 8 for several months without issues. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Tue, Feb 10, 2015 at 3:03 PM, Jakov Sosic jso...@gmail.com wrote: Hi guys, at the end of April Java 1.7 will be obsoleted, and Oracle will stop updating it. Is it safe to run Tomcat7 / Solr 4.10 on Java 1.8? Has anyone tried it already?
Re: Multi words query
Hi, Can you share details about how exactly you are querying Solr? Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Wed, Feb 11, 2015 at 5:21 AM, melb melaggo...@gmail.com wrote: Hi, I have a solr collection which I use to index some documents ( title, description, body) and I can search it well with solr query when it is a single word query When I search for multi words, the result is not satisfactory because I get some results with high scores with only one word of the query while documents with all terms are scored poorly How can I query solr collection with multi words and get documents with all query terms first and in the same time keeping the other documents too rgds -- View this message in context: http://lucene.472066.n3.nabble.com/Multi-words-query-tp4185625.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to make SolrCloud more elastic
Hi Matt, You could create extra shards up front, but if your queries are fanned out to all of them, you can run into situations where there are too many concurrent queries per node, causing lots of context switching and ultimately being less efficient than if you had fewer shards. So while this is an approach to take, I'd personally first try to run tests to see how much a single node can handle in terms of volume, expected query rates, and target latency, and then use monitoring/alerting/whatever-helps tools to keep an eye on the cluster so that when you start approaching the target limits you are ready with additional nodes and shard splitting if needed. Of course, if your data and queries are such that newer documents are queried more, you should look into time-based collections... and if your queries can only query a subset of data you should look into query routing. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Wed, Feb 11, 2015 at 3:32 PM, Matt Kuiper matt.kui...@issinc.com wrote: I am starting a new project and one of the requirements is that Solr must scale to handle increasing load (both search performance and index size). My understanding is that one way to address search performance is by adding more replicas. I am more concerned about handling a growing index size. I have already been given some good input on this topic and am considering a shard splitting approach, but am more focused on a rebalancing approach that includes defining many shards up front and then moving these existing shards on to new Solr servers as needed. Plan to experiment with this approach first. Before I got too deep, I wondered if anyone has any tips or warnings on these approaches, or has scaled Solr in a different manner. Thanks, Matt
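To make the "time-based collections" suggestion a bit more concrete, here is a minimal sketch of how a client might map a date range to collection names. The monthly granularity, the `logs_YYYY_MM` naming scheme, and both function names are assumptions made for illustration; nothing in the thread prescribes them.

```python
from datetime import date
from typing import List


def collection_for(day: date, prefix: str = "logs") -> str:
    """Return the (hypothetical) monthly collection name that would hold `day`."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


def collections_between(start: date, end: date, prefix: str = "logs") -> List[str]:
    """List the monthly collections a query over [start, end] would need to hit."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return names
```

A query for recent documents would then only touch the newest collection or two, which is where the approach would save work compared with fanning out to every shard.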
Re: Solrcloud (to HDFS) poor indexing performance
Hi Tim, Although I doubt Kafka is the problem, I'd look at that first and eliminate that. What about those Flume agents? How are they behaving in terms of CPU/GC, and such? You have 18 Solr nodes. what happens if you increase the number of Flume sinks? Are you seeing anything specific that makes you think the problem is on the Solr side? Can you share charts that show your GC activity, disk IO, etc.? (you can share them easily with SPM http://sematext.com/spm, which may help others help you more easily) Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Tue, Feb 3, 2015 at 7:47 PM, Tim Smith secs...@gmail.com wrote: Hi, I have a SolrCloud (Solr 4.4, writing to HDFS on CDH-5.3) collection configured to be populated by flume Morphlines sink. The flume agent reads data from Kafka and writes to the Solr collection. The issue is that Solr indexing rate is abysmally poor (~6k docs/sec at best, dips to a few hundred per sec) across the cluster. The incoming data/document rate is about 30-40k/second. I have gone wide/thin with 18 nodes and each with 8GB (Java) + 4GB (non-heap) memory and narrow/thick with current set of 5 dedicated nodes each with 36GB (Java) and 16GB (non-heap) memory (18 shards with the former config and 5 shards, right now). On the flume side, I have gone from 5 flume instances, each with a single sink to 5 sinks for each flume instance. I have tweaked batchSize and batchDuration. I checked ZooKeeper loads and don't see it stressed. Neither are the datanodes. On the Solr nodes, solr is consuming all the allocated memory (32GB) but I don't see solr hitting any CPU limits. *But*, indexing rate stubbornly stays at ~6k docs/sec. When I bounce the flume agent, it jumps up momentarily to several hundreds of thousands but then comes down to ~6k/sec and the flume channels get saturated within seconds. Any clues/pointers for troubleshooting will be appreciated? Thanks, Tim
Re: Garbage Collection tuning - G1 is now a good option
Not sure about AggressiveOpts, but G1 has been working for us nicely. We've successfully used it with HBase, Hadoop, Elasticsearch, and other custom Java apps (all still Java 7, but Java 8 should be even better). Not sure if we are using it on our Solr instances. e.g. see http://blog.sematext.com/2013/06/24/g1-cms-java-garbage-collector/ Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Thu, Jan 1, 2015 at 8:35 PM, William Bell billnb...@gmail.com wrote: But tons of people on this mailing list do not recommend AggressiveOpts. Why do you recommend it? On Thu, Jan 1, 2015 at 12:10 PM, Shawn Heisey apa...@elyograg.org wrote: I've been working with Oracle employees to find better GC tuning options. The results are good enough to share with the community: https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning With the latest Java 7 or Java 8 version, and a couple of tuning options, G1GC has grown up enough to be a viable choice. Two of the settings on that list were critical for making the performance acceptable with my testing: ParallelRefProcEnabled and G1HeapRegionSize. I've included some notes on the wiki about how you can size the G1 heap regions appropriately for your own index. Thanks, Shawn -- Bill Bell billnb...@gmail.com cell 720-256-8076
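As a rough illustration of what sizing G1HeapRegionSize for an index could look like, the sketch below picks a region size from a document count. It rests on assumptions that are not stated in this thread: that the largest frequently allocated object is a filter bitset of about maxDoc/8 bytes, that G1 treats objects of roughly half a region or more as "humongous", and that region sizes are powers of two between 1 MB and 32 MB. The function name is made up; check the wiki page linked above before relying on any of it.

```python
def g1_region_size_mb(max_doc: int) -> int:
    """Smallest G1 region size (MB) at which a maxDoc-bit bitset stays under half a region.

    Assumption: one bit per document, so the bitset is about max_doc / 8 bytes.
    Returns 32 (the largest region size) when even that is too small.
    """
    bitset_bytes = (max_doc + 7) // 8
    for size_mb in (1, 2, 4, 8, 16, 32):
        half_region = (size_mb * 1024 * 1024) // 2
        if bitset_bytes < half_region:
            return size_mb
    return 32
```

For an index of 27 million documents the bitset would be about 3.4 MB, so this sketch would suggest an 8 MB region, which one would pass to the JVM as -XX:G1HeapRegionSize=8m.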
Re: .htaccess / password
Hi Craig, If you want to protect Solr, put it behind something like Apache / Nginx / HAProxy and put .htaccess at that level, in front of Solr. Or try something like http://blog.jelastic.com/2013/06/17/secure-access-to-your-jetty-web-application/ Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Tue, Jan 6, 2015 at 1:28 PM, Craig Hoffman choff...@eclimb.net wrote: Quick question: If I put a .htaccess file in www.mydomin.com/8983/solr/#/ will Solr continue to function properly? One thing to note, I will have a CRON job that runs nightly that re-indexes the engine. In a nutshell I’m looking for a way to secure this area. Thanks, Craig -- Craig Hoffman w: http://www.craighoffmanphotography.com FB: www.facebook.com/CraigHoffmanPhotography TW: https://twitter.com/craiglhoffman
Re: Solr on HDFS in a Hadoop cluster
Hi Charles, See http://search-lucene.com/?q=solr+hdfs and https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Tue, Jan 6, 2015 at 11:02 AM, Charles VALLEE charles.val...@edf.fr wrote: I am considering using *Solr* to extend *Hortonworks Data Platform* capabilities to search.
- I found tutorials to index documents into a Solr instance from *HDFS*, but I guess this solution would require a Solr cluster distinct from the Hadoop cluster. Is it possible to have Solr integrated into the Hadoop cluster instead?
- *With the index stored in HDFS?*
- Where would the processing take place (could it be handed down to Hadoop)? Is there a way to guarantee a level of service (CPU, RAM) - to integrate with *Yarn*?
- What about *SolrCloud*: what does it bring regarding Hadoop based use-cases? Does it stand for a Solr-only cluster?
- Well, if that could lead to something working with a roles-based authorization-compliant *Banana*, it would be Christmas again!
Thanks a lot for any help! Charles
Re: Solr on HDFS in a Hadoop cluster
Oh, and https://issues.apache.org/jira/browse/SOLR-6743 Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Tue, Jan 6, 2015 at 12:52 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Charles, See http://search-lucene.com/?q=solr+hdfs and https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Tue, Jan 6, 2015 at 11:02 AM, Charles VALLEE charles.val...@edf.fr wrote: I am considering using *Solr* to extend *Hortonworks Data Platform* capabilities to search.
- I found tutorials to index documents into a Solr instance from *HDFS*, but I guess this solution would require a Solr cluster distinct from the Hadoop cluster. Is it possible to have Solr integrated into the Hadoop cluster instead?
- *With the index stored in HDFS?*
- Where would the processing take place (could it be handed down to Hadoop)? Is there a way to guarantee a level of service (CPU, RAM) - to integrate with *Yarn*?
- What about *SolrCloud*: what does it bring regarding Hadoop based use-cases? Does it stand for a Solr-only cluster?
- Well, if that could lead to something working with a roles-based authorization-compliant *Banana*, it would be Christmas again!
Thanks a lot for any help! Charles
Re: SolrCloud multi-datacenter failover?
Hi, Check http://search-lucene.com/?q=%22Cross+Data+Center+Replicaton%22 - http://issues.apache.org/jira/browse/SOLR-6273 Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Jan 2, 2015 at 4:52 PM, jaime spicciati jaime.spicci...@gmail.com wrote: All, At my current customer we have developed a custom federator that will federate queries between Endeca and Solr to ease the transition from an extremely large (TBs of data) Endeca index to Solr. (Endeca is similar to Solr in terms of search/faceted navigation/etc). During this transition plan we need to support multi datacenter failover which we have historically handled via load balancers with the appropriate failover configurations (think F5). We are currently playing our dataloads into multiple datacenters to ensure data consistency. (Each datacenter has a stand-alone instance of solrcloud with its own redundancy/failover) I am curious to see how the community handles multi datacenter failover at the presentation layer (datacenter A goes down and we want to failover to B). Solrcloud within a datacenter will handle failures within that instance, but for multi datacenter failover I haven't seen a definitive ‘answer’ as to how to handle this situation. At this point the only two options I can come up with are
1) Fail the entire datacenter if Solrcloud goes offline (GUI/index/etc go offline) - This is problematic because some portion of user activity will fail, queries that are in transit will not complete
2) Implement failover at the custom federator level. In doing so we would need to detect a failure at datacenter A within our federator, then query datacenter B to fulfill the user request, then potentially fail the entire datacenter A once all transactions have been fulfilled against A
Since we are looking up the active solr instance via zookeeper (solrcloud) per datacenter, I don’t see any reasonable means of failing over to another datacenter if a given solrcloud instance goes down. Any thoughts are welcome at this point. Thanks Jaime
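Option 2 in the question above (failover inside the federator) could be sketched roughly as follows. This is only an illustration under stated assumptions: each datacenter is represented by a hypothetical callable that either returns a result or raises, and `query_with_failover` and `AllDatacentersFailed` are invented names, not part of Solr or of the federator described in the thread.

```python
from typing import Callable, Dict, List, Sequence


class AllDatacentersFailed(Exception):
    """Raised when no datacenter could answer the query."""


def query_with_failover(query: str,
                        datacenters: Sequence[Callable[[str], Dict]]) -> Dict:
    """Try each datacenter in priority order and return the first successful result."""
    errors: List[Exception] = []
    for search in datacenters:
        try:
            return search(query)
        except Exception as exc:  # a real federator would catch narrower error types
            errors.append(exc)
    raise AllDatacentersFailed(errors)
```

The sketch deliberately leaves out the harder parts raised in the question, such as deciding when to mark datacenter A as failed for all traffic and what to do with queries already in flight.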
Re: questions about default operator within solr query string
Hi Chun, Something like: +slug:variety +slug:entertainment headline:entertainment should work. But you may also want to use filter queries for slug filtering: http://search-lucene.com/?q=fq&fc_project=Solr https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-Thefq(FilterQuery)Parameter Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Mon, Jan 5, 2015 at 6:11 AM, chun.sh...@thomsonreuters.com wrote: Hi, Nice to have a chance to discuss with solr experts! We are using solr as our search solution. But now we have a requirement that we don't know how to handle, even after we have looked through the Solr documentation. The solr version we used is 4.10.1. For the question, please refer to the following example url: http://10.90.44.33/solr/searcher/select?start=0&rows=24&fl=id,headline,slug&q=slug:variety-entertainment%20headline:entertainment&sort=score%20asc&debug=true With our default operator (q.op) configured as OR, the parsed query is: slug:variety slug:entertainment headline:entertainment But what we really want is as follows: +slug:variety +slug:entertainment headline:entertainment So, the question is: when searching, is there any way to configure the operator applied between the terms from the field slug to be AND, while the operator between the fields slug and headline is OR? If not, could you please advise on how to handle this requirement in other ways? Thanks in advance Chun
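The two suggestions in the reply above can be sketched as request parameters built on the client side. This is a minimal sketch: `q`, `fq`, `fl` and `rows` are standard Solr request parameters, while the helper names and the way the slug is split into terms are assumptions made for illustration.

```python
from typing import Dict, List
from urllib.parse import urlencode


def required_slug_params(slug_terms: List[str], headline_term: str) -> Dict[str, str]:
    """Mark every slug term as required (+) and leave the headline clause optional."""
    required = " ".join(f"+slug:{term}" for term in slug_terms)
    return {"q": f"{required} headline:{headline_term}",
            "fl": "id,headline,slug", "rows": "24"}


def filtered_slug_params(slug_terms: List[str], headline_term: str) -> Dict[str, str]:
    """Move the slug restriction into a filter query (fq) instead of the main query."""
    fq = " AND ".join(f"slug:{term}" for term in slug_terms)
    return {"q": f"headline:{headline_term}", "fq": fq,
            "fl": "id,headline,slug", "rows": "24"}


# urlencode takes care of escaping '+', ':' and spaces for the request URL
query_string = urlencode(required_slug_params(["variety", "entertainment"], "entertainment"))
```

Note that the two variants are not equivalent: with the slug terms in `fq`, the headline clause is the only clause left in `q`, so documents have to match it, whereas in the first variant it only affects scoring.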
Re: Solr performance issues
Likely lots of disk + network IO, yes. Put SPM for Solr on your nodes to double check. Otis On Dec 26, 2014, at 09:17, Mahmoud Almokadem prog.mahm...@gmail.com wrote: Dears, We've installed a cluster of one collection of 350M documents on 3 r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is about 1.1TB and the maximum storage on Amazon is 1 TB, so we added 2 SSD EBS General Purpose volumes (1x1TB + 1x500GB) to each instance. Then we created a 1.5TB logical volume using LVM to fit our index. The response time is between 1 and 3 seconds for simple queries (1 token). Has the LVM become a bottleneck for our index? Thanks for help.
# of daily/weekly/monthly Solr downloads?
Hi, Does anyone know the number of daily/weekly/monthly Solr downloads? Thanks, Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/
Re: Standardized index metrics (Was: Constantly high disk read access (40-60M/s))
Hi, On Sat, Nov 29, 2014 at 2:27 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: On 11/29/14 1:30 PM, Toke Eskildsen wrote: Michael Sokolov [msoko...@safaribooksonline.com] wrote: I wonder if there's any value in providing this metric (total index size - stored field size - term vector size) as part of the admin panel? Is it meaningful? It seems like there would be a lot of cases where it could give a good rule of thumb for memory sizing, and it would save having to root around in the index folder. At Lucene/Solr Revolution, I talked with Alexandre Rafalovitch about this. We know (https://lucidworks.com/blog/sizing-hardware-in-the- abstract-why-we-dont-have-a-definitive-answer/) that we cannot get the full picture of an index, but it is a weekly occurrence on this mailing list that people asks questions where it helps to have a gist of the index metrics and how the index is used. Some sort of Copy the content of this concentrated metrics box, when you need to talk with other people about your index-functionality in the admin panel might help with this. To get an idea of usage, it could also contain a few non-filled fields, such as peak queries per second or typical queries. - Toke Eskildsen Yes - the cautions about the need for prototyping are all very well, but even if you take that advice, and build a prototype, it's not clear how to tell whether your setup has enough memory or not. You can add more and measure response times, but even then you only have a gross measurement, and no way of knowing where, in detail, the memory is being used. Also, you might be able to improve your system to make better use of memory with more precise information. It seems like we ought to be able to monitor a running system, observe its memory requirements over time, and report on those. +1 to that! 
I haven't been following this aspect of development super closely, but I believe there are memory/size estimators for various things at Lucene level that Elasticsearch is nicely exposing via its stats API. I don't know the specifics around those estimators without digging in, otherwise I'd open a JIRA, because I think this is valuable information -- at Sematext we regularly deal with hardware sizing, memory / CPU usage estimates, etc. etc., so the more of this info is surfaced the easier it will be for people to work with Solr. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/
Re: Constantly high disk read access (40-60M/s)
Po-Yu, To add what others have said:
* Your query cache is clearly not serving its purpose, so you are just wasting your heap on it. Consider disabling it.
* That's a pretty big index. Do your queries really always have to go against the whole index? Are there multiple tenants in this index that would let you break up the index into multiple smaller indices? Can you segment your index by time? Maybe by doing that some indices will be hotter and some colder, and the OS could do a better job caching.
* You didn't say anything about your queries. Maybe they can be tightened to pull less data off disk?
* Add RAM :)
Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Sat, Nov 29, 2014 at 12:59 AM, Po-Yu Chuang ratbert.chu...@gmail.com wrote: Hi all, I am using Solr 4.9 with Tomcat. Thanks to the suggestions from Yonik and Dmitry about the slow start up. Everything works fine now, but I noticed that the load average of the server is high because there is constantly heavy disk read access. Please point me in the right direction. Some numbers about my system: RAM: 18G swap space: 2G number of documents: 27 million Solr home: 185G disk read access constantly 40-60M/s document cache size: 16K entries document cache hit ratio: 0.65 query cache size: 16K query cache hit ratio: 0.03 At first, I wondered if the disk reads came from swap, so I decreased the swappiness from 60 to 10, but the disk reads are still there, which means that they do not result from swapping in. Then, I tried different document cache and query cache sizes. The effect of changing the query cache size is not obvious. I tried 512, 16K, 256K entries and the hit ratio is between 0.01 and 0.03. For the document cache, a larger cache size did improve the hit ratio (I tried 512, 16K, 256K, 512K, 1024K and the hit ratio is between 0.58 and 0.87), but the disk read rate is still high.
Is adjusting document cache size a reasonable direction? Or I should just increase the physical memory? Is there any method to estimate the right size of document cache (or other caches) and to estimate the size of physical memory needed? Thanks, Po-Yu
Re: Replicate a collection to a 2nd SolrCloud
Hi, I think you are looking for this: http://search-lucene.com/?q=Cross+Data+Center+Replication&fc_project=Solr => https://issues.apache.org/jira/browse/SOLR-6273 Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Tue, Nov 25, 2014 at 3:29 PM, Gili Nachum gilinac...@gmail.com wrote: Hi, *I need to replicate a collection between SolrClouds, has anyone done it?* The replication style I need is one-directional: replicating anything that happens on my main site SolrCloud to the DR site (master-slave). I considered and decided against synchronizing the collections' shards' Lucene indexes over rsync, because it is tricky to arrive at a consistent index and not efficient enough on bandwidth. My current approach is writing a replicator app that knows how to sync between two collections in a fairly generic way, but it's a lot of investment, which I would rather avoid. Saw that master-slave replication can't be used in SolrCloud https://cwiki.apache.org/confluence/display/solr/Index+Replication
Re: Does any solr version use lucene concurrent flush
Yes. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Tue, Nov 25, 2014 at 4:37 PM, Aaron Beach aaron.be...@sendgrid.com wrote: -- Aaron Beach Senior Data Scientist w: +1-303-625-7043 SendGrid -- Email Delivery. Simplified. http://sendgrid.com/careers.html
Re: New Meetup in London - Lucene/Solr User Group
Would LOVE to see the results (assuming you can ensure the same fruit(s?) are being compared) Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Tue, Nov 18, 2014 at 11:55 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: On 18 November 2014 11:41, Charlie Hull char...@flax.co.uk wrote: presenting some results of a Solr/Elasticsearch comparative performance study. I was asked about that a couple of times at the Solr Revolution conference. Looking forward to seeing the results. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
Re: Search for partial name in Solr 4.x
Hi, You may be looking for wildcard queries or ngrams. Otis On Nov 9, 2014, at 3:26 PM, PeriS peri.subrahma...@htcinc.com wrote: I was wondering if there is a way to search on partial names? For example, the field is a string and stores values like book titles. When searching, only part of the title may be supplied. How do I resolve this? Please let me know Thanks -PeriS
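To illustrate what "ngrams" means in the reply above, the sketch below produces the kind of tokens that would be indexed so that a partial title can match. In Solr this would normally be done by analysis filters in the schema (for example NGramFilterFactory or EdgeNGramFilterFactory) rather than in client code; the function names and default lengths here are assumptions chosen for the example.

```python
from typing import List


def edge_ngrams(token: str, min_len: int = 2, max_len: int = 10) -> List[str]:
    """Prefixes of the token, e.g. for matching titles as the user types."""
    token = token.lower()
    return [token[:n] for n in range(min_len, min(max_len, len(token)) + 1)]


def ngrams(token: str, n: int) -> List[str]:
    """All substrings of length n, for matching a fragment anywhere in the token."""
    token = token.lower()
    return [token[i:i + n] for i in range(len(token) - n + 1)]
```

Edge n-grams keep the index smaller and cover prefix matches, while full n-grams also match fragments in the middle of a word at the cost of many more indexed terms. A wildcard query such as title:sol* needs no extra indexing but does more work at query time.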
Re: recovery process - node with stale data elected leader
Hi, Not a direct answer to your question, sorry, but since 4.6.0 is relatively old and there have been a ton of changes around leader election, syncing, replication, etc., I'd first jump to the latest Solr and then see if this is still a problem. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Thu, Nov 6, 2014 at 5:32 AM, francois.groll...@barclays.com wrote: Hi all, Any idea on my issue below? Thanks Francois -Original Message- From: Grollier, Francois: IT (PRG) Sent: Tuesday, November 04, 2014 6:19 PM To: solr-user@lucene.apache.org Subject: recovery process - node with stale data elected leader Hi, I'm running solrCloud 4.6.0 and I have a question/issue regarding the recovery process. My cluster is made of 2 shards with 2 replicas each. Nodes A1 and B1 are leaders, A2 and B2 followers. I start indexing docs and kill A2. I keep indexing for a while and then kill A1. At this point, the cluster stops serving queries as one shard is completely unavailable. Then I restart A2 first, then A1. A2 gets elected leader, waits a bit for more replicas to be up and once it sees A1 it starts the recovery process. My understanding of the recovery process was that at this point A2 would notice that A1 has a more up to date state and it would sync with A1. 
It seems to happen like this but then I get:
INFO - 2014-11-04 11:50:43.068; org.apache.solr.cloud.RecoveryStrategy; Attempting to PeerSync from http://a1:8111/solr/executions/ core=executions - recoveringAfterStartup=false
INFO - 2014-11-04 11:50:43.069; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr START replicas=[http://a1:8111/solr/executions/] nUpdates=100
INFO - 2014-11-04 11:50:43.076; org.apache.solr.update.PeerSync; PeerSync: core=executions url= http://a2:8211/solr Received 98 versions from a1:8111/solr/executions/
INFO - 2014-11-04 11:50:43.076; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr Our versions are newer. ourLowThreshold=1483859630192852992 otherHigh=1483859633446584320
INFO - 2014-11-04 11:50:43.077; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr DONE. sync succeeded
And I end up with a different set of documents in each node (actually A1 has all the documents but A2 misses some). Is my understanding wrong, and is it complete nonsense to start A2 before A1? If my understanding is right, what could cause the desync? (I can provide more logs) And is there a way to force A2 to index the missing documents? I have tried the FORCERECOVERY command but it generates the same result as shown above. Thanks francois
Re: Migrating cloud to another set of machines
I think ZK stuff may actually be easier to handle, no? Add new ones to the existing ZK cluster and then remove the old ones. Won't this work smoothly? Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Thu, Oct 30, 2014 at 1:16 PM, Jakov Sosic jso...@gmail.com wrote: On 10/30/2014 04:47 AM, Otis Gospodnetic wrote: Hi/Bok Jakov, 2) sounds good to me. It means no down-time. 1) means stoppage. If stoppage is not OK, but falling behind with indexing new content is OK, you could: * add a new cluster * start reading from old index and indexing into the new index * stop old cluster when done * index new content to new cluster (or maybe you can be doing this all along if indexing old + new at the same time is OK for you) -- Thank you for suggestions Otis. Everything is acceptable currently, but in the future as the data grows, we will certainly enter those edge cases where neither stopping indexing nor stopping queries will be acceptable. What makes things a little bit more problematic is that ZooKeepers are migrating also to new machines.
Re: Migrating cloud to another set of machines
Hi/Bok Jakov, 2) sounds good to me. It means no down-time. 1) means stoppage. If stoppage is not OK, but falling behind with indexing new content is OK, you could:
* add a new cluster
* start reading from old index and indexing into the new index
* stop old cluster when done
* index new content to new cluster (or maybe you can be doing this all along if indexing old + new at the same time is OK for you)
Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Wed, Oct 29, 2014 at 10:18 PM, Jakov Sosic jso...@gmail.com wrote: Hi guys I was wondering, is there some smart way to migrate Solr cloud from one set of machines to another? Specifically, I have 2 cores, each of them with 2 replicas and 2 shards, spread across 4 machines. We bought new HW and are in the process of moving to 4 new machines. What are my options?
1)
- Create new cluster on new set of machines.
- stop write operations
- copy data directories from old machines to new machines
- start solrs on new machines
2)
- expand number of replicas from 2 to 4
- add new solr nodes to cloud
- wait for resync
- stop old solr nodes
- shrink number of replicas from 4 back to 2
Is there any other path to achieve this? I'm leaning towards no1, because I don't feel too comfortable with doing all those changes explained in no2 ... Ideas?
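Option 2 above (grow onto the new machines, then shrink) could be driven through the Collections API. The sketch below only builds the request URLs and does not send anything. It assumes a Solr version that provides the ADDREPLICA and DELETEREPLICA actions; the host names, collection, shard, node, and replica names are made up for illustration.

```python
from urllib.parse import urlencode


def collections_api_url(base: str, action: str, **params: str) -> str:
    """Build a Collections API URL such as .../admin/collections?action=ADDREPLICA&..."""
    query = urlencode({"action": action, **params})
    return f"{base}/admin/collections?{query}"


# Step 1: place a new replica of shard1 on one of the new machines (hypothetical names).
add_url = collections_api_url(
    "http://old1:8983/solr", "ADDREPLICA",
    collection="core1", shard="shard1", node="new1:8983_solr")

# Step 2: once the new replica is active, drop the replica on the old machine.
delete_url = collections_api_url(
    "http://old1:8983/solr", "DELETEREPLICA",
    collection="core1", shard="shard1", replica="core_node1")
```

Repeating the two steps per shard and per collection, and waiting for each new replica to finish syncing before deleting the old one, is what would keep the migration free of downtime.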
Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.
Hi,

You may simply be overwhelming your cluster-nodes. Have you checked various metrics to see if that is the case?

Otis

On Oct 26, 2014, at 9:59 PM, S.L simpleliving...@gmail.com wrote:

> Folks,
>
> I have posted previously about this. I am using SolrCloud 4.10.1 and have a sharded collection with 6 nodes, 3 shards and a replication factor of 2.
>
> I am indexing Solr using a Hadoop job. I have 15 Map fetch tasks that can each have up to 5 threads, so the load on the indexing side can get as high as 75 concurrent threads.
>
> I am facing an issue where the replicas of a particular shard(s) are consistently getting out of sync. Initially I thought this was because I was using a custom component, but I did a fresh install, removed the custom component and reindexed using the Hadoop job, and I still see the same behavior.
>
> I do not see any exceptions in my catalina.out, like OOM or any other exceptions. I suspect this could be because of the multi-threaded indexing nature of the Hadoop job. I use CloudSolrServer from my Java code to index and initialize the CloudSolrServer using a 3 node ZK ensemble.
>
> Does anyone know of any known issues with highly multi-threaded indexing and SolrCloud?
>
> Can someone help? This issue has been slowing things down on my end for a while now.
>
> Thanks and much appreciated!
Re: about Solr log file
Hi Chunki,

Having logs on the local disk is not a problem. You can use tools like rsyslog or Logstash or Flume or fluentd and ship your logs wherever you want - your own centralized logging system or Splunk or Logsene for example. This will make it easier to debug/troubleshoot, too - no need to grep big log files...

Otis

On Wed, Oct 22, 2014 at 10:50 PM, Lee Chunki lck7...@coupang.com wrote:

> Hi,
>
> I have two questions about the Solr log file.
>
> First, is it possible to configure logging to use one log file for each core? I run many cores on one Solr, the log file is getting bigger and bigger, and that makes it hard to debug system errors.
>
> Second, is there any setting to gather Solr Cloud logs on one server? I plan to migrate to Solr Cloud, but it seems that each Solr node writes logs to its local disk.
>
> Thanks,
> Chunki.
Re: Shared Directory for two Solr Clouds(Writer and Reader)
Hi Jae, Sounds a bit complicated and messy to me, but maybe I'm missing something. What are you trying to accomplish with this approach? Which problems do you have that are making you look for non-straight forward setup? Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Mon, Oct 20, 2014 at 7:35 PM, Jaeyoung Yoon jaeyoungy...@gmail.com wrote: Hi Folks, Here are some my ideas to use shared file system with two separate Solr Clouds(Writer Solr Cloud and Reader Solr Cloud). I want to get your valuable feedbacks For prototype, I setup two separate Solr Clouds(one for Writer and the other for Reader). Basically big picture of my prototype is like below. 1. Reader and Writer Solr clouds share the same directory 2. Writer SolrCloud sends the openSearcher commands to Reader Solr Cloud inside postCommit eventHandler. That is, when new data are added to Writer Solr Cloud, writer Solr Cloud sends own openSearcher command to Reader Solr Cloud. 3. Reader opens searcher only when it receives openSearcher commands from Writer SolrCloud 4. Writer has own deletionPolicy to keep old commit points which might be used by running queries on Reader Solr Cloud when new searcher is opened on reader SolrCloud. 5. Reader has no update/no commits. Everything on reader Solr Cloud are read-only. It also creates searcher from directory not from indexer(nrtMode=false). That is, In Writer Solr Cloud, I added postCommit eventListner. Inside the postCommit eventListner, it sends own openSearcher command to reader Solr Cloud's own handler. Then reader Solr Cloud will create openSearcher directly without commit and return the writer's request. With this approach, Writer and Reader can use the same commit points in shared file system in synchronous way. When a Reader SolrCloud starts, it doesn't create openSearcher. Instead. Writer Solr Cloud listens the zookeeper of Reader Solr Cloud. 
Any change in the reader SolrCloud, writer sends openSearcher command to reader Solr Cloud. Does it make sense? Or am I missing some important stuff? any feedback would be very helpful to me. Thanks, Jae
Re: Retrieving and updating large set of documents on Solr 4.7.2
Hi,

Not sure if you've seen https://issues.apache.org/jira/browse/SOLR-5244 ? It's not in Solr 4.7.2, but may be a good excuse to update Solr.

Otis
--
Solr Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Mon, Aug 18, 2014 at 4:09 AM, deniz denizdurmu...@gmail.com wrote:

> I am trying to implement an activity feed for a website, and planning to use Solr for this case. As it does not have any follower/following relation, Solr is fitting for the requirements.
>
> There is one point which makes me concerned about performance. So as user A, I may have 10K activities in the feed, and then I have updated my preferences, so the activities that I have posted should be updated too (imagine that I am changing my user name, so all of the activities would have my new username). In order to update all 10K activities, I need to retrieve the unique document ids from Solr, then update them. Retrieving 10K docs at once is not a good idea, if you imagine a bunch of other users are also doing a similar change. I have checked docs and forums; using Cursors on Solr seems ok, but still makes me think about the performance (after id retrieval, I need to update each activity).
>
> Are there any other ways to handle this without Cursors? Or should I use another tool/backend to have something like a username - activity_id mapping, so I can directly retrieve the ids to update?
>
> Regards,
>
> - Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Retrieving-and-updating-large-set-of-documents-on-Solr-4-7-2-tp4153457.html
> Sent from the Solr - User mailing list archive at Nabble.com.
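
For the cursor approach mentioned in the question, the id retrieval loop might look roughly like the sketch below. It assumes the standard cursorMark protocol (start with cursorMark=*, sort on the unique key, stop when nextCursorMark comes back unchanged). The fetch_page callable is a hypothetical stand-in for whatever HTTP client sends the parameters to /select and returns the parsed JSON response, and the field name "id" is an assumption about the schema.

```python
def iter_ids(fetch_page, query, rows=500, id_field="id"):
    """Yield document ids for `query`, one cursor page at a time.

    fetch_page(params) is a caller-supplied function (hypothetical here)
    that sends `params` to Solr and returns the decoded JSON response.
    """
    cursor = "*"  # initial cursor value defined by the cursorMark protocol
    while True:
        response = fetch_page({
            "q": query,
            "fl": id_field,              # only the ids are needed
            "rows": rows,
            "sort": f"{id_field} asc",   # cursors require a sort on the unique key
            "cursorMark": cursor,
            "wt": "json",
        })
        for doc in response["response"]["docs"]:
            yield doc[id_field]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:        # unchanged cursor means no more results
            break
        cursor = next_cursor
```

Because the function is a generator, the caller can send updates in small batches while ids are still being read, so the 10K ids never have to be held in memory at once.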
Re: Anybody uses Solr JMX?
Hi Paul, There are lots of people/companies using SPM for Solr/SolrCloud and I don't recall anyone saying SPM agent collecting metrics via JMX had a negative impact on Solr performance. That said, some people really dislike JMX and some open source projects choose to expose metrics via custom stats APIs or even files. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Aug 6, 2014 at 11:18 PM, Paul Libbrecht p...@hoplahup.net wrote: Hello Otis, this looks like an excellent idea! I'm in need of that, erm… last week and probably this one too. Is there not a risk that reading certain JMX properties actually hogs the process? (or is it by design that MBeans are supposed to be read without any lock effect?). thanks for the hint. paul On 6 mai 2014, at 04:43, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Alexandre, you could use something like http://blog.sematext.com/2012/09/25/new-tool-jmxc-jmx-console/ to quickly dump everything out of JMX and see if there is anything there Solr Admin UI doesn't expose. I think you'll find there is more in JMX than Solr Admin UI shows. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, May 5, 2014 at 1:56 AM, Alexandre Rafalovitch arafa...@gmail.comwrote: Thank you everybody for the links and explanations. I am still curious whether JMX exposes more details than the Admin UI? I am thinking of a troubleshooting context, rather than long-term monitoring one. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Mon, May 5, 2014 at 12:21 PM, Gora Mohanty g...@mimirtech.com wrote: On May 5, 2014 7:09 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: I have religiously kept jmx statement in my solrconfig.xml, thinking it was enabling the web interface statistics output. 
But looking at the server logs really closely, I can see that JMX is actually disabled without server present. And the Admin UI does not actually seem to care after a quick test. Does anybody have a real experience with Solr JMX? Does it expose more information than Admin UI's Plugins/Stats page? Is it good for Have not been using JMX lately, but we were using it in the past. It does allow monitoring many useful details. As others have commented, it also integrates well with other monitoring tools as JMX is a standard. Regards, Gora
Re: Solr vs ElasticSearch
If performance is the main reason, you can stick with Solr. Both Solr and ES have many knobs to turn for performance, it is impossible to give a direct and correct answer to the question which is faster. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Fri, Aug 1, 2014 at 7:35 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: I did see that earlier. My main concern is search performance/scalability/throughput which unfortunately that article didn't address. Any benchmarks or comments about that? We are already using SOLR but there has been a push to check elasticsearch. All the benchmarks I have seen are at least few years old. On Fri, Aug 1, 2014 at 4:59 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Not super fresh, but more recent than the 2 links you sent: http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/ Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, Jul 31, 2014 at 10:33 PM, Salman Akram salman.ak...@northbaysolutions.net wrote: This is quite an old discussion. Wanted to check any new comparisons after SOLR 4 especially with regards to performance/scalability/throughput? On Tue, Jul 26, 2011 at 7:33 PM, Peter peat...@yahoo.de wrote: Have a look: http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage http://karussell.wordpress.com/2011/05/12/elasticsearch-vs-solr-lucene/ Regards, Peter. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-vs-ElasticSearch-tp3009181p3200492.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Salman Akram -- Regards, Salman Akram
Re: Auto suggest with adding accents
Aha. I don't know if Solr Suggester can do that. Let's see what others say. I know http://www.sematext.com/products/autocomplete/ could do that. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Fri, Aug 1, 2014 at 9:26 AM, benjelloun anass@gmail.com wrote: hello, you didnt enderstand well my problem i give you exemple: the document contain the word genève. q=gene auto suggestion give geneve q=genè auto suggestion give genève but what i need is q=gene auto suggestion give genève with accent like correction of word. i tried to add spellchecker to correct it but the maximum of character for correction is 2 maybe there is other solution, i give my schema of field: fieldType name=textSuggest class=solr.TextField positionIncrementGap=100 omitNorms=true analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ filter class=solr.StopFilterFactory words=stopApostrophe.txt ignoreCase=true/ filter class=solr.ASCIIFoldingFilterFactory preserveOriginal=true/ filter class=solr.LowerCaseFilterFactory / filter class=solr.StandardFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/replacement=$2/-- filter class=solr.RemoveDuplicatesTokenFilterFactory/ filter class=solr.StopFilterFactory words=stopApostrophe.txt ignoreCase=true/ filter class=solr.LowerCaseFilterFactory / filter class=solr.StandardFilterFactory/ /analyzer /fieldType thanks best regards, Anass BENJELLOUN 2014-07-31 18:41 GMT+02:00 Otis Gospodnetic-5 [via Lucene] ml-node+s472066n4150410...@n3.nabble.com: You need to do the opposite. Make sure accents are NOT removed at index query time. 
Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, Jul 31, 2014 at 5:49 PM, benjelloun [hidden email] wrote: hi, q=gene it suggest geneve ASCIIFoldingFilter work like isolate accent what i need to suggest is genève any idea? thanks best reagards Anass BENJELLOUN -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150392.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150569.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Memory question
Which version of Solr?

Otis

On Fri, Aug 1, 2014 at 11:17 PM, Ethan eh198...@gmail.com wrote:

> Our SolrCloud setup: 3 nodes with ZooKeeper, 2 running SolrCloud.
>
> Current dataset size is 97GB, JVM is 10GB, but 6GB is used (for less garbage collection time). RAM is 96GB.
>
> Our softcommit is set to 2 secs and hardcommit is set to 1 hour.
>
> We are suddenly seeing high disk and network IOs. During search the leader usually logs one more query with its node name and shard information -
>
> {NOW=1406911121656shard.url= chexjvassoms006.ch.expeso.com:52158/solr/Main.. ids=-9223372036371158536,-9223372036373602680,-9223372036618637568,-9223372036371157736..distrib=falsetimeAllowed=2000wt=javabinisShard=true
>
> The actual query didn't have any of this information. This started just today and is causing a lot of latency issues. We have had nodes go down several times today.
>
> Any of you faced similar issues before?
>
> E
Re: Auto suggest with adding accents
You need to do the opposite. Make sure accents are NOT removed at index and query time.

Otis

On Thu, Jul 31, 2014 at 5:49 PM, benjelloun anass@gmail.com wrote:

> hi,
>
> q=gene suggests geneve. ASCIIFoldingFilter works by stripping the accent. What I need it to suggest is genève.
>
> any idea?
>
> thanks, best regards
> Anass BENJELLOUN
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150392.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr is working very slow after certain time
Can we look at your disk IO and CPU? SPM http://sematext.com/spm/ can help.

Isn't UseCompressedOops a typo? And deprecated? In general, may want to simplify your JVM params unless you are really sure they are helping.

Otis

On Thu, Jul 31, 2014 at 7:54 PM, Ameya Aware ameya.aw...@gmail.com wrote:

> Hi,
>
> I could index around 10 documents in a couple of hours. But after that the indexing time becomes very large (around just 15-20 documents per minute).
>
> I have taken care of garbage collection. I am passing the parameters below to Solr:
>
> -Xms6144m -Xmx6144m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=6 -XX:ParallelGCThreads=6 -XX:CMSInitiatingOccupancyFraction=70 -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled -XX:+UseCompressedOops -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts -XX:-UseGCOverheadLimit
>
> Can anyone help to solve this problem?
>
> Thanks,
> Ameya
Re: Solr vs ElasticSearch
Not super fresh, but more recent than the 2 links you sent: http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/ Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, Jul 31, 2014 at 10:33 PM, Salman Akram salman.ak...@northbaysolutions.net wrote: This is quite an old discussion. Wanted to check any new comparisons after SOLR 4 especially with regards to performance/scalability/throughput? On Tue, Jul 26, 2011 at 7:33 PM, Peter peat...@yahoo.de wrote: Have a look: http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage http://karussell.wordpress.com/2011/05/12/elasticsearch-vs-solr-lucene/ Regards, Peter. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-vs-ElasticSearch-tp3009181p3200492.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Salman Akram
Re: Solr enterprise tech support in Brazil
Hello,

Sematext would be happy to help. Please see signature.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Jul 9, 2014, at 4:15 PM, Jefferson Olyntho Neto (STI) jefferson.olyn...@unimedbh.com.br wrote:

> Dear all,
>
> I would like some recommendation of companies who work with enterprise technical support for Solr in Brazil. Could someone help me?
>
> Thanks!
>
> Jefferson Olyntho Neto
> jefferson.olyn...@unimedbh.com.br
JOB: Solr / Elasticsearch engineer @ Sematext
Hi, I think most people on this list have heard of Sematext http://sematext.com/, so I'll skip the company info, and just jump to the meat, which involves a lot of fun work with Solr and/or Elasticsearch: We have an opening for an engineer who knows either Elasticsearch or Solr or both and wants to use these technologies to implement search and analytics solutions for both Sematext's own products http://sematext.com/products/ such as SPM http://sematext.com/spm/ (monitoring, alerting, machine learning-based anomaly detection, etc.) and Logsene http://sematext.com/logsene/ (logging), as well as for Sematext's clients http://sematext.com/clients/. More info at: * http://blog.sematext.com/2014/07/07/job-elasticsearch-solr-engineer/ * http://sematext.com/about/jobs.html Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/
Re: Tomcat or Jetty to use with solr in production ?
Hi Gurunath,

In 90% of our engagements with various Solr customers we see Jetty, which we also recommend and use ourselves for Solr + our own services and products.

Otis

On Mon, Jun 30, 2014 at 5:07 AM, gurunath gurunath@ge.com wrote:

> Hi,
>
> I am confused by the many reviews of Jetty and Tomcat with Solr 4.7. Is there a better option for production? I want to know what complexities to expect with Tomcat and Jetty in the future, as I want to run a cluster with huge data on Solr.
>
> Thanks
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Tomcat-or-Jetty-to-use-with-solr-in-production-tp4144712.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr4 optimization
Hi,

I don't remember the last time I ran optimize. Sure, yes, things will work faster if you optimize an index and reduce the number of segments, but if you are regularly writing to that index and performance is OK, leave it to Lucene segment merges to purge deletes.

Otis

On Mon, Jun 9, 2014 at 4:15 PM, Joshi, Shital shital.jo...@gs.com wrote:

> Hi,
>
> We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. On some of the boxes we have about 5 million deleted docs and we have never run optimization since the beginning. Does the number of deleted docs have anything to do with query performance? Should we consider optimization at all if we're not worried about disk space?
>
> Thanks!
Re: Cache response time
Hi Jeremy,

Nothing in Solr tracks that time. Caches are pluggable. If you really want this info you could write your own cache that is just a proxy for the real cache and then you can time it. But why do you need this info? Do you suspect that is slow?

Otis

On Wed, Jun 4, 2014 at 3:33 PM, Branham, Jeremy [HR] jeremy.d.bran...@sprint.com wrote:

> Is there a JMX metric for measuring the cache request time?
>
> I can see the avg request times, but I'm assuming this includes the cache and non-cache values.
>
> http://wiki.apache.org/solr/SolrPerformanceFactors
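
The proxy idea above can be illustrated independently of Solr. The sketch below is not Solr's cache API (a real pluggable cache would be a Java class implementing Solr's cache interface); it only shows the pattern of wrapping a cache and timing lookups. The names TimedCache and DictCache are made up for the illustration.

```python
import time


class DictCache:
    """Stand-in for the real cache: a plain in-memory map."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class TimedCache:
    """Proxy that forwards to a delegate cache and records lookup time."""

    def __init__(self, delegate, clock=time.perf_counter):
        self._delegate = delegate
        self._clock = clock          # injectable so the timing can be tested
        self.lookups = 0
        self.total_seconds = 0.0

    def get(self, key):
        start = self._clock()
        try:
            return self._delegate.get(key)
        finally:
            # Record the elapsed time even if the delegate raises.
            self.total_seconds += self._clock() - start
            self.lookups += 1

    def put(self, key, value):
        self._delegate.put(key, value)

    def average_seconds(self):
        """Mean time per lookup, or 0.0 if nothing was looked up yet."""
        return self.total_seconds / self.lookups if self.lookups else 0.0
```

Wrapping rather than modifying the cache keeps the measurement separate from the cache logic, so the proxy can be removed once the question of whether the cache is slow has been answered.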
Re: Strange behaviour when tuning the caches
Hi, Have you seen https://wiki.apache.org/solr/CollapsingQParserPlugin ? May help with the field collapsing queries. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Tue, Jun 3, 2014 at 8:41 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: Hi Otis, We saw some improvement when increasing the size of the caches. Since then, we followed Shawn advice on the filterCache and gave some additional RAM to the JVM in order to reduce GC. The performance is very good right now but we are still experiencing some instability but not at the same level as before. With our current settings the number of evictions is actually very low so we might be able to reduce some caches to free up some additional memory for the JVM to use. As for the queries, it is a set of 5 million queries taken from our logs so they vary a lot. All I can say is that all queries involve either grouping/field collapsing and/or radius search around a point. Our largest customer is using a set of 8-10 filters that are translated as fq parameters. The collection contains around 13 million documents distributed on 5 shards with 2 replicas. The second collection has the same configuration and is used for indexing or as a fail-over index in case the first one falls. We`ll keep making adjustments today but we are pretty close of having something that performs while being stable. Thanks all for your help. -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: June-03-14 12:17 AM To: solr-user@lucene.apache.org Subject: Re: Strange behaviour when tuning the caches Hi Jean-Sebastien, One thing you didn't mention is whether as you are increasing(I assume) cache sizes you actually see performance improve? If not, then maybe there is no value increasing cache sizes. I assume you changed only one cache at a time? 
Were you able to get any one of them to the point where there were no evictions without things breaking? What are your queries like, can you share a few examples? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Jun 2, 2014 at 11:09 AM, Jean-Sebastien Vachon jean- sebastien.vac...@wantedanalytics.com wrote: Thanks for your quick response. Our JVM is configured with a heap of 8GB. So we are pretty close of the optimal configuration you are mentioning. The only other programs running is Zookeeper (which has its own storage device) and a proprietary API (with a heap of 1GB) we have on top of Solr to server our customer`s requests. I will look into the filterCache to see if we can better use it. Thanks for your help -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: June-02-14 10:48 AM To: solr-user@lucene.apache.org Subject: Re: Strange behaviour when tuning the caches On 6/2/2014 8:24 AM, Jean-Sebastien Vachon wrote: We have yet to determine where the exact breaking point is. The two patterns we are seeing are: - less cache (around 20-30% hit/ratio), poor performance but overall good stability When caches are too small, a low hit ratio is expected. Increasing them is a good idea, but only increase them a little bit at a time. The filterCache in particular should not be increased dramatically, especially the autowarmCount value. Filters can take a very long time to execute, so a high autowarmCount can result in commits taking forever. Each filter entry can take up a lot of heap memory -- in terms of bytes, it is the number of documents in the core divided by 8. This means that if the core has 10 million documents, each filter entry (for JUST that core) will take over a megabyte of RAM. - more cache (over 90% hit/ratio), improved performance but almost no stability. 
In that case, we start seeing messages such as No shards hosting shard X or cancelElection did not find election node to remove This would not be a direct result of increasing the cache size, unless perhaps you've increased them so they are *REALLY* big and you're running out of RAM for the heap or OS disk cache. Anyone, has any advice on what could cause this? I am beginning to suspect the JVM version, is there any minimal requirements regarding the JVM? Oracle Java 7 is recommended for all releases, and required for Solr 4.8. You just need to stay away from 7u40, 7u45, and 7u51 because of bugs in Java itself. Right now, the latest release is recommended, which is 7u60. The 7u21 release that you are running should be perfectly fine. With six 9.4GB cores per node, you'll achieve
Re: Strange behaviour when tuning the caches
Hi Jean-Sebastien, One thing you didn't mention is whether as you are increasing(I assume) cache sizes you actually see performance improve? If not, then maybe there is no value increasing cache sizes. I assume you changed only one cache at a time? Were you able to get any one of them to the point where there were no evictions without things breaking? What are your queries like, can you share a few examples? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Jun 2, 2014 at 11:09 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: Thanks for your quick response. Our JVM is configured with a heap of 8GB. So we are pretty close of the optimal configuration you are mentioning. The only other programs running is Zookeeper (which has its own storage device) and a proprietary API (with a heap of 1GB) we have on top of Solr to server our customer`s requests. I will look into the filterCache to see if we can better use it. Thanks for your help -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: June-02-14 10:48 AM To: solr-user@lucene.apache.org Subject: Re: Strange behaviour when tuning the caches On 6/2/2014 8:24 AM, Jean-Sebastien Vachon wrote: We have yet to determine where the exact breaking point is. The two patterns we are seeing are: - less cache (around 20-30% hit/ratio), poor performance but overall good stability When caches are too small, a low hit ratio is expected. Increasing them is a good idea, but only increase them a little bit at a time. The filterCache in particular should not be increased dramatically, especially the autowarmCount value. Filters can take a very long time to execute, so a high autowarmCount can result in commits taking forever. Each filter entry can take up a lot of heap memory -- in terms of bytes, it is the number of documents in the core divided by 8. 
This means that if the core has 10 million documents, each filter entry (for JUST that core) will take over a megabyte of RAM. - more cache (over 90% hit/ratio), improved performance but almost no stability. In that case, we start seeing messages such as No shards hosting shard X or cancelElection did not find election node to remove This would not be a direct result of increasing the cache size, unless perhaps you've increased them so they are *REALLY* big and you're running out of RAM for the heap or OS disk cache. Anyone, has any advice on what could cause this? I am beginning to suspect the JVM version, is there any minimal requirements regarding the JVM? Oracle Java 7 is recommended for all releases, and required for Solr 4.8. You just need to stay away from 7u40, 7u45, and 7u51 because of bugs in Java itself. Right now, the latest release is recommended, which is 7u60. The 7u21 release that you are running should be perfectly fine. With six 9.4GB cores per node, you'll achieve the best performance if you have about 60GB of RAM left over for the OS disk cache to use -- the size of your index data on disk. You did mention that you have 92GB of RAM per node, but you have not said how big your Java heap is, or whether there is other software on the machine that may be eating up RAM for its heap or data. http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
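
Shawn's rule of thumb quoted above (each filter entry takes the number of documents in the core divided by 8, in bytes) can be turned into a quick estimate. This is a sketch of that arithmetic only: it ignores per-object overhead and any more compact representation Solr might choose for small result sets, and the cache size of 512 entries in the usage note is a hypothetical value, not one taken from the thread.

```python
def filter_entry_bytes(max_doc):
    """Bytes for one filterCache entry stored as a bitset: one bit per document."""
    return (max_doc + 7) // 8  # round up to whole bytes


def filter_cache_bytes(max_doc, entries):
    """Upper-bound heap estimate for a filterCache holding `entries` bitsets."""
    return filter_entry_bytes(max_doc) * entries
```

For a core with 10 million documents, filter_entry_bytes(10_000_000) gives 1,250,000 bytes per entry, which matches the "over a megabyte" figure, and a cache of 512 such entries would need about 640 MB of heap. This is why large filterCache sizes add up quickly on big cores.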
Re: Uneven shard heap usage
Hi Joe, Are you/how are you sure all 3 shards are roughly the same size? Can you share what you run/see that shows you that? Are you sure queries are evenly distributed? Something like SPM http://sematext.com/spm/ should give you insight into that. How big are your caches? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Sat, May 31, 2014 at 5:54 PM, Joe Gresock jgres...@gmail.com wrote: Interesting thought about the routing. Our document ids are in 3 parts: 10-digit identifier!epoch timestamp!format e.g., 5/12345678!13025603!TEXT Each object has an identifier, and there may be multiple versions of the object, hence the timestamp. We like to be able to pull back all of the versions of an object at once, hence the routing scheme. The nature of the identifier is that a great many of them begin with a certain number. I'd be interested to know more about the hashing scheme used for the document routing. Perhaps the first character gives it more weight as to which shard it lands in? It seems strange that certain of the most highly-searched documents would happen to fall on this shard, but you may be onto something. We'll scrape through some non-distributed queries and see what we can find. On Sat, May 31, 2014 at 1:47 PM, Erick Erickson erickerick...@gmail.com wrote: This is very weird. Are you sure that all the Java versions are identical? And all the JVM parameters are the same? Grasping at straws here. More grasping at straws: I'm a little suspicious that you are using routing. You say that the indexes are about the same size, but is it is possible that your routing is somehow loading the problem shard abnormally? By that I mean somehow the documents on that shard are different, or have a drastically higher number of hits than the other shards? 
You can fire queries at shards with distrib=false and NOT have it go to other shards, perhaps if you can isolate the problem queries that might shed some light on the problem. Best er...@baffled.com On Sat, May 31, 2014 at 8:33 AM, Joe Gresock jgres...@gmail.com wrote: It has taken as little as 2 minutes to happen the last time we tried. It basically happens upon high query load (peak user hours during the day). When we reduce functionality by disabling most searches, it stabilizes. So it really is only on high query load. Our ingest rate is fairly low. It happens no matter how many nodes in the shard are up. Joe On Sat, May 31, 2014 at 11:04 AM, Jack Krupansky j...@basetechnology.com wrote: When you restart, how long does it take it hit the problem? And how much query or update activity is happening in that time? Is there any other activity showing up in the log? If you bring up only a single node in that problematic shard, do you still see the problem? -- Jack Krupansky -Original Message- From: Joe Gresock Sent: Saturday, May 31, 2014 9:34 AM To: solr-user@lucene.apache.org Subject: Uneven shard heap usage Hi folks, I'm trying to figure out why one shard of an evenly-distributed 3-shard cluster would suddenly start running out of heap space, after 9+ months of stable performance. We're using the ! delimiter in our ids to distribute the documents, and indeed the disk size of our shards are very similar (31-32GB on disk per replica). Our setup is: 9 VMs with 16GB RAM, 8 vcpus (with a 4:1 oversubscription ratio, so basically 2 physical CPUs), 24GB disk 3 shards, 3 replicas per shard (1 leader, 2 replicas, whatever). We reserve 10g heap for each solr instance. Also 3 zookeeper VMs, which are very stable Since the troubles started, we've been monitoring all 9 with jvisualvm, and shards 2 and 3 keep a steady amount of heap space reserved, always having horizontal lines (with some minor gc). 
They're using 4-5GB heap, and when we force gc using jvisualvm, they drop to 1GB usage. Shard 1, however, quickly has a steep slope, and eventually has concurrent mode failures in the gc logs, requiring us to restart the instances when they can no longer do anything but gc. We've tried ruling out physical host problems by moving all 3 Shard 1 replicas to different hosts that are underutilized, however we still get the same problem. We'll still be working on ruling out infrastructure issues, but I wanted to ask the questions here in case it makes sense: * Does it make sense that all the replicas on one shard of a cluster would have heap problems, when the other shard replicas do not, assuming a fairly even data distribution? * One thing we changed recently was to make all of our fields stored, instead of only half of them. This was to
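Erick's suggestion in this thread, querying individual shards with distrib=false so the request is not fanned out, could be scripted roughly as follows. The host names and core names are hypothetical placeholders; only the distrib=false parameter comes from the thread. The sketch just builds the URLs and does not send any requests.

```python
from urllib.parse import urlencode


def shard_query_url(core_url: str, query: str, rows: int = 10) -> str:
    """Build a /select URL that is answered by this core only (no fan-out)."""
    params = {"q": query, "rows": rows, "distrib": "false"}
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical core URLs, one replica per shard.
cores = [
    "http://solr-host1:8983/solr/collection1_shard1_replica1",
    "http://solr-host2:8983/solr/collection1_shard2_replica1",
    "http://solr-host3:8983/solr/collection1_shard3_replica1",
]

for core in cores:
    print(shard_query_url(core, "*:*", rows=0))
```

Running the same suspect query against each shard this way and comparing hit counts and response times is one way to see whether the problem shard is receiving a disproportionate share of matches.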
Re: Offline Indexes Update to Shard
Hi, On Wed, May 28, 2014 at 4:25 AM, Vineet Mishra clearmido...@gmail.com wrote: Hi All, Has anyone tried building offline indexes with EmbeddedSolrServer and posting them to shards? What do you mean by posting them to shards? How is that different from copying them manually to the right location in the FS? Could you please elaborate? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ FYI, I am done building the indexes but am looking for a way to post these index files to the shards. Copying the indexes manually to each shard's replica is possible and is working fine but I don't want to go with that approach. Thanks!
Re: Solr High GC issue
Hi Bihan, That's a lot of parameters and without trying one can't really give you very specific and good advice. If I had to suggest something quickly I'd say: * go back to the basics - remove most of those params and stick with the basic ones. Look at GC and tune slowly by changing/adding params one at a time. * consider using G1 GC with the most recent Java7. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, May 29, 2014 at 1:36 AM, bihan.chandu bihan.cha...@gmail.com wrote: Hi All, I am currently using Solr 3.6.1 and my system handles a lot of requests. Now we are facing a high GC issue in the system. Please find the memory parameters of my Solr system below. Can someone help me identify whether there is any relationship between my memory parameters and the GC issue? MEM_ARGS=-Xms7936M -Xmx7936M -XX:NewSize=512M -XX:MaxNewSize=512M -Xss1024k -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled -XX:+AggressiveOpts -XX:LargePageSizeInBytes=2m -XX:+UseLargePages -XX:MaxTenuringThreshold=15 -XX:-UseAdaptiveSizePolicy -XX:PermSize=256M -XX:MaxPermSize=256M -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGC -Xloggc:${GCLOG} -XX:-OmitStackTraceInFastThrow -XX:+DisableExplicitGC -XX:-BindGCTaskThreadsToCPUs -verbose:gc -XX:StackShadowPages=20 Thanks Bihan -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-High-GC-issue-tp4138570.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Contribute QParserPlugin
Hi, I think the question is not really how to do it - that's clear - http://wiki.apache.org/solr/HowToContribute The question is really about whether something like this would be of interest to Solr community, whether it is likely it would be accepted into Solr core or contrib, or whether, perhaps because of potentially unwanted dependency on Redis, Solr dev community might not want this in Solr and this might be better done outside Solr. Not sure what the answer is. maybe active Solr developers can chime in here? Or maybe dev list is a better place to ask? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, May 28, 2014 at 2:03 PM, Alan Woodward a...@flax.co.uk wrote: Hi Pawel, The easiest thing to do is to open a JIRA ticket on the Solr project, here: https://issues.apache.org/jira/browse/SOLR, and attach your patch. Alan Woodward www.flax.co.uk On 28 May 2014, at 16:50, Pawel Rog wrote: Hi, I need QParserPlugin that will use Redis as a backend to prepare filter queries. There are several data structures available in Redis (hash, set, etc.). From some reasons I cannot fetch data from redis data structures, build and send big requests from application. That's why I want to build that filters on backend (Solr) side. I'm wondering what do I have to do to contribute QParserPlugin into Solr repository. Can you suggest me a way (in a few steps) to publish it in Solr repository, probably as a contrib? -- Paweł Róg
Re: Percolator feature
Yes - Luwak. Stay tuned for more. :) Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, May 28, 2014 at 4:44 PM, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Is there some work around in Solr ecosystem to get something similar to the percolator feature offered by elastic search? Greetings!
Re: Contribute QParserPlugin
Hi, On Wed, May 28, 2014 at 10:58 PM, Alexandre Rafalovitch arafa...@gmail.comwrote: Well, Solr just bundled a set of Hadoop jars that does not actually contribute anything to Solr itself (not really integrated, etc). So, I Good point about Hadoop jars. am not sure how the may not want process happened there. Would be nice to have one actually, because there is a slow building wave of external components for Solr which are completely not discoverable by the Solr community at large. Agreed and a Wiki page where people can add this or Google don't cut it? (serious question) So, I would love us to (re-?)start the serious discussion on the plugin model for Solr. Probably on the dev list. Sure. Separate thread? I would even commit to building an initial package discovery/search website if the dev-list powers would agree on how that mechanism (package/plugins/downloads) should look like. ElasticSearch is very obviously benefiting from having a plugin system. Solr's kitchen-sync approach worked when it was the only one. But with increased speed of releases and the growing packages, it is becoming very noticeably pudgy. It even had to be excused during the Solr vs. ElasticSearch presentation at the BerlinBuzz a couple of days ago. For the curious - Alex is referring to http://blog.sematext.com/2014/05/28/presentation-and-video-side-by-side-with-solr-and-elasticsearch/ Re building something - may be best to talk about that in that separate thread. P.s. Regarding the specific issue, I know of another Redis plugin. Not sure how relevant or useful it is, but at least it exists: https://github.com/dfdeshom/solr-redis-cache Thanks. It's different from what Pawel was asking about. Maybe Pawel can provide a couple of examples so people can better understand what he is looking to do. 
Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, May 29, 2014 at 2:50 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I think the question is not really how to do it - that's clear - http://wiki.apache.org/solr/HowToContribute The question is really about whether something like this would be of interest to Solr community, whether it is likely it would be accepted into Solr core or contrib, or whether, perhaps because of potentially unwanted dependency on Redis, Solr dev community might not want this in Solr and this might be better done outside Solr. Not sure what the answer is. maybe active Solr developers can chime in here? Or maybe dev list is a better place to ask? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, May 28, 2014 at 2:03 PM, Alan Woodward a...@flax.co.uk wrote: Hi Pawel, The easiest thing to do is to open a JIRA ticket on the Solr project, here: https://issues.apache.org/jira/browse/SOLR, and attach your patch. Alan Woodward www.flax.co.uk On 28 May 2014, at 16:50, Pawel Rog wrote: Hi, I need QParserPlugin that will use Redis as a backend to prepare filter queries. There are several data structures available in Redis (hash, set, etc.). From some reasons I cannot fetch data from redis data structures, build and send big requests from application. That's why I want to build that filters on backend (Solr) side. I'm wondering what do I have to do to contribute QParserPlugin into Solr repository. Can you suggest me a way (in a few steps) to publish it in Solr repository, probably as a contrib? -- Paweł Róg
Re: Physical Files v. Reported Index Size
Darrell, Look at the top index.x directory in your second image. Looks like that's your index, the same one you see in the Solr UI. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Tue, May 6, 2014 at 11:34 PM, Darrell Burgan darrell.bur...@infor.com wrote: Hello all, I’m trying to reconcile what I’m seeing in the file system for a Solr index versus what it is reporting in the UI. Here’s what I see in the UI for the index: https://s3-us-west-2.amazonaws.com/pa-darrell/ui.png As shown, the index is 74.85 GB in size. However, here is what I see in the data folder of the file system on that server: https://s3-us-west-2.amazonaws.com/pa-darrell/file-system.png As shown, it is consuming 109 GB of space. Also note that one of the index folders is 75 GB in size. My question is why the difference, and whether I can remove some of these index folders to reclaim file system space? Or is there a Solr command to do it (is it as obvious as “Optimize”)? If there is a manual I should RTFM about the file structure, please point me to it. :) Thanks! Darrell *Darrell Burgan* | Architect, Sr. Principal, PeopleAnswers office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | darrell.bur...@infor.com | http://www.infor.com
Re: SolrCloud - Highly Reliable / Scalable Resources?
Hi, Re: we have suffered several issues which always seem quite problematic to resolve. Try grabbing the latest version if you can. We identified a number of issues in older SolrCloud versions when working on large client setups with thousands of cores, but a lot of those issues have been fixed in the more recent versions. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, May 12, 2014 at 9:53 AM, Darren Lee d...@amplience.com wrote: Hi everyone, We have been using Solr Cloud (4.4) for ~ 6 months now. Functionally it's excellent but we have suffered several issues which always seem quite problematic to resolve. I was wondering if anyone in the community can recommend good resources / reading for setting up a highly scalable / highly reliable cluster. A lot of what I see in the solr documentation is aimed at small setups or is quite sparse. Dealing with topics like: * Capacity planning * Losing nodes * Voting panic * Recovery failure * Replication factors * Elasticity / Auto scaling / Scaling recipes * Exhibitor * Container configuration, concurrency limits, packet drop tuning * Increasing capacity without downtime * Scalable approaches to full indexing hundreds of millions of documents * External health check vs CloudSolrServer * Separate vs local zookeeper * Benchmarks Sorry, I know that's a lot to ask heh. We are going to run a project for a month or so soon where we re-write all our run books and do deeper testing on various failure scenarios and the above but any starting point would be much appreciated. Thanks all, Darren
Re: Anybody uses Solr JMX?
Alexandre, you could use something like http://blog.sematext.com/2012/09/25/new-tool-jmxc-jmx-console/ to quickly dump everything out of JMX and see if there is anything there Solr Admin UI doesn't expose. I think you'll find there is more in JMX than Solr Admin UI shows. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, May 5, 2014 at 1:56 AM, Alexandre Rafalovitch arafa...@gmail.comwrote: Thank you everybody for the links and explanations. I am still curious whether JMX exposes more details than the Admin UI? I am thinking of a troubleshooting context, rather than long-term monitoring one. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Mon, May 5, 2014 at 12:21 PM, Gora Mohanty g...@mimirtech.com wrote: On May 5, 2014 7:09 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: I have religiously kept jmx statement in my solrconfig.xml, thinking it was enabling the web interface statistics output. But looking at the server logs really closely, I can see that JMX is actually disabled without server present. And the Admin UI does not actually seem to care after a quick test. Does anybody have a real experience with Solr JMX? Does it expose more information than Admin UI's Plugins/Stats page? Is it good for Have not been using JMX lately, but we were using it in the past. It does allow monitoring many useful details. As others have commented, it also integrates well with other monitoring tools as JMX is a standard. Regards, Gora
Re: How to get a list of currently executing queries?
No, though one could write a custom SearchComponent, I imagine. Not terribly useful for most situations where queries typically run for only a few milliseconds, but Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, Apr 17, 2014 at 7:34 AM, Nikhil Chhaochharia nikhil...@yahoo.com wrote: Hello, Is there some way of getting a list of all queries that are currently executing? Something similar to 'show full processlist' in MySQL. Thanks, Nikhil
Re: TB scale
Hi Ed, Unfortunately, there is no good *general* advice, so you'd need to provide a lot more detail to get useful help. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Fri, Apr 25, 2014 at 3:48 PM, Ed Smiley esmi...@ebrary.com wrote: Anyone with experience, suggestions or lessons learned in the 10 -100 TB scale they'd like to share? Researching optimum design for a Solr Cloud with, say, about 20TB index. - Thanks Ed Smiley, Senior Software Architect, Ebooks ProQuest | 161 Evelyn Ave. | Mountain View, CA 94041 USA | +1 640 475 8700 ext. 3772 ed.smi...@proquest.com | http://www.proquest.com/ | http://www.ebrary.com/ | http://www.eblib.com/ ebrary and EBL, ProQuest businesses
Re: SolrCloud load balancing during heavy indexing
Hi, On Fri, Apr 25, 2014 at 12:54 PM, zzT zis@gmail.com wrote: Erick Erickson wrote Back up, you're misunderstanding the update process. A leader node distributes the update to every replica. So _all_ your nodes in a slice are indexing when _any_ of them index. So the idea of sending queries to just the replicas to avoid performance problems isn't relevant. Hmm, I thought that it's not actual indexing taking place on the replicas but that the changes were somehow transferred to the replicas and thus it was less intensive for them. Unfortunately that's not the case. Each node that gets a doc still has to analyze and index it. I think at some point I sent a message to the list and/or created a JIRA issue to suggest doing analysis on just the receiving node, in which case the other nodes that need to index could skip that step and do a little less work, but that hasn't been implemented yet. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ Erick Erickson wrote In order to support NRT and HA/DR, it's required that all the nodes be ready to take over, so the notion of the leader being the only node that actually indexed the documents then distributing only the indexed document to the other members of the slice isn't how it's done. So, this is where SolrCloud is different from legacy master/slave configuration? I mean master/slave sends segments to the slaves using e.g. rsync while SolrCloud forwards the indexing request to replicas where it's processed locally on each replica, right? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-load-balancing-during-heavy-indexing-tp4133099p4133160.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search for a mask that matches the requested string
Luwak is not based on the fork of Lucene or rather, the fork you are seeing is there only because the Luwak authors needed highlighting. If you don't need highlighting you can probably modify Luwak a bit to use regular Lucene. The Lucene fork you are seeing there will also, eventually, be committed to Lucene trunk and then hopefully backported to 4.x. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Fri, Apr 25, 2014 at 6:46 PM, Muhammad Gelbana m.gelb...@gmail.comwrote: Luwak is based on a fork of solr\lucene which I cannot use. I have to do this using solr 4.6, whether by writing extra code or not. Thanks. *-* *Muhammad Gelbana* http://www.linkedin.com/in/mgelbana On Sat, Apr 26, 2014 at 12:13 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, You don't need to write code for this. Use luwak (I gave the link in my first e-mail) instead. If your can't get luwak running because its too complicated etc, see a similar discussion http://find.searchhub.org/document/9411388c7d2de701#36e50082e918b10c where diy-percolator example pointer is given. It is an example to use memory index. Ahmet On Saturday, April 26, 2014 1:05 AM, Muhammad Gelbana m.gelb...@gmail.com wrote: @Jack, I am ready to write custom code to implement such feature but I don't know what feature in solr should I extend ? Where should I start ? I believe it should be a very simple task. @Ahmet, how can I use the class you mentioned ? Is there a tutorial for it ? I'm not sure how the code in the class's description should work, I've never extended solr before. Thank you all. *-* *Muhammad Gelbana* http://www.linkedin.com/in/mgelbana On Fri, Apr 25, 2014 at 10:38 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Your use case is different than ad hoc retrieval. Where you have set of documents and varying queries. In your case it is the reverse, you have a query (string masks) stored A?, and incoming documents are percolated against it. 
out of the box Solr does not have support for this today. Please see : http://lucene.apache.org/core/4_7_2/memory/org/apache/lucene/index/memory/MemoryIndex.html By the way wildcard ? matches a single character. Ahmet On Friday, April 25, 2014 11:02 PM, Muhammad Gelbana m.gelb...@gmail.com wrote: I have no idea how can this help me. I have been using solr for a few weeks and I'm not familiar with it yet. I'm asking for a very simple task, a way to customize how solr matches a string, does this exist in solr ? *-* *Muhammad Gelbana* http://www.linkedin.com/in/mgelbana On Thu, Apr 24, 2014 at 10:09 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Please see : https://github.com/flaxsearch/luwak Ahmet On Thursday, April 24, 2014 8:40 PM, Muhammad Gelbana m.gelb...@gmail.com wrote: (Please make sure you reply to my address because I didn't subscribe to this mailing list) I'm using Solr 4.6 I need to store string masks in Solr. By masks, I mean strings that can match other strings. Then I need to search for masks that match the string I'm providing in my query. For example, assume the following single-field document stored in Solr: { fieldA: __A__ } I need to be able to find this document if I query the fieldA field with a string like *12A34*, as the underscore *_* matches a single string. The single string matching mechanism is my strict goal here, multiple string matching won't be helpful. I hope I was clear enough. Please elaborate because I'm not versatile with solr and I haven't been using it for too long. Thank you. *-* *Muhammad Gelbana* http://www.linkedin.com/in/mgelbana
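To make the requirement in this thread concrete: the stored values are masks, and the incoming string is matched against each of them, with an underscore standing for exactly one character. Below is a minimal brute-force sketch of that matching logic in plain Python. It is not how Luwak or MemoryIndex works and it does not touch Solr at all; the function names are made up for illustration.

```python
import re


def mask_to_regex(mask: str, wildcard: str = "_") -> re.Pattern:
    """Turn a mask such as '__A__' into a regex where the wildcard matches one character."""
    parts = ["." if ch == wildcard else re.escape(ch) for ch in mask]
    return re.compile("".join(parts), re.DOTALL)


def matching_masks(masks, value):
    """Return the stored masks that match the whole incoming string."""
    return [m for m in masks if mask_to_regex(m).fullmatch(value)]


stored = ["__A__", "A____", "__B__", "_A_"]
print(matching_masks(stored, "12A34"))  # ['__A__']
```

Checking every stored mask on every request only works for a small number of masks; a percolator-style tool such as Luwak exists precisely to avoid that linear scan.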
Re: Application of different stemmers / stopword lists within a single field
Hi Tim, Step one is probably to detect language boundaries. You know your data. If they happen on paragraph breaks, your job will be easier. If they don't, a bit harder, but not impossible at all. I'm sure there is a ton of research on this topic out there, but the obvious approach would involve dictionaries and individual terms or shingle lookups, keeping track of the current language or language of last N terms and watching out for a switch. Once you have that you'd know the language of each paragraph. At that point you'd feed those into Solr in separate language-specific fields. Of course, the other side of this is often the more complicated one - identifying the language of the query. The problem is they are short. But you can handle it via UI, via user preferences, via a combination of these things, etc. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Fri, Apr 25, 2014 at 6:34 AM, Timothy Hill timothy.d.h...@gmail.comwrote: This may not be a practically solvable problem, but the company I work for has a large number of lengthy mixed-language documents - for example, scholarly articles about Islam written in English but containing lengthy passages of Arabic. Ideally, we would like users to be able to search both the English and Arabic portions of the text, using the full complement of language-processing tools such as stemming and stopword removal. The problem, of course, is that these two languages co-occur in the same field. Is there any way to apply different processing to different words or paragraphs within a single field through language detection? Is this to all intents and purposes impossible within Solr? Or is another approach (using language detection to split the single large field into language-differentiated smaller fields, for example) possible/recommended? Thanks, Tim Hill
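For the English/Arabic case described in this thread, the paragraph-level approach Otis outlines might be sketched as below. This is a deliberately simple heuristic that relies on the two languages using different scripts; it is not the dictionary or shingle lookup he mentions, and it would not separate languages that share a script. The field names text_en and text_ar are assumptions for illustration, not names from the thread.

```python
def dominant_script(paragraph: str) -> str:
    """Guess 'ar' or 'en' by counting Arabic-block letters versus ASCII letters."""
    arabic = sum(1 for ch in paragraph if "\u0600" <= ch <= "\u06FF")
    latin = sum(1 for ch in paragraph if ch.isascii() and ch.isalpha())
    return "ar" if arabic > latin else "en"


def split_by_language(text: str) -> dict:
    """Route each blank-line-separated paragraph to a language-specific field."""
    fields = {"text_en": [], "text_ar": []}
    for para in text.split("\n\n"):
        para = para.strip()
        if para:
            fields[f"text_{dominant_script(para)}"].append(para)
    return fields


sample = (
    "A scholarly introduction in English.\n\n"
    "\u0628\u0633\u0645 \u0627\u0644\u0644\u0647\n\n"  # a short Arabic phrase
    "Further commentary in English."
)
doc = split_by_language(sample)
print(len(doc["text_en"]), len(doc["text_ar"]))  # 2 1
```

Each resulting field could then be mapped to a field type with the appropriate stemmer and stopword list, which leaves the harder problem from the reply: working out which language a short query is in.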
Re: What contributes to disk IO?
Lucene segment merges cause both reads and writes. If you look at SPM, you'll see the number of index files and the number of segments, which will give you an idea what's going on at that level. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Tue, Mar 25, 2014 at 8:12 PM, Software Dev static.void@gmail.com wrote: What are the main contributing factors for Solr Cloud generating a lot of disk IO? A lot of reads? Writes? Insufficient RAM? I would think if there was enough disk cache available for the whole index there would be little to no disk IO.
Re: w/10 ? [was: Partial Counts in SOLR]
I think SQP is getting axed, no? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Mar 24, 2014 at 3:45 PM, T. Kuro Kurosaka k...@healthline.com wrote: On 3/19/14 5:13 PM, Otis Gospodnetic wrote: Hi, Guessing it's surround query parser's support for within backed by span queries. Otis You mean this? http://wiki.apache.org/solr/SurroundQueryParser I guess this parser's documentation needs improvement. It doesn't explain or give an example of the w/int syntax at all. (Is this the infix notation of W?) An example would help explain the difference between W and N; some readers may not understand what ordered and unordered mean in this context. Kuro
Re: Limit on # of collections -SolrCloud
Hours sounds too long indeed. We recently had a client with several thousand collections, but restart wasn't taking hours... Otis Solr ElasticSearch Support http://sematext.com/ On Mar 20, 2014 5:49 PM, Erick Erickson erickerick...@gmail.com wrote: How many total replicas are we talking here? As in how many shards and, for each shard, how many replicas? I'm not asking for a long list here, just if you have a bazillion replicas in aggregate. Hours is surprising. Best, Erick On Thu, Mar 20, 2014 at 2:17 PM, Chris W chris1980@gmail.com wrote: Thanks, Shalin. Making clusterstate.json on a collection basis sounds awesome. I am not having problems with #2 . #3 is a major time hog in my environment. I have over 300 +collections and restarting the entire cluster takes in the order of hours. (2-3 hour). Can you explain more about the leaderVoteWait setting? On Thu, Mar 20, 2014 at 1:28 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: There are no arbitrary limits on the number of collections but yes there are practical limits. For example, the cluster state can become a bottleneck. There is a lot of work happening on finding and addressing these problems. See https://issues.apache.org/jira/browse/SOLR-5381 Boot up time is because of: 1) Core discovery, schema/config parsing etc 2) Transaction log replay on startup 3) Wait time for enough replicas to become available before leader election happens You can't do much about 1 right now I think. For #2, you can keep your transaction logs smaller by a hard commit before shutdown. For #3 there is a leaderVoteWait settings but I'd rather not touch that unless it becomes a problem. On Fri, Mar 21, 2014 at 1:39 AM, Chris W chris1980@gmail.com wrote: Hi there Is there a limit on the # of collections solrcloud can support? Can zk/solrcloud handle 1000s of collections? Also i see that the bootup time of solrcloud increases with increase in # of cores. I do not have any expensive warm up queries. How do i speedup solr startup? 
-- Best -- C -- Regards, Shalin Shekhar Mangar. -- Best -- C
Re: w/10 ? [was: Partial Counts in SOLR]
Hi, Guessing it's surround query parser's support for within backed by span queries. Otis Solr ElasticSearch Support http://sematext.com/ On Mar 19, 2014 4:44 PM, T. Kuro Kurosaka k...@healthline.com wrote: In the thread Partial Counts in SOLR, Salman gave us this sample query: ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or purchase* or repurchase*)) w/10 (executive or director) I'm not familiar with this w/10 notation. What does this mean, and what parser(s) supports this syntax? Kuro
Re: Excessive Heap Usage from docValues?
Hi, Which type of doc values? See Wiki or reference guide for a list of types. Otis Solr ElasticSearch Support http://sematext.com/ On Mar 19, 2014 5:02 PM, tradergene nos...@krevets.com wrote: Hello All, I'm hoping to get your assistance in debugging what seems like a memory issue. I have a Solr index with about 32 million docs. Each doc is relatively small but has multiple dynamic fields that are storing INTs. The initial problem that I had to resolve is that we were running into OOMs (on a 48GB heap, 130GB on-disk index). I narrowed that issue down to Lucene FieldCache filling up the heap due to all the dynamic fields. To mitigate this, I enabled docValues on the schema for many of the dynamicField culprits. This dropped the FieldCache down to almost nothing. Now, when re-indexing for docValues functionality, I ran into OOMs as soon as I reached 12 million of the 32 million documents. Before enabling docValues, I was able to load up Solr on a 48GB heap but ran into problems after enough unique searches occurred (normal FieldCache issue). Now, with docValues, a 48GB heap is giving me OOM after 12 million docs indexed. I split the collection into 10 shards and with 2 nodes (48GB heap each) was able to get up to 21 million docs indexed. Now, I've had to move the shards to more nodes and am up to 10 shards across 4 nodes and am hoping to be able to get all 32 million docs indexed. This will be 48GB x 4 heap which seems really excessive for an index that was only 132GB pre-docValues. I would love some thoughts as to whether I'm expecting too much efficiency with docValues enabled. I was under the impression that docValues would increase storage requirements on disk (which it has), but l thought that RAM usage would go down during searching (which I haven't tested) as well as indexing. Thanks for any assistance anyone can provide. 
Gene -- View this message in context: http://lucene.472066.n3.nabble.com/Excessive-Heap-Usage-from-docValues-tp4125577.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing large documents
Hi, I think you probably want to split giant documents because you / your users probably want to be able to find smaller sections of those big docs that are best matches to their queries. Imagine querying War and Peace. Almost any regular word your query for will produce a match. Yes, you may want to enable field collapsing aka grouping. I've seen facet counts get messed up when grouping is turned on, but have not confirmed if this is a (known) bug or not. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Tue, Mar 18, 2014 at 10:52 PM, Stephen Kottmann stephen_kottm...@h3biomedicine.com wrote: Hi Solr Users, I'm looking for advice on best practices when indexing large documents (100's of MB or even 1 to 2 GB text files). I've been hunting around on google and the mailing list, and have found some suggestions of splitting the logical document up into multiple solr documents. However, I haven't been able to find anything that seems like conclusive advice. Some background... We've been using solr with great success for some time on a project that is mostly indexing very structured data - ie. mainly based on ingesting through DIH. I've now started a new project and we're trying to make use of solr again - however, in this project we are indexing mostly unstructured data - pdfs, powerpoint, word, etc. I've not done much configuration - my solr instance is very close to the example provided in the distribution aside from some minor schema changes. Our index is relatively small at this point ( ~3k documents ), and for initial indexing I am pulling documents from a http data source, running them through Tika, and then pushing to solr using solrj. For the most part this is working great... until I hit one of these huge text files and then OOM on indexing. I've got a modest JVM - 4GB allocated. 
Obviously I can throw more memory at it, but it seems like maybe there's a more robust solution that would scale better. Is splitting the logical document into multiple solr documents best practice here? If so, what are the considerations or pitfalls of doing this that I should be paying attention to. I guess when querying I always need to use a group by field to prevent multiple hits for the same document. Are there issues with term frequency, etc that you need to work around? Really interested to hear how others are dealing with this. Thanks everyone! Stephen -- [This e-mail message may contain privileged, confidential and/or proprietary information of H3 Biomedicine. If you believe that it has been sent to you in error, please contact the sender immediately and delete the message including any attachments, without copying, using, or distributing any of the information contained therein. This e-mail message should not be interpreted to include a digital or electronic signature that can be used to authenticate an agreement, contract or other legal document, nor to reflect an intention to be bound to any legally-binding agreement or contract.]
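A minimal sketch of the splitting approach discussed in this thread, in Python. The field names (`id`, `parent_id`, `chunk_no`, `text`) and the chunk size are assumptions made up for illustration and are not taken from Stephen's schema. The only Solr features referenced are the standard `group` and `group.field` result-grouping parameters.

```python
def split_document(doc_id, text, max_chars=100_000):
    """Split one large logical document into smaller Solr documents.

    All field names here are hypothetical. Every chunk carries parent_id
    so hits can be collapsed back to the logical document at query time.
    """
    chunks = []
    for chunk_no, start in enumerate(range(0, len(text), max_chars)):
        chunks.append({
            "id": f"{doc_id}_{chunk_no}",      # unique key per chunk
            "parent_id": doc_id,               # shared by all chunks of one file
            "chunk_no": chunk_no,              # position inside the original
            "text": text[start:start + max_chars],
        })
    return chunks


# Query parameters that collapse hits to one group per logical document.
GROUPING_PARAMS = {"group": "true", "group.field": "parent_id"}
```

Cutting at a fixed character offset can split a word or sentence in half, so a real splitter would more likely break on paragraph or section boundaries. Term frequencies are then computed per chunk rather than per file, which is part of what Stephen is asking about.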
Re: Help me understand these newrelic graphs
Are you trying to bring that 24.9 ms response time down? Looks like there is room for more aggressive sharing there, yes. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Fri, Mar 14, 2014 at 1:07 PM, Software Dev static.void@gmail.comwrote: Here is a screenshot of the host information: http://postimg.org/image/vub5ihxix/ As you can see we have 24 core CPU's and the load is only at 5-7.5. On Fri, Mar 14, 2014 at 10:02 AM, Software Dev static.void@gmail.com wrote: If that is the case, what would help? On Thu, Mar 13, 2014 at 8:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: It really depends, hard to give a definitive instruction without more pieces of info. e.g. if your CPUs are all maxed out and you already have a high number of concurrent queries than sharding may not be of any help at all. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, Mar 13, 2014 at 7:42 PM, Software Dev static.void@gmail.com wrote: Ahh.. its including the add operation. That makes sense I then. A bit silly on NR's part they don't break it down. Otis, our index is only 8G so I don't consider that big by any means but our queries can get a bit complex with a bit of faceting. Do you still think it makes sense to shard? How easy would this be to get working? On Thu, Mar 13, 2014 at 4:02 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I think NR has support for breaking by handler, no? Just checked - no. Only webapp controller, but that doesn't apply to Solr. SPM should be more helpful when it comes to monitoring Solr - you can filter by host, handler, collection/core, etc. -- you can see the demo - https://apps.sematext.com/demo - though this is plain Solr, not SolrCloud. If your index is big or queries are complex, shard it and parallelize search. 
Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, Mar 13, 2014 at 6:17 PM, ralph tice ralph.t...@gmail.com wrote: I think your response time is including the average response for an add operation, which generally returns very quickly and due to sheer number are averaging out the response time of your queries. New Relic should break out requests based on which handler they're hitting but they don't seem to. On Thu, Mar 13, 2014 at 2:18 PM, Software Dev static.void@gmail.com wrote: Here are some screen shots of our Solr Cloud cluster via Newrelic http://postimg.org/gallery/2hyzyeyc/ We currently have a 5 node cluster and all indexing is done on separate machines and shipped over. Our machines are running on SSD's with 18G of ram (Index size is 8G). We only have 1 shard at the moment with replicas on all 5 machines. I'm guessing thats a bit of a waste? How come when we do our bulk updating the response time actually decreases? I would think the load would be higher therefor response time should be higher. Any way I can decrease the response time? Thanks
Re: Help me understand these newrelic graphs
Hi, I think NR has support for breaking by handler, no? Just checked - no. Only webapp controller, but that doesn't apply to Solr. SPM should be more helpful when it comes to monitoring Solr - you can filter by host, handler, collection/core, etc. -- you can see the demo - https://apps.sematext.com/demo - though this is plain Solr, not SolrCloud. If your index is big or queries are complex, shard it and parallelize search. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, Mar 13, 2014 at 6:17 PM, ralph tice ralph.t...@gmail.com wrote: I think your response time is including the average response for an add operation, which generally returns very quickly and due to sheer number are averaging out the response time of your queries. New Relic should break out requests based on which handler they're hitting but they don't seem to. On Thu, Mar 13, 2014 at 2:18 PM, Software Dev static.void@gmail.com wrote: Here are some screen shots of our Solr Cloud cluster via Newrelic http://postimg.org/gallery/2hyzyeyc/ We currently have a 5 node cluster and all indexing is done on separate machines and shipped over. Our machines are running on SSD's with 18G of ram (Index size is 8G). We only have 1 shard at the moment with replicas on all 5 machines. I'm guessing thats a bit of a waste? How come when we do our bulk updating the response time actually decreases? I would think the load would be higher therefor response time should be higher. Any way I can decrease the response time? Thanks
Re: Solr supports log-based recovery?
Skimmed this, but yes, docs are durable thanks to transaction log that can replay on start. Otis Solr ElasticSearch Support http://sematext.com/ On Mar 13, 2014 8:25 PM, shushuai zhu ss...@yahoo.com wrote: Hi, I noticed the following post indicating that Solr could recover not-committed data from operational log: http://www.opensourceconnections.com/2013/04/25/understanding-solr-soft-commits-and-data-durability/ which contradicts with Solr's web site: https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching that seems to indicate that data soft-committed before the last hard-commit is lost. I reproduced what the author did in the first post (the two lessons he listed) with Solr 4.7, and specifically compared below two experiments: I posted some records to Solr without commit I could not view the records on browser after that since I set soft-commit in 5 seconds After 5 seconds, I can view the records on browser Hard commit still does not happen since I set it in 60 seconds Kill the Solr with a kill -9 processId Keep the log file Re-start the Solr I could see the records via browser I think the hard-commit does not happen in the above experiment, since in a different experiment, I got: I posted some records to Solr without commit I could not view the records on browser after that since I set soft-commit in 5 seconds After 5 seconds, I can view the records on browser Hard commit still does not happen since I set it in 60 seconds Kill the Solr with a kill -9 processId Remove the log file Re-start the Solr I could NOT see the records via browser This means Solr supports some database-like recovery (based on log). So, as long as the log exists, after a crash, Solr can still recover from the log. Any comments or idea? Thanks. Shushuai
Re: Help me understand these newrelic graphs
It really depends, hard to give a definitive instruction without more pieces of info. e.g. if your CPUs are all maxed out and you already have a high number of concurrent queries than sharding may not be of any help at all. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, Mar 13, 2014 at 7:42 PM, Software Dev static.void@gmail.comwrote: Ahh.. its including the add operation. That makes sense I then. A bit silly on NR's part they don't break it down. Otis, our index is only 8G so I don't consider that big by any means but our queries can get a bit complex with a bit of faceting. Do you still think it makes sense to shard? How easy would this be to get working? On Thu, Mar 13, 2014 at 4:02 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I think NR has support for breaking by handler, no? Just checked - no. Only webapp controller, but that doesn't apply to Solr. SPM should be more helpful when it comes to monitoring Solr - you can filter by host, handler, collection/core, etc. -- you can see the demo - https://apps.sematext.com/demo - though this is plain Solr, not SolrCloud. If your index is big or queries are complex, shard it and parallelize search. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, Mar 13, 2014 at 6:17 PM, ralph tice ralph.t...@gmail.com wrote: I think your response time is including the average response for an add operation, which generally returns very quickly and due to sheer number are averaging out the response time of your queries. New Relic should break out requests based on which handler they're hitting but they don't seem to. 
On Thu, Mar 13, 2014 at 2:18 PM, Software Dev static.void@gmail.com wrote: Here are some screen shots of our Solr Cloud cluster via Newrelic http://postimg.org/gallery/2hyzyeyc/ We currently have a 5 node cluster and all indexing is done on separate machines and shipped over. Our machines are running on SSD's with 18G of ram (Index size is 8G). We only have 1 shard at the moment with replicas on all 5 machines. I'm guessing thats a bit of a waste? How come when we do our bulk updating the response time actually decreases? I would think the load would be higher therefor response time should be higher. Any way I can decrease the response time? Thanks
Disabling lookups into disabled caches?
Hi, Is there a way to disable cache *lookups* into cached that are disabled? Check this for example: https://apps.sematext.com/spm-reports/s/Z04bfIvGyH This is a Document cache that was enabled, and then got disabled. But the lookups are still happening, which is pointless if the cache is disabled. If that's not doable, I will JIRA? Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/
Re: Disabling lookups into disabled caches?
Hi Shawn, Here it is: https://issues.apache.org/jira/browse/SOLR-5851 Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Tue, Mar 11, 2014 at 11:22 PM, Shawn Heisey s...@elyograg.org wrote: On 3/11/2014 8:51 PM, Shawn Heisey wrote: On 3/11/2014 8:07 PM, Otis Gospodnetic wrote: Is there a way to disable cache *lookups* into cached that are disabled? Check this for example: https://apps.sematext.com/spm-reports/s/Z04bfIvGyH This is a Document cache that was enabled, and then got disabled. But the lookups are still happening, which is pointless if the cache is disabled. If that's not doable, I will JIRA? I think this needs an issue. I've worked up a *possible* patch for the problem. One that still needs testing and review. Which reminds me, I should probably invent new test methods for this. The lookups should have very little overhead, but any avoidable overhead *should* be avoided. The quickfix that I started with on FastLRUCache didn't work and made most of the tests fail. It turns out that FastLRUCache bumps the max cache size to 2 when you set it to zero. I haven't looked deeper into the other cache types yet. Once you create the issue, we can move this discussion there. Thanks, Shawn
Re: Mixing lucene scoring and other scoring
Hi Benson, http://lucene.apache.org/core/4_7_0/expressions/org/apache/lucene/expressions/Expression.html https://issues.apache.org/jira/browse/SOLR-5707 That? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, Mar 6, 2014 at 8:34 AM, Benson Margulies bimargul...@gmail.comwrote: Some months ago, I talked to some people at LR about this, but I can't find my notes. Imagine a function of some fields that produces a score between 0 and 1. Imagine that you want to combine this score with relevance over some more or less complex ordinary query. What are the options, given the arbitrary nature of Lucene scores?
Re: Solr Filter Cache Size
What Erick said. That's a giant Filter Cache. Have a look at these Solr metrics and note the Filter Cache in the middle: http://www.flickr.com/photos/otis/8409088080/ Note how small the cache is and how high the hit rate is. Those are stats for http://search-lucene.com/ and http://search-hadoop.com/ where you can see facets on the right that end up being used as filter queries. Most Solr apps I've seen had small Filter Caches. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote: This, BTW, is an ENORMOUS number of cached queries. Here's a rough guide: Each entry will be (length of query) + maxDoc/8 bytes long. Think of the filterCache as a map where the key is the query and the value is a bitmap large enough to hold maxDoc bits. BTW, I'd kick this back to the default (512?) and periodically check it with the admin Plugins/Stats page to see what kind of hit ratio I have and adjust from there. Best, Erick On Mon, Mar 3, 2014 at 11:00 AM, Benjamin Wiens benjamin.wi...@gmail.com wrote: How can we calculate how much heap memory the filter cache will consume? We understand that in order to determine a good size we also need to evaluate how many filter queries would be used over a certain time period. Here's our setting: filterCache class=solr.FastLRUCache size=30 initialSize=30 autowarmCount=5/ According to the post below, 53 GB of RAM would be needed just by the filter cache alone with 1.4 million Docs. Not sure if this is true and how this would work. Reference: http://stackoverflow.com/questions/2004/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem We filled the filter query cache with Solr Meter and had a JVM Heap Size of far less than 53 GB. Can anyone chime in and enlighten us? Thank you! Ben Wiens Benjamin Mosior
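Erick's rule of thumb can be turned into a quick back-of-the-envelope calculation. The sketch below uses Ben's figure of 1.4 million docs and the 512-entry size that Erick mentions with a question mark. The function name and the zero default for query length are assumptions for illustration, and the result is a rough estimate for a full cache, not a measurement.

```python
def filter_cache_bytes(max_doc, entries, avg_query_len=0):
    """Rough filterCache size from Erick's guide:
    each entry is about (length of query) + maxDoc/8 bytes."""
    per_entry = avg_query_len + max_doc / 8   # one bit per document
    return entries * per_entry


max_doc = 1_400_000
print(filter_cache_bytes(max_doc, 1))     # 175000.0 bytes per cached filter
print(filter_cache_bytes(max_doc, 512))   # 89600000.0 bytes, roughly 90 MB
```

At about 175 KB per entry, the total grows linearly with the configured `size`, so it only reaches tens of gigabytes with several hundred thousand cached filters.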
Re: Indexing huge data
Hi, 6M is really not huge these days. 6B is big, though also still not huge any more. What seems to be the bottleneck? Solr or DB or network or something else? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 2:37 PM, Rallavagu rallav...@gmail.com wrote: All, Wondering about best practices/common practices to index/re-index a huge amount of data in Solr. The data is about 6 million entries in the db and other sources (data is not located in one resource). Trying a solrj-based solution to collect data from different resources and index it into Solr. It takes hours to index into Solr. Thanks in advance
Re: Indexing huge data
Hi, It depends. Are docs huge or small? Server single core or 32 core? Heap big or small? etc. etc. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 3:02 PM, Rallavagu rallav...@gmail.com wrote: It seems the latency is introduced by collecting the data from different sources and putting them together then actual Solr index. I would say all these activities are contributing equally though I would say So, is it normal to expect to run indexing to run for long? Wondering what to expect in such cases. Thanks. On 3/5/14, 11:47 AM, Otis Gospodnetic wrote: Hi, 6M is really not huge these days. 6B is big, though also still not huge any more. What seems to be the bottleneck? Solr or DB or network or something else? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 2:37 PM, Rallavagu rallav...@gmail.com wrote: All, Wondering about best practices/common practices to index/re-index huge amount of data in Solr. The data is about 6 million entries in the db and other source (data is not located in one resource). Trying with solrj based solution to collect data from difference resources to index into Solr. It takes hours to index Solr. Thanks in advance
Re: Indexing huge data
Hi, Each doc is 100K? That's on the big side, yes, and the server seems on the small side, yes. Hence the speed. :) Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 3:37 PM, Rallavagu rallav...@gmail.com wrote: Otis, Good points. I guess you are suggesting that it depends on the resources. The document is 100k each the pre processing server is a 2 cpu VM running with 4G RAM. So, that could be a small machine relatively to process such amount of data?? On 3/5/14, 12:27 PM, Otis Gospodnetic wrote: Hi, It depends. Are docs huge or small? Server single core or 32 core? Heap big or small? etc. etc. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 3:02 PM, Rallavagu rallav...@gmail.com wrote: It seems the latency is introduced by collecting the data from different sources and putting them together then actual Solr index. I would say all these activities are contributing equally though I would say So, is it normal to expect to run indexing to run for long? Wondering what to expect in such cases. Thanks. On 3/5/14, 11:47 AM, Otis Gospodnetic wrote: Hi, 6M is really not huge these days. 6B is big, though also still not huge any more. What seems to be the bottleneck? Solr or DB or network or something else? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 2:37 PM, Rallavagu rallav...@gmail.com wrote: All, Wondering about best practices/common practices to index/re-index huge amount of data in Solr. The data is about 6 million entries in the db and other source (data is not located in one resource). Trying with solrj based solution to collect data from difference resources to index into Solr. It takes hours to index Solr. Thanks in advance
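To put the numbers from this thread side by side, here is a small Python estimate. It assumes "100k" means roughly 100,000 bytes per document, and the four-hour target is a made-up figure for illustration. Neither assumption comes from Rallavagu's setup beyond the 6 million and 100k figures he quotes.

```python
def total_bytes(doc_count, bytes_per_doc):
    """Raw volume that has to be collected, pre-processed and sent to Solr."""
    return doc_count * bytes_per_doc


def required_mb_per_s(doc_count, bytes_per_doc, hours):
    """Sustained throughput (decimal MB/s) needed to finish within `hours`."""
    return total_bytes(doc_count, bytes_per_doc) / (hours * 3600) / 1_000_000


docs, size = 6_000_000, 100_000              # figures quoted in the thread
print(total_bytes(docs, size))               # 600000000000 bytes, i.e. 600 GB
print(round(required_mb_per_s(docs, size, 4), 1))  # 41.7 MB/s for a 4-hour run
```

Seen this way, the 2-CPU, 4G pre-processing VM has to move a lot of data, which fits Otis's point that the documents are on the big side and the server on the small side.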
Re: Scalability Limit of SolrCloud
It depends on hardware, your latency requirements and such. We've helped customers with several billion documents, so big numbers alone are not a problem. Otis Solr ElasticSearch Support http://sematext.com/ On Feb 27, 2014 6:47 AM, Vineet Mishra clearmido...@gmail.com wrote: Hi All What is the Scalability Limit of CloudSolr, can it reach to index Billions of Documents and each document containing 400-500 Number Field(probably Float or Double). Is it possible and feasible to go with current CloudSolr Architecture or are there some other alternative or replacement. Regards
Re: SolrCloud Startup
Hi, Slow startup: could it be that your transaction logs are being replayed? Are they very big? Do you see lots of disk reading during those 20-30 minutes? Shawn was referring to http://wiki.apache.org/solr/SolrPerformanceProblems Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Feb 24, 2014 at 10:41 PM, Shawn Heisey s...@elyograg.org wrote: Hi I have a 4 node solrcloud cluster with more than 50 collections with 4 shards each. Every time I want to make a schema change, I upload configs to zookeeper and then restart all nodes. However the restart of every node is very slow and takes about 20-30 minutes per node. Is it recommended to make loadOnStartup=false and allow solrcloud to lazy load? Is there a way to make schema changes without restarting solrcloud? I'm on my phone so getting a URL for you is hard. Search the wiki for SolrPerformanceProblems. There's a section there on slow startup. If that's not it, it's probably not enough RAM for the OS disk cache. That is also discussed on that wiki page. Thanks, Shawn
JOB @ Sematext: Professional Services Lead = Head
Hello, We have what I think is a great opening at Sematext. Ideal candidate would be in New York, but that's not an absolute must. More info below + on http://sematext.com/about/jobs.html in job-ad-speak, but I'd be happy to describe what we are looking for, what we do, and what types of companies we work with in regular-human-speak off-line. DESCRIPTION Sematext is hiring a technical, hands-on Professional Services Lead to join, lead, and grow the Professional Services side of Sematext and potentially grow into the Head role. REQUIREMENTS * Experience working with Solr or Elasticsearch * Plan and coordinate customer engagements from business and technical perspective * Identify customer pain points, needs, and success criteria at the onset of each engagement * Provide expert-level consulting and support services and strive to be a trustworthy advisor to a wide range of customers * Resolve complex search issues involving Solr or Elasticsearch * Identify opportunities to provide customers with additional value through our products or services * Communicate high-value use cases and customer feedback to our Product teams * Participate in open source community by contributing bug fixes, improvements, answering questions, etc. EXPERIENCE * BS or higher in Engineering or Computer Science preferred * 2 or more years of IT Consulting and/or Professional Services experience required * Exposure to other related open source projects (Hadoop, Nutch, Kafka, Storm, Mahout, etc.) a plus * Experience with other commercial and open source search technologies a plus * Enterprise Search, eCommerce, and/or Business Intelligence experience a plus * Experience working in a startup a plus Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/
Re: Solr server requirements for 100+ million documents
Hi Susheel, No, we wouldn't want to go with just 1 ZK. :) Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Tue, Feb 11, 2014 at 5:18 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi Otis, Just to confirm, the 3 servers you mean here are 2 for shards/nodes and 1 for Zookeeper. Is that correct? Thanks, Susheel -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Friday, January 24, 2014 5:21 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Hi Susheel, Like Erick said, it's impossible to give precise recommendations, but making a few assumptions and combining them with experience (+ a licked finger in the air): * 3 servers * 32 GB * 2+ CPU cores * Linux Assuming docs are not bigger than a few KB, that they are not being reindexed over and over, that you don't have a search rate higher than a few dozen QPS, assuming your queries are not a page long, etc. assuming best practices are followed, the above should be sufficient. I hope this helps. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Fri, Jan 24, 2014 at 1:10 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi, Currently we are indexing 10 million document from database (10 db data entities) index size is around 8 GB on windows virtual box. Indexing in one shot taking 12+ hours while indexing parallel in separate cores merging them together taking 4+ hours. We are looking to scale to 100+ million documents and looking for recommendation on servers requirements on below parameters for a Production environment. There can be 200+ users performing search same time. No of physical servers (considering solr cloud) Memory requirement Processor requirement (# cores) Linux as OS oppose to windows Thanks in advance. Susheel
Re: need help in understating solr cloud stats data
+101 for more stats. Was just saying that trying to pre-aggregate them along multiple dimensions is probably best left out of Solr. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Tue, Feb 4, 2014 at 10:49 AM, Mark Miller markrmil...@gmail.com wrote: I think that is silly. We can still offer per shard stats *and* let a user easily see stats for a collection without requiring they jump hoops or use a specific monitoring solution where someone else has already jumped hoops for them. You don't have to guess what ops people really want - *everyone* wants stats that make sense for the collections and cluster on top of the per shard stats. *Everyone* wouldn't mind seeing these without having to setup a monitoring solution first. If you want more than that, then you can fiddle with your monitoring solution. - Mark http://about.me/markrmiller On Feb 3, 2014, at 11:10 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Oh, I just saw Greg's email on dev@ about this. IMHO aggregating in the search engine is not the way to do. Leave that to external tools, which are likely to be more flexible when it comes to this. For example, our SPM for Solr can do all kinds of aggregations and filtering by a number of Solr and SolrCloud-specific dimensions already, without Solr having to do any sort of aggregation that it thinks Ops people will really want. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Feb 3, 2014 at 11:08 AM, Mark Miller markrmil...@gmail.com wrote: You should contribute that and spread the dev load with others :) We need something like that at some point, it's just no one has done it. We currently expect you to aggregate in the monitoring layer and it's a lot to ask IMO. 
- Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote: I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded then registers itself as an mbean. When called it polls all the per-core mbeans then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get jvm-wide stats via jmx but it is *a* way to get it done. Thanks, Greg On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote: I'm sending all solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if i'm looking at some metric, lets say 75thPcRequestTime - I see that each core in a single collection has different values. Is each value of each core is the time that specific core spent on a request? so to get an idea of total request time, I should summarize all the values of all the cores? 2.update_handler/commits - does this include auto_commits? becuaste I'm pretty sure I'm not doing any manual commits and yet I see a number there. 3. update_handler/docs pending - what does this mean? pending for what? for flush to disk? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-in-understating-solr-cloud-stats-data-tp4114992.html Sent from the Solr - User mailing list archive at Nabble.com.
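The "adds or averages them where appropriate" step that Greg describes could look roughly like the Python sketch below, whether it runs inside Solr or in the monitoring layer. The dictionary keys are hypothetical stand-ins for per-core handler stats and this is not Greg's actual handler: counts are summed and mean times are combined as a request-weighted average.

```python
def aggregate_core_stats(per_core):
    """Combine per-core handler stats into one collection-level view.

    per_core: list of dicts with hypothetical keys
      'requests'       - request count for the core (summed)
      'avgRequestTime' - mean request time in ms (request-weighted average)
    """
    total_requests = sum(core["requests"] for core in per_core)
    if total_requests == 0:
        return {"requests": 0, "avgRequestTime": 0.0}
    weighted_time = sum(core["requests"] * core["avgRequestTime"]
                        for core in per_core)
    return {
        "requests": total_requests,
        "avgRequestTime": weighted_time / total_requests,
    }
```

Percentile metrics such as `75thPcRequestTime` cannot be combined this way. Neither summing nor averaging per-core percentiles gives the collection-wide percentile, so those are better compared per core.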
Re: Special NGRAMish requirement
Hi, Can you provide an example, Alexander? Otis Solr ElasticSearch Support http://sematext.com/ On Feb 3, 2014 5:28 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote: Hi, we need to use something very similar to EdgeNGram (minGramSize=1 maxGramSize=50 side=front). The only thing missing is that we would like to reduce the number of matches. The request we need to implement is returning only those matches with the longest tokens (or terms if that is the right word). Is there a way to do this in Solr (not necessarily with EdgeNGram)? Thanks, Alexander
Re: how to write an efficient query with a subquery to restrict the search space?
Hi, Sounds like a possible document and query routing use case. Otis Solr ElasticSearch Support http://sematext.com/ On Jan 31, 2014 7:11 AM, svante karlsson s...@csi.se wrote: It seems to be faster to first restrict the search space and then do the scoring compared to just using the full query and letting solr handle everything. For example in my application one of the scoring fields effectively hits 1/12 of the database (a month field) and if we have 100'' items in the database then this matters. /svante 2014-01-30 Jack Krupansky j...@basetechnology.com: Lucene's default scoring should give you much of what you want - ranking hits of low-frequency terms higher - without any special query syntax - just list out your terms and use OR as your default operator. -- Jack Krupansky -Original Message- From: svante karlsson Sent: Thursday, January 23, 2014 6:42 AM To: solr-user@lucene.apache.org Subject: how to write an efficient query with a subquery to restrict the search space? I have a solr db containing 1 billion records that I'm trying to use in a NoSQL fashion. What I want to do is find the best matches using all search terms but restrict the search space to the most unique terms. In this example I know that val2 and val4 are rare terms and val1 and val3 are more common. In my real scenario I'll have 20 fields that I want to include or exclude in the inner query depending on the uniqueness of the requested value. my first approach was: q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2 OR field4:val4)&rows=100&fl=* but what I think I get is . field4:val4 AND (field2:val2 OR field4:val4) this result is then OR'ed with the rest if I write q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND (field2:val2 OR field4:val4)&rows=100&fl=* then what I think I get is two sub-queries that are evaluated separately and then joined - performance wise this is bad. What's the best way to write these types of queries?
Are there any performance issues when running it on several solrcloud nodes vs a single instance or should it scale? /svante
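One option nobody in the thread spells out is to move the restricting clause out of `q` and into Solr's standard `fq` (filter query) parameter, so that it limits the candidate set without taking part in scoring. The Python sketch below only builds the request parameters. The helper name is made up for illustration, and whether this is actually faster than the nested query on a billion-record index is something that would need measuring.

```python
from urllib.parse import urlencode


def build_params(scoring_clauses, restricting_clauses, rows=100):
    """Build a Solr query string: q does the scoring, fq restricts the set."""
    params = [
        ("q", " OR ".join(scoring_clauses)),       # all terms contribute to score
        ("fq", " OR ".join(restricting_clauses)),  # rare terms only filter
        ("rows", str(rows)),
        ("fl", "*"),
    ]
    return urlencode(params)


query_string = build_params(
    ["field1:val1", "field2:val2", "field3:val3", "field4:val4"],
    ["field2:val2", "field4:val4"],
)
print(query_string)
```

This is an alternative to the document and query routing Otis suggests, not a replacement for it. Routing reduces which shards are searched, while `fq` reduces which documents are scored within them.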
Re: Adding DocValues in an existing field
Hi, You can change the field definition and then reindex. Otis Solr ElasticSearch Support http://sematext.com/ On Jan 30, 2014 1:12 PM, yriveiro yago.rive...@gmail.com wrote: Hi, Can I add the docValues feature to an existing field without wiping the current data? The modification on the schema will be something like this: <field name="surrogate_id" type="tlong" indexed="true" stored="true" multiValued="false" /> <field name="surrogate_id" type="tlong" indexed="true" stored="true" multiValued="false" docValues="true"/> I want to use the current data to reindex it again in the same collection but in the process create the docValues too. Is it possible? I'm using solr 4.6.1 - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-DocValues-in-an-existing-field-tp4114462.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need help in understating solr cloud stats data
Hi,

Oh, I just saw Greg's email on dev@ about this. IMHO aggregating in the search engine is not the way to go. Leave that to external tools, which are likely to be more flexible when it comes to this. For example, our SPM for Solr can already do all kinds of aggregations and filtering by a number of Solr and SolrCloud-specific dimensions, without Solr having to do any sort of aggregation that it thinks Ops people will really want.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Mon, Feb 3, 2014 at 11:08 AM, Mark Miller markrmil...@gmail.com wrote:

You should contribute that and spread the dev load with others :) We need something like that at some point, it's just that no one has done it. We currently expect you to aggregate in the monitoring layer and it's a lot to ask IMO.

- Mark
http://about.me/markrmiller

On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote:

I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded and then registers itself as an mbean. When called, it polls all the per-core mbeans, then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get JVM-wide stats via JMX, but it is *a* way to get it done.

Thanks,
Greg

On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote:

I'm sending all Solr stats data to Graphite. I have some questions:

1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime, I see that each core in a single collection has a different value. Is each core's value the time that specific core spent on a request? So to get an idea of total request time, should I sum the values of all the cores?
2. update_handler/commits - does this include auto_commits? Because I'm pretty sure I'm not doing any manual commits and yet I see a number there.
3. update_handler/docs pending - what does this mean? Pending for what? For flush to disk?

Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/need-help-in-understating-solr-cloud-stats-data-tp4114992.html
Sent from the Solr - User mailing list archive at Nabble.com.
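For anyone who wants to try the "add or average where appropriate" idea from Greg's message outside of Solr, here is a minimal sketch in Python. It is not Greg's handler and not anything Solr ships: the function name, the split of metrics into summed and averaged sets, and the core names are all assumptions made for illustration. Only 75thPcRequestTime comes from the thread. Note that averaging a percentile across cores gives a rough indication only, not a true collection-wide percentile.

```python
from statistics import mean

# Assumed split of metrics; adjust to the stats you actually collect.
SUMMED = {"requests", "commits"}      # counters: add across cores
AVERAGED = {"75thPcRequestTime"}      # timings: average across cores (approximation)


def aggregate(per_core):
    """Combine {core_name: {metric: value}} into a single {metric: value} dict."""
    combined = {}
    for metric in SUMMED | AVERAGED:
        # Collect this metric from every core that reports it.
        values = [stats[metric] for stats in per_core.values() if metric in stats]
        if not values:
            continue
        combined[metric] = sum(values) if metric in SUMMED else mean(values)
    return combined


# Hypothetical per-core readings, e.g. as scraped before sending to Graphite.
sample = {
    "collection1_shard1_replica1": {"requests": 100, "commits": 3, "75thPcRequestTime": 12.0},
    "collection1_shard2_replica1": {"requests": 140, "commits": 2, "75thPcRequestTime": 18.0},
}
print(aggregate(sample))
```

The same grouping can usually be done in the monitoring layer instead, which is what Mark and Otis describe above.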
Re: Duplicate Facet.FIelds cause same results, should dedupe?
Hi,

I don't know if this is an old or a new problem, but it does feel like a bug to me.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Mon, Feb 3, 2014 at 10:48 AM, William Bell billnb...@gmail.com wrote:

If we add:

facet.field=prac_spec_heir&facet.field=prac_spec_heir

we get it twice in the results. This breaks deserialization on wt=json, since you cannot have the same name twice.

Thoughts? Seems like a new bug in 4.6?

facet.field: [prac_spec_heir,all_proc_name_code,all_cond_name_code, prac_spec_heir,{!ex=exgender}gender,{!ex=expayor}payor_code_name],

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
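Until this is settled on the Solr side, one possible client-side workaround is to drop exact duplicate parameters before the request is sent. The sketch below is only that, a workaround under assumptions: the helper name dedupe_params is made up, and the q, wt and facet values are illustrative. Only the facet.field names come from Bill's message.

```python
from urllib.parse import urlencode


def dedupe_params(params):
    """Drop exact duplicate (name, value) pairs, keeping first-occurrence order."""
    seen = set()
    unique = []
    for pair in params:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


# Illustrative request parameters with prac_spec_heir listed twice.
params = [
    ("q", "*:*"),
    ("wt", "json"),
    ("facet", "true"),
    ("facet.field", "prac_spec_heir"),
    ("facet.field", "all_proc_name_code"),
    ("facet.field", "prac_spec_heir"),
]
query_string = urlencode(dedupe_params(params))
print(query_string)
```

Different fields under the same parameter name are kept, so multi-valued facet.field requests still work; only a repeated identical name/value pair is removed.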
Re: SOLR USING 100% percent CPU and not responding after a while
Hi,

Show us more graphs. Is the GC working hard? Are any of the JVM mem pools at or near 100%? SPM for Solr is your friend for long-term monitoring/alerting/trends, jconsole and visualvm for a quick look.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Tue, Jan 28, 2014 at 2:11 PM, heaven aheave...@gmail.com wrote:

I have the same problem, please look at the image:
http://lucene.472066.n3.nabble.com/file/n4114026/Screenshot_733.png

And this is on idle. Index size is about 90 GB. Solr 4.4.0. Memory is not an issue, there's a lot. RAID 10 (15000RPM rapid hdd).

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-USING-100-percent-CPU-and-not-responding-after-a-while-tp4021359p4114026.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Related Search Suggestions
Hi,

I don't know of anything like that in OSS, but we have it here:
http://sematext.com/products/related-searches/index.html

Is that the functionality you are looking for?

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Mon, Jan 27, 2014 at 4:29 AM, kumar pavan2...@gmail.com wrote:

What is the best way to implement related search suggestions?

For example: if the user is looking for marriage halls, I need to show results like catering services, photography, wedding cards, invitation cards, music organisers.

Thanks & Regards,
kumar

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Related-Search-Suggestions-tp4113672.html
Sent from the Solr - User mailing list archive at Nabble.com.