Highlighting integer field
Hi, Is it possible to highlight an int (TrieIntField) or long (TrieLongField) field in Solr? -- Paweł
Re: Edismax parser and boosts
Hi, Thank you for your response. I checked it in Solr 4.8, but I think it has worked as I described for a very long time. I'm not 100% sure whether it is really a bug or not. When I run a phrase query like foo^1.0 bar, the behavior is very similar to what happens in edismax with the *pf* parameter set (the boost part is not removed). -- Paweł Róg On Thu, Oct 9, 2014 at 12:07 AM, Jack Krupansky j...@basetechnology.com wrote: Definitely sounds like a bug! File a Jira. Thanks for reporting this. What release of Solr? -- Jack Krupansky -Original Message- From: Pawel Rog Sent: Wednesday, October 8, 2014 3:57 PM To: solr-user@lucene.apache.org Subject: Edismax parser and boosts Hi, I use an edismax query with the q parameter set as below: q=foo^1.0+AND+bar For such a query I see a different (lower) score for the same document than for q=foo+AND+bar By default the boost of a term is 1 as far as I know, so why does the scoring differ? When I check the debugQuery output, in parsedQuery for foo^1.0+AND+bar I see a Boolean query in which one of the clauses is a phrase query foo 1.0 bar. It seems that the edismax parser takes the whole q parameter as a phrase without removing the boost value and adds it as a boolean clause. Is it a bug, or should it work like that? -- Paweł Róg
Edismax parser and boosts
Hi, I use an edismax query with the q parameter set as below: q=foo^1.0+AND+bar For such a query I see a different (lower) score for the same document than for q=foo+AND+bar By default the boost of a term is 1 as far as I know, so why does the scoring differ? When I check the debugQuery output, in parsedQuery for foo^1.0+AND+bar I see a Boolean query in which one of the clauses is a phrase query foo 1.0 bar. It seems that the edismax parser takes the whole q parameter as a phrase without removing the boost value and adds it as a boolean clause. Is it a bug, or should it work like that? -- Paweł Róg
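The behavior described above suggests the boost suffix survives into the implicit pf phrase clause. As a toy illustration of the implied fix (stripping per-term boosts before building the phrase), here is a minimal sketch; strip_boosts is a hypothetical helper, not Solr's actual parser code:

```python
import re

def strip_boosts(q):
    """Remove per-term boost suffixes like ^1.0 so the implicit
    phrase (pf) clause is built from bare terms only."""
    return re.sub(r'\^\d+(\.\d+)?', '', q)

print(strip_boosts("foo^1.0 AND bar"))  # foo AND bar
```

With boosts stripped, q=foo^1.0+AND+bar and q=foo+AND+bar would produce the same pf phrase and hence the same score contribution.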
Contribute QParserPlugin
Hi, I need a QParserPlugin that will use Redis as a backend to prepare filter queries. There are several data structures available in Redis (hash, set, etc.). For certain reasons I cannot fetch data from the Redis data structures and build and send big requests from the application. That's why I want to build those filters on the backend (Solr) side. I'm wondering what I have to do to contribute the QParserPlugin to the Solr repository. Can you suggest a way (in a few steps) to publish it in the Solr repository, probably as a contrib? -- Paweł Róg
Solr cloud hangs
Hi, I have a quite annoying problem with SolrCloud. I have a cluster with 8 shards and 2 replicas of each (Solr 4.6.1). After some time the cluster doesn't respond to any update requests, and restarting the cluster nodes doesn't help. There are a lot of stack traces like this (threads waiting for a very long time): - sun.misc.Unsafe.park(Native Method) - java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082) - org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342) - org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526) - org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44) - org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) - java.lang.Thread.run(Thread.java:722) Do you have any idea where I can look? -- Pawel
Re: Solr cloud hangs
Hi, Here is the whole stack trace: https://gist.github.com/anonymous/9056783 -- Pawel On Mon, Feb 17, 2014 at 4:53 PM, Mark Miller markrmil...@gmail.com wrote: Can you share the full stack trace dump? - Mark http://about.me/markrmiller On Feb 17, 2014, at 7:07 AM, Pawel Rog pawelro...@gmail.com wrote: Hi, I have a quite annoying problem with SolrCloud. I have a cluster with 8 shards and 2 replicas of each (Solr 4.6.1). After some time the cluster doesn't respond to any update requests, and restarting the cluster nodes doesn't help. There are a lot of stack traces like this (threads waiting for a very long time): - sun.misc.Unsafe.park(Native Method) - java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082) - org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342) - org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526) - org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44) - org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) - java.lang.Thread.run(Thread.java:722) Do you have any idea where I can look? -- Pawel
Re: Solr cloud hangs
There are also many errors in solr log like that one: org.apache.solr.update.StreamingSolrServers$1; error org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232) at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:232) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) -- Pawel On Mon, Feb 17, 2014 at 8:01 PM, Pawel Rog pawelro...@gmail.com wrote: Hi, Here is the whole stack trace: https://gist.github.com/anonymous/9056783 -- Pawel On Mon, Feb 17, 2014 at 4:53 PM, Mark Miller markrmil...@gmail.comwrote: Can you share the full stack trace dump? - Mark http://about.me/markrmiller On Feb 17, 2014, at 7:07 AM, Pawel Rog pawelro...@gmail.com wrote: Hi, I have quite annoying problem with Solr cloud. I have a cluster with 8 shards and with 2 replicas in each. (Solr 4.6.1) After some time cluster doesn't respond to any update requests. Restarting the cluster nodes doesn't help. 
There are a lot of such stack traces (waiting for very long time): - sun.misc.Unsafe.park(Native Method) - java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082) - org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342) - org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526) - org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44) - org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) - java.lang.Thread.run(Thread.java:722) Do you have any idea where can I look for? -- Pawel
Re: Wildcard query vs facet.prefix for autocomplete?
Maybe try EdgeNGramFilterFactory http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/#solr.EdgeNGramFilterFactory On Mon, Jul 16, 2012 at 6:57 AM, santamaria2 aravinda@contify.com wrote: I'm about to implement an autocomplete mechanism for my search box. I've read about some of the common approaches, but I have a question about wildcard query vs facet.prefix. Say I want autocomplete for a title: 'Shadows of the Damned'. I want this to appear as a suggestion if I type 'sha' or 'dam' or 'the'. I don't care that it won't appear if I type 'hadows'. While indexing, I'd use a whitespace tokenizer and a lowercase filter to store that title in the index. Now I'm thinking two approaches for 'dam' typed in the search box: 1) q=title:dam* 2) q=*:*&facet=on&facet.field=title&facet.prefix=dam So any reason that I should favour one over the other? Is speed a factor? The index has around 200,000 items. -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199.html Sent from the Solr - User mailing list archive at Nabble.com.
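The EdgeNGramFilterFactory suggestion works because the prefixes are materialized at index time, so the query side needs no wildcard or facet.prefix scan at all. A minimal sketch of the idea in plain Python (not Solr's implementation):

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Index-time edge n-grams: 'dam' matches 'damned' because the
    gram 'dam' is itself an indexed term."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

# Whitespace-tokenized, lowercased title, as described in the thread.
tokens = "shadows of the damned".split()
grams = {g for t in tokens for g in edge_ngrams(t)}
print("dam" in grams, "sha" in grams)  # True True
```

At query time the typed prefix is looked up as an ordinary term, which is generally cheaper than expanding a wildcard or walking facet values per request.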
Re: FilterCache - maximum size of document set
Thanks. I don't use NOW in queries. All my filters with a timestamp are rounded to hundreds of seconds to increase the hit rate. The only problem could be the price filters, which can vary (users are unpredictable :P), but removing those filters from fq or setting cache=false is also a bad idea ... I checked it :) Load rose three times :) -- Pawel On Fri, Jun 15, 2012 at 1:30 PM, Erick Erickson erickerick...@gmail.com wrote: Test first, of course, but slave on 3.6 and master on 3.5 should be fine. If you're getting evictions with the cache settings that high, you really want to look at why. Note that in particular, using NOW in your filter queries virtually guarantees that they won't be re-used, as per the link I sent yesterday. Best Erick On Fri, Jun 15, 2012 at 1:15 AM, Pawel Rog pawelro...@gmail.com wrote: It can be true that the filter cache max size is set to too high a value. We looked at evictions and hit rate earlier. Maybe you are right that evictions are not always unwanted. Some time ago we ran tests: there is not a big difference in hit rate when the filter cache maxSize is set to 4000 (hit rate about 85%) versus 16000 (hit rate about 91%). I think that using an LFU cache could also be helpful, but that requires me to migrate to 3.6. Do you think it is reasonable to use a slave on version 3.6 and a master on 3.5? Once again, thanks for your help -- Pawel On Thu, Jun 14, 2012 at 7:22 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, your maxSize is pretty high; it may just be that you've set this much higher than is wise. The maxSize setting governs the number of entries. I'd start with a much lower number here, and monitor the solr/admin page for both hit ratio and evictions. Well, and size too. 16,000 entries puts a ceiling of, what, 48G on it? Ouch! It sounds like what's happening here is you're just accumulating more and more fqs over the course of the evening and blowing memory.
Not all fqs will be that big; there are some heuristics in there to just store the document numbers for sparse filters, but maxDocs/8 is pretty much the upper bound. Evictions are not necessarily a bad thing; the hit ratio is important here. And if you're using a bare NOW in your filter queries, you're probably never re-using them anyway, see: http://www.lucidimagination.com/blog/2012/02/23/date-math-now-and-filter-queries/ I really question whether this limit is reasonable, but you know your situation best. Best Erick On Wed, Jun 13, 2012 at 5:40 PM, Pawel Rog pawelro...@gmail.com wrote: Thanks for your response. Yes, maybe you are right. I thought that filters could be larger than 3M. Do all kinds of filters use a BitSet? Moreover, maxSize of the filterCache is set to 16000 in my case. There are evictions during day traffic but not during night traffic. The version of Solr which I use is 3.5. I haven't used a memory analyzer yet. Could you write more details about it? -- Regards, Pawel On Wed, Jun 13, 2012 at 10:55 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, I think you may be looking at the wrong thing here. Generally, a filterCache entry will be maxDocs/8 (plus some overhead), so in your case they really shouldn't be all that large, on the order of 3M/filter. That shouldn't vary based on the number of docs that match the fq; it's just a bitset. To see if that makes any sense, take a look at the admin page and the number of evictions in your filterCache. If that is 0, you're probably using all the memory you're going to in the filterCache during the day. But you haven't indicated what version of Solr you're using; I'm going from a relatively recent 3.x knowledge base. Have you put a memory analyzer against your Solr instance to see where the memory is being used? Best Erick On Wed, Jun 13, 2012 at 1:05 PM, Pawel pawelmis...@gmail.com wrote: Hi, I have a Solr index with about 25M documents.
I optimized the FilterCache size to reach the best performance (considering the traffic characteristics that my Solr handles). I see that the only way to limit the size of a filter cache is to set the number of document sets that Solr can cache. There is no way to set a memory limit (e.g. 2GB, 4GB or something like that). When I process standard traffic (during the day) everything is fine. But when Solr handles night traffic (and the characteristics of the requests change) some problems appear. There is a JVM out of memory error. I know what the reason is. Some filters on some fields are quite poor filters; they return 15M documents or even more. You could say 'Just put that into q'. I tried to put those filters into the query part but then the request processing time statistics (during the day) became much worse. Reduction of Filter Cache maxSize
Re: FilterCache - maximum size of document set
It can be true that the filter cache max size is set to too high a value. We looked at evictions and hit rate earlier. Maybe you are right that evictions are not always unwanted. Some time ago we ran tests: there is not a big difference in hit rate when the filter cache maxSize is set to 4000 (hit rate about 85%) versus 16000 (hit rate about 91%). I think that using an LFU cache could also be helpful, but that requires me to migrate to 3.6. Do you think it is reasonable to use a slave on version 3.6 and a master on 3.5? Once again, thanks for your help -- Pawel On Thu, Jun 14, 2012 at 7:22 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, your maxSize is pretty high; it may just be that you've set this much higher than is wise. The maxSize setting governs the number of entries. I'd start with a much lower number here, and monitor the solr/admin page for both hit ratio and evictions. Well, and size too. 16,000 entries puts a ceiling of, what, 48G on it? Ouch! It sounds like what's happening here is you're just accumulating more and more fqs over the course of the evening and blowing memory. Not all fqs will be that big; there are some heuristics in there to just store the document numbers for sparse filters, but maxDocs/8 is pretty much the upper bound. Evictions are not necessarily a bad thing; the hit ratio is important here. And if you're using a bare NOW in your filter queries, you're probably never re-using them anyway, see: http://www.lucidimagination.com/blog/2012/02/23/date-math-now-and-filter-queries/ I really question whether this limit is reasonable, but you know your situation best. Best Erick On Wed, Jun 13, 2012 at 5:40 PM, Pawel Rog pawelro...@gmail.com wrote: Thanks for your response. Yes, maybe you are right. I thought that filters could be larger than 3M. Do all kinds of filters use a BitSet? Moreover, maxSize of the filterCache is set to 16000 in my case. There are evictions during day traffic but not during night traffic.
The version of Solr which I use is 3.5. I haven't used a memory analyzer yet. Could you write more details about it? -- Regards, Pawel On Wed, Jun 13, 2012 at 10:55 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, I think you may be looking at the wrong thing here. Generally, a filterCache entry will be maxDocs/8 (plus some overhead), so in your case they really shouldn't be all that large, on the order of 3M/filter. That shouldn't vary based on the number of docs that match the fq; it's just a bitset. To see if that makes any sense, take a look at the admin page and the number of evictions in your filterCache. If that is 0, you're probably using all the memory you're going to in the filterCache during the day. But you haven't indicated what version of Solr you're using; I'm going from a relatively recent 3.x knowledge base. Have you put a memory analyzer against your Solr instance to see where the memory is being used? Best Erick On Wed, Jun 13, 2012 at 1:05 PM, Pawel pawelmis...@gmail.com wrote: Hi, I have a Solr index with about 25M documents. I optimized the FilterCache size to reach the best performance (considering the traffic characteristics that my Solr handles). I see that the only way to limit the size of a filter cache is to set the number of document sets that Solr can cache. There is no way to set a memory limit (e.g. 2GB, 4GB or something like that). When I process standard traffic (during the day) everything is fine. But when Solr handles night traffic (and the characteristics of the requests change) some problems appear. There is a JVM out of memory error. I know what the reason is. Some filters on some fields are quite poor filters; they return 15M documents or even more. You could say 'Just put that into q'. I tried to put those filters into the query part but then the request processing time statistics (during the day) became much worse. Reduction of Filter Cache maxSize is also not a good solution because during the day cached filters are very, very helpful.
You may be interested in the type of filters that I use. These are range filters (I tried standard range filters and frange), e.g. price:[* TO 1]. Some fq with price can return a few thousand results (e.g. price:[40 TO 50]), but some (e.g. price:[* TO 1]) can return millions of documents. I'd also like to avoid a solution which introduces strict ranges that the user can choose from. Have you any suggestions what I can do? Is there any way to limit, for example, the maximum size of a docSet which is cached in the FilterCache? -- Pawel
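The sizing intuition from this thread (a non-sparse cached filter is roughly one bit per document in the index, while maxSize bounds the number of entries rather than bytes) can be sketched as back-of-the-envelope arithmetic; the 25M-document and 16,000-entry figures are the ones quoted in the thread:

```python
def filter_entry_bytes(max_docs):
    # A full filterCache entry is a bitset: one bit per document
    # in the index, regardless of how many documents match.
    return max_docs // 8

per_entry = filter_entry_bytes(25_000_000)  # ~3 MB per cached filter
ceiling = 16_000 * per_entry                # ~50 GB if every entry is a full bitset
print(per_entry, ceiling)
```

This is why lowering maxSize (or capping per-entry size) is the only real lever: the cache has no byte-based limit, so worst-case memory scales with entries times index size.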
Re: FilterCache - maximum size of document set
Thanks for your response. Yes, maybe you are right. I thought that filters could be larger than 3M. Do all kinds of filters use a BitSet? Moreover, maxSize of the filterCache is set to 16000 in my case. There are evictions during day traffic but not during night traffic. The version of Solr which I use is 3.5. I haven't used a memory analyzer yet. Could you write more details about it? -- Regards, Pawel On Wed, Jun 13, 2012 at 10:55 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, I think you may be looking at the wrong thing here. Generally, a filterCache entry will be maxDocs/8 (plus some overhead), so in your case they really shouldn't be all that large, on the order of 3M/filter. That shouldn't vary based on the number of docs that match the fq; it's just a bitset. To see if that makes any sense, take a look at the admin page and the number of evictions in your filterCache. If that is 0, you're probably using all the memory you're going to in the filterCache during the day. But you haven't indicated what version of Solr you're using; I'm going from a relatively recent 3.x knowledge base. Have you put a memory analyzer against your Solr instance to see where the memory is being used? Best Erick On Wed, Jun 13, 2012 at 1:05 PM, Pawel pawelmis...@gmail.com wrote: Hi, I have a Solr index with about 25M documents. I optimized the FilterCache size to reach the best performance (considering the traffic characteristics that my Solr handles). I see that the only way to limit the size of a filter cache is to set the number of document sets that Solr can cache. There is no way to set a memory limit (e.g. 2GB, 4GB or something like that). When I process standard traffic (during the day) everything is fine. But when Solr handles night traffic (and the characteristics of the requests change) some problems appear. There is a JVM out of memory error. I know what the reason is. Some filters on some fields are quite poor filters; they return 15M documents or even more. You could say 'Just put that into q'.
I tried to put those filters into the query part but then the request processing time statistics (during the day) became much worse. Reduction of Filter Cache maxSize is also not a good solution because during the day cached filters are very, very helpful. You may be interested in the type of filters that I use. These are range filters (I tried standard range filters and frange), e.g. price:[* TO 1]. Some fq with price can return a few thousand results (e.g. price:[40 TO 50]), but some (e.g. price:[* TO 1]) can return millions of documents. I'd also like to avoid a solution which introduces strict ranges that the user can choose from. Have you any suggestions what I can do? Is there any way to limit, for example, the maximum size of a docSet which is cached in the FilterCache? -- Pawel
Re: Difference between two solr indexes
If there are only 100,000 documents, dump all document ids and make a diff. If you're using a Linux-based system you can just use simple tools to do it. Something like this can be helpful (the ampersands and redirects were mangled in the original mail; this is the intended form): curl "http://your.hostA:port/solr/index/select?q=*:*&fl=id&wt=csv" > /tmp/idsA curl "http://your.hostB:port/solr/index/select?q=*:*&fl=id&wt=csv" > /tmp/idsB diff /tmp/idsA /tmp/idsB | grep '>' | awk '{print $2;}' | sed 's/\(.*\)/<id>\1<\/id>/g' > /tmp/ids_to_delete.xml Now you have a file. Wrap its contents in <delete> and </delete> and upload the file into Solr using curl: curl -X POST -d @/tmp/ids_to_delete.xml http://your.hostA:port/solr/index/update On Tue, Apr 17, 2012 at 2:09 PM, nutchsolruser nutchsolru...@gmail.com wrote: I'm also seeking a solution for a similar problem. -- View this message in context: http://lucene.472066.n3.nabble.com/Difference-between-two-solr-indexes-tp3916328p3917050.html Sent from the Solr - User mailing list archive at Nabble.com.
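For larger dumps, a set difference is more robust than parsing diff output, since it does not depend on line ordering. A minimal sketch; ids_to_delete is a hypothetical helper, and in practice the two lists would be read from the CSV id dumps fetched with curl:

```python
def ids_to_delete(ids_a, ids_b):
    """IDs present in index A but missing from index B (swap the
    arguments for the other direction) -- a set-based alternative
    to diff that ignores ordering of the dumps."""
    return sorted(set(ids_a) - set(ids_b))

# Toy data standing in for the two CSV id dumps.
print(ids_to_delete(["1", "2", "3"], ["2", "3", "4"]))  # ['1']
```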
Re: solr hangs
You wrote that you can see an OutOfMemoryError. I had such problems when my caches were too big: there is no more free memory in the JVM and probably a full GC starts running. How big is your Java heap? Maybe the cache sizes in your Solr are too big for your JVM settings. -- Regards, Pawel On Tue, Apr 10, 2012 at 9:51 PM, Peter Markey sudoma...@gmail.com wrote: Hello, I have a SolrCloud setup based on a blog ( http://outerthought.org/blog/491-ot.html ) and am able to bring up the instances and cores. But when I start indexing data (through CSV update), the core throws an out of memory exception (null:java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread). The thread dump from the new Solr UI is below: cmdDistribExecutor-8-thread-777 (827) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1bd11b79 - sun.misc.Unsafe.park(Native Method) - java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2043) - org.apache.http.impl.conn.tsccm.WaitingThread.await(WaitingThread.java:158) - org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking (ConnPoolByRoute.java:403) - org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry (ConnPoolByRoute.java:300) - org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection (ThreadSafeClientConnManager.java:224) - org.apache.http.impl.client.DefaultRequestDirector.execute (DefaultRequestDirector.java:401) - org.apache.http.impl.client.AbstractHttpClient.execute (AbstractHttpClient.java:820) - org.apache.http.impl.client.AbstractHttpClient.execute (AbstractHttpClient.java:754) - org.apache.http.impl.client.AbstractHttpClient.execute (AbstractHttpClient.java:732) - org.apache.solr.client.solrj.impl.HttpSolrServer.request (HttpSolrServer.java:304) - org.apache.solr.client.solrj.impl.HttpSolrServer.request
(HttpSolrServer.java:209) - org.apache.solr.update.SolrCmdDistributor$1.call (SolrCmdDistributor.java:320) - org.apache.solr.update.SolrCmdDistributor$1.call (SolrCmdDistributor.java:301) - java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) - java.util.concurrent.FutureTask.run(FutureTask.java:166) - java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) - java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) - java.util.concurrent.FutureTask.run(FutureTask.java:166) - java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1110) - java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:603) - java.lang.Thread.run(Thread.java:679) Apparently I do see lots of threads like the above in the thread dump. I'm using the latest build from trunk (Apr 10th). Any insights into this issue would be really helpful. Thanks a lot.
Re: Usage of * as a first character in wild card query
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ReversedWildcardFilterFactory On Mon, Mar 26, 2012 at 7:08 AM, Ishan isan.fu...@germinait.com wrote: Hi, I need to query Solr with * as the first character in the query. For example, the content indexed is 'Be careful' and the query I want to fire is '*ful'. But Solr does not allow * as the first character in a wildcard query. Please let me know if there is any other alternative for doing this. -- Thanks & Regards, Isan Fulia.
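ReversedWildcardFilterFactory makes leading wildcards workable by also indexing each token reversed, so a leading-wildcard query becomes a trailing-wildcard (prefix) query against the reversed form. A toy sketch of the idea in plain Python (not Solr's code):

```python
def index_forms(token):
    """Index both the token and its reversal, mimicking what
    ReversedWildcardFilterFactory does at index time."""
    return [token, token[::-1]]

indexed = index_forms("careful")  # ['careful', 'luferac']
# The leading-wildcard query *ful is rewritten to the prefix 'luf'
# and matched against the reversed forms.
query_prefix = "ful"[::-1]
print(any(t.startswith(query_prefix) for t in indexed))  # True
```

Prefix matching is cheap in an inverted index, which is why this trick turns an expensive leading-wildcard scan into an ordinary term-prefix lookup.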
Re: Boosting terms
Thanks a lot, I'll read it :) It seems to be helpful. On Sun, Mar 18, 2012 at 8:58 PM, Ahmet Arslan iori...@yahoo.com wrote: Is there any possibility to boost terms during indexing? Searching for that using Google, I found information that there is no such feature in Solr (we can only boost fields). Is it true? Yes, only field and document boosting exist. You might find this article interesting: http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
Re: Help with duplicate unique IDs
Once I had the same problem and didn't know what was going on. After a few moments of analysis I created a completely new index and removed the old one (I didn't have enough time to analyze the problem). The problem didn't come back. -- Regards, Pawel On Fri, Mar 2, 2012 at 8:23 PM, Thomas Dowling tdowl...@ohiolink.edu wrote: In a Solr index of journal articles, I thought I was safe reindexing articles because their unique ID would cause the new record in the index to overwrite the old one. (As stated at http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field - right?) My schema.xml includes: <fields> ... <field name="id" type="string" indexed="true" stored="true" required="true"/> ... </fields> And: <uniqueKey>id</uniqueKey> And yet I can compose a query with two hits in the index, showing: #1: <str name="id">03405443/v66i0003/347_mrirtaitmbpa</str> #2: <str name="id">03405443/v66i0003/347_mrirtaitmbpa</str> Can anyone give pointers on where I'm screwing something up? Thomas Dowling thomas.dowl...@gmail.com
Re: Realtime profile data
Thank you. I'll try NRT and some post-filtering :) On Tue, Feb 7, 2012 at 3:09 PM, Erick Erickson erickerick...@gmail.com wrote: You have several options: 1) If you can go to trunk (bleeding edge, I admit), you can get into the near real time (NRT) stuff. 2) You could maintain essentially a post-filter step where your app maintains a list of deleted messages and removes them from the response. This will cause some of your counts (e.g. facets, grouping) to be slightly off. 3) Train your users to expect whatever latency you've built into the system (i.e. indexing, commit and replication). Best Erick On Mon, Feb 6, 2012 at 10:42 AM, Pawel Rog pawelro...@gmail.com wrote: Hello. I have a problem which I'd like to solve using Solr. I have a user profile which contains messages. A user can filter messages, sort them, etc. The problem is with the delete operation. If a user clicks on a message to delete it, it's very hard to update the Solr index in real time, so after a user deletes a message it will still be visible. Do you have an idea how to solve the problem of removing data?
Re: Solr 3.5 very slow (performance)
* 1st question (ls from index directory) solr 1.4 -rw-r--r-- 1 user user2180582 Nov 30 07:26 _3g1_cf.del -rw-r--r-- 1 user user 5190652802 Nov 28 17:57 _3g1.fdt -rw-r--r-- 1 user user 139556724 Nov 28 17:57 _3g1.fdx -rw-r--r-- 1 user user 4963 Nov 28 17:56 _3g1.fnm -rw-r--r-- 1 user user 1879006175 Nov 28 18:01 _3g1.frq -rw-r--r-- 1 user user 513919573 Nov 28 18:01 _3g1.prx -rw-r--r-- 1 user user2745451 Nov 28 18:01 _3g1.tii -rw-r--r-- 1 user user 218731810 Nov 28 18:01 _3g1.tis -rw-r--r-- 1 user user 275268 Nov 30 07:26 _3uu_1a.del -rw-r--r-- 1 user user 666375513 Nov 30 03:35 _3uu.fdt -rw-r--r-- 1 user user 17616636 Nov 30 03:35 _3uu.fdx -rw-r--r-- 1 user user 4884 Nov 30 03:35 _3uu.fnm -rw-r--r-- 1 user user 243847897 Nov 30 03:35 _3uu.frq -rw-r--r-- 1 user user 64791316 Nov 30 03:35 _3uu.prx -rw-r--r-- 1 user user 545317 Nov 30 03:35 _3uu.tii -rw-r--r-- 1 user user 42993472 Nov 30 03:35 _3uu.tis -rw-r--r-- 1 user user 1178 Nov 30 07:26 _3wj_1.del -rw-r--r-- 1 user user2813124 Nov 30 07:26 _3wj.fdt -rw-r--r-- 1 user user 74852 Nov 30 07:26 _3wj.fdx -rw-r--r-- 1 user user 2175 Nov 30 07:26 _3wj.fnm -rw-r--r-- 1 user user 911051 Nov 30 07:26 _3wj.frq -rw-r--r-- 1 user user 4 Nov 30 07:26 _3wj.nrm -rw-r--r-- 1 user user 285405 Nov 30 07:26 _3wj.prx -rw-r--r-- 1 user user 7951 Nov 30 07:26 _3wj.tii -rw-r--r-- 1 user user 624702 Nov 30 07:26 _3wj.tis -rw-r--r-- 1 user user 35859092 Nov 30 07:26 _3wk.fdt -rw-r--r-- 1 user user 958148 Nov 30 07:26 _3wk.fdx -rw-r--r-- 1 user user 4104 Nov 30 07:26 _3wk.fnm -rw-r--r-- 1 user user 12228212 Nov 30 07:26 _3wk.frq -rw-r--r-- 1 user user3438508 Nov 30 07:26 _3wk.prx -rw-r--r-- 1 user user 58672 Nov 30 07:26 _3wk.tii -rw-r--r-- 1 user user4621519 Nov 30 07:26 _3wk.tis -rw-r--r-- 1 user user 0 Nov 30 07:27 lucene-9445a367a714cc9bf70d0ebdf83b9e01-write.lock -rw-r--r-- 1 user user 1010 Nov 30 07:26 segments_2tr -rw-r--r-- 1 user user 20 Nov 17 14:06 segments.gen solr 3.5 (dates are older - because I turned off feeding 3.5 
instance) -rw-r--r-- 1 user user2188376 Nov 29 13:10 _2x_6g.del -rw-r--r-- 1 user user 4955406209 Nov 28 17:38 _2x.fdt -rw-r--r-- 1 user user 140054140 Nov 28 17:38 _2x.fdx -rw-r--r-- 1 user user 4852 Nov 28 17:37 _2x.fnm -rw-r--r-- 1 user user 1845719205 Nov 28 17:42 _2x.frq -rw-r--r-- 1 user user 497871055 Nov 28 17:42 _2x.prx -rw-r--r-- 1 user user3006635 Nov 28 17:42 _2x.tii -rw-r--r-- 1 user user 230304265 Nov 28 17:42 _2x.tis -rw-r--r-- 1 user user 50128 Nov 29 13:10 _5s_48.del -rw-r--r-- 1 user user 116159640 Nov 29 00:25 _5s.fdt -rw-r--r-- 1 user user3206268 Nov 29 00:25 _5s.fdx -rw-r--r-- 1 user user 4963 Nov 29 00:25 _5s.fnm -rw-r--r-- 1 user user 44556139 Nov 29 00:25 _5s.frq -rw-r--r-- 1 user user 11405232 Nov 29 00:25 _5s.prx -rw-r--r-- 1 user user 149965 Nov 29 00:25 _5s.tii -rw-r--r-- 1 user user 11662163 Nov 29 00:25 _5s.tis -rw-r--r-- 1 user user 63191 Nov 29 13:10 _97_1o.del -rw-r--r-- 1 user user 145482785 Nov 29 08:08 _97.fdt -rw-r--r-- 1 user user4042300 Nov 29 08:08 _97.fdx -rw-r--r-- 1 user user 4963 Nov 29 08:08 _97.fnm -rw-r--r-- 1 user user 55361299 Nov 29 08:08 _97.frq -rw-r--r-- 1 user user 14181208 Nov 29 08:08 _97.prx -rw-r--r-- 1 user user 187731 Nov 29 08:08 _97.tii -rw-r--r-- 1 user user 14617940 Nov 29 08:08 _97.tis -rw-r--r-- 1 user user 21310 Nov 29 13:10 _9q_1a.del -rw-r--r-- 1 user user 49864395 Nov 29 09:19 _9q.fdt -rw-r--r-- 1 user user1361884 Nov 29 09:19 _9q.fdx -rw-r--r-- 1 user user 4963 Nov 29 09:19 _9q.fnm -rw-r--r-- 1 user user 17879364 Nov 29 09:19 _9q.frq -rw-r--r-- 1 user user4970178 Nov 29 09:19 _9q.prx -rw-r--r-- 1 user user 75969 Nov 29 09:19 _9q.tii -rw-r--r-- 1 user user5932085 Nov 29 09:19 _9q.tis -rw-r--r-- 1 user user 62661357 Nov 29 10:19 _a6.fdt -rw-r--r-- 1 user user1717820 Nov 29 10:19 _a6.fdx -rw-r--r-- 1 user user 4963 Nov 29 10:19 _a6.fnm -rw-r--r-- 1 user user 23283028 Nov 29 10:19 _a6.frq -rw-r--r-- 1 user user6196945 Nov 29 10:19 _a6.prx -rw-r--r-- 1 user user 92528 Nov 29 10:19 _a6.tii -rw-r--r-- 
1 user user7209783 Nov 29 10:19 _a6.tis -rw-r--r-- 1 user user 26871 Nov 29 13:10 _a6_y.del -rw-r--r-- 1 user user 16372020 Nov 29 10:39 _ab.fdt -rw-r--r-- 1 user user 455476 Nov 29 10:39 _ab.fdx -rw-r--r-- 1 user user 4963 Nov 29 10:39 _ab.fnm -rw-r--r-- 1 user user6025966 Nov 29 10:39 _ab.frq -rw-r--r-- 1 user user1622841 Nov 29 10:39 _ab.prx -rw-r--r-- 1 user user 35252 Nov 29 10:39 _ab.tii -rw-r--r-- 1 user user2766468 Nov 29 10:39 _ab.tis -rw-r--r-- 1 user user 7147 Nov 29 13:10 _ab_u.del -rw-r--r-- 1 user user 14818116 Nov 29 11:09 _aj.fdt -rw-r--r-- 1 user user 409356 Nov 29 11:09 _aj.fdx -rw-r--r-- 1 user user 4963 Nov 29 11:09 _aj.fnm -rw-r--r-- 1 user user5461353
Re: Solr 3.5 very slow (performance)
I attach a chart which presents CPU usage. Solr 3.5 uses almost all CPU (left side of the chart). At the beginning of the chart there were about 60 rps and about 100 rps (before turning off Solr 3.5). Then Solr 1.4 was turned on with 100 rps. -- Pawel On Wed, Nov 30, 2011 at 9:07 AM, Pawel Rog pawelro...@gmail.com wrote: * 1st question (ls from index directory) solr 1.4 -rw-r--r-- 1 user user 2180582 Nov 30 07:26 _3g1_cf.del -rw-r--r-- 1 user user 5190652802 Nov 28 17:57 _3g1.fdt -rw-r--r-- 1 user user 139556724 Nov 28 17:57 _3g1.fdx -rw-r--r-- 1 user user 4963 Nov 28 17:56 _3g1.fnm -rw-r--r-- 1 user user 1879006175 Nov 28 18:01 _3g1.frq -rw-r--r-- 1 user user 513919573 Nov 28 18:01 _3g1.prx -rw-r--r-- 1 user user 2745451 Nov 28 18:01 _3g1.tii -rw-r--r-- 1 user user 218731810 Nov 28 18:01 _3g1.tis -rw-r--r-- 1 user user 275268 Nov 30 07:26 _3uu_1a.del -rw-r--r-- 1 user user 666375513 Nov 30 03:35 _3uu.fdt -rw-r--r-- 1 user user 17616636 Nov 30 03:35 _3uu.fdx -rw-r--r-- 1 user user 4884 Nov 30 03:35 _3uu.fnm -rw-r--r-- 1 user user 243847897 Nov 30 03:35 _3uu.frq -rw-r--r-- 1 user user 64791316 Nov 30 03:35 _3uu.prx -rw-r--r-- 1 user user 545317 Nov 30 03:35 _3uu.tii -rw-r--r-- 1 user user 42993472 Nov 30 03:35 _3uu.tis -rw-r--r-- 1 user user 1178 Nov 30 07:26 _3wj_1.del -rw-r--r-- 1 user user 2813124 Nov 30 07:26 _3wj.fdt -rw-r--r-- 1 user user 74852 Nov 30 07:26 _3wj.fdx -rw-r--r-- 1 user user 2175 Nov 30 07:26 _3wj.fnm -rw-r--r-- 1 user user 911051 Nov 30 07:26 _3wj.frq -rw-r--r-- 1 user user 4 Nov 30 07:26 _3wj.nrm -rw-r--r-- 1 user user 285405 Nov 30 07:26 _3wj.prx -rw-r--r-- 1 user user 7951 Nov 30 07:26 _3wj.tii -rw-r--r-- 1 user user 624702 Nov 30 07:26 _3wj.tis -rw-r--r-- 1 user user 35859092 Nov 30 07:26 _3wk.fdt -rw-r--r-- 1 user user 958148 Nov 30 07:26 _3wk.fdx -rw-r--r-- 1 user user 4104 Nov 30 07:26 _3wk.fnm -rw-r--r-- 1 user user 12228212 Nov 30 07:26 _3wk.frq -rw-r--r-- 1 user user 3438508 Nov 30 07:26 _3wk.prx -rw-r--r-- 1 user user 58672 Nov 30 07:26 _3wk.tii 
-rw-r--r-- 1 user user 4621519 Nov 30 07:26 _3wk.tis -rw-r--r-- 1 user user 0 Nov 30 07:27 lucene-9445a367a714cc9bf70d0ebdf83b9e01-write.lock -rw-r--r-- 1 user user 1010 Nov 30 07:26 segments_2tr -rw-r--r-- 1 user user 20 Nov 17 14:06 segments.gen solr 3.5 (dates are older - because I turned off feeding 3.5 instance) -rw-r--r-- 1 user user 2188376 Nov 29 13:10 _2x_6g.del -rw-r--r-- 1 user user 4955406209 Nov 28 17:38 _2x.fdt -rw-r--r-- 1 user user 140054140 Nov 28 17:38 _2x.fdx -rw-r--r-- 1 user user 4852 Nov 28 17:37 _2x.fnm -rw-r--r-- 1 user user 1845719205 Nov 28 17:42 _2x.frq -rw-r--r-- 1 user user 497871055 Nov 28 17:42 _2x.prx -rw-r--r-- 1 user user 3006635 Nov 28 17:42 _2x.tii -rw-r--r-- 1 user user 230304265 Nov 28 17:42 _2x.tis -rw-r--r-- 1 user user 50128 Nov 29 13:10 _5s_48.del -rw-r--r-- 1 user user 116159640 Nov 29 00:25 _5s.fdt -rw-r--r-- 1 user user 3206268 Nov 29 00:25 _5s.fdx -rw-r--r-- 1 user user 4963 Nov 29 00:25 _5s.fnm -rw-r--r-- 1 user user 44556139 Nov 29 00:25 _5s.frq -rw-r--r-- 1 user user 11405232 Nov 29 00:25 _5s.prx -rw-r--r-- 1 user user 149965 Nov 29 00:25 _5s.tii -rw-r--r-- 1 user user 11662163 Nov 29 00:25 _5s.tis -rw-r--r-- 1 user user 63191 Nov 29 13:10 _97_1o.del -rw-r--r-- 1 user user 145482785 Nov 29 08:08 _97.fdt -rw-r--r-- 1 user user 4042300 Nov 29 08:08 _97.fdx -rw-r--r-- 1 user user 4963 Nov 29 08:08 _97.fnm -rw-r--r-- 1 user user 55361299 Nov 29 08:08 _97.frq -rw-r--r-- 1 user user 14181208 Nov 29 08:08 _97.prx -rw-r--r-- 1 user user 187731 Nov 29 08:08 _97.tii -rw-r--r-- 1 user user 14617940 Nov 29 08:08 _97.tis -rw-r--r-- 1 user user 21310 Nov 29 13:10 _9q_1a.del -rw-r--r-- 1 user user 49864395 Nov 29 09:19 _9q.fdt -rw-r--r-- 1 user user 1361884 Nov 29 09:19 _9q.fdx -rw-r--r-- 1 user user 4963 Nov 29 09:19 _9q.fnm -rw-r--r-- 1 user user 17879364 Nov 29 09:19 _9q.frq -rw-r--r-- 1 user user 4970178 Nov 29 09:19 _9q.prx -rw-r--r-- 1 user user 75969 Nov 29 09:19 _9q.tii -rw-r--r-- 1 user user 5932085 Nov 29 09:19 _9q.tis 
-rw-r--r-- 1 user user 62661357 Nov 29 10:19 _a6.fdt -rw-r--r-- 1 user user 1717820 Nov 29 10:19 _a6.fdx -rw-r--r-- 1 user user 4963 Nov 29 10:19 _a6.fnm -rw-r--r-- 1 user user 23283028 Nov 29 10:19 _a6.frq -rw-r--r-- 1 user user 6196945 Nov 29 10:19 _a6.prx -rw-r--r-- 1 user user 92528 Nov 29 10:19 _a6.tii -rw-r--r-- 1 user user 7209783 Nov 29 10:19 _a6.tis -rw-r--r-- 1 user user 26871 Nov 29 13:10 _a6_y.del -rw-r--r-- 1 user user 16372020 Nov 29 10:39 _ab.fdt -rw-r--r-- 1 user user 455476 Nov 29 10:39 _ab.fdx -rw-r--r-- 1 user user 4963 Nov 29 10:39 _ab.fnm -rw-r--r-- 1 user user 6025966 Nov 29 10:39 _ab.frq -rw-r--r-- 1 user user
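[Editor's note: later in this thread the segment count (4 vs 14) comes up as a suspected factor in the slowdown, and it can be read straight off listings like the ones above. A small sketch, assuming Lucene 2.x/3.x-style file names, where `_3g1.fdt` and `_3g1_cf.del` both belong to segment `_3g1` (the `_cf` part is a deletes generation, not a segment name):]

```python
from collections import defaultdict

def segments_from_listing(listing: str):
    """Group Lucene index files by segment name and sum their sizes.

    Expects `ls -l`-style lines; skips non-segment files such as
    segments_N, segments.gen, and write locks.
    """
    sizes = defaultdict(int)
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 5 or not parts[-1].startswith("_"):
            continue  # not a per-segment file
        stem = parts[-1].split(".")[0]        # "_3g1_cf" or "_3g1"
        seg = "_" + stem.split("_")[1]        # -> "_3g1"
        sizes[seg] += int(parts[4])           # size column of ls -l
    return dict(sizes)

listing = """\
-rw-r--r-- 1 user user 2180582 Nov 30 07:26 _3g1_cf.del
-rw-r--r-- 1 user user 5190652802 Nov 28 17:57 _3g1.fdt
-rw-r--r-- 1 user user 1178 Nov 30 07:26 _3wj_1.del
-rw-r--r-- 1 user user 1010 Nov 30 07:26 segments_2tr
"""
print(segments_from_listing(listing))
```

Running `len(...)` on the result gives the segment count that the thread later compares between the 1.4 and 3.5 indexes.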
Re: Solr 3.5 very slow (performance)
I made a thread dump. Most active threads have a stack trace like this:
471003383@qtp-536357250-245 - Thread t@270
   java.lang.Thread.State: RUNNABLE
at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:702)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1144)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:362)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:378)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at 
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) On Wed, Nov 30, 2011 at 10:31 AM, Pawel Rog pawelro...@gmail.com wrote: I attach chart which presents cpu usage. Solr 3.5 uses almost all cpu (left side of chart). at the begining of chart there was about 60rps and about 100rps (before turning off solr 3.5). Then there was 1.4 turned on with 100rps. -- Pawel On Wed, Nov 30, 2011 at 9:07 AM, Pawel Rog pawelro...@gmail.com wrote: * 1st question (ls from index directory) solr 1.4 -rw-r--r-- 1 user user 2180582 Nov 30 07:26 _3g1_cf.del -rw-r--r-- 1 user user 5190652802 Nov 28 17:57 _3g1.fdt -rw-r--r-- 1 user user 139556724 Nov 28 17:57 _3g1.fdx -rw-r--r-- 1 user user 4963 Nov 28 17:56 _3g1.fnm -rw-r--r-- 1 user user 1879006175 Nov 28 18:01 _3g1.frq -rw-r--r-- 1 user user 513919573 Nov 28 18:01 _3g1.prx -rw-r--r-- 1 user user 2745451 Nov 28 18:01 _3g1.tii -rw-r--r-- 1 user user 218731810 Nov 28 18:01 _3g1.tis -rw-r--r-- 1 user user 275268 Nov 30 07:26 _3uu_1a.del -rw-r--r-- 1 user user 666375513 Nov 30 03:35 _3uu.fdt -rw-r--r-- 1 user user 17616636 Nov 30 03:35 _3uu.fdx -rw-r--r-- 1 user user 4884 Nov 30 03:35 _3uu.fnm -rw-r--r-- 1 user user 243847897 Nov 30 03:35 _3uu.frq -rw-r--r-- 1 user user 64791316 Nov 30 03:35 _3uu.prx -rw-r--r-- 1 user user 545317 Nov 30 03:35 _3uu.tii -rw-r--r-- 1 user user 42993472 Nov 30 03:35 _3uu.tis -rw-r--r-- 1 user user 1178 Nov 30 07:26 _3wj_1.del -rw-r--r-- 1 user user 2813124 Nov 30 07:26 _3wj.fdt -rw-r--r-- 1 user user 74852 Nov 30 07:26 _3wj.fdx -rw-r--r-- 1 user user 2175 Nov 30 07:26 _3wj.fnm -rw-r--r-- 1 user user 911051 Nov 30 07:26 _3wj.frq -rw-r--r-- 1 user user 4 Nov 30 07:26 _3wj.nrm -rw-r--r-- 1 user user 285405 Nov 30 07:26 _3wj.prx -rw-r--r-- 1 user user 7951 Nov 30 07:26 _3wj.tii -rw-r--r-- 1 user user 624702 Nov 30 07:26 _3wj.tis -rw-r--r-- 1 
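[Editor's note: spotting the hottest frame by eye works for one dump, but tallying frames across all threads is more reliable. A minimal sketch that counts `at package.Class.method(...)` lines in a plain-text dump such as jstack output; the frame names below are taken from the trace above, everything else is illustrative:]

```python
from collections import Counter

def top_frames(thread_dump: str, n: int = 3):
    """Count how often each 'at ...' stack frame appears across all
    threads in a textual thread dump, most common first."""
    counts = Counter(
        line.strip()
        for line in thread_dump.splitlines()
        if line.strip().startswith("at ")
    )
    return counts.most_common(n)

dump = """\
"qtp-1" RUNNABLE
    at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:702)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1144)
"qtp-2" RUNNABLE
    at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:702)
"""
print(top_frames(dump, 2))
```

A frame that dominates this tally (here `getDocSet`) is exactly the kind of signal discussed in the thread.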
user user 35859092 Nov 30 07:26 _3wk.fdt -rw-r--r-- 1 user user 958148 Nov 30 07:26 _3wk.fdx -rw-r--r-- 1 user user 4104 Nov 30 07:26 _3wk.fnm -rw-r--r-- 1 user user 12228212 Nov 30 07:26 _3wk.frq -rw-r--r-- 1 user user 3438508 Nov 30 07:26 _3wk.prx -rw-r--r-- 1 user user 58672 Nov 30 07:26 _3wk.tii -rw-r--r-- 1 user user 4621519 Nov 30 07:26 _3wk.tis -rw-r--r-- 1 user user 0 Nov 30 07:27 lucene-9445a367a714cc9bf70d0ebdf83b9e01-write.lock -rw-r--r-- 1 user user 1010 Nov 30 07:26 segments_2tr -rw-r--r-- 1 user user 20 Nov 17 14:06 segments.gen solr 3.5 (dates are older - because I turned off feeding 3.5 instance)
Re: Solr 3.5 very slow (performance)
http://imageshack.us/photo/my-images/838/cpuusage.png/ On Wed, Nov 30, 2011 at 9:18 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I attach chart which presents cpu usage. Solr 3.5 uses almost all cpu : (left side of chart). FWIW: The mailing list software filters out most attachments (there are some exceptions for certain text mime types) -Hoss
Re: Solr 3.5 very slow (performance)
On Wed, Nov 30, 2011 at 9:05 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I tried to use index from 1.4 (load was the same as on index from 3.5) : but there was problem with synchronization with master (invalid : javabin format) : Then I built new index on 3.5 with luceneMatchVersion LUCENE_35 why would you need to re-replicate from the master? You already have a copy of the Solr 1.4 index on the slave machine where you are doing testing correct? Just (make sure Solr 1.4 isn't running and) point Solr 3.5 at that solr home directory for the configs and data and time that. (Just because Solr 3.5 can't replicate from Solr 1.4 over HTTP doesn't mean it can't open indexes built by Solr 1.4) I did that before sending the earlier e-mail. The effect was the same. It's important to understand if the discrepancies you are seeing have to do with *building* the index under Solr 3.5, or *searching* in Solr 3.5. : reader : SolrIndexReader{this=8cca36c,r=ReadOnlyDirectoryReader@8cca36c,refCnt=1,segments=4} : readerDir : org.apache.lucene.store.NIOFSDirectory@/data/solr_data/itemsfull/index : : solr 3.5 : reader : SolrIndexReader{this=3d01e178,r=ReadOnlyDirectoryReader@3d01e178,refCnt=1,segments=14} : readerDir : org.apache.lucene.store.MMapDirectory@/data/solr_data_350/itemsfull/index : lockFactory=org.apache.lucene.store.NativeFSLockFactory@294ce5eb As mentioned, the difference in the number of segments may be contributing to the perf differences you are seeing, so optimizing both indexes (or doing a partial optimize of your 3.5 index down to 4 segments) for comparison would probably be worthwhile. 
(and if that is the entirety of the problem, then explicitly configuring a MergePolicy may help you in the long run) but independent of that I would like to suggest that you first try explicitly configuring Solr 3.5 to use NIOFSDirectory so it's consistent with what Solr 1.4 was doing (I'm told MMapDirectory should be faster, but maybe there's something about your setup that makes that not true) So it would be helpful to also try adding this to your 3.5 solrconfig.xml and testing ... <directoryFactory name="DirectoryFactory" class="solr.NIOFSDirectoryFactory"/> : I made some tests with a quite heavy query (with frange). In both cases : (1.4 and 3.5) I used the same newSearcher queries and started solr : without any load. : Results of debug timing Ok, well ... honestly: giving us *one* example of the timing data for *one* query (w/o even telling us what the exact query was) isn't really anything we can use to help you ... the crux of the question was: was the slow performance you are seeing only under heavy load or was it also slow when you did manual testing? : When I send fewer than 60 rps I see that in comparison to 1.4 median : response time is worse, average is worse but maximum time is better. : It doesn't change proportion of cpu usage (3.5 uses much more cpu). How much fewer than 60 rps? ... I'm trying to understand if the problems you are seeing are solely happening under heavy concurrent load, or if you are seeing Solr 3.5 consistently respond much slower than Solr 1.4 even with a single client? Also: I may still be misunderstanding how you are generating load, and whether you are throttling the clients, but seeing higher CPU utilization in Solr 3.5 isn't necessarily an indication of something going wrong -- in some cases higher CPU% (particularly under heavy concurrent load on a multi-core machine) could just mean that Solr is now capable of utilizing more CPU to process parallel requests, whereas previous versions might have been hitting other bottlenecks. 
-- but that doesn't explain the slower response times. that's what concerns me the most. I don't think that 1200% CPU usage with the same traffic is better than 200%. I think you are wrong :) Using Solr 1.4 I can reach 300 rps and then reach 1200% on CPU, but only 60 rps in Solr 3.5. FWIW: I'm still wondering what the stats from your caches wound up looking like on both Solr 1.4 and Solr 3.5... 7) What do the cache stats look like on your Solr 3.5 instance after you've done some of this timing testing? the output of... http://localhost:8983/solr/admin/mbeans?cat=CACHE&stats=true&wt=json&indent=true ...would be helpful. NOTE: you may need to add this to your solrconfig.xml for that URL to work... <requestHandler name="/admin/" class="solr.admin.AdminHandlers" /> ...but i don't think /admin/mbeans exists in Solr 1.4, so you may just have to get the details from stats.jsp. I forgot to write it earlier. QueryCache hit rate was about 0.03 (in solr 1.4 and 3.5). Filter cache hit rate was about 0.35 in both cases. Document cache hit rate was about 0.55 in both cases. Wasn't the trace from the threads helpful to diagnose the problem? As I mentioned before, almost all threads were in the same line of code in SolrIndexSearcher.
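[Editor's note: the hit rates quoted above (0.03 / 0.35 / 0.55) come from the `lookups` and `hits` counters the cache stats expose. A small sketch under the assumption that you have already pulled those two counters per cache into plain dicts (the exact JSON shape of the mbeans/stats output varies across Solr versions, so no HTTP call is made here):]

```python
def hit_rate(stats: dict) -> float:
    """Compute hits/lookups for one cache; 0.0 if there were no lookups yet."""
    lookups = stats.get("lookups", 0)
    return stats.get("hits", 0) / lookups if lookups else 0.0

# Numbers chosen to reproduce the ratios mentioned in the thread.
caches = {
    "queryResultCache": {"lookups": 10000, "hits": 300},
    "filterCache": {"lookups": 10000, "hits": 3500},
    "documentCache": {"lookups": 10000, "hits": 5500},
}
for name, stats in caches.items():
    print(f"{name}: {hit_rate(stats):.2f}")
```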
Re: Solr 3.5 very slow (performance)
Yes, it works. Thanks a lot. But I still don't understand why that option was efficient in Solr 1.4 but not in Solr 3.5. On Wed, Nov 30, 2011 at 11:01 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Nov 30, 2011 at 7:08 AM, Pawel Rog pawelro...@gmail.com wrote: at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:702) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1144) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:362) This is interesting, and suggests that you have useFilterForSortedQuery set in your solrconfig.xml Can you try removing it (or setting it to false)? -Yonik http://www.lucidimagination.com
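[Editor's note: for reference, the setting Yonik points at lives inside the `<query>` section of solrconfig.xml. A minimal fragment showing it disabled, as suggested; the surrounding config is omitted:]

```xml
<query>
  <!-- When true, Solr computes a filter DocSet for sorted queries
       (the getDocSet call visible in the thread dump above).
       Removing it or setting it to false avoids that work. -->
  <useFilterForSortedQuery>false</useFilterForSortedQuery>
</query>
```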
Re: Solr 3.5 very slow (performance)
examples facet=true&sort=promoted+desc,ending+asc,b_count+desc&facet.mincount=1&start=0&q=name:(kurtka+skóry+brazowe42)&facet.limit=500&facet.field=cat1&facet.field=cat2&wt=json&rows=50 facet=true&sort=promoted+desc,ending+asc,b_count+desc&facet.mincount=1&start=1350&q=name:naczepa&facet.limit=500&facet.field=cat1&facet.field=cat2&wt=json&rows=50 facet=true&sort=promoted+desc,ending+asc,b_count+desc&facet.mincount=1&start=0&q=it_name:(miłosz+giedroyc)&facet.limit=500&facet.field=cat1&facet.field=cat2&wt=json&rows=50 default operation AND; promoted - int; ending - int; b_count - int; name - text; cat1 - int; cat2 - int. these are only a few examples. almost all queries are much slower. there were about 60 searches per second on the old and new version of solr. solr 1.4 reached 200% cpu utilization and solr 3.5 reached 1200% cpu utilization on the same machine On Tue, Nov 29, 2011 at 7:05 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Nov 29, 2011 at 12:25 PM, Pawel pawelmis...@gmail.com wrote: I've built an index on solr 1.4 some time ago (about 18 million documents, about 8GB). I need new features from a newer version of solr, so I decided to upgrade the solr version from 1.4 to 3.5. * I created a new solr master on a new physical machine * then I created a new index using the same schema as in the earlier version * then I indexed some slave, and started sending the same requests as earlier but to the newer version of solr (3.5, but the same situation is on solr 3.4). The CPU went from 200% to 1200% and load went from 3 to 15. Average QTime went from 15ms to 180ms and median went from 1ms to 150ms. I didn't change any parameters in solrconfig and schema. What are the requests that look slower? -Yonik http://www.lucidimagination.com
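[Editor's note: query strings like these, with repeated `facet.field` parameters and non-ASCII terms such as "skóry", are easy to get wrong by hand; building them programmatically avoids encoding mistakes. A sketch with urllib; the host and `/solr/select` path are placeholders, and the parameter values are taken from the examples above:]

```python
from urllib.parse import urlencode

# doseq=True emits one facet.field=... pair per list element; the default
# quote_via (quote_plus) turns spaces into '+' and percent-encodes "skóry".
params = {
    "q": "name:(kurtka skóry brazowe42)",
    "sort": "promoted desc,ending asc,b_count desc",
    "facet": "true",
    "facet.mincount": 1,
    "facet.limit": 500,
    "facet.field": ["cat1", "cat2"],
    "start": 0,
    "rows": 50,
    "wt": "json",
}
url = "http://localhost:8983/solr/select?" + urlencode(params, doseq=True)
print(url)
```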
Re: Solr 3.5 very slow (performance)
in my last pos i mean default operation AND promoted - int ending - int b_count - int name - text cat1 - int cat2 - int On Tue, Nov 29, 2011 at 7:54 PM, Pawel Rog pawelro...@gmail.com wrote: examples facet=truesort=promoted+desc,ending+asc,b_count+descfacet.mincount=1start=0q=name:(kurtka+skóry+brazowe42)facet.limit=500facet.field=cat1facet.field=cat2wt=jsonrows=50 facet=truesort=promoted+desc,ending+asc,b_count+descfacet.mincount=1start=1350q=name:naczepafacet.limit=500facet.field=cat1facet.field=cat2wt=jsonrows=50 facet=truesort=promoted+desc,ending+asc,b_count+descfacet.mincount=1start=0q=it_name:(miłosz+giedroyc)facet.limit=500facet.field=cat1facet.field=cat2wt=jsonrows=50 default operation ANDpromoted - intending - intb_count - intname - textcat1 - intcat2 -int these are only few examples. almost all queries are much slower. there was about 60 searches per second on old and new version of solr. solr 1.4 reached 200% cpu utilization and solr 3.5 reached 1200% cpu utilization on same machine On Tue, Nov 29, 2011 at 7:05 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Nov 29, 2011 at 12:25 PM, Pawel pawelmis...@gmail.com wrote: I've build index on solr 1.4 some time ago (about 18milions documents, about 8GB). I need new features from newer version of solr, so i decided to upgrade solr version from 1.4 to 3.5. * I created new solr master on new physical machine * then I created new index using the same schema as in earlier version * then I indexed some slave, and start sending the same requests as earlier but to newer version of solr (3.5, but the same situation is on solr 3.4). The CPU went from 200% to 1200% and load went from 3 to 15. Avarage QTime went from 15ms to 180ms and median went from 1ms to 150ms I didn't change any parameters in solrconfig and schema. What are the requests that look slower? -Yonik http://www.lucidimagination.com
Re: Solr 3.5 very slow (performance)
On Tue, Nov 29, 2011 at 9:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Let's back up a minute and cover some basics... 1) You said that you built a brand new index on a brand new master server, using Solr 3.5 -- how do you build your indexes? did the source data change at all? does your new index have the same number of docs as your previous Solr 1.4 index? what does a directory listing (including file sizes) look like for both your old and new indexes? Yes, both indexes have the same data. Indexes are built using some C++ program which reads data from a database and inserts it into Solr (using XML). Both indexes have about 8GB size and 18 million documents. 2) Did you try using your Solr 1.4 index (and configs) directly in Solr 3.5 w/o rebuilding from scratch? Yes, I used the same configs in solr 1.4 and solr 3.5 (adding only the line about luceneMatchVersion). As I see in the example of solr 3.5 in the repository (solrconfig.xml) there are not many differences. 3) You said you built the new index on a new machine, but then you said you used a slave where the performance was worse than Solr 1.4 on the same machine ... are you running both the Solr 1.4 and Solr 3.5 instances concurrently on your slave machine? How much physical ram is on that machine? what JVM options are you using when running the Solr 3.5 instance? what servlet container are you using? Maybe I didn't write precisely enough. I have some machine on which there is a master node. I have a second machine on which there is a slave. I tested solr 1.4 on that machine, then turned it off and turned on solr-3.5. I have 36GB RAM on that machine. On both - solr 1.4 and 3.5 - the configuration of the JVM is the same, and the same servlet container ... jetty-6 JVM options: -server -Xms12000m -Xmx12000m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:NewSize=1500m -XX:ParallelGCThreads=8 -XX:CMSInitiatingOccupancyFraction=60 4) what does your request handler configuration look like? do you have any default/invariant/appended request params? 
<requestHandler name="standard" class="solr.SearchHandler" default="true"> <lst name="defaults"> <str name="echoParams">explicit</str> </lst> </requestHandler> <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" /> <requestHandler name="/replication" class="solr.ReplicationHandler"> <lst name="slave"> <!-- fully qualified url for the replication handler of master. It is possible to pass this as a request param for the fetchindex command --> <str name="masterUrl">http://${masterHost}:${masterPort}/solr-3.5/${solr.core.instanceDir}replication</str> <str name="pollInterval">00:00:02</str> <str name="httpConnTimeout">5000</str> <str name="httpReadTimeout">1</str> </lst> </requestHandler> 5) The descriptions you've given of how the performance has changed sound like you are doing concurrent load testing -- did you do cache warming before you started your testing? how many client threads are hitting the solr server at one time? Maybe I wasn't precise enough again. CPU on solr 1.4 was 200% and on solr 3.5 1200%. Yes, there is cache warming. There are 50-100 client threads on both 1.4 and 3.5. There are about 60 requests per second on 3.5 and on 1.4, but on 3.5 responses are slower and CPU usage much higher. 6) have you tried doing some basic manual testing to see how individual requests perform? ie: single client at a time, loading a URL, then request the same URL again to verify that your Solr caches are in use and the QTime is low. If you see slow response times even when manually executing single requests at a time, have you tried using debug=timing to see which search components are contributing the most to the slow QTimes? Most time is in org.apache.solr.handler.component.QueryComponent and org.apache.solr.handler.component.DebugComponent in process. I didn't compare individual request performance. 7) What do the cache stats look like on your Solr 3.5 instance after you've done some of this timing testing? the output of... 
http://localhost:8983/solr/admin/mbeans?cat=CACHE&stats=true&wt=json&indent=true ...would be helpful. NOTE: you may need to add this to your solrconfig.xml for that URL to work... <requestHandler name="/admin/" class="solr.admin.AdminHandlers" /> ...but i don't think /admin/mbeans exists in Solr 1.4, so you may just have to get the details from stats.jsp. Will check it :) : in my last post i mean : default operation AND : promoted - int : ending - int : b_count - int : name - text : cat1 - int : cat2 - int : : On Tue, Nov 29, 2011 at 7:54 PM, Pawel Rog pawelro...@gmail.com wrote: : examples : : facet=true&sort=promoted+desc,ending+asc,b_count+desc&facet.mincount=1&start=0&q=name:(kurtka+skóry+brazowe42)&facet.limit=500&facet.field=cat1&facet.field=cat2&wt=json&rows=50 : : facet=true&sort=promoted+desc,ending+asc,b_count+desc&facet.mincount=1&start=1350&q=naczepa&facet.limit=500&facet.field=cat1&facet.field=cat2&wt=json&rows=50 : : facet=true&sort=promoted+desc,ending+asc
Re: Solr 3.5 very slow (performance)
IO waits about 0-2% Didn't see any suspicious activity in logs, but I can check it again On Tue, Nov 29, 2011 at 11:40 PM, Darren Govoni dar...@ontrenet.com wrote: Any suspicous activity in the logs? what about disk activity? On 11/29/2011 05:22 PM, Pawel Rog wrote: On Tue, Nov 29, 2011 at 9:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Let's back up a minute and cover some basics... 1) You said that you built a brand new index on a brand new master server, using Solr 3.5 -- how do you build your indexes? did the source data change at all? does your new index have the same number of docs as your previous Solr 1.4 index? what does a directory listing (including file sizes) look like for both your old and new indexes? Yes, both indexes have same data. Indexes are build using some C++ programm which reads data from database and inserts it into Solr (using XML). Both indexes have about 8GB size and 18milions documents. 2) Did you try using your Solr 1.4 index (and configs) directly in Solr 3.5 w/o rebuilding from scratch? Yes I used the same configs in solr 1.4 and solr 3.5 (adding only line about luceneMatchVersion) As I see in example of solr 3.5 in repository (solrconfig.xml) there are not many diffrences. 3) You said you build the new index on a new mmachine, but then you said you used a slave where the performanne was worse then Solr 1.4 on the same machine ... are you running both the Solr 1.4 and Solr 3.5 instances concurrently on your slave machine? How much physical ram is on that machine? what JVM options are using when running the Solr 3.5 instance? what servlet container are you using? Mayby I didn't wrote precisely enough. I have some machine on which there is master node. I have second machine on which there is slave. I tested solr 1.4 on that machine, then turned it off and turned on solr-3.5. I have 36GB RAM on that machine. On both - solr 1.4 and 3.5 configuration of JVM is the same, and the same servlet container ... 
jetty-6 JVM options: -server -Xms12000m -Xmx12000m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:NewSize=1500m -XX:ParallelGCThreads=8 -XX:CMSInitiatingOccupancyFraction=60 4) what does your request handler configuration look like? do you have any default/invariant/appended request params? requestHandler name=standard class=solr.SearchHandler default=true lst name=defaults str name=echoParamsexplicit/str /lst /requestHandler requestHandler name=/admin/ class=org.apache.solr.handler.admin.AdminHandlers / requestHandler name=/replication class=solr.ReplicationHandler lst name=slave !--fully qualified url for the replication handler of master . It is possible to pass on this as a request param for the fetchindexommand-- str name=masterUrlhttp://${masterHost}:${masterPort}/solr-3.5/${solr.core.instanceDir}replication/str str name=pollInterval00:00:02/str str name=httpConnTimeout5000/str str name=httpReadTimeout1/str /lst /requestHandler 5) The descriptions youve given of how the performance has changed sound like you are doing concurrent load testing -- did you do cache warming before you started your testing? how many client threads are hitting the solr server at one time? Maybe I wasn't precise enough again. CPU on solr 1.4 was 200% and on solr 3.5 1200% yes there is cache warming. There are 50-100 client threads on both 1.4 and 3.5. There are about 60 requests per second on 3.5 and on 1.4, but on 3.5 responses are slower and CPU usage much higher. 6) have you tried doing some basic manual testing to see how individual requests performe? ie: single client at a time, loading a URL, then request the same URL again to verify that your Solr caches are in use and the QTime is low. If you see slow respone times even when manually executing single requests at a time, have you tried using debug=timing to see which serach components are contributing the most to the slow QTimes? 
Most time is in org.apache.solr.handler.component.QueryComponent and org.apache.solr.handler.component.DebugComponent in process. I didn't comare individual request performance. 7) What do the cache stats look like on your Solr 3.5 instance after you've done some of this timing testing? the output of... http://localhost:8983/solr/admin/mbeans?cat=CACHEstats=truewt=jsonindent=true ...would be helpful. NOTE: you may need to add this to your solrconfig.xml for that URL to work... requestHandler name=/admin/ class=solr.admin.AdminHandlers /' Will check it :) : in my last pos i mean : default operation AND : promoted - int : ending - int : b_count - int : name - text : cat1 - int : cat2 - int : : On Tue, Nov 29, 2011 at 7:54 PM, Pawel Rogpawelro...@gmail.com wrote: : examples : : facet=truesort=promoted+desc,ending+asc,b_count+descfacet.mincount=1start=0q