Re: SolrEntityProcessor gets slower and slower
Mingfeng - This issue gets tougher as the number of shards you have rises; see Erick Erickson's post: http://grokbase.com/t/lucene/solr-user/131p75p833/how-distributed-queries-works. With 100M docs I would guess you are hitting this issue. The common way to deal with it is to filter on a value that returns fewer results per query, such as a creation_date field, and change that field's range on every query. For your data-import use case you might want to generate your data-import.xml with several entities, one per creation_date range. That way there is no need for deep paging.

Another option is http://wiki.apache.org/solr/CommonQueryParameters#pageDoc_and_pageScore. Implementing it in a multi-sharded environment is not possible to my knowledge, since all your scores are 1.0 and results are therefore ranked per shard (according to each shard's internal [docId]). Caching all the query results in each shard (by raising queryResultWindowSize) should help, shouldn't it?

Best,
Manu

On Mon, Jun 10, 2013 at 8:56 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

SolrEntityProcessor is fine for small amounts of data but not useful for such a large index. The problem is that deep paging in search results is expensive: as the start value for a query increases, so does the cost of the query. You are much better off just re-indexing the data.

On Mon, Jun 10, 2013 at 11:19 PM, Mingfeng Yang mfy...@wisewindow.com wrote:

I am trying to migrate 100M documents from a Solr index (v3.6) to a SolrCloud index (v4.1, 4 shards) using SolrEntityProcessor. My data-config.xml looks like:

  <dataConfig>
    <document>
      <entity name="sep" processor="SolrEntityProcessor"
              url="http://10.64.35.117:8995/solr/" query="*:*" rows="2000"
              fl="author_class,authorlink,author_location_text,author_text,author,category,date,dimension,entity,id,language,md5_text,op_dimension,opinion_text,query_id,search_source,sentiment,source_domain_text,source_domain,text,textshingle,title,topic,topic_text,url"/>
    </document>
  </dataConfig>

Initially the data import rate is about 1K docs/second, but it eventually decreases to 20 docs/second after running for tens of hours. The last time I tried a data import with SolrEntityProcessor, the transfer rate was as high as 3K docs/second. Does anyone have any clues as to what could cause the slowdown?

Thanks,
Ming-

--
Regards,
Shalin Shekhar Mangar.
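For illustration only, the split-by-date approach Manu suggests might look something like the sketch below. The entity names and date ranges are made up and the fl list is shortened; one entity per range means each query pages through a small result set, so deep paging never kicks in:

  <dataConfig>
    <document>
      <!-- One SolrEntityProcessor entity per date range of the source index. -->
      <entity name="sep_2012_h1" processor="SolrEntityProcessor"
              url="http://10.64.35.117:8995/solr/" rows="2000"
              query="date:[2012-01-01T00:00:00Z TO 2012-07-01T00:00:00Z}"
              fl="id,author,category,date,text,title,url"/>
      <entity name="sep_2012_h2" processor="SolrEntityProcessor"
              url="http://10.64.35.117:8995/solr/" rows="2000"
              query="date:[2012-07-01T00:00:00Z TO 2013-01-01T00:00:00Z}"
              fl="id,author,category,date,text,title,url"/>
    </document>
  </dataConfig>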
DIH and tinyint(1) Field
Hello, I have exactly the same problem as described here: http://lucene.472066.n3.nabble.com/how-to-avoid-DataImportHandler-from-interpreting-quot-tinyint-1-unsigned-quot-value-as-quot-Boolean--td4035241.html#a4036967. However, the solution given there ruins my date-type fields. Are there any other ways to deal with this problem?
Re: DIH and tinyint(1) Field
Your database's JDBC driver is interpreting the tinyint(1) as a boolean. Solr 4.4 fixes the problem that affected date fields when convertType=true is set. It should be released by the end of this week.

--
Regards,
Shalin Shekhar Mangar.
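For context, convertType is an attribute of the DIH JdbcDataSource; a minimal sketch (driver, URL, and credentials are placeholders):

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="user" password="pass"
              convertType="true"/>

With convertType="true", DIH casts each JDBC value to the target Solr field's type, which is the code path the 4.4 fix repairs for date fields.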
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Very true. I was impatient (I think less than three minutes impatient, so hopefully 4.4 will save me from myself) but I didn't realise it was doing something rather than just hanging. Next time I have to restart a node I'll just leave and go get a cup of coffee or something.

My configuration is set to auto hard-commit every 5 minutes. No auto soft-commit time is set.

Over the course of the weekend, while left unattended, the nodes have been going up and down (I've got to solve the issue that is causing them to come and go, but any suggestions on what is likely to be causing something like that are welcome). At one point one of the nodes stopped taking updates. After indexing properly for a few hours with that one shard not accepting updates, the replica of that shard which contained all the correct documents must have replicated from the broken node and dropped documents. Is there any protection against this in Solr, or should I be focusing on getting my nodes to be more reliable?

I've now got a situation where four of my five shards have leaders who are marked as down and followers who are up. I'm going to start grabbing information about the cluster state so I can track which changes are happening and in what order. I can get hold of Solr logs and garbage collection logs while these things are happening. Is this all just down to my nodes being unreliable?

On 21 July 2013 13:52, Erick Erickson erickerick...@gmail.com wrote:

Well, if I'm reading this right, you had a node go out of circulation and then bounced nodes until that node became the leader. So of course it wouldn't have the documents (how could it?). Basically you shot yourself in the foot.

Underlying here is why it took the machine you were re-starting so long to come up that you got impatient and started killing nodes. There has been quite a bit done to make that process better, so what version of Solr are you using? 4.4 is being voted on right now, so you might want to consider upgrading. There was, for instance, a situation where it would take 3 minutes for machines to start up. How impatient were you?

Also, what are your hard commit parameters? All of the documents you're indexing will be in the transaction log between hard commits, and when a node comes up the leader will replay everything in the tlog to the new node, which might be a source of why it took so long for the new node to come back up. At the very least the new node you were bringing back online will need to do a full index replication (old style) to get caught up.

Best
Erick

On Fri, Jul 19, 2013 at 4:02 AM, Neil Prosser neil.pros...@gmail.com wrote:

While indexing some documents to a SolrCloud cluster (10 machines, 5 shards and 2 replicas, so one replica on each machine) one of the replicas stopped receiving documents, while the other replica of the shard continued to grow. That was overnight, so I was unable to track exactly what happened (I'm going off our Graphite graphs here). This morning when I was able to look at the cluster, both replicas of that shard were marked as down (with one marked as leader). I attempted to restart the non-leader node but it took a long time to restart, so I killed it and restarted the old leader, which also took a long time. I killed that one (I'm impatient) and left the non-leader node to restart, not realising it was missing approximately 700k documents that the old leader had. Eventually it restarted and became leader. I restarted the old leader and it dropped the number of documents it had to match the previous non-leader.
Is this expected behaviour when a replica with fewer documents is started before the other and elected leader? Should I have been paying more attention to the number of documents on the servers before restarting nodes?

I am still in the process of tuning the caches and warming for these servers, but we are putting some load through the cluster, so it is possible that the nodes are having to work quite hard when a new version of the core is made available. Is this likely to explain why I occasionally see nodes dropping out? Unfortunately in restarting the nodes I lost the GC logs to see whether that was likely to be the culprit. Is this the sort of situation where you raise the ZooKeeper timeout a bit? Currently the timeout for all nodes is 15 seconds.

Are there any known issues which might explain what's happening? I'm just getting started with SolrCloud after using standard master/slave replication for an index which has got too big for one machine over the last few months.

Also, is there any particular information that would be helpful for diagnosing these issues if they happen again?
highlighting required in document
Hi, I'm using Solr 4.3.0. The following is the response to a hit-highlighting request:

Request:

  http://localhost:8080/solr/collection2/select?q=content:ps4&hl=true

Response:

  <doc>
    <arr name="content">
      <str>This post is regarding ps4 accuracy and qulaity which is smooth and factastic</str>
    </arr>
  </doc>
  <lst name="highlighting">
    <lst name="1">
      <arr name="content">
        <str>This post is regarding <b>ps4</b> accuracy and qulaity which is smooth and factastic</str>
      </arr>
    </lst>
  </lst>

I want the result to look like this, with the highlighting applied in the document itself:

  <doc>
    <arr name="content">
      <str>This post is regarding <b>ps4</b> accuracy and qulaity which is smooth and factastic</str>
    </arr>
  </doc>
  <lst name="highlighting">
    <lst name="1">
      <arr name="content">
        <str>This post is regarding <b>ps4</b> accuracy and qulaity which is smooth and factastic</str>
      </arr>
    </lst>
  </lst>

Thanks in advance!

Regards,
Jamshaid
Re: DIH and tinyint(1) Field
Shalin Shekhar Mangar wrote: Your database's JDBC driver is interpreting the tinyint(1) as a boolean. Solr 4.4 fixes the problem that affected date fields with convertType=true.

Thank you Shalin. As a quick solution I found that adding &tinyInt1isBit=false to the connection URL also works fine.
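A sketch of that workaround (assuming a MySQL source; host, database, and credentials are placeholders) - the Connector/J property goes straight into the DIH connection URL:

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb?tinyInt1isBit=false"
              user="user" password="pass"/>

With tinyInt1isBit=false the driver reports tinyint(1) columns as integers rather than booleans, so nothing needs converting on the Solr side.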
Re: short-circuit OR operator in lucene/solr
Short answer: no, it makes zero sense. But after some thinking, it could potentially make some sense. DisjunctionSumScorer holds its child scorers semi-ordered in a binary heap. Hypothetically an inequality could be enforced on that heap, but a heap might no longer work for such an alignment; hence, instead of a heap, a TreeSet could be used for the experiment. FWIW, this is a dev-list question.

On Mon, Jul 22, 2013 at 4:48 AM, Deepak Konidena deepakk...@gmail.com wrote:

I understand that Lucene's AND (&&), OR (||) and NOT (!) operators are shorthands for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why one can't treat them as boolean operators (adhering to boolean algebra). I have been trying to construct a simple OR expression, as follows:

  q = +(field1:value1 OR field2:value2)

with a match on either field1 or field2. But since the OR is merely optional, for documents where both field1:value1 and field2:value2 match, the query returns a score resulting from a match on both clauses. How do I enforce short-circuiting in this context? In other words, how do I implement short-circuiting as in boolean algebra, where an expression A || B || C returns true if A is true, without even looking at whether B or C could be true?

-Deepak

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Regex in Stopword.xml
Hi, I was looking for a way to put some regular expressions in the stopword file, but it seems that the file can only contain literal words. I'm just wondering if there is a feature planned that works this way, or if someone has a tip it would help me a lot :)

Best,
Scatman.
Solr - Multiple Facet Exclusion for the same Field
Hello, I need different (multiple) facet exclusions for the same field. This approach works:

  http://server/core/select/?q=*:*
    &fq={!tag=b}brand:adidas
    &fq={!tag=c}color:red
    &facet.field={!ex=b}brand
    &facet.field={!ex=c}brand
    &facet.field={!ex=b,c}brand
    &facet.field=brand
    &facet=true&facet.mincount=1

and my result then provides different facets for brand. BUT: is there any possibility of knowing which exclusion belongs to which facet? Is there something like "as" in SQL (e.g. facet.field={!ex=b as BrandB}brand)? We are using Solr 3.6. Hopefully this is a feature, not a bug, that we are relying on.

Thanks in advance,
Ralf
Re: Solr - Multiple Facet Exclusion for the same Field
Just found it. Use {!ex=c key=ckey} ...
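The key local param renames the facet in the response, so each exclusion variant comes back under its own label instead of four entries all named "brand". A sketch (the key names are made up):

  facet.field={!ex=b key=brand_without_b}brand
  &facet.field={!ex=c key=brand_without_c}brand
  &facet.field={!ex=b,c key=brand_without_bc}brand
  &facet.field=brand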
Programmatic instantiation of solr container and cores with config loaded from a jar
Hi, I use Solr embedded in a desktop app and I want to change it so that it no longer requires the configuration for the container and core to be in the filesystem, but rather lets the configuration be distributed as part of a jar file. Could someone kindly point me to the right docs? So far my impression is that I need to instantiate CoreContainer with a custom SolrResourceLoader, with properties parsed via some other API, but from the javadocs alone I feel a bit lost (why does it have to have an instance directory at all?) and googling did not give me many results. What would be ideal would be something like this (pseudocode with partly imagined names, which hopefully illustrates what I am trying to achieve):

  ContainerConfig containerConfig = ContainerConfigParser.parse(InputStream from Classloader);
  CoreContainer container = new CoreContainer(containerConfig);
  CoreConfig coreConfig = CoreConfigParser.parse(container, InputStream from Classloader);
  container.register(name, coreConfig);

Ideally I would like to keep the XML format to reuse my current solr.xml and solrconfig.xml, but that is just a nice-to-have. Does such a way exist and, if so, what are the real API classes and calls to use?

Thank you in advance,
Robert
Re: Regex in Stopword.xml
Use the pattern replace filter factory:

  <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement=""/>

This will do exactly what you asked for:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceFilterFactory
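For context, the filter sits in an analyzer chain in schema.xml; a minimal sketch (the field type name and the rest of the chain are illustrative):

  <fieldType name="text_filtered" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- Rewrites any token matching the pattern; an empty replacement deletes the match. -->
      <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement=""/>
    </analyzer>
  </fieldType>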
Re: Programmatic instantiation of solr container and cores with config loaded from a jar
Hi Robert,

The upcoming 4.4 release should make this a bit easier (you can check out the release branch now if you like, or wait a few days for the official version). CoreContainer now takes a SolrResourceLoader and a ConfigSolr object as constructor parameters, and you can create a ConfigSolr object from a string representation of solr.xml using the ConfigSolr.fromString() static method.

Alan Woodward
www.flax.co.uk
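Based on the 4.4 API Alan describes, loading solr.xml from the classpath might look roughly like the sketch below. This is untested: the resource name and instance directory are assumptions, IOUtils is commons-io, and the exact fromString() signature may differ on the release branch.

  // Read solr.xml out of the jar instead of the filesystem.
  InputStream in = MyApp.class.getResourceAsStream("/solr.xml");
  String solrXml = new String(IOUtils.toByteArray(in), "UTF-8");

  // The loader still needs an instance dir for resolving relative paths.
  SolrResourceLoader loader = new SolrResourceLoader("solr");

  ConfigSolr config = ConfigSolr.fromString(loader, solrXml);
  CoreContainer container = new CoreContainer(loader, config);
  container.load();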
Re: Regex in Stopword.xml
Thanks for the reply, but that's not the solution I'm looking for; I should have explained myself better, because I have around a hundred regexes to put in the config. To keep Solr easy to manage, I think the better way would be to put the regexes in a file... I know that the GSA from Google does it, so I had just hoped it would be the case for Solr :)

Best,
Scatman.
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Wow, you really shouldn't be having nodes go up and down so frequently; that's a big red flag. That said, SolrCloud should be pretty robust, so this is something to pursue...

But even a 5 minute hard commit can lead to a hefty transaction log under load; you may want to reduce it substantially depending on how fast you are sending docs to the index. I'm talking 15-30 seconds here. It's critical that openSearcher be set to false or you'll invalidate your caches that often. All a hard commit with openSearcher set to false does is close off the current segment and open a new one. It does NOT open/warm new searchers etc. The soft commits control visibility; that's how you control whether you can search the docs or not. Pardon me if I'm repeating stuff you already know!

As far as your nodes coming and going, I've seen some people have good results by upping the ZooKeeper timeout limit. So I guess my first question is whether the nodes are actually going out of service or whether it's just a timeout issue.

Good luck!
Erick
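A minimal solrconfig.xml sketch of the commit policy Erick describes (the times are illustrative, not recommendations for any particular load):

  <autoCommit>
    <maxTime>15000</maxTime>           <!-- hard commit every 15s: truncates the tlog -->
    <openSearcher>false</openSearcher> <!-- but does not open/warm a new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>300000</maxTime>          <!-- soft commits control search visibility -->
  </autoSoftCommit>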
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
No need to apologise. It's always good to have things like that reiterated in case I've misunderstood along the way.

I have a feeling that it's related to garbage collection. I assume that if the JVM heads into a stop-the-world GC, Solr can't let ZooKeeper know it's still alive and so gets marked as down. I've just taken a look at the GC logs and can see a couple of full collections which took longer than my ZK timeout of 15s. I'm still in the process of tuning the cache sizes and have probably got it wrong (I'm coming from a Solr instance which runs on a 48G heap with ~40m documents and bringing it into five shards with 8G heaps). I thought I was being conservative with the cache sizes, but I should probably drop them right down and start again. The entire index is cached by Linux, so I should just need caches to help with things which eat CPU at request time.

The indexing level is unusual, because normally we wouldn't be indexing everything sequentially, just making delta updates to the index as things are changed in our MoR. However, it's handy to know how it reacts under the most extreme load we could give it.

In the case that I set my hard commit time to 15-30 seconds with openSearcher set to false, how do I control when I actually do invalidate the caches and open a new searcher? Is this something that Solr can do automatically, or will I need some sort of coordinator process to perform a 'proper' commit from outside Solr? In our case the process of opening a new searcher is definitely a hefty operation. We have a large number of boosts and filters which are used for just about every query made against the index, so we currently have them warmed, which can take upwards of a minute on our giant core.

Thanks for your help.
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Sorry, I should also mention that these leader nodes which are marked as down can actually still be queried locally with distrib=false with no problems. Is it possible that they've somehow got themselves out of sync?
RE: Solr 4.3.1 - SolrCloud nodes down and lost documents
It is possible: https://issues.apache.org/jira/browse/SOLR-4260

I rarely see it and I cannot reliably reproduce it, but it just sometimes happens. Nodes will not bring each other back in sync.
RE: Solr 4.3.1 - SolrCloud nodes down and lost documents
You should increase your ZK timeout; this may be the issue in your case. You may also want to try the G1 garbage collector to keep stop-the-world pauses under the ZK timeout.
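For reference, the timeout Markus mentions is zkClientTimeout. In a 4.x legacy-style solr.xml it is an attribute on the cores element, overridable with a system property; a sketch with an illustrative 30-second value:

  <solr persistent="true">
    <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:30000}">
      <core name="collection1" instanceDir="collection1"/>
    </cores>
  </solr>

  java -DzkClientTimeout=30000 -jar start.jar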
Problem instantiating a ValueSourceParser plugin in 4.3.1
Hi, I'm trying to migrate to Solr 4.3.1 from Solr 4.0.0. I have a Solr plugin which extends ValueSourceParser; it works under Solr 4.0.0 but does not work under Solr 4.3.1. I compiled the plugin using the latest solr-4.3.1*.jars and lucene-4.3.1*.jars, but I get the following stack trace when starting up a core that references the plugin. Does anyone know why it might be giving me a ClassCastException under 4.3.1?

Thanks,
Niran

  2458 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer - Unable to create core: example_core
  org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
          at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
          at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
          at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
          at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
          at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
          at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
          at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
          at java.util.concurrent.FutureTask.run(Unknown Source)
          at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
          at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
          at java.util.concurrent.FutureTask.run(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at java.lang.Thread.run(Unknown Source)
  Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
          at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
          at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
          at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
          at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
          at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
          at org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2027)
          at org.apache.solr.core.SolrCore.<init>(SolrCore.java:749)
          ... 13 more
  Caused by: java.lang.ClassCastException: class com.example.HitsValueSourceParser
          at java.lang.Class.asSubclass(Unknown Source)
          at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
          at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
          at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
          ... 19 more

  2466 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer - null:org.apache.solr.common.SolrException: Unable to create core: example_core
          at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
          at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
          at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
          at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
          at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
          at java.util.concurrent.FutureTask.run(Unknown Source)
          at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
          at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
          at java.util.concurrent.FutureTask.run(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at java.lang.Thread.run(Unknown Source)
  Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
          at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
          at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
          at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
          at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
          ... 10 more
  Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
          at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
          at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
          at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
          at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
          at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
Re: Programmatic instantiation of solr container and cores with config loaded from a jar
Great, thank you!
how to improve (keyword) relevance?
Good morning,

I'm currently running Solr 4.0 final (multi-core) with ManifoldCF v1.3-dev on Tomcat 7. Early on, I used copyField to put the metadata into the text field to simplify Solr queries (i.e. I only have to query one field now). However, a lot of people are concerned about improving relevance. I found a relevancy solution on page 298 of the Apache Solr 4.0 Cookbook; however, is there a way to modify it so it only uses one field (i.e. the text field)? (Note well: I have multiple cores and the schemas are all somewhat different; if I can't get this to work with one field then I would have to build complex queries for all the other cores, which would vastly overcomplicate the UI. Is there another way?)

Here's the request handler in question:

  <requestHandler name="/better" class="solr.StandardRequestHandler">
    <lst name="defaults">
      <str name="indent">true</str>
      <str name="q">_query_:"{!edismax qf=$qfQuery mm=$mmQuery pf=$pfQuery bq=$boostQuery v=$mainQuery}"</str>
      <str name="qfQuery">name^10 description</str>
      <str name="mmQuery">1</str>
      <str name="pfQuery">name description</str>
      <str name="boostQuery">_query_:"{!edismax qf=$boostQuerQf mm=100% v=$mainQuery}"^10</str>
    </lst>
  </requestHandler>
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
On 7/22/2013 6:45 AM, Markus Jelsma wrote: You should increase your ZK timeout; this may be the issue in your case. You may also want to try the G1GC collector to keep STW under the ZK timeout.

When I tried G1, the occasional stop-the-world GC actually got worse. I tried G1 after trying CMS with no other tuning parameters. The average GC time went down, but when it got into a place where it had to do a stop-the-world collection, it was worse.

Based on the GC statistics in jvisualvm and jstat, I didn't think I had a problem. The way I discovered that I had a problem was by looking at my haproxy load balancer: sometimes requests would be sent to a backup server instead of my primary, because the ping request handler was timing out on the LB health check. The LB was set to time out after five seconds. When I went looking deeper with the GC log and some other tools, I was seeing 8-10 second GC pauses. G1 was showing me pauses of 12 seconds.

Now I use a heavily tuned CMS config, and there are no more LB switches to a backup server. I've put some of my own information about my GC settings on my personal Solr wiki page: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

I've got an 8GB heap on my systems running 3.5.0 (one copy of the index) and a 6GB heap on those running 4.2.1 (the other copy of the index).

Summary: just switching to the G1 collector won't solve GC pause problems. There's not a lot of G1 tuning information out there yet. If someone can come up with a good set of G1 tuning parameters, G1 might become better than CMS.

Thanks,
Shawn
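To make that concrete, a CMS starting point along the lines Shawn describes might look like the flags below. These values are illustrative only (his wiki page above has the settings he actually uses), and the right numbers depend on heap size and load:

  -Xms6g -Xmx6g
  -XX:+UseConcMarkSweepGC
  -XX:+UseParNewGC
  -XX:+CMSParallelRemarkEnabled
  -XX:CMSInitiatingOccupancyFraction=70
  -XX:+UseCMSInitiatingOccupancyOnly
  -verbose:gc -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps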
Re: Regex in Stopword.xml
How did you get the impression that the GSA supports regex stop words? The GSA seems to follow the same rules as Solr. See the doc: http://www.google.com/support/enterprise/static/gsa/docs/admin/70/gsa_doc_set/admin_searchexp/ce_improving_search.html#1050255

As with the GSA, the stop words are a simple .txt file. In any case, Solr and Lucene do not support stop words that are regular expressions, although a regex filter can simulate them to a limited degree.

-- Jack Krupansky
queryResultCache should not be related to the order of the fq list
Hello, the queryResultCache should not depend on the order of the fq list. Below are two queries with the same meaning, but case 2 can't reuse the queryResultCache entry created when case 1 is executed. case1: q=*:*&fq=field1:value1&fq=field2:value2 case2: q=*:*&fq=field2:value2&fq=field1:value1 I think the queryResultCache should not be related to the order of the fq list. I am a newcomer to posting bugs and can't be sure whether this is one. I created the issue: https://issues.apache.org/jira/browse/SOLR-5057 By the way, if the issue is accepted, how can I post my code? Thanks.
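Until the cache itself normalizes this, a hedged client-side workaround is to emit fq parameters in a canonical (say, sorted) order, so logically identical requests hit the same cache entry:

  q=*:*&fq=field1:value1&fq=field2:value2   (always this order)
  q=*:*&fq=field2:value2&fq=field1:value1   (never this one)

This costs one sort() call in the client and sidesteps the issue regardless of how SOLR-5057 is resolved.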
Re: how to improve (keyword) relevance?
Could you please be more specific about the relevancy problem you are trying to solve? -- Jack Krupansky -----Original Message----- From: eShard Sent: Monday, July 22, 2013 9:57 AM To: solr-user@lucene.apache.org Subject: how to improve (keyword) relevance? [snip]
Re: short-circuit OR operator in lucene/solr
function queries to the rescue! q={!func}def(query($a),query($b),query($c))&a=field1:value1&b=field2:value2&c=field3:value3 The "def" or "default" function returns the value of the first argument that matches. It's named "default" because it's more commonly used like def(popularity,50) (return the value of the popularity field, or 50 if the doc has no value for that field). -Yonik http://lucidworks.com On Sun, Jul 21, 2013 at 8:48 PM, Deepak Konidena deepakk...@gmail.com wrote: I understand that lucene's AND (&&), OR (||) and NOT (!) operators are shorthands for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why one can't treat them as boolean operators (adhering to boolean algebra). I have been trying to construct a simple OR expression, as follows: q = +(field1:value1 OR field2:value2) with a match on either field1 or field2. But since OR merely marks clauses optional, for documents where both field1:value1 and field2:value2 match, the query returns a score reflecting a match on both clauses. How do I enforce short-circuiting in this context? In other words, how to implement short-circuiting as in boolean algebra, where an expression A || B || C returns true if A is true without even looking into whether B or C could be true. -Deepak
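For reference, as a full HTTP request Yonik's example might look like this (host, port and URL encoding left loose here; field names are taken from the example above):

  http://localhost:8983/solr/select?q={!func}def(query($a),query($b),query($c))&a=field1:value1&b=field2:value2&c=field3:value3

Each document gets the score of the first of $a, $b, $c that matches it, which is the short-circuit behavior Deepak asked about.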
adding date column to the index
I have added a date field to my index. I don't want the query to search on this field, but I want it to be returned with each row. So I have defined it in schema.xml as follows:

<field name="LastModificationTime" type="date" indexed="false" stored="true" required="true"/>

I added it to the select in data-config.xml and I see it selected in the profiler. Now, when I query all fields (using the dashboard) I don't see it. Even when I ask for it specifically I don't see it. What am I doing wrong? (In the db it is datetimeoffset(7).)
Re: Auto-sharding and numShard parameter
That would be great. One step toward this goal is to stop treating the situation where there are no collections or cores as an error condition. It took me a while to get out of the mindset when bringing up a Solr install that I had to avoid that scenario at all costs, because red text == bad. There's no reason for the web interface to be deactivated when there are no collections or cores, though. Imagine if mysql didn't let you connect to it via phpmyadmin if you hadn't configured a database yet? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Sat, Jul 20, 2013 at 10:33 PM, Mark Miller markrmil...@gmail.com wrote: A lot has changed since those examples were written - in general, we are moving away from that type of collection initialization and towards using the Collections API. Eventually, I'd personally like SolrCloud to ship with no predefined collections and have users simply start it and then start using the Collections API - preconfigured collections will be second class and possibly deprecated at some point. - Mark On Jul 20, 2013, at 10:13 PM, Erick Erickson erickerick...@gmail.com wrote: Flavio: One of the great things about having people continually using Solr (and SolrCloud) for the first time is the opportunity to improve the docs. Anyone can update/add to the docs, all it takes is a signon. Unfortunately we had a bunch of spam bots a while ago, so it's now a two-step process: 1) create a login on the Solr wiki, 2) post a message on this list indicating that you'd like to help improve the wiki and give us your Solr login. We'll add you to the list of people who can edit the wiki and you can help the community by improving the documentation. Best Erick On Fri, Jul 19, 2013 at 8:46 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Thank you for the reply Erick, I was facing exactly that problem. From the documentation it seems that those parameters are required to run SolrCloud, when instead they are just used to initialize a sample collection. I think that in the examples in the user doc it would be better to separate those two concepts: one is starting the server, another is creating/managing collections. Best, Flavio On Fri, Jul 19, 2013 at 2:13 PM, Erick Erickson erickerick...@gmail.com wrote: First, the numShards parameter is only relevant the very first time you create your collection. It's a little confusing because in the SolrCloud examples you're getting collection1 by default. Look further down the SolrCloud Wiki page, the section titled "Managing Collections via the Collections API", for creating collections with a different name. Either way, whether you run the bootstrap command or create a new collection, that's the only time numShards counts. It's ignored the rest of the time. As far as data growing, you need to either 1) create enough shards to handle the eventual size things will be, sometimes called "oversharding", or 2) use the splitShard capabilities in very recent Solrs to expand capacity. Best Erick On Thu, Jul 18, 2013 at 4:52 PM, Flavio Pompermaier pomperma...@okkam.it wrote: Hi to all, Probably this question has a simple answer but I just want to be sure of the potential drawbacks... when I run SolrCloud I run the main solr instance with the -numShard option (e.g. 2). 
Then as data grows, the number of shards could potentially become huge. If I had to restart all nodes and re-run the master with numShards=2, what will happen? Will it just be ignored, or will Solr try to reduce the shards...? Another question: in SolrCloud, how do I restart the whole cloud at once? Is it possible? Best, Flavio
Re: Programatic instantiation of solr container and cores with config loaded from a jar
Does it mean that I can easily load Solr configuration as parsed by Solr from an external program? Because the last time I tried (4.3.1), the list of jars required was quite long, including the SolrJ jar due to some exception. Regards, Alex Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Mon, Jul 22, 2013 at 7:32 AM, Alan Woodward a...@flax.co.uk wrote: Hi Robert, The upcoming 4.4 release should make this a bit easier (you can check out the release branch now if you like, or wait a few days for the official version). CoreContainer now takes a SolrResourceLoader and a ConfigSolr object as constructor parameters, and you can create a ConfigSolr object from a string representation of solr.xml using the ConfigSolr.fromString() static method. Alan Woodward www.flax.co.uk On 22 Jul 2013, at 11:41, Robert Krüger wrote: Hi, I use solr embedded in a desktop app and I want to change it to no longer require the configuration for the container and core to be in the filesystem but rather be distributed as part of a jar file. Could someone kindly point me to the right docs? So far my impression is, I need to instantiate CoreContainer with a custom SolrResourceLoader with properties parsed via some other API, but from the javadocs alone I feel a bit lost (why does it have to have an instance directory at all?) and googling did not give me many results. What would be ideal would be to have something like this (pseudocode with partly imagined names, which hopefully illustrates what I am trying to achieve):

  ContainerConfig containerConfig = ContainerConfigParser.parse(InputStream from Classloader);
  CoreContainer container = new CoreContainer(containerConfig);
  CoreConfig coreConfig = CoreConfigParser.parse(container, InputStream from Classloader);
  container.register(name, coreConfig);

Ideally I would like to keep XML format to reuse my current solr.xml and solrconfig.xml but that is just a nice-to-have. Does such a way exist and if so, what are the real API classes and calls to use? Thank you in advance, Robert
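Putting Alan's description together, a minimal hedged sketch of the 4.4 approach (the resource name, the exact fromString() signature, and the use of commons-io are assumptions to verify against the 4.4 javadocs):

  import java.io.InputStream;
  import org.apache.commons.io.IOUtils;
  import org.apache.solr.core.ConfigSolr;
  import org.apache.solr.core.CoreContainer;
  import org.apache.solr.core.SolrResourceLoader;

  // Load solr.xml from the classpath instead of the filesystem (wrap in try/catch as needed).
  InputStream in = MyApp.class.getResourceAsStream("/solr.xml"); // hypothetical resource name
  String solrXml = IOUtils.toString(in, "UTF-8");

  SolrResourceLoader loader = new SolrResourceLoader("solr");    // an instance dir is still required
  ConfigSolr config = ConfigSolr.fromString(loader, solrXml);    // per Alan's note; exact signature assumed
  CoreContainer container = new CoreContainer(loader, config);
  container.load();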
Re: Problem instantiating a ValueSourceParser plugin in 4.3.1
I saw something similar and used an absolute path to my JAR file in solrconfig.xml vs. a relative path and it resolved the issue for me. Not elegant but worth trying, at least to rule that out. Tim On Mon, Jul 22, 2013 at 7:51 AM, Abeygunawardena, Niran niran.abeygunaward...@proquest.co.uk wrote: Hi, I'm trying to migrate to Solr 4.3.1 from Solr 4.0.0. I have a Solr Plugin which extends ValueSourceParser and it works under Solr 4.0.0 but it does not work under Solr 4.3.1. I compiled the plugin using the solr-4.3.1*.jars and lucene-4.3.1*.jars but I get the following stacktrace error when starting up a core referencing this plugin...seen below. Does anyone know why it might be giving me a ClassCastException under 4.3.1? Thanks, Niran 2458 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer Unable to create core: example_core org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984) at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597) at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592) at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115) at org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2027) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:749) ... 13 more Caused by: java.lang.ClassCastException: class com.example.HitsValueSourceParser at java.lang.Class.asSubclass(Unknown Source) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518) ... 19 more 2466 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer null:org.apache.solr.common.SolrException: Unable to create core: example_core at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993) at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597) at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592) at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984) ... 10 more Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
RE: Problem instantiating a ValueSourceParser plugin in 4.3.1
Thanks Tim. I copied the jar containing the plugin into Solr's lib directory, as Solr wasn't finding my jar due to a bug in 4.3: https://issues.apache.org/jira/browse/SOLR-4791 but the ClassCastException remains. I'll try Solr 4.2 and see if the plugin works in that. Cheers, Niran -----Original Message----- From: Timothy Potter [mailto:thelabd...@gmail.com] Sent: 22 July 2013 15:39 To: solr-user@lucene.apache.org Subject: Re: Problem instantiating a ValueSourceParser plugin in 4.3.1 [snip]
Re: custom field type plugin
Like Hoss said, you're going to have to solve this using http://wiki.apache.org/solr/SpatialForTimeDurations Using PointType is *not* going to work because your durations are multi-valued per document. It would be useful to create a custom field type that wraps the capability outlined on the wiki to make it easier to use without requiring the user to think spatially. You mentioned that these numeric ranges extend upwards of 10 billion or so. Unfortunately, the current prefix tree implementation under the hood for non-geodetic spatial, the QuadTree, is unlikely to scale to numbers that big. I don't know where the boundary is, but I doubt 10B. You could try and see what happens. I'm working (very slowly on very little spare time) on improving the PrefixTree implementations to scale to such large numbers; I hope something will be available this fall. ~ David Smiley Kevin Stone wrote I have a particular use case that I think might require a custom field type, however I am having trouble getting the plugin to work. My use case has to do with genetics data, and we are running into several situations were we need to be able to query multiple regions of a chromosome (or gene, or other object types). All that really boils down to is being able to give a number, e.g. 10234, and return documents that have regions containing the number. So you'd have a document with a list like [1:16090,400:8000,40123:43564], and it should come back because 10234 falls between 1:16090. If there is a better or easier way to do this please speak up. I'd rather not have to use a join on another index, because 1) it's more complex to set up, and 2) we might need to join against something else and you can only do one join at a time. Anyway… I tried creating a field type similar to a PointType just to see if I could get one working. I added the following jars to get it to compile: apache-solr-core-4.0.0,lucene-core-4.0.0,lucene-queries-4.0.0,apache-solr-solrj-4.0.0. I am running solr 4.0.0 on jetty, and put my jar file in a sharedLib folder, and specified it in my solr.xml (I have multiple cores). After starting up solr, I got the line that it picked up the jar: INFO: Adding 'file:/blah/blah/lib/CustomPlugins.jar' to classloader But I get this error about it not being able to find the AbstractSubTypeFieldType class. Here is the first bit of the trace: SEVERE: null:java.lang.NoClassDefFoundError: org/apache/solr/schema/AbstractSubTypeFieldType at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:791) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ...etc… Any hints as to what I did wrong? I can provide source code, or a fuller stack trace, config settings, etc. Also, I did try to unpack the solr.war, stick my jar in WEB-INF/lib, then repack. However, when I did that, I get a NoClassDefFoundError for my plugin itself. Thanks, Kevin The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible. 
- Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/custom-field-type-plugin-tp4079086p4079494.html Sent from the Solr - User mailing list archive at Nabble.com.
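If I read the SpatialForTimeDurations trick right, a hedged sketch for Kevin's numeric ranges (the names and bounds are assumptions, and per David's caveat the quad tree may not cope with bounds near 10B): each range [start,end] is indexed as the 2-D point (start, end), and "ranges containing v" becomes a rectangle query with start <= v and end >= v:

  <fieldType name="numrange" class="solr.SpatialRecursivePrefixTreeFieldType"
             geo="false" worldBounds="0 0 50000000 50000000" maxDistErr="1" units="degrees"/>
  <field name="range" type="numrange" multiValued="true" indexed="true" stored="true"/>

Index the document's ranges as the multi-valued points "1 16090", "400 8000", "40123 43564", then query for 10234 with:

  fq=range:"Intersects(0 10234 10234 50000000)"

i.e. minX=0, maxX=10234 constrains the start, and minY=10234, maxY=worldMax constrains the end.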
Node down, but not out
I've run into a problem recently that's difficult to debug and search for: I have three nodes in a cluster and this weekend one of the nodes went partially down. It no longer responds to distributed updates and it is marked as GONE in the Cloud view of the admin screen. That's not ideal, but there's still two boxes up so not the end of the world. The problem is that it is still responding to ping requests and returning queries successfully. In my setup, I have the three servers on an haproxy load balancer so that I can distribute requests and have clients stick to a specific solr box. Because the bad node is still returning OK to the ping requests and still returns results for simple queries, the load balancer does not remove it from the group. Is there a ping like request handler that would tell me whether the given box I'm hitting is still in the cloud? Thanks! Jim Musil -- View this message in context: http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regex in Stopword.xml
I know it because I actually want to replace GSA with Solr, which is much better in the enterprise situation :) Thanks for the reply anyway! Best, Scatman. -- View this message in context: http://lucene.472066.n3.nabble.com/Regex-in-Stopword-xml-tp4079412p4079491.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: short-circuit OR operator in lucene/solr
Deepak, I think your goal is to gain something in speed, but most likely the function query will be slower than the query without score computation (the filter query) - this stems from how the query is executed, but I may, of course, be wrong. Would you mind sharing the measurements you make? Thanks, roman On Mon, Jul 22, 2013 at 10:54 AM, Yonik Seeley yo...@lucidworks.com wrote: [snip]
Re: short-circuit OR operator in lucene/solr
Sweet! On Mon, Jul 22, 2013 at 10:54 AM, Yonik Seeley yo...@lucidworks.com wrote: [snip]
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
A couple of things I've learned along the way ... I had a similar architecture where we used fairly low numbers for auto-commits with openSearcher=false. This keeps the tlog to a reasonable size. You'll need something on the client side to send in the hard commit request to open a new searcher every N docs or M minutes. Be careful with raising the ZK timeout, as that also determines how quickly ZK can detect a node has crashed (afaik). In other words, it takes the ZK client timeout seconds for ZK to consider an ephemeral znode gone, so I caution you against increasing this value too much. The other thing to be aware of is the leaderVoteWait safety mechanism ... you might see log messages that look like: 2013-06-24 18:12:40,408 [coreLoadExecutor-4-thread-1] INFO solr.cloud.ShardLeaderElectionContext - Waiting until we see more replicas up: total=2 found=1 timeoutin=139368 From Mark M: This is a safety mechanism - you can turn it off by configuring leaderVoteWait to 0 in solr.xml. This is meant to protect the case where you stop a shard or it fails and then the first node to get started back up has stale data - you don't want it to just become the leader. So we wait to see everyone we know about in the shard, up to 3 or 5 min by default. Then we know all the shards participate in the leader election and the leader will end up with all the updates it should have. You can lower that wait or turn it off with 0. NOTE: I tried setting it to 0 and my cluster went haywire, so consider just lowering it but not making it zero ;-) Max heap of 8GB seems overly large to me for 8M docs per shard, esp. since you're using MMapDirectory to cache the primary data structures of your index in OS cache. I have run shards with 40M docs with 6GB max heap and chose to have more aggressive cache eviction by using a smallish LFU filter cache. This approach seems to spread the cost of GC out over time vs. massive amounts of clean-up when a new searcher is opened. With 8M docs, each cached filter will require about 1M of memory, so it seems like you could run with a smaller heap. I'm not a GC expert but found that having a smaller heap and more aggressive cache evictions reduced full GCs (and how long they run for) on my Solr instances. On Mon, Jul 22, 2013 at 8:09 AM, Shawn Heisey s...@elyograg.org wrote: [snip]
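As a sketch of the commit setup Tim describes above (the interval values are illustrative assumptions to tune for your own load): hard auto-commits keep the tlog small without opening searchers, and the client opens searchers on its own schedule:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>25000</maxDocs>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
  </updateHandler>

with the client issuing e.g. http://localhost:8983/solr/collection1/update?commit=true every N docs or M minutes to make new documents visible.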
RE: Problem instantiating a ValueSourceParser plugin in 4.3.1
Hi, Upgrading to Solr 4.2.1 works for my plugin, but 4.3.1 does not. I believe the ClassCastException I am getting in 4.3.1 is due to this bug in 4.3.1: https://issues.apache.org/jira/browse/SOLR-4791 Thanks, Niran -----Original Message----- From: Abeygunawardena, Niran [mailto:niran.abeygunaward...@proquest.co.uk] Sent: 22 July 2013 16:01 To: solr-user@lucene.apache.org Subject: RE: Problem instantiating a ValueSourceParser plugin in 4.3.1 [snip]
Re: deserializing highlighting json result
Exactly why is it difficult to deserialize? Seems simple enough. -- Jack Krupansky -----Original Message----- From: Mysurf Mail Sent: Monday, July 22, 2013 11:14 AM To: solr-user@lucene.apache.org Subject: deserializing highlighting json result When I request a JSON result I get the following structure in the highlighting:

{"highlighting":{
  "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9":{"PackageName":["- <em>Testing</em> channel twenty."]},
  "baf8434a-99a4-4046-8a4d-2f7ec09eafc8":{"PackageName":["- <em>Testing</em> channel twenty."]},
  "0a699062-cd09-4b2e-a817-330193a352c1":{"PackageName":["- <em>Testing</em> channel twenty."]},
  "0b9ec891-5ef8-4085-9de2-38bfa9ea327e":{"PackageName":["- <em>Testing</em> channel twenty."]}}}

It is difficult to deserialize this JSON because the GUID is in the attribute name. Is that solvable (using C#)?
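To make Jack's point concrete: the trick is to treat the document ids as map keys rather than as properties of a fixed class. A hedged Java sketch using Jackson (an assumption - any JSON library works; in C# the same shape deserializes into Dictionary<string, Dictionary<string, List<string>>> with Json.NET):

  import java.util.Iterator;
  import java.util.Map;
  import com.fasterxml.jackson.databind.JsonNode;
  import com.fasterxml.jackson.databind.ObjectMapper;

  // json holds the response body shown above; readTree() throws, so wrap as needed
  JsonNode hl = new ObjectMapper().readTree(json).get("highlighting");
  Iterator<Map.Entry<String, JsonNode>> docs = hl.fields();
  while (docs.hasNext()) {
      Map.Entry<String, JsonNode> doc = docs.next();  // key is the document GUID
      String snippet = doc.getValue().get("PackageName").get(0).asText();
      System.out.println(doc.getKey() + " -> " + snippet);
  }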
Re: Node down, but not out
Why was it down? E.g., did it OOM? If so, the recommended approach is to kill the process on OOM vs. leaving it in the cluster in a zombie state. I had similar issues when my nodes OOM'd, which is why I ask. That said, you can get the /clusterstate.json which contains ZK's status of a node using a request like: http://localhost:8983/solr/zookeeper?detail=true&path=%2Fclusterstate.json Although that would require some basic JSON processing to dig into the response to get the status of the node of interest, so you may want to implement a custom request handler. On Mon, Jul 22, 2013 at 9:55 AM, jimtronic jimtro...@gmail.com wrote: [snip]
Re: queryResultCache should not be related to the order of the fq list
: By the way, if the issue is ok, how can I post my code? Take a look at this wiki page for information on submitting patches... https://wiki.apache.org/solr/HowToContribute https://wiki.apache.org/solr/HowToContribute#Generating_a_patch ...you can attach your patch directly to the Jira issue you created... https://wiki.apache.org/solr/HowToContribute#Contributing_your_work -Hoss
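The workflow on that wiki page boils down to a few commands (the issue number is from this thread; the checkout URL is the trunk location the wiki gives):

  svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk lucene-trunk
  cd lucene-trunk
  # make your changes, add tests...
  svn diff > SOLR-5057.patch

then attach SOLR-5057.patch to the Jira issue and grant license to ASF when prompted.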
Re: how to improve (keyword) relevance?
Sure, let's say the user types in "test pdf"; we need the results with all the query words to be near the top of the result set. The query will look like this: /select?q=text%3Atest+pdf&wt=xml How do I ensure that the top resultset contains all of the query words? How can I boost the first (or second) term when they are both in the same field (i.e. text)? Does this make sense? Please bear with me; I'm still new to the Solr query syntax so I don't even know if I'm asking the right question. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-improve-keyword-relevance-tp4079462p4079502.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Node down, but not out
I'm not sure why it went down exactly -- I restarted the process and lost the logs. (D'oh!) An OOM seems likely, however. Is there a setting for killing the process when Solr encounters an OOM? Thanks! Jim -- View this message in context: http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079507.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Auto-sharding and numShard parameter
There is a reason of course, or else it wouldn't be like that. We addressed it recently. https://issues.apache.org/jira/browse/SOLR-3633 https://issues.apache.org/jira/browse/SOLR-3677 https://issues.apache.org/jira/browse/SOLR-4943 - Mark On Jul 22, 2013, at 10:57 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: [snip]
deserializing highlighting json result
When I request a JSON result I get the following structure in the highlighting:

{"highlighting":{
  "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9":{"PackageName":["- <em>Testing</em> channel twenty."]},
  "baf8434a-99a4-4046-8a4d-2f7ec09eafc8":{"PackageName":["- <em>Testing</em> channel twenty."]},
  "0a699062-cd09-4b2e-a817-330193a352c1":{"PackageName":["- <em>Testing</em> channel twenty."]},
  "0b9ec891-5ef8-4085-9de2-38bfa9ea327e":{"PackageName":["- <em>Testing</em> channel twenty."]}}}

It is difficult to deserialize this JSON because the GUID is in the attribute name. Is that solvable (using C#)?
Re: adding date column to the index
On 22 July 2013 20:01, Mysurf Mail stammail...@gmail.com wrote: [snip] Did you restart your Java container, and reindex? Regards, Gora
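If restarting isn't desirable, a hedged equivalent (the core name and DIH path are assumptions) is to reload the core so the new schema is read, then run a full re-import so existing rows are re-read with the new column:

  curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1'
  curl 'http://localhost:8983/solr/collection1/dataimport?command=full-import&clean=true'

Reloading alone is not enough - documents indexed before the schema change will not have the field until they are re-imported.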
Re: XInclude and Document Entity not working on schema.xml
: to use Document Entity in schema.xml, I get this exception : : java.lang.RuntimeException: schema fieldtype : string(org.apache.solr.schema.StrField) invalid : arguments:{xml:base=solrres:/commonschema_types.xml} Elodie, can you please open a bug in Jira for this with your specific example? Please note in the Jira your comment that it works in Solr 4.2.1 but fails in later versions (if you could test with 4.3 and the newly voted 4.4, that would be helpful.) : The same error appears in this bug (fixed ?): : https://issues.apache.org/jira/browse/SOLR-3087 That issue was specific to XInclude, not document entities, so it's possible the fix applied there did not affect/fix document entities -- but the fact that you see document entity includes of fieldTypes working in 4.2.1 suggests that it might be a slightly different problem; otherwise I would expect to see it fail as far back as 4.0, just like SOLR-3087... : I also tried to use the XML XInclude mechanism : (http://en.wikipedia.org/wiki/XInclude) to include parts of schema.xml. : : When I try to include a fieldType, I get this exception : : org.apache.solr.common.SolrException: Unknown fieldType 'long' specified ...the issue you linked to before (SOLR-3087) included a specific test to ensure that fieldTypes could be included like this, and that test works -- so perhaps in your testing you have some other subtle bug? What are the absolute paths of the various files you are trying to include in one another? -Hoss
Re: Programatic instantiation of solr container and cores with config loaded from a jar
Hi Alex, I'm not sure I follow - are you trying to create a ConfigSolr object from data read in from elsewhere, or trying to export the ConfigSolr object to another process? If you're dealing with Solr core Java objects, you'll need the Solr jar and all its dependencies (including SolrJ). Alan Woodward www.flax.co.uk On 22 Jul 2013, at 15:53, Alexandre Rafalovitch wrote: [snip]
Re: Node down, but not out
There is, but I couldn't get it to work in my environment on Jetty, see: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCAJt9Wnib+p_woYODtrSPhF==v8Vx==mDBd_qH=x_knbw-bn...@mail.gmail.com%3E Let me know if you have any better luck. I had to resort to something hacky but was out of the time I could devote to such unproductive endeavors ;-) On Mon, Jul 22, 2013 at 10:49 AM, jimtronic jimtro...@gmail.com wrote: [snip]
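The JVM-level version of "kill the process on OOM" is a single standard HotSpot flag (%p expands to the JVM's pid; the kill command itself is up to you):

  java -XX:OnOutOfMemoryError="kill -9 %p" -jar start.jar

With the process actually dead, ZooKeeper's session timeout removes the node from the cluster instead of leaving it answering pings in a zombie state.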
Re: XInclude and Document Entity not working on schema.xml
: Elodie, can you please open a bug in Jira for this with your specific ... : ...the issue you linked to before (SOLR-3087) included a specific test to : ensure that fieldTypes could be included like this, and that test works -- : so perhaps in your testing you have some other subtle bug? What are the : absolute paths of the various files you are trying to include in one : another? Hmm... actually, I had some time while I was on a conf call, so I just updated the test to also test entity includes, and I wasn't able to reproduce either of the problems you described. Can you please take a look at this test, and the configs it uses, and compare with how you are trying to do things... http://svn.apache.org/r1505749 http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test/org/apache/solr/core/TestXIncludeConfig.java?view=markup http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test-files/solr/collection1/conf/schema-xinclude.xml?view=markup http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test-files/solr/collection1/conf/schema-snippet-types.incl?view=markup http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test-files/solr/collection1/conf/schema-snippet-type.xml?view=markup -Hoss
Re: how to improve (keyword) relevance?
Again, you haven't indicated what the problem is. I mean, have you actually confirmed that a problem exists? Add debugQuery=true to your query and examine the explain section if you believe that Solr has improperly computed any document scores. If you simply want to boost a term in a query, use the ^ operator, which applies to the preceding term. A boost of 1.0 means no change, 2.0 means double, 0.5 means cut in half. But you don't need to boost: relevancy is based on the data in the documents themselves. BTW, q=text%3Atest+pdf does not search for pdf in the text field - field qualification only applies to a single term, but you can use parentheses: q=text%3A(test+pdf) -- Jack Krupansky -----Original Message----- From: eShard Sent: Monday, July 22, 2013 12:34 PM To: solr-user@lucene.apache.org Subject: Re: how to improve (keyword) relevance? [snip]
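A hedged example of the "all words near the top" requirement using edismax (the field name is from the thread; the handler path is assumed): mm=100%25 (URL-encoded 100%) requires every term to match, while a lower mm plus a pf phrase boost lets partial matches in but ranks full matches higher:

  /select?defType=edismax&qf=text&pf=text&mm=100%25&q=test+pdf&wt=xml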
Re: Programatic instantiation of solr container and cores with config loaded from a jar
I am trying to read Solr config files from outside of a running Solr instance. It's one of the approaches for SolrLint (https://github.com/arafalov/SolrLint). I kind of expected to just need core Solr classes for that, but I needed SolrJ, the Lucene analyzer jar, and a bunch of other jars. The goal was to avoid recreating valid/invalid parsing of config files and just use Solr's definition. Anyway, I don't want to hijack the thread. In the end, I think Solr's parse mechanism is probably not the best match for me, as I explicitly want to detect things like field definitions in the wrong place or incorrect spelling, and the current parser just ignores those by doing select XPath queries instead. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Mon, Jul 22, 2013 at 1:16 PM, Alan Woodward a...@flax.co.uk wrote: [snip]
Does such a way exist and if so, what are the real API classes and calls to use? Thank you in advance, Robert
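To make Alan's description above concrete, here is a minimal, untested sketch against the Solr 4.4 APIs he mentions. Only the CoreContainer(SolrResourceLoader, ConfigSolr) constructor and the ConfigSolr.fromString() method come from his message; the resource-loading boilerplate, the class name, and the exact fromString() argument list are assumptions that may need adjusting per release:

import java.io.InputStream;
import java.util.Scanner;
import org.apache.solr.core.ConfigSolr;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrResourceLoader;

public class EmbeddedSolrFromJar {
    public static void main(String[] args) throws Exception {
        // Read a solr.xml bundled inside the application jar.
        String solrXml;
        try (InputStream in = EmbeddedSolrFromJar.class.getResourceAsStream("/solr.xml")) {
            solrXml = new Scanner(in, "UTF-8").useDelimiter("\\A").next();
        }
        // The resource loader still wants an instance directory, even if the
        // configs themselves come from the classpath.
        SolrResourceLoader loader = new SolrResourceLoader("embedded-solr-home");
        ConfigSolr config = ConfigSolr.fromString(loader, solrXml); // signature may vary across 4.x
        CoreContainer container = new CoreContainer(loader, config);
        container.load(); // discovers and registers the cores
    }
}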
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
Are you feeding Graphite from Solr? If so, how? On 07/19/2013 01:02 AM, Neil Prosser wrote: That was overnight so I was unable to track exactly what happened (I'm going off our Graphite graphs here).
Re: adding date column to the index
Solr/Lucene does not automatically add a new field's data to existing documents the way DBMS systems do when you add a column. Instead, all data for a field is added at the same time. To get the new field populated, you have to reload all of your data. This is also true for deleting fields: if you remove a field, that data does not go away until you re-index. On 07/22/2013 07:31 AM, Mysurf Mail wrote: I have added a date field to my index. I don't want the query to search on this field, but I want it to be returned with each row. So I have defined it in the schema.xml as follows:

<field name="LastModificationTime" type="date" indexed="false" stored="true" required="true"/>

I added it to the select in data-config.xml and I see it selected in the profiler. Now, when I query all fields (using the dashboard) I don't see it. Even when I ask for it specifically I don't see it. What am I doing wrong? (In the db it is datetimeoffset(7).)
IllegalStateException
I'm seeing random crashes in Solr 4.0 but I don't have anything to go on other than IllegalStateException. Other than checking for a corrupt index and out of memory, what other things should I check?

org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet default threw exception
java.lang.IllegalStateException
    at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:483)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
Re: Performance of cross join vs block join
Hello Mikhail, ps: sending to solr-user as well, I've realized I was writing just to you, sorry... On Mon, Jul 22, 2013 at 3:07 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Roman, Please get me right. I have no idea what happened with that dependency. There are recent patches from Yonik; they should be more current, and I think he can help you with particular issues. From common (captain's) sense I propose to specify any closer version of jetty; I don't think there is much reason to rely on that particular one. I'm thinking about your problem from time to time. You are right, it's definitely not a case for block join. I'm still trying to figure out how to make it computationally easier. As far as I understand, you have a recursive many-to-many relationship and need to traverse it during the search: doc(id, author, text, references:[docid, ...]). I'm not sure it's possible with Lucene now, but if it is, what do you think about writing a DocValues stripe that contains internal Lucene docnums instead of external docIds? It moves a few steps from query time to index time, hence can gain some performance. Our use case of many-to-many relations is probably a weird one and we ought to de-normalize the values. What I do (building a citation network in memory, using Lucene caches) is just a work-around that happens to out-perform the index seeking - no surprise there - but at the expense of memory. I am aware that de-normalization may be necessary; the DocValues approach would probably be a step toward it - the joins give great flexibility, which is really cool, but that comes with its own price... Also, I mentioned you hesitate regarding cross-segment join. You actually shouldn't, for the following reasons: - Join is Solr code (which is a top-reader beast); - it obtains and works with SolrIndexSearcher, which is a top reader... - the join happens at Weight without any awareness of leaf segments. https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L272 Thanks. I think I have not used it (I believe) because there was very little chance it could have been fast enough. It is reading terms/joins for docs that match the query, so in that sense it is not different from pre-computing the citation cache - but it happens for every query/request, and so for 0.5M edges it must take some time. But I guess I should measure it. I haven't made notes, so now I am having a hard time backtracking :) roman It seems to me cross-segment join works well.
On Mon, Jul 22, 2013 at 3:08 AM, Roman Chyla roman.ch...@gmail.com wrote: ah, in case you know the solution, here is the ant output:

resolve:
[ivy:retrieve]
[ivy:retrieve] :: problems summary ::
[ivy:retrieve] WARNINGS
[ivy:retrieve] module not found: org.eclipse.jetty#jetty-deploy;8.1.10.v20130312
[ivy:retrieve] local: tried
[ivy:retrieve]   /home/rchyla/.ivy2/local/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/ivys/ivy.xml
[ivy:retrieve]   -- artifact org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
[ivy:retrieve]   /home/rchyla/.ivy2/local/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/jars/jetty-deploy.jar
[ivy:retrieve] shared: tried
[ivy:retrieve]   /home/rchyla/.ivy2/shared/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/ivys/ivy.xml
[ivy:retrieve]   -- artifact org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
[ivy:retrieve]   /home/rchyla/.ivy2/shared/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/jars/jetty-deploy.jar
[ivy:retrieve] public: tried
[ivy:retrieve]   http://repo1.maven.org/maven2/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
[ivy:retrieve] sonatype-releases: tried
[ivy:retrieve]   http://oss.sonatype.org/content/repositories/releases/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
[ivy:retrieve]   -- artifact org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
[ivy:retrieve]   http://oss.sonatype.org/content/repositories/releases/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.jar
[ivy:retrieve] maven.restlet.org: tried
[ivy:retrieve]   http://maven.restlet.org/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
[ivy:retrieve]   -- artifact org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
[ivy:retrieve]   http://maven.restlet.org/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.jar
[ivy:retrieve] working-chinese-mirror: tried
[ivy:retrieve]   http://mirror.netcologne.de/maven2/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
[ivy:retrieve]   -- artifact org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
[ivy:retrieve]   http://mirror.netcologne.de/maven2/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.jar
[ivy:retrieve]
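For readers following the join part of this thread, a minimal SolrJ sketch of the query-time join that JoinQParserPlugin implements; the host, collection, and the author/id/references field names are invented for illustration:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class JoinExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/papers");
        // The inner query matches Smith's papers; the join collects their "id"
        // values and returns documents whose "references" field contains one of
        // them, i.e. papers that cite anything by Smith.
        SolrQuery q = new SolrQuery("{!join from=id to=references}author:smith");
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults().getNumFound() + " citing documents");
    }
}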
how number of indexed fields affects performance
Hi, We have a two-shard SolrCloud cluster with each shard allocated 3 separate machines. We do complex queries involving a number of filter queries coupled with group queries and faceting. All of our machines are 64-bit with 32 GB RAM. Our index size is around 10 GB with around 800,000 documents. We have around 1000 indexed fields per document. 6 GB of memory is allocated to Tomcat, under which Solr is running on each of the six machines. We have a ZooKeeper ensemble consisting of 3 ZooKeeper instances running on 3 of the six machines, with 4 GB of memory allocated to each ZooKeeper instance. First Solr starts taking too much time, with broken pipe exceptions from client-side timeouts coming again and again; then after some time a whole shard goes down, one machine at a time, followed by the other machines. Is having 1000 fields indexed with each document causing this problem? If so, what would be the ideal number of indexed fields in such an environment? Regards, Suryansh
Bug with Group.Limit and Group.Main in Distributed Case
We are using grouping in a distributed environment, and we have noticed a discrepancy: on a single core with group.limit=1 and group.main=true, setting rows=10 will return 10 documents. A distributed setup with the same parameters will return 10 groups. We plan to open a JIRA ticket and submit a fix, but there is the question of which way to fix it. In the case where group.main is not set, group.limit applies to the number of groups for both the single-core and multi-core cases, so that approach would be consistent. However, it seems to us that a user requesting the group.main results format will likely expect group.limit to apply to the number of documents. A discussion held around an older fix a couple of years ago supports this view (https://issues.apache.org/jira/browse/SOLR-2063). Unless there is a good case for the first approach, we plan to go with the second; I wanted to put this out to see if we're overlooking something - or if this was implemented this way for some reason - feedback? Monica Skidmore Search Application Services Engineering Lead CareerBuilder.com
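For concreteness, a hypothetical request of the kind described above (host, collection, and field names are invented):

http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=category&group.main=true&group.limit=1&rows=10

With group.main=true the grouped hits are flattened into the ordinary main result list, which is why one would expect rows to cap documents rather than groups; per the report, the single-core case does exactly that while the distributed case caps groups.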
Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
I just have a little python script which I run with cron (luckily that's the granularity we have in Graphite). It reads the same JSON the admin UI displays and dumps numeric values into Graphite. I can open source it if you like. I just need to make sure I remove any hacks/shortcuts that I've taken because I'm working with our cluster! On 22 July 2013 19:26, Lance Norskog goks...@gmail.com wrote: Are you feeding Graphite from Solr? If so, how?
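Neil's script is Python, but the idea is small enough to sketch in Java too; this is a hypothetical reconstruction, not his code - the stats URL, the JSON pointer, and the metric name all depend on your Solr version and naming scheme:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.net.URL;

public class SolrToGraphite {
    public static void main(String[] args) throws Exception {
        // Same JSON the admin UI reads (Solr 4.x); URL is illustrative.
        URL stats = new URL("http://localhost:8983/solr/collection1/admin/mbeans?stats=true&wt=json");
        JsonNode root = new ObjectMapper().readTree(stats.openStream());
        // The pointer below is a guess at one numeric stat; the mbeans layout
        // varies by Solr version, so adjust it to what your instance returns.
        double numDocs = root.at("/solr-mbeans/3/searcher/stats/numDocs").asDouble();
        long ts = System.currentTimeMillis() / 1000L;
        // Graphite's plaintext protocol: "<metric.path> <value> <unix-ts>\n" on port 2003.
        try (Socket sock = new Socket("graphite.example.com", 2003);
             Writer out = new OutputStreamWriter(sock.getOutputStream(), "UTF-8")) {
            out.write("solr.collection1.searcher.numDocs " + numDocs + " " + ts + "\n");
        }
    }
}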
Re: how number of indexed fields affects performance
Was all of this running fine previously and only started running slow recently, or is this your first measurement? Are very simple queries (single keyword, no filters or facets or sorting or anything else, and returning only a few fields) working reasonably well? -- Jack Krupansky -Original Message- From: Suryansh Purwar Sent: Monday, July 22, 2013 4:07 PM To: solr-user@lucene.apache.org Subject: how number of indexed fields affects performance
/update/extract error
Hi all, I'm testing SolrCloud (version 4.3.1) with 2 shards and 1 external ZooKeeper. Everything is running OK: documents are indexed in the 2 different shards and a select on *:* gives me all documents. Now I'm trying to add/index a new document via SolrJ using CloudSolrServer. The code:

CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("tika");
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("C:\\sample.pdf"), "application/octet-stream");
up.setParam("literal.id", "666");
server.request(up);
server.commit();

When up.setParam("literal.id", "666") is set, an exception is thrown:

apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: ERROR: [doc=666] unknown field 'ignored_dcterms:modified'
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

My schema looks like this:

<fields>
  <field name="id" type="integer" indexed="true" stored="true" required="true"/>
  <field name="title" type="string" indexed="true" stored="true"/>
  <field name="author" type="string" indexed="true" stored="true"/>
  <field name="text" type="text_ind" indexed="true" stored="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>

My solrConfig.xml:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">ignored_</str>
  </lst>
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
  </lst>
</requestHandler>

I have already checked the schema via /admin/luke: there is no dcterms:modified field in the response, only the correct fields declared in schema.xml. Can someone help me with this issue? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/update-extract-error-tp4079555.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: /update/extract error
You need a dynamic field pattern for ignored_* to ignore unmapped metadata. -- Jack Krupansky -Original Message- From: franagan Sent: Monday, July 22, 2013 5:14 PM To: solr-user@lucene.apache.org Subject: /update/extract error
Re: /update/extract error
I added

<dynamicField name="ignored_*" type="string" indexed="true" stored="true"/>

to the schema.xml and now it's working. Thank you very much Jack. -- View this message in context: http://lucene.472066.n3.nabble.com/update-extract-error-in-Solr-4-3-1-tp4079555p4079564.html Sent from the Solr - User mailing list archive at Nabble.com.
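A side note, hedged since it goes beyond what the thread established: if the intent is to discard the unmapped metadata entirely rather than index it as strings, the stock Solr example schema pairs the ignored_* pattern with an "ignored" field type along these lines:

<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>

Either variant satisfies the uprefix mapping; the difference is only whether the extracted metadata is kept.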
Use same spell check dictionary across different collections
I have 2 collections, let's say coll1 and coll2. I configured solr.DirectSolrSpellChecker in coll1's solrconfig.xml and it works fine. Now I want to configure coll2's solrconfig.xml to use the SAME spell check dictionary index created above. (I do not want coll2 to prepare its own dictionary index, but just to do spell check against coll1's spell dictionary index.) Is it possible to do this? I tried IndexBasedSpellChecker but could not get it working. Any suggestions? Thanks, -Manasi -- View this message in context: http://lucene.472066.n3.nabble.com/Use-same-spell-check-dictionary-across-different-collections-tp4079566.html Sent from the Solr - User mailing list archive at Nabble.com.
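For reference, a sketch of the sort of coll2 configuration the poster has presumably been trying; the parameters are real IndexBasedSpellChecker options, but whether pointing two collections at one spellcheckIndexDir actually behaves correctly is exactly the open question here (the path and field name are invented):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spell</str>
    <!-- absolute path shared with coll1; coll2 never builds, only reads -->
    <str name="spellcheckIndexDir">/var/solr/shared/spellchecker</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>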
spellcheck and search in the same solr request
Hey, Is there a way to do spellcheck and search (using suggestions returned from spellcheck) in a single Solr request? I am seeing that if my query is spelled correctly, I get results, but if it is misspelled, I just get suggestions. Any pointers will be very helpful. Thanks, -Manasi -- View this message in context: http://lucene.472066.n3.nabble.com/spellcheck-and-search-in-a-same-solr-request-tp4079571.html Sent from the Solr - User mailing list archive at Nabble.com.
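For context, the knobs involved (standard SpellCheckComponent parameters) can at least be sent in one request together with the search; what Solr 4.x will not do in that same request is re-run the search using a suggestion - collations give you a corrected query string that the client re-issues. A hypothetical request (host, collection, and the misspelled term invented):

http://localhost:8983/solr/collection1/select?q=beter&spellcheck=true&spellcheck.collate=true&spellcheck.count=5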
softCommit doesn't work - ?
Hi, I use Solr 4.3.1. I tried to index about 70 documents using softCommit as below:

SolrInputDocument doc = new SolrInputDocument();
result = fillMetaData(request, doc); // custom one
int softCommit = 1;
solrServer.add(doc, softCommit);

The process ran very fast, but there is nothing in the index, neither after 10 sec nor after restarting the server application. In the solr log I got something like that:

2013-07-23 01:58:01,543 INFO [org.apache.solr.update.processor.LogUpdateProcessor] (http-127.0.0.1-8090-5) [collection1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[Rep_CA_FairyCakes (1441307014244335616)]} 0 3
2013-07-23 01:58:01,546 INFO [org.apache.solr.update.UpdateHandler] (http-127.0.0.1-8090-5) start rollback{}
2013-07-23 01:58:01,547 INFO [org.apache.solr.update.DefaultSolrCoreState] (http-127.0.0.1-8090-5) Creating new IndexWriter...
2013-07-23 01:58:01,547 INFO [org.apache.solr.update.DefaultSolrCoreState] (http-127.0.0.1-8090-5) Waiting until IndexWriter is unused... core=collection1
2013-07-23 01:58:01,547 INFO [org.apache.solr.update.DefaultSolrCoreState] (http-127.0.0.1-8090-5) Rollback old IndexWriter... core=collection1
2013-07-23 01:58:01,617 INFO [org.apache.solr.core.SolrCore] (http-127.0.0.1-8090-5) SolrDeletionPolicy.onInit: commits:num=1 commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@C:\solr\data\index lockFactory=org.apache.lucene.store.NativeFSLockFactory@7ed1f882; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_ew,generation=536,filenames=[_ah_Lucene41_0.tim, _9d.fdt, _a5.fdx, _ag_Lucene41_0.pos, _9l.si, _a7.nvd, ...
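One hedged observation on the snippet above: in SolrJ 4.x the two-argument add(doc, int) overload takes a commitWithin time in milliseconds rather than a soft-commit flag, so add(doc, 1) asks for a commit within 1 ms instead of a soft commit. Soft commits are normally requested via the three-argument commit overload, as in this minimal sketch (the URL and field values are invented):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SoftCommitExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solrServer = new HttpSolrServer("http://127.0.0.1:8090/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1"); // hypothetical document
        solrServer.add(doc);
        // waitFlush=true, waitSearcher=true, softCommit=true
        solrServer.commit(true, true, true);
    }
}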
how number of indexed fields affects performance
It was running fine initially when we just had around 100 fields indexed. In this case as well it runs fine, but after some time the broken pipe exception starts coming, which results in the shard going down. Regards, Suryansh On Tuesday, July 23, 2013, Jack Krupansky wrote: Was all of this running fine previously and only started running slow recently, or is this your first measurement?
Question about field boost
Dear Solr experts: Here is my query: defType=dismax&q=term1+term2&qf=title^100 content Apparently (at least I thought) my intention is to boost the title field. While I'm getting some non-trivial results, I'm surprised that the documents with both term1 and term2 in title (I know such docs do exist in my repository) were not returned (or maybe ranked very low). The situation does not change even when I use much larger boost factors. What am I doing wrong?
Re: Question about field boost
Maybe you're not doing anything wrong - other than having an artificial expectation of what the true relevance of your data actually is. Many factors go into relevance scoring. You need to look at all aspects of your data. Maybe your terms don't occur in your titles the way you think they do. Maybe you need a boost of 500 or more... Lots of potential maybes. Relevancy tuning is an art and craft, hardly a science. Step one: Know your data, inside and out. Use the debugQuery=true parameter on your queries and see how much of the score is dominated by your query terms in the non-title fields. -- Jack Krupansky -Original Message- From: Joe Zhang Sent: Monday, July 22, 2013 11:06 PM To: solr-user@lucene.apache.org Subject: Question about field boost
Re: how number of indexed fields affects performance
After restarting Solr and doing a couple of queries to warm the caches, are queries already slow/failing, or does it take some time and a number of queries before failures start occurring? One possibility is that you just need a lot more memory for caches for this amount of data, so maybe the failures are caused by heavy garbage collections. After restarting Solr, check how much Java heap is available, then do some warming queries, then check the available Java heap again. Add the debugQuery=true parameter to your queries and look at the timings to see which phases of query processing are taking the most time. Also check whether the reported QTime seems to match actual wall clock time; sometimes formatting of the results and network transfer time can dwarf actual query time. How many fields are you returning on a typical query? -- Jack Krupansky -Original Message- From: Suryansh Purwar Sent: Monday, July 22, 2013 11:06 PM To: solr-user@lucene.apache.org ; j...@basetechnology.com Subject: how number of indexed fields affects performance
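A small SolrJ sketch of the QTime-versus-wall-clock comparison Jack suggests (the host and collection are invented):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QTimeCheck {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        q.set("debugQuery", true); // per-phase timings appear in the debug section
        q.setRows(10);
        long t0 = System.currentTimeMillis();
        QueryResponse rsp = server.query(q);
        long wall = System.currentTimeMillis() - t0;
        // A large gap points at response writing / network, not query execution.
        System.out.println("QTime=" + rsp.getQTime() + "ms wall=" + wall + "ms");
    }
}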
Re: Question about field boost
Thanks for your hint, Jack. Here are the debug results, which I'm having a hard time deciphering (the two terms are china and snowden)...

0.26839527 = (MATCH) sum of:
  0.26839527 = (MATCH) sum of:
    0.26757246 = (MATCH) max of:
      7.9147343E-4 = (MATCH) weight(content:china in 249), product of:
        0.019873314 = queryWeight(content:china), product of:
          1.6649085 = idf(docFreq=46832, maxDocs=91058)
          0.01193658 = queryNorm
        0.039825942 = (MATCH) fieldWeight(content:china in 249), product of:
          4.8989797 = tf(termFreq(content:china)=24)
          1.6649085 = idf(docFreq=46832, maxDocs=91058)
          0.0048828125 = fieldNorm(field=content, doc=249)
      0.26757246 = (MATCH) weight(title:china^10.0 in 249), product of:
        0.5836803 = queryWeight(title:china^10.0), product of:
          10.0 = boost
          4.8898454 = idf(docFreq=1861, maxDocs=91058)
          0.01193658 = queryNorm
        0.45842302 = (MATCH) fieldWeight(title:china in 249), product of:
          1.0 = tf(termFreq(title:china)=1)
          4.8898454 = idf(docFreq=1861, maxDocs=91058)
          0.09375 = fieldNorm(field=title, doc=249)
    8.2282536E-4 = (MATCH) max of:
      8.2282536E-4 = (MATCH) weight(content:snowden in 249), product of:
        0.03407834 = queryWeight(content:snowden), product of:
          2.8549502 = idf(docFreq=14246, maxDocs=91058)
          0.01193658 = queryNorm
        0.024145111 = (MATCH) fieldWeight(content:snowden in 249), product of:
          1.7320508 = tf(termFreq(content:snowden)=3)
          2.8549502 = idf(docFreq=14246, maxDocs=91058)
          0.0048828125 = fieldNorm(field=content, doc=249)

On Mon, Jul 22, 2013 at 9:27 PM, Jack Krupansky j...@basetechnology.com wrote: Maybe you're not doing anything wrong - other than having an artificial expectation of what the true relevance of your data actually is.
Re: Question about field boost
Is my reading correct that the boost is only applied on china but not snowden? How can that be? My query is: q=china+snowden&qf=title^10 content On Mon, Jul 22, 2013 at 9:43 PM, Joe Zhang smartag...@gmail.com wrote: Thanks for your hint, Jack. Here are the debug results, which I'm having a hard time deciphering (the two terms are china and snowden)...
Re: Question about field boost
That means that, for that document, china occurs in the title, whereas snowden is found in the document but not in its title. -- Jack Krupansky -Original Message- From: Joe Zhang Sent: Tuesday, July 23, 2013 12:52 AM To: solr-user@lucene.apache.org Subject: Re: Question about field boost Is my reading correct that the boost is only applied on china but not snowden? How can that be? My query is: q=china+snowden&qf=title^10 content
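To make the arithmetic in the debug output above explicit (all numbers are taken from Joe's explain dump): dismax scores each term as the maximum over the qf fields and then sums the per-term maxima, so

0.26757246   = max(content:china = 7.9147343E-4, title:china^10.0 = 0.26757246)
8.2282536E-4 = max(content:snowden = 8.2282536E-4)   (no title:snowden clause matches)
0.26839527   ≈ 0.26757246 + 8.2282536E-4

The title boost therefore only contributes through china; since snowden never matches the title field for this document, no boost factor on title can lift its snowden score.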
Re: adding date column to the index
To clarify: I did delete the data in the index and reloaded it (+ commit). (As I said, I have seen it loaded in the db profiler.) Thanks for your comment. On Mon, Jul 22, 2013 at 9:25 PM, Lance Norskog goks...@gmail.com wrote: Solr/Lucene does not automatically add a new field's data to existing documents the way DBMS systems do when you add a column. Instead, all data for a field is added at the same time.
Re: deserializing highlighting json result
The guid appears as the attribute name, and not as a value like id: baf8434a-99a4-4046-8a4d-2f7ec09eafc8. Trying to create an object that holds this guid will create an attribute with the name baf8434a-99a4-4046-8a4d-2f7ec09eafc8. On Mon, Jul 22, 2013 at 6:30 PM, Jack Krupansky j...@basetechnology.com wrote: Exactly why is it difficult to deserialize? Seems simple enough. -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Monday, July 22, 2013 11:14 AM To: solr-user@lucene.apache.org Subject: deserializing highlighting json result When I request a JSON result I get the following structure in the highlighting section:

{"highlighting": {
  "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9": {"PackageName": ["- <em>Testing</em> channel twenty."]},
  "baf8434a-99a4-4046-8a4d-2f7ec09eafc8": {"PackageName": ["- <em>Testing</em> channel twenty."]},
  "0a699062-cd09-4b2e-a817-330193a352c1": {"PackageName": ["- <em>Testing</em> channel twenty."]},
  "0b9ec891-5ef8-4085-9de2-38bfa9ea327e": {"PackageName": ["- <em>Testing</em> channel twenty."]}
}}

It is difficult to deserialize this JSON because the guid is in the attribute name. Is that solvable (using C#)?
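The asker is on C#, but the same idea applies in any language: deserialize the highlighting object into a dictionary keyed by document id rather than into a class with fixed property names. A minimal Java sketch with Jackson (the JSON literal is abbreviated from the thread; one id only):

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import java.util.Map;

public class HighlightParse {
    public static void main(String[] args) throws Exception {
        String json = "{\"highlighting\":{\"baf8434a-99a4-4046-8a4d-2f7ec09eafc8\":"
                + "{\"PackageName\":[\"- <em>Testing</em> channel twenty.\"]}}}";
        ObjectMapper mapper = new ObjectMapper();
        // Treat each document id as a map key, not as a property name.
        Map<String, Map<String, List<String>>> root = mapper.readValue(json, Map.class);
        Map<String, List<String>> perDoc =
                root.get("highlighting").get("baf8434a-99a4-4046-8a4d-2f7ec09eafc8");
        System.out.println(perDoc.get("PackageName").get(0));
    }
}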