Re: Issue with large html indexing
ok. see this: http://s23.postimg.org/yck2s5k1n/html_indexing.png

On Wed, Oct 23, 2013 at 10:45 PM, Erick Erickson erickerick...@gmail.com wrote: Attachments and images are often eaten by the mail server; your image is not visible, at least to me. Can you describe what you're seeing? Or post the image somewhere and provide a link? Best, Erick

On Wed, Oct 23, 2013 at 11:07 AM, Raheel Hasan raheelhasan@gmail.com wrote: Hi, I have an issue here while indexing large HTML. Here is the configuration for that: 1) Data is imported via URLDataSource / PlainTextEntityProcessor (DIH). 2) Schema has this for the field: type=text_en_splitting indexed=true stored=false required=false. 3) text_en_splitting does the following work at index time: HTMLStripCharFilterFactory, WhitespaceTokenizerFactory (creates tokens), StopFilterFactory, WordDelimiterFilterFactory, ICUFoldingFilterFactory, PorterStemFilterFactory, RemoveDuplicatesTokenFilterFactory, LengthFilterFactory. However, the indexed data is like this (as in the attached image): [image: Inline image 1] So what are these numbers? If I index a small HTML file it works fine, but as the size of the HTML file increases, this is what happens. -- Regards, Raheel Hasan
Re: Minor bug with CloudSolrServer and collection-alias.
Thanks to both of you for fixing the bug. Impressive response time for the fix (7 hours). Thomas Egense

On Wed, Oct 23, 2013 at 7:16 PM, Mark Miller markrmil...@gmail.com wrote: I filed https://issues.apache.org/jira/browse/SOLR-5380 and just committed a fix. - Mark

On Oct 23, 2013, at 11:15 AM, Shawn Heisey s...@elyograg.org wrote: On 10/23/2013 3:59 AM, Thomas Egense wrote: Using cloudSolrServer.setDefaultCollection(collectionId) does not work as intended for an alias spanning more than one collection. The virtual collection-alias collectionId is recognized as an existing collection, but it only queries one of the collections it is mapped to. You can confirm this easily in AliasIntegrationTest. The test class AliasIntegrationTest creates two cores with 2 and 3 different documents, and then creates an alias pointing to both of them. Line 153:

// search with new cloud client
CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
cloudSolrServer.setParallelUpdates(random().nextBoolean());
query = new SolrQuery("*:*");
query.set("collection", "testalias");
res = cloudSolrServer.query(query);
cloudSolrServer.shutdown();
assertEquals(5, res.getResults().getNumFound());

No unit-test bug here; however, if you change it so the collection id is set not on the query but on the CloudSolrServer instead, it will reproduce the bug:

// search with new cloud client
CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
cloudSolrServer.setDefaultCollection("testalias");
cloudSolrServer.setParallelUpdates(random().nextBoolean());
query = new SolrQuery("*:*");
// query.set("collection", "testalias");
res = cloudSolrServer.query(query);
cloudSolrServer.shutdown();
assertEquals(5, res.getResults().getNumFound());  // -- Assertion failure

Should I create a Jira issue for this?
Thomas, I have confirmed this with the following test patch, which adds to the test rather than changing what's already there: http://apaste.info/9ke5 I'm about to head off to the train station to start my commute, so I will be unavailable for a little while. If you haven't gotten the jira filed by the time I get to another computer, I will create it. Thanks, Shawn
RE: New shard leaders or existing shard replicas depends on zookeeper?
Absolutely, the scenario I'm seeing does _sound_ like I've not specified the number of shards, but I think I have - the evidence is:

- DnumShards=24 defined within the /etc/sysconfig/solrnode* files
- DnumShards=24 seen on each 'ps' line (two nodes listed here):

tomcat 26135 1 5 09:51 ? 00:00:22 /opt/java/bin/java -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf -Dcollection.configName=ldwa01cfg -DnumShards=24 -Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/tomcat_instances/solrnode1 -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp org.apache.catalina.startup.Bootstrap start

tomcat 26225 1 5 09:51 ? 00:00:19 /opt/java/bin/java -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf -Dcollection.configName=ldwa01cfg -DnumShards=24 -Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/tomcat_instances/solrnode2 -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp org.apache.catalina.startup.Bootstrap start

- The Solr node dashboard shows -DnumShards=24 in its list of Args for each node

And yet, the ldwa01 nodes are leader
and replica of shard 17 and there are no other shard leaders created. Plus, if I only change the ZK ensemble declarations in /etc/sysconfig/solrnode* to the different dev ZK servers, all 24 leaders are created before any replicas are added. I can also mention that when I browse the Cloud view I can see both the ldwa01 collection and the ukdomain collection listed, suggesting that this information comes from the ZKs - I assume this is as expected. Plus, the correct node addresses (e.g., 192.168.45.17:8984) are listed for ldwa01, but these addresses are also listed as 'Down' in the ukdomain collection (except for :8983, which only shows in the ldwa01 collection). Any help very gratefully received. Gil

-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 23 October 2013 18:50 To: solr-user@lucene.apache.org Subject: Re: New shard leaders or existing shard replicas depends on zookeeper?

My first impulse would be to ask how you created the collection. It sure _sounds_ like you didn't specify 24 shards and thus have only a single shard, one leader and 23 replicas. bq: ...to point to the zookeeper ensemble also used for the ukdomain collection... so my guess is that this ZK ensemble has the ldwa01 collection defined as having only one shard. I admit I pretty much skimmed your post though... Best, Erick

On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil gil.hogga...@bl.uk wrote: Hi solr-users, I'm seeing some confusing behaviour in Solr/zookeeper and hope you can shed some light on what's happening/how I can correct it. We have two physical servers running automated builds of RedHat 6.4 and Solr 4.4.0 that host two separate Solr services. The first server (called ld01) has 24 shards and hosts a collection called 'ukdomain'; the second server (ld02) also has 24 shards and hosts a different collection called 'ldwa01'.
It's evidently important to note that previously both of these physical servers provided the 'ukdomain' collection, but the 'ldwa01' server has been rebuilt for the new collection. When I start the ldwa01 solr nodes with their zookeeper configuration (defined in /etc/sysconfig/solrnode* and with collection.configName as 'ldwa01cfg') pointing to the development zookeeper ensemble, all nodes initially become shard leaders and then replicas as I'd expect. But if I change the ldwa01 solr nodes to point to the zookeeper ensemble also used for the ukdomain collection, all ldwa01 solr nodes start on the same shard (that is, the first ldwa01 solr node becomes the shard leader, then every other solr node becomes a replica for this shard). The significant point here is no other ldwa01 shards gain leaders (or replicas). The ukdomain collection uses a zookeeper collection.configName of 'ukdomaincfg', and prior to the creation of this ldwa01 service the collection.configName of 'ldwa01cfg' has never previously been used. So I'm
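One thing worth checking: as far as I know, -DnumShards is only honored when a collection is first created; once the ZooKeeper ensemble already holds a clusterstate for 'ldwa01', the property is ignored on later startups, which would explain why the same nodes behave differently against the two ensembles. A sketch (host and port are placeholders) of creating the collection explicitly through the Collections API instead, which records the shard count in ZK independently of system properties:

```python
from urllib.parse import urlencode

# Create the collection explicitly rather than relying on -DnumShards
# bootstrapping. Host, port and names are placeholders for this setup.
params = {
    "action": "CREATE",
    "name": "ldwa01",
    "numShards": 24,
    "replicationFactor": 2,
    "collection.configName": "ldwa01cfg",
}
url = "http://ld02:8080/solr/admin/collections?" + urlencode(params)
print(url)
```

If the ensemble already holds a stale one-shard layout for ldwa01, that collection's state would need to be removed before recreating it.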
Re: Terms function join with a Select function ?
Dear All, Ok, I have an answer concerning the first question (limit): it's the terms.limit parameter. But I can't find how to apply a Terms request on a query result. Any idea? Bruno

On 23/10/2013 23:19, Bruno Mannina wrote: Dear Solr users, I use the Terms function to see the frequency data in a field, but it's for the whole database. I have 2 questions: - Is it possible to increase the number of statistics? Actually I get only the 10 most frequent terms. - Is it possible to limit these statistics to the result of a request? PS: the second question is very important for me. Many thanks
SolrCloud: optimizing a core triggers optimizations of all cores in that collection?
Hi! I have a SolrCloud setup on two servers, 3 shards, replicationFactor=2. Today I triggered an optimize on core *shard2_replica2*, which only contained 3M docs and 2.7G. The sizes of the other shards were shard3=2.7G and shard1=48G (the routing is implicit, but after some update deadlocks and restarts the shard range in ZooKeeper got null and everything since then apparently got indexed to shard1). So, half an hour after I triggered the optimize via the Admin UI, I noticed that used space was increasing a lot on *both servers* for cores *shard1_replica1 and shard1_replica2*. It was now 67G and increasing. In the end, about 40 minutes after the start of the operation, shard1 was done optimizing on both servers, leaving shard1_replica1 and shard1_replica2 at about 33G. Any idea what is happening, and why the core on which I wanted the optimize to happen got no optimization while another shard got optimized instead, on both servers? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimizations-of-all-cores-in-that-collection-tp4097499.html Sent from the Solr - User mailing list archive at Nabble.com.
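In SolrCloud, commit and optimize commands are forwarded to every shard and replica of the collection, which matches what you saw. If the intent really is to optimize a single core, the usual approach (an assumption worth testing on your version) is to send the command with distrib=false so it is not forwarded:

```python
from urllib.parse import urlencode

# Optimize one core only: distrib=false keeps SolrCloud from forwarding
# the command to the rest of the collection. Host and core name are
# placeholders for this setup.
core = "collection1_shard2_replica2"
params = {"optimize": "true", "distrib": "false"}
url = "http://server1:8983/solr/{0}/update?{1}".format(core, urlencode(params))
print(url)
```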
Re: Spellcheck with Distributed Search (sharding).
Any idea? 2013/10/23 Luis Cappa Banda luisca...@gmail.com More info: When executing the query against a single Solr server it works:

http://solr1:8080/events/data/suggest?q=m&wt=json

{
  "responseHeader": { "status": 0, "QTime": 1 },
  "response": { "numFound": 0, "start": 0, "docs": [] },
  "spellcheck": {
    "suggestions": [
      "m",
      {
        "numFound": 4,
        "startOffset": 0,
        "endOffset": 1,
        "suggestion": ["marca", "marcacom", "mis", "mispelotas"]
      }
    ]
  }
}

But when choosing the request handler this way it doesn't:

http://solr1:8080/events/data/select?qt=/suggest&wt=json&q=*:*

2013/10/23 Luis Cappa Banda luisca...@gmail.com Hello! I've been trying to enable spellchecking using sharding following the steps from the Wiki, but I failed, :-( What I do is:

*Solrconfig.xml*

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggestion</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">suggestion</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="last-components">
    <str>suggest</str>
  </arr>
</requestHandler>

*Note:* I have two shards (solr1 and solr2) and both have the same solrconfig.xml. Also, both indexes were optimized to create the spellchecker indexes.
*Query*

http://solr1:8080/events/data/select?q=m&qt=/suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data

*Response*

{
  "responseHeader": {
    "status": 404,
    "QTime": 12,
    "params": {
      "shards": "solr1:8080/events/data,solr2:8080/events/data",
      "shards.qt": "/suggestion",
      "q": "m",
      "wt": "json",
      "qt": "/suggestion"
    }
  },
  "error": {
    "msg": "Server at http://solr1:8080/events/data returned non ok status:404, message:Not Found",
    "code": 404
  }
}

More query syntaxes that I used and that don't work:

http://solr1:8080/events/data/select?q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data

http://solr1:8080/events/data/select?q=*:*&spellcheck.q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data

Any idea of what I'm doing wrong? Thank you very much in advance! Best regards, -- - Luis Cappa
Proposal for new feature, cold replicas, brainstorming
I've been wondering for some time if it's possible to have replicas of a shard synchronized but in a state where they can't accept queries, only updates. Such a replica in "replication mode" only wakes up to accept queries if it's the last replica alive, and goes back to replication mode when another replica becomes alive and synchronized. The motivation for this is simple: I want to have replication, but I don't want to have n active replicas with full resources allocated (cache and so on). This is useful in environments where replication is needed but a high query throughput is not fundamental and resources are limited. I know that right now this is not possible, but I think it's a feature that could be implemented in an easy way by creating a new status for shards. The bottom-line question is: am I the only one with this kind of requirement? Does a functionality like this make sense? - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Proposal-for-new-feature-cold-replicas-brainstorming-tp4097501.html Sent from the Solr - User mailing list archive at Nabble.com.
Query result caching with custom functions
Hi all! Got a question on the Solr cache :) I've written a custom function which is able to provide a distance based on some DocValues to re-sort result lists. This basically works great, but we've got the problem that if I don't change the query, but only the function parameters, Solr delivers a cached result without re-ordering. I turned off caching and, see there, problem solved. But of course this is not an avenue I want to pursue further, as it doesn't make sense for a productive system. Do you have any ideas (beyond fake query modification and turning off caching) to counteract? Btw. I'm using Solr 4.4 (so if you are aware of the issue and it has been resolved in 4.5, I'll port it :) The code I'm using is at https://bitbucket.org/dermotte/liresolr regards, Mathias -- Dr. Mathias Lux Assistant Professor, Klagenfurt University, Austria http://tinyurl.com/mlux-itec
Solr subset searching in 100-million document index
Hi, We have a Solr index of around 100 million documents, with each document being given a region id, growing at a rate of about 10 million documents per month - the average document size being around 10KB of pure text. The total number of region ids is itself in the range of 2.5 million. We want to search for a query with a given list of region ids. The number of region ids in this list is usually around 250-300 (most of the time), but can be up to 500, with a maximum cap of around 2000 ids in one request. What is the best way to model such queries, besides using an IN param in the query or using a filter (fq) in the query? Are there any other faster methods available? If it may help, the index is on a VM with 4 virtual cores and currently has 4GB of Java memory allocated out of 16GB in the machine. The number of queries does not exceed more than 1 per minute for now. If needed, we can throw more hardware at the index - but the index will still be on a single machine for at least 6 months. Regards, Sandeep Gupta
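For what it's worth, the usual way to model this is a filter query rather than the main query, since fq results are evaluated and cached independently of q. A sketch of building such a filter client-side (the field name region_id is an assumption):

```python
# Build a filter query that ORs a list of region ids on one field.
# With Solr's default OR operator, terms inside the parentheses are
# implicitly ORed. "region_id" is a hypothetical field name.
def region_fq(region_ids):
    return "region_id:(%s)" % " ".join(str(r) for r in region_ids)

fq = region_fq([17, 42, 105])
print(fq)  # region_id:(17 42 105)
```

If every request carries a different id list, caching the filter gains nothing and just churns the filterCache; prefixing it with {!cache=false} avoids that.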
Re: Terms function join with a Select function ?
That would be called faceting :) http://wiki.apache.org/solr/SimpleFacetParameters

On Oct 24, 2013, at 5:23 AM, Bruno Mannina bmann...@free.fr wrote: Dear All, Ok, I have an answer concerning the first question (limit): it's the terms.limit parameter. But I can't find how to apply a Terms request on a query result. Any idea? Bruno

On 23/10/2013 23:19, Bruno Mannina wrote: Dear Solr users, I use the Terms function to see the frequency data in a field, but it's for the whole database. I have 2 questions: - Is it possible to increase the number of statistics? Actually I get only the 10 most frequent terms. - Is it possible to limit these statistics to the result of a request? PS: the second question is very important for me. Many thanks
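To make that concrete: facet.limit plays the role that terms.limit plays for the Terms component, and because facet counts are computed over the documents matching q and fq, they are automatically restricted to the query result, which answers the second question. A sketch of such a request (host and field name are placeholders):

```python
from urllib.parse import urlencode

# Facet counts are computed over the documents matching q (and fq),
# unlike the Terms component, which always covers the whole index.
params = {
    "q": "title:solr",          # the query whose results we facet over
    "rows": 0,                  # only the counts are wanted
    "facet": "true",
    "facet.field": "keywords",  # hypothetical field name
    "facet.limit": 50,          # return the 50 most frequent terms
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```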
Basic query process question with fl=id
Hi, Any distributed lookup is basically composed of two stages: the first collecting all the matching documents from every shard, and a second which fetches additional information about specific ids (i.e. stored fields, termVectors). This can be seen in the logs of each shard (isShard=true), where the first request logs the number of hits the query produced on that specific shard, and a second contains the ids field (ids=...) for the additional fetch. At the end of both I get a total QTime of the query and the total number of hits. My question is about the case where only ids are requested (fl=id). This query should make only one request against a shard, while it actually does the two of them. It looks like the response builder has to go through these two stages no matter what kind of query it is. My questions: 1. Is it normal that the response builder has to go through both stages? 2. Does the first request get internal Lucene docids or the actual uniqueKey id? 3. In a query as above (fl=id), where is the id read from? Is it fetched from the stored fields, or from the doc values file if it exists? Because if it is fetched from the stored fields, a high rows param (say 1000 in my case) would need 1000 lookups, which could badly hurt performance. Thanks Manuel
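For discussion purposes, here is a toy model of the two-stage flow as I understand it (not actual Solr code): stage one returns each shard's top ids (the uniqueKey, not the internal Lucene docid, since docids are meaningless across shards) with their sort values, and stage two fetches the remaining fields for only the merged winners:

```python
# Toy model of a two-phase distributed query. Each "shard" holds
# (uniqueKey, score) pairs; phase 1 merges per-shard top lists,
# phase 2 fetches full documents for the winners only.
shard1 = {"a": 0.9, "b": 0.5}
shard2 = {"c": 0.8, "d": 0.7}

def phase1(shards, rows):
    # Every shard returns its top ids with sort values; merge globally.
    merged = []
    for shard in shards:
        merged.extend(shard.items())
    merged.sort(key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in merged[:rows]]

def phase2(ids, stored):
    # Second round-trip: fetch stored fields only for the merged ids.
    return [stored[i] for i in ids]

stored = {k: {"id": k} for k in "abcd"}
top = phase1([shard1, shard2], rows=3)
print(top)  # ['a', 'c', 'd']
docs = phase2(top, stored)
```

Seen through this model, a pure fl=id query makes the second stage look redundant, which is exactly the question; whether a given Solr version short-circuits it is worth checking with debugQuery or the shard logs.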
RE: Spellcheck with Distributed Search (sharding).
Is it that your request handler is named /suggest but you are setting shards.qt to /suggestion ? James Dyer Ingram Content Group (615) 213-4311

-Original Message- From: Luis Cappa Banda [mailto:luisca...@gmail.com] Sent: Thursday, October 24, 2013 6:22 AM To: solr-user@lucene.apache.org Subject: Re: Spellcheck with Distributed Search (sharding). Any idea?
Re: Proposal for new feature, cold replicas, brainstorming
On Thu, 2013-10-24 at 13:27 +0200, yriveiro wrote: The motivation for this is simple: I want to have replication, but I don't want to have n active replicas with full resources allocated (cache and so on). This is useful in environments where replication is needed but a high query throughput is not fundamental and resources are limited.

Coincidentally, we recently talked about the exact same setup. We are looking at sharding a 20 TB index into 20 * 1 TB shards, each located on its own dedicated physical SSD, which has more than enough horsepower for our needs. For replication, we have a remote storage system capable of serving requests for 2-4 shards with acceptable latency. Projected performance for the SSD setup is superior (5-10 times) to our remote storage, so we would like to hit only the SSDs if possible. Setting up a cloud that issues all requests to the SSD shards unless a catastrophic failure happens to one of them, and in that case falls back to the remote storage replica for only that shard, would be perfect.

I know that right now this is not possible, but I think it's a feature that could be implemented in an easy way by creating a new status for shards.

shardIsLastResort=true? On paper it seems like a simple addition, but I am not familiar enough with the SolrCloud code to guess if it is easy to implement. - Toke Eskildsen, State and University Library, Denmark
Searching on special characters
Hi, How should I set up Solr so I can search and get a hit on special characters such as: + - || ! ( ) { } [ ] ^ ~ * ? : \ My need is, if a user has text like so: Doc-#1: (Solr) Doc-#2: Solr and they type (solr), I want a hit on (solr) only in document #1, with the brackets matching. And if they type solr, they will get a hit in document #2 only. An additional nice-to-have is, if they type solr, I want a hit in both documents #1 and #2. Here is what my current schema.xml looks like:

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1" stemEnglishPossessive="1" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <filter class="solr.PorterStemFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

Currently, special characters are being stripped. Any idea how I can configure Solr to do this? I'm using Solr 3.6. Thanks !! -MJ
Re: Searching on special characters
Have two or three copies of the text: one field could be a raw string, boosted heavily for exact match; a second could be text using the keyword tokenizer but with a lowercase filter, also heavily boosted; and the third field general, tokenized text with a lower boost. You could also have a copy that uses the keyword tokenizer to maintain a single token but also applies a regex filter to strip special characters and a lowercase filter, and give that an intermediate boost. -- Jack Krupansky

-Original Message- From: johnmu...@aol.com Sent: Thursday, October 24, 2013 9:20 AM To: solr-user@lucene.apache.org Subject: Searching on special characters
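A sketch of the kind of schema Jack describes; all field and type names here are invented, and the boosts would be applied at query time, e.g. via edismax qf=text_exact^10 text_keyword^5 text_general:

```xml
<!-- Hypothetical fields: exact string, single lowercased token, and
     normal tokenized text. -->
<field name="text_exact"   type="string"       indexed="true" stored="false"/>
<field name="text_keyword" type="text_keyword" indexed="true" stored="false"/>
<field name="text_general" type="text_en"      indexed="true" stored="true"/>

<copyField source="text_general" dest="text_exact"/>
<copyField source="text_general" dest="text_keyword"/>

<!-- Keeps the whole value as one token, so "(Solr)" is indexed with
     its brackets, lowercased so the query "(solr)" still matches. -->
<fieldType name="text_keyword" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```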
Re: Spellcheck with Distributed Search (sharding).
It's just a typo, sorry about that! The request handler is spelled correctly and it doesn't work.

2013/10/24 Dyer, James james.d...@ingramcontent.com Is it that your request handler is named /suggest but you are setting shards.qt to /suggestion ? James Dyer Ingram Content Group (615) 213-4311
Re: Searching on special characters
I'm not sure what you mean. Based on what you are saying, is there an example of how I can set up my schema.xml to get the result I need? Also, the way I execute a search is using http://localhost:8080/solr/select/?q=search-term Does your solution require me to change this? If so, in what way? It would be great if all this is documented somewhere, so I won't have to bug you guys !!! --MJ

-Original Message- From: Jack Krupansky j...@basetechnology.com To: solr-user solr-user@lucene.apache.org Sent: Thu, Oct 24, 2013 9:39 am Subject: Re: Searching on special characters

Have two or three copies of the text: one field could be a raw string, boosted heavily for exact match; a second could be text using the keyword tokenizer but with a lowercase filter, also heavily boosted; and the third field general, tokenized text with a lower boost. You could also have a copy that uses the keyword tokenizer to maintain a single token but also applies a regex filter to strip special characters and a lowercase filter, and give that an intermediate boost. -- Jack Krupansky
Here is what my current schema.xml looks like: <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1" stemEnglishPossessive="1" preserveOriginal="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> <filter class="solr.PorterStemFilterFactory"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> Currently, special characters are being stripped. Any idea how I can configure Solr to do this? I'm using Solr 3.6. Thanks !! -MJ
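[Editor's note] Jack's last suggested copy (keyword tokenizer to keep a single token, a regex filter to strip special characters, plus lowercasing) can be sketched outside Solr. This is a plain-Java illustration, not an actual Solr analysis chain, and the exact character set in the regex is an assumption:

```java
public class KeywordNormalize {
    // Roughly what KeywordTokenizer + a PatternReplaceFilter + LowerCaseFilter would
    // produce: the whole input stays one token, query special characters are stripped,
    // and case is folded. The stripped set below is assumed, not taken from any schema.
    static String normalize(String input) {
        return input.replaceAll("[+\\-!(){}\\[\\]^~*?:\\\\|\"&]", "").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(normalize("(Solr)")); // solr
        System.out.println(normalize("Solr"));   // solr
        // Both inputs normalize to the same token, so a query for solr matches both
        // documents, while the separate raw-string copy of the field still
        // distinguishes (Solr) from Solr for the exact-match case.
    }
}
```

This only covers the "nice-to-have" case; the exact-with-brackets behavior comes from the unanalyzed copy of the field.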
Re: Issue with large html indexing
On 10/24/2013 2:11 AM, Raheel Hasan wrote: ok. see this: http://s23.postimg.org/yck2s5k1n/html_indexing.png A recap. You said your index analysis chain is this: HTMLStripCharFilterFactory WhitespaceTokenizerFactory (create tokens) StopFilterFactory WordDelimiterFilterFactory ICUFoldingFilterFactory PorterStemFilterFactory RemoveDuplicatesTokenFilterFactory LengthFilterFactory Your picture says you have 1 document, and this field contains 1036 terms. The numbers are likely numbers that are in your html document. You never showed us the input document. It is likely that the whitespace tokenizer and/or the WordDelimiter filter are producing these numbers as standalone tokens. The tokenizer is pretty easy to understand - it splits on whitespace. Please see the following to know what the options for WordDelimiterFilterFactory will do: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory Thanks, Shawn
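[Editor's note] Shawn's explanation is easy to reproduce outside Solr. The sketch below is plain Java, only a rough imitation of HTMLStripCharFilter, WhitespaceTokenizerFactory, and WordDelimiterFilterFactory's splitOnNumerics=1 — the regexes are stand-ins, not the real filter implementations:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NumberTokens {
    // Crude stand-in for HTMLStripCharFilterFactory: replace tags (and their
    // attributes) with whitespace, keeping only the visible text.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // WhitespaceTokenizerFactory behavior: split on whitespace only.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // splitOnNumerics=1: break each token on letter/digit boundaries, which
    // turns "item1036" into "item" and a standalone "1036".
    static List<String> splitOnNumerics(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.addAll(Arrays.asList(
                t.split("(?<=\\p{L})(?=\\p{N})|(?<=\\p{N})(?=\\p{L})")));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<div id=\"post40713\">item1036 listed 2013</div>";
        List<String> tokens = splitOnNumerics(whitespaceTokenize(stripTags(html)));
        System.out.println(tokens); // [item, 1036, listed, 2013]
    }
}
```

The bigger the HTML document, the more of these standalone numeric terms survive the chain, which matches what the screenshot shows.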
Re: Query result caching with custom functions
On 10/24/2013 5:35 AM, Mathias Lux wrote: I've written a custom function, which is able to provide a distance based on some DocValues to re-sort result lists. This basically works great, but we've got the problem that if I don't change the query, but the function parameters, Solr delivers a cached result without re-ordering. I turned off caching and, lo and behold, problem solved. But of course this is not an avenue I want to pursue further as it doesn't make sense for a productive system. Do you have any ideas (beyond fake query modification and turning off caching) to counteract? btw. I'm using Solr 4.4 (so if you are aware of the issue and it has been resolved in 4.5 I'll port it :) The code I'm using is at https://bitbucket.org/dermotte/liresolr I suspect that the queryResultCache is not paying attention to the fact that parameters for your plugin have changed. This probably means that your plugin must somehow inform the cache check code that something HAS changed. How you actually do this is a mystery to me because it involves parts of the code that are beyond my understanding, but it MIGHT involve making sure that parameters related to your code are saved as part of the entry that goes into the cache. Thanks, Shawn
Re: Solr subset searching in 100-million document index
Sandeep, This type of operation can often be expressed as a PostFilter very efficiently. This is particularly true if the region ids are integer keys. Joel On Thu, Oct 24, 2013 at 7:46 AM, Sandeep Gupta sandy@gmail.com wrote: Hi, We have a Solr index of around 100 million documents with each document being given a region id growing at a rate of about 10 million documents per month - the average document size being around 10KB of pure text. The total number of region ids are themselves in the range of 2.5 million. We want to search for a query with a given list of region ids. The number of region ids in this list is usually around 250-300 (most of the time), but can be up to 500, with a maximum cap of around 2000 ids in one request. What is the best way to model such queries besides using an IN param in the query, or using a Filter FQ in the query? Are there any other faster methods available? If it may help, the index is on a VM with 4 virtual-cores and has currently 4GB of Java memory allocated out of 16GB in the machine. The number of queries do not exceed more than 1 per minute for now. If needed, we can throw more hardware to the index - but the index will still be only on a single machine for at least 6 months. Regards, Sandeep Gupta --
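[Editor's note] The reason a PostFilter works well here reduces to a delegating collector doing an O(1) set lookup per matching document. The self-contained sketch below shows just that mechanism in plain Java — the class and field names are made up; Solr's real interface is org.apache.solr.search.PostFilter returning a DelegatingCollector, where the per-document region id would come from the FieldCache/DocValues:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RegionPostFilterSketch {
    // Stand-in for the per-document region id lookup Solr would do via DocValues.
    static int[] regionIdByDoc = {7, 42, 7, 99, 42};

    // Delegating-collector logic: pass a doc downstream only if its region id
    // is in the allowed set. In Solr this body would call super.collect(doc).
    static List<Integer> collect(Set<Integer> allowedRegions) {
        List<Integer> passed = new ArrayList<>();
        for (int doc = 0; doc < regionIdByDoc.length; doc++) {
            if (allowedRegions.contains(regionIdByDoc[doc])) { // O(1) per doc
                passed.add(doc);
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        Set<Integer> allowed = new HashSet<>(List.of(42, 99));
        System.out.println(collect(allowed)); // [1, 3, 4]
    }
}
```

With 300-2000 integer ids per request, a HashSet lookup stays constant-time per document, which is why this approach typically beats expanding the ids into a 2000-clause fq.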
Re: Query result caching with custom functions
Mathias, I'd have to do a close review of the function sort code to be sure, but I suspect if you implement the equals() method on the ValueSource it should solve your caching issue. Also implement hashCode(). Joel On Thu, Oct 24, 2013 at 10:35 AM, Shawn Heisey s...@elyograg.org wrote: On 10/24/2013 5:35 AM, Mathias Lux wrote: I've written a custom function, which is able to provide a distance based on some DocValues to re-sort result lists. This basically works great, but we've got the problem that if I don't change the query, but the function parameters, Solr delivers a cached result without re-ordering. I turned off caching and, lo and behold, problem solved. But of course this is not an avenue I want to pursue further as it doesn't make sense for a productive system. Do you have any ideas (beyond fake query modification and turning off caching) to counteract? btw. I'm using Solr 4.4 (so if you are aware of the issue and it has been resolved in 4.5 I'll port it :) The code I'm using is at https://bitbucket.org/dermotte/liresolr I suspect that the queryResultCache is not paying attention to the fact that parameters for your plugin have changed. This probably means that your plugin must somehow inform the cache check code that something HAS changed. How you actually do this is a mystery to me because it involves parts of the code that are beyond my understanding, but it MIGHT involve making sure that parameters related to your code are saved as part of the entry that goes into the cache. Thanks, Shawn
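[Editor's note] Joel's diagnosis can be illustrated with a plain HashMap standing in for Solr's queryResultCache. The class and field names below are hypothetical; the point is only the equals()/hashCode() contract that Lucene expects ValueSource implementations to honor:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheKeyDemo {
    // A function "key" that ignores its parameter in equals/hashCode -- the
    // suspected situation: two differently-parameterized sorts look identical
    // to the cache, so the second request gets the first request's results.
    static class BrokenKey {
        final double param;
        BrokenKey(double param) { this.param = param; }
        @Override public boolean equals(Object o) { return o instanceof BrokenKey; }
        @Override public int hashCode() { return 1; }
    }

    // The fix: include every parameter that changes the result.
    static class FixedKey {
        final double param;
        FixedKey(double param) { this.param = param; }
        @Override public boolean equals(Object o) {
            return o instanceof FixedKey && ((FixedKey) o).param == this.param;
        }
        @Override public int hashCode() { return Double.hashCode(param); }
    }

    public static void main(String[] args) {
        Map<Object, String> cache = new HashMap<>();
        cache.put(new BrokenKey(1.0), "result for param=1.0");
        // Different parameter, but the cache still returns the stale entry:
        System.out.println(cache.get(new BrokenKey(2.0))); // result for param=1.0

        cache.put(new FixedKey(1.0), "result for param=1.0");
        System.out.println(cache.get(new FixedKey(2.0))); // null -> recompute, as desired
    }
}
```

If the custom ValueSource inherits equals()/hashCode() that don't incorporate the function parameters, Solr's cache behaves like the BrokenKey case above.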
Re: Solr not indexing everything from MongoDB
That's typical for an index that receives updates to the same document. Are you sure your keys are unique? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Wed, Oct 23, 2013 at 5:57 PM, gohome190 gohome...@gmail.com wrote: numFound is 10. numDocs is 10, maxDoc is 23. Yeah, Solr 4.x! Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-not-indexing-everything-from-MongoDB-tp4097302p4097340.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: New shard leaders or existing shard replicas depends on zookeeper?
I think my question is easier, because I think the problem below was caused by the very first startup of the 'ldwa01' collection/'ldwa01cfg' zk config name not specifying the number of shards (and thus defaulting to 1). So, how can I change the number of shards for an existing collection/zk collection name, especially when the ZK ensemble in question is the production version and supporting other Solr collections that I do not want to interrupt? (Which I think means that I can't just delete the clusterstate.json and restart the ZKs as this will also lose the other Solr collection information.) Thanks in advance, Gil -Original Message- From: Hoggarth, Gil [mailto:gil.hogga...@bl.uk] Sent: 24 October 2013 10:13 To: solr-user@lucene.apache.org Subject: RE: New shard leaders or existing shard replicas depends on zookeeper? Absolutely, the scenario I'm seeing does _sound_ like I've not specified the number of shards, but I think I have - the evidence is: - DnumShards=24 defined within the /etc/sysconfig/solrnode* files - DnumShards=24 seen on each 'ps' line (two nodes listed here): tomcat 26135 1 5 09:51 ?00:00:22 /opt/java/bin/java -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/log ging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf -Dcollection.configName=ldwa01cfg -DnumShards=24 -Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl .uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/tomcat_instances/solrnode1 -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp org.apache.catalina.startup.Bootstrap start tomcat 26225 1 5 09:51 ?00:00:19 /opt/java/bin/java 
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/log ging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf -Dcollection.configName=ldwa01cfg -DnumShards=24 -Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl .uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/tomcat_instances/solrnode2 -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp org.apache.catalina.startup.Bootstrap start - The Solr node dashboard shows -DnumShards=24 in its list of Args for each node And yet, the ldwa01 nodes are leader and replica of shard 17 and there are no other shard leaders created. Plus, if I only change the ZK ensemble declarations in /etc/system/solrnode* to the different dev ZK servers, all 24 leaders are created before any replicas are added. I can also mention, when I browse the Cloud view, I can see both the ldwa01 collection and the ukdomain collection listed, suggesting that this information comes from the ZKs - I assume this is as expected. Plus, the correct node addresses (e.g., 192.168.45.17:8984) are listed for ldwa01 but these addresses are also listed as 'Down' in the ukdomain collection (except for :8983 which only shows in the ldwa01 collection). Any help very gratefully received. Gil -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 23 October 2013 18:50 To: solr-user@lucene.apache.org Subject: Re: New shard leaders or existing shard replicas depends on zookeeper? My first impulse would be to ask how you created the collection. 
It sure _sounds_ like you didn't specify 24 shards and thus have only a single shard, one leader and 23 replicas bq: ...to point to the zookeeper ensemble also used for the ukdomain collection... so my guess is that this ZK ensemble has the ldwa01 collection defined as having only one shard I admit I pretty much skimmed your post though... Best, Erick On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil gil.hogga...@bl.uk wrote: Hi solr-users, I'm seeing some confusing behaviour in Solr/zookeeper and hope you can shed some light on what's happening/how I can correct it. We have two physical servers running automated builds of RedHat 6.4 and Solr 4.4.0 that host two separate Solr services. The first server (called ld01) has 24 shards and hosts a collection called 'ukdomain'; the second server (ld02) also has 24 shards and hosts a different collection called 'ldwa01'. It's evidently important to note that previously both of these physical servers provided the 'ukdomain' collection, but the 'ldwa01' server has been rebuilt for the new collection. When I start the ldwa01 solr
Re: New shard leaders or existing shard replicas depends on zookeeper?
Ah yes, I was about to mention that: -DnumShards is only actually used when the collection is being created for the first time. After that point (i.e. once the collection exists in ZK), passing it along the command line is redundant (Solr won't actually read it). I know the preferred mechanism for creating collections is to use the collection API, in which case you never use -DnumShards at all. Having it on the command line can be confusing (we've fallen into that trap too!) The only way to change the number of shards on a collection is to use the collection API to split a shard (and currently you can only do that in steps of 2, so you'll need to do 1-2, 2-4, 4-8, 8-16). You can't get from 1 to 24 as it's not a power of 2 :( What you want is https://issues.apache.org/jira/browse/SOLR-5004 Otherwise, you'll need to create a new collection and re-index everything into that. On 24 October 2013 16:35, Hoggarth, Gil gil.hogga...@bl.uk wrote: I think my question is easier, because I think the problem below was caused by the very first startup of the 'ldwa01' collection/'ldwa01cfg' zk config name not specifying the number of shards (and thus defaulting to 1). So, how can I change the number of shards for an existing collection/zk collection name, especially when the ZK ensemble in question is the production version and supporting other Solr collections that I do not want to interrupt? (Which I think means that I can't just delete the clusterstate.json and restart the ZKs as this will also lose the other Solr collection information.) Thanks in advance, Gil -Original Message- From: Hoggarth, Gil [mailto:gil.hogga...@bl.uk] Sent: 24 October 2013 10:13 To: solr-user@lucene.apache.org Subject: RE: New shard leaders or existing shard replicas depends on zookeeper? 
Absolutely, the scenario I'm seeing does _sound_ like I've not specified the number of shards, but I think I have - the evidence is: - DnumShards=24 defined within the /etc/sysconfig/solrnode* files - DnumShards=24 seen on each 'ps' line (two nodes listed here): tomcat 26135 1 5 09:51 ?00:00:22 /opt/java/bin/java -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/log ging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf -Dcollection.configName=ldwa01cfg -DnumShards=24 -Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl .uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/tomcat_instances/solrnode1 -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp org.apache.catalina.startup.Bootstrap start tomcat 26225 1 5 09:51 ?00:00:19 /opt/java/bin/java -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/log ging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf -Dcollection.configName=ldwa01cfg -DnumShards=24 -Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl .uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/tomcat_instances/solrnode2 -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp org.apache.catalina.startup.Bootstrap start - The Solr node dashboard shows -DnumShards=24 in its list of Args for each node And yet, the ldwa01 nodes are leader 
and replica of shard 17 and there are no other shard leaders created. Plus, if I only change the ZK ensemble declarations in /etc/system/solrnode* to the different dev ZK servers, all 24 leaders are created before any replicas are added. I can also mention, when I browse the Cloud view, I can see both the ldwa01 collection and the ukdomain collection listed, suggesting that this information comes from the ZKs - I assume this is as expected. Plus, the correct node addresses (e.g., 192.168.45.17:8984) are listed for ldwa01 but these addresses are also listed as 'Down' in the ukdomain collection (except for :8983 which only shows in the ldwa01 collection). Any help very gratefully received. Gil -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 23 October 2013 18:50 To: solr-user@lucene.apache.org Subject: Re: New shard leaders or existing shard replicas depends on zookeeper? My first impulse would be to ask how you created the collection. It sure _sounds_ like you didn't specify 24 shards and thus have only a single shard, one leader and
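[Editor's note] The "powers of 2" constraint described above — repeated SPLITSHARD-by-two doubling from a single shard — reduces to a one-line reachability check. Nothing here is Solr-specific; it only models the doubling steps the reply describes:

```java
public class SplitReach {
    // Which shard counts are reachable from `start` by repeatedly splitting
    // every shard in two (the doubling sequence described in the thread)?
    static boolean reachable(int start, int target) {
        for (int n = start; n <= target; n *= 2) {
            if (n == target) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(reachable(1, 16)); // true:  1 -> 2 -> 4 -> 8 -> 16
        System.out.println(reachable(1, 24)); // false: hence SOLR-5004, or re-index
    }
}
```

So a collection accidentally created with 1 shard cannot be doubled into 24; the options remain shard splitting to a power of 2, waiting on SOLR-5004, or creating a fresh 24-shard collection and re-indexing.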
Re: Query result caching with custom functions
That's a possibility, I'll try that and report on the effects. Thanks, Mathias On 24.10.2013 16:52, Joel Bernstein joels...@gmail.com wrote: Mathias, I'd have to do a close review of the function sort code to be sure, but I suspect if you implement the equals() method on the ValueSource it should solve your caching issue. Also implement hashCode(). Joel On Thu, Oct 24, 2013 at 10:35 AM, Shawn Heisey s...@elyograg.org wrote: On 10/24/2013 5:35 AM, Mathias Lux wrote: I've written a custom function, which is able to provide a distance based on some DocValues to re-sort result lists. This basically works great, but we've got the problem that if I don't change the query, but the function parameters, Solr delivers a cached result without re-ordering. I turned off caching and, lo and behold, problem solved. But of course this is not an avenue I want to pursue further as it doesn't make sense for a productive system. Do you have any ideas (beyond fake query modification and turning off caching) to counteract? btw. I'm using Solr 4.4 (so if you are aware of the issue and it has been resolved in 4.5 I'll port it :) The code I'm using is at https://bitbucket.org/dermotte/liresolr I suspect that the queryResultCache is not paying attention to the fact that parameters for your plugin have changed. This probably means that your plugin must somehow inform the cache check code that something HAS changed. How you actually do this is a mystery to me because it involves parts of the code that are beyond my understanding, but it MIGHT involve making sure that parameters related to your code are saved as part of the entry that goes into the cache. Thanks, Shawn
Re: Proposal for new feature, cold replicas, brainstorming
With a shard with listening status and some logic on the mechanism that does the load balancing between replicas, we can achieve the goal. The SPLITSHARD action makes replicas from the original shard which start in the inactive state; these shards buffer the updates, and when the operation ends, the parent shard becomes inactive and the new replicas are promoted to the active state. Like the inactive state, we can have a listening state that never becomes active, unless a leader election happens and the shard with listening status is the only one still alive. In addition, it is necessary to add new metadata to the shard in the clusterstate.json file to mark that replica as one kept for replication purposes, which resigns when another replica becomes active. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Thursday, October 24, 2013 at 2:16 PM, Toke Eskildsen wrote: On Thu, 2013-10-24 at 13:27 +0200, yriveiro wrote: The motivation of this is simple, I want to have replication but I don't want to have n replicas active with full resources allocated (cache and so on). This is useful in environments where replication is needed but a high query throughput is not fundamental and the resources are limited. Coincidentally we recently talked about the exact same setup. We are looking at sharding a 20 TB index into 20 * 1 TB shards, each located on their own dedicated physical SSD, which has more than enough horsepower for our needs. For replication, we have a remote storage system capable of serving requests for 2-4 shards with acceptable latency. Projected performance for the SSD setup is superior (5-10 times) to our remote storage, so we would like to hit only the SSDs if possible. Setting up a cloud to issue all requests to the SSD-shards unless a catastrophic failure happened to one of them, and in that case fall back to the remote storage replica for only that shard, would be perfect. 
I know that right now it is not possible, but I think that it's a feature that can be implemented in an easy way by creating a new status for shards. shardIsLastResort=true? On paper it seems like a simple addition, but I am not familiar enough with the SolrCloud code to guess if it is easy to implement. - Toke Eskildsen, State and University Library, Denmark
[ANNOUNCE] Apache Solr 4.5.1 released.
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 October 2013, Apache Solr™ 4.5.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.5.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.5.1 includes 16 bug fixes as well as Lucene 4.5.1 and its bug fixes. The release is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html See the CHANGES.txt file included with the release for a full list of changes and further details. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. 
Happy searching, Lucene/Solr developers -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJSaUdSAAoJED+/0YJ4eWrI90UP/RGSmLBdvrc/5NZEb7LSCSjW z4D3wJ2i4a0rLpiW2qA547y/NZ5KZcmrDSzJu0itf8Q/0q+tm7/d30uPg/cdRlgl wGERcxsyfPfTqBjzdSNNGgNm++tnkkqRJbYEfsG5ApWrKicitU7cPb82m8oCdlnn 4wnhYt6tfu/EPCglt9ixF7Ukv5o7txMnwWGmkGTbUt8ugp9oOMN/FfGHex/FVxcF xHhWBLymIJy24APEEF/Mq3UW12hQT+aRof66xBch0fEPVlbDitBa9wNuRNQ98M90 ZpTl8o0ITMUKjTKNkxZJCO5LQeNwhYaOcM5nIykGadWrXBZo5Ob611ZKeYPZBWCW Ei88dwJQkXaDcVNLZ/HVcAePjmcALHd3nc4uNfcJB8zvgZOPagMpXW2rRSXFACHM FdaRezTdH8Uh5zp2n3hsqYCbpDreRoXGXaiOgVZ+8EekVMGYUnMFKdqNlqhVnF6r tzp+aaCBhGDUD5xUw2w2fb5c9Jh1oIQ9f7fsVH78kgsHShySnte3NbfoFWUClPMX PwrfWuZpmu9In2ZiJVYSOD6MBqmJ+z3N1bnf1kqsitv7MonkvQkOoDIafW835vG9 3aajknE1vazOATSGHIxCtJfqzTEqeqFqVbjG/qS72XIhMey8tVAwjrjcgFnayk9Z xrG1W1o2sjrYkioJ7nZK =8++G -END PGP SIGNATURE-
Re: [ANNOUNCE] Apache Solr 4.5.1 released.
Download redirects to 4.5.0 Is there a typo in the server path? On Thu, Oct 24, 2013 at 9:14 AM, Mark Miller markrmil...@apache.org wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 October 2013, Apache Solr™ 4.5.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.5.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.5.1 includes 16 bug fixes as well as Lucene 4.5.1 and its bug fixes. The release is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html See the CHANGES.txt file included with the release for a full list of changes and further details. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. 
Happy searching, Lucene/Solr developers
Re: [ANNOUNCE] Apache Solr 4.5.1 released.
Using a different server than the default gets 4.5.1. On Thu, Oct 24, 2013 at 9:35 AM, Jack Park jackp...@topicquests.org wrote: Download redirects to 4.5.0 Is there a typo in the server path? On Thu, Oct 24, 2013 at 9:14 AM, Mark Miller markrmil...@apache.org wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 October 2013, Apache Solr™ 4.5.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.5.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.5.1 includes 16 bug fixes as well as Lucene 4.5.1 and its bug fixes. The release is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html See the CHANGES.txt file included with the release for a full list of changes and further details. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. 
Happy searching, Lucene/Solr developers
Re: Changing indexed property on a field from false to true
Upayavira - Nice idea pushing in a nominal update when all fields are stored, and it does work. The nominal update could be sent to a boolean type dynamic field, that's not to be used for anything other than maybe identifying documents that are done re-indexing. On Wed, Oct 23, 2013 at 7:47 PM, Upayavira u...@odoko.co.uk wrote: The content needs to be re-indexed, the question is whether you can use the info in the index to do it rather than pushing fresh copies of the documents to the index. I've often wondered whether atomic updates could be used to handle this sort of thing. If all fields are stored, push a nominal update to cause the document to be re-indexed. I've never tried it though. I'd be curious to know if it works. Upayavira On Wed, Oct 23, 2013, at 02:25 PM, michael.boom wrote: Being given <field name="title" type="string" indexed="false" stored="true" multiValued="false" /> Changed to <field name="title" type="string" indexed="true" stored="true" multiValued="false" /> Once the above is done and the collection reloaded, is there a way I can build that index on that field, without reindexing everything? Thank you! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Changing-indexed-property-on-a-field-from-false-to-true-tp4097213.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: New query-time multi-word synonym expander
Jack - watch https://issues.apache.org/jira/browse/SOLR-5379 - comments from the author are there. Markus - ah, yes. I see I even managed to (re)name SOLR-5379 *exactly* the same as SOLR-4381 :) But the author of SOLR-5379 points out its advantages over SOLR-4381. Would be great if people could try it and leave comments with any issues, so we can iterate on the patch to make it committable. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Oct 23, 2013 at 1:13 PM, Markus Jelsma markus.jel...@openindex.io wrote: Nice, but now we got three multi-word synonym parsers? Didn't the LUCENE-4499 or SOLR-4381 patches work? I know the latter has had a reasonable amount of users and committers on github, but it was never brought back to ASF it seems. -Original message- From:Otis Gospodnetic otis.gospodne...@gmail.com Sent: Wednesday 23rd October 2013 18:54 To: solr-user@lucene.apache.org Subject: New query-time multi-word synonym expander Hi, Heads up that there is new query-time multi-word synonym expander patch in https://issues.apache.org/jira/browse/SOLR-5379 This worked for our customer and we hope it works for others. Any feedback would be greatly appreciated. Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com
Re: Changing indexed property on a field from false to true
When this gets interesting is if we had batch atomic updates. Imagine you could do indexCount++ for all docs matching the query category:sport. Could be really useful. /dreaming. Upayavira On Thu, Oct 24, 2013, at 05:40 PM, Aloke Ghoshal wrote: Upayavira - Nice idea pushing in a nominal update when all fields are stored, and it does work. The nominal update could be sent to a boolean type dynamic field, that's not to be used for anything other than maybe identifying documents that are done re-indexing. On Wed, Oct 23, 2013 at 7:47 PM, Upayavira u...@odoko.co.uk wrote: The content needs to be re-indexed, the question is whether you can use the info in the index to do it rather than pushing fresh copies of the documents to the index. I've often wondered whether atomic updates could be used to handle this sort of thing. If all fields are stored, push a nominal update to cause the document to be re-indexed. I've never tried it though. I'd be curious to know if it works. Upayavira On Wed, Oct 23, 2013, at 02:25 PM, michael.boom wrote: Being given <field name="title" type="string" indexed="false" stored="true" multiValued="false" /> Changed to <field name="title" type="string" indexed="true" stored="true" multiValued="false" /> Once the above is done and the collection reloaded, is there a way I can build that index on that field, without reindexing everything? Thank you! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Changing-indexed-property-on-a-field-from-false-to-true-tp4097213.html Sent from the Solr - User mailing list archive at Nabble.com.
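[Editor's note] The nominal update discussed in this thread would be a Solr 4.x atomic "set" update against the throwaway dynamic field. A minimal sketch of building the request body in plain Java — the field name reindexed_b and doc id are made up; in practice you would POST this JSON to /solr/<collection>/update with Content-Type: application/json:

```java
public class AtomicUpdateBody {
    // Build the JSON body for an atomic "set" update on a nominal dynamic field.
    // The {"set": value} wrapper is what tells Solr this is an atomic update
    // rather than a full document replacement.
    static String nominalUpdate(String docId) {
        return "[{\"id\":\"" + docId + "\",\"reindexed_b\":{\"set\":true}}]";
    }

    public static void main(String[] args) {
        // With all fields stored, Solr re-reads the stored document and
        // re-indexes it, picking up the changed indexed=true setting.
        System.out.println(nominalUpdate("doc1"));
    }
}
```

Batching would mean emitting one such object per matching doc id — the "batch atomic update by query" Upayavira is dreaming about doesn't exist, so the ids have to come from a query first.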
Re: Multiple facet fields in defaults section of a Request Handler
: Now a client wants to use multi select faceting. He calls the following API: : http://localhost:8983/solr/collection1/search?q=*:*&facet.field={!ex=foo}category&fq={!tag=foo}category:cat : Putting the facet definitions in appends causes it to facet on category 2 : times. : : Is there a way where he does not have to provide all the facet.field : parameters in the API call? What you are asking is essentially I want to configure faceting on X and Y by default, but i want clients to be able to add faceting on Z and have that disable faceting on X while still faceting on Y. It doesn't matter that X and Z are both field facets based around the field name category -- the tag exclusion makes them completely different. The basic default/invariants/appends logic doesn't give you any easy mechanism to ignore arbitrary params like that - you could probably write a custom component that inspected the params and dropped ones you don't want, but this wouldn't make sense as generalized logic in the FacetComponent, since faceting on a field both with and w/o a tag exclusion at the same time is a very common use case. -Hoss
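For reference, a hedged sketch of the kind of configuration under discussion (the handler name and field name come from the thread's example, not from a real config):

```xml
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
    <!-- default facet on category; a tagged facet.field such as
         {!ex=foo}category sent by the client is a *different* param
         value, so both get applied and the field is faceted twice -->
    <str name="facet.field">category</str>
  </lst>
</requestHandler>
```

Moving facet.field from defaults to appends does not help here, for exactly the reason Hoss gives: the tagged and untagged values are distinct parameters.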
Re: Solr subset searching in 100-million document index
Hi Joel, Thanks a lot for the information - I haven't worked with PostFilters before but found an example at http://java.dzone.com/articles/custom-security-filtering-solr. Will try it over the next few days and come back if I still have questions. Thanks again! Keep Walking, ~ Sandeep On Thu, Oct 24, 2013 at 8:25 PM, Joel Bernstein joels...@gmail.com wrote: Sandeep, This type of operation can often be expressed as a PostFilter very efficiently. This is particularly true if the region ids are integer keys. Joel On Thu, Oct 24, 2013 at 7:46 AM, Sandeep Gupta sandy@gmail.com wrote: Hi, We have a Solr index of around 100 million documents, with each document being given a region id, growing at a rate of about 10 million documents per month - the average document size being around 10KB of pure text. The total number of region ids is itself in the range of 2.5 million. We want to search for a query with a given list of region ids. The number of region ids in this list is usually around 250-300 (most of the time), but can be up to 500, with a maximum cap of around 2000 ids in one request. What is the best way to model such queries besides using an IN param in the query, or using a filter (fq) in the query? Are there any other faster methods available? If it may help, the index is on a VM with 4 virtual cores and currently has 4GB of Java memory allocated out of 16GB in the machine. The number of queries does not exceed more than 1 per minute for now. If needed, we can throw more hardware at the index - but the index will still be only on a single machine for at least 6 months. Regards, Sandeep Gupta --
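Joel's suggestion can't be shown runnably here without the Solr/Lucene classes on the classpath, but the heart of such a PostFilter is just a constant-time membership check per collected document. A stdlib-only sketch of that core idea (class and method names are illustrative, not Solr API; a real implementation would extend ExtendedQueryBase and return a DelegatingCollector):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: a PostFilter over region ids boils down to testing each
// collected document's region id against the requested set. With an
// integer HashSet this is O(1) per document regardless of whether the
// request carries 300 or 2000 region ids.
public class RegionPostFilterSketch {
    private final Set<Integer> allowedRegions;

    public RegionPostFilterSketch(Set<Integer> allowedRegions) {
        this.allowedRegions = allowedRegions;
    }

    // In a real PostFilter this check lives in DelegatingCollector.collect(docId),
    // where the region id would be read from a per-segment DocValues/FieldCache
    // array; the collector delegates only when the check passes.
    public boolean accept(int regionId) {
        return allowedRegions.contains(regionId);
    }

    public static void main(String[] args) {
        Set<Integer> regions = new HashSet<>();
        regions.add(42);
        regions.add(7);
        RegionPostFilterSketch filter = new RegionPostFilterSketch(regions);
        System.out.println(filter.accept(42)); // true
        System.out.println(filter.accept(99)); // false
    }
}
```

Because the filter runs after the main query (high cost, cache=false), the per-request region list never has to be turned into a cached bitset over all 100M documents.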
Re: Terms function join with a Select function ?
Dear, hum... I don't know how I can use it; I tried: my query: ti:snowboard (3095 results). I would like to have, at the end of my XML, the Terms statistics for the field AP (applicant field (patent notice)), but I don't get that... Please help, Bruno /select?q=ti%3Asnowboard&version=2.2&start=0&rows=10&indent=on&facet=true&f.ap.facet.limit=10 On 24/10/2013 14:04, Erik Hatcher wrote: That would be called faceting :) http://wiki.apache.org/solr/SimpleFacetParameters On Oct 24, 2013, at 5:23 AM, Bruno Mannina bmann...@free.fr wrote: Dear All, Ok I have an answer concerning the first question (limit): it's the terms.limit parameter. But I can't find how to apply a Terms request on a query result, any idea? Bruno On 23/10/2013 23:19, Bruno Mannina wrote: Dear Solr users, I use the Terms function to see the frequency data in a field but it's for the whole database. I have 2 questions: - Is it possible to increase the number of statistics? Actually I have the 10 first frequency terms. - Is it possible to limit these statistics to the result of a request? PS: the second question is very important for me. Many thanks --- This email contains no viruses or malware because avast! Antivirus protection is active. http://www.avast.com
Re: Terms function join with a Select function ?
humm facet perfs are very bad (Solr 3.6.0). My index is around 87 000 000 docs (4 * proc, dual core, 24G RAM). I thought facets would work only on the result but it seems that's not the case. My request: http://localhost:2727/solr/select?q=ti:snowboard&rows=0&facet=true&facet.field=ap&facet.limit=5 Do you think my request is wrong? Maybe it's not possible to have statistics on a field (like the Terms function) on a query. Thx for your help, Bruno On 24/10/2013 19:40, Bruno Mannina wrote: Dear, hum... I don't know how I can use it; I tried: my query: ti:snowboard (3095 results). I would like to have, at the end of my XML, the Terms statistics for the field AP (applicant field (patent notice)), but I don't get that... Please help, Bruno /select?q=ti%3Asnowboard&version=2.2&start=0&rows=10&indent=on&facet=true&f.ap.facet.limit=10 On 24/10/2013 14:04, Erik Hatcher wrote: That would be called faceting :) http://wiki.apache.org/solr/SimpleFacetParameters On Oct 24, 2013, at 5:23 AM, Bruno Mannina bmann...@free.fr wrote: Dear All, Ok I have an answer concerning the first question (limit): it's the terms.limit parameter. But I can't find how to apply a Terms request on a query result, any idea? Bruno On 23/10/2013 23:19, Bruno Mannina wrote: Dear Solr users, I use the Terms function to see the frequency data in a field but it's for the whole database. I have 2 questions: - Is it possible to increase the number of statistics? Actually I have the 10 first frequency terms. - Is it possible to limit these statistics to the result of a request? PS: the second question is very important for me. Many thanks
Re: Terms function join with a Select function ?
Just a little precision: Solr went down after running my URL :( so bad... On 24/10/2013 22:04, Bruno Mannina wrote: humm facet perfs are very bad (Solr 3.6.0). My index is around 87 000 000 docs (4 * proc, dual core, 24G RAM). I thought facets would work only on the result but it seems that's not the case. My request: http://localhost:2727/solr/select?q=ti:snowboard&rows=0&facet=true&facet.field=ap&facet.limit=5 Do you think my request is wrong? Maybe it's not possible to have statistics on a field (like the Terms function) on a query. Thx for your help, Bruno On 24/10/2013 19:40, Bruno Mannina wrote: Dear, hum... I don't know how I can use it; I tried: my query: ti:snowboard (3095 results). I would like to have, at the end of my XML, the Terms statistics for the field AP (applicant field (patent notice)), but I don't get that... Please help, Bruno /select?q=ti%3Asnowboard&version=2.2&start=0&rows=10&indent=on&facet=true&f.ap.facet.limit=10 On 24/10/2013 14:04, Erik Hatcher wrote: That would be called faceting :) http://wiki.apache.org/solr/SimpleFacetParameters On Oct 24, 2013, at 5:23 AM, Bruno Mannina bmann...@free.fr wrote: Dear All, Ok I have an answer concerning the first question (limit): it's the terms.limit parameter. But I can't find how to apply a Terms request on a query result, any idea? Bruno On 23/10/2013 23:19, Bruno Mannina wrote: Dear Solr users, I use the Terms function to see the frequency data in a field but it's for the whole database. I have 2 questions: - Is it possible to increase the number of statistics? Actually I have the 10 first frequency terms. - Is it possible to limit these statistics to the result of a request? PS: the second question is very important for me. Many thanks
Join Query Behavior
We're attempting to upgrade from Solr 4.2 to 4.5 but are finding that 4.5 is not honoring this join query: first part of query... fq={!join from=project_id_i to=project_id_im}user_id_i:65615 -role_id_i:18 type:UserRole last part of query On our Solr 4.2 instance adding/removing that query gives us different (and expected) results, while the query doesn't affect the results at all in 4.5. Is there any known join query behavior differences/fixes between 4.2 and 4.5 that might explain this, or should I be looking at other factors? Thanks, Andy Pickler
Post filter cache question
Hi If I run this query it is very fast (10 ms) because it uses a TopList filter: q=*:* fl=adr_geopoint,adr_city,filterflags fq=(filterflags:TopList) and the number of relevant documents is 3000 out of 7 million. If I run the same query but add a spatial filter with cost: q=*:* fl=adr_geopoint,adr_city,filterflags fq=(filterflags:TopList) pt=49.594,8.468 sfield=adr_geopoint fq={!bbox d=30} fq={!frange l=15 u=30 cache=false cost=200}geodist() it takes over 3 seconds, even though it should only scan the roughly 3000 documents from the first cached filter? Could it be a problem with my cache settings in solrconfig.xml (Solr 3.1) or is my query wrong? Thanks, regards, Ericz
Re: measure result set quality
: As a first approach I will evaluate (manually :( ) hits that are out of the : intersection set for every query in each system. Anyway I will keep FYI: LucidWorks has a Relevancy Workbench tool that serves as a simple UI designed explicitly for comparing the result sets from different Solr query configurations... http://www.lucidworks.com/market_app/lucidworks-relevancy-workbench/ -Hoss
Re: difference between apache tomcat vs Jetty
This is good to know, and I find it welcome advice; I would recommend making sure this advice is clearly highlighted in the relevant Solr docs, such as any getting-started docs. I'm not sure everyone realizes this, and some go down the Tomcat route without realizing the Solr committers recommend Jetty -- or use a stock Jetty without realizing the 'example' Jetty is recommended and actually intended to be used by Solr users in production! I think it's easy to not catch this advice. On 10/20/13 5:55 PM, Shawn Heisey wrote: On 10/20/2013 2:57 PM, Shawn Heisey wrote: We recommend jetty. The solr example uses jetty. I have a clarification for this statement. We actually recommend using the jetty that's included in the Solr 4.x example. It is stripped of all unnecessary features and its config has had some minor tuning so it's optimized for Solr. The jetty binaries in 4.x are completely unmodified from the upstream download, we just don't include all of them. On the 1.x and 3.x examples, there was a small bug in Jetty 6, so those versions included modified binaries. If you download jetty from eclipse.org or install it from your operating system's repository, it will include components you don't need and its config won't be optimized for Solr, but it will still be a lot closer to what's actually tested than tomcat is. Thanks, Shawn
Re: Post filter cache question
: Could it be a problem with my cache settings in solrconfig.xml (solr 3.1) : or is my query wrong? 3.1? ouch ... PostFilter wasn't even added until 3.4... https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters ...so your spatial filter is definitely being applied to the entire index and then getting cached. . . . Below is what i wrote before i saw that 3.4 comment at the end of your email... : If I run the same query but add a spatial filter with cost: : q=*:* : fl=adr_geopoint,adr_city,filterflags : fq=(filterflags:TopList) : pt=49.594,8.468 : sfield=adr_geopoint : fq={!bbox d=30} : fq={!frange l=15 u=30 cache=false cost=200}geodist() : : It takes over 3 seconds even though it should only scan around 3000 : documents from the first cached filter? You've also added a bbox filter, which will be computed against the entire index and cached. I'm not sure what FieldType you are using, and i don't know a lot of the details about the spatial queries -- but things you should look into... 1) does the bbox gain you anything if you are already doing the geodist filter as a post filter? (my hunch would be that the only point of a bbox fq is if you are *scoring* documents by distance and you want to ignore things beyond a set distance) 2) does {!bbox} support PostFilter on your FieldType? does adding cache=false cost=150 to the bbox filter improve things? -Hoss
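Spelled out, suggestion (2) would turn the request into something like the following (parameter values copied from the original mail; whether {!bbox} actually honors cache/cost as a PostFilter on this field type is exactly the open question Hoss raises):

```
q=*:*
fl=adr_geopoint,adr_city,filterflags
fq=(filterflags:TopList)
pt=49.594,8.468
sfield=adr_geopoint
fq={!bbox d=30 cache=false cost=150}
fq={!frange l=15 u=30 cache=false cost=200}geodist()
```

With cost values of 100 or more and cache=false, filters that support it run as post filters, after the cheap cached filterflags filter has already narrowed the candidate set.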
Re: difference between apache tomcat vs Jetty
I agree with Jonathan (and Shawn on the Jetty explanation), I think the docs should make this a bit more clear - I notice many people choosing Tomcat and then learning these details after, possibly regretting it. I'd be glad to modify the docs but I want to be careful how it is worded. Is it fair to go as far as saying Jetty is 100% THE recommended container for Solr, or should a recommendation be avoided, and maybe just a list of pros/cons? Cheers, Tim
Re: Reclaiming disk space from (large, optimized) segments
I didn't dig into the details of your mail too much, but a few things jumped out at me... : - At some time in the past, a manual force merge / optimize with : maxSegments=2 was run to troubleshoot high disk i/o and remove too many Have you tried a simple commit using expungeDeletes=true? It should be a little less intensive than optimizing. (under the covers it does IndexWriter.forceMergeDeletes()) : - Merge policies are all at Solr 4 defaults. Index size is currently ~50M : maxDocs, ~35M numDocs, 276GB. Solr 4 defaults is way too vague to be meaningful: 4.0? 4.1? ... 4.4? Do you mean you are using the example configs that came with that version of Solr, or do you mean you have no mergePolicy configured and you are getting the hardcoded defaults? .. either way it's important to specify exactly which version of Solr you are running and exactly what your entire indexConfig/ section looks like, since both the example configs and the hardcoded default behavior when configs aren't specified have evolved since 4.0-ALPHA. -Hoss
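The expungeDeletes commit mentioned above can be sent as a plain XML update message (host and core name are illustrative):

```xml
<!-- POST to http://localhost:8983/solr/collection1/update -->
<commit expungeDeletes="true"/>
```

This asks the IndexWriter to merge away segments dominated by deleted documents without forcing the whole index down to a fixed segment count the way optimize does.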
Problem with glassfish and zookeeper 3.4.5
Hi, Glassfish 3.1.2.2 Solr 4.5 Zookeeper 3.4.5 We have set up a SolrCloud with 4 Solr nodes and 3 zookeeper instances. It seems to be working fine from the Solr admin page, but not when I try to connect to it from a web application using SolrJ 4.5. I am creating my Solr cloud server as suggested on the wiki page: LBHttpSolrServer lbHttpSolrServer = new LBHttpSolrServer( SOLR_INSTANCE01, SOLR_INSTANCE02, SOLR_INSTANCE03, SOLR_INSTANCE04); solrServer = new CloudSolrServer("zk1:p1,zk2:p1,zk3:p1", lbHttpSolrServer); solrServer.setDefaultCollection("collection"); It seems to work fine for a while, even though I am getting a WARNING as below - SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: 'XYZ_path/SolrCloud_04/config/login.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. -- The application is deployed on a single-node cluster on glassfish. As soon as my application has made some queries to the Solr server, it starts throwing errors in the solrServer.runQuery() method. The reason for the error is not clear. Application logs show the following error trace many times... - [#|2013-10-24T14:07:53.750-0700|WARNING|glassfish3.1.2|org.apache.zookeeper.ClientCnxn|_ThreadID=1434;_ThreadName=Thread-2;|SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: 'XYZ_PATH/config/login.conf'.
Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.|#] [#|2013-10-24T14:07:53.750-0700|INFO|glassfish3.1.2|org.apache.zookeeper.ClientCnxn|_ThreadID=1434;_ThreadName=Thread-2;|Opening socket connection to server server_name/IP3:2181|#] [#|2013-10-24T14:07:53.750-0700|INFO|glassfish3.1.2|org.apache.solr.common.cloud.ConnectionManager|_ThreadID=1435;_ThreadName=Thread-2;|Watcher org.apache.solr.common.cloud.ConnectionManager@187eaada name:ZooKeeperConnection Watcher:IP1:2181,IP2:2181,IP3:2181 got event WatchedEvent state:AuthFailed type:None path:null path:null type:None|#] [#|2013-10-24T14:07:53.750-0700|INFO|glassfish3.1.2|org.apache.solr.common.cloud.ConnectionManager|_ThreadID=1435;_ThreadName=Thread-2;|Client-ZooKeeper status change trigger but we are already closed|#] [#|2013-10-24T14:07:53.751-0700|INFO|glassfish3.1.2|org.apache.zookeeper.ClientCnxn|_ThreadID=1434;_ThreadName=Thread-2;|Socket connection established to server_name/IP3:2181, initiating session|#] [#|2013-10-24T14:07:53.751-0700|INFO|glassfish3.1.2|org.apache.solr.common.cloud.ConnectionManager|_ThreadID=1420;_ThreadName=Thread-2;|Watcher org.apache.solr.common.cloud.ConnectionManager@4ba50169 name:ZooKeeperConnection Watcher:IP1:2181,IP2:2181,IP3:2181 got event WatchedEvent state:Disconnected type:None path:null path:null type:None|#] [#|2013-10-24T14:07:53.751-0700|WARNING|glassfish3.1.2|org.apache.zookeeper.ClientCnxn|_ThreadID=1434;_ThreadName=Thread-2;|Session 0x0 for serverserver_name/IP3:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198) at sun.nio.ch.IOUtil.read(IOUtil.java:166) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245) at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) |#] -- before this happen the zookeeper logs on all the 3 instances starts showing following warning 2013-10-24 14:05:55,200 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /IP_APPLICATION_SEVER - max is 200 it means that my application is making too many connections with the zookeeper and it is exceeding the limit which is set to 200. Is there a way I can control the number of connections my application is making with the zookeeper. The only component which is connecting to zookeeper in my application is CloudSolrServer object. As per my investigation SASL warning is related to a existing bug in Zookeeper 3.4.5 and is being solved for Zookeeper 3.5 and it should not cause this issue I need help and guidance.. Thanks, Kaustubh -- View this message in context:
Re: Problem with glassfish and zookeeper 3.4.5
On 10/24/2013 4:30 PM, kaustubh147 wrote: Glassfish 3.1.2.2 Solr 4.5 Zookeeper 3.4.5 We have set up a SolrCloud with 4 Solr nodes and 3 zookeeper instances. It seems to be working fine from Solr admin page. but when I am trying to connect it to web application using Solrj 4.5. I am creating my Solr Cloud Server as suggested on the wiki page LBHttpSolrServer lbHttpSolrServer = new LBHttpSolrServer( SOLR_INSTANCE01, SOLR_INSTANCE02, SOLR_INSTANCE03, SOLR_INSTANCE04); solrServer = new CloudSolrServer("zk1:p1,zk2:p1,zk3:p1", lbHttpSolrServer); solrServer.setDefaultCollection("collection"); If this is what you are seeing as instructions for connecting from SolrJ to SolrCloud, then something's really screwy. Can you give me the URL that shows this, so I can see about getting it changed? The following code example is how you should be doing that. For this example, zookeeper is using the default port of 2181 and the zookeeper hosts are zoo1, zoo2, and zoo3. String zkHost = "zoo1:2181,zoo2:2181,zoo3:2181"; // If you are using a chroot, use something like this instead: // String zkHost = "zoo1:2181,zoo2:2181,zoo3:2181/chroot"; CloudSolrServer server = new CloudSolrServer(zkHost); server.setDefaultCollection("collection1"); It seems to be working fine for a while even though I am getting a WARNING as below - SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: 'XYZ_path/SolrCloud_04/config/login.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. -- Later you said this sounds like a bug you saw in ZK 3.4.5. It may be that glassfish turns on some system-wide setting related to authentication that zookeeper picks up on. I would tend to agree that this probably is not related to the other problems mentioned below.
before this happen the zookeeper logs on all the 3 instances starts showing following warning 2013-10-24 14:05:55,200 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /IP_APPLICATION_SEVER - max is 200 it means that my application is making too many connections with the zookeeper and it is exceeding the limit which is set to 200. Are you creating one CloudSolrServer object (static would be OK) and using it for all interaction with SolrCloud, or are you creating many CloudSolrServer objects over the life of your application? Is there more than one thread or instance of your application running, and each one has its own CloudSolrServer object? It is strongly recommended that you only create one object for your entire application and use it for all queries, updates, etc. You can set the collection parameter on each query or request object that you use, if you need to use more than one. If you *DO* create many CloudSolrServer objects over the life of your application and cannot immediately change your code so that it uses one object, be sure to shutdown() each one when it is no longer required. Depending on the exact nature of your application, you may also need to increase the maximum number of connections allowed in your zookeeper config. Thanks, Shawn
Re: difference between apache tomcat vs Jetty
Thought you may want to have a look at this: https://issues.apache.org/jira/browse/SOLR-4792 P.S: There are no timelines for 5.0 for now, but it's the future nevertheless. On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt t...@elementspace.comwrote: I agree with Jonathan (and Shawn on the Jetty explanation), I think the docs should make this a bit more clear - I notice many people choosing Tomcat and then learning these details after, possibly regretting it. I'd be glad to modify the docs but I want to be careful how it is worded. Is it fair to go as far as saying Jetty is 100% THE recommended container for Solr, or should a recommendation be avoided, and maybe just a list of pros/cons? Cheers, Tim -- Anshum Gupta http://www.anshumgupta.net
Re: difference between apache tomcat vs Jetty
Hmm, that's an interesting move. I'm on the fence on that one but it surely simplifies some things. Good info, thanks! Tim On 24 October 2013 16:46, Anshum Gupta ans...@anshumgupta.net wrote: Thought you may want to have a look at this: https://issues.apache.org/jira/browse/SOLR-4792 P.S: There are no timelines for 5.0 for now, but it's the future nevertheless. On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt t...@elementspace.com wrote: I agree with Jonathan (and Shawn on the Jetty explanation), I think the docs should make this a bit more clear - I notice many people choosing Tomcat and then learning these details after, possibly regretting it. I'd be glad to modify the docs but I want to be careful how it is worded. Is it fair to go as far as saying Jetty is 100% THE recommended container for Solr, or should a recommendation be avoided, and maybe just a list of pros/cons? Cheers, Tim -- Anshum Gupta http://www.anshumgupta.net
Re: Post filter cache question
Hi Chris Thank you for your response. I will try to migrate to Solr 4.4 first! Best regards On Thu, Oct 24, 2013 at 10:44 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Could it be a problem with my cache settings in solrconfig.xml (solr 3.1) : or is my query wrong? 3.1? ouch ... PostFilter wasn't even added until 3.4... https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters ...so your spatial filter is definitely being applied to the entire index and then getting cached. . . . Below is what i wrote before i saw that 3.4 comment at the end of your email... : If I run the same query but add a spatial filter with cost: : q=*:* : fl=adr_geopoint,adr_city,filterflags : fq=(filterflags:TopList) : pt=49.594,8.468 : sfield=adr_geopoint : fq={!bbox d=30} : fq={!frange l=15 u=30 cache=false cost=200}geodist() : : It takes over 3 seconds even though it should only scan around 3000 : documents from the first cached filter? You've also added a bbox filter, which will be computed against the entire index and cached. I'm not sure what FieldType you are using, and i don't know a lot of the details about the spatial queries -- but things you should look into... 1) does the bbox gain you anything if you are already doing the geodist filter as a post filter? (my hunch would be that the only point of a bbox fq is if you are *scoring* documents by distance and you want to ignore things beyond a set distance) 2) does {!bbox} support PostFilter on your FieldType? does adding cache=false cost=150 to the bbox filter improve things? -Hoss
First test cloud error question...
Background: all testing done on a Win7 platform. This is my first migration from a single Solr server to a simple cloud. Everything is configured exactly as specified in the wiki. I created a simple 3-node cluster, all localhost with different server URLs, and a lone external zookeeper. The online admin shows they are all up. I then start an agent which sends in documents to bootstrap the index. That's when the issues start. A clip from the log shows this: First, I create a SolrDocument with this JSON data: DEBUG 2013-10-24 18:00:09,143 [main] - SolrCloudClient.mapToDocument- {locator:EssayNodeType,smallIcon:\/images\/cogwheel.png,subOf:[NodeType],details:[The TopicQuests NodeTypes typology essay type.],isPrivate:false,creatorId:SystemUser,label:[Essay Type],largeIcon:\/images\/cogwheel_sm.png,lastEditDate:Thu Oct 24 18:00:09 PDT 2013,createdDate:Thu Oct 24 18:00:09 PDT 2013} Then, I send it in from SolrJ, which has a CloudSolrServer initialized with localhost:2181 and an instance of LBHttpSolrServer initialized with http://localhost:8983/solr/ That trace follows INFO 2013-10-24 18:00:09,145 [main] - Initiating client connection, connectString=localhost:2181 sessionTimeout=1 watcher=org.apache.solr.common.cloud.ConnectionManager@e6c INFO 2013-10-24 18:00:09,148 [main] - Waiting for client to connect to ZooKeeper INFO 2013-10-24 18:00:09,150 [main-SendThread(0:0:0:0:0:0:0:1:2181)] - Opening socket connection to server 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2181.
Will not attempt to authenticate using SASL (Unable to locate a login configuration) ERROR 2013-10-24 18:00:09,151 [main-SendThread(0:0:0:0:0:0:0:1:2181)] - Unable to open socket to 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2181 WARN 2013-10-24 18:00:09,151 [main-SendThread(0:0:0:0:0:0:0:1:2181)] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.SocketException: Address family not supported by protocol family: connect at sun.nio.ch.Net.connect(Native Method) at sun.nio.ch.SocketChannelImpl.connect(Unknown Source) at org.apache.zookeeper.ClientCnxnSocketNIO.registerAndConnect(ClientCnxnSocketNIO.java:266) I can watch the Zookeeper console running; it's mostly complaining about too many connections from /127.0.0.1 ; I am seeing the errors in the agent's log file. Following that trace in the log is this: INFO 2013-10-24 18:00:09,447 [main-SendThread(127.0.0.1:2181)] - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration) INFO 2013-10-24 18:00:09,448 [main-SendThread(127.0.0.1:2181)] - Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session DEBUG 2013-10-24 18:00:09,449 [main-SendThread(127.0.0.1:2181)] - Session establishment request sent on 127.0.0.1/127.0.0.1:2181 DEBUG 2013-10-24 18:00:09,449 [main-SendThread(127.0.0.1:2181)] - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration INFO 2013-10-24 18:00:09,501 [main-SendThread(127.0.0.1:2181)] - Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x141ece7e6160017, negotiated timeout = 1 INFO 2013-10-24 18:00:09,501 [main-EventThread] - Watcher org.apache.solr.common.cloud.ConnectionManager@42bad8a8 name:ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO 2013-10-24 18:00:09,502 [main] - Client is 
connected to ZooKeeper DEBUG 2013-10-24 18:00:09,502 [main-SendThread(127.0.0.1:2181)] - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG 2013-10-24 18:00:09,502 [main-SendThread(127.0.0.1:2181)] - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG 2013-10-24 18:00:09,503 [main-SendThread(127.0.0.1:2181)] - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG 2013-10-24 18:00:09,503 [main-SendThread(127.0.0.1:2181)] - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG 2013-10-24 18:00:09,504 [main-SendThread(127.0.0.1:2181)] - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG 2013-10-24 18:00:09,504 [main-SendThread(127.0.0.1:2181)] - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG 2013-10-24 18:00:09,505 [main-SendThread(127.0.0.1:2181)] - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG 2013-10-24 18:00:09,506 [main-SendThread(127.0.0.1:2181)] - Reading reply sessionid:0x141ece7e6160017, packet:: clientPath:null serverPath:null finished:false header:: 1,3 replyHeader:: 1,541,0 request:: '/clusterstate.json,F response::
Solr 4.5.1 and Illegal to have multiple roots (start tag in epilog?). (perhaps SOLR-4327 bug?)
Hey Solr-users, I've got a single Solr 4.5.1 node with 96GB RAM, a 65GB index (105 million records) and a lot of daily churn of newly indexed files (auto softcommit and commits). I'm trying to bring another matching node into the mix, and am getting these errors on the new node: org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?). On the old server, still running, I'm getting: shard update error StdNode: http://server1:/solr/collection/:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://server2:/solr/collection The new core never actually comes online; it stays in recovery mode. The other two tiny cores (100,000+ records each, not updated frequently) work just fine. Is this the SOLR-4327 bug? https://issues.apache.org/jira/browse/SOLR-5331 And if so, how can I get the new node up and running so I can get back in production with some redundancy and speed? I'm running an external zookeeper, and that is all running just fine. Also internal Solrj/jetty with little to no modifications. Any ideas would be appreciated, thanks, M.
Solr indexing on email mime body and attachment
Hi, I am integrating the Solr search engine with my email clients. I am sending POST requests to Solr over its REST API. I am successfully able to post an email's to, from, subject, and other headers to Solr for indexing. Since emails can have MIME bodies and attachments, I am not able to figure out how to post the email body and attachments so that Solr can index them. Any help is highly appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-on-email-mime-body-and-attachment-tp4097692.html Sent from the Solr - User mailing list archive at Nabble.com.
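For raw MIME bodies and attachments, Solr ships the ExtractingRequestHandler (Solr Cell), which runs Apache Tika server-side: you POST the raw file to /update/extract and pass any already-known header fields as literal.* parameters. A minimal sketch in Python; the core name and the from_s field name are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical core name; adjust to your setup.
SOLR_BASE = "http://localhost:8983/solr/collection1"

def extract_request_url(doc_id, from_addr):
    # literal.* parameters become stored field values on the extracted document,
    # so the email headers you already parsed ride along with the Tika output.
    params = urlencode({
        "literal.id": doc_id,
        "literal.from_s": from_addr,
        "commit": "true",
    })
    return "%s/update/extract?%s" % (SOLR_BASE, params)

# POST the attachment bytes to this URL as a file upload, e.g. with curl:
#   curl "<url>" -F "file=@attachment.pdf"
print(extract_request_url("msg-001", "alice@example.com"))
```

This assumes the extract handler is enabled in solrconfig.xml (it is in the stock example config) and that Tika understands the attachment's content type.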
Re: Reclaiming disk space from (large, optimized) segments
Only skimmed your email, but purge every 4 hours jumped out at me. Would it make sense to have time-based indices that can be periodically dropped instead of being purged? Otis Solr ElasticSearch Support http://sematext.com/ On Oct 23, 2013 10:33 AM, Scott Lundgren scott.lundg...@carbonblack.com wrote: *Background:* - Our use case is to use SOLR as a massive FIFO queue. - Document additions and updates happen continuously. - Documents are being added at a sustained rate of 50 - 100 documents per second. - About 50% of these documents are updates to existing docs, indexed using atomic updates: the original doc is thus deleted and re-added. - There is a separate purge operation running every four hours that deletes the oldest docs, if required based on a number of unrelated configuration parameters. - At some time in the past, a manual force merge / optimize with maxSegments=2 was run to troubleshoot high disk i/o and remove too many segments as a potential variable. Currently, the largest fdts are 74G and 43G. There are 47 total segments; the largest other sizes are all around 2G. - Merge policies are all at Solr 4 defaults. Index size is currently ~50M maxDocs, ~35M numDocs, 276GB. *Issue:* The background purge operation is deleting docs on schedule, but the disk space is not being recovered. *Presumptions:* I presume, but have not confirmed (how?), that the 15M deleted documents are predominately in the two large segments. Because they are largely in the two large segments, and those large segments still have (some/many) live documents, the segment backing files are not deleted. *Questions:* - When will those segments get merged and the space recovered? Does it happen when _all_ the documents in those segments are deleted? Or when some percentage of the segment is filled with deleted documents? - Is there a way to do it right now vs. just waiting? - In some cases, the purge delete conditional is _just_ free disk space: when the index size exceeds free disk space, delete oldest.
Those setups are now in scenarios where the index is outgrowing free space, and it is getting worse. How does low disk space affect the above two questions? - Is there a way for me to determine stats on a per-segment basis? - for example, how many deleted documents are in a particular segment? - On the flip side, can I determine in what segment a particular document is located? Thank you, Scott -- Scott Lundgren Director of Engineering Carbon Black, Inc. (210) 204-0483 | scott.lundg...@carbonblack.com
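On "is there a way to do it right now": one option, assuming the default TieredMergePolicy, is a commit with expungeDeletes=true, which asks the merge policy to merge away segments carrying a high ratio of deleted documents without the cost of a full optimize. A sketch that builds the update request; the core name is hypothetical:

```python
from urllib import request

# Hypothetical core name; adjust to your collection.
SOLR_UPDATE = "http://localhost:8983/solr/collection1/update"

# expungeDeletes=true triggers merges of segments whose deleted-doc ratio
# exceeds the merge policy's threshold, reclaiming their disk space.
body = b'<commit expungeDeletes="true"/>'
req = request.Request(SOLR_UPDATE, data=body,
                      headers={"Content-Type": "text/xml"})
# request.urlopen(req)  # uncomment against a live server
print(req.get_method(), req.get_full_url())
```

Note that merging needs transient free disk space (roughly the size of the segments being merged), so on a nearly full disk this can fail; freeing or adding space first is the safer order of operations.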