solr cloud index corruption
Hi, We are frequently getting issues of index corruption on the cloud; this did not happen in our master/slave setup with Solr 3.6. I have checked the logs, but don't see an exact reason. I have run the index checker and it recovers, but I am not able to understand why this is happening. Any pointers would help. regards, rohit
Re: solr cloud index corruption
Hi, Maybe you can describe how you are using Solr? Which version exactly? Can you share the errors you are seeing? etc. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jul 9, 2013 at 2:07 AM, Cool Techi cooltec...@outlook.com wrote: Hi, We are frequently getting issues of index corruption on the cloud, this used to not happen in our master slave setup with solr 3.6. I have tried to check the logs, but don't see an exact reason. I have run the index checker and it recovers, but I am not able to understand as to why this is happening. Any pointers would help. regards, rohit
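For readers digging up this thread: the "index checker" Rohit mentions is presumably Lucene's CheckIndex tool. A typical invocation looks like the following (jar version and index path are illustrative and will differ per install; Solr must be stopped first, since CheckIndex must not run against a live index):

```text
java -cp lucene-core-4.3.1.jar -ea:org.apache.lucene... \
     org.apache.lucene.index.CheckIndex /path/to/solr/data/index

# Add -fix only as a last resort: it drops corrupt segments,
# and all documents in them, from the index.
java -cp lucene-core-4.3.1.jar \
     org.apache.lucene.index.CheckIndex /path/to/solr/data/index -fix
```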
two types of answers in my query
Hi, A general question: Let's say I have a Car and CarParts 1:n relation. And I have discovered that the user entered a part serial number (SKU) in the search field instead of a car name. (I discovered it using a regex.) Is there a way to fetch different types of answers in Solr? Is there a way to fetch mixed types in the answers? Is there something similar to that, and what is that feature called? Thank you.
Re: two types of answers in my query
On 9 July 2013 12:08, Mysurf Mail stammail...@gmail.com wrote: Hi, A general question: Let's say I have a Car and CarParts 1:n relation. And I have discovered that the user entered a part serial number (SKU) in the search field instead of a car name. (I discovered it using a regex.) Is there a way to fetch different types of answers in Solr? Is there a way to fetch mixed types in the answers? Is there something similar to that, and what is that feature called? Your description is not clear enough. What do you mean by different types of answers, and mixed types? Assuming that you want to have a different query, or multiple different queries, when you deduce on the front-end that the user might have entered a part number instead of a name, you will need to change the query/queries going to Solr and collate the results. Regards, Gora
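To make Gora's suggestion concrete, here is a minimal front-end sketch in Python. The SKU pattern, field names, and query shapes are invented for illustration; the point is only that the routing decision happens before the request reaches Solr:

```python
import re

# Hypothetical SKU shape: 3 uppercase letters, a dash, 6 digits.
# Adjust the pattern to whatever your real part numbers look like.
SKU_PATTERN = re.compile(r"^[A-Z]{3}-\d{6}$")

def build_query(user_input):
    """Return a (field, q-parameter) pair for the Solr request."""
    text = user_input.strip()
    if SKU_PATTERN.match(text):
        # Looks like a part serial number: search the parts' SKU field.
        return ("sku", 'sku:"%s"' % text)
    # Otherwise treat it as a car name.
    return ("name", 'name:"%s"' % text)

print(build_query("ABC-123456"))
print(build_query("Ford Mustang"))
```

The front end then sends whichever query was built and collates the results, exactly as Gora describes.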
dataDir not being stored in solr.xml
I am migrating from Solr 3.6 to 4.3.1. Using the core CREATE REST call, something like: http://10.1.10.150:8090/solr/admin/cores?action=CREATE&name=foo&instanceDir=/home/solrdata/foo&persist=true&wt=json&dataDir=/home/solrdata/foo I am able to add data to the index it creates within the /home/solrdata/foo directory and search it. The solr.xml config, however, does not contain the dataDir path. When the process is restarted the dataDir is set to /home/solrdata and not /home/solrdata/foo. Now if I create the index, index some docs, stop the process, and manually edit the solr.xml to include dataDir, search works. I am not sure, but it seems that in the following class dataDir is not persisted, in a case that looks like it is work in progress for Solr 5.0: CoreContainer.addPersistOneCore. I also played with passing properties in the create args of the form: property.dataDir=/home/solrdata/foo That didn't seem to help, but I may not be understanding the exact property syntax. Any clues? Cheers C
Re: Solr limitations
5. No more than 32 nodes in your SolrCloud cluster. I hope this isn't too OT, but what tradeoffs is this based on? Would have thought it easy to hit this number for a big index and high load (hence with the view of both the number of shards and replicas horizontally scaling..) 6. Don't return more than 250 results on a query. None of those is a hard limit, but don't go beyond them unless your Proof of Concept testing proves that performance is acceptable for your situation. Start with a simple 4-node, 2-shard, 2-replica cluster for preliminary tests and then scale as needed. Dynamic and multivalued fields? Try to stay away from them - except for the simplest cases, they are usually an indicator of a weak data model. Sure, it's fine to store a relatively small number of values in a multivalued field (say, dozens of values), but be aware that you can't directly access individual values, you can't tell which was matched on a query, and you can't coordinate values between multiple multivalued fields. Except for very simple cases, multivalued fields should be flattened into multiple documents with a parent ID. Since you brought up the topic of dynamic fields, I am curious how you got the impression that they were a good technique to use as a starting point. They're fine for prototyping and hacking, and fine when used in moderation, but not when used to excess. The whole point of Solr is searching, and searching is optimized within fields, not across fields, so having lots of dynamic fields is counter to the primary strengths of Lucene and Solr. And... schemas with lots of dynamic fields tend to be difficult to maintain. For example, if you wanted to ask a support question here, one of the first things we want to know is what your schema looks like, but with lots of dynamic fields it is not possible to have a simple discussion of what your schema looks like.
Sure, there is something called schemaless design (and Solr supports that in 4.4), but that's very different from heavy reliance on dynamic fields in the traditional sense. Schemaless design is A-OK, but using dynamic fields for arrays of data in a single document is a poor match for the search features of Solr (e.g., Edismax searching across multiple fields.) One other tidbit: Although Solr does not enforce naming conventions for field names, and you can put special characters in them, there are plenty of features in Solr, such as the common fl parameter, where field names are expected to adhere to Java naming rules. When people start going wild with dynamic fields, it is common that they start going wild with their names as well, using spaces, colons, slashes, etc. that cannot be parsed in the fl and qf parameters, for example. Please don't go there! In short, put up a small cluster and start doing a Proof of Concept cluster. Stay within my suggested guidelines and you should do okay. -- Jack Krupansky -Original Message- From: Marcelo Elias Del Valle Sent: Monday, July 08, 2013 9:46 AM To: solr-user@lucene.apache.org Subject: Solr limitations Hello everyone, I am trying to search information about possible solr limitations I should consider in my architecture. Things like max number of dynamic fields, max number of documents in SolrCloud, etc. Does anyone know where I can find this info? Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
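The "flatten multivalued fields into multiple documents with a parent ID" advice above can be sketched in a few lines of Python. Field names here are made up for illustration; each value of the multivalued field becomes its own flat document carrying a parent_id back-reference:

```python
def flatten(doc, multi_field):
    """Split one doc with a multivalued field into flat child docs."""
    children = []
    for i, value in enumerate(doc[multi_field]):
        children.append({
            "id": "%s-%d" % (doc["id"], i),  # child gets its own unique key
            "parent_id": doc["id"],          # link back to the original doc
            multi_field: value,              # now single-valued and matchable
        })
    return children

car = {"id": "car42", "part": ["wheel", "brake", "mirror"]}
for child in flatten(car, "part"):
    print(child)
```

With this shape you can tell exactly which value matched a query, and grouping or a join on parent_id reassembles the original record.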
Calculating Solr document score by ignoring the boost field.
Greetings, I am using Nutch 2.x as my datasource for Solr 4.3.0, and Nutch passes its own boost field on to my Solr schema: <field name="boost" type="float" stored="true" indexed="false"/> Now for some reason I always get boost = 0.0, and because of this my Solr document score is also always 0.0. Is there any way to make Solr ignore the boost field's value in its document score calculation? Regards, Khan
Field not available on Edimax query
Hello to all, I load Solr via data-import. In db_data_config.xml, inside the product entity, I added an entity tag as follows:

<entity name="product_tags"
        query="select t.name as tags, id_product FROM ps_product_tag as pt JOIN ps_tag as t ON pt.id_tag = t.id_tag AND t.id_lang=2 WHERE id_product='${product.id_product}'"
        parentDeltaQuery="select id_product as id from ps_product where id_product=${product_features.id_product}">
  <field column="tags" name="tag"/>
</entity>
</entity> <!-- main product entity close -->

schema.xml:

<field name="tag" type="text_fr" indexed="true" stored="true" multiValued="true"/>

When I use a common select query I get the tag field and its values. However, when I use an edismax query with the following details, I am not able to retrieve the tag field, and it seems it is not taken into account in the match score either. The edismax parameters are: qf=id^1.0 ref^9.0 name^6.0 descriptif^1.0 cat^7.0 brand^5.0 fphonetic^5.0 tag^7.0 features^3.0 q.alt=*:* Could you help me understand why? Regards David
Re: Restrict/change numFound solr result
Hi Erick, thanks for the reply. I am doing the same thing already, but for the paging calculation I am depending on the numFound=120 value. That is the result I want (<result name="response" numFound="120" start="0">). thanks aniljayanti -- View this message in context: http://lucene.472066.n3.nabble.com/Restrict-change-numFound-solr-result-tp4075882p4076485.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.3 Pivot Performance Issue
Hi Jack, Thanks for your answer. I upgraded Solr from 4.0.0 (LUCENE_40) to 4.3.0 (LUCENE_43), and later to Solr 4.3.1. As a result, the pivot queries I already had running against Solr 4.0.0, which were taking a few milliseconds (100 ms, 150 ms), are now, with Solr 4.3.1, taking around 13 secs. An index optimization reduced the index size and brought the time down to 9 secs, but that is still far from the time we had before. I would like to avoid a full reindex and, as far as I read in the documentation, it isn't really needed if the major version doesn't change. Is there something I missed? Is somebody facing the same problem? Thanks Francisco On Tue, Jul 2, 2013 at 2:35 PM, Jack Krupansky-2 [via Lucene] ml-node+s472066n407467...@n3.nabble.com wrote: What is the nature of your degradation? -- Jack Krupansky -Original Message- From: solrUserJM Sent: Tuesday, July 02, 2013 4:22 AM To: [hidden email] Subject: Solr 4.3 Pivot Performance Issue Hi There, I noticed with the upgrade from Solr 4.0 to Solr 4.3 that we had a degradation of queries that are using pivot fields. Has someone else noticed it too? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-3-Pivot-Performance-Issue-tp4074617.html Sent from the Solr - User mailing list archive at Nabble.com.
-- Francisco Späth -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-3-Pivot-Performance-Issue-tp4074617p4076516.html Sent from the Solr - User mailing list archive at Nabble.com.
Phrase search without stopwords
Hi solr-user!!! I have an issue: I want to know whether it is possible to use StopFilterFactory with KeywordTokenizer. For example, I have multiple titles: 1) title:Canadian journal of information and library science 2) title:Canadian information of science 3) title:Southern information and library science What I want is: if I search for q=title:"Canadian information of science" OR q=title:"Canadian information science", my output should be only title no. 2, i.e. "Canadian information of science". My schema.xml is:

<fieldType name="itext" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>
<field name="title" type="itext" indexed="true" stored="true" required="false" multiValued="false"/>

With this, exact search works but search without stopwords does not; and if I use WhitespaceTokenizer instead of KeywordTokenizer, search without stopwords works but all 3 titles come back as output. Please reply ASAP. -- View this message in context: http://lucene.472066.n3.nabble.com/Phrase-search-without-stopwords-tp4076527.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Phrase search without stopwords
Hi Parul, You might find this useful: https://github.com/cominvent/exactmatch/ From: Parul Gupta(Knimbus) parulgp...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, July 9, 2013 12:03 PM Subject: Phrase search without stopwords
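The behaviour Parul is after (exact phrase match, but with stopwords ignored) can also be reasoned about outside the analyzer chain. A rough Python sketch of the idea: normalize both the indexed title and the query by lowercasing, tokenizing, and dropping stopwords, then require the whole remaining token sequence to match. The stopword list here is a stand-in for stopwords.txt:

```python
STOPWORDS = {"and", "of", "the"}   # stand-in for stopwords.txt

def normalize(text):
    """Lowercase, tokenize on whitespace, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

titles = [
    "Canadian journal of information and library science",
    "Canadian information of science",
    "Southern information and library science",
]

def exact_match(query, titles):
    """Titles whose full stopword-free token sequence equals the query's."""
    wanted = normalize(query)
    return [t for t in titles if normalize(t) == wanted]

print(exact_match("Canadian information science", titles))
```

This is essentially what the analyzer chain should do: both "Canadian information science" and "Canadian information of science" normalize to the same sequence, so only title no. 2 matches, while titles 1 and 3 keep their extra tokens and do not.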
Re: dataDir not being stored in solr.xml
There's been a lot of action around this recently; this is a known issue in 4.3.1. The short form is that it should all be better in Solr 4.4, which may be out in the next couple of weeks, assuming we can get agreement. But look at SOLR-4862, SOLR-4910, SOLR-4982 and related if you want to see the ugly details. Best Erick On Tue, Jul 9, 2013 at 3:50 AM, Chris Collins ch...@geekychris.com wrote: I am migrating from Solr 3.6 to 4.3.1. Using the core CREATE REST call, something like: http://10.1.10.150:8090/solr/admin/cores?action=CREATE&name=foo&instanceDir=/home/solrdata/foo&persist=true&wt=json&dataDir=/home/solrdata/foo I am able to add data to the index it creates within the /home/solrdata/foo directory and search it. The solr.xml config, however, does not contain the dataDir path. When the process is restarted the dataDir is set to /home/solrdata and not /home/solrdata/foo. Now if I create the index, index some docs, stop the process, and manually edit the solr.xml to include dataDir, search works. I am not sure, but it seems that in the following class dataDir is not persisted, in a case that looks like it is work in progress for Solr 5.0: CoreContainer.addPersistOneCore. I also played with passing properties in the create args of the form: property.dataDir=/home/solrdata/foo That didn't seem to help, but I may not be understanding the exact property syntax. Any clues? Cheers C
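For reference, the CREATE call from the question, including the property.dataDir variant Chris tried, can be assembled like this (host, port, and paths are the ones from the question; whether the parameter then actually gets persisted to solr.xml is exactly the 4.3.1 bug described above):

```python
from urllib.parse import urlencode

base = "http://10.1.10.150:8090/solr/admin/cores"
params = {
    "action": "CREATE",
    "name": "foo",
    "instanceDir": "/home/solrdata/foo",
    "dataDir": "/home/solrdata/foo",  # or the "property.dataDir" variant
    "persist": "true",
    "wt": "json",
}
url = base + "?" + urlencode(params)
print(url)
```

urlencode takes care of escaping the slashes in the paths, which is easy to get wrong when pasting the URL by hand.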
Re: [Solr 4.2] deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload CoreAdminRequest
According to the code, at least in Solr 4.2, getParams of CoreAdminRequest.Unload returns a locally created ModifiableSolrParams. It means that parameters set that way won't be received in CoreAdminHandler. I'm going to open an issue in Jira and provide a patch for this. Best regards, Lyuba On Fri, Jul 5, 2013 at 6:12 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: SolrJ doesn't have explicit support for that param but you can always add it yourself. For example: CoreAdminRequest.Unload req = new CoreAdminRequest.Unload(false); ((ModifiableSolrParams) req.getParams()).set("deleteInstanceDir", true); req.process(server); On Thu, Jul 4, 2013 at 12:50 PM, Lyuba Romanchuk lyuba.romanc...@gmail.com wrote: Hi, I need to unload a core and delete the instance directory of the core. According to the code of Solr 4.2 I don't see support for this parameter in SolrJ. Is there a fix or open issue for this? Best regards, Lyuba -- Regards, Shalin Shekhar Mangar.
Re: Solr limitations
I think Jack was mostly thinking in slam-dunk terms. I know of SolrCloud demo clusters with 500+ nodes, and at that point people said it's going to work for our situation, we don't need to push more. As you start getting into that kind of scale, though, you really have a bunch of ops considerations etc. Mostly when I get into larger scales I pretty much want to examine my assumptions and see if they're correct, perhaps start to trim my requirements etc. FWIW, Erick On Tue, Jul 9, 2013 at 4:07 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: 5. No more than 32 nodes in your SolrCloud cluster. I hope this isn't too OT, but what tradeoffs is this based on? Would have thought it easy to hit this number for a big index and high load (hence with the view of both the number of shards and replicas horizontally scaling..)
Re: Calculating Solr document score by ignoring the boost field.
My guess is that you're not really passing on the boost field's value and getting the default. Don't quite know how I'd track that down, though. Best Erick On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com wrote: Greetings, I am using Nutch 2.x as my datasource for Solr 4.3.0, and Nutch passes its own boost field on to my Solr schema: <field name="boost" type="float" stored="true" indexed="false"/> Now for some reason I always get boost = 0.0, and because of this my Solr document score is also always 0.0. Is there any way to make Solr ignore the boost field's value in its document score calculation? Regards, Khan
Re: Restrict/change numFound solr result
No, there's no good way to make Solr return numFound=120 when there are 540 (or whatever) records. Why do you care? If you need to stop at 120, just stop at 120 and ignore the numFound. If you need to display the 120 to the end user even if there are more docs, just do that. Best Erick On Tue, Jul 9, 2013 at 2:33 AM, aniljayanti aniljaya...@yahoo.co.in wrote: Hi Erick, thanks for reply, I am doing the same thing already. But for paging calculation i am depending on numFound=120 value. That result i want .(result name=response numFound=120 start=0) thanks aniljayanti -- View this message in context: http://lucene.472066.n3.nabble.com/Restrict-change-numFound-solr-result-tp4075882p4076485.html Sent from the Solr - User mailing list archive at Nabble.com.
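To Erick's point, the paging calculation can cap at 120 entirely client-side without Solr ever changing numFound; a small sketch (the page size of 10 is arbitrary):

```python
def page_count(num_found, cap=120, page_size=10):
    """Number of pages, never paging past the first `cap` results."""
    usable = min(num_found, cap)          # ignore anything beyond the cap
    return (usable + page_size - 1) // page_size  # ceiling division

print(page_count(540))
print(page_count(75))
```

The UI then renders page_count pages and ignores the real numFound, which is exactly the "just stop at 120" approach.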
Solr Live Nodes not updating immediately
Hi, I am new to Solr. Currently I am using Solr 4.3.0, and I have set up SolrCloud on 3 machines. If I kill a node running on any of the machines using kill -9, the status of the killed node is not updated immediately in the Solr web console. It takes nearly 20+ mins to mark it as a Gone node. My questions are: 1. Why does it take so much time to update the status of the inactive node? 2. If the leader node itself is killed, I am not able to use the service until the status of the node gets updated. Thanks in advance Ranjith Venkatesan -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Live-Nodes-not-updating-immediately-tp4076560.html Sent from the Solr - User mailing list archive at Nabble.com.
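For context: how quickly a killed node is marked down is governed by the ZooKeeper session timeout. In a stock 4.x setup that is the zkClientTimeout attribute in solr.xml, which defaults to a few seconds (the fragment below mirrors the shipped example; the 15000 ms value is the example's default, not a recommendation). A node killed with kill -9 should normally be marked down once its session expires, so a 20+ minute delay suggests something else, such as a stale admin-console view, is involved:

```xml
<!-- solr.xml (4.x legacy format); zkClientTimeout is in milliseconds -->
<solr persistent="true">
  <cores adminPath="/admin/cores"
         host="${host:}" hostPort="${jetty.port:}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="collection1"/>
  </cores>
</solr>
```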
Document count mismatch
I've run a command to find term counts in my index: solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on It gives me a result like this: ... <result name="response" numFound="3245092" start="0" maxScore="1.0"></result> ... <lst name="teno"> <int name="lev">3107206</int> <int name="tenu">59821</int> ... When I sum those numbers (3107206 + 59821 + ...) I get 3245074, yet numFound=3245092. How come? PS: The returned list has 100 elements. Does Solr return at most 100 elements in such situations?
Re: ClassNotFoundException regarding SolrInfoMBean under Tomcat 7
On 05.07.2013 at 16:36, Shalin Shekhar Mangar wrote: Okay, so just for the rest of the people who dig up this thread: you had to put all the extra jar files required by TYPO3 into WEB-INF/lib to make this work. Is that right? Maybe this works as well, but I'd put them in a directory called lib within the core's folder. That way they are loaded automatically, too, says the example solrconfig.xml: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/solrconfig.xml Cheers, Michael On Fri, Jul 5, 2013 at 8:03 PM, Michael Bakonyi kont...@mb-neuemedien.de wrote: Hi Shalin, On 05.07.2013 at 16:23, Shalin Shekhar Mangar wrote: There are plenty of use-cases for having multiple cores. You may have two different schemas for two different kinds of documents. Perhaps you are indexing content in multiple languages and you may want a core per language. In SolrCloud, a node can have multiple cores to support more than one shard on the same box. alright, so it depends on the use case. I guess for me the different use cases will be combinations of domain.tld and language. But for me this is far future I think. The Solr war file has all the classes it needs to start up and run (well, except for some optional components like DataImportHandler etc.), and the SolrInfoMBean is most definitely present in the war file. Enabling or disabling JMX has nothing to do with loading that class. This is what I guessed, too. But I know neither Java nor Tomcat nor Solr, so I tried everything I could. It is very difficult to guess what's wrong with your setup this way. Why don't you try using the example jetty? It works and is well supported and optimized for Solr. Giovanni's guess was right, so luckily this error disappeared.
Cheers, Michael Am 05.07.2013 um 16:23 schrieb Shalin Shekhar Mangar: On Thu, Jul 4, 2013 at 4:32 PM, Michael Bakonyi kont...@mb-neuemedien.de wrote: Hi everyone, I'm trying to get the CMS TYPO3 connected with Solr 3.6.2. By now I followed the installation at http://wiki.apache.org/solr/SolrTomcat except that I didn't copy the .war-file into the $SOLR_HOME but referencing to it at a different location via Tomcat Context fragment file. Until then the Solr-Server works – I can reach the GUI via URL. To get Solr connected with the CMS I then created a new core-folder (btw. can anybody give me kind of a live example, when to use different cores? Until now I still don't really understand the concept of cores ..) by duplicating the example-folder in which I overwrote some files (especially solrconfig.xml) with files offered by the TYPO3-community. I also moved the file solr.xml one level up and edited it (added core-fragment and especially adjusted instanceDir) to get a correct multicore-setup like in the example multicore-setup within the downloaded solr-tgz-package. There are plenty of use-cases for having multiple cores. You may have two different schemas for two different kind of documents. Perhaps you are indexing content in multiple languages and you may want a core per language. In SolrCloud, a node can have multiple cores to support more than one shard on the same box. But now I get the Java-exception java.lang.NoClassDefFoundError: org/apache/solr/core/SolrInfoMBean at java.lang.ClassLoader.defineClass1(Native Method) In the Tomcat-log file it is said additionally: Caused by: java.lang.ClassNotFoundException: org.apache.solr.core.SolrInfoMBean. My guess is, that within the new solrconfig.xml there are calls to classes which aren't included correctly. 
There are some libs, which are included at the top of this file but the paths of the references should be ok as I checked them via Bash: At http://wiki.apache.org/solr/SolrConfigXml it is said that the lib dir= directory is relative to the instanceDir, so this is what I've checked. I also inserted absolute paths but this wasn't successful either. Can anybody give me a hint how to solve this problem? Would be great :) The Solr war file has all the classes it needs to startup and run (well except for some optional components like DataImportHandler etc) and the SolrInfoMBean is most definitely present in the war file. Enabling or disabling jmx has nothing to do with loading that class. It is very difficult to guess what's wrong with your setup this way. Why don't you try using the example jetty? It works and is well supported and optimized for Solr. -- Regards, Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar.
Re: Phrase search without stopwords
Hi, I solved it by copying the field into a string field type, and querying on this field only. Regards David On 09/07/2013 11:03, Parul Gupta (Knimbus) wrote:
Re: Field not available on Edimax query
Any suggestions? On 09/07/2013 12:29, It-forum wrote:
Re: Document count mismatch
1. Try facet.missing=true to count the number of documents that do not have a value for that field. 2. Try facet.limit=n to set the number of returned facet values to a larger or smaller value than the default of 100. 3. Try reading the Faceting chapter of my book! -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Tuesday, July 09, 2013 8:09 AM To: solr-user@lucene.apache.org Subject: Document count mismatch I've run a command to find term counts in my index: solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on It gives me a result like this: ... <result name="response" numFound="3245092" start="0" maxScore="1.0"></result> ... <lst name="teno"> <int name="lev">3107206</int> <int name="tenu">59821</int> ... When I sum those numbers (3107206 + 59821 + ...) I get 3245074, yet numFound=3245092. How come? PS: The returned list has 100 elements. Does Solr return at most 100 elements in such situations?
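The arithmetic behind Jack's point 1 can be shown with toy data in Python: a document with no value for the faceted field is counted in numFound but lands in no facet bucket, so the visible facet counts sum to less than numFound (facet.limit truncation, point 2, has the same visible effect):

```python
# Toy corpus: doc 3 has no value for the faceted field "teno".
docs = [
    {"id": 1, "teno": "lev"},
    {"id": 2, "teno": "tenu"},
    {"id": 3},                 # in numFound, but in no facet bucket
    {"id": 4, "teno": "lev"},
]

num_found = len(docs)          # what numFound reports
counts = {}                    # what the facet list reports
for doc in docs:
    value = doc.get("teno")
    if value is not None:
        counts[value] = counts.get(value, 0) + 1

print(num_found, sum(counts.values()))
```

Here numFound is 4 while the facet counts sum to 3; facet.missing=true would surface that missing-value bucket explicitly.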
Re: Calculating Solr document score by ignoring the boost field.
I am passing the boost value (via Nutch), i.e. boost=0.0. But my question is why Solr is showing me score = 0.0 when my boost (index-time boost) = 0.0? Shouldn't Solr calculate its document scores on the basis of TF-IDF? And if not, how can I make Solr consider only TF-IDF while calculating a document's score? Regards, Khan On Tue, Jul 9, 2013 at 4:46 PM, Erick Erickson erickerick...@gmail.comwrote: My guess is that you're not really passing on the boost field's value and getting the default. Don't quite know how I'd track that down though Best Erick On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com wrote: Greetings, I am using Nutch 2.x as my data source for Solr 4.3.0. And Nutch passes on its own boost field to my Solr schema: <field name="boost" type="float" stored="true" indexed="false"/> Now due to some reason I always get boost = 0.0, and due to this my Solr document score is also always 0.0. Is there any way in Solr to ignore the boost field's value in its document score calculation? Regards, Khan
Re: Calculating Solr document score by ignoring the boost field.
Simple math: x times zero equals zero. That's why the default document boost is 1.0 - score times 1.0 equals score. Any particular reason you wanted to zero out the document score at the document level? -- Jack Krupansky -Original Message- From: Tony Mullins Sent: Tuesday, July 09, 2013 9:23 AM To: solr-user@lucene.apache.org Subject: Re: Calculating Solr document score by ignoring the boost field. I am passing the boost value (via Nutch), i.e. boost=0.0. But my question is why Solr is showing me score = 0.0 when my boost (index-time boost) = 0.0? Shouldn't Solr calculate its document scores on the basis of TF-IDF? And if not, how can I make Solr consider only TF-IDF while calculating a document's score? Regards, Khan On Tue, Jul 9, 2013 at 4:46 PM, Erick Erickson erickerick...@gmail.comwrote: My guess is that you're not really passing on the boost field's value and getting the default. Don't quite know how I'd track that down though Best Erick On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com wrote: Greetings, I am using Nutch 2.x as my data source for Solr 4.3.0. And Nutch passes on its own boost field to my Solr schema: <field name="boost" type="float" stored="true" indexed="false"/> Now due to some reason I always get boost = 0.0, and due to this my Solr document score is also always 0.0. Is there any way in Solr to ignore the boost field's value in its document score calculation? Regards, Khan
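Jack's point in code form, a trivial sketch (the relevance number is hypothetical) of how an index-time document boost multiplies into the final score:

```java
public class BoostMath {
    public static void main(String[] args) {
        double tfIdf = 2.5; // hypothetical TF-IDF relevance score for a match

        // An index-time document boost is a multiplier folded into the score,
        // so a boost of 0.0 wipes out any TF-IDF contribution entirely.
        System.out.println(tfIdf * 0.0); // prints 0.0
        System.out.println(tfIdf * 1.0); // prints 2.5 - the default boost of 1.0 leaves it unchanged
    }
}
```

So the fix is not to make Solr "ignore" the boost, but to stop sending 0.0 (or send the default 1.0) from the Nutch side.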
Re: Document count mismatch
Ok, one more question. I have another field in my schema: *url*. How can I get the urls for each facet? 2013/7/9 Jack Krupansky j...@basetechnology.com 1. Try facet.missing=true to count the number of documents that do not have a value for that field. 2. Try facet.limit=n to set the number of returned facet values to a larger or smaller value than the default of 100. 3. Try reading the Faceting chapter of my book! -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Tuesday, July 09, 2013 8:09 AM To: solr-user@lucene.apache.org Subject: Document count mismatch I've run a command to find term counts at my index: solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on it gives me a result like that: ... <result name="response" numFound="3245092" start="0" maxScore="1.0"/> ... <lst name="teno"> <int name="lev">3107206</int> <int name="tenu">59821</int> ... when I sum those numbers (3107206 + 59821 + ...) I get 3245074, however numFound=3245092. How come? PS: The returned list has 100 elements. Does Solr return at most 100 elements in such situations?
Re: two types of answers in my query
Usually a car term and a car part term will look radically different. So, simply use the edismax query parser and set qf to be both the car and car part fields. If either matches, the document will be selected. And if you have a type field, you can check that to see whether a car or a part was matched in the results. -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Tuesday, July 09, 2013 2:38 AM To: solr-user@lucene.apache.org Subject: two types of answers in my query Hi, A general question: Let's say I have a Car and CarParts 1:n relation. And I have discovered that the user entered in the search field, instead of a car name, a part serial number (SKU). (I discovered it using a regex.) Is there a way to fetch different types of answers in Solr? Is there a way to fetch mixed types in the answers? Is there something similar to that, and what is that feature called? Thank you.
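A rough sketch of the request Jack describes; the field names car_name, part_sku, and type are hypothetical placeholders for whatever the real schema uses:

```
q=WBA-4711&defType=edismax&qf=car_name^2.0 part_sku&fl=id,type,score
```

A document matching on either field is returned, and the stored type field in each result tells the front-end whether it got back a car or a part.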
Re: Document count mismatch
I don't quite follow the question. Give us an example. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Tuesday, July 09, 2013 9:37 AM To: solr-user@lucene.apache.org Subject: Re: Document count mismatch Ok, one more question. I have another field in my schema: *url*. How can I get the urls for each facet? 2013/7/9 Jack Krupansky j...@basetechnology.com 1. Try facet.missing=true to count the number of documents that do not have a value for that field. 2. Try facet.limit=n to set the number of returned facet values to a larger or smaller value than the default of 100. 3. Try reading the Faceting chapter of my book! -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Tuesday, July 09, 2013 8:09 AM To: solr-user@lucene.apache.org Subject: Document count mismatch I've run a command to find term counts at my index: solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on it gives me a result like that: ... <result name="response" numFound="3245092" start="0" maxScore="1.0"/> ... <lst name="teno"> <int name="lev">3107206</int> <int name="tenu">59821</int> ... when I sum those numbers (3107206 + 59821 + ...) I get 3245074, however numFound=3245092. How come? PS: The returned list has 100 elements. Does Solr return at most 100 elements in such situations?
Re: Phrase search without stopwords
Hey thanks. It somewhat works for me. -- View this message in context: http://lucene.472066.n3.nabble.com/Phrase-search-without-stopwords-tp4076527p4076598.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Document count mismatch
I have another field in my schema: *url*. When I get results as facets I see that there are 3107206 *lev* documents (<int name="lev">3107206</int>). However, what are the urls of those 3107206 documents? I tried grouping instead of faceting: /solr/select/?q=*:*&group=true&group.field=lang&wt=xml&fl=url and I get only one result for each group. I want to get all of them. On the other hand, if I change my query to: /solr/select/?q=*:*&group=true&group.field=lang&wt=xml&fl=url&group.query=teno:lev I get this error: <str name="msg">shard 0 did not set sort field values (FieldDoc.fields is null); you must pass fillFields=true to IndexSearcher.search on each shard</str> <str name="trace">java.lang.IllegalArgumentException: shard 0 did not set sort field values (FieldDoc.fields is null); you must pass fillFields=true to IndexSearcher.search on each shard at org.apache.lucene.search.TopDocs$MergeSortQueue.<init>(TopDocs.java:143) at org.apache.lucene.search.TopDocs.merge(TopDocs.java:214) ... 2013/7/9 Jack Krupansky j...@basetechnology.com I don't quite follow the question. Give us an example. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Tuesday, July 09, 2013 9:37 AM To: solr-user@lucene.apache.org Subject: Re: Document count mismatch Ok, one more question. I have another field in my schema: *url*. How can I get the urls for each facet? 2013/7/9 Jack Krupansky j...@basetechnology.com 1. Try facet.missing=true to count the number of documents that do not have a value for that field. 2. Try facet.limit=n to set the number of returned facet values to a larger or smaller value than the default of 100. 3. Try reading the Faceting chapter of my book! -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Tuesday, July 09, 2013 8:09 AM To: solr-user@lucene.apache.org Subject: Document count mismatch I've run a command to find term counts at my index: solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on it gives me a result like that: ... <result name="response" numFound="3245092" start="0" maxScore="1.0"/> ... <lst name="teno"> <int name="lev">3107206</int> <int name="tenu">59821</int> ... when I sum those numbers (3107206 + 59821 + ...) I get 3245074, however numFound=3245092. How come? PS: The returned list has 100 elements. Does Solr return at most 100 elements in such situations?
Re: Best way to call asynchronously - Custom data import handler
On 7/8/2013 11:10 PM, Learner wrote: I wrote a custom data import handler to import data from files. I am trying to figure out a way to make asynchronous call instead of waiting for the data import response. Is there an easy way to invoke asynchronously (other than using futures and callables) ? public class CustomFileImportHandler extends RequestHandlerBase implements SolrCoreAware{ public void handleRequestBody(SolrQueryRequest arg0, SolrQueryResponse arg1){ indexer a= new indexer(); // constructor String status= a.Index(); // method to do indexing, trying to make it async } } Generally speaking, it's easier to write a separate program than write a Solr plugin, unless you just want to add a tiny tweak to an existing class and not make fundamental changes in how it works. The dataimport handler is designed around a model of starting and frequently checking the status to know whether it's done. For what you want to do, I'd write a subroutine, module, or a separate program using a Solr API for your language that obtains the data from the source and indexes it to Solr directly. This is definitely the preferred method if your code is written in Java, but it's generally the right way to go no matter what language you're using. Thanks, Shawn
Re: Solr Live Nodes not updating immediately
Something is wrong if it actually takes 20 minutes. - Mark On Jul 9, 2013, at 7:43 AM, Ranjith Venkatesan ranjit...@zohocorp.com wrote: Hi, I am new to Solr. Currently I am using Solr 4.3.0. I have set up a SolrCloud cluster on 3 machines. If I kill a node running on any of the machines using kill -9, the status of the killed node is not updated immediately in the Solr web console. It takes nearly 20+ mins to mark it as a Gone node. My questions are: 1. Why does it take so much time to update the status of the inactive node? 2. And if the leader node itself is killed, I can't use the service until the status of the node gets updated. Thanks in advance Ranjith Venkatesan -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Live-Nodes-not-updating-immediately-tp4076560.html Sent from the Solr - User mailing list archive at Nabble.com.
Is there an easy way to know if a Solr cloud node is a shard leader?
I would like to be able to do it without consulting Zookeeper. Is there some variable or API I can call on a specific Solr cloud node to know if it is currently a shard leader? The reason I want to know is I want to perform index backup on the shard leader from a cron job *only* if that node is a shard leader. Bob
Re: Solr Live Nodes not updating immediately
On 7/9/2013 5:43 AM, Ranjith Venkatesan wrote: I am new to Solr. Currently I am using Solr 4.3.0. I have set up a SolrCloud cluster on 3 machines. If I kill a node running on any of the machines using kill -9, the status of the killed node is not updated immediately in the Solr web console. It takes nearly 20+ mins to mark it as a Gone node. My questions are: 1. Why does it take so much time to update the status of the inactive node? 2. And if the leader node itself is killed, I can't use the service until the status of the node gets updated. As Mark said, something is very wrong if it takes 20 minutes for the cloud state to update. I'm wondering why you have done a kill -9 to stop Solr? If running a stop command (or a standard SIGTERM) doesn't properly shut the process down, then you may have some other underlying operating system issue that needs to be solved, and could be causing the node status problem. Thanks, Shawn
Re: Solr Live Nodes not updating immediately
The same scenario happens if the network to any one of the machines is unavailable (i.e. if we manually disconnect the network cable, the status of the node does not get updated immediately). Please help me with this issue -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Live-Nodes-not-updating-immediately-tp4076560p4076621.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Live Nodes not updating immediately
We are going to use Solr in production. There are chances that the machine itself might shut down due to power failure, or the network may be disconnected due to manual intervention. We need to address those cases as well to build a robust system. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Live-Nodes-not-updating-immediately-tp4076560p4076633.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Live Nodes not updating immediately
We are going to use Solr in production. There are chances that the machine itself might shut down due to power failure, or the network may be disconnected due to manual intervention. We need to address those cases as well to build a robust system. The latest version of Solr is 4.3.1, and 4.4 is right around the corner. Any chance you can test a nightly 4.4 build or a checkout of the lucene_solr_4_4 branch, so we can know whether you are running into the same problems with what will be released soon? No sense in fixing a problem that no longer exists. Thanks, Shawn
Re: Field not available on edismax query
On Tue, Jul 9, 2013 at 6:29 AM, It-forum it-fo...@meseo.fr wrote: However, when I use an edismax query with the following details, I'm not able to retrieve the field tag, and it seems it is not taken into account in the match score either. You seem to have two problems here: one of not matching (use debug flags for that) and one of not retrieving. But what do you mean by not retrieving? By default all the fields are returned regardless of the query. So if you are getting it in one but not in the other, you might either be getting different documents without that field populated, or you have explicitly mis-defined which fields to return (with the 'fl' parameter). Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Solr Hangs During Updates for over 10 minutes
I'll give you the high level before delving deep into setup etc. I have been struggling at work with a seemingly random problem where Solr will hang for 10-15 minutes during updates. This outage always seems to be immediately preceded by an EOF exception on the replica. Then 10-15 minutes later we see an exception on the leader for a socket timeout to the replica. The leader will then tell the replica to recover, which in most cases it does, and then the outage is over. Here are the setup details: We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines. We have 2 active collections, each with only 1 shard (we have in total about 15 collections but most are empty or have fewer than 100 docs). The first index (collection1) is 6.5GB and has ~18M documents. The 2nd index (collection2) is 9GB and has about 13M documents. In all cases the leader resides on one server and the replica resides on the other. Both servers are AWS XL High Mem instances (8 CPUs @ 2.67GHz, 70GB RAM) with the index residing on a 1TB RAID 10 using ephemeral storage disks. We are starting Solr using the embedded Jetty with the following Java memory and GC options: -Xmx16382m -Xms4092m -XX:MaxPermSize=256m -Xss256k -XX:NewSize=1536m -XX:SurvivorRatio=16 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:ParallelCMSThreads=2 -XX:+CMSClassUnloadingEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=80 -XX:+CMSParallelRemarkEnabled Both collections receive a constant stream of updates, ~10k per hour (both adds/deletes). Approximately once per day the following events transpire: 1. We see a log entry for a distributed update that takes just over 5 ms, followed by an EOF write exception on the replica. In all cases this exception is triggered by an update to the 9GB collection. 2. Occasionally we'll see a 503 shard update error on the leader, but usually not. 3.
Approximately 15 minutes after this exception we see a timeout error for this distributed update request on the leader. 4. The leader then creates a new connection and tells the replica to recover, which it does, and everything is OK again. 5. During the 15-minute window from when the replica throws the EOF until the SocketTimeout on the leader, no other updates are processed: ERROR ON REPLICA: Jul 8, 2013 6:38:16 PM org.apache.solr.core.SolrCore execute INFO: [collection2_0] webapp=/solr path=/update params={distrib.from=http://Solr4-1-1.domain.com:8983/solr/collection2_0/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=50012 Jul 8, 2013 6:38:16 PM org.apache.solr.common.SolrException log SEVERE: null:org.eclipse.jetty.io.EofException at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:154) at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:101) at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:203) at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:196) at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:94) at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:49) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:404) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:289) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:351) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) at
Re: Is there an easy way to know if a Solr cloud node is a shard leader?
If you call /solr/zookeeper on a specific node, that servlet would tell you - output is a bit verbose for what you want though. - Mark On Jul 9, 2013, at 10:36 AM, Robert Stewart robert_stew...@epam.com wrote: I would like to be able to do it without consulting Zookeeper. Is there some variable or API I can call on a specific Solr cloud node to know if it is currently a shard leader? The reason I want to know is I want to perform index backup on the shard leader from a cron job *only* if that node is a shard leader. Bob
Re: Solr Hangs During Updates for over 10 minutes
On 7/9/2013 9:50 AM, Jed Glazner wrote: I'll give you the high level before delving deep into setup etc. I have been struggling at work with a seemingly random problem where Solr will hang for 10-15 minutes during updates. This outage always seems to be immediately preceded by an EOF exception on the replica. Then 10-15 minutes later we see an exception on the leader for a socket timeout to the replica. The leader will then tell the replica to recover, which in most cases it does, and then the outage is over. Here are the setup details: We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines. After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced and have since been fixed. You're five releases and about nine months behind what's current. My recommendation: Upgrade to 4.3.1, ensure your configuration is up to date with changes to the example config between 4.0.0 and 4.3.1, and reindex. Ideally, you should set up a 4.0.0 testbed, duplicate your current problem, and upgrade the testbed to see if the problem goes away. A testbed will also give you practice for a smooth upgrade of your production system. Thanks, Shawn
Staggered Replication In Solr?
Hi, Is staggered replication possible in Solr through configuration? We are concerned about the CPU spike (80%) and GC pauses on all the slaves when they try to replicate the updated index from repeaters. We haven't observed this behavior in v3.5 (max spikes were 50% during replication). In our case we have 8 slaves serving the traffic, and all start replicating the new index at the same time. When the switch for the Reader happens after warm-up, we see a spike in CPU and at the same time a GC pause, which causes requests to our application to queue and eventually fail. It would be good to have a throttle on the master/repeater for the max replication requests to serve at a given time. Planning to write a script and schedule it which will trigger replication in a staggered fashion so not all slaves are busy replicating at once. thanks Aditya -- View this message in context: http://lucene.472066.n3.nabble.com/Staggered-Replication-In-Solr-tp4076659.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Staggered Replication In Solr?
On 7/9/2013 10:37 AM, adityab wrote: Is staggered replication possible in Solr through configuration? You wouldn't be able to do this directly without switching to completely manually triggered replication, but the concept of a repeater may interest you. http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater You set up a limited number of slaves replicating from your master. Those slaves get also set up as masters, and the rest of your slaves replicate from those, instead of the true master. When the index gets updated, the repeaters do their replication, then the other slaves replicate from the repeaters. Thanks, Shawn
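For reference, the repeater Shawn points to is a core whose ReplicationHandler acts as both slave and master at once, roughly as on the wiki page above; the masterUrl and poll interval below are placeholders:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- pull the index from the true master... -->
  <lst name="slave">
    <str name="masterUrl">http://truemaster:8983/solr/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
  <!-- ...while also serving as a master to the downstream slaves -->
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>
```

The remaining slaves then point their masterUrl at the repeaters instead of the true master, which spreads the replication load across two tiers.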
Re: dataDir not being stored in solr.xml
Thanks Erick, I made a private patch to the CoreContainer until the real deal. C On Jul 9, 2013, at 4:35 AM, Erick Erickson erickerick...@gmail.com wrote: There's been a lot of action around this recently; this is a known issue in 4.3.1. The short form is it should all be better in Solr 4.4, which may be out in the next couple of weeks, assuming we can get agreement. But look at SOLR-4862, 4910, 4982 and related if you want to see the ugly details. Best Erick On Tue, Jul 9, 2013 at 3:50 AM, Chris Collins ch...@geekychris.com wrote: I am migrating from Solr 3.6 to 4.3.1. Using the core create REST call, something like: http://10.1.10.150:8090/solr/admin/cores?action=CREATE&name=foo&instanceDir=/home/solrdata/foo&persist=true&wt=json&dataDir=/home/solrdata/foo I am able to add data to the index it creates within the /home/solrdata/foo directory and search it. The solr config however does not contain the dataDir path. When the process is restarted the dataDir is set to /home/solrdata and not /home/solrdata/foo. Now if I create the index, index some docs, stop the process, and manually edit the solr.xml to include dataDir, search works. I am not sure, but it seems that in the following class dataDir is not persisted, in a case that looks like it is work in progress for Solr 5.0: CoreContainer.addPersistOneCore I also played with passing properties in the create args of the form: property.dataDir=/home/solrdata/foo That didn't seem to help, but I may not be understanding the exact property syntax. Any clues? Cheers C
Re: Best way to call asynchronously - Custom data import handler
Other than using futures and callables? Runnables ;-) Other than that you will need an async request (i.e. client). But in case somebody else is looking for an easy recipe for server-side async:

public void handleRequestBody(...) {
  if (isBusy()) {
    rsp.add("message", "Batch processing is already running...");
    rsp.add("status", "busy");
    return;
  }
  runAsynchronously(new LocalSolrQueryRequest(req.getCore(), req.getParams()));
}

private void runAsynchronously(SolrQueryRequest req) {
  final SolrQueryRequest request = req;
  thread = new Thread(new Runnable() {
    public void run() {
      try {
        while (queue.hasMore()) {
          runSynchronously(queue, request);
        }
      } catch (Exception e) {
        log.error(e.getLocalizedMessage());
      } finally {
        request.close();
        setBusy(false);
      }
    }
  });
  thread.start();
}

On Tue, Jul 9, 2013 at 1:10 AM, Learner bbar...@gmail.com wrote: I wrote a custom data import handler to import data from files. I am trying to figure out a way to make an asynchronous call instead of waiting for the data import response. Is there an easy way to invoke it asynchronously (other than using futures and callables)? public class CustomFileImportHandler extends RequestHandlerBase implements SolrCoreAware{ public void handleRequestBody(SolrQueryRequest arg0, SolrQueryResponse arg1){ indexer a= new indexer(); // constructor String status= a.Index(); // method to do indexing, trying to make it async } } -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-call-asynchronously-Custom-data-import-handler-tp4076475.html Sent from the Solr - User mailing list archive at Nabble.com.
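For what it's worth, the same busy-flag pattern can be sketched self-contained with java.util.concurrent instead of a raw Thread; index() below is a hypothetical stand-in for the real indexing work, not any Solr API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class AsyncHandlerSketch {
    private static final AtomicBoolean busy = new AtomicBoolean(false);
    private static final ExecutorService pool = Executors.newSingleThreadExecutor();

    // Stand-in for the real indexing work done inside the handler.
    static void index() throws InterruptedException {
        Thread.sleep(200);
    }

    // Mirrors the handler recipe: refuse if a run is already in flight,
    // otherwise kick off the work and return to the caller immediately.
    static String handleRequest() {
        if (!busy.compareAndSet(false, true)) {
            return "busy";
        }
        pool.submit(() -> {
            try {
                index();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                busy.set(false); // allow the next request once the work finishes
            }
        });
        return "started";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handleRequest()); // first call starts the background work
        System.out.println(handleRequest()); // second call is rejected while it runs
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

One design note: compareAndSet makes the busy check atomic, whereas a separate isBusy()/setBusy() pair can let two simultaneous requests both slip past the check.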
Perl Solr help - doing *:* query
This is primarily to Andy Lester, who wrote the WebService::Solr module on CPAN, but I'll take a response from anyone who knows what I can do. If I use the following Perl code, I get an error. If I try to build some other query besides *:* to request all documents, the script runs, but the query doesn't do what I asked it to do. http://apaste.info/3j3Q How can I use a perl script with a proper Solr API to count the number of documents in my Solr index? I already have a version of my script that parses a JSON response as plain text, but as I have just learned, it's possible to get invalid information out of it. Specifically, the shards.info output has multiple numFound instances in it, which broke my script. The shards.info parameter is in the request handler defaults. I'd like to future-proof it by using actual objects. Thanks, Shawn
Re: Perl Solr help - doing *:* query
On Jul 9, 2013, at 2:48 PM, Shawn Heisey s...@elyograg.org wrote: This is primarily to Andy Lester, who wrote the WebService::Solr module on CPAN, but I'll take a response from anyone who knows what I can do. If I use the following Perl code, I get an error. What error do you get? Never say I get an error. Always say I get this error: . If I try to build some other query besides *:* to request all documents, the script runs, but the query doesn't do what I asked it to do. What DOES it do? http://apaste.info/3j3Q For the sake of future readers, please put your code in the message. This message will get archived, and future people reading the lists will not be able to read the code at some arbitrary paste site. Shawn's code is:

use strict;
use WebService::Solr;
use WebService::Solr::Query;
use WebService::Solr::Response;

my $url = 'http://idx.REDACTED.com:8984/solr/ncmain';
my $solr = WebService::Solr->new($url);
my $query = WebService::Solr::Query->new('*:*');
my $response = $solr->search($query, {'rows' => '0'});
my $numFound = $response->content->{response}->{numFound};
print "nf: $numFound\n";

xoa -- Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
replication getting stuck on a file
Hi, My Solr 3.6.1 slave farm is suddenly getting stuck during replication. It seems to stop on a random file on various slaves (not all) and not continue. I've tried stopping and restarting Tomcat etc., but some slaves just can't get the index pulled down. Note there is plenty of space on the hard drive. I don't get it. Everything else seems fine. Does this ring a bell for anyone? I have the slaves set for five-minute polling intervals. Here is what I see in the admin page; it just stays on that one file and won't get past it while the speed steadily averages down to 0 KB/s: Master http://ssbuyma01:8983/solr/1/replication Latest Index Version: null, Generation: null Replicatable Index Version: 1276893670111, Generation: 127205 Poll Interval 00:05:00 Local Index Index Version: 1276893670084, Generation: 127202 Location: /var/LucidWorks/lucidworks/solr/1/data/index Size: 23.06 GB Times Replicated Since Startup: 48903 Previous Replication Done At: Tue Jul 09 12:55:01 EDT 2013 Config Files Replicated At: null Config Files Replicated: null Times Config Files Replicated Since Startup: null Next Replication Cycle At: Tue Jul 09 13:00:00 EDT 2013 Current Replication Status Start Time: Tue Jul 09 12:55:00 EDT 2013 Files Downloaded: 59 / 486 Downloaded: 88.73 MB / 23.06 GB [0.0%] Downloading File: _34mt.fnm, Downloaded: 1.35 MB / 1.35 MB [100.0%] Time Elapsed: 691s, Estimated Time Remaining: 183204s, Speed: 131.49 KB/s Robert (Robi) Petersen Senior Software Engineer Search Department
Re: Solr Hangs During Updates for over 10 minutes
Hi Shawn, I have been trying to duplicate this problem without success for the last 2 weeks, which is one reason I'm getting flustered. It seems reasonable to be able to duplicate it, but I can't. We do have a story to upgrade, but it is still weeks if not months before that gets rolled out to production. We have another cluster running the same version but with 8 shards and 8 replicas, with each shard at 100GB, and more load and more indexing requests, without this problem. But there we send docs in batches and all fields are stored, whereas the trouble index has only 1 or 2 stored fields and we only send docs 1 at a time. Could that have anything to do with it? Jed Sent from Samsung Mobile -------- Original message -------- From: Shawn Heisey s...@elyograg.org Date: 07.09.2013 18:33 (GMT+01:00) To: solr-user@lucene.apache.org Subject: Re: Solr Hangs During Updates for over 10 minutes On 7/9/2013 9:50 AM, Jed Glazner wrote: I'll give you the high level before delving deep into setup etc. I have been struggling at work with a seemingly random problem where Solr will hang for 10-15 minutes during updates. This outage always seems to be immediately preceded by an EOF exception on the replica. Then 10-15 minutes later we see an exception on the leader for a socket timeout to the replica. The leader will then tell the replica to recover, which in most cases it does, and then the outage is over. Here are the setup details: We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines. After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced and have since been fixed. You're five releases and about nine months behind what's current. My recommendation: Upgrade to 4.3.1, ensure your configuration is up to date with changes to the example config between 4.0.0 and 4.3.1, and reindex. Ideally, you should set up a 4.0.0 testbed, duplicate your current problem, and upgrade the testbed to see if the problem goes away. A testbed will also give you practice for a smooth upgrade of your production system. Thanks, Shawn
Re: Perl Solr help - doing *:* query
On 7/9/2013 2:02 PM, Andy Lester wrote: What error do you get? Never say I get an error. Always say I get this error: . This is the actual error when trying *:* : Can't locate object method "_struct_" via package "WebService::Solr::Query" at /usr/local/share/perl/5.14.2/WebService/Solr/Query.pm line 37. If I try to build some other query besides *:* to request all documents, the script runs, but the query doesn't do what I asked it to do. What DOES it do? If I change the query line to this: my $query = WebService::Solr::Query->new({tag_id => '[* TO *]'}); With this, numFound is zero. The tag_id field is my uniqueKey, and is a StrField. When I use Dumper to print out the actual response from this query, it contains the following info: 'q' => '(tag_id:"\\[\\* TO \\*\\]")', I didn't ask for a phrase search (the quotes) or for escaping of the special query characters. By automatically doing this, it makes complex queries like ranges impossible. Is there something else that should be done for more complex queries? Thanks, Shawn
Deleted Docs
Hello, I am curious about the Deleted Docs: statistic on the solr/#/collection1 Overview page. Does Solr remove docs while indexing? I thought it only did that when Optimizing, however my instance had 726 Deleted Docs, but then after adding some documents that number decreased, eventually to 18 Deleted Docs. I understood these Deleted Docs are from situations where two docs have the same UniqueKey. However my data had way more deleted docs than I expected. I was using a data-generated uniquekey, when I changed to using the UUID generator there were 0 deleted docs. But I just wanted to double check, are there any other cases which would create a Deleted Doc? Thanks so much!! :) Katie
Re: Deleted Docs
Solr (Lucene, actually) will be doing segment merge operations in the background, continually, so generally you won't need to do optimize operations. Generally, an explicit delete and a replace of an existing document are the only two ways that you would get a deleted document. -- Jack Krupansky -Original Message- From: Katie McCorkell Sent: Tuesday, July 09, 2013 5:38 PM To: solr-user@lucene.apache.org Subject: Deleted Docs Hello, I am curious about the Deleted Docs: statistic on the solr/#/collection1 Overview page. Does Solr remove docs while indexing? I thought it only did that when Optimizing, however my instance had 726 Deleted Docs, but then after adding some documents that number decreased, eventually to 18 Deleted Docs. I understood these Deleted Docs are from situations where two docs have the same UniqueKey. However my data had way more deleted docs than I expected. I was using a data-generated uniquekey, when I changed to using the UUID generator there were 0 deleted docs. But I just wanted to double check, are there any other cases which would create a Deleted Doc? Thanks so much!! :) Katie
Re: Deleted Docs
On 7/9/2013 3:38 PM, Katie McCorkell wrote: I am curious about the Deleted Docs: statistic on the solr/#/collection1 Overview page. Does Solr remove docs while indexing? I thought it only did that when Optimizing, however my instance had 726 Deleted Docs, but then after adding some documents that number decreased, eventually to 18 Deleted Docs. I understood these Deleted Docs are from situations where two docs have the same UniqueKey. However my data had way more deleted docs than I expected. I was using a data-generated uniquekey, when I changed to using the UUID generator there were 0 deleted docs. But I just wanted to double check, are there any other cases which would create a Deleted Doc? Changes to deleted documents can happen through normal segment merging. Optimizing is just an explicit and deliberate merge down to a single segment, but segment merging is a normal part of Solr/Lucene indexing. Any deleted documents in segments that get merged will be purged. I believe the UUID generator will always generate a new value even if a document with the same information in the other fields is indexed again. This option should only be used if you do not have an existing field with unique values on every document. Thanks, Shawn
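Jack's and Shawn's points can be sketched in a few lines. This is a toy model of the delete-on-replace behavior, not real Lucene code: re-adding a document with an existing uniqueKey leaves a deleted-doc tombstone, and a merge (of which optimize is just the extreme case) purges the tombstones.

```python
class ToyIndex:
    """Toy model of Lucene's delete-on-replace bookkeeping (not real Lucene)."""
    def __init__(self):
        self.live = {}     # uniqueKey -> current version of the doc
        self.deleted = 0   # tombstones awaiting a segment merge

    def add(self, key, doc):
        if key in self.live:
            self.deleted += 1  # the old version becomes a "Deleted Doc"
        self.live[key] = doc

    def merge(self):
        # Merging rewrites segments without their deleted docs.
        purged, self.deleted = self.deleted, 0
        return purged

idx = ToyIndex()
idx.add("tag-1", {"v": 1})
idx.add("tag-1", {"v": 2})   # same uniqueKey -> one deleted doc
print(idx.deleted)           # prints 1
print(idx.merge())           # prints 1 (tombstone purged)
print(idx.deleted)           # prints 0
```

With a UUID-generated key every add gets a fresh key, so there are never replacements, which would explain the 0 deleted docs Katie observed after switching.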
RE: replication getting stuck on a file
Look at the speed and time remaining on this one, pretty funny:

Master: http://ssbuyma01:8983/solr/1/replication
Latest Index Version: null, Generation: null
Replicatable Index Version: 1276893670202, Generation: 127213
Poll Interval: 00:05:00
Local Index
Index Version: 1276893670108, Generation: 127204
Location: /var/LucidWorks/lucidworks/solr/1/data/index
Size: 23.13 GB
Times Replicated Since Startup: 48874
Previous Replication Done At: Tue Jul 09 13:12:05 PDT 2013
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Tue Jul 09 13:17:04 PDT 2013
Current Replication Status
Start Time: Tue Jul 09 13:12:04 PDT 2013
Files Downloaded: 10 / 538
Downloaded: 1.67 MB / 23.13 GB [0.0%]
Downloading File: _34n2.prx, Downloaded: 140 bytes / 140 bytes [100.0%]
Time Elapsed: 6203s, Estimated Time Remaining: 88091277s, Speed: 281 bytes/s

-----Original Message----- From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com] Sent: Tuesday, July 09, 2013 1:22 PM To: solr-user@lucene.apache.org Subject: replication getting stuck on a file Hi My Solr 3.6.1 slave farm is suddenly getting stuck during replication. It seems to stop on a random file on various slaves (not all) and not continue. I've tried stopping and restarting Tomcat etc., but some slaves just can't get the index pulled down. Note there is plenty of space on the hard drive. I don't get it. Everything else seems fine. Does this ring a bell for anyone? I have the slaves set for five-minute polling intervals. 
Here is what I see in the admin page; it just stays on that one file and won't get past it while the speed steadily averages down to 0 KB/s:

Master: http://ssbuyma01:8983/solr/1/replication
Latest Index Version: null, Generation: null
Replicatable Index Version: 1276893670111, Generation: 127205
Poll Interval: 00:05:00
Local Index
Index Version: 1276893670084, Generation: 127202
Location: /var/LucidWorks/lucidworks/solr/1/data/index
Size: 23.06 GB
Times Replicated Since Startup: 48903
Previous Replication Done At: Tue Jul 09 12:55:01 EDT 2013
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Tue Jul 09 13:00:00 EDT 2013
Current Replication Status
Start Time: Tue Jul 09 12:55:00 EDT 2013
Files Downloaded: 59 / 486
Downloaded: 88.73 MB / 23.06 GB [0.0%]
Downloading File: _34mt.fnm, Downloaded: 1.35 MB / 1.35 MB [100.0%]
Time Elapsed: 691s, Estimated Time Remaining: 183204s, Speed: 131.49 KB/s

Robert (Robi) Petersen Senior Software Engineer Search Department
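The absurd 88091277s estimate in the follow-up dump is just arithmetic on a near-stalled transfer rate, which supports the "stuck on a file" reading. A quick check with the numbers from that admin page (taking GB as 1024^3 bytes):

```python
# Numbers from the stuck replication dump: 1.67 MB of 23.13 GB done, 281 bytes/s.
remaining_bytes = 23.13 * 1024**3 - 1.67 * 1024**2
speed = 281.0                       # bytes/s reported by the admin page
eta_s = remaining_bytes / speed
eta_years = eta_s / (365.25 * 24 * 3600)
print(round(eta_s), round(eta_years, 1))  # ~8.8e7 seconds, i.e. roughly 2.8 years
```

So the UI's estimate is internally consistent; the real anomaly is a 23 GB pull crawling at a few hundred bytes per second.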
Re: Solr Hangs During Updates for over 10 minutes
Hi Jed, Is this really with Solr 4.0? If so, it may be wiser to jump on 4.4, which is about to be released. We did not have fun working with 4.0 in SolrCloud mode a few months ago. You will save time, hair, and money if you convince your manager to let you use Solr 4.4. :) Otis -- Solr  ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jul 9, 2013 at 4:44 PM, Jed Glazner jglaz...@adobe.com wrote: Hi Shawn, I have been trying to duplicate this problem without success for the last 2 weeks, which is one reason I'm getting flustered. It seems reasonable to be able to duplicate it, but I can't. We do have a story to upgrade, but it is still weeks if not months before that gets rolled out to production. We have another cluster running the same version, with 8 shards and 8 replicas, each shard at 100GB, with more load and more indexing requests, without this problem. But there we send docs in batches and all fields are stored, whereas the trouble index has only 1 or 2 stored fields and we send docs 1 at a time. Could that have anything to do with it? Jed Sent from Samsung Mobile -------- Original Message -------- From: Shawn Heisey s...@elyograg.org Date: 07.09.2013 18:33 (GMT+01:00) To: solr-user@lucene.apache.org Subject: Re: Solr Hangs During Updates for over 10 minutes On 7/9/2013 9:50 AM, Jed Glazner wrote: I'll give you the high level before delving deep into setup etc. I have been struggling at work with a seemingly random problem where Solr will hang for 10-15 minutes during updates. This outage always seems to be immediately preceded by an EOF exception on the replica. Then 10-15 minutes later we see an exception on the leader for a socket timeout to the replica. The leader will then tell the replica to recover, which in most cases it does, and then the outage is over. Here are the setup details: We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines. 
After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced and have since been fixed. You're five releases and about nine months behind what's current. My recommendation: Upgrade to 4.3.1, ensure your configuration is up to date with changes to the example config between 4.0.0 and 4.3.1, and reindex. Ideally, you should set up a 4.0.0 testbed, duplicate your current problem, and upgrade the testbed to see if the problem goes away. A testbed will also give you practice for a smooth upgrade of your production system. Thanks, Shawn
join not working with UUIDs
Hello, I am trying to create a POC to test query joins. However, I was surprised when I saw my test worked with some ids, but when my document ids are UUIDs, it doesn't work. Here is an example, using SolrJ:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
doc.addField("cor_parede", "branca");
doc.addField("num_cadeiras", 34);
solr.add(doc);
// Add children
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "computador1");
doc2.addField("acessorio1", "Teclado");
doc2.addField("acessorio2", "Mouse");
doc2.addField("root_id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
solr.add(doc2);

When I execute: /select params={start=0&rows=10&q=cor_parede%3Abranca&fq=%7B%21join+from%3Droot_id+to%3Did%7Dacessorio1%3ATeclado}

SolrQuery query = new SolrQuery();
query.setStart(0);
query.setRows(10);
query.set("q", "cor_parede:branca");
query.set("fq", "{!join from=root_id to=id}acessorio1:Teclado");
QueryResponse response = DGSolrServer.get().query(query);
long numFound = response.getResults().getNumFound();

it returns zero results. However, if I use "room1" for the first document's id and for the root_id field on the second document, it works. Any idea why? What am I missing? Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
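For reference, the join semantics themselves have no problem with UUID-shaped keys. Here is a toy Python model of {!join from=root_id to=id} (not Solr code): collect the from-field values of docs matching the inner query, then keep docs whose to-field is in that set. It matches regardless of key format, which suggests the failure is on the Solr side (for example how the id/root_id fields are typed or analyzed), not in the join logic as such.

```python
# The two documents from the SolrJ example above.
docs = [
    {"id": "bcbaf9eb-0da7-4225-be24-2b9472ad2c20", "cor_parede": "branca"},
    {"id": "computador1", "acessorio1": "Teclado",
     "root_id": "bcbaf9eb-0da7-4225-be24-2b9472ad2c20"},
]

def join_filter(docs, from_field, to_field, inner_pred):
    # {!join from=.. to=..}inner: gather 'from' values of inner matches,
    # then keep docs whose 'to' value is in that set.
    keys = {d[from_field] for d in docs if inner_pred(d) and from_field in d}
    return [d for d in docs if d.get(to_field) in keys]

joined = join_filter(docs, "root_id", "id",
                     lambda d: d.get("acessorio1") == "Teclado")
hits = [d for d in joined if d.get("cor_parede") == "branca"]
print(len(hits))  # prints 1 -- the UUID key joins fine in this model
```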
Re: join not working with UUIDs
Your join is requesting to use the join_id field (from) of documents matching the query of cor_parede:branca, but the join_id field of that document is empty. Maybe you intended to search in the other direction, like acessorio1:Teclado. -- Jack Krupansky -----Original Message----- From: Marcelo Elias Del Valle Sent: Tuesday, July 09, 2013 7:34 PM To: solr-user@lucene.apache.org Subject: join not working with UUIDs Hello, I am trying to create a POC to test query joins. However, I was surprised when I saw my test worked with some ids, but when my document ids are UUIDs, it doesn't work. Here is an example, using SolrJ:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
doc.addField("cor_parede", "branca");
doc.addField("num_cadeiras", 34);
solr.add(doc);
// Add children
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "computador1");
doc2.addField("acessorio1", "Teclado");
doc2.addField("acessorio2", "Mouse");
doc2.addField("root_id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
solr.add(doc2);

When I execute: /select params={start=0&rows=10&q=cor_parede%3Abranca&fq=%7B%21join+from%3Droot_id+to%3Did%7Dacessorio1%3ATeclado}

SolrQuery query = new SolrQuery();
query.setStart(0);
query.setRows(10);
query.set("q", "cor_parede:branca");
query.set("fq", "{!join from=root_id to=id}acessorio1:Teclado");
QueryResponse response = DGSolrServer.get().query(query);
long numFound = response.getResults().getNumFound();

it returns zero results. However, if I use "room1" for the first document's id and for the root_id field on the second document, it works. Any idea why? What am I missing? Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
Re: join not working with UUIDs
Oops... I misread and confused your q and fq params. -- Jack Krupansky -----Original Message----- From: Jack Krupansky Sent: Tuesday, July 09, 2013 7:47 PM To: solr-user@lucene.apache.org Subject: Re: join not working with UUIDs Your join is requesting to use the join_id field (from) of documents matching the query of cor_parede:branca, but the join_id field of that document is empty. Maybe you intended to search in the other direction, like acessorio1:Teclado. -- Jack Krupansky -----Original Message----- From: Marcelo Elias Del Valle Sent: Tuesday, July 09, 2013 7:34 PM To: solr-user@lucene.apache.org Subject: join not working with UUIDs Hello, I am trying to create a POC to test query joins. However, I was surprised when I saw my test worked with some ids, but when my document ids are UUIDs, it doesn't work. Here is an example, using SolrJ:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
doc.addField("cor_parede", "branca");
doc.addField("num_cadeiras", 34);
solr.add(doc);
// Add children
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "computador1");
doc2.addField("acessorio1", "Teclado");
doc2.addField("acessorio2", "Mouse");
doc2.addField("root_id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
solr.add(doc2);

When I execute: /select params={start=0&rows=10&q=cor_parede%3Abranca&fq=%7B%21join+from%3Droot_id+to%3Did%7Dacessorio1%3ATeclado}

SolrQuery query = new SolrQuery();
query.setStart(0);
query.setRows(10);
query.set("q", "cor_parede:branca");
query.set("fq", "{!join from=root_id to=id}acessorio1:Teclado");
QueryResponse response = DGSolrServer.get().query(query);
long numFound = response.getResults().getNumFound();

it returns zero results. However, if I use "room1" for the first document's id and for the root_id field on the second document, it works. Any idea why? What am I missing? Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
Overseer queues confused me
Hi there: In the Solr 4.3 source code, I found that the Overseer uses 3 queues to handle all SolrCloud management requests: 1: /overseer/queue 2: /overseer/queue-work 3: /overseer/collection-queue-work The ClusterStateUpdater uses the 1st and 2nd queues to handle SolrCloud shard or state requests: it peeks a request from the 1st queue, offers it to the 2nd queue, and then handles it. The OverseerCollectionProcessor uses the 3rd queue to handle collection-related requests. My question is: why does the ClusterStateUpdater need 2 queues, when the OverseerCollectionProcessor can handle requests correctly with only 1? Is there some additional design consideration behind the ClusterStateUpdater? Thanks in advance :) Best Regards, Illu Ying Assistant Supervisor, NESC-SH.MIS +86-021-51530666*41417 Floor 19, KaiKai Plaza, 888, Wanhangdu Rd, Shanghai (200042)
Norms
I have a field that has omitNorms="true", but when I look at debugQuery I see that the field is being normalized for the score. What can I do to turn off normalization in the score? I want a simple way to do 2 things: boost geodist() highest at 1 mile and lowest at 100 miles, plus add a boost for a query=edgefield^5. I only want tf() and no queryNorm. I am not even sure I want idf(), but I can probably live with rare names being boosted. The results are being normalized. See below. I tried dismax and edismax - bf, bq and boost.

<requestHandler name="autoproviderdist" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <str name="fl">display_name,city_state,prov_url,pwid,city_state_alternative</str>
    <!-- <str name="bq">_val_:"sum(recip(geodist(store_geohash), .5, 6, 6), 0.1)"^10</str> -->
    <str name="boost">sum(recip(geodist(store_geohash), .5, 6, 6), 0.1)</str>
    <int name="rows">5</int>
    <str name="q.alt">*:*</str>
    <str name="qf">name_edgy^.9 name_edge^.9 name_word</str>
    <str name="group">true</str>
    <str name="group.field">pwid</str>
    <str name="group.main">true</str>
    <!-- <str name="pf">name_edgy</str> do not turn on -->
    <str name="sort">score desc, last_name asc</str>
    <str name="d">100</str>
    <str name="pt">39.740112,-104.984856</str>
    <str name="sfield">store_geohash</str>
    <str name="hl">false</str>
    <str name="hl.fl">name_edgy</str>
    <str name="mm">2<-1 4<-2 6<-3</str>
  </lst>
</requestHandler>

0.058555886 = queryNorm product of: 10.854807 = (MATCH) sum of: 1.8391232 = (MATCH) max plus 0.01 times others of: 1.8214592 = (MATCH) weight(name_edge:paul^0.9 in 231378), product of: 0.30982485 = queryWeight(name_edge:paul^0.9), product of: 0.9 = boost 5.8789964 = idf(docFreq=26567, maxDocs=3493655)* 0.058555886 = queryNorm* 5.8789964 = (MATCH) fieldWeight(name_edge:paul in 231378), product of: 1.0 = tf(termFreq(name_edge:paul)=1) 5.8789964 = idf(docFreq=26567, maxDocs=3493655) 1.0 = fieldNorm(field=name_edge, doc=231378) 1.7664119 = (MATCH) weight(name_edgy:paul^0.9 in 231378), product of: 0.30510724 = 
queryWeight(name_edgy:paul^0.9), product of: 0.9 = boost 5.789479 = idf(docFreq=29055, maxDocs=3493655)* 0.058555886 = queryNorm* 5.789479 = (MATCH) fieldWeight(name_edgy:paul in 231378), product of: 1.0 = tf(termFreq(name_edgy:paul)=1) 5.789479 = idf(docFreq=29055, maxDocs=3493655) 1.0 = fieldNorm(field=name_edgy, doc=231378) 9.015684 = (MATCH) max plus 0.01 times others of: 8.9352665 = (MATCH) weight(name_word:nutting in 231378), product of: 0.72333425 = queryWeight(name_word:nutting), product of: 12.352887 = idf(docFreq=40, maxDocs=3493655) 0.058555886 = queryNorm 12.352887 = (MATCH) fieldWeight(name_word:nutting in 231378), product of: 1.0 = tf(termFreq(name_word:nutting)=1) 12.352887 = idf(docFreq=40, maxDocs=3493655) 1.0 = fieldNorm(field=name_word, doc=231378) 8.04174 = (MATCH) weight(name_edgy:nutting^0.9 in 231378), product of: 0.65100086 = queryWeight(name_edgy:nutting^0.9), product of: 0.9 = boost 12.352887 = idf(docFreq=40, maxDocs=3493655)* 0.058555886 = queryNorm* 12.352887 = (MATCH) fieldWeight(name_edgy:nutting in 231378), product of: 1.0 = tf(termFreq(name_edgy:nutting)=1) 12.352887 = idf(docFreq=40, maxDocs=3493655) 1.0 = fieldNorm(field=name_edgy, doc=231378) 1.0855998 = sum(6.0/(0.5*float(geodist(39.74168747663498,-104.9849385023117,39.740112,-104.984856))+6.0),const(0.1)) -- Bill Bell billnb...@gmail.com cell 720-256-8076
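For what it's worth, the numbers in that explain output multiply out exactly as Lucene's TF-IDF formula says they should: queryNorm enters each clause once as a query-wide constant (so it rescales absolute scores but not the ranking), while the fieldNorm shown is 1.0, i.e. omitNorms is working. A quick arithmetic check using the name_edge:paul clause:

```python
# Values copied from the name_edge:paul clause of the explain output above.
query_norm = 0.058555886
boost, idf = 0.9, 5.8789964
tf, field_norm = 1.0, 1.0

query_weight = boost * idf * query_norm     # explain shows 0.30982485
field_weight = tf * idf * field_norm        # explain shows 5.8789964
clause_score = query_weight * field_weight  # explain shows 1.8214592

print(query_weight, clause_score)
```

Removing queryNorm (or idf) entirely would take a custom Similarity rather than a schema flag, but since queryNorm is constant across the whole query it should not affect which documents rank first.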
Re: Surround query parser not working?
Can we get a sample fieldType and field definition? Thanks. On Mon, Jul 8, 2013 at 8:40 AM, Jack Krupansky j...@basetechnology.com wrote: Yes, you should be able to use nested query parsers to mix the queries. Solr 4.1(?) made it easier. -- Jack Krupansky -----Original Message----- From: Abeygunawardena, Niran Sent: Monday, July 08, 2013 7:00 AM To: solr-user@lucene.apache.org Subject: Re: Surround query parser not working? Hi, Thanks. I found out that my issue was the default field (df) was being ignored and I had to specify the parameter by adding df=text in the URL. Thank you for updating the wiki page on the surround parser: http://wiki.apache.org/solr/SurroundQueryParser Hopefully, ordered proximity searches will be supported in the edismax query parser itself, as the surround query parser is not as good as the edismax parser: https://issues.apache.org/jira/browse/SOLR-3101 Is there a way to AND the surround parser query with the edismax query, so the ordered proximity search can be run through the surround query parser and the results combined/queried with the edismax query parser for other parts of the query? Can nested queries support this? Thanks, Niran Niran - Looks like you're being bitten by a known feature* of the surround query parser. It does not analyze the text, as some of the other more commonly used query parsers do. The dismax, edismax, and lucene query parsers all leverage field analysis on the query terms or phrases. The surround query parser just takes the terms as-is. It's by design, but not necessarily something that can't at least be optionally available. But as it is, you'll need to lowercase, at least. Be careful with index-time stemming, as you'd have to account for that in the surround query parser syntax by wildcarding things a bit. 
Instead of searching for finding, one would use find* (and index without stemming) in the query to match finds, finding. It was by design to not analyze in the surround query parser, because it can be handy to use fewer analysis tricks at index time and let the query itself be more sophisticated, allowing more flexible and indeed more complex query-time constructs. Erik * http://wiki.apache.org/solr/SurroundQueryParser#Limitations - though it'd be useful to have analysis at least optionally available. -- Bill Bell billnb...@gmail.com cell 720-256-8076
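On Niran's question about ANDing a surround clause with an edismax clause: with the default lucene query parser handling the top-level query, nested queries via the _query_ pseudo-field should allow it. A sketch, where the field names and terms are placeholders, not from the thread:

```
q=_query_:"{!edismax qf='name_word'}wireless phone" AND _query_:"{!surround df=text}3w(wireless, phone)"
```

Each _query_ clause is parsed by the parser named in its local params, and the top-level AND intersects the two result sets; remember to lowercase the surround terms yourself, per the limitation discussed above.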
Re: Calculating Solr document score by ignoring the boost field.
Jack, due to 'some' reason my Nutch is returning an index-time boost = 0.0, and just for a moment suppose that Nutch is and always will return boost = 0. Now my simple question was: why is Solr showing me a document score = 0? Why does it depend on the index-time boost value? How can I make Solr calculate the score using only TF-IDF? Regards, Khan On Tue, Jul 9, 2013 at 6:31 PM, Jack Krupansky j...@basetechnology.com wrote: Simple math: x times zero equals zero. That's why the default document boost is 1.0 - score times 1.0 equals score. Any particular reason you wanted to zero out the document score from the document level? -- Jack Krupansky -----Original Message----- From: Tony Mullins Sent: Tuesday, July 09, 2013 9:23 AM To: solr-user@lucene.apache.org Subject: Re: Calculating Solr document score by ignoring the field. I am passing the boost value (via Nutch), i.e. boost = 0.0. But my question is why Solr is showing me score = 0.0 when my boost (index-time boost) = 0.0? Should not Solr calculate its document scores on the basis of TF-IDF? And if not, how can I make Solr consider only TF-IDF while calculating the document's score? Regards, Khan On Tue, Jul 9, 2013 at 4:46 PM, Erick Erickson erickerick...@gmail.com wrote: My guess is that you're not really passing on the boost field's value and getting the default. Don't quite know how I'd track that down though Best Erick On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com wrote: Greetings, I am using Nutch 2.x as my datasource for Solr 4.3.0. And Nutch passes on its own boost field to my Solr schema: <field name="boost" type="float" stored="true" indexed="false"/> Now due to some reason I always get boost = 0.0, and due to this my Solr document score is also always 0.0. Is there any way in Solr to ignore the boost field's value for its document score calculation? Regards, Khan
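Jack's "x times zero" point, concretely: the index-time boost ends up as a multiplicative factor on every clause score, so a boost of 0 wipes out whatever TF-IDF computes. Toy arithmetic, not Lucene internals; the simplest fixes are to have Nutch send boost 1.0 or to stop applying the boost at index time.

```python
def clause_score(tf, idf, index_boost, length_norm=1.0):
    # Index-time boost is folded into the stored norm, so it multiplies
    # the whole TF-IDF product for every clause hitting the field.
    field_norm = index_boost * length_norm
    return tf * idf * field_norm

print(clause_score(1.0, 5.879, 0.0))  # prints 0.0 -- TF-IDF is wiped out
print(clause_score(1.0, 5.879, 1.0))  # prints 5.879 -- default boost leaves TF-IDF intact
```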