Re: Bad fieldNorm when using morphologic synonyms
Your analyzer needs to set positionIncrement correctly: it sounds like it's broken. On Thu, Dec 5, 2013 at 1:53 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, we implemented a morphologic analyzer, which stems words at index time. For several reasons, we index both the original word and the stem (at the same position, of course). The stemming is done for a specific language, so other languages are not stemmed at all. Because of that, two documents with the same number of terms may have different termVector sizes: a document which contains many words that get stemmed will have a double-sized termVector. This behaviour affects the relevance score in a BAD way: the fieldNorm of these documents reduces their score. This is NOT the wanted behaviour in our case. We are looking for a way to mark the stemmed words (at index time, of course) so they won't affect the fieldNorm. Does such a way exist? Do you have another idea?
Re: SolrCloud FunctionQuery inconsistency
Thank you, Chris. I noticed that the cron jobs run at different times on the replicas (delayed by 10 minutes relative to their leader), and these cron jobs reload the dictionary files. Therefore, the terms are slightly different between replicas, so the maxScore shows a difference. Best, Sling
Re: Bad fieldNorm when using morphologic synonyms
1) Positions look all right (to me). 2) fieldNorm is determined by the size of the termVector, isn't it? The termVector size isn't affected by the positions. On Fri, Dec 6, 2013 at 10:46 AM, Robert Muir rcm...@gmail.com wrote: Your analyzer needs to set positionIncrement correctly: it sounds like it's broken. [snip]
Re: Question about external file fields
I guess you refer to this post? http://1opensourcelover.wordpress.com/2013/07/02/solr-external-file-fields/ If so... he already provides at least one possible use case: *snip* We use Solr to serve our company's browse pages. Our browse pages are similar to how a typical Stackoverflow tag page looks. That “browse” page has the question title (which links to the actual page that contains the question, comments and replies), view count, a snippet of the question text, the questioner's profile info, tags and time information. One thing that can change quite frequently on such a page is the view count. I believe Stackoverflow uses Redis to keep track of the view counts, but we currently have to manage this in Solr, since Solr is our only datastore for serving these browse pages. The problem before Solr 4.0 was that you could not update a single field in a document. You had to form the entire document first (either by querying Solr or using an alternate data source which contains all the info), update the view count and then post the entire document to Solr. With Solr 4+, you can do an atomic update of a single field – the Solr server internally handles fetching the entire document, updating the field and updating its index. But atomic updates come with some caveats – you must store all your Solr fields (other than copyFields), which can increase your storage space, and you must enable the updateLog, which can slow down Solr start-up. For this specific problem of updating a field more frequently than the rest of the document, external file fields (EFFs) can come in quite handy. They have one main restriction though – you cannot use them in your queries directly, i.e. they cannot be used in the q parameter directly. But we will see how we can circumvent this problem, at least partially, using function query hacks. */snip* Another case, off the top of my head, might be product pricing or updates to stock counts. - Stefan On Thursday, December 5, 2013 at 11:11 PM, yriveiro wrote: Hi, I read this post http://1opensourcelover.wordpress.com/ about EFFs and I found it very interesting. Can someone give me more use cases for EFFs? /Yago
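[Editorial aside for readers weighing the two approaches: the atomic-update path described in the quoted post looks roughly like this in SolrJ 4.x. This is a minimal sketch, assuming all non-copyField fields are stored and the updateLog is enabled; the field names are hypothetical.]

import java.util.Collections;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ViewCountUpdate {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    // Atomic update: send only the unique key plus a map telling Solr to "set" one field.
    // Solr fetches the stored document internally and reindexes it with the new value.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "question-123");                               // hypothetical unique key
    doc.addField("view_count", Collections.singletonMap("set", 42));  // hypothetical field
    server.add(doc);
    server.commit();
  }
}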
passing SYS_REFCURSOR as out parameter for Oracle stored procedure
Hi, I am using Solr 3.3 for index generation with SQL Server and am generating the index successfully; now I am trying to generate it with an Oracle DB. I am using the *UDP_Getdetails* procedure to generate the required indexes. This procedure takes 2 input parameters and 1 output parameter. *input params: id, name; output param: cv_1 IN OUT SYS_REFCURSOR* In Solr's data-config.xml, below is my configuration:

<entity name="index" query="UDP_Getdetails(32,'GT', )"/>

I do not know how to pass a *SYS_REFCURSOR* to the procedure in Solr. Please help me out with this. Thanks in advance, Aniljayanti
Re: Xml Query Parser
You are right that the XmlQueryParser isn't completely/yet implemented in Solr. There is the JIRA mentioned above, which is still WIP, so you could use that as a basis and extend it. If you aren't familiar with Solr and Java, you might find that a struggle, in which case you might want to consider other parser options. Depends how much of a challenge you want :) We have taken the original patch and made it work in Solr 4.0 (as I mentioned in the update last year), as well as increasing its support both with existing queries and some of our own. We plan to submit that back, but at the moment it isn't really in a good enough state with tests, etc. for that to happen. Hopefully in the new year, once higher-priority things have been dealt with. I haven't tried Erik's alternative, but that does look promising (and a lot more concise than our approach!) On 6 December 2013 06:51, Puneet Pawaia puneet.paw...@gmail.com wrote: Hi Gora, I had seen that before but took a look again. Since it is not yet resolved, I assumed it is still a work in progress. Should I try to patch the current 4.6 code with the patches? How would you suggest I proceed? I am new to Solr and Java and so do not have much experience with this. Regards Puneet On Fri, Dec 6, 2013 at 11:54 AM, Gora Mohanty g...@mimirtech.com wrote: On 6 December 2013 11:35, Puneet Pawaia puneet.paw...@gmail.com wrote: Hi, I am testing using Solr 4.6 and would like to know if there is some implementation like Lucene's XmlQueryParser in Solr. [...] Please take a look at this JIRA issue: https://issues.apache.org/jira/browse/SOLR-839 Regards, Gora
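[Editorial aside: for anyone wanting to experiment while SOLR-839 is still open, here is a minimal, untested sketch of wiring Lucene's XML query parser into Solr as a QParserPlugin. The class name and the "text" default field are placeholders, and error handling is reduced to the essentials.]

import java.io.ByteArrayInputStream;
import org.apache.lucene.queryparser.xml.CoreParser;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class XmlQParserPlugin extends QParserPlugin {
  public void init(NamedList args) {}

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() throws SyntaxError {
        // Delegate the whole query string to Lucene's XML parser.
        CoreParser parser = new CoreParser("text", req.getSchema().getQueryAnalyzer());
        try {
          return parser.parse(new ByteArrayInputStream(qstr.getBytes("UTF-8")));
        } catch (Exception e) {
          throw new SyntaxError(e.getMessage());
        }
      }
    };
  }
}

[Registered in solrconfig.xml under a name such as xmlq, it could then be invoked as q={!xmlq}<BooleanQuery>...</BooleanQuery>.]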
LocalParam for nested query without escaping?
We want to set a LocalParam on a nested query. When querying with the v inline parameter, it works fine: http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1 AND {!lucene df=text v='TERM2 TERM3 \"TERM4 TERM5\"'} The parsedquery_toString is +id:TERM1 +(text:term2 text:term3 text:"term4 term5") A query using _query_ also works fine: http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1 AND _query_:"{!lucene df=text}TERM2 TERM3 \"TERM4 TERM5\"" (the parsedquery is exactly the same). BUT, when trying to put the nested query in place, it yields a syntax error: http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1 AND {!lucene df=text}(TERM2 TERM3 "TERM4 TERM5") org.apache.solr.search.SyntaxError: Cannot parse '(TERM2' The previous options are less preferred because of the escaping that must be applied to the nested query. Can't I set a LocalParam on a nested query without escaping the query?
Re: LocalParam for nested query without escaping?
Obviously, there is the option of an external parameter ({!... v=$nestedq}&nestedq=...). This is a good solution, but it is not practical when you have a lot of such nested queries. Any ideas? On Friday, December 6, 2013, Isaac Hebsh wrote: [snip]
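[Editorial note: for reference, a sketch of that external-parameter form with the terms from the original mail; no escaping is needed inside the nested query itself.]

q=TERM1 AND {!lucene df=text v=$nestedq}&nestedq=TERM2 TERM3 "TERM4 TERM5"

[Each nested query needs its own parameter name (nestedq1, nestedq2, ...), which is what makes this awkward when there are many of them.]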
Re: Bad fieldNorm when using morphologic synonyms
Term vectors have nothing to do with any of this. Please fix your analyzer first. If you want to add a synonym, it should have a position increment of zero. I bet exact phrase queries aren't working correctly either. On Fri, Dec 6, 2013 at 12:50 AM, Isaac Hebsh isaac.he...@gmail.com wrote: 1) Positions look all right (to me). 2) fieldNorm is determined by the size of the termVector, isn't it? The termVector size isn't affected by the positions. [snip]
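[Editorial aside: a minimal sketch of a filter along the lines Robert describes, emitting each stem as an extra token at position increment zero. The class name is hypothetical and stem() is a stub standing in for the morphological stemmer. Once the stems are overlap tokens, the stock Solr 4.x DefaultSimilarity with discountOverlaps=true (the default) leaves them out of the length used for fieldNorm, which is what the original poster asked for.]

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class StemSynonymFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private State pending; // saved state of the original token
  private String stem;   // stem to emit at the same position

  public StemSynonymFilter(TokenStream input) { super(input); }

  @Override
  public boolean incrementToken() throws IOException {
    if (pending != null) {
      // Emit the stem as an overlap token: posInc=0 keeps it on the original
      // position, so phrase queries still work and (with discountOverlaps)
      // it is not counted in the fieldNorm.
      restoreState(pending);
      pending = null;
      termAtt.setEmpty().append(stem);
      posIncAtt.setPositionIncrement(0);
      return true;
    }
    if (!input.incrementToken()) return false;
    String term = termAtt.toString();
    String s = stem(term);
    if (s != null && !s.equals(term)) {
      pending = captureState(); // remember the original token's attributes
      stem = s;
    }
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending = null;
  }

  private String stem(String term) { return null; } // stub: plug in your stemmer
}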
Maven archetype
Hi to everybody. I'm not going to say that I'm new in Solr, but I'm new in Solr. I've been googling a lot of things to start with Solr, but I would like to know if there is a Maven archetype for the 4.6 version (to deploy in Tomcat). Also, I would like to know (based on best practices) what the community recommends for Solr deployment: is it better to use a Solr home directory, or just a WAR inside the Tomcat installation with all the configuration inside? Thanks and regards Erwin
Re: Maven archetype
Hi, if you want to deploy the Solr WAR on Tomcat, you only have to do it once, so why do you need a Maven archetype? You can just get the WAR from the website and deploy it to your server. If you need to use Maven because you are, for example, developing in Eclipse and you want to just launch jetty:run and have your configuration, Jetty and Solr up and ready, you could add the Solr WAR dependency to your pom.xml and appropriately configure your maven-jetty plugin:

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr</artifactId>
  <version>4.6.0</version>
</dependency>

My personal experience with Solr: the servlet engine where you deploy Solr is not very important, assuming you choose one of the most popular (Jetty, Tomcat, JBoss, WebLogic)... Personally, since Solr is a WAR, I would prefer a simple servlet engine instead of an application server (e.g. JBoss, WebSphere, Geronimo, WebLogic), because you don't need (if you don't need) all the beautiful and complex things that those monsters carry along... Best, Andrea On 12/06/2013 03:04 PM, Erwin Etchart wrote: [snip]
Introducing Luwak for high-performance stored Lucene queries
Hi all, We've now released the library we mentioned in our presentation at Lucene Revolution: https://github.com/flaxsearch/luwak You can use this to apply tens of thousands of stored Lucene queries to an incoming document in a second or so on relatively modest hardware. We use it for media monitoring applications but it could equally be useful for categorisation, classification etc. It's currently based on a fork of Lucene (details supplied) but hopefully it'll work with release versions soon. Feedback is very welcome! Cheers Charlie -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
Re: Introducing Luwak for high-performance stored Lucene queries
Hi Charlie, Very nice - thanks! I'd love to see a side-by-side comparison with the ES percolator. Got something like that in your blog topic queue? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Fri, Dec 6, 2013 at 9:29 AM, Charlie Hull char...@flax.co.uk wrote: [snip]
Re: Introducing Luwak for high-performance stored Lucene queries
On 06/12/2013 14:35, Otis Gospodnetic wrote: Hi Charlie, Very nice - thanks! I'd love to see a side-by-side comparison with the ES percolator. Got something like that in your blog topic queue? It's a good idea, I'll add it to the list. May need some more roundtuits C [snip] -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
Re: Maven archetype
Hi Andrea, I've been looking for an archetype because I am using Eclipse and the specific Solr config must be easy to deploy (we are using Maven now): mvn package, mvn assembly, or similar. The idea is to keep the Solr config easy to package and well stored in SVN. I saw that a very common way to use a Solr server is to create a Solr home and store all the XML and properties config there; is that the usual way to use a Solr server? regards Erwin 2013/12/6 Andrea Gazzarini a.gazzar...@gmail.com [snip]
Re: Introducing Luwak for high-performance stored Lucene queries
+1 on this. ----- Original message ----- From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, December 6, 2013 9:35:25 Subject: Re: Introducing Luwak for high-performance stored Lucene queries [snip]
How to programmatically unload a shard from a single server to horizontally scale on SolrCloud
I'm writing a script so that when my SolrCloud setup is slowing down, I can add a new physical machine and run a script to split the shard with the most data and send half of the shard to the new machine. Here's the general thinking I'm following:
- Pick the machine with the most data currently indexed on it.
- Add a new machine to replicate that data.
- Call SPLITSHARD through the Collections API to split that data into two shards.
- Delete the original shard, so now the 2 subshards exist on both machines.
- Delete one subshard from the original machine, and the other subshard from the new machine.
At this point, the data should be more evenly distributed, which will help us continue to scale. This also seems like an easily scriptable process, which is what I'm trying to do. My question is simple: I can call collections?action=SPLITSHARD to split the shards, and collections?action=DELETESHARD to delete the original shard, but how can I then delete (or unload?) one of the subshards from each machine?
Re: How to programmatically unload a shard from a single server to horizontally scale on SolrCloud
Use the Core API, which provides the UNLOAD operation. Just unload the cores you don't need and they'll be automatically removed from SolrCloud. You can also specify options like deleteDataDir or deleteIndex to clean up the disk space, or you can do it in your script. http://wiki.apache.org/solr/CoreAdmin#UNLOAD - Thanks, Michael
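[Editorial aside: a rough SolrJ 4.x sketch of the two calls involved. The host, collection, shard and core names are made up, and SPLITSHARD can run for a while, so a real script should poll for completion before unloading anything.]

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class RebalanceStep {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://host1:8983/solr");

    // SPLITSHARD via the Collections API (just an HTTP call under the hood)
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "SPLITSHARD");
    params.set("collection", "collection1");
    params.set("shard", "shard1");
    QueryRequest split = new QueryRequest(params);
    split.setPath("/admin/collections");
    server.request(split);

    // Unload one subshard core from this machine and clean up its data dir
    CoreAdminRequest.Unload unload = new CoreAdminRequest.Unload(false);
    unload.setCoreName("collection1_shard1_0_replica1"); // hypothetical core name
    unload.setDeleteDataDir(true);
    unload.process(server);
  }
}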
Re: How to programmatically unload a shard from a single server to horizontally scale on SolrCloud
Thanks Michael. I didn't realize the cores and collections APIs were interchangeable like that. I'd assumed that the cores API was meant for vanilla Solr, while Collections was specific to SolrCloud. I appreciate you clarifying that. Thanks.
alternative to DisMaxRequestHandler needed for upgrade to solr 4.6.0
Hi, I'm trying to upgrade a solr installation from 1.4 (yes, really) to 4.6.0, and I find our request handler was solr.DisMaxRequestHandler, which is now not only deprecated but deleted from solr-core-4.6.0.jar. Can anyone advise on suitable alternatives, or was there any form of direct replacement? Cheers!
Re: Xml Query Parser
Hi Daniel, Thanks for the heads up. I'll try to get the patch integrated. Regards Puneet On 6 Dec 2013 16:39, Daniel Collins danwcoll...@gmail.com wrote: [snip]
pool-1-thread-4 java.lang.NoSuchMethodError: org.apache.solr.util.SimplePostTool
pool-1-thread-4 java.lang.NoSuchMethodError: org.apache.solr.util.SimplePostTool I am getting this error while posting data to Solr from a generated XML file, although the Solr post.jar is present on the library classpath, and I have also tried keeping the source class of the post tool. Urgent, thanks!
Change Velocity Template Directory in Solr 4.6
I would like to know how to set the Velocity template directory in Solr. About 6 months ago I asked this question on this list: http://lucene.472066.n3.nabble.com/Change-Velocity-Template-Directory-td4078120.html At that time Erik Hatcher advised me to use v.base_dir in solrconfig.xml. This worked perfectly in Solr 4.3. However, now I am attempting to move my code/data to Solr 4.6, and this does not work, i.e. it does not recognize v.base_dir in solrconfig.xml. Doing a diff of org.apache.solr.response.VelocityResponseWriter, I can see that some code has been removed from the getEngine() method in the new 4.6 version. I was discussing this with hossman on IRC, and he pointed me to https://issues.apache.org/jira/browse/SOLR-4882 I understand that this is a security issue, and I am ready to take the risk, because for now this would only be used internally by non-technical folk. Hossman pointed me to https://gist.github.com/hossman/7827910 This is a system property, solr.allow.unsafe.resourceloading=true, that would supposedly enable unsafe template loading from other locations. However, this does not work. (Here I am assuming I start up Solr with java -Dsolr.allow.unsafe.resourceloading=true -jar start.jar, i.e. I have tried setting this property on the command line.) Any ideas? If this has been changed, then someone might need to remove v.base_dir from the documentation at http://wiki.apache.org/solr/VelocityResponseWriter Thank you, O. O.
Re: Function query matching
I added some timing logging to IndexSearcher and ScaleFloatFunction and compared a simple DisMax query with a DisMax query wrapped in the scale function. The index size was 500K docs; 61K docs match the DisMax query. The simple DisMax query took 33 ms, the function query took 89 ms. What I found was: 1. The scale query only normalized the scores once (in ScaleInfo.createScaleInfo) and added 33 ms to the QTime. Subsequent calls to ScaleFloatFunction.getValues bypassed createScaleInfo and added ~0 time. 2. The FunctionQuery nextDoc iterations added 16 ms over the DisMax nextDoc iterations. Here's the breakdown:

Simple DisMax query:
  weight.scorer: 3 ms (get term enum)
  scorer.score: 23 ms (nextDoc iterations)
  other: 3 ms
  Total: 33 ms

DisMax wrapped in ScaleFloatFunction:
  weight.scorer: 39 ms (get scaled values)
  scorer.score: 39 ms (nextDoc iterations)
  other: 11 ms
  Total: 89 ms

Even with any improvements to scale, all function queries will add a linear increase to the QTime as index size increases, since they match all docs. Trey: I'd be happy to test any patch that you find improves the speed. On Mon, Dec 2, 2013 at 11:21 PM, Trey Grainger solrt...@gmail.com wrote: We're working on the same problem with the combination of the scale(query(...)) combination, so I'd like to share a bit more information that may be useful. *On the scale function:* Even though the scale query has to calculate the scores for all documents, it is actually doing this work twice for each ValueSource (once to calculate the min and max values, and then again when actually scoring the documents), which is inefficient. To solve the problem, we're in the process of putting a cache inside the scale function to remember the values for each document when they are initially computed (to find the min and max) so that the second pass can just use the previously computed values for each document. Our theory is that most of the extra time due to the scale function is really just the result of doing duplicate work. No promises this won't be overly costly in terms of memory utilization, but we'll see what we get in terms of speed improvements and will share the code if it works out well. Alternate implementation suggestions (or criticism of a cache like this) are also welcomed. *On the NoOp product function: scale(prod(1, query(...))):* We do the same thing, which ultimately is just an unnecessary waste of a loop through all documents to do an extra multiplication step. I just debugged the code and uncovered the problem. There is a Map (called context) that is passed through to each value source to store intermediate state, and both the query and scale functions are passing the ValueSource for the query function in as the KEY to this Map (as opposed to using some composite key that makes sense in the current context). Essentially, these lines are overwriting each other:

Inside ScaleFloatFunction: context.put(this.source, scaleInfo); // this.source refers to the QueryValueSource, and scaleInfo refers to a ScaleInfo object
Inside QueryValueSource: context.put(this, w); // this refers to the same QueryValueSource as above, and w refers to a Weight object

As such, when the ScaleFloatFunction later goes to read the ScaleInfo from the context Map, it unexpectedly pulls the Weight object out instead, and thus the invalid cast exception occurs. The NoOp multiplication works because it puts a different ValueSource between the query and the ScaleFloatFunction, such that this.source (in ScaleFloatFunction) != this (in QueryValueSource). This should be an easy fix. I'll create a JIRA ticket to use better key names in these functions and push up a patch. This will eliminate the need for the extra NoOp function. -Trey On Mon, Dec 2, 2013 at 12:41 PM, Peter Keegan peterlkee...@gmail.com wrote: I'm pursuing this possible PostFilter solution. I can see how to collect all the hits and recompute the scores in a PostFilter, after all the hits have been collected (for scaling). Now, I can't see how to get the custom doc/score values back into the main query's HitQueue. Any advice? Thanks, Peter On Fri, Nov 29, 2013 at 9:18 AM, Peter Keegan peterlkee...@gmail.com wrote: Instead of using a function query, could I use the edismax query (plus some low-cost filters not shown in the example) and implement the scale/sum/product computation in a PostFilter? Is the query's maxScore available there? Thanks, Peter On Wed, Nov 27, 2013 at 1:58 PM, Peter Keegan peterlkee...@gmail.com wrote: Although the 'scale' is a big part of it, here's a closer breakdown. Here are 4 queries with increasing functions, and their response times (caching turned off in solrconfig):

100 msec: select?q={!edismax v='news' qf='title^2 body'}
135 msec: select?qq={!edismax v='news' qf='title^2
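[Editorial aside: to make the collision concrete, a sketch of the kind of fix Trey describes; the key shapes are hypothetical, not the actual patch. Keying the context entries on something that also encodes the owner means the two put calls can no longer overwrite each other.]

// Inside ScaleFloatFunction.createWeight (sketch):
context.put(new java.util.AbstractMap.SimpleImmutableEntry<Object, String>(this.source, "scaleInfo"), scaleInfo);
// Inside QueryValueSource.createWeight (sketch):
context.put(new java.util.AbstractMap.SimpleImmutableEntry<Object, String>(this, "weight"), w);
// ...with the matching context.get(...) calls updated to build the same composite keys.
// (SimpleImmutableEntry implements equals/hashCode over both members, so it works as a map key.)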
Re: Function query matching
In my previous posting, I said: Subsequent calls to ScaleFloatFunction.getValues bypassed createScaleInfo and added ~0 time. These subsequent calls are for the remaining segments in the index reader (21 segments). Peter On Fri, Dec 6, 2013 at 2:10 PM, Peter Keegan peterlkee...@gmail.com wrote: [snip]
Re: Prioritize search returns by URL path?
Thanks all. Yes, we can differentiate between content types by URL. Everything else being equal, wiki posts should always be returned higher than blog posts, and blog posts should always be returned higher than forum posts. Within forum posts, we want to rank 'Verified answered' and 'Suggested answered' posts higher than unanswered posts. These cannot be identified via the path - only via metadata attached to the individual post. Any suggestions? @Alex, I'll investigate the references you provided. Thanks!
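[Editorial note: one way the two requirements could be expressed, sketched with hypothetical field names (a type field derived from the URL, and an answer_state field holding the post metadata), is additive boost queries on a dismax/edismax handler:]

bq=type:wiki^10 type:blog^5
bq=answer_state:verified^4 answer_state:suggested^2

[Since bq is additive, the 'everything else being equal' ordering only holds if the type boosts dominate the score spread; multiplicative boosts (the boost parameter on edismax) give firmer control.]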
Configurable collectors for custom ranking
I looked at SOLR-4465 and SOLR-5045, where it appears that there is a goal to be able to do custom sorting and ranking in a PostFilter. So far, it looks like only custom aggregation can be implemented in a PostFilter (5045). Custom sorting/ranking can be done in a pluggable collector (4465), but that patch is no longer in development. Is there any other development being done on adding custom sorting (after collection) via a plugin? Thanks, Peter
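[Editorial aside: for reference, a minimal sketch of the shape a Solr 4.x PostFilter takes via DelegatingCollector, with a hypothetical class name. It can observe or veto documents after the main query, but it cannot re-sort them, which is why pluggable collectors (SOLR-4465) would be needed for custom ranking.]

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class PassThroughPostFilter extends ExtendedQueryBase implements PostFilter {
  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      @Override
      public void collect(int doc) throws IOException {
        // Custom per-document logic goes here (e.g. aggregation);
        // calling super.collect passes the doc down the chain unchanged.
        super.collect(doc);
      }
    };
  }

  @Override public boolean getCache() { return false; }
  @Override public int getCost() { return Math.max(super.getCost(), 100); } // post-filters run at cost >= 100
}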
Re: Maven archetype
Hi Erwin; If you want to run Solr within a servlet container and you are new to Solr, you should examine the example folder of Solr. Run it and configure its files. You can start reading here: http://lucene.apache.org/solr/4_6_0/tutorial.html If you look at that example, you can customize it for a Solr instance according to your needs. On the other hand, if you don't want problems opening the Solr project within your IDE, you can run the *ant eclipse* or *ant idea* command (eclipse is for Eclipse and idea is for IntelliJ IDEA) under the lucene-solr folder. This will set up the configuration files for your IDE. However, if you want to run Solr as a Maven project instead of using Ant, you can read here: http://www.petrikainulainen.net/programming/maven/running-solr-with-maven/ It explains running Solr with Maven. The example is for Solr 4.3.0. If you have any problems you can ask. Thanks; Furkan KAMACI 2013/12/6 Erwin Etchart erwin.etch...@gmail.com [snip]
Re: alternative to DisMaxRequestHandler needed for upgrade to solr 4.6.0
Try edismax; it's an improved dismax. Warning, though: it behaves a bit differently than dismax, so you'll have to look again at the results and perhaps tweak. Best, Erick On Dec 6, 2013 10:58 AM, Peri Stracchino peri.stracch...@york.ac.uk wrote: [snip]
Re: alternative to DisMaxRequestHandler needed for upgrade to solr 4.6.0
On 12/6/2013 8:58 AM, Peri Stracchino wrote: I'm trying to upgrade a solr installation from 1.4 (yes, really) to 4.6.0, and I find our requesthandler was solr.DisMaxRequestHandler, which is now not only deprecated but deleted from solr-core-4.6.0.jar. Can anyone advise on suitable alternatives, or was there any form of direct replacement? Erick is right, you should probably use edismax. In addition, it's important to note a critical distinction here ... it's the *handler* object that's deprecated and removed, not the parser. The old dismax query parser is still alive and well, alongside the new extended dismax query parser. You need to use a standard search request handler and set the defType parameter to dismax or edismax. http://wiki.apache.org/solr/DisMaxQParserPlugin http://wiki.apache.org/solr/ExtendedDisMax I would recommend that you not use /dismax or /edismax for the handler name, just to avoid terminology clashes. I use /ncdismax for my handler name ... the string nc has meaning for our web application. Eventually I hope to move all searching to edismax and therefore just use /select or /search for the handler name. Right now we do almost everything with the standard query parser, and we are still tuning edismax. This is my handler definition:

<requestHandler name="/ncdismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">70</int>
    <str name="df">catchall</str>
    <xi:include href="shards.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
    <str name="shards.qt">/search</str>
    <str name="shards.info">true</str>
    <str name="shards.tolerant">true</str>
    <float name="tie">0.1</float>
    <int name="qs">3</int>
    <int name="ps">3</int>
    <str name="qf">catchall</str>
    <str name="pf">catchall^2</str>
    <str name="boost">min(recip(abs(ms(NOW/HOUR,pd)),1.92901e-10,1.5,1.5),0.85)</str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <bool name="lowercaseOperators">false</bool>
  </lst>
</requestHandler>

Thanks, Shawn
Re: Function query matching
I had to do a double take when I read this sentence... : Even with any improvements to 'scale', all function queries will add a : linear increase to the Qtime as index size increases, since they match all : docs. ...because that smelled like either a bug in your methodology or a bug in Solr. To convince myself there wasn't a bug in Solr, I wrote a test case (I'll commit tomorrow; a bunch of churn in svn right now is making ant precommit unhappy) to prove that when wrapping boost functions around queries, Solr will only evaluate the functions for docs matching the wrapped query -- so there is no linear increase as the index size increases, just the (necessary) linear increase as the number of *matching* docs grows. (For most functions anyway -- as mentioned, scale is special.) BUT! ... then I remembered how this thread started, and your goal of scaling the scores from a wrapped query. I want to be clear, for the 99% of people reading this: if you find yourself writing a query structure like this...

q={!func}...functions involving wrapping $qq...
qq={!edismax ...lots of stuff, but still only matching a subset of the index...}
fq={!query v=$qq}

...try to restructure the math you want to do into the form of a multiplier:

q={!boost b=$b v=$qq}
b=...functions producing a score multiplier...
qq={!edismax ...lots of stuff, but still only matching a subset of the index...}

Because the latter case is much more efficient, and Solr will only compute the function values for the docs it needs to (that match the wrapped $qq query). But for your specific goal, Peter: yes, if the whole point of a function you have is to wrap a scaled score of your base $qq, then the function (wrapping the scale(), wrapping the query()) is going to have to be evaluated for every doc -- that will definitely be linear based on the size of the index. -Hoss http://www.lucidworks.com/
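[Editorial note: to make that restructuring concrete with a made-up example (all parameter values hypothetical), instead of

q={!func}sum(scale(query($qq),0,1),0.5)
qq={!edismax qf='title^2 body'}news
fq={!query v=$qq}

prefer

q={!boost b=$b v=$qq}
b=recip(ms(NOW/HOUR,pubdate),3.16e-11,1,1)
qq={!edismax qf='title^2 body'}news

where the boost parser multiplies the function value into the score of only the matching docs. The scale() case is the exception Hoss notes, since scaling needs the scores of every matching doc before any score can be emitted.]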
Null pointer exception in spell checker at addChecker method
I'm trying to use the spell check component. My *schema* is (I have included only the fields necessary for spell check, not the entire schema):

<fields>
  <field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  <copyField source="id" dest="text"/>
  <dynamicField name="ignored_*" type="text" indexed="false" stored="false" multiValued="true"/>
  <field name="spelltext" type="spell" indexed="true" stored="false" multiValued="true"/>
  <copyField source="contents" dest="spelltext"/>
</fields>

<types>
  <fieldType name="spell" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnCaseChange="1"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory"/>
    </analyzer>
  </fieldType>
</types>

My *solrconfig* is:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">contents</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.8</float>
    <int name="maxEdits">1</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">3</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">contents</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">direct</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

I get this *error*:

java.lang.NullPointerException
    at org.apache.solr.spelling.ConjunctionSolrSpellChecker.addChecker(ConjunctionSolrSpellChecker.java:58)
    at org.apache.solr.handler.component.SpellCheckComponent.getSpellChecker(SpellCheckComponent.java:475)
    at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:106)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at ...

I know that the error is probably in the addChecker method. I read this method, and its code is such that, for all the null values, default values are added (e.g.: if (queryAnalyzer == null) queryAnalyzer = checker.getQueryAnalyzer();). So I feel that a null checker value is passed when checkers.add(checker); is executed. If I am right, tell me how to resolve this; otherwise, what has gone wrong? Thanks in advance.