Re: How to improve this solr query?
Hi Erick and Michael, it's not an asterisk at all. Sorry to confuse you guys; it's actually a dot character. I put it that way because it contains quite a lot of fields there. The reason I'm doing that is that I have some string fields and some non-string fields. The idea is to send the quoted value to the string fields and the non-quoted value to the non-string fields. I have to do that in order to match the string fields. I have tried using pf, but it doesn't match the string field at all. Do you have any good resource about how to use pf? I looked into several of the latest Solr books, but they say very little about it.

On Wed, Jul 4, 2012 at 3:51 AM, Erick Erickson erickerick...@gmail.com wrote: Chamnap: I've seen various e-mail programs put asterisks in for terms that are in bold face. The queries you pasted have lots of * characters in them; I suspect they were just things you put in bold in your original, and that may be the source of the confusion about whether you were using wildcards. But on to your question: if your q1 and q2 are the same words, wouldn't it just work to specify the pf (phrase fields) parameter for edismax? That automatically takes the terms in the query and turns them into a phrase query that's boosted higher. And what's the use-case here? I think you might be making this more complex than it needs to be. Best, Erick

On Tue, Jul 3, 2012 at 8:41 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Chamnap, I have a hunch you can get away with not using *s. Michael Della Bitta, Appinions, Inc. -- Where Influence Isn't a Game. http://www.appinions.com

On Tue, Jul 3, 2012 at 2:16 AM, Chamnap Chhorn chamnapchh...@gmail.com wrote: Lance, I didn't use wildcards at all. I use only this; the difference is quoted or not.
q2="apartment" q1=apartment

On Tue, Jul 3, 2012 at 12:06 PM, Lance Norskog goks...@gmail.com wrote: q2=*apartment* q1=*apartment* These are wildcards.

On Mon, Jul 2, 2012 at 8:30 PM, Chamnap Chhorn chamnapchh...@gmail.com wrote: Hi Lance, I didn't use wildcards at all. This is a normal text search only. I need a string field because it needs to be matched exactly, and the value is sometimes multi-word, so quoting is necessary. By the way, if I do a super plain query, it takes at least 600ms and I'm not sure why. On another Solr instance with a similar amount of data, it takes only 50ms. I see something strange in the response; there is always <str name="command">build</str>. What does that mean?

On Tue, Jul 3, 2012 at 10:02 AM, Lance Norskog goks...@gmail.com wrote: Wildcards are slow. Leading wildcards are even slower. Is there some way to search that data differently? If it is a string, can you change it to a text field and make sure 'apartment' is a separate word?

On Mon, Jul 2, 2012 at 10:01 AM, Chamnap Chhorn chamnapchh...@gmail.com wrote: Hi Michael, thanks for the quick response. Based on the documentation, facet.mincount means that Solr will return only facet values with at least that count. For me, I just want to ensure my facet counts don't include zero values. I tried increasing it to 10, but it is still slow, even for the same query. Actually, those 13 million documents are divided into 200 portals. I already include fq=portal_uuid:kjkjkjk inside each nested query, but it's still slow.

On Mon, Jul 2, 2012 at 11:47 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi Chamnap, the first thing that jumped out at me was facet.mincount=1. Are you sure you need this? Increasing this number should drastically improve speed. Michael Della Bitta, Appinions, Inc. -- Where Influence Isn't a Game.
http://www.appinions.com

On Mon, Jul 2, 2012 at 12:35 PM, Chamnap Chhorn chamnapchh...@gmail.com wrote: Hi all, I'm using Solr 3.5 with nested queries on a 4-CPU-core server with 17 GB. The problem is that my query is very slow; the average response time is 12 seconds against 13 million documents. What I am doing is sending the quoted string (q2) to the string fields and the non-quoted string (q1) to the other fields and combining the results:

facet=true&sort=score+desc&q2="apartment"&facet.mincount=1&q1=apartment&tie=0.1&q.alt=*:*&wt=json&version=2.2&rows=20&fl=uuid&facet.query=has_map:+true&facet.query=has_image:+true&facet.query=has_website:+true&start=0&q=_query_:"{!dismax+qf='.'+fq='..'+v=$q1}"+OR+_query_:"{!dismax+qf='..'+fq='...'+v=$q2}"&facet.field={!ex%3Ddt}sub_category_uuids&facet.field={!ex%3Ddt}location_uuid

I have done solr optimize already,
Re: How to improve this solr query?
A couple of questions: 1) Why are you explicitly telling Solr to sort by score desc - shouldn't it do that for you? Could this be a source of performance problems, since sorting requires loading the field caches? 2) Of the query parameters q1 and q2, which one is actually doing text searching on your index? It looks like q1 is doing the non-string-related stuff; could this be better handled in either the bf or bq section of the edismax config? Looking at the sample, though, I don't understand how q1=apartment would hit non-string fields (but see #3). 3) Are the string fields literally of string type (i.e. no analysis on the field), or are you saying string loosely to mean a text field?

pf == phrase fields == given a multiple-word query, ensure that the specified phrase exists in the specified fields, separated by some slop ("hello my world" may match "hello world" depending on this slop value). qf means that, given a multi-term query, each term must exist in the specified fields (name, description, whatever text fields you want). Best, Amit

On Mon, Jul 2, 2012 at 9:35 AM, Chamnap Chhorn chamnapchh...@gmail.com wrote: Hi all, I'm using Solr 3.5 with nested queries on a 4-CPU-core server with 17 GB. The problem is that my query is very slow; the average response time is 12 seconds against 13 million documents. What I am doing is sending the quoted string (q2) to the string fields and the non-quoted string (q1) to the other fields and combining the results:

facet=true&sort=score+desc&q2="apartment"&facet.mincount=1&q1=apartment&tie=0.1&q.alt=*:*&wt=json&version=2.2&rows=20&fl=uuid&facet.query=has_map:+true&facet.query=has_image:+true&facet.query=has_website:+true&start=0&q=_query_:"{!dismax+qf='.'+fq='..'+v=$q1}"+OR+_query_:"{!dismax+qf='..'+fq='...'+v=$q2}"&facet.field={!ex%3Ddt}sub_category_uuids&facet.field={!ex%3Ddt}location_uuid

I have done solr optimize already, but it's still slow. Any idea how to improve the speed? Have I done anything wrong? -- Chhorn Chamnap http://chamnap.github.com/
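Amit's qf/pf distinction can be sketched as a request. This is a hypothetical example, not taken from the thread: the field names (name, description), boosts, and slop value are all placeholders.

```shell
# Hypothetical edismax request illustrating qf vs pf. qf requires each
# individual query term to match somewhere in the listed fields; pf
# additionally builds an implicit phrase query from the whole user input
# and boosts documents where the terms occur together as a phrase.
QUERY="luxury+apartment"
PARAMS="defType=edismax&q=${QUERY}"
PARAMS="${PARAMS}&qf=name^2+description"     # per-term matching
PARAMS="${PARAMS}&pf=name^10+description^5"  # whole input as a boosted phrase
PARAMS="${PARAMS}&ps=1"                      # phrase slop
echo "http://localhost:8983/solr/select?${PARAMS}"
```

With ps=1, a pf phrase of "hello world" would still match a document containing "hello my world", which is the slop behavior Amit describes.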
Use of Solr as primary store for search engine
Hello all, I am curious to know how people are using Solr in conjunction with other data stores when building search engines to power web sites (say an ecommerce site). The question I have for the group is given an architecture where the primary (transactional) data store is MySQL (Oracle, PostGres whatever) with periodic indexing into Solr, when your front end issues a search query to Solr and returns results, are there any joins with your primary Oracle/MySQL etc to help render results? Basically I guess my question is whether or not you store enough in Solr so that when your front end renders the results page, it never has to hit the database. The other option is that your search engine only returns primary keys that your front end then uses to hit the DB to fetch data to display to your end user. With Solr 4.0 and Solr moving towards the NoSQL direction, I am curious what people are doing and what application architectures with Solr look like. Thanks! Amit
Re: Use of Solr as primary store for search engine
Amit, not exactly a response to your question, but doing this with a Lucene index on i2geo.net has resulted in a considerable performance boost (reading from stored fields instead of reading from the xwiki objects, which pull from the SQL database). However, it implied that we had to rewrite everything necessary for the rendering, hence the rendering has not been able to re-use much code. Paul

On 4 July 2012, at 09:54, Amit Nithian wrote: Hello all, I am curious to know how people are using Solr in conjunction with other data stores when building search engines to power web sites (say an ecommerce site). The question I have for the group is given an architecture where the primary (transactional) data store is MySQL (Oracle, PostGres whatever) with periodic indexing into Solr, when your front end issues a search query to Solr and returns results, are there any joins with your primary Oracle/MySQL etc to help render results? Basically I guess my question is whether or not you store enough in Solr so that when your front end renders the results page, it never has to hit the database. The other option is that your search engine only returns primary keys that your front end then uses to hit the DB to fetch data to display to your end user. With Solr 4.0 and Solr moving towards the NoSQL direction, I am curious what people are doing and what application architectures with Solr look like. Thanks! Amit
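The "store enough in Solr" option Paul describes comes down to marking every field the results page renders as stored. A hedged schema.xml sketch - all field names here are invented for illustration:

```xml
<!-- Hypothetical schema.xml fragment: everything the results page needs
     is stored (stored="true"), so rendering never touches the primary
     database. Display-only fields can skip indexing entirely. -->
<field name="id"            type="string" indexed="true"  stored="true" required="true"/>
<field name="title"         type="text"   indexed="true"  stored="true"/>
<field name="price"         type="float"  indexed="false" stored="true"/>
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>
```

The trade-off, as Paul notes, is that render logic written against the database has to be redone against these stored fields, and index size grows with everything you store.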
Re: Something like 'bf' or 'bq' with MoreLikeThis
Thanks a lot, Amit! Please bear with me, I am a new Solr dev - could you please shed some light on how to use a patch? Pointing me to a wiki/doc is fine too. Thanks a lot! :) -- View this message in context: http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060p3992935.html Sent from the Solr - User mailing list archive at Nabble.com.
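Since the question is how to use a patch at all: JIRA issues ship patches as unified diffs, applied from the root of a source checkout with the standard patch tool. The sketch below fabricates a throwaway file and diff so the commands are self-contained; against Solr you would instead download the real .patch attachment from the JIRA issue into your checkout root and run the same two patch commands there, then rebuild (e.g. with ant).

```shell
# Demonstration of the patch workflow using a fabricated one-line diff.
cd "$(mktemp -d)"
printf 'foo\n' > Example.java
printf 'foo\nbar\n' > Example.java.fixed
diff -u Example.java Example.java.fixed > demo.patch || true  # diff exits 1 when files differ
rm Example.java.fixed

patch -p0 --dry-run < demo.patch   # check the patch applies cleanly first
patch -p0 < demo.patch             # actually apply it
grep bar Example.java              # the change is now in the source file
```

The -p0 flag means the file paths inside the patch are taken relative to the current directory, which is the usual convention for patches generated at the top of a checkout.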
How to change tmp directory
Hello all, I came across an odd issue today when I wanted to add ca. 7M documents to my Solr index: I got a SolrServerException telling me "No space left on device". I had a look at the directory Solr (and its index) is installed in, and there is plenty of space (~300GB). I then noticed that a file named upload_457ee97b_1385125274b__8000_0005.tmp had taken up all the space in the machine's /tmp directory. The partition holding the /tmp directory only has around 1GB of space, and this file already took nearly 800MB. I had a look at it and realized that the file contained the data I was adding to Solr in XML format. Is there a possibility to change the temporary directory for this action? I use an Iterator<SolrInputDocument> with the HttpSolrServer's add(Iterator<SolrInputDocument>) method for performance, so I can't just do commits from time to time. Best regards, Erik
Solr: MLT filter by a field in matched doc
MoreLikeThis can return the matched doc. My question is: can I somehow pass in a query param to indicate that I would like to filter on a field value of the matched doc? Is this doable? Or, if not, what's the workaround? Thanks a lot! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MLT-filter-by-a-field-in-matched-doc-tp3992945.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Similarity of numbers in MoreLikeThisHandler
Very well explained. However, you don't know the number (integer/float) field value of a matched document in advance. So even supposing the similarity field is constructed, how do you use it in the query? -- View this message in context: http://lucene.472066.n3.nabble.com/Similarity-of-numbers-in-MoreLikeThisHandler-tp486350p3992949.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Synonyms and hyphens
Hi, does anybody know why the hyphen '-' and q.op=AND cause such a big difference between the two queries? I thought hyphens are removed by the StandardTokenizer, which means that theoretically the two queries should be the same! Thanks

On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.com wrote: Hi, I'm not sure if anybody has experienced this behavior before or not. I noticed that the 'hyphen' plays a very important role here. I used Solr's default example directory.

http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&wt=json&q.op=AND

results in parsedquery: +name:gb +name:gib +name:gigabyte +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes, while searching

http://localhost:8984/solr/select/?q=name:(gb mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&wt=json&q.op=AND

results in parsedquery: +(name:gb name:gib name:gigabyte name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes). If you look at the first query - with the hyphen - you can see that the result of parsing is totally different. I know that hyphens are special characters in Solr, but there's no way the first query returns any entry, because it's asking for ALL the synonyms. Am I missing something here? Thanks -- Alireza Salimi, Java EE Developer
Re: how Solr/Lucene can support standard join operation
FYI, if denormalization doesn't work for you, check index-time join: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html. Here are the issues tracking query-time and index-time support: https://issues.apache.org/jira/browse/SOLR-3076 https://issues.apache.org/jira/browse/SOLR-3535

On Wed, Jun 27, 2012 at 3:47 PM, Lee Carroll lee.a.carr...@googlemail.com wrote: Sorry, you have that link! And I did not see the question - apols. The index schema could look something like:

id
name
classList - multi-valued
majorClassList - multi-valued

A standard query would do the equivalent of your SQL. Again apols for not seeing the link. lee c

On 27 June 2012 12:37, Lee Carroll lee.a.carr...@googlemail.com wrote: In your example, de-normalising would be fine in a vast number of use-cases; multi-valued fields are fine. If you really want to, see http://wiki.apache.org/solr/Join but make sure you lose the default relational DBA world view first, and only go down that route if you need to.

On 27 June 2012 12:27, Robert Yu robert...@morningstar.com wrote: The join operation supported as described at http://wiki.apache.org/solr/Join is quite limited. I'm thinking about how to support a standard join operation in Solr/Lucene, because not everything can be de-normalized efficiently. Take these 2 schemas as an example:

(1) student: sid, name, cid (class id)
(2) class: cid, name, major

In SQL, it is easy to get all students' names and their class names where the student's name starts with 'p' and the class's major is CS:

select s.name, c.name from student s, class c where s.cid = c.cid and s.name like 'p%' and c.major = 'CS';

How do Solr/Lucene support the above query? It seems they do not. Thanks, Robert Yu, Application Service - Backend, Morningstar Shenzhen Ltd. Morningstar. Illuminating investing worldwide.
+86 755 3311-0223 voice +86 137-2377-0925 mobile +86 755 - fax robert...@morningstar.com 8FL, Tower A, Donghai International Center (or East Pacific International Center), 7888 Shennan Road, Futian district, Shenzhen, Guangdong province, China 518040 http://cn.morningstar.com/ This e-mail contains privileged and confidential information and is intended only for the use of the person(s) named above. Any dissemination, distribution, or duplication of this communication without prior written consent from Morningstar is strictly prohibited. If you have received this message in error, please contact the sender immediately and delete the materials from any computer. -- Sincerely yours, Mikhail Khludnev, Tech Lead, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
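For Robert's student/class example, the two usual Solr answers can be sketched as query fragments. These are illustrative only: class_major is an invented field name for the denormalized case, and the {!join} syntax is the query-time join from the wiki page Lee linked, available from Solr 4 (SOLR-2272).

```
# Denormalized: one Solr document per student with the class data
# copied in, so the SQL collapses to a single query:
q=name:p* AND class_major:CS

# Query-time join: select class documents matching major:CS, join them
# to student documents on cid, then filter the students by name:
q={!join from=cid to=cid}major:CS&fq=name:p*
```

The denormalized form is what Lee is recommending; the join form trades query-time cost for avoiding the data duplication.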
Re: Solr 3.6 issue - DataImportHandler with CachedSqlEntityProcessor not importing all multi-valued fields
It's hard to troubleshoot without debug logs. Please note that the regular configuration for CachedSqlEntityProcessor is slightly different - see http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor and its where="xid=x.id" attribute.

On Wed, Jun 27, 2012 at 2:29 AM, ps_sra praveens1...@yahoo.com wrote: Not sure if this is the right forum to post this question. If not, please excuse. I'm trying to use the DataImportHandler with processor=CachedSqlEntityProcessor to speed up imports from an RDBMS. While processor=CachedSqlEntityProcessor is much faster than processor=SqlEntityProcessor, the resulting Solr index does not contain the multi-valued fields on sub-entities. For example, my db-data-config.xml has the following structure:

<document>
  ..
  <entity name="foo" pk="id" processor="SqlEntityProcessor"
          query="SELECT f.id AS foo_id, f.name AS foo_name FROM foo f">
    <field column="foo_id" name="foo_id"/>
    <field column="foo_name" name="foo_name"/>
    <entity name="bar" processor="CachedSqlEntityProcessor"
            query="SELECT b.name AS bar_name FROM bar b WHERE b.id = '${foo.id}'">
      <field column="bar_name" name="bar_name"/>
    </entity>
  </entity>
  ..
</document>

where the database relationship foo:bar is 1:m. The issue is that when I import with processor=SqlEntityProcessor, everything works fine and the multi-valued field bar_name has multiple values, while importing with processor=CachedSqlEntityProcessor does not even create the bar_name field in the index. I've deployed Solr 3.6 on Weblogic 11g, with the patch https://issues.apache.org/jira/browse/SOLR-3360 applied. Any help on this issue is appreciated. Thanks, ps -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-issue-DataImportHandler-with-CachedSqlEntityProcessor-not-importing-all-multi-valued-fields-tp3991449.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours, Mikhail Khludnev, Tech Lead, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
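Following Mikhail's pointer, a hedged sketch of what the cached variant of the config above would look like. With CachedSqlEntityProcessor the child query runs once, with no per-row ${foo.id} interpolation, and the where attribute names the cache-key column on the left and the parent entity's value on the right; column aliases here follow the example above.

```xml
<!-- Hypothetical cached configuration per the wiki page linked above. -->
<entity name="foo" pk="id" processor="SqlEntityProcessor"
        query="SELECT f.id AS foo_id, f.name AS foo_name FROM foo f">
  <field column="foo_id" name="foo_id"/>
  <field column="foo_name" name="foo_name"/>
  <!-- Child query has no WHERE on the parent row; the cache does the
       lookup via where="cacheKeyColumn=parentEntity.column". -->
  <entity name="bar" processor="CachedSqlEntityProcessor"
          query="SELECT b.id AS bar_id, b.name AS bar_name FROM bar b"
          where="bar_id=foo.foo_id">
    <field column="bar_name" name="bar_name"/>
  </entity>
</entity>
```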
Re: Elevation togehter with grouping
Hi, I am facing an identical problem. Does anyone have any pointers on this ? Regards, Tushar -- View this message in context: http://lucene.472066.n3.nabble.com/Elevation-togehter-with-grouping-tp3916981p3992925.html Sent from the Solr - User mailing list archive at Nabble.com.
WordDelimiterFilter removes ampersands
If a user writes a query "Apples & Oranges", the word delimiter filter factory will change this into "Apples Oranges", which isn't very useful for me, as I'd prefer - especially when the phrase is wrapped in quotes - that the original is preserved. However, I still want to be able to separate "Apples&Oranges" into "Apples Oranges", so preserveOriginal isn't really useful. What I really would like to be able to do is tell WordDelimiterFilter to treat & like it's neither alpha nor numeric; however, that doesn't mean that you remove it completely. Thanks for your help, Stephen
Get all matching terms of an OR query
Hi, is there an easy way to get the matches of an OR query? If I'm searching for android OR google OR apple OR iphone OR -ipod, I'd like to know which of these terms document X contains. I've been using debugQuery and tried to extract the info from the explain information, unfortunately this is too slow and I'm having troubles with the stemming of the query. Using the highlight component doesn't work either because my fields aren't stored (would the highlighter work with stemmed texts?) We're using Solr 3.6 in a distributed setting. I'd like to prevent storing the texts because of space issues, but if that's the only reasonable solution... . Thank you, Michael
Re: WordDelimiterFilter removes ampersands
That's a perfectly reasonable request. But WDF doesn't have such a feature. Maybe what is needed is a distinct ampersand filter that runs before WDF and detects ampersands that are likely shorthands for "and" and expands them. It would also need to be able to detect AT&T (capital letter before the &) and not expand it (and you can set up a character type table for WDF that treats & as a letter). A single & could also be expanded to "and" - that could also be done with the synonym filter, but that would not help you with the embedded & of "Apples&Oranges". Maybe a simple character filter that always expands & to " and " would be good enough for a lot of common cases, as a rough approximation. Maybe solr.PatternReplaceCharFilterFactory could be used to accomplish that: match "&" and replace with " and ". -- Jack Krupansky

-Original Message- From: Stephen Lacy Sent: Wednesday, July 04, 2012 8:16 AM To: solr-user@lucene.apache.org Subject: WordDelimiterFilter removes ampersands

If a user writes a query "Apples & Oranges", the word delimiter filter factory will change this into "Apples Oranges", which isn't very useful for me, as I'd prefer - especially when the phrase is wrapped in quotes - that the original is preserved. However, I still want to be able to separate "Apples&Oranges" into "Apples Oranges", so preserveOriginal isn't really useful. What I really would like to be able to do is tell WordDelimiterFilter to treat & like it's neither alpha nor numeric; however, that doesn't mean that you remove it completely. Thanks for your help, Stephen
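Jack's PatternReplaceCharFilterFactory suggestion might look like this in schema.xml. This is a sketch: the surrounding tokenizer/filter chain is illustrative, not from the thread, and the same char filter has to appear in both the index and query analyzers for the terms to line up.

```xml
<!-- Rewrite "&" to " and " before tokenization, so "Apples & Oranges",
     "Apples and Oranges", and "Apples&Oranges" all analyze alike.
     Note the literal ampersand must be XML-escaped as &amp; here. -->
<analyzer>
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="&amp;" replacement=" and "/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</filter></analyzer>
```

As Jack notes, this is a rough approximation: it would also expand the & inside AT&T, which a smarter dedicated filter could avoid.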
Urgent:Partial Search not Working
All, I am using apache-solr-4.0.0-ALPHA and trying to configure partial search on two fields. The value inside the search field ProdSymbl is "M1.6X0.35 9P", and I will have to get results if I search for "M1.6" or "X0.35" (a part of the search value). I have tried using both NGramTokenizerFactory and solr.EdgeNGramFilterFactory in schema.xml:

<!-- bigram -->
<!--
<fieldType name="bigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
-->
<fieldType name="bigram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>

The fields are configured as:

<field name="prodsymbl" type="bigram" indexed="true" stored="true" multiValued="true"/>
<field name="measure1" type="bigram" indexed="true" stored="true" multiValued="true"/>

And the copy fields:

<copyField source="prodsymbl" dest="text"/>
<copyField source="measure1" dest="text"/>

Please let me know if I am missing anything; this is kind of an urgent requirement that needs to be addressed at the earliest. Please help. Thanks in advance, Jay
Boosting the whole documents
Hi, I have the following problem. I would like to give a boost to whole documents as I index them. I am sending to Solr XML in the form:

<add><doc boost="2.0">...</doc></add>

But it doesn't seem to alter the search scores in any way. I would expect that to multiply the final search score by two - am I correct? Probably I would need to alter schema.xml, but I found only information on how to do that for specific fields (just put omitNorms="false" into the field tag). But what should I do if I want to boost the whole document? Note: by boosting a whole document I mean that if document A has search score 10.0 and document B has search score 15.0, and I give document A a boost of 2.0 when I index it, I would expect its search score to be 20.0. Thanks in advance! Michal Danilak
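One thing to check, offered as general Lucene behavior rather than anything stated in this thread: an index-time document boost is multiplied into each field's norm, so it only takes effect on fields whose norms are enabled, and because the norm is encoded in a single byte the boost nudges scores coarsely rather than exactly doubling them.

```xml
<!-- Hypothetical field definition (name invented): norms must be kept
     (omitNorms="false") on the searched fields for <doc boost="2.0"> to
     have any effect; the boost is folded into the per-field norm at
     index time and re-indexing is required after changing this. -->
<field name="title" type="text" indexed="true" stored="true" omitNorms="false"/>
```

If an exact multiplicative effect is what you need, a query-time boost (e.g. a boost function on a stored boost field) behaves more predictably than the lossy index-time norm encoding.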
Solr facet multiple constraint
Hi, I'm trying to do a facet search on a multi-valued field and add a filter query on it, and it doesn't work. Could you please help me find my mistake? Here is my Solr query: facet=true,sort=publishingdate desc,facet.mincount=1,q=service:1 AND publicationstatus:LIVE,facet.field={!ex=dt}user,wt=javabin,fq={!tag=dt}user:10,version=2 Thanks in advance for answers, David. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-facet-multiple-constraint-tp3992974.html Sent from the Solr - User mailing list archive at Nabble.com.
fl Parameter and Wildcards for Dynamic Fields
I'm using SOLR 3.3 and would like to know how to return a list of dynamic fields in my search results using a wildcard with the fl parameter. I found SOLR-2444 https://issues.apache.org/jira/browse/SOLR-2444 but this appears to be for SOLR 4.0. Am I correct in assuming this isn't doable yet? Please note that I don't want to query the dynamic fields, I just need them returned in the search results. Using fl=myDynamicField_* doesn't seem to work. Many Thanks! Josh
Re: leap second bug
An explanation of the cause: https://lkml.org/lkml/2012/7/1/203

On Wed, Jul 4, 2012 at 1:48 AM, Óscar Marín Miró oscarmarinm...@gmail.com wrote: So, this was the solution; sorry to post it so late, just in case it helps anyone:

/etc/init.d/ntp stop; date; date `date +%m%d%H%M%C%y.%S`; date; /etc/init.d/ntp start

And Tomcat magically switched from 100% CPU to 0.5% :) From: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/_I1_OfaL7QY [from Michael McCandless's help on this thread]

On Sun, Jul 1, 2012 at 6:15 PM, Jack Krupansky j...@basetechnology.com wrote: Interesting: The sequence of dates of the UTC second markers will be: 2012 June 30, 23h 59m 59s; 2012 June 30, 23h 59m 60s; 2012 July 1, 0h 0m 0s. See: http://wwp.greenwichmeantime.com/info/leap-second.htm So, there were two consecutive second markers which were literally distinct but numerically identical. What design pattern for timing did Linux violate? In other words, what lesson should we be learning to ensure that we don't have a similar problem at the application level on a future leap second? -- Jack Krupansky

-Original Message- From: Óscar Marín Miró Sent: Sunday, July 01, 2012 11:02 AM To: solr-user@lucene.apache.org Subject: Re: leap second bug

Thanks Michael, nice information :)

On Sun, Jul 1, 2012 at 5:29 PM, Michael McCandless luc...@mikemccandless.com wrote: Looks like this is a low-level Linux issue ...
see Shay's email to the ElasticSearch list about it: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/_I1_OfaL7QY Also see the comments here: http://news.ycombinator.com/item?id=4182642 Mike McCandless http://blog.mikemccandless.com

On Sun, Jul 1, 2012 at 8:08 AM, Óscar Marín Miró oscarmarinm...@gmail.com wrote: Hello Michael, thanks for the note :) I'm having a similar problem since yesterday; the Tomcats are wild on CPU [near 100%]. Did your Solr servers not reply to index/query requests? Thanks :)

On Sun, Jul 1, 2012 at 1:22 PM, Michael Tsadikov mich...@myheritage.com wrote: Our Solr servers went into GC hell and became non-responsive on the date change today. Restarting Tomcats did not help. Rebooting the machine did. http://www.wired.com/wiredenterprise/2012/07/leap-second-bug-wreaks-havoc-with-java-linux/

-- Whether it's science, technology, personal experience, true love, astrology, or gut feelings, each of us has confidence in something that we will never fully comprehend. --Roy H. William
Re: fl Parameter and Wildcards for Dynamic Fields
This appears to be the case. * is the only wildcard supported by fl before 4.0. -- Jack Krupansky -Original Message- From: Josh Harness Sent: Wednesday, July 04, 2012 9:08 AM To: solr-user@lucene.apache.org Subject: fl Parameter and Wildcards for Dynamic Fields I'm using SOLR 3.3 and would like to know how to return a list of dynamic fields in my search results using a wildcard with the fl parameter. I found SOLR-2444 https://issues.apache.org/jira/browse/SOLR-2444 but this appears to be for SOLR 4.0. Am I correct in assuming this isn't doable yet? Please note that I don't want to query the dynamic fields, I just need them returned in the search results. Using fl=myDynamicField_* doesn't seem to work. Many Thanks! Josh
Re: Get all matching terms of an OR query
First, OR -ipod needs to be written as OR (*:* -ipod) due to an ongoing deficiency in Lucene query parsing, but I wonder what you really think you are OR'ing in that clause - all documents that don't contain ipod? That seems odd. Maybe you really want to constrain the preceding query to exclude ipod? That would be: (android OR google OR apple OR iphone) -ipod -- Jack Krupansky -Original Message- From: Michael Jakl Sent: Wednesday, July 04, 2012 8:29 AM To: solr-user@lucene.apache.org Subject: Get all matching terms of an OR query Hi, is there an easy way to get the matches of an OR query? If I'm searching for android OR google OR apple OR iphone OR -ipod, I'd like to know which of these terms document X contains. I've been using debugQuery and tried to extract the info from the explain information, unfortunately this is too slow and I'm having troubles with the stemming of the query. Using the highlight component doesn't work either because my fields aren't stored (would the highlighter work with stemmed texts?) We're using Solr 3.6 in a distributed setting. I'd like to prevent storing the texts because of space issues, but if that's the only reasonable solution... . Thank you, Michael
Javadocs issue on Solr web site
Currently all Javadoc links seem to wind up pointing at the api-4_0_0-ALPHA versions - is that expected? E.g. do a Google search on StreamingUpdateSolrServer. First hit is for StreamingUpdateSolrServer (Solr 3.6.0 API) Follow that link, and you get a 404 for page http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html -- Ken -- Ken Krugler http://www.scaleunlimited.com custom big data solutions training Hadoop, Cascading, Mahout Solr
Re: Get all matching terms of an OR query
Hi! On 4 July 2012 17:01, Jack Krupansky j...@basetechnology.com wrote: First, OR -ipod needs to be written as OR (*:* -ipod) due to an ongoing deficiency in Lucene query parsing, but I wonder what you really think you are OR'ing in that clause - all documents that don't contain ipod? That seems odd. Maybe you really want to constrain the preceding query to exclude ipod? That would be: (android OR google OR apple OR iphone) -ipod Thanks, the example was ill-chosen, the -ipod part shouldn't be there. After some more tests and research, using the debugQuery method seems the only viable solution(?) Cheers, Michael
Re: Get all matching terms of an OR query
You could always do a custom search component, but all the same information (which terms matched) is in the debugQuery output. For example, queryWeight(text:the) indicates that "the" appears in the document. What exactly is it that is too slow? Yes, you do have to accept that explain uses analyzed terms. I would note that you could try to correlate the parsedquery with the original query, since the parsed query will contain the stemmed terms. It would be nice to have an optional search component or query parser option that returned the analyzed term for each query term. But as things stand, I would suggest that you do your own fuzzy match between the debugQuery terms and your source terms. That may not be 100% accurate, but it would probably cover most cases. -- Jack Krupansky

-Original Message- From: Michael Jakl Sent: Wednesday, July 04, 2012 10:09 AM To: solr-user@lucene.apache.org Subject: Re: Get all matching terms of an OR query

Hi! On 4 July 2012 17:01, Jack Krupansky j...@basetechnology.com wrote: First, "OR -ipod" needs to be written as "OR (*:* -ipod)" due to an ongoing deficiency in Lucene query parsing, but I wonder what you really think you are OR'ing in that clause - all documents that don't contain ipod? That seems odd. Maybe you really want to constrain the preceding query to exclude ipod? That would be: (android OR google OR apple OR iphone) -ipod

Thanks, the example was ill-chosen; the -ipod part shouldn't be there. After some more tests and research, using the debugQuery method seems the only viable solution(?) Cheers, Michael
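One crude alternative to parsing explain output - an assumption of mine, not something suggested in the thread - is to probe each OR'ed term separately against an already-matched document: one rows=0 query per term, restricted to the document's id, where a numFound of 1 means that (analyzed) term occurs in the document. Only the URL construction is sketched here; the field names and id are placeholders.

```shell
# Build one cheap probe query per term. Each probe ANDs the term with the
# document's id, so numFound in the response is 1 iff the term matched.
DOC_ID=42
for TERM in android google apple iphone; do
  echo "http://localhost:8983/solr/select?q=id:${DOC_ID}+AND+text:${TERM}&rows=0"
done
```

This costs one request per term per document, so it only makes sense for a handful of terms and documents; at larger scale, storing the fields (or a custom search component) is the cleaner path.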
Re: WordDelimiterFilter removes ampersands
solr.PatternReplaceCharFilterFactory is a brilliant idea, thanks so much :)

On Wed, Jul 4, 2012 at 2:46 PM, Jack Krupansky j...@basetechnology.com wrote: That's a perfectly reasonable request. But WDF doesn't have such a feature. Maybe what is needed is a distinct ampersand filter that runs before WDF and detects ampersands that are likely shorthands for "and" and expands them. It would also need to be able to detect AT&T (capital letter before the &) and not expand it (and you can set up a character type table for WDF that treats & as a letter). A single & could also be expanded to "and" - that could also be done with the synonym filter, but that would not help you with the embedded & of "Apples&Oranges". Maybe a simple character filter that always expands & to " and " would be good enough for a lot of common cases, as a rough approximation. Maybe solr.PatternReplaceCharFilterFactory could be used to accomplish that: match "&" and replace with " and ". -- Jack Krupansky

-Original Message- From: Stephen Lacy Sent: Wednesday, July 04, 2012 8:16 AM To: solr-user@lucene.apache.org Subject: WordDelimiterFilter removes ampersands

If a user writes a query "Apples & Oranges", the word delimiter filter factory will change this into "Apples Oranges", which isn't very useful for me, as I'd prefer - especially when the phrase is wrapped in quotes - that the original is preserved. However, I still want to be able to separate "Apples&Oranges" into "Apples Oranges", so preserveOriginal isn't really useful. What I really would like to be able to do is tell WordDelimiterFilter to treat & like it's neither alpha nor numeric; however, that doesn't mean that you remove it completely. Thanks for your help, Stephen
Boosting the score of the whole documents
Hi guys, I have the following problem. I would like to give a boost to whole documents as I index them. I am sending to solr XML in the form: <add><doc boost="2.0"> ... </doc></add> But it doesn't seem to alter the search scores in any way. I would expect that to multiply the final search score by two, am I correct? Probably I would need to alter schema.xml, but I found only information on how to do that for specific fields (just put omitNorms=false into the field tag). But what should I do if I want to boost the whole document? Note: by boosting a whole document I mean that if document A has search score 10.0 and document B has search score 15.0 and I give document A the boost 2.0 when I index it, I would expect its search score to be 20.0. Thanks in advance!
Re: How to change tmp directory
Solr is probably simply using Java's temp directory, which you can redefine by setting the java.io.tmpdir system property on the java command line or using a system-specific environment variable. -- Jack Krupansky -Original Message- From: Erik Fäßler Sent: Wednesday, July 04, 2012 3:56 AM To: solr-user@lucene.apache.org Subject: How to change tmp directory Hello all, I came across an odd issue today when I wanted to add ca. 7M documents to my Solr index: I got a SolrServerException telling me "No space left on device". I had a look at the directory Solr (and its index) is installed in and there is plenty of space (~300GB). I then noticed a file named upload_457ee97b_1385125274b__8000_0005.tmp had taken up all the space of the machine's /tmp directory. The partition holding the /tmp directory only has around 1GB of space and this file already took nearly 800MB. I had a look at it and realized that the file contained the data I was adding to Solr in an XML format. Is there a possibility to change the temporary directory for this action? I use an Iterator<SolrInputDocument> with the HttpSolrServer's add(Iterator<SolrInputDocument>) method for performance, so I can't just do commits from time to time. Best regards, Erik
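Concretely, when launching the Solr example with Jetty's start.jar, the property can be set on the java command line (the path here is a placeholder for any partition with enough free space):

```shell
# Redirect the JVM's temp files away from a small /tmp partition
java -Djava.io.tmpdir=/path/to/big/tmp -jar start.jar
```

Whatever directory you point at must already exist and be writable by the user running Solr; the JVM will not create it for you.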
Re: difference between stored=false and stored=true ?
1. The seemingly useless combination of stored=false and indexed=false is actually useful: it lets you ignore fields. You might have input data which has fields that you have decided to ignore. 2. Stored fields take up memory for documents (fields) to be returned for search results in the Solr query response, so fewer stored fields is better for performance and memory usage. -- Jack Krupansky -Original Message- From: Amit Nithian Sent: Wednesday, July 04, 2012 12:54 AM To: solr-user@lucene.apache.org Subject: Re: difference between stored=false and stored=true ? So a couple of questions on this (comment first then question): 1) I guess you can't have four combinations b/c indexed=false/stored=false has no meaning? 2) If you set fewer fields stored=true, does this reduce the memory footprint for the document cache? Or better yet, can I store more documents in the cache, possibly increasing my cache efficiency? I read about the lazy loading of fields, which seems like a good way to maximize the cache and gain the advantage of storing data in Solr too. Thanks Amit On Sat, Jun 30, 2012 at 11:01 AM, Giovanni Gherdovich g.gherdov...@gmail.com wrote: Thank you François and Jack for those explanations. Cheers, GGhh 2012/6/30 François Schiettecatte: Giovanni stored=true means the data is stored in the index and [...] 2012/6/30 Jack Krupansky: indexed and stored are independent [...]
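Both points can be sketched as schema.xml fragments (field and type names here are illustrative, in the spirit of the "ignored" pattern found in the Solr example schema):

```xml
<!-- Point 1: indexed=false stored=false means the field is accepted
     from input documents but is neither searchable nor returned -->
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true"/>
<dynamicField name="*" type="ignored"/>

<!-- Point 2: searchable but not stored, saving space and document-cache
     memory when you never need the raw value back in responses -->
<field name="body" type="text_general" indexed="true" stored="false"/>
```

The dynamicField catch-all is what makes "ignore fields you didn't declare" work: any field name not matched elsewhere falls through to the ignored type.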
Re: Synonyms and hyphens
Terms with embedded special characters are treated as phrases with spaces in place of the special characters. So, gb-mb is treated as if you had enclosed the term in quotes. -- Jack Krupansky -Original Message- From: Alireza Salimi Sent: Wednesday, July 04, 2012 6:50 AM To: solr-user@lucene.apache.org Subject: Re: Synonyms and hyphens Hi, Does anybody know why hyphen '-' and q.op=AND causes such a big difference between the two queries? I thought hyphens are removed by StandardTokenizer, which means theoretically the two queries should be the same! Thanks On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.com wrote: Hi, I'm not sure if anybody has experienced this behavior before or not. I noticed that 'hyphen' plays a very important role here. I used Solr's default example directory. http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND results in parsedquery: +name:gb +name:gib +name:gigabyte +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes, while searching http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND results in parsedquery: +(name:gb name:gib name:gigabyte name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes). If you look at the first query - with hyphens - you can see that the result of parsing is totally different. I know that hyphens are special characters in Solr, but there's no way that the first query returns any entry because it's asking for ALL synonyms. Am I missing something here? Thanks -- Alireza Salimi Java EE Developer -- Alireza Salimi Java EE Developer
Re: Synonyms and hyphens
Wow, I didn't know that. Is there a way to disable this feature? I mean, is it something coming from the Analyzer? On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky j...@basetechnology.com wrote: Terms with embedded special characters are treated as phrases with spaces in place of the special characters. So, gb-mb is treated as if you had enclosed the term in quotes. -- Jack Krupansky -Original Message- From: Alireza Salimi Sent: Wednesday, July 04, 2012 6:50 AM To: solr-user@lucene.apache.org Subject: Re: Synonyms and hyphens Hi, Does anybody know why hyphen '-' and q.op=AND causes such a big difference between the two queries? I thought hyphens are removed by StandardTokenizer, which means theoretically the two queries should be the same! Thanks On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.com wrote: Hi, I'm not sure if anybody has experienced this behavior before or not. I noticed that 'hyphen' plays a very important role here. I used Solr's default example directory. http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND results in parsedquery: +name:gb +name:gib +name:gigabyte +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes, while searching http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND results in parsedquery: +(name:gb name:gib name:gigabyte name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes). If you look at the first query - with hyphens - you can see that the result of parsing is totally different.
I know that hyphens are special characters in Solr, but there's no way that the first query returns any entry because it's asking for ALL synonyms. Am I missing something here? Thanks -- Alireza Salimi Java EE Developer -- Alireza Salimi Java EE Developer -- Alireza Salimi Java EE Developer
Re: Boosting the score of the whole documents
Make sure to review the similarity javadoc page to understand what any of these factors does to the document score. See: http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html Sure, a document boost applies a multiplicative factor, but that is all relative to all of the other factors for that document and query. In other words, all other things being equal, a doc-boost of 2.0 would double the score, but all other things are usually not equal. Try different doc-boost values and see how the score is affected. The document may have such a low score that a boost of 2.0 doesn't move the needle relative to other documents. I believe that the doc-boost is included within the fieldNorm value that is shown in the explain section if you add debugQuery=true to your query request. This is explained under norm in the similarity javadoc. I did try a couple of examples with the Solr 3.6 example, such as doc boosts of 2.0, 0.2 (de-boost), 4.0, and 8.0. In my case, it took a boost of 8.0 to move a document up. -- Jack Krupansky -Original Message- From: Danilak Michal Sent: Wednesday, July 04, 2012 10:57 AM To: solr-user@lucene.apache.org Subject: Boosting the score of the whole documents Hi guys, I have the following problem. I would like to give a boost to whole documents as I index them. I am sending to solr XML in the form: <add><doc boost="2.0"> ... </doc></add> But it doesn't seem to alter the search scores in any way. I would expect that to multiply the final search score by two, am I correct? Probably I would need to alter schema.xml, but I found only information on how to do that for specific fields (just put omitNorms=false into the field tag). But what should I do if I want to boost the whole document? Note: by boosting a whole document I mean that if document A has search score 10.0 and document B has search score 15.0 and I give document A the boost 2.0 when I index it, I would expect its search score to be 20.0. Thanks in advance!
Debugging jetty IllegalStateException errors?
Greetings, I'm wondering if anybody has experienced (and found root cause) for errors like this. We're running Solr 3.6.0 with latest stable Jetty 7 (7.6.4.v20120524). I know this is likely due to a client (or the server) terminating the connection unexpectedly, but we see these fairly frequently and can't determine what the impact is or why they are happening (who is closing early, why?) Any tips/tricks on troubleshooting or what to do to possibly minimize or help prevent these from happening (we are using a fairly old python client to programmatically access this solr instance). ---snip---
17:25:13,250 [qtp581536050-12] WARN jetty.server.Response null - Committed before 500 null
org.eclipse.jetty.io.EofException
    at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:952)
    at org.eclipse.jetty.http.AbstractGenerator.flush(AbstractGenerator.java:438)
    at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:94)
    at org.eclipse.jetty.server.AbstractHttpConnection$Output.flush(AbstractHttpConnection.java:1016)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
    at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1332)
    at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:77)
    at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:247)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1332)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:477)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:348)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:452)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:894)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:948)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:851)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:77)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:620)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:46)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:603)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:538)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:137)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:359)
    at java.nio.channels.SocketChannel.write(SocketChannel.java:360)
    at org.eclipse.jetty.io.nio.ChannelEndPoint.gatheringFlush(ChannelEndPoint.java:371)
    at org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:330)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:330)
    at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:876)
    ... 37 more
17:25:13,250 [qtp581536050-12] WARN jetty.servlet.ServletHandler null - /solr/artists/select
java.lang.IllegalStateException: Committed
    at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1087)
    at
Re: Javadocs issue on Solr web site
: Currently all Javadoc links seem to wind up pointing at the api-4_0_0-ALPHA versions - is that expected? yes. /solr/api has always pointed at the javadocs for the most recent release of solr. All that's changed now is that we host multiple copies of the javadocs (just like Lucene-Core has for a long time) and the canonical URLs make it clear which version you are looking at. there's an open Jira to make a landing page listing all the versions that i'm going to try to get to later today, but you can still find the 3.6 javadocs here... http://lucene.apache.org/solr/api-3_6_0/ : E.g. do a Google search on StreamingUpdateSolrServer. First hit is for StreamingUpdateSolrServer (Solr 3.6.0 API) : : Follow that link, and you get a 404 for page : http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html that's to be expected: 1) google hasn't recrawled yet so it doesn't know about the new versions in general 2) that class was removed in 4.0 -Hoss
Re: Synonyms and hyphens
There is one other detail that should clarify the situation. At query time, the query parser itself is breaking your query into space-delimited terms, and only calling the analyzer for each of those terms, each of which will be treated as if a quoted phrase. So it doesn't matter whether it is the standard analyzer or word delimiter filter or other filter that is breaking up the compound term. And the default query operator only applies to the terms as the query parser parsed them, not for the sub-terms of a compound term like CD-ROM or gb-mb. -- Jack Krupansky -Original Message- From: Alireza Salimi Sent: Wednesday, July 04, 2012 12:05 PM To: solr-user@lucene.apache.org Subject: Re: Synonyms and hyphens Wow, I didn't know that. Is there a way to disable this feature? I mean, is it something coming from the Analyzer? On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky j...@basetechnology.com wrote: Terms with embedded special characters are treated as phrases with spaces in place of the special characters. So, gb-mb is treated as if you had enclosed the term in quotes. -- Jack Krupansky -Original Message- From: Alireza Salimi Sent: Wednesday, July 04, 2012 6:50 AM To: solr-user@lucene.apache.org Subject: Re: Synonyms and hyphens Hi, Does anybody know why hyphen '-' and q.op=AND causes such a big difference between the two queries? I thought hyphens are removed by StandardTokenizer which means theoretically the two queries should be the same! Thanks On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.com wrote: Hi, I'm not sure if anybody has experienced this behavior before or not. I noticed that 'hyphen' plays a very important role here. I used Solr's default example directory.
http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND results in parsedquery: +name:gb +name:gib +name:gigabyte +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes, while searching http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND results in parsedquery: +(name:gb name:gib name:gigabyte name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes). If you look at the first query - with hyphens - you can see that the result of parsing is totally different. I know that hyphens are special characters in Solr, but there's no way that the first query returns any entry because it's asking for ALL synonyms. Am I missing something here? Thanks -- Alireza Salimi Java EE Developer -- Alireza Salimi Java EE Developer -- Alireza Salimi Java EE Developer
Re: Synonyms and hyphens
ok, so how can I prevent this behavior to happen? As you can see the parsed query is very different in these two cases. On Wed, Jul 4, 2012 at 1:37 PM, Jack Krupansky j...@basetechnology.com wrote: There is one other detail that should clarify the situation. At query time, the query parser itself is breaking your query into space-delimited terms, and only calling the analyzer for each of those terms, each of which will be treated as if a quoted phrase. So it doesn't matter whether it is the standard analyzer or word delimiter filter or other filter that is breaking up the compound term. And the default query operator only applies to the terms as the query parser parsed them, not for the sub-terms of a compound term like CD-ROM or gb-mb. -- Jack Krupansky -Original Message- From: Alireza Salimi Sent: Wednesday, July 04, 2012 12:05 PM To: solr-user@lucene.apache.org Subject: Re: Synonyms and hyphens Wow, I didn't know that. Is there a way to disable this feature? I mean, is it something coming from the Analyzer? On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky j...@basetechnology.com wrote: Terms with embedded special characters are treated as phrases with spaces in place of the special characters. So, gb-mb is treated as if you had enclosed the term in quotes. -- Jack Krupansky -Original Message- From: Alireza Salimi Sent: Wednesday, July 04, 2012 6:50 AM To: solr-user@lucene.apache.org Subject: Re: Synonyms and hyphens Hi, Does anybody know why hyphen '-' and q.op=AND causes such a big difference between the two queries? I thought hyphens are removed by StandardTokenizer which means theoretically the two queries should be the same! Thanks On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.com wrote: Hi, I'm not sure if anybody has experienced this behavior before or not. I noticed that 'hyphen' plays a very important role here. I used Solr's default example directory.
http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND results in parsedquery: +name:gb +name:gib +name:gigabyte +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes, while searching http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND results in parsedquery: +(name:gb name:gib name:gigabyte name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes). If you look at the first query - with hyphens - you can see that the result of parsing is totally different. I know that hyphens are special characters in Solr, but there's no way that the first query returns any entry because it's asking for ALL synonyms. Am I missing something here? Thanks -- Alireza Salimi Java EE Developer -- Alireza Salimi Java EE Developer -- Alireza Salimi Java EE Developer -- Alireza Salimi Java EE Developer
Re: Boosting the score of the whole documents
Should any modification be made to the schema.xml file? For example, to enable field boosts, one has to set omitNorms to false. Is there some similar setting for document boosts? On Wed, Jul 4, 2012 at 7:29 PM, Jack Krupansky j...@basetechnology.com wrote: Make sure to review the similarity javadoc page to understand what any of these factors does to the document score. See: http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html Sure, a document boost applies a multiplicative factor, but that is all relative to all of the other factors for that document and query. In other words, all other things being equal, a doc-boost of 2.0 would double the score, but all other things are usually not equal. Try different doc-boost values and see how the score is affected. The document may have such a low score that a boost of 2.0 doesn't move the needle relative to other documents. I believe that the doc-boost is included within the fieldNorm value that is shown in the explain section if you add debugQuery=true to your query request. This is explained under norm in the similarity javadoc. I did try a couple of examples with the Solr 3.6 example, such as doc boosts of 2.0, 0.2 (de-boost), 4.0, and 8.0. In my case, it took a boost of 8.0 to move a document up. -- Jack Krupansky -Original Message- From: Danilak Michal Sent: Wednesday, July 04, 2012 10:57 AM To: solr-user@lucene.apache.org Subject: Boosting the score of the whole documents Hi guys, I have the following problem. I would like to give a boost to whole documents as I index them. I am sending to solr XML in the form: <add><doc boost="2.0"> ... </doc></add> But it doesn't seem to alter the search scores in any way. I would expect that to multiply the final search score by two, am I correct?
Probably I would need to alter schema.xml, but I found only information on how to do that for specific fields (just put omitNorms=false into the field tag). But what should I do, if I want to boost the whole document? Note: by boosting a whole document I mean, that if document A has search score 10.0 and document B has search score 15.0 and I give document A the boost 2.0, when I index it, I would expect its search score to be 20.0. Thanks in advance!
Re: Urgent:Partial Search not Working
Could anyone please reply with the solution to this? On Wed, Jul 4, 2012 at 7:18 PM, jayakeerthi s mail2keer...@gmail.com wrote: All, I am using apache-solr-4.0.0-ALPHA and trying to configure partial search on two fields. The value inside the search field ProdSymbl is M1.6X0.35 9P, and I will have to get results if I search for M1.6 or X0.35 (part of the search value). I have tried using both NGramTokenizerFactory and solr.EdgeNGramFilterFactory in the schema.xml:

<!-- bigram -->
<!--
<fieldType name="bigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
-->
<fieldType name="bigram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>

The fields I have configured:

<field name="prodsymbl" type="bigraml" indexed="true" stored="true" multiValued="true"/>
<field name="measure1" type="bigram" indexed="true" stored="true" multiValued="true"/>

Copy fields:

<copyField source="prodsymbl" dest="text"/>
<copyField source="measure1" dest="text"/>

Please let me know if I am missing anything; this is kind of an urgent requirement that needs to be addressed at the earliest. Please help. Thanks in advance, Jay
Re: Something like 'bf' or 'bq' with MoreLikeThis
No worries! What version of Solr are you using? One that you downloaded as a tarball or one that you checked out from SVN (trunk)? I'll take a bit of time and document steps and respond. I'll review the patch to see that it fits a general case. Question for you with MLT, are your users doing a blank search (no text) for something or are you returning results More Like results that were generated as a result of a user typing some text query. I may have built this patch assuming a blank query but I can make it work (or try to) make it work for text based queries. Thanks Amit On Wed, Jul 4, 2012 at 1:37 AM, nanshi nanshi.e...@gmail.com wrote: Thanks a lot, Amit! Please bear with me, I am a new Solr dev, could you please shed me some light on how to use a patch? point me to a wiki/doc is fine too. Thanks a lot! :) -- View this message in context: http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060p3992935.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Urgent:Partial Search not Working
You need to apply the edge n-gram filter only at index time, not at query time. So, you need to specify two analyzers for these field types, an index and a query analyzer. They should be roughly the same, but the query analyzer would not have the edge n-gram filter, since you are accepting the single n-gram given by the user and then matching it against the full list of n-grams that are in the index. It is unfortunate that the wiki example is misleading. Just as bad, we don't have an example in the example schema. Basically, take a text field type that you like from the Solr example schema and then add the edge n-gram filter to its index analyzer, probably as the last token filter. I would note that the edge n-gram filter will interact with the stemming filter, but there is not much you can do other than try different stemmers and experiment with whether stemming should be before or after the edge n-gram filter. I suspect that having stemming after edge n-gram may be better. -- Jack Krupansky -Original Message- From: jayakeerthi s Sent: Wednesday, July 04, 2012 1:41 PM To: solr-user@lucene.apache.org ; solr-user-h...@lucene.apache.org Subject: Re: Urgent:Partial Search not Working Could anyone please reply with the solution to this? On Wed, Jul 4, 2012 at 7:18 PM, jayakeerthi s mail2keer...@gmail.com wrote: All, I am using apache-solr-4.0.0-ALPHA and trying to configure partial search on two fields. The value inside the search field ProdSymbl is M1.6X0.35 9P, and I will have to get results if I search for M1.6 or X0.35 (part of the search value).
I have tried using both NGramTokenizerFactory and solr.EdgeNGramFilterFactory in the schema.xml:

<!-- bigram -->
<!--
<fieldType name="bigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
-->
<fieldType name="bigram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>

The fields I have configured:

<field name="prodsymbl" type="bigraml" indexed="true" stored="true" multiValued="true"/>
<field name="measure1" type="bigram" indexed="true" stored="true" multiValued="true"/>

Copy fields:

<copyField source="prodsymbl" dest="text"/>
<copyField source="measure1" dest="text"/>

Please let me know if I am missing anything; this is kind of an urgent requirement that needs to be addressed at the earliest. Please help. Thanks in advance, Jay
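Jack's advice above - edge n-grams in the index analyzer only - might look roughly like this in schema.xml (the field type name and gram sizes are illustrative, not from the thread):

```xml
<fieldType name="text_edgegram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- generate prefixes only when indexing, as the last token filter -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- no n-gram filter: the user's partial term is matched as-is
         against the n-grams stored in the index -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, indexing "M1.6X0.35" produces prefix grams such as "m1", "m1." and so on, while a query for "m1.6" is left as a single term that can match one of those grams directly.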
Re: Use of Solr as primary store for search engine
Paul, Thanks for your response! Were you using the SQL database as an object store to pull XWiki objects or did you have to execute several queries to reconstruct these objects? I don't know much about them sorry.. Also for those responding, can you provide a few basic metrics for me? 1) Number of nodes receiving queries 2) Approximate queries per second 3) Approximate latency per query I know some of this may be sensitive depending on where you work so reasonable ranges would be nice (i.e. sub-second isn't hugely helpful since 50,100,200 ms have huge impacts depending on your site). Thanks again! Amit On Wed, Jul 4, 2012 at 1:09 AM, Paul Libbrecht p...@hoplahup.net wrote: Amit, not exactly a response to your question but doing this with a lucene index on i2geo.net has resulted in considerably performance boost (reading from stored-fields instead of reading from the xwiki objects which pull from the SQL database). However, it implied that we had to rewrite anything necessary for the rendering, hence the rendering has not re-used that many code. Paul Le 4 juil. 2012 à 09:54, Amit Nithian a écrit : Hello all, I am curious to know how people are using Solr in conjunction with other data stores when building search engines to power web sites (say an ecommerce site). The question I have for the group is given an architecture where the primary (transactional) data store is MySQL (Oracle, PostGres whatever) with periodic indexing into Solr, when your front end issues a search query to Solr and returns results, are there any joins with your primary Oracle/MySQL etc to help render results? Basically I guess my question is whether or not you store enough in Solr so that when your front end renders the results page, it never has to hit the database. The other option is that your search engine only returns primary keys that your front end then uses to hit the DB to fetch data to display to your end user. 
With Solr 4.0 and Solr moving towards the NoSQL direction, I am curious what people are doing and what application architectures with Solr look like. Thanks! Amit
Re: Synonyms and hyphens
You could pre-process your queries to convert hyphen and other special characters to spaces. -- Jack Krupansky -Original Message- From: Alireza Salimi Sent: Wednesday, July 04, 2012 12:56 PM To: solr-user@lucene.apache.org Subject: Re: Synonyms and hyphens ok, so how can I prevent this behavior to happen? As you can see the parsed query is very different in these two cases. On Wed, Jul 4, 2012 at 1:37 PM, Jack Krupansky j...@basetechnology.com wrote: There is one other detail that should clarify the situation. At query time, the query parser itself is breaking your query into space-delimited terms, and only calling the analyzer for each of those terms, each of which will be treated as if a quoted phrase. So it doesn't matter whether it is the standard analyzer or word delimiter filter or other filter that is breaking up the compound term. And the default query operator only applies to the terms as the query parser parsed them, not for the sub-terms of a compound term like CD-ROM or gb-mb. -- Jack Krupansky -Original Message- From: Alireza Salimi Sent: Wednesday, July 04, 2012 12:05 PM To: solr-user@lucene.apache.org Subject: Re: Synonyms and hyphens Wow, I didn't know that. Is there a way to disable this feature? I mean, is it something coming from the Analyzer? On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky j...@basetechnology.com wrote: Terms with embedded special characters are treated as phrases with spaces in place of the special characters. So, gb-mb is treated as if you had enclosed the term in quotes. -- Jack Krupansky -Original Message- From: Alireza Salimi Sent: Wednesday, July 04, 2012 6:50 AM To: solr-user@lucene.apache.org Subject: Re: Synonyms and hyphens Hi, Does anybody know why hyphen '-' and q.op=AND causes such a big difference between the two queries? I thought hyphens are removed by StandardTokenizer which means theoretically the two queries should be the same!
Thanks On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.com wrote: Hi, I'm not sure if anybody has experienced this behavior before or not. I noticed that 'hyphen' plays a very important role here. I used Solr's default example directory. http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&wt=json&q.op=AND results in parsedquery: +name:gb +name:gib +name:gigabyte +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes, while searching http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&wt=json&q.op=AND results in parsedquery: +(name:gb name:gib name:gigabyte name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes). If you look at the first query - with the hyphen - you can see that the result of parsing is totally different. I know that hyphens are special characters in Solr, but there's no way that the first query returns any entry, because it's asking for ALL synonyms. Am I missing something here? Thanks -- Alireza Salimi Java EE Developer
Re: Use of Solr as primary store for search engine
On 4 Jul 2012, at 21:17, Amit Nithian wrote: Thanks for your response! Were you using the SQL database as an object store to pull XWiki objects, or did you have to execute several queries to reconstruct these objects? The first. It's all fairly transparent. There are XWiki classes and XWiki objects which are rendered; they live as a composite of the XWiki Java objects which are Hibernate-persisted. I don't know much about them, sorry. Also, for those responding, can you provide a few basic metrics for me? 1) Number of nodes receiving queries 2) Approximate queries per second 3) Approximate latency per query I admire those that have this at hand. I know some of this may be sensitive depending on where you work, so reasonable ranges would be nice (i.e. sub-second isn't hugely helpful, since 50, 100, or 200 ms have huge impacts depending on your site). I think caching comes into play here in a very strong manner, so these measures are fairly difficult to establish. One Solr instance I run, in particular, shows the difference between 100 ms (uncached queries) and 9 ms (cached queries). Paul
Re: Urgent:Partial Search not Working
Hi Jack, Many thanks for your reply... yes, I have tried both the NGram and EdgeNGram filter factories, still no result. Please let me know any alternatives. On Thu, Jul 5, 2012 at 12:42 AM, Jack Krupansky j...@basetechnology.com wrote: You need to apply the edge n-gram filter only at index time, not at query time. So, you need to specify two analyzers for these field types, an index and a query analyzer. They should be roughly the same, but the query analyzer would not have the edge n-gram filter, since you are accepting the single n-gram given by the user and then matching it against the full list of n-grams that are in the index. It is unfortunate that the wiki example is misleading. Just as bad, we don't have an example in the example schema. Basically, take a text field type that you like from the Solr example schema and then add the edge n-gram filter to its index analyzer, probably as the last token filter. I would note that the edge n-gram filter will interact with the stemming filter, but there is not much you can do other than try different stemmers and experiment with whether stemming should be before or after the edge n-gram filter. I suspect that having stemming after edge n-gram may be better. -- Jack Krupansky -Original Message- From: jayakeerthi s Sent: Wednesday, July 04, 2012 1:41 PM To: solr-user@lucene.apache.org ; solr-user-h...@lucene.apache.org Subject: Re: Urgent:Partial Search not Working Could anyone please reply with the solution to this? On Wed, Jul 4, 2012 at 7:18 PM, jayakeerthi s mail2keer...@gmail.com wrote: All, I am using apache-solr-4.0.0-ALPHA and trying to configure partial search on two fields. The value inside the search field ProdSymbl is M1.6X0.35 9P, and I will have to get results if I search for M1.6 or X0.35 (part of the search value).
I have tried using both NGramTokenizerFactory and solr.EdgeNGramFilterFactory in the schema.xml:

<!-- bigram -->
<!--
<fieldType name="bigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
-->
<fieldType name="bigram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>

Fields I have configured as:

<field name="prodsymbl" type="bigraml" indexed="true" stored="true" multiValued="true"/>
<field name="measure1" type="bigram" indexed="true" stored="true" multiValued="true"/>

Copy fields:

<copyField source="prodsymbl" dest="text"/>
<copyField source="measure1" dest="text"/>

Please let me know if I am missing anything; this is an urgent requirement that needs to be addressed at the earliest. Please help. Thanks in advance, Jay
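For reference, Jack's advice in this thread - apply the edge n-gram filter only in the index-time analyzer, with a matching query analyzer that omits it - would look roughly like this against the field type above. This is a sketch of what is described, not a tested configuration:

```xml
<fieldType name="bigram" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- edge n-grams are generated only when indexing -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- the query side matches the user's term against the indexed grams,
         so no n-gram filter here -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, indexing M1.6X0.35 produces grams like M1, M1., M1.6, ..., and a query for M1.6 matches one of them directly.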
Re: Urgent:Partial Search not Working
Don't forget to test your field type analyzers on the Solr Admin analysis page. It will show you exactly how terms get analyzed at both index and query time. If something is not working, be specific as to what the case is and exactly what is not as you would expect, both the expected value and the actual value. -- Jack Krupansky -Original Message- From: jayakeerthi s Sent: Wednesday, July 04, 2012 3:44 PM To: solr-user@lucene.apache.org Subject: Re: Urgent:Partial Search not Working
Re: Boosting the score of the whole documents
I'm not completely sure. I wouldn't expect that document boost should require field norms, but glancing at the code, it seems that having omitNorms=true does mean that the score for a field will not get the document boost, and in fact such a field gets a constant score. In other words, the score for any field within the document will only get the document boost if that field does not have omitNorms=true. But as long as at least one field has norms, the document score should get some boost from the document boost. I am not sure if this is the way the code is supposed to work, or whether it just happens to be this way. I would hope that some committer with detailed knowledge of norms and similarity would weigh in on this matter. -- Jack Krupansky -Original Message- From: Danilak Michal Sent: Wednesday, July 04, 2012 1:11 PM To: solr-user@lucene.apache.org Subject: Re: Boosting the score of the whole documents Does any modification need to be made in the schema.xml file? For example, to enable field boosts, one has to set omitNorms to false. Is there some similar setting for document boosts? On Wed, Jul 4, 2012 at 7:29 PM, Jack Krupansky j...@basetechnology.com wrote: Make sure to review the similarity javadoc page to understand what any of these factors does to the document score. See: http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html Sure, a document boost applies a multiplicative factor, but that is all relative to all of the other factors for that document and query. In other words, all other things being equal, a doc-boost of 2.0 would double the score, but all other things are usually not equal. Try different doc-boost values and see how the score is affected. The document may have such a low score that a boost of 2.0 doesn't move the needle relative to other documents.
I believe that the doc-boost is included within the fieldNorm value that is shown in the explain section if you add debugQuery=true to your query request. This is explained under norm in the similarity javadoc. I did try a couple of examples with the Solr 3.6 example, such as doc boosts of 2.0, 0.2 (de-boost), 4.0, and 8.0. In my case, it took a boost of 8.0 to move a document up. -- Jack Krupansky -Original Message- From: Danilak Michal Sent: Wednesday, July 04, 2012 10:57 AM To: solr-user@lucene.apache.org Subject: Boosting the score of the whole documents Hi guys, I have the following problem. I would like to give a boost to whole documents as I index them. I am sending to Solr XML in the form: <add><doc boost="2.0">...</doc></add> But it doesn't seem to alter the search scores in any way. I would expect that to multiply the final search score by two; am I correct? Probably I would need to alter schema.xml, but I found only information on how to do that for specific fields (just put omitNorms=false into the field tag). But what should I do if I want to boost the whole document? Note: by boosting a whole document I mean that if document A has search score 10.0 and document B has search score 15.0, and I give document A a boost of 2.0 when I index it, I would expect its search score to be 20.0. Thanks in advance!
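For reference, the document-level boost discussed above is attached to the doc element of the XML update message (Solr 3.x format; the field names here are illustrative):

```xml
<add>
  <!-- boost applies to the whole document; per Jack's observation it
       only affects scoring through fields that do NOT have omitNorms=true -->
  <doc boost="2.0">
    <field name="id">A</field>
    <field name="title">boosted document</field>
  </doc>
</add>
```

Checking the fieldNorm values in the debugQuery=true explain output before and after indexing with the boost is a quick way to confirm the boost was actually applied.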
Re: Something like 'bf' or 'bq' with MoreLikeThis
Amit, I am using Solr 3.6 and directly imported apache-solr-3.6.0.war into Eclipse (Indigo). I need to directly invoke a MoreLikeThis (/mlt) call using a unique id to get MoreLikeThis results. The hard part is that I need to use a float number field (so I cannot use mlt.fl or mlt.fq, since it's not a string) in the matched document of the MLT response to find MLT results - this is purely for relevance improvement. I found a workaround: I can use a standard query parameter fq=Rating:[1.5 TO 2.5]; however, for run-time queries I have to extract the rating number from the matched doc (/mlt?q=id:12345), and I don't know how to extract this at run time. If the matched rating is 2, for instance, then I can construct [1.5 TO 2.5] to say that 2 is more like a value within the range from 1.5 to 2.5. So I will encounter the same thing if I use a bf parameter to calculate distance: I will still need to get the Rating value out of the matched document. -- View this message in context: http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060p3993079.html Sent from the Solr - User mailing list archive at Nabble.com.
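The two-request workaround described above can be sketched as follows. The Rating field name and the 0.5 tolerance come from the message; the helper function itself is hypothetical client-side code, not a Solr feature:

```python
def rating_range_fq(rating, tolerance=0.5):
    """Build the fq range filter described above around the matched
    document's rating, e.g. 2.0 -> 'Rating:[1.5 TO 2.5]'."""
    return "Rating:[%s TO %s]" % (rating - tolerance, rating + tolerance)

# Two-step flow: first request /mlt?q=id:12345&fl=Rating and parse the
# Rating value out of the matched document in the response; then issue
# the MLT request again with fq=rating_range_fq(rating) appended.
print(rating_range_fq(2.0))   # Rating:[1.5 TO 2.5]
```

This keeps the range construction at query time on the client, rather than trying to make Solr extract the matched document's value itself.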
Internal Error 500 - How to diagnose?
Hi, Sorry for this post, but I'm having a hard time getting my head around this. I installed Solr on Tomcat and it seems to work fine. I get the Solr admin page and the "it works" page from Tomcat. When I try to query my Solr server I get this message: "Internal Server Error - The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application." I had this working before, but I have changed almost everything since, so I don't know where to start diagnosing this. Can anyone give me a bit of input on where I should go next? Is there a log file that will give more information? Really quite confused and stuck! Regards, James -- View this message in context: http://lucene.472066.n3.nabble.com/Internal-Error-500-How-to-diagnose-tp3993087.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Internal Error 500 - How to diagnose?
Check your /var/log/tomcat*/. It logs to a catalina.out file unless you modified log4j.properties. -Original message- From: Spadez james_will...@hotmail.com Sent: Thu 05-Jul-2012 00:36 To: solr-user@lucene.apache.org Subject: Internal Error 500 - How to diagnose?
RE: Internal Error 500 - How to diagnose?
Thank you, the query seems to have got through, that's good I guess? Jul 4, 2012 6:32:34 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={facet=true&facet.query={!key%3Danytime}date:[*+TO+*]&facet.query={!key%3D1day}date:[NOW/DAY-1DAY+TO+NOW/DAY]&facet.query={!key%3D3days}date:[NOW/DAY-3DAYS+TO+NOW/DAY]&facet.query={!key%3D1week}date:[NOW/DAY-7DAYS+TO+NOW/DAY]&facet.query={!key%3D1month}date:[NOW/DAY-1MONTH+TO+NOW/DAY]&facet.query={!geofilt+d%3D10+key%3D10kms}&facet.query={!geofilt+d%3D30+key%3D30kms}&facet.query={!geofilt+d%3D50+key%3D50kms}&facet.query={!geofilt+d%3D100+key%3D100kms}&start=0&q=(title:(test))+OR+(description:(test))+OR+(company:(test))+OR+(location_name:(test))&sfield=latlng&pt=51.27241,0.190898&wt=python&fq={!geofilt+d%3D10}&rows=10} hits=0 status=0 QTime=3 -- View this message in context: http://lucene.472066.n3.nabble.com/Internal-Error-500-How-to-diagnose-tp3993087p3993089.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Internal Error 500 - How to diagnose?
Eclipse and IntelliJ have remote debugging for Tomcat. Sometimes it is the only way. On Wed, Jul 4, 2012 at 3:48 PM, Spadez james_will...@hotmail.com wrote: Thank you, the query seems to have got through, that's good I guess? -- Lance Norskog goks...@gmail.com
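Remote debugging of Tomcat, as Lance suggests, usually means starting it with the JPDA agent enabled and then attaching the IDE's remote debugger to the port. A sketch using Tomcat's standard startup script (the port and install path are assumptions):

```shell
# Start Tomcat with the JPDA debug agent listening on port 8000,
# then attach Eclipse/IntelliJ as a "Remote Java Application" to
# localhost:8000.
export JPDA_ADDRESS=8000
export JPDA_TRANSPORT=dt_socket
$CATALINA_HOME/bin/catalina.sh jpda start
```

With the debugger attached, a breakpoint in the Solr request handler shows exactly where the 500 originates.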
Re: Problem with sorting solr docs
Would all optional fields need sortMissingLast and sortMissingFirst set even when not sorting on that field? Seems broken to me. Sent from my Mobile device 720-256-8076 On Jul 3, 2012, at 6:45 AM, Shubham Srivastava shubham.srivast...@makemytrip.com wrote: Just adding to the below -- if there is a field (say X) which is not populated, and in the query I am not sorting on this particular field but on another field (say Y), the result ordering would still depend on X. In fact, in the problem below mentioned by Harsh, making X sortMissingLast=false sortMissingFirst=false solved the problem, while in the query he was sorting on Y. This seems a bit illogical. Regards, Shubham From: Harshvardhan Ojha [harshvardhan.o...@makemytrip.com] Sent: Tuesday, July 03, 2012 5:58 PM To: solr-user@lucene.apache.org Subject: RE: Problem with sorting solr docs Hi, I have added <field name="latlng" indexed="true" stored="true" sortMissingLast="false" sortMissingFirst="false"/> to my schema.xml, although I am searching on the name field. It seems to be working fine. What is its default behavior? Regards Harshvardhan Ojha -Original Message- From: Rafał Kuć [mailto:r@solr.pl] Sent: Tuesday, July 03, 2012 5:35 PM To: solr-user@lucene.apache.org Subject: Re: Problem with sorting solr docs Hello! But the latlng field is not taken into account when sorting with sort defined as in your query. You sort only on the name field and that field alone. You can also define Solr's behavior when there is no value in the field by adding sortMissingLast="true" or sortMissingFirst="true" to your type definition in the schema.xml file. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Hi, Thanks for the reply. I want to sort my docs on the name field; it works well only if all fields are populated. But my latlng field is optional; not every doc has this value, so those docs are not getting sorted.
Regards Harshvardhan Ojha -Original Message- From: Rafał Kuć [mailto:r@solr.pl] Sent: Tuesday, July 03, 2012 5:24 PM To: solr-user@lucene.apache.org Subject: Re: Problem with sorting solr docs Hello! Your query suggests that you are sorting on the 'name' field instead of the latlng field (sort=name+asc). The question is what you are trying to achieve. Do you want to sort your documents by distance from a given geographical point? If that's the case you may want to look here: http://wiki.apache.org/solr/SpatialSearch/ and look at the possibility of sorting on the distance from a given point. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Hi, I have 260 docs which I want to sort on a single field, latlng. <doc> <str name="id">1</str> <str name="name">Amphoe Khanom</str> <str name="latlng">1.0,1.0</str> </doc> My query is: http://localhost:8080/solr/select?q=*:*&sort=name+asc This query sorts all documents except those which don't have latlng, and I can't keep any default value for this field. My question is: how can I sort all docs on latlng? Regards Harshvardhan Ojha | Software Developer - Technology Development | MakeMyTrip.com, 243 SP Infocity, Udyog Vihar Phase 1, Gurgaon, Haryana - 122 016, India What's new?: Inspire - Discover an inspiring new way to plan and book travel online. Office Map Facebook Twitter
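For reference, the sortMissingLast/sortMissingFirst attributes discussed in this thread are set in schema.xml like so. This sketch puts them on the field, as in Harshvardhan's fix (they can also go on the fieldType); the type="string" here is an assumption, since the original field definition did not show its type:

```xml
<!-- Control where docs with no value for the field sort:
     sortMissingLast=true  -> missing values sort after all others
     sortMissingFirst=true -> missing values sort before all others
     both false            -> Lucene's default placement -->
<field name="latlng" type="string" indexed="true" stored="true"
       sortMissingLast="false" sortMissingFirst="false"/>
```
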
Re: How to space between spatial search results? (Declustering)
Hi mcb You're looking for spatial clustering. I answered this question yesterday on Stack Overflow: http://stackoverflow.com/a/11321723/92186 ~ David Smiley - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-space-between-spatial-search-results-Declustering-tp3992668p3993106.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr facet multiple constraint
Please, can someone help me? We are a team waiting for a fix. We have tried several ways to implement it without success. Thanks for reading anyway, David. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-facet-multiple-constraint-tp3992974p3993119.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Use of Solr as primary store for search engine
On 7/4/2012 1:54 AM, Amit Nithian wrote: I am curious to know how people are using Solr in conjunction with other data stores when building search engines to power web sites (say an ecommerce site). The question I have for the group is: given an architecture where the primary (transactional) data store is MySQL (Oracle, Postgres, whatever) with periodic indexing into Solr, when your front end issues a search query to Solr and returns results, are there any joins with your primary Oracle/MySQL etc. to help render results? We used to pull almost everything from our previous search engine. Shortly after we switched to Solr, we began deploying a new version of our website which pulls more from the original data source. The current goal is to store just enough data in Solr to render a search result grid (pulling thumbnails from the filesystem), but go to the database and the filesystem for detail pages. We'd like to reduce the index size to the point where the whole thing will fit in RAM, which we hope will also reduce the amount of time required for a full reindex. What I hope to gain out of upgrading to Solr 4: use the NRT features so that we can index item popularity and purchase data fast enough to make it actually useful. Thanks, Shawn