adjusting score slightly by date field
Hello, In our application there are a lot of old records that we still want in our index but would like them to be scored lower than newer records. Is it possible for a date field to weigh in on the score slightly in some way? Or, if not, is there another way to push up newer records in the order of results while still maintaining the scoring? -Mike
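A minimal sketch of one way to do this with the DisMaxRequestHandler's boost-function parameter, assuming a date field named timestamp (the field name and weight are placeholders). It is the same recip(rord(...)) shape used to boost by price in the dismax config quoted later in this digest, and it nudges newer documents up without replacing the relevance score:

<str name="bf">
  recip(rord(timestamp),1,1000,1000)^0.5
</str>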
Re: Solr Update Handler Fails with Some Doc Characters
I see that the update handler fails even if the character is NOT right next to an XML closing tag. If the character is anywhere in any of the XML tags, the update handler fails to parse the XML. Does posting the utf8-example in the exampledocs directory work?
Re: Solr Update Handler Fails with Some Doc Characters
Hi, I specify the following encoding when POSTing the data to Solr: text/xml; charset=utf-8 The encoding of the actual XML is also UTF-8. I see that the update handler fails even if the character is NOT right next to an XML closing tag. If the character is anywhere in any of the XML tags, the update handler fails to parse the XML. Thanks, Av - Original Message From: Yonik Seeley [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, May 9, 2007 10:45:43 AM Subject: Re: Solr Update Handler Fails with Some Doc Characters On 5/9/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I run the example using Jetty on a Windows 2003 machine. When I submit some documents containing upper-ASCII characters, the Solr update handler fails with an XML parsing error saying that it encountered an EOF before the closing tags. Normally if there is a charset mixup, you will just get weird-looking results. I suppose that if a char greater than 128 is used, and Solr is treating it as UTF-8, then the following char would be treated as part of a single multibyte character. Hence if the char is directly followed by XML markup, part of that XML markup will be lost (hence the parse exception). In short, this is probably a char encoding issue. What character encoding are you using when posting to Solr, and is it declared in the HTTP header? -Yonik
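As an illustration of declaring the charset explicitly in the HTTP header when posting an update, something along these lines should be equivalent to what is described above (the hostname, port, and file name are placeholders):

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml; charset=utf-8' --data-binary @add.xml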
Re: Solr Update Handler Fails with Some Doc Characters
I'm not sure this is the case, but did you use a CDATA section in your XML? Or try using a character reference to represent the copyright symbol. I believe it is &copy; or &#169;. Hope this helps, Koji [EMAIL PROTECTED] wrote: Hi, I run the example using Jetty on a Windows 2003 machine. When I submit some documents containing upper-ASCII characters, the Solr update handler fails with an XML parsing error saying that it encountered an EOF before the closing tags. The XML is perfectly correct and is using utf-8 encoding. It is generated using XmlWriter from C#. When viewing the XML in a browser it parses and displays properly. For example, Solr breaks on the copyright symbol (c). Is there some configuration setting that I need to change to make sure it is able to parse these documents correctly? Thank you in advance! Av
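A small sketch of both suggestions inside an update document (the field name and text are hypothetical); the first line uses a character reference, the second a CDATA section:

<field name="metaDescription">Copyright &#169; 2007 Example Inn</field>
<field name="metaDescription"><![CDATA[Copyright © 2007 Example Inn]]></field>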
Re: Solr Update Handler Fails with Some Doc Characters
Hi, I tried CDATA. It fails the same way. I will try to check if the utf8-example.xml works OK (I just have to change it to match my schema). I just ran a test by adding the (R) symbol into the XML to get the exact error message. See below. Thanks, Av

*** SUBMITTED REQUEST *** (as captured by HTTP proxy)

POST /solr/update HTTP/1.1
Content-Type: text/xml; charset=utf-8
Host: ws2006b:8983
Content-Length: 695
Expect: 100-continue
Proxy-Connection: Close

<?xml version="1.0" encoding="UTF-8"?>
<add>
<doc>
<field name="id">1000194</field>
<field name="url">http://www®barharborinn®com</field>
<field name="title">Bar Harbor Hotels and Bar Harbor Inn near Acadia in Bar Harbor ME</field>
<field name="metaDescription">Bar Harbor Inn- premier oceanfront hotel in Bar Harbor ME® Rated Superior First Class by OHG, Best in-town location®Special Value Packages-fine dining-personal service-amenities®</field>
<field name="metaKeywords">Bar Harbor Inn- premier oceanfront hotel in Bar Harbor ME® Rated Superior First Class by OHG, Best in-town location®Special Value Packages-fine dining-personal service-amenities®</field>
</doc>

*** ERROR MESSAGE *** (a Jetty HTML error page; its title and body both carry the same exception and stack trace)

HTTP ERROR: 500
no more data available - expected end tag </add> to close start tag <add> from line 2, parser stopped on END_TAG seen ...e Packages-fine dining-personal service-amenities\uae</field>\r\n</doc>... @9:7

java.io.EOFException: no more data available - expected end tag </add> to close start tag <add> from line 2, parser stopped on END_TAG seen ...e Packages-fine dining-personal service-amenities\uae</field>\r\n</doc>... @9:7
at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:3015)
at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1144)
at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1078)
at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:159)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:188)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:156)
at org.mortbay.jetty.servlet.WebApplicationHandler$CachedChain.doFilter(WebApplicationHandler.java:821)
at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:471)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:568)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1530)
at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:633)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1482)
at org.mortbay.http.HttpServer.service(HttpServer.java:909)
at org.mortbay.http.HttpConnection.service(HttpConnection.java:820)
at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:986)
at org.mortbay.http.HttpConnection.handle(HttpConnection.java:837)
at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:245)
at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
Re: Solr Update Handler Fails with Some Doc Characters
On 5/9/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

<add>
<doc>
<field name="id">1000194</field>
<field name="url">http://www(r)barharborinn(r)com</field>
<field name="title">Bar Harbor Hotels and Bar Harbor Inn near Acadia in Bar Harbor ME</field>
<field name="metaDescription">Bar Harbor Inn- premier oceanfront hotel in Bar Harbor ME(r) Rated Superior First Class by OHG, Best in-town location(r)Special Value Packages-fine dining-personal service-amenities(r)</field>
<field name="metaKeywords">Bar Harbor Inn- premier oceanfront hotel in Bar Harbor ME(r) Rated Superior First Class by OHG, Best in-town location(r)Special Value Packages-fine dining-personal service-amenities(r)</field>
</doc>

*** ERROR MESSAGE *** Error 500: no more data available - expected end tag </add> to close start tag <add>

That seems to be the problem... where is the </add>? -Yonik
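For reference, a sketch of the well-formed envelope the XML update handler expects (field values elided); the missing </add> end tag is exactly what produces the EOFException above:

<add>
  <doc>
    <field name="id">1000194</field>
    <field name="title">...</field>
  </doc>
</add>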
date range search
Does Solr support date range searching? I've tried all the examples on the Lucene site as well as using the Solr response format and a few others that seemed nifty, but so far I always get query parsing errors. I know I can easily convert the dates to ints and do ranges that way, but all the documentation seemed to imply it was possible to do range searches on real dates; it just didn't give an example that I could see. - will
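A sketch of the syntax that should work against a field of Solr's DateField type (the field name timestamp is an assumption); the full ISO-8601 form with the trailing Z is required, and the query needs URL-escaping when sent as an HTTP parameter:

timestamp:[2007-01-01T00:00:00Z TO 2007-05-10T00:00:00Z]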
Re: Index corruptions?
On 5/7/07, Tom Hill [EMAIL PROTECTED] wrote: Is the cp -lr in snapshooter really guaranteed to be atomic? Or is it just fast, and unlikely to be interrupted? It's called from Solr within a synchronized context, and it's guaranteed that no index changes (via Solr at least) will happen concurrently. -Yonik
Dismax Config?
I'd love to see some explanation of what's going on here, and how to configure it for my own use. I've changed the fields to match my own columns, but it'd be great if I could actually understand it..

<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">
      text^0.5 description^1.0 name^5.0 style_id^1.5 product_id^10.0 brand^4.1 product_type^1.4
    </str>
    <str name="pf">
      text^0.2 description^1.1 name^1.5 brand^1.4 brandexact^1.9
    </str>
    <str name="bf">
      ord(poplarity)^0.5 recip(rord(price),1,1000,1000)^0.3
    </str>
    <str name="fl">
      product_id,name,price,score
    </str>
    <str name="mm">
      2&lt;-1 5&lt;-2 6&lt;90%
    </str>
    <int name="ps">100</int>
  </lst>
</requestHandler>

Thank you! ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
Re: Dismax Config?
check: http://wiki.apache.org/solr/DisMaxRequestHandler For now, most of the docs for dismax are in the javadocs: http://lucene.apache.org/solr/api/org/apache/solr/request/DisMaxRequestHandler.html Matthew Runo wrote: I'd love to see some explanation of what's going on here, and how to configure it for my own use. I've changed the fields to match my own columns, but it'd be great if I could actually understand it.. [...] Thank you! ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
Re: Dismax Config?
Perfect! I had seen the wiki, but did not visit the class page since I am using Perl. What is slop? heh ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On May 9, 2007, at 11:00 AM, Ryan McKinley wrote: check: http://wiki.apache.org/solr/DisMaxRequestHandler For now, most of the docs for dismax are in the javadocs: http://lucene.apache.org/solr/api/org/apache/solr/request/DisMaxRequestHandler.html Matthew Runo wrote: I'd love to see some explanation of what's going on here, and how to configure it for my own use. I've changed the fields to match my own columns, but it'd be great if I could actually understand it.. [...]
Re: Dismax Config?
Ah hah! After doing some research, slop is a fun term for how sloppy a match Solr will make. E.g., slop = 0 means that only exact matches will work; slop = 1 means that they can be off by one word... etc. Yes? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On May 9, 2007, at 11:19 AM, Matthew Runo wrote: Perfect! I had seen the wiki, but did not visit the class page since I am using Perl. What is slop? heh [...]
Re: Dismax Config?
On 5/9/07, Matthew Runo [EMAIL PROTECTED] wrote: Ah hah! After doing some research, slop is a fun term for how sloppy a match Solr will make. E.g., slop = 0 means that only exact matches will work; slop = 1 means that they can be off by one word... etc. Yes? All terms must appear, but the positions can be off. It's called a sloppy phrase query, or proximity query. It's actually based on how many moves need to be made to get the tokens to match in the correct positions. Example: "a b"~1 will match fields with "a b", "a x b", or "b a", but not "b x a". The last would require a slop of 2. -Yonik
Re: Dismax Config?
: Example: "a b"~1 will match fields with "a b", "a x b", or "b a", but not "b x a". : The last would require a slop of 2.

also note that there are two slop params in the DisMax handler ... qs refers to how much slop will be used when querying the qf fields if the user actually types in a query string containing quotes. ie, if the user types in...

  "Chris Hostetter" Solr

that will create a big complex DisjunctionMaxQuery across all of the qf fields for the term Solr and the phrase "Chris Hostetter" ... and the qs param will determine how much slop is allowed for that phrase.

ps refers to the amount of slop that will be used on the artificially constructed phrase query used to boost the scores of documents that match the entire query string as a single phrase on any of the pf fields. in the previous example, "Chris Hostetter Solr" as a single phrase will be queried across all of the pf fields with ps slop, and any matches will get their overall scores increased.

As a general rule, you probably want qs to be small since it affects how loose your matching will be, while ps can be quite large (because it's only increasing the scores of existing matches, and regardless of the ps value, looser matches will score lower than tighter matches). qs exists mainly to deal with situations where you know there might be a small offset between terms that you would otherwise consider sequential (ie: due to synonym injection, or stop word removal)

-Hoss
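A sketch of where these two params might sit among the dismax defaults shown earlier in this thread (the values are illustrative only): a small qs for loosening phrase matching against the qf fields, and a large ps for the score-boosting phrase query over the pf fields.

<str name="qs">1</str>
<int name="ps">100</int>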
Re: Look ahead queries
You could perhaps use faceting to do this for single terms. Set the base query to whatever you want (or *:* for everything). Then use facet.field=text&facet.prefix=foo If you indexed field values as strings like "fuel consumption" (instead of breaking them up into tokens) then you could get your phrases, but phrase detection is not automatic. -Yonik On 5/3/07, Ge, Yao (Y.) [EMAIL PROTECTED] wrote: I am planning to develop look-ahead queries with Solr so that as the user types query terms, a list of related terms is shown in a popup window (similar to Google Suggest).
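For example, a request along these lines returns the top terms starting with the typed prefix (the hostname is a placeholder and the field name text is taken from the suggestion above):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=text&facet.prefix=fuel&facet.limit=10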
Index Concurrency
Hello, I'm a bit new to search indexing and I'm hoping some of you here can help me with an e-mail application I'm working on. I have a mail retrieval program that accesses multiple POP accounts in parallel, and parses each message into a database. I would like to add a new document to a solr index each time I process a message. My first intuition is to give each user their own index. My thinking here is that querying would be faster (since each user's index would be much smaller than one big index), and, more importantly, that I would dodge any concurrency issues stemming from multiple threads trying to update the same index simultaneously. I realize that Lucene implements a locking mechanism to protect against concurrent access, but I seem to hit the lock access timeout quite easily with only a couple of threads. After looking at solr, I would really like to take advantage of the many features it adds to Lucene, but it doesn't look like I'll be able to achieve multiple indexes. Am I completely off in thinking that I need multiple indexes? Is there some best practice for this sort of thing that I haven't stumbled upon? Any advice would be greatly appreciated. Thanks, Joe
Re: Facet only support english?
On 5/5/07, James liu [EMAIL PROTECTED] wrote: I expect it to support other languages like Chinese. Maybe the Solr facet config could look like this when it supports other languages: <str name="facet.query">title:诺基亚</str> solrconfig.xml is a normal XML document. It currently starts off with <?xml version="1.0"?> which has no char encoding specified, and the XML parser may default to something you don't want. If you are saving the file in UTF-8 format, then try changing the first line to be this: <?xml version="1.0" encoding="UTF-8"?> -Yonik
Re: Facet only support english?
On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote: If you are saving the file in UTF-8 format, then try changing the first line to be this: <?xml version="1.0" encoding="UTF-8"?> We should probably change the example solrconfig.xml and schema.xml to be UTF-8 by default. Any objections? -Yonik
Re: Facet only support english?
On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote: If you are saving the file in UTF-8 format, then try changing the first line to be this: <?xml version="1.0" encoding="UTF-8"?> We should probably change the example solrconfig.xml and schema.xml to be UTF-8 by default. Any objections? No--I'm not sure that it'll bring clarity for anyone who isn't aware of xml encoding issues, but I can't see it hurting. -Mike
Re: Index Concurrency
On 5/9/07, joestelmach [EMAIL PROTECTED] wrote: My first intuition is to give each user their own index. My thinking here is that querying would be faster (since each user's index would be much smaller than one big index,) and, more importantly, that I would dodge any concurrency issues stemming from multiple threads trying to update the same index simultaneously. I realize that Lucene implements a locking mechanism to protect against concurrent access, but I seem to hit the lock access timeout quite easily with only a couple threads. After looking at solr, I would really like to take advantage of the many features it adds to Lucene, but it doesn't look like I'll be able to achieve multiple indexes. No, not currently. Start your implementation with just a single index... unless it is very large, it will likely be fast enough. Solr also handles all the concurrency issues, and you should never hit lock access timeout when updating from multiple threads. -Yonik
Re: Facet only support english?
Yonik Seeley wrote: On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote: If you are saving the file in UTF-8 format, then try changing the first line to be this: <?xml version="1.0" encoding="UTF-8"?> We should probably change the example solrconfig.xml and schema.xml to be UTF-8 by default. Any objections? I'm for it... but if the xml parser uses getReader() does it make any difference?
Re: Facet only support english?
On 5/9/07, Ryan McKinley [EMAIL PROTECTED] wrote: Yonik Seeley wrote: We should probably change the example solrconfig.xml and schema.xml to be UTF-8 by default. Any objections? I'm for it... but if the xml parser uses getReader() does it make any difference? For Solr's XML config files, DocumentBuilder.parse(InputStream) is called, so we don't construct a reader first. -Yonik
Re: Facet only support english?
+1 on explicit encoding declarations. Yonik Seeley wrote: On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote: If you are saving the file in UTF-8 format, then try changing the first line to be this: <?xml version="1.0" encoding="UTF-8"?> We should probably change the example solrconfig.xml and schema.xml to be UTF-8 by default. Any objections? -Yonik
Re: Facet only support english?
+1 on explicit encoding declarations. Done (even though it really wasn't needed since there were no int'l chars in the example). As Mike points out, it only marginally helps... if the user adds international chars to the config and saves it as something other than UTF-8 they are still hosed. At least UTF-8 is a better default than something like latin-1 though. -Yonik
Re: Facet only support english?
On 5/9/07, Yonik Seeley [EMAIL PROTECTED] wrote: +1 on explicit encoding declarations. Done (even though it really wasn't needed since there were no int'l chars in the example). As Mike points out, it only marginally helps... if the user adds international chars to the config and saves it as something other than UTF-8 they are still hosed. At least UTF-8 is a better default than something like latin-1 though. I thought that conformant parsers use UTF-8 as the default anyway: http://www.w3.org/TR/REC-xml/#charencoding -Mike
Re: Facet only support english?
I didn't remember that requirement, so I looked it up. It was added in XML 1.0 2nd edition. Originally, unspecified encodings were open for auto-detection. Content type trumps encoding declarations, of course, per RFC 3023 and allowed by the XML spec. wunder On 5/9/07 4:19 PM, Mike Klaas [EMAIL PROTECTED] wrote: I thought that conformant parsers use UTF-8 as the default anyway: http://www.w3.org/TR/REC-xml/#charencoding -Mike
Re: Solr Sorting, merging/weighting sort fields
No problem. Use a boost function. In a DisMaxRequestHandler spec in solrconfig.xml, specify this: <str name="bf">popularity^0.5</str> This value will be added to the score before ranking. You will probably need to fuss with the multiplier to get the popularity to the right proportion of the total score. I find it handy to return the score and the popularity value and look over a few test queries to adjust that. wunder On 5/9/07 4:58 PM, Nick Jenkin [EMAIL PROTECTED] wrote: Hi all, I have a popularity field in my solr index; this field is a popularity rating of a particular product (based on the number of product views etc). I want to be able to integrate this number into the search result sorting such that a product with a higher popularity rating ranks higher in the search results. I can always do: title:(harry potter); popularity desc, score desc; In this example, say I am searching for harry potter books; obviously the latest book has a very high popularity rating, but let's say a completely unrelated product with a high popularity also matches for harry potter and gets into the search results (but has a very low score value); this will bring it to the top of the results. I want to be able to weight popularity in such a way that it boosts the score, but will not greatly affect the search results. Is this possible? Thanks
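A sketch of how that might look in context, also returning popularity so it can be compared against the score during tuning (the qf and fl field names other than popularity are placeholders):

<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="qf">title^2.0 description^0.8</str>
    <str name="bf">popularity^0.5</str>
    <str name="fl">id,title,popularity,score</str>
  </lst>
</requestHandler>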
Re: facet.sort does not work in python output
On 5/3/07, Mike Klaas [EMAIL PROTECTED] wrote: On 5/3/07, Jack L [EMAIL PROTECTED] wrote: The Python output uses nested dictionaries for facet counts. This might be fixed in the future. It's fixed in the current development version (future 1.2) already. See http://wiki.apache.org/solr/SolJSON which is the base for both Python and Ruby. The default is json.nl=flat, which results in alternating term and count in a flat array: "facet_fields":{ "cat":[ "electronics",3, "card",2, "graphics",2, "music",1]} -Yonik
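A hedged example request against that development version (the hostname is a placeholder and cat follows the example schema); json.nl=flat is passed explicitly here, although the message above says it is already the default:

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=cat&wt=python&json.nl=flat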
Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?
Yes, for good (hopefully) or bad. -Sean Shridhar Venkatraman wrote on 5/7/2007, 12:37 AM: Interesting.. Surrogates can also bring the searcher's subjectivity (opinion and context) into it by the learning process? shridhar Sean Timm wrote: It may not be easy or even possible without major changes, but having global collection statistics would allow scores to be compared across searchers. To do this, the master indexes would need to be able to communicate with each other. Another approach to merging across searchers is described here: Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, Greg Pass, Ophir Frieder, "Surrogate Scoring for Improved Metasearch Precision", Proceedings of the 2005 ACM Conference on Research and Development in Information Retrieval (SIGIR-2005), Salvador, Brazil, August 2005. -Sean [EMAIL PROTECTED] wrote: On 4/11/07, Chris Hostetter [EMAIL PROTECTED] wrote: A custom Similarity class with simplified tf, idf, and queryNorm functions might also help you get scores from the Explain method that are more easily manageable, since you'll have predictable query structures hard coded into your application. ie: run the large query once, get the results back, and for each result look at the explanation and pull out the individual pieces of the explanation and compare them with those of the other matches to create your own "normalization". Chuck Williams mentioned a proposal he had for normalization of scores that would give a constant score range that would allow comparison of scores. Chuck, did you ever write any code to that end or was it just algorithmic discussion? Here is the point I'm at now: I have my matching engine working. The fields to be indexed and the queries are defined by the user. Hoss, I'm not sure how that affects your idea of having a custom Similarity class since you mentioned that having predictable query structures was important... The user kicks off an indexing run, then defines the queries they want to try matching with. Here is an example of the query fragments I'm working with right now: year_str:"${Year}"^2 year_str:[${Year -1} TO ${Year +1}] title_title_mv:"${Title}"^10 title_title_mv:${Title}^2 +(title_title_mv:"${Title}"~^5 title_title_mv:${Title}~) director_name_mv:"${Director}"~2^10 director_name_mv:${Director}^5 director_name_mv:${Director}~.7 For each item in the source feed, the variables are interpolated (the query term is transformed into a grouped term if there are multiple values for a variable). That query is then made to find the overall best match. I then determine the relevance for each query fragment. I haven't written any plugins for Lucene yet, so my current method of determining the relevance is by running each query fragment by itself, then iterating through the results looking to see if the overall best match is in this result set. If it is, I record the rank and multiply that rank (e.g. 5 out of 10) by a configured fragment weight. Since the scores aren't normalized, I have no good way of determining a poor overall match from a really high quality one. The overall item could be the first item returned in each of the query fragments. Any help here would be very appreciated. Ideally, I'm hoping that maybe Chuck has a patch or plugin that I could use to normalize my scores such that I could let the user do a matching run, look at the results and determine what score threshold to set for subsequent runs. Thanks, Daniel
Re: Solr Sorting, merging/weighting sort fields
Thanks, worked perfectly! -Nick On 5/10/07, Walter Underwood [EMAIL PROTECTED] wrote: No problem. Use a boost function. In a DisMaxRequestHandler spec in solrconfig.xml, specify this: <str name="bf">popularity^0.5</str> This value will be added to the score before ranking. You will probably need to fuss with the multiplier to get the popularity to the right proportion of the total score. I find it handy to return the score and the popularity value and look over a few test queries to adjust that. wunder On 5/9/07 4:58 PM, Nick Jenkin [EMAIL PROTECTED] wrote: [...] -- - Nick
Re: Index Concurrency
Yonik, Thanks for your fast reply. No, not currently. Start your implementation with just a single index... unless it is very large, it will likely be fast enough. My index will get quite large. Solr also handles all the concurrency issues, and you should never hit lock access timeout when updating from multiple threads. Does solr provide any additional concurrency control over what Lucene provides? In my simple testing of indexing 2,000 messages, solr would issue lock access timeouts with as little as 10 threads. Running all 2,000 messages through sequentially yields no problems at all. Actually, I'm able to churn through over 100,000 messages when no threads are involved. Am I missing some concurrency settings? Thanks, Joe
Re: Index Concurrency
On 5/9/07, joestelmach [EMAIL PROTECTED] wrote: Does solr provide any additional concurrency control over what Lucene provides? Yes, coordination between the main index searcher, the index writer, and the index reader needed to delete other documents. In my simple testing of indexing 2,000 messages, solr would issue lock access timeouts with as little as 10 threads. That's weird... I've never seen that. The lucene write lock is only obtained when the IndexWriter is created. Can you post the relevant part of the log file where the exception happens? Also, unless you have at least 6 CPU cores or so, you are unlikely to see greater throughput with 10 threads. If you add multiple documents per HTTP-POST (such that HTTP latency is minimized), the best setting would probably be nThreads == nCores. For a single doc per POST, more threads will serve to cover the latency and keep Solr busy. -Yonik
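For instance, batching several documents into a single <add> body is one well-formed way to cut HTTP round trips (the field names here are hypothetical):

<add>
  <doc>
    <field name="id">msg-1</field>
    <field name="subject">first message</field>
  </doc>
  <doc>
    <field name="id">msg-2</field>
    <field name="subject">second message</field>
  </doc>
</add>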
Question about delete
I use commands like this:

curl http://localhost:8983/solr/update --data-binary '<delete><query>name:DDR</query></delete>'
curl http://localhost:8983/solr/update --data-binary '<commit/>'

and I get numDocs : 0, maxDoc : 1218819. When I search for something that existed before the delete, I find nothing. But the index file size has not changed and maxDoc has not changed. Why does this happen? -- regards jl
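As a side note on the question above: Lucene only marks deleted documents, so disk usage and maxDoc typically stay the same until segments get merged; forcing a merge with an optimize (sketched below, in the same form as the commands above) should shrink both.

curl http://localhost:8983/solr/update --data-binary '<optimize/>'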
Re: Requests per second/minute monitor?
Walter Underwood wrote: This is for monitoring -- what happened in the last 30 seconds. Log file analysis doesn't really do that. I would respectfully disagree. Log file analysis of each request can give you that, and a whole lot more. You could either grab the stats via a regular cron job, or create a separate filter to parse them in real time. It would then let you grab more sophisticated stats if you choose to. What I would like to know is (and excuse the newbieness of the question) how to enable solr to log a file with the following data: - time spent (ms) in the request - IP# of the incoming request - what the request was (and what handler executed it) - a status code to signal if the request failed for some reason - the number of rows fetched and - the number of rows actually returned Is this possible? (I'm using Tomcat, if that changes the answer.) regards Ian
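One way to get most of those items from Tomcat itself is an access-log valve in server.xml (a sketch, not a drop-in config): %h is the client IP, %t the timestamp, %r the request line, %s the status code, %b the bytes sent, and %D the time taken in milliseconds. Rows fetched/returned are not visible to the servlet container, so those would have to come from Solr's own request logging, which records the request params, hit count and QTime at INFO level.

<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs" prefix="solr_access." suffix=".log"
       resolveHosts="false"
       pattern="%h %t &quot;%r&quot; %s %b %D"/>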