Solr autocomplete keyword and geolocation based
I am looking to get autocomplete suggestions from Solr based on a keyword as well as geolocation. Is there a way, using the 'Suggester' component or otherwise, for Solr to take multiple fields into account for autocompletion? For example, if I have a restaurants database and I want suggestions for the keyword 'Piz', the results should be based both on the keyword 'Piz' and on proximity to a given latitude/longitude. Is there a way to do this in Solr? Thanks.

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-autocomplete-keyword-and-geolocation-based-tp4025466.html Sent from the Solr - User mailing list archive at Nabble.com.
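For what it's worth, one common way to combine the two is a prefix (edge-ngram) query plus a {!geofilt} filter, sorted by distance. The sketch below is hedged: the field names `name_prefix` (an EdgeNGram-analyzed copy of the restaurant name) and `location` (a spatial field) are hypothetical and would need to match your schema.

```python
from urllib.parse import urlencode

def suggest_params(prefix, lat, lon, radius_km, rows=10):
    """Combine keyword autocompletion with a geo filter.

    Hypothetical schema: 'name_prefix' is an EdgeNGram-analyzed copy of the
    restaurant name; 'location' is a lat/lon spatial field.
    """
    return {
        "q": "name_prefix:%s" % prefix.lower(),
        # geofilt restricts results to within radius_km of the given point
        "fq": "{!geofilt sfield=location pt=%s,%s d=%s}" % (lat, lon, radius_km),
        # sort nearest-first
        "sort": "geodist(location,%s,%s) asc" % (lat, lon),
        "rows": rows,
        "wt": "json",
    }

# e.g. GET /solr/restaurants/select?<encoded params>
qs = urlencode(suggest_params("Piz", 40.7484, -73.9857, 5))
```

Whether you drive this through the Suggester or a plain /select handler, the key point is that filtering and sorting by geodist happen independently of the keyword match.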
Re: SOLR4 (sharded) and join query
On Thu, Dec 6, 2012 at 6:47 PM, Erick Erickson erickerick...@gmail.com wrote: "see http://wiki.apache.org/solr/DistributedSearch -- joins aren't supported in distributed search."

Any time you have more than one shard in SolrCloud you are, by definition, doing distributed search. Joins are supported, but there is a limitation: each join is calculated locally, per shard, based on common terms within that shard. And you can ensure that certain sets of related documents land on the same shard with the new document routing feature: https://issues.apache.org/jira/browse/SOLR-2592 -Yonik http://lucidworks.com
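To make Yonik's two points concrete, here is a hedged sketch (field names, routing key, and document shapes are invented for illustration): a composite routing key of the form "key!id" co-locates related documents on one shard, and a {!join} query then resolves locally within that shard.

```python
def routed_id(route_key, doc_id):
    # With document routing (SOLR-2592), ids of the form "key!id" hash on
    # "key", so all docs sharing the key land on the same shard.
    return "%s!%s" % (route_key, doc_id)

def shard_local_join(from_field, to_field, restriction):
    # Each shard evaluates this join only over its own documents, which is
    # why co-locating related docs matters.
    return "{!join from=%s to=%s}%s" % (from_field, to_field, restriction)

# Parent and child share the "acme" routing key, so they index to one shard.
parent = {"id": routed_id("acme", "p1"), "type": "parent"}
child = {"id": routed_id("acme", "c1"), "type": "child", "parent_id": "p1"}
q = shard_local_join("parent_id", "id", "type:child")
```

The join query above finds parents whose `id` matches some child's `parent_id`; because the join is per-shard, it only works if both documents were routed together.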
Re: Modeling openinghours using multipoints
Thanks for the discussion, I've added this to my bag of tricks, way cool! Erick

On Sat, Dec 8, 2012 at 10:52 PM, britske gbr...@gmail.com wrote: Brilliant! Got some great ideas for this. Indeed, all sorts of use cases that involve multiple temporal ranges could benefit. E.g., another guy on Stack Overflow asked me about this some days ago: he wants to model multiple temporary offers per product (free shipping for Christmas, 20% discount for Black Friday, etc.). All possible with this out of the box. Factor 'offer category' into x and y as well for some extra powerful querying. Yup, I'm enthusiastic about it, which I'm sure you can tell :) Thanks a lot David, Cheers, Geert-Jan

Sent from my iPhone

On 9 dec. 2012, at 05:35, David Smiley (@MITRE.org) [via Lucene] ml-node+s472066n4025434...@n3.nabble.com wrote:

britske wrote: That's seriously awesome! One change in the query though. You described how "to query for a business that is open during at least some part of a given time duration". I want to query for a business that is open during at least the entire given time duration. Feels like a small difference but probably isn't (I'm still wrapping my head around the intersect query, I must admit).

So this would be a slightly different rectangle query. Interestingly, you simply swap the locations in the rectangle where you put the start and end time. In summary:

Indexed span CONTAINS query span: minX minY maxX maxY = 0, end, start, *
Indexed span INTERSECTS (i.e. OVERLAPS) query span: minX minY maxX maxY = 0, start, end, *
Indexed span WITHIN query span: minX minY maxX maxY = start, 0, *, end

I'm using '*' here to denote the max possible value. At some point I may add that as a feature. That was a fun exercise! I give you credit for prodding me in this direction, as I'm not sure this use of spatial would have occurred to me otherwise.

britske wrote: Moreover, any indication on performance? Should, say, 50,000 docs with about 100-200 points each (1 or 2 open-close spans per day) be ok? (I know 'your mileage may vary' etc., but just a guesstimate :)

You should have absolutely no problem. The real clincher in your favor is the fact that you only need 9600 discrete time values (so you said), not Long.MAX_VALUE. Using Long.MAX_VALUE would simply not be possible with the current implementation because it uses doubles, which have 52 bits of precision, not the 64 that would be required to be a complete substitute for any time/date. Even given the 52 bits, a quad SpatialPrefixTree with maxLevels=52 would probably not perform well, or might fail; not sure. Eventually, when I have time to work on an implementation that can be based on a configurable number of grid cells (not unlike how you can configure precisionStep on the Trie numeric fields), 52 should be no problem. I'll have to remember to refer back to this email on the approach if I create a field type that wraps this functionality. ~ David

britske wrote: Again, this looks good! Geert-Jan

2012/12/8 David Smiley (@MITRE.org) [via Lucene] [hidden email]:

Hello again Geert-Jan! What you're trying to do is indeed possible with Solr 4 out of the box. Other terminology people use for this is "multi-value time duration". This creative solution is a pure application of spatial without the geospatial notion -- we're not using an earth or other sphere model; it's a flat plane. So no need to refer to longitude/latitude; it's x and y. I would put the opening time into x, and the closing time into y. To express a point, use "x y" (x, space, y), and supply this as a string to your SpatialRecursivePrefixTreeFieldType-based field for indexing. You can give it multiple values and it will work correctly; this is one of RPT's main features that sets it apart from Solr 3 spatial.

To query for a business that is open during at least some part of a given time duration, say 6-8 o'clock, the query would look like openDuration:Intersects(minX minY maxX maxY), where you put 0 for minX (always), 6 for minY (start time), 8 for maxX (end time), and the largest possible value for maxY. You wouldn't actually use 6 and 8; you'd use the number of 15-minute intervals since your epoch for the equivalent time span. You'll need to configure the field correctly: geo=false, worldBounds="0 0 maxTime maxTime" (substituting an appropriate value for maxTime based on your unit of time, i.e. the number of 15-minute intervals you need), and distErrPct=0 (full precision). Let me know how this works for you. ~ David

Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
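As a hedged illustration of the scheme above (the interval math and the `openDuration` field name are mine, not from the thread; the epoch here is a single week of 15-minute cells rather than the ~9600 cells discussed):

```python
# One week of 15-minute intervals: 7 days * 96 intervals/day = 672 cells.
MAX_T = 7 * 96  # stands in for '*', the largest possible value

def interval(day, hour, minute=0):
    """Index of a 15-minute interval since the start of the week (day 0 = Monday)."""
    return day * 96 + hour * 4 + minute // 15

def indexed_point(open_t, close_t):
    # x = opening time, y = closing time; one "x y" string per open-close span,
    # and the RPT field accepts many of these per document.
    return "%d %d" % (open_t, close_t)

def open_during_some_part(field, start, end):
    # Indexed span INTERSECTS query span: minX=0, minY=start, maxX=end, maxY=*
    return "%s:Intersects(0 %d %d %d)" % (field, start, end, MAX_T)

def open_during_entire(field, start, end):
    # Indexed span CONTAINS query span: minX=0, minY=end, maxX=start, maxY=*
    return "%s:Intersects(0 %d %d %d)" % (field, end, start, MAX_T)

# Monday 06:00-20:00 indexes as the point "24 80"; the query asks for
# businesses open for all of Monday 08:00-10:00.
point = indexed_point(interval(0, 6), interval(0, 20))
q = open_during_entire("openDuration", interval(0, 8), interval(0, 10))
```

A point (open, close) falls inside the CONTAINS rectangle exactly when open <= start and close >= end, which is the "open during the entire duration" condition.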
Re: SolrCloud - Query performance degrades with multiple servers
Thank you very much, will wait for the results from your tests.

From: Mark Miller-3 [via Lucene] ml-node+s472066n4025457...@n3.nabble.com
Date: Saturday, December 8, 2012 11:08 PM
To: Sarkar, Sauvik sausar...@ebay.com
Subject: Re: SolrCloud - Query performance degrades with multiple servers

If that's true, we will fix it for 4.1. I can look closer tomorrow. Mark

Sent from my iPhone

On Dec 9, 2012, at 2:04 AM, sausarkar [hidden email] wrote: Spoke too early; it seems that SolrCloud is still distributing queries to all the servers even if numShards=1. We are seeing POST requests to all servers in the cluster, please let me know what the solution is. Here is an example (the variable isShard should be false in our case, as there is a single shard, please help):

POST /solr/core0/select HTTP/1.1
Content-Charset: UTF-8
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
Content-Length: 991
Host: server1
Connection: Keep-Alive

lowercaseOperators=true&mm=70%&fl=EntityId&df=EntityId&q.op=AND&q.alt=*:*&qs=10&stopwords=true&defType=edismax&rows=3000&q=*:*&start=0&fsv=true&distrib=false&isShard=true&shard.url=server1:9090/solr/core0/|server2:9090/solr/core0/|server3:9090/solr/core0/&NOW=1354918880447&wt=javabin&version=2

Re: SolrCloud - Query performance degrades with multiple servers Dec 06, 2012; 6:29pm, by Mark Miller-3

On Dec 6, 2012, at 5:08 PM, sausarkar [hidden email] wrote: We solved the issue by explicitly adding the numShards=1 argument to the Solr startup script. Is this a bug?

Sounds like it... perhaps related to SOLR-3971... not sure though. - Mark
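When diagnosing this kind of fan-out, one hedged trick (host and core names below are placeholders) is to hit a single core directly with distrib=false, which asks Solr for a purely local, non-distributed search, and compare timings against the normal distributed request:

```python
from urllib.parse import urlencode

def local_query_url(host, core, q="*:*", rows=10):
    # distrib=false asks the core to answer from its local index only,
    # with no fan-out to other shards or replicas.
    params = urlencode({"q": q, "rows": rows, "distrib": "false", "wt": "json"})
    return "http://%s:9090/solr/%s/select?%s" % (host, core, params)

url = local_query_url("server1", "core0")
```

If the distrib=false timing is fine but the default request is slow, the overhead is in the distributed stage rather than in the index itself.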
Re: error opening index solr 4.0 with lukeall-4.0.0-ALPHA.jar
Hi, Thanks for the package, it is useful. I decided to adapt it to Lucene trunk (ver. 5.0-SNAPSHOT). The package with the source code and a binary (dir: target) can be found at the same link. It worked fine against a trunk Solr/Lucene index. There could be bugs though; please drop a line if you test it and find any. Regards, Dmitry Kan

On Fri, Dec 7, 2012 at 5:50 PM, Neil Ireson n.ire...@dcs.shef.ac.uk wrote: In case it is of use, I have just uploaded an updated and mavenised version of the Luke code to the Luke discussion list, see https://groups.google.com/d/topic/luke-discuss/MNT_teDxVno/discussion . It seems to work with the latest (4.0.0 and 4.1-SNAPSHOT) versions of Lucene. N
Re: Modeling openinghours using multipoints
If these are not raw times, but quantized on-the-hour, would it be faster to create a bit map of hours and then query across the bit maps?

On Sun, Dec 9, 2012 at 8:06 AM, Erick Erickson erickerick...@gmail.com wrote: [...]
Re: SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)
Thanks. It looks like my cluster is in a wedged state after I tried to delete a collection that didn't exist. There are about 80 items in the queue after the delete op (which it can't get past). Is that a known bug? I guess for now I'll just check that a collection exists before sending any deletes. :) Brett

On Fri, Dec 7, 2012 at 10:50 AM, Mark Miller markrmil...@gmail.com wrote: Anything in any of the other logs (the other nodes)? The key is getting the logs from the node designated as the overseer - it should hopefully have the error. Right now, because you pass this stuff off to the overseer, you will always get back a 200 - there is a JIRA issue that addresses this though (collection API responses) and I hope to get it committed soon. - Mark

On Dec 7, 2012, at 7:26 AM, Brett Hoerner br...@bretthoerner.com wrote: For what it's worth, this is the log output with DEBUG on:

Dec 07, 2012 2:00:48 PM org.apache.solr.handler.admin.CollectionsHandler handleCreateAction
INFO: Creating Collection : action=CREATE&name=foo&numShards=4
Dec 07, 2012 2:01:03 PM org.apache.solr.core.SolrCore execute
INFO: [15671] webapp=/solr path=/admin/system params={wt=json} status=0 QTime=5
Dec 07, 2012 2:01:15 PM org.apache.solr.handler.admin.CollectionsHandler handleDeleteAction
INFO: Deleting Collection : action=DELETE&name=default
Dec 07, 2012 2:01:20 PM org.apache.solr.core.SolrCore execute

Neither the CREATE nor the DELETE actually did anything, though. (Again, HTTP 200 OK.) Still stuck here, any ideas? Brett

On Tue, Dec 4, 2012 at 7:19 PM, Brett Hoerner br...@bretthoerner.com wrote: Hi, I have a Cloud setup of 4 machines. I bootstrapped them with 1 collection, which I called "default" and haven't used since. I'm using an external ZK ensemble that was completely empty before I started this cloud. Once I had all 4 nodes in the cloud I used the collection API to create the real collections I wanted. I also tested that deleting works. For example:

# this worked
curl "http://localhost:8984/solr/admin/collections?action=CREATE&name=15678&numShards=4"
# this worked
curl "http://localhost:8984/solr/admin/collections?action=DELETE&name=15678"

Next, I started my indexer service, which happily sent many, many updates to the cloud. Queries against the collections also work just fine. Finally, a few hours later, I tried doing a create and a delete. Both operations did nothing, although Solr replied with a 200 OK:

$ curl -i "http://localhost:8984/solr/admin/collections?action=CREATE&name=15679&numShards=4"
HTTP/1.1 200 OK
Content-Type: application/xml; charset=UTF-8
Transfer-Encoding: chunked

<?xml version="1.0" encoding="UTF-8"?>
<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst></response>

There is nothing in the stdout/stderr logs, nor the Java logs (I have it set to WARN). I have tried bouncing the nodes and it doesn't change anything. Any ideas? How can I further debug this, or what else can I provide?
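Brett's workaround ("check that a collection exists before sending any deletes") might look like the hedged sketch below. How you obtain the set of live collections is an assumption here (e.g. by reading clusterstate.json from ZooKeeper); the function names are invented.

```python
from urllib.parse import urlencode

def safe_delete_url(base_url, name, live_collections):
    # Refuse to build a DELETE for an unknown collection, since deleting a
    # nonexistent one could wedge the overseer queue (the bug in this thread).
    if name not in live_collections:
        raise ValueError("collection %r not found; skipping DELETE" % name)
    return "%s/admin/collections?%s" % (
        base_url, urlencode({"action": "DELETE", "name": name}))

url = safe_delete_url("http://localhost:8984/solr", "15678", {"default", "15678"})
```

Since the Collections API returns 200 even when the overseer later fails, this client-side guard is the only protection until the API reports real statuses.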
Re: SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)
Yeah it is - this was fixed a while ago on 4x and will be in 4.1. An exception would kill the collection manager wait loop. - Mark

On Sun, Dec 9, 2012 at 9:21 PM, Brett Hoerner br...@bretthoerner.com wrote: [...]
Re: stress testing Solr 4.x
Hmmm... EOF on the segments file is odd... How were you killing the nodes? Just stopping them, or kill -9, or what? - Mark

On Sun, Dec 9, 2012 at 1:37 PM, Alain Rogister alain.rogis...@gmail.com wrote: Hi, I have re-run my tests today after I updated Solr 4.1 to apply the patch. First, the good news: it works, i.e. if I stop all three Solr servers and then restart one, it will try to find the other two for a while (about 3 minutes, I think), then give up, become the leader and start processing requests. Now, the not-so-good: I encountered several exceptions that seem to indicate 2 other issues. Here are the relevant bits.

1) The ZK session expiry problem: not sure what caused it, but I did a few Solr or ZK node restarts while the system was under load.

SEVERE: There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
  at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:732)
  at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:696)
  at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1095)
  at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:265)
  at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
  at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
  at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
  at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
  at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/adressage/leaders/shard1
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
  at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
  at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
  at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
  at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
  at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:710)
  ... 10 more

SEVERE: :org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue/qn-
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
  at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:210)
  at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:207)
  at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
  at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:207)
  at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:229)
  at org.apache.solr.cloud.ZkController.publish(ZkController.java:824)
  at org.apache.solr.cloud.ZkController.publish(ZkController.java:797)
  at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:258)
  at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
  at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
  at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
  at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
  at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

2) Data corruption of 1 core on 2 out of 3 Solr servers. This core failed to start due to the exceptions below, and both servers went into a seemingly endless loop of exponential retries. The fix was to stop both faulty servers, remove the data directory of this core and restart: replication then took place correctly. As above, not sure what exactly caused this to happen; no updates were taking place, only searches. On server 1:

INFO: Closing directory: /Users/arogister/Dev/apache-solr-4.1-branch/solr/forem/solr/formabanque/data/index.20121209152525785
Dec 09, 2012 3:25:25 PM org.apache.solr.common.SolrException log
SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Index fetch failed :
  at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:400)
  at
Re: Modeling openinghours using multipoints
Colleagues, what are the benefits of this approach in contrast to block join? Thanks.

On 10.12.2012 at 3:35, Lance Norskog goks...@gmail.com wrote: [...]
How does fq skip score value?
Hello, I would like to understand how the fq parameter avoids dealing with scoring. I have been digging through the code to see where Solr separates and executes fq parameters, but I couldn't find it yet. Does anyone know how fq works such that it skips score information?

- Zeki ama calismiyor... Calissa yapar...
Re: How does fq skip score value?
Sure. Here the fq's DocSets are intersected: https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L864 and here that DocSet is passed to the Lucene search: https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1471

On Mon, Dec 10, 2012 at 10:06 AM, deniz denizdurmu...@gmail.com wrote: [...]

-- Sincerely yours, Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
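In practice, this is why the usual advice is to put scoring criteria in q and pure constraints in fq. A hedged sketch (the field names are invented): each fq is resolved to a DocSet that SolrIndexSearcher caches and intersects with the main query's results, so it restricts matches without ever contributing to the score.

```python
def search_params(user_query, category, rows=10):
    # q is scored and determines ranking; fq only selects documents, is
    # cached in the filterCache, and never affects the score.
    return {
        "q": user_query,
        "fq": "category:%s" % category,
        "rows": rows,
    }

params = search_params("pizza", "restaurant")
```

Moving a clause between q and fq therefore changes the ranking (and cache behavior) but not which documents can match.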