Re: multiple attribute
Michael, your question is a little confusing. Business entities have attributes. We model entities as documents and attributes as fields; that's why adding attributes to a field is contradictory. By the way, there are a few related concepts in Lucene: Payloads and TermPositions. About the problem itself, I can suggest:
- http://wiki.apache.org/solr/Join
- http://wiki.apache.org/solr/FieldCollapsing - it supports faceting
- block join http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html https://issues.apache.org/jira/browse/SOLR-3076 - patch only, but really performant
- http://wiki.apache.org/solr/SurroundQueryParser - available in 4.0, can be used the way proposed here http://goo.gl/R2bxc video http://vimeo.com/album/2012142/video/33817062

Have a good dive!

On Mon, Dec 10, 2012 at 12:27 PM, Michael Jones michaelj...@gmail.com wrote:
Hi, I know that Solr doesn't provide support for nested documents, but can I add multiple attributes to a field?

  <add><document><field name="test" foo="one" bar="two" index="true"/>

And specify an index on those attributes? I have a nested document that needs to be saved and searched. If the above cannot be achieved, what would be a suitable alternative? Would I have to do something like:

  <add><document>
    <field name="test" foo="one" bar="two"/>
    <field name="foo" index="true">one</field>
    <field name="bar" index="true">two</field>

And just return name=test? Thanks

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: multiple attribute
Hi, sorry if anyone found my question confusing. I have an XML document that is nested:

  <file>
    <foo>
      <bar/>
      <thunk/>
    </foo>
  </file>

And I know that with Solr you have to flatten your data, so I was just trying to work out the best way to do a search on a nested document. I was looking to see if, instead of having multiple nodes, I could have those nodes as attributes on one single node and still be able to search. So if my node looked like:

  <field name="person" date="2000-01-01" location="earth" username="bob" job="test" index="true" />

I would be able to search:

  date > 2002
  location = earth
  job = test

But I'm not sure if that is the best way to do it? I know I would have to specify a type for each attribute in the config.

On Mon, Dec 10, 2012 at 9:36 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:
[quoted text elided]
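Since Solr indexes flat documents, the usual approach discussed in this thread is to flatten the nested structure into prefixed field names at index time. A minimal sketch of that idea (the field names, separator, and helper are illustrative, not a Solr convention):

```python
# Sketch: flatten a nested record into flat Solr-style fields.
# Prefixing with the parent name keeps the nesting recoverable.

def flatten(doc, prefix="", sep="_"):
    """Flatten nested dicts into a single-level dict of field names."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name, sep=sep))
        else:
            flat[name] = value
    return flat

nested = {"person": {"date": "2000-01-01", "location": "earth",
                     "username": "bob", "job": "test"}}
flat = flatten(nested)
# e.g. flat["person_date"] == "2000-01-01"
```

Each flattened name would then need a matching field (or a dynamic-field pattern) in schema.xml so it can be searched with ordinary field queries.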
Re: Wildcards and fuzzy/phonetic query
It's been two months since I asked about wildcards and phonetic filters, and finally the task of upgrading Solr to version 4.0 was prioritized in our project, so the last couple of days I've been working on it. Another team member upgraded Solr from 3.4 to 4.0, and I've been making changes to schema.xml to accommodate the new multiterm functionality. However, it doesn't seem to work. Lowercasing is still not done when I do a fuzzy search - not through the regular index analyzer and its support for MultiTermAwareComponents, and not when I try to define a special multiterm analyzer. Do I have to do anything special to enable the multiterm functionality in Solr 4.0? Regards, Hågen

On 8 Oct 2012, at 18:09, Erick Erickson wrote:

Re: whether phonetic filters can be multiterm aware: I'd be leery of this, as I basically don't quite know how that would behave. You'd have to ensure that the algorithms changed the first parts of the words uniformly, regardless of what followed. I'm pretty sure that _some_ phonetic algorithms do not follow this pattern, i.e. "eric" wouldn't necessarily have the same beginning as "erickson". That said, some of the algorithms _may_ follow this rule and might be OK candidates for being MultiTermAware.

But you don't need this in order to try it out. See the "Expert Level Schema Possibilities" at: http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/ You can define your own analysis chain for wildcards as part of your fieldType definition and include whatever you want, whether or not it's MultiTermAware, and it will be applied at query time. Use the <analyzer type="query"> entry as a basis. _But_ you shouldn't include anything in this section that produces more than one output per input token. Note: token, not field. I.e. a really bad candidate for this section is WordDelimiterFilterFactory. If you use the admin/analysis page (which you'll get to know intimately), look at a type that has WordDelimiterFilterFactory in its chain, and put in something like "erickErickson1234", you'll see what I mean. Make sure to check the "verbose" box.

If you can determine that some of the phonetic algorithms _should_ be MultiTermAware, please feel free to raise a JIRA and we can discuss... I suspect it'll be on a case-by-case basis. Best, Erick

On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle haagenha...@gmail.com wrote:
Hi! I'm quite new to Solr; I was recently asked to help out on a project where the previous Solr person quit quite suddenly. I've noticed that some of our searches don't return the expected result, and I'm hoping you guys can help me out. We've indexed a lot of names, and would like to search for a person in our system using these names. We previously used Oracle Text for this, and we find that Solr is much faster. So far so good! :) But when we try to use wildcards, things start to go wrong. We're using Solr 3.4, and I see that some of our problems are solved in 3.6, ref SOLR-2438: https://issues.apache.org/jira/browse/SOLR-2438 But we would also like to be able to combine wildcards with fuzzy searches, and wildcards with a phonetic filter. I don't see anything about phonetic filters in SOLR-2438 or SOLR-2921 (https://issues.apache.org/jira/browse/SOLR-2921). Is it possible to make the phonetic filters MultiTermAware? Regarding fuzzy queries: in Oracle Text I can search for chr% (chr* in Solr) and find both "christian" and "kristian". As far as I understand, this is not possible in Solr; WildcardQuery and FuzzyQuery cannot be combined. Is this correct, or have I misunderstood anything? Are there any workarounds or filter combinations I can use to achieve the same result? I've seen people suggest using a boolean query to combine the two, but I don't really see how that would solve my chr* problem. As I mentioned earlier, I'm quite new to this, so I apologize if what I'm asking about only shows my ignorance. Regards, Hågen
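The boolean-query workaround mentioned above can be sketched as a query-string builder: OR a prefix clause together with fuzzy clauses on known full terms. The field name and the variant list are hypothetical examples, and, as noted in the thread, the prefix clause alone still won't reach "kristian" - the variants have to be supplied some other way (e.g. from a synonym list):

```python
# Sketch: combine a prefix (wildcard) clause with fuzzy clauses in one
# boolean query, since a single term can't be both wildcard and fuzzy.

def prefix_or_fuzzy(field, prefix, full_terms):
    """Build a Lucene-syntax query string: prefix OR fuzzy candidates."""
    clauses = [f"{field}:{prefix}*"]                    # wildcard clause
    clauses += [f"{field}:{term}~" for term in full_terms]  # fuzzy clauses
    return " OR ".join(clauses)

q = prefix_or_fuzzy("name", "chr", ["christian", "kristian"])
# -> "name:chr* OR name:christian~ OR name:kristian~"
```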
about NRTCachingDirectory
I have a question about how NRTCachingDirectory works. As far as I've seen, it receives a delegate Directory and caches newly created segments. So, given that MMapDirectory is the default:

1. Does NRTCachingDirectory work as a sort of wrapper around MMapDirectory, caching the new segments?
2. If I have a master/slave setup and deploy a fully optimized index with a single segment, and the slave is configured with NRTCachingDirectory, will it try to cache that segment (I suppose not)? And let's say I remove the replication and start adding docs to that slave, creating small segments every 10 minutes: will NRTCachingDirectory by default start caching these new small segments? And finally, if I set up replication again and a full new single-segment index is deployed, how would NRTCachingDirectory behave?

I know it's not a typical use case, but I would like to know how it behaves in those different situations. Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/about-NRTCachingDirectory-tp4025665.html
Sent from the Solr - User mailing list archive at Nabble.com.
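A toy model of the write policy NRTCachingDirectory is documented to follow may help answer these questions (this is illustrative Python, not Lucene code; the 5 MB / 60 MB thresholds mirror commonly used constructor arguments and are assumptions here). A new segment is held in RAM only if it is small and fits in the cache budget, so a large replicated optimized segment would go straight to the delegate (e.g. MMapDirectory), while small soft-commit segments would be cached:

```python
# Toy model of NRTCachingDirectory's caching decision (not Lucene code).
MB = 1024 * 1024

def cache_in_ram(segment_size_bytes, cached_bytes,
                 max_merge_size_mb=5.0, max_cached_mb=60.0):
    """Return True if a newly written segment would be cached in RAM."""
    small_enough = segment_size_bytes <= max_merge_size_mb * MB
    fits_in_cache = cached_bytes + segment_size_bytes <= max_cached_mb * MB
    return small_enough and fits_in_cache

# A small 10-minute soft-commit segment of ~2 MB would be cached:
assert cache_in_ram(2 * MB, cached_bytes=0)
# A replicated, fully optimized multi-GB segment would not:
assert not cache_in_ram(5 * 1024 * MB, cached_bytes=0)
```

Under this model the answers would be: yes, it wraps the delegate; no, it would not cache a large replicated segment; and yes, it would cache the small incremental ones by default.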
Re: Versioning
It depends on exactly what you mean by versioning. But if you mean that every document in Solr gets a version number which is increased every time the document is updated, all you need to do is add a _version_ field to your schema: http://wiki.apache.org/solr/SolrCloud#Required_Config I believe you will get optimistic locking out of the box if you do this (you will also need the updateLog configured in solrconfig.xml). Or else you can take my patch for SOLR-3178 and have optimistic locking work as described on: http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics Regards, Per Steffensen

Sushil jain wrote: Hello everyone, I am a Solr beginner. I just want to know if versioning of data is possible in Solr; if yes, then please share the procedure. Thanks & Regards, Sushil Jain
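The optimistic-locking semantics described above can be sketched against a plain in-memory store (no live Solr involved; the sequential version numbering and the exception name are simplifications - Solr uses large internal version values and answers such a conflict with HTTP 409):

```python
# Sketch of optimistic concurrency with a _version_ field, simulated
# on a dict: an update that supplies a version must match the stored one.
store = {}  # uniqueKey -> document (each document carries "_version_")

class VersionConflict(Exception):
    """Stands in for the HTTP 409 Solr returns on a version mismatch."""

def update(doc_id, fields, expected_version=None):
    current = store.get(doc_id, {}).get("_version_", 0)
    if expected_version is not None and expected_version != current:
        raise VersionConflict(f"expected {expected_version}, found {current}")
    store[doc_id] = {**fields, "_version_": current + 1}
    return store[doc_id]["_version_"]

v1 = update("doc1", {"title": "first"})                        # no check
v2 = update("doc1", {"title": "second"}, expected_version=v1)  # succeeds
try:
    update("doc1", {"title": "stale"}, expected_version=v1)    # conflicts
except VersionConflict:
    pass  # a real client would re-read the doc and retry
```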
Re: stress testing Solr 4.x
Hi Mark, Usually I was stopping them with Ctrl-C, but several times one of the servers was hung and had to be stopped with kill -9. Thanks, Alain

On Mon, Dec 10, 2012 at 5:09 AM, Mark Miller markrmil...@gmail.com wrote: Hmmm... EOF on the segments file is odd... How were you killing the nodes? Just stopping them, or kill -9, or what? - Mark

On Sun, Dec 9, 2012 at 1:37 PM, Alain Rogister alain.rogis...@gmail.com wrote: Hi, I have re-run my tests today after I updated Solr 4.1 to apply the patch. First, the good news: it works, i.e. if I stop all three Solr servers and then restart one, it will try to find the other two for a while (about 3 minutes, I think), then give up, become the leader, and start processing requests. Now, the not-so-good: I encountered several exceptions that seem to indicate 2 other issues. Here are the relevant bits. 1) The ZK session expiry problem: not sure what caused it, but I did a few Solr or ZK node restarts while the system was under load. SEVERE: There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:732) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:696) at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1095) at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:265) at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84) at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184) at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116) at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46) at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused
by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/adressage/leaders/shard1 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244) at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63) at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:710) ... 10 more SEVERE: :org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue/qn- at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:210) at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:207) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63) at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:207) at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:229) at org.apache.solr.cloud.ZkController.publish(ZkController.java:824) at org.apache.solr.cloud.ZkController.publish(ZkController.java:797) at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:258) at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84) at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184) at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116) at 
org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46) at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) 2) Data corruption of 1 core on 2 out of 3 Solr servers. This core failed to start due to the exceptions below, and both servers went into a seemingly endless loop of exponential retries. The fix was to stop both faulty servers, remove the data directory of this core, and restart: replication then took place correctly. As above, not sure what exactly caused this to happen; no updates were taking place, only searches. On server 1: INFO: Closing
Re: Wildcards and fuzzy/phonetic query
Lowercasing actually seems to work with wildcard queries, but not with fuzzy queries. Is there any reason why I should see such a difference? Regards, Haagen

On 10 Dec 2012, at 13:24, Haagen Hasle wrote:
[quoted text elided]
RE: Modeling openinghours using multipoints
Maybe it would? I don't completely get your drift. But you're talking about a user writing a bunch of custom code to build, save, and query the bitmap, whereas working on top of existing functionality seems to me a lot more maintainable on the user's part.
~ David

From: Lance Norskog-2 [via Lucene]
Sent: Sunday, December 09, 2012 6:35 PM
To: Smiley, David W.
Subject: Re: Modeling openinghours using multipoints

If these are not raw times, but quantized on-the-hour, would it be faster to create a bit map of hours and then query across the bit maps?

On Sun, Dec 9, 2012 at 8:06 AM, Erick Erickson [hidden email] wrote: Thanks for the discussion, I've added this to my bag of tricks, way cool! Erick

On Sat, Dec 8, 2012 at 10:52 PM, britske [hidden email] wrote: Brilliant! Got some great ideas for this. Indeed, all sorts of use cases which use multiple temporal ranges could benefit. E.g.: another guy on Stack Overflow asked me about this some days ago. He wants to model multiple temporary offers per product (free shipping for Christmas, 20% discount for Black Friday, etc.). All possible with this out of the box. Factor 'offer category' into x and y as well for some extra powerful querying. Yup, I'm enthusiastic about it, which I'm sure you can tell :) Thanks a lot David, Cheers, Geert-Jan

On 9 Dec 2012, at 05:35, David Smiley (@MITRE.org) [via Lucene] [hidden email] wrote:

britske wrote: That's seriously awesome! Some change in the query though. You described: "To query for a business that is open during at least some part of a given time duration." I want: "To query for a business that is open during at least the entire given time duration." Feels like a small difference but probably isn't (I'm still wrapping my head around the intersect query, I must admit).

So this would be a slightly different rectangle query. Interestingly, you simply swap the location in the rectangle where you put the start and end time. In summary (minX minY maxX maxY):

- Indexed span CONTAINS query span: 0 end start *
- Indexed span INTERSECTS (i.e. OVERLAPS) query span: 0 start end *
- Indexed span WITHIN query span: start 0 * end

I'm using '*' here to denote the max possible value. At some point I may add that as a feature. That was a fun exercise! I give you credit for prodding me in this direction, as I'm not sure this use of spatial would have occurred to me otherwise.

britske wrote: Moreover, any indication on performance? Should, say, 50,000 docs with about 100-200 points each (1 or 2 open-close spans per day) be OK? (I know "your mileage may vary" etc., but just a guesstimate :)

You should have absolutely no problem. The real clincher in your favor is the fact that you only need 9600 discrete time values (so you said), not Long.MAX_VALUE. Using Long.MAX_VALUE would simply not be possible with the current implementation, because it uses doubles, which have 52 bits of precision, not the 64 that would be required to be a complete substitute for any time/date. Even given the 52 bits, a quad SpatialPrefixTree with maxLevels=52 would probably not perform well or might fail; not sure. Eventually, when I have time to work on an implementation that can be based on a configurable number of grid cells (not unlike how you can configure precisionStep on the Trie numeric fields), 52 should be no problem. I'll have to remember to refer back to this email on the approach if I create a field type that wraps this functionality.
~ David

britske wrote: Again, this looks good! Geert-Jan

2012/12/8 David Smiley (@MITRE.org) [via Lucene] [hidden email]:
Hello again Geert-Jan! What you're trying to do is indeed possible with Solr 4 out of the box. Other terminology people use for this is "multi-value time duration". This creative solution is a pure application of spatial without the geospatial notion -- we're not using an earth or other sphere model -- it's a flat plane. So no need to refer to longitude and latitude; it's x and y. I would put the opening time into x, and the closing time into y. To express a point, use "x y" (x space y), and supply this as a string to your SpatialRecursivePrefixTreeFieldType-based field for indexing. You can give it multiple values and it will work correctly; this is one of RPT's main features that set it apart from Solr 3 spatial. To query for a business that is open during at least some part of a given time duration, say 6-8 o'clock, the query would look like openDuration:Intersects(minX minY maxX maxY), and put 0 for minX (always), 6 for minY (start time), 8 for maxX (end time), and the largest possible value for maxY. You wouldn't
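The point-and-rectangle encoding in this thread can be checked with a few lines of plain Python - a model of the idea, not the Solr spatial code. Each open/close span is indexed as the point (x=open, y=close), and the duration predicates become rectangle-membership tests per the summary above (9600 stands in for the max discrete time value mentioned in the thread):

```python
# Model of the "time span as spatial point" trick from this thread.
MAX = 9600  # largest discrete time value (per the thread), plays the '*'

def in_rect(point, min_x, min_y, max_x, max_y):
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y

def intersects(span, start, end):
    # Indexed span INTERSECTS query span: minX minY maxX maxY = 0 start end *
    return in_rect(span, 0, start, end, MAX)

def contains(span, start, end):
    # Indexed span CONTAINS query span: minX minY maxX maxY = 0 end start *
    return in_rect(span, 0, end, start, MAX)

open_close = (6, 8)                    # open 6-8 o'clock
assert intersects(open_close, 7, 9)    # overlaps part of 7-9
assert not contains(open_close, 7, 9)  # but is not open for all of 7-9
assert contains(open_close, 6, 7)      # is open for all of 6-7
```

The CONTAINS case is Geert-Jan's "open during the entire given duration" query: the point must sit left of the start (opens early enough) and above the end (closes late enough).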
RE: Modeling openinghours using multipoints
Mikhail, a join of any nature should be a last resort compared to using a single index (when that's possible), especially if there is minimal to no denormalization of data. In this specific case, if the average document had 200 temporal ranges to index (100 days out, 2 per day), a join-based solution would have 200+1 documents in the index. That's an explosion of the document count by 200x! Yoyzah! Obviously what we're discussing here, modeling numeric ranges as x-y points, has its limits -- namely that the spatial module is currently limited to 2 dimensions. It's plausible to see it generalized, but I don't think it'll scale well beyond 4-5 dimensions. I recall a research paper about multi-dimensional numeric indexes seriously breaking down at about 6.
~ David

From: Mikhail Khludnev [via Lucene]
Sent: Monday, December 10, 2012 12:15 AM
To: Smiley, David W.
Subject: Re: Modeling openinghours using multipoints

Colleagues, what are the benefits of this approach in contrast to block join? Thanks

[quoted text elided]
Re: setting hostPort for SolrCloud
Thanks for the information. Bill

On Fri, Dec 7, 2012 at 3:04 PM, Mark Miller markrmil...@gmail.com wrote: Yup, solr.xml is pretty much required - especially if you want to use SolrCloud. The only reason anything works without it is for back compat. We are working towards removing the need for it, but it's considered required these days. - Mark

On Dec 7, 2012, at 11:04 AM, Bill Au bill.w...@gmail.com wrote: I actually was not using a solr.xml. I am only using a single core, with the default core name collection1. I know for sure I will not be using more than a single core, so I did not bother with having a solr.xml. Is that a bad thing? Everything worked when I had Tomcat configured to run on port 8983. But once I configured Tomcat to use a different port, I noticed that SolrCloud was still using port 8983, so it wasn't working. I then tried adding -Djetty.port=8000 and -DhostPort=8000 to the environment variable JAVA_OPTS before running the Tomcat start script bin/startup.sh. But SolrCloud was still using 8983. I ended up setting hostPort in solr.xml and got things working. If solr.xml is required, then I can just set the port for SolrCloud in there. But I was hoping I did not have to bother with solr.xml at all. One less configuration file, one less thing that can go wrong. Bill

On Wed, Dec 5, 2012 at 4:40 PM, Mark Miller markrmil...@gmail.com wrote: Be aware that you still have to set up Tomcat to run Solr on the right port - and you also have to provide the port to Solr on startup. With Jetty we do both with -Djetty.port - with Tomcat you have to set up Tomcat to run on the right port *and* tell Solr what that port is. By default that means also passing -Djetty.port - but you can change that to whatever you want in solr.xml (to hostPort or solr.port or whatever). The problem is that it's difficult for a webapp to find out what ports it's running on - to my knowledge you can only do it when a request actually comes in.
- Mark On Dec 5, 2012, at 1:05 PM, Bill Au bill.w...@gmail.com wrote: I am using tomcat. In my tomcat start script I have tried setting system properties with both -Djetty.port=8080 and -DhostPort=8080 but neither changed the host port for SolrCloud. It still uses the default 8983. Bill On Wed, Dec 5, 2012 at 12:11 PM, Jack Krupansky j...@basetechnology.com wrote: Solr runs in a container and the container controls the port. So, you need to tell the container which port to use. For example, java -Djetty.port=8180 -jar start.jar -- Jack Krupansky -Original Message- From: Bill Au Sent: Wednesday, December 05, 2012 10:30 AM To: solr-user@lucene.apache.org Subject: setting hostPort for SolrCloud Can hostPort for SolrCloud only be set in solr.xml? I tried setting the system property hostPort and jetty.port on the Java command line but neither of them work. Bill
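For reference, a sketch of what the relevant solr.xml section might look like in a Solr 4.x setup like the one discussed; the port value and the solr.port property name are illustrative (as Mark notes, the property name is whatever you choose in solr.xml):

```xml
<!-- Sketch of a Solr 4.x solr.xml; hostPort must match the container's
     connector port. solr.port and 8000 here are example values. -->
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         hostPort="${solr.port:8983}" hostContext="solr">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```

Tomcat would then be started with -Dsolr.port=8000 (or whatever matches its connector port) so SolrCloud registers itself under the port Tomcat is actually listening on.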
RE: Need help with delta import
It's surprising that your full import is working for you. Both your query and your deltaImportQuery have: SELECT ID FROM... ...so both your full import (query attr) and your delta import (deltaImportQuery attr) are only getting the ID field from your db. Shouldn't you at least be getting email and fname to index as well? By changing both these queries to something like: SELECT ID, EMAIL, FNAME FROM... ...you should see these 3 fields come through after your full import. Then, after changing data in your RDBMS and doing a delta, you should see the data update. Besides this, your log looks right:

  Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder collectDelta O: Completed ModifiedRowKey for Entity: person rows obtained : 8

...so it looks like it was going to update 8 rows. But seeing that your deltaImportQuery only pulls back the ID, it couldn't possibly change the values of fields like email and fname. Make sense?

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: umajava [mailto:umaj...@gmail.com]
Sent: Thursday, December 06, 2012 8:59 PM
To: solr-user@lucene.apache.org
Subject: Need help with delta import

Hi, I am trying to do a delta import and I am not able to get it to work. However, the full import does work. Could you please help me figure out what I am missing?
data-config.xml file:

<document name="persons">
  <entity name="person" pk="ID"
          query="select id from uma_test"
          deltaImportQuery="select id from uma_test where ID='${dataimport.delta.id}'"
          deltaQuery="select ID from uma_test where upd_ts &gt; '${dataimport.last_index_time}'">
    <field column="ID" name="id" indexed="true" stored="true" />
    <field column="email" name="email" indexed="true" stored="true" />
    <field column="fname" name="fname" indexed="true" stored="true" />
  </entity>
</document>

dataimport.properties file:

metadataObject.last_index_time=2012-09-20 11\:12\:47
person.last_index_time=2012-11-18 13\:54\:29
interval=10
port=8080
server=localhost
params=/select?qt\=/dataimport&command\=delta-import&clean\=false&commit\=true
webapp=solr
syncEnabled=1
last_index_time=2012-11-18 13\:54\:29
syncCores=coreHr,coreEn

log output:

Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DataImporter maybeReloadConfiguration O: Loading DIH Configuration: C://Software//apache-solr-4.0.0//apache-solr-4.0.0//Uma//db//db-data-config.xml
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DataImporter loadDataConfig O: Data Configuration loaded successfully
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DataImporter doDeltaImport O: Starting Delta Import
Dec-2012 02:49:24 org.apache.solr.core.SolrCore execute O: [collection1] webapp=/solr path=/dataimport params={commit=false&command=delta-import} status=0 QTime=16
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties O: Read dataimport.properties
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder doDelta O: Starting delta collection.
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder collectDelta O: Running ModifiedRowKey() for Entity: person
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.JdbcDataSource$1 call O: Creating a connection for entity person with URL: jdbc:mysql://localhost/test
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.JdbcDataSource$1 call O: Time taken for getConnection(): 125
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder collectDelta O: Completed ModifiedRowKey for Entity: person rows obtained : 8
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder collectDelta O: Completed DeletedRowKey for Entity: person rows obtained : 0
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder collectDelta O: Completed parentDeltaQuery for Entity: person
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder doDelta O: Delta Import completed successfully
Dec-2012 02:49:24 org.apache.solr.handler.dataimport.DocBuilder execute O: Time taken = 0:0:0.156
Dec-2012 02:49:24 org.apache.solr.update.processor.LogUpdateProcessor finish O: [collection1] webapp=/solr path=/dataimport params={commit=false&command=delta-import} status=0 QTime=16 {} 0
Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Need-help-with-delta-import-tp4025003.html Sent from the Solr - User mailing list archive at Nabble.com.
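Following James's advice, the entity might be rewritten like this (a sketch; it assumes the uma_test table actually has EMAIL and FNAME columns, and column-name case may matter depending on the JDBC driver):

```xml
<entity name="person" pk="ID"
        query="select ID, EMAIL, FNAME from uma_test"
        deltaImportQuery="select ID, EMAIL, FNAME from uma_test
                          where ID='${dataimport.delta.id}'"
        deltaQuery="select ID from uma_test
                    where upd_ts &gt; '${dataimport.last_index_time}'">
  <field column="ID" name="id" />
  <field column="EMAIL" name="email" />
  <field column="FNAME" name="fname" />
</entity>
```

Note that deltaQuery still returns only IDs (that is all it needs to find changed rows), while deltaImportQuery fetches the full row for each changed ID.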
Re: Intersect Circle is matching points way outside the radius ( Solr 4 Spatial)
Javi, The center point of your query circle and the indexed point are just under 49.9km apart (just under your query radius); this is why it matched. I plugged in your numbers here: http://www.movable-type.co.uk/scripts/latlong.html Perhaps you are misled by the projection you are using to view the map, on how far away the points are. FYI the default distErrPct of 0.025 should be fine in general and wasn't the issue. You should (almost) never use 0.0 on the field type because that means your indexed non-point shapes (rectangles, you said) will use a ton of indexed terms unless they are very small rectangles (relative to your grid resolution -- 1 meter in your case). Using distErrPct=0 in the query is safe, on the other hand. Cheers, David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Intersect-Circle-is-matching-points-way-outside-the-radius-Solr-4-Spatial-tp4025609p4025704.html Sent from the Solr - User mailing list archive at Nabble.com.
highlighting multiple occurrences
Hi all, I have a solr instance with one field configured for highlighting as follows:

<str name="hl">on</str>
<str name="hl.fl">conteudo</str>
<str name="hl.fragsize">500</str>
<str name="hl.maxAnalyzedChars">9</str>
<str name="hl.simple.pre">&lt;font style="background-color: yellow"&gt;</str>

but I was hoping to have the highlighter display multiple occurrences of the query instead of just the first one... is it possible? I tried searching this mailing list but I couldn't find anyone mentioning this... best regards, Rafael -- View this message in context: http://lucene.472066.n3.nabble.com/highlighting-multiple-occurrences-tp4025715.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: highlighting multiple occurrences
Did you mean that you want multiple snippets? http://wiki.apache.org/solr/HighlightingParameters#hl.snippets -Original Message- From: Rafael Ribeiro [mailto:rafae...@gmail.com] Sent: Monday, December 10, 2012 11:20 AM To: solr-user@lucene.apache.org Subject: highlighting multiple occurrences Hi all, I have a solr instance with one field configured for highlighting as follows:

<str name="hl">on</str>
<str name="hl.fl">conteudo</str>
<str name="hl.fragsize">500</str>
<str name="hl.maxAnalyzedChars">9</str>
<str name="hl.simple.pre">&lt;font style="background-color: yellow"&gt;</str>

but I was hoping to have the highlighter display multiple occurrences of the query instead of just the first one... is it possible? I tried searching this mailing list but I couldn't find anyone mentioning this... best regards, Rafael -- View this message in context: http://lucene.472066.n3.nabble.com/highlighting-multiple-occurrences-tp4025715.html Sent from the Solr - User mailing list archive at Nabble.com.
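For completeness: hl.snippets defaults to 1, which is why only one fragment comes back by default. A sketch of a request with it raised (the query and snippet count here are illustrative, not from the thread):

```text
q=conteudo:foo&hl=on&hl.fl=conteudo&hl.snippets=5&hl.fragsize=500
```

Solr will then return up to five highlighted fragments per document for the conteudo field.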
RE: Is there a way to round data when index, but still able to return original content?
When you apply your analyzers/filters/tokenizers, the resulting value is kept in the index; however, the input value is what is actually stored. For example, from the schema.xml file:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

This particular field type will strip out the HTML. So if the input is <b>Hello</b>, it's tokenized in the index as hello, but it's stored (and hence returned to you) as <b>Hello</b>. So you can create your own charFilter or filter class which converts your date for the indexer, but the original data will automatically be stored. I hope this makes sense. -Original Message- From: jefferyyuan [mailto:yuanyun...@gmail.com] Sent: Monday, December 10, 2012 10:24 AM To: solr-user@lucene.apache.org Subject: Re: Is there a way to round data when index, but still able to return original content? Erick, Thanks for your reply. I know how to implement solution 1, but have no idea how to implement solution 2 you mentioned: === If you put some sort of (perhaps custom) filter in place, then the original value would go in as stored and the altered value would get in the index and you could do both in the same field. Can you please describe more about how to store the original data and index the altered value in the same field? Thanks :) -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-round-data-when-index-but-still-able-to-return-original-content-tp4025405p4025695.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Modeling openinghours using multipoints
Bit maps can be done with a separate term for each bit. You search for all of the terms in the bit range you want. On 12/10/2012 06:34 AM, David Smiley (@MITRE.org) wrote: Maybe it would? I don't completely get your drift. But you're talking about a user writing a bunch of custom code to build, save, and query the bitmap, whereas working on top of existing functionality seems to me a lot more maintainable on the user's part. ~ David From: Lance Norskog-2 [via Lucene] [ml-node+s472066n4025579...@n3.nabble.com] Sent: Sunday, December 09, 2012 6:35 PM To: Smiley, David W. Subject: Re: Modeling openinghours using multipoints If these are not raw times, but quantized on-the-hour, would it be faster to create a bit map of hours and then query across the bit maps? On Sun, Dec 9, 2012 at 8:06 AM, Erick Erickson [hidden email] wrote: Thanks for the discussion, I've added this to my bag of tricks, way cool! Erick On Sat, Dec 8, 2012 at 10:52 PM, britske [hidden email] wrote: Brilliant! Got some great ideas for this. Indeed all sorts of use cases which use multiple temporal ranges could benefit.. E.g.: another guy on Stack Overflow asked me about this some days ago.. He wants to model multiple temporary offers per product (free shipping for Christmas, 20% discount for Black Friday, etc) .. All possible with this out of the box. Factor in 'offer category' in x and y as well for some extra powerful querying. Yup, I'm enthusiastic about it, which I'm sure you can tell :) Thanks a lot David, Cheers, Geert-Jan Sent from my iPhone On 9 Dec 2012, at 05:35, David Smiley (@MITRE.org) [via Lucene] [hidden email] wrote: britske wrote That's seriously awesome! Some change in the query though: You described: To query for a business that is open during at least some part of a given time duration I want To query for a business that is open during at least the entire given time duration.
Feels like a small difference but probably isn't (I'm still wrapping my head around the intersect query, I must admit) So this would be a slightly different rectangle query. Interestingly, you simply swap the location in the rectangle where you put the start and end time. In summary:

Indexed span CONTAINS query span: minX minY maxX maxY - 0 end start *
Indexed span INTERSECTS (i.e. OVERLAPS) query span: minX minY maxX maxY - 0 start end *
Indexed span WITHIN query span: minX minY maxX maxY - start 0 * end

I'm using '*' here to denote the max possible value. At some point I may add that as a feature. That was a fun exercise! I give you credit in prodding me in this direction as I'm not sure if this use of spatial would have occurred to me otherwise. britske wrote Moreover, any indication on performance? Should, say, 50.000 docs with about 100-200 points each (1 or 2 open-close spans per day) be ok? (I know 'your mileage may vary' etc. but just a guesstimate :) You should have absolutely no problem. The real clincher in your favor is the fact that you only need 9600 discrete time values (so you said), not Long.MAX_VALUE. Using Long.MAX_VALUE would simply not be possible with the current implementation because it's using Doubles, which have 52 bits of precision, not the 64 that would be required to be a complete substitute for any time/date. Even given the 52 bits, a quad SpatialPrefixTree with maxLevels=52 would probably not perform well or might fail; not sure. Eventually when I have time to work on an implementation that can be based on a configurable number of grid cells (not unlike how you can configure precisionStep on the Trie numeric fields), 52 should be no problem. I'll have to remember to refer back to this email on the approach if I create a field type that wraps this functionality. ~ David britske wrote Again, this looks good! Geert-Jan 2012/12/8 David Smiley (@MITRE.org) [via Lucene] [hidden email] Hello again Geert-Jan!
What you're trying to do is indeed possible with Solr 4 out of the box. Other terminology people use for this is multi-value time duration. This creative solution is a pure application of spatial without the geospatial notion -- we're not using an earth or other sphere model -- it's a flat plane. So no need to make reference to longitude latitude, it's x y. I would put opening time into x, and closing time into y. To express a point, use x y (x space y), and supply this as a string to your SpatialRecursivePrefixTreeFieldType based field for indexing. You can give it multiple values and it will work correctly; this is one of RPT's main features that set it apart from Solr 3 spatial. To query for a business that is open during at least some part of a given time duration, say 6-8 o'clock, the query would look like openDuration:Intersects(minX minY maxX maxY) and put 0 or minX (always), 6 for minY (start time), 8 for maxX (end time), and the largest
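Pulling the thread together, a sketch of what this might look like in practice (the field names and attribute values here are assumptions layered on David's description, not a tested config):

```xml
<!-- schema.xml: a flat (non-geo) plane where x = opening time, y = closing time,
     with 9600 discrete time values as discussed in the thread -->
<fieldType name="timeSpan" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="false" worldBounds="0 0 9600 9600"
           distErrPct="0" maxDistErr="1" units="degrees"/>
<field name="openDuration" type="timeSpan" multiValued="true"
       indexed="true" stored="true"/>
```

Each open-close span is then indexed as the point "start end" (e.g. "6 8"). Per the summary above, a business open during at least part of 6-8 would be matched by openDuration:"Intersects(0 6 8 9600)", and one open for the entire 6-8 span by openDuration:"Intersects(0 8 6 9600)".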
RE: highlighting multiple occurrences
yep! I tried enabling this and setting various values but no success... still it only shows the first fragment found for the search... I also saw this http://lucene.472066.n3.nabble.com/hl-snippets-in-solr-3-1-td2445178.html but increasing maxAnalyzedChars (which was already huge) produced no difference at all. Do I have to change anything else? For example, something in the velocity template??? best regards, Rafael -- View this message in context: http://lucene.472066.n3.nabble.com/highlighting-multiple-occurrences-tp4025715p4025771.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problems with UUIDUpdateProcessorFactory on SolrCloud
: In logs I can see some UUID is being generated when adding new document: : INFO: [selekta] webapp=/solr path=/update params={} : {add=[504a4ea8-7b82-48b6-a2fa-b8dd56376fd7]} 0 27 : but when I query Solr I got: : Dec 07, 2012 1:52:10 PM org.apache.solr.common.SolrException log : SEVERE: java.lang.NullPointerException : at : org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:879) Hmm... 1) exactly which version of solr are you using? 2) please show us the uniqueKey declaration from your schema.xml along with the field and fieldType declarations for that field. 3) what exacty does the config for your update chain look like? 4) are you certain every document was indexed using that chain, with that processor? you didn't have any old documents in your index? ...because that error seems to be suggesting that you have documents in your index w/o a stored value for your uniqueKey field (so the bug is happening when the results get merged) but solrcloud shouldn't be letting you do that at all. -Hoss
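For reference, a minimal chain wiring up UUIDUpdateProcessorFactory might look like the following (chain name, field name, and the default flag are assumptions, not the poster's actual config):

```xml
<!-- solrconfig.xml: generate a UUID for the uniqueKey field on every add -->
<updateRequestProcessorChain name="uuid" default="true">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

As Hoss's diagnosis implies, the uniqueKey field must be stored (e.g. <field name="id" type="string" indexed="true" stored="true"/> with <uniqueKey>id</uniqueKey>) or distributed result merging breaks.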
RE: highlighting multiple occurrences
Rafael, Can you share more on how you are rendering the results in your velocity template? The data is probably being sent to you, but you have to loop through and actually access the data. -Original Message- From: Rafael Ribeiro [mailto:rafae...@gmail.com] Sent: Monday, December 10, 2012 2:26 PM To: solr-user@lucene.apache.org Subject: RE: highlighting multiple occurrences yep! I tried enabling this and setting various values but no success... still it only shows the first fragment found for the search... I also saw this http://lucene.472066.n3.nabble.com/hl-snippets-in-solr-3-1-td2445178.html but increasing maxAnalyzedChars (which was already huge) produced no difference at all. Do I have to change anything else? For example, something in the velocity template??? best regards, Rafael -- View this message in context: http://lucene.472066.n3.nabble.com/highlighting-multiple-occurrences-tp4025715p4025771.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Is there a way to round data when index, but still able to return original content?
Sorry to ask a question again, but I want to round dates (TrieDateField) and TrieLongField values; it seems they don't support configuring an analyzer: charFilter, tokenizer, or filter. What should I do? Right now I am thinking of writing a custom date or long field; is there any other way? :) Thanks :) -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-round-data-when-index-but-still-able-to-return-original-content-tp4025405p4025793.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Nested document workaround?
Would http://search-lucene.com/?q=solr+join do it for you? Otis -- SOLR Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Dec 10, 2012 at 1:17 PM, Michael Jones michaelj...@gmail.com wrote: Hi, I realise that you can't get nested documents to search in Solr. But if I did this:

<doc>
  <field name="source">Test</field>
  <field name="type">bar</field>
  <field name="label">Map</field>
  <field name="date_long"/>
  <field name="date_short"/>
  <field name="persons-0446_name">Graham</field>
  <field name="persons-0446_link">foo</field>
  <field name="persons-0446_location">Crosby</field>
  <field name="persons-0446_office"/>
  <field name="persons-0188_name">Bob</field>
  <field name="persons-0188_link">foo</field>
  <field name="persons-0188_location">test</field>
  <field name="persons-0188_office"/>
  <field name="persons-0183_name">Denzil</field>
  <field name="persons-0183_link">foo</field>
  <field name="persons-0183_location">test</field>
  <field name="persons-0183_office"/>
</doc>

Could I still search for location with *_location ? Or is there another way to get relational data into Solr? Thanks
RE: Is there a way to round data when index, but still able to return original content?
Hi, Nope...they don't. Generally, I am not sure I'd bother rounding this information to reduce the index size. Have you determined how much index space you'll actually be saving? I am not confident it'd be worth your time; i.e. I'd just go with indexing/storing the time information as well. Regardless, if you do want to go this route, the only way I can think of that wouldn't be a complicated solution is to have one field that is indexed/rounded (and not stored) and another field that is just stored (and not indexed). Hope this helps. -Original Message- From: jefferyyuan [mailto:yuanyun...@gmail.com] Sent: Monday, December 10, 2012 3:14 PM To: solr-user@lucene.apache.org Subject: RE: Is there a way to round data when index, but still able to return original content? Sorry to ask a question again, but I want to round dates (TrieDateField) and TrieLongField values; it seems they don't support configuring an analyzer: charFilter, tokenizer, or filter. What should I do? Right now I am thinking of writing a custom date or long field; is there any other way? :) Thanks :) -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-round-data-when-index-but-still-able-to-return-original-content-tp4025405p4025793.html Sent from the Solr - User mailing list archive at Nabble.com.
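A sketch of the two-field layout suggested above (the field names are made up):

```xml
<!-- schema.xml: query/sort on the rounded field, display the original -->
<field name="created_rounded" type="tdate" indexed="true" stored="false"/>
<field name="created_orig"    type="tdate" indexed="false" stored="true"/>
```

Note that copyField cannot transform values, so either the client sends both the rounded and the original value, or a custom UpdateRequestProcessor does the rounding at index time; queries then hit created_rounded while responses return created_orig.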
Re: Nested document workaround?
How about aggregating all location fields into one searchable multi-valued field using copyField? It could be an index-only (not stored) field. Then, you just say all_locations:Crosby Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Dec 11, 2012 at 5:17 AM, Michael Jones michaelj...@gmail.com wrote: Hi, I realise that you can't get nested documents to search in Solr. But if I did this:

<doc>
  <field name="source">Test</field>
  <field name="type">bar</field>
  <field name="label">Map</field>
  <field name="date_long"/>
  <field name="date_short"/>
  <field name="persons-0446_name">Graham</field>
  <field name="persons-0446_link">foo</field>
  <field name="persons-0446_location">Crosby</field>
  <field name="persons-0446_office"/>
  <field name="persons-0188_name">Bob</field>
  <field name="persons-0188_link">foo</field>
  <field name="persons-0188_location">test</field>
  <field name="persons-0188_office"/>
  <field name="persons-0183_name">Denzil</field>
  <field name="persons-0183_link">foo</field>
  <field name="persons-0183_location">test</field>
  <field name="persons-0183_office"/>
</doc>

Could I still search for location with *_location ? Or is there another way to get relational data into Solr? Thanks
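A sketch of the copyField suggestion against the field names in Michael's example (the field types are assumptions):

```xml
<!-- schema.xml: collect every *_location value into one searchable field -->
<dynamicField name="*_location" type="string" indexed="true" stored="true"/>
<field name="all_locations" type="string" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="*_location" dest="all_locations"/>
```

With this in place, a query like all_locations:Crosby matches the document regardless of which person's location held the value.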
Retrieving one object
I have stored multiple objects with the values: uniqueUri, name, timestamp. There can be multiple objects with the same name, but they will have different timestamps (and different uniqueUris). I want to retrieve the object of a given name with the latest timestamp. As an example I might have 1. uniqueUri=99661, name=FOO, timestamp=1355174089270 2. uniqueUri=98765, name=FOO, timestamp=1355174089870 I want to retrieve only object 2. I have tried retrieving 1 row and sorting DESCENDING on timestamp, but this still returns the first object. How can I do this? -- View this message in context: http://lucene.472066.n3.nabble.com/Retrieving-one-object-tp4025810.html Sent from the Solr - User mailing list archive at Nabble.com.
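For what it's worth, a sort plus rows=1 should do this; a sketch of the request (field names from the post, handler path assumed):

```text
/select?q=name:FOO&sort=timestamp desc&rows=1&fl=uniqueUri,name,timestamp
```

If this still returns object 1, a common cause is the timestamp field not being sortable: sorting requires the field to be indexed, single-valued, and (for numeric ordering) a numeric type such as tlong rather than a text type.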
SolrCloud OOM heap space
Hi All, I am getting constant OOM errors on a SolrCloud instance (3 shards, 2 Solr instances in each shard, each server with 22GB of memory, Xmx = 12GB for Java). Here is an error log: http://pastie.org/private/dcga3kfatvvamslmtvrp0g As of now I am not indexing any more documents. The total size of the index on each server is around 36-40 GB. -Xmx12288m -DSTOP.PORT=8079 -DSTOP.KEY=ABC -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log -Djetty.port=8983 -DzkHost=ZooKeeperServer001:2181 -jar start.jar If anyone has faced similar issues please let me know. --Shreejay -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-OOM-heap-space-tp4025821.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Documentation issue: apache-solr-XXX.jar?
Thanks Shawn, I am looking at README.txt file and jars/wars that came with Solr 4 binary distribution. So, if it is out of date, should I do Jira request? Or are documentation fixes handled differently? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Dec 11, 2012 at 9:43 AM, Shawn Heisey s...@elyograg.org wrote: On 12/10/2012 3:08 PM, Alexandre Rafalovitch wrote: In README.txt, it says: dist/apache-solr-XX.jar The Apache Solr Libraries. This JAR file is needed to compile Apache Solr Plugins (see http://wiki.apache.org/solr/**SolrPluginshttp://wiki.apache.org/solr/SolrPluginsfor more information). But I cannot see that in my 4.0 distribution. Has that changed (and doc needs to be updated) or am I missing something? If you have the dist directory at all, then you probably have the binary distribution, which is what you'll need. If your downloaded file contains -src or the unpacked file does not contain a dist directory, then you likely have the source distribution. With the source distribution, you would have to compile the dist target. Better to just get the binary distribution. Looking at the dist directory on what I just downloaded, it appears that most of the functionality required for writing code related to Solr would actually be in apache-solr-core-4.0.0.jar, and depending on what you are doing, you may need one or more of the other jars there. It looks like whatever documentation you are reading is definitely out of date. Thanks, Shawn
RE: SolrCloud OOM heap space
Hi - the stack trace and preceding log entries look similar to what i've seen and reported on. A patch has just been attached to the issue, perhaps you can try it if the description matches your scenario and report back on Jira. https://issues.apache.org/jira/browse/SOLR-4144 -Original message- From:shreejay shreej...@gmail.com Sent: Mon 10-Dec-2012 23:22 To: solr-user@lucene.apache.org Subject: SolrCloud OOM heap space Hi All, I am getting constant OOM errors on a SolrCloud instance. (3 shards, 2 solr instance in each shard, each server with 22gb Of Memory, Xmx = 12GB for java ) . Here is a error log: http://pastie.org/private/dcga3kfatvvamslmtvrp0g As of now Iam not indexing any more documents. The total size of index on each server is around 36-40 GB. -Xmx12288m -DSTOP.PORT=8079 -DSTOP.KEY=ABC -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log -Djetty.port=8983 -DzkHost=ZooKeeperServer001:2181 -jar start.jar If anyone has faced similar issues please let me know. --Shreejay -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-OOM-heap-space-tp4025821.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Documentation issue: apache-solr-XXX.jar?
On 12/10/2012 3:51 PM, Alexandre Rafalovitch wrote: Thanks Shawn, I am looking at README.txt file and jars/wars that came with Solr 4 binary distribution. So, if it is out of date, should I do Jira request? Or are documentation fixes handled differently? Yes, filing a Jira issue is an excellent idea. The README for trunk (5.x) has the same out-of-date info, so put both 4.1 and 5.0 in for the 'fix' versions. I'm not terribly surprised that something like this slipped through the cracks. The number of incremental and major changes in Solr/Lucene that made up the 4.0 version is CRAZY. Thanks, Shawn
RE: SolrCloud OOM heap space
Thanks Markus. Is this issue only on the 4.x and 5.x branches? I am currently running a very recent build of the 4.x branch with an applied patch. I just want to make sure that this is not an issue with 4.0, in which case I can think of applying my patch to 4.0 instead of 4x or 5x. --Shreejay -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-OOM-heap-space-tp4025821p4025839.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Documentation issue: apache-solr-XXX.jar?
: Looking at the dist directory on what I just downloaded, it appears that most : of the functionality required for writing code related to Solr would actually : be in apache-solr-core-4.0.0.jar, and depending on what you are doing, you may : need one or more of the other jars there. It looks like whatever : documentation you are reading is definitely out of date. I'm confused ... how is this info out of date? 1) it's in a section labeled Files included in an Apache Solr binary distribution to try and make it clear that the dist/ dir only exists in the binary distributions. The dist/apache-solr-*.jar files are in fact what you need to compile against if you are building a plugin (admittedly: you don't need to compile against *all* of them for a simple plugin). Don't get me wrong: I'm not saying there is no problem -- the README.txt should be targeting new users/developers, not old farts like me who know all of this stuff in my sleep. If as a new user you are confused by README.txt then I am eager to change it to be less confusing, I'm just not sure I understand what the confusion is. -Hoss
RE: SolrCloud OOM heap space
Hi - We're using trunk (5x) but we don't see it on trunk builds from a few months ago. In the case of the linked issue the oom occurs some time after start up but i'm not sure this applies to you. You can test the patch if you think it applies to you, we will test it tomorrow. If the patch does not help you or you experience a different issue you might want to open a new issue. -Original message- From:shreejay shreej...@gmail.com Sent: Tue 11-Dec-2012 00:31 To: solr-user@lucene.apache.org Subject: RE: SolrCloud OOM heap space Thanks Markus. Is this issue only on 4.x and 5.x branches? I am currently running a v recent build of 4.x branch with an applied patch. I just want to make sure that this is not an issue with 4.0. In which case I can think of applying my patch to 4.0 instead of 4x or 5x. --Shreejay -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-OOM-heap-space-tp4025821p4025839.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Documentation issue: apache-solr-XXX.jar?
Hi Chris (Hoss?), The issue is that README refers to a specific file apache-solr-XXX.jar, which does not exist. There is apache-solr-4.0.0.war which is referred in a para before, but not this one. So, maybe the fix is just to say that there is a bunch of jars now. (apache-solr-component-XXX.jar ?) Anyway, I created https://issues.apache.org/jira/browse/SOLR-4163. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Dec 11, 2012 at 10:25 AM, Chris Hostetter hossman_luc...@fucit.orgwrote: : Looking at the dist directory on what I just downloaded, it appears that most : of the functionality required for writing code related to Solr would actually : be in apache-solr-core-4.0.0.jar, and depending on what you are doing, you may : need one or more of the other jars there. It looks like whatever : documentation you are reading is definitely out of date. I'm confused ... how is this info out ot date? 1) it's in a section labled Files included in an Apache Solr binary distribution to try and make it clear that the dist/ dir only exists in the binary distributions. the dist/apache-solr-*.jar files are in fact what you need to compile against if you are building a plugin (admitedly: you don't need to compile against *all* of them for a simple plugin). Don't get me wrong: I'm not saying there is no problem -- the README.txt should be targeting knew users/developers, not old farts like me who know all of this stuff in my sleep. If as a new uuser you are confused by README.txt then i am eger to change it to be less confusing, i'm just not sure i understand what the confusion is. -Hoss
RE: SolrCloud OOM heap space
Thanks Markus. I will apply the patch to the 4x branch I have, and report back. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-OOM-heap-space-tp4025821p4025858.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Intersect Circle is matching points way outside the radius ( Solr 4 Spatial)
Hi David, As it happens the points are using the right projection; I can see them in the same positions using the page you provided. There is something wrong with the radius of the circle, though. I need to investigate that, but it is a relief to know that there is nothing wrong with Solr and that I didn't mix up the concepts; as in many cases, the problem is somewhere else where you would never imagine. Thanks for the hint. Cheers, Javier On 11 December 2012 02:47, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: Javi, The center point of your query circle and the indexed point are just under 49.9km apart (just under your query radius); this is why it matched. I plugged in your numbers here: http://www.movable-type.co.uk/scripts/latlong.html Perhaps you are misled by the projection you are using to view the map, on how far away the points are. FYI the default distErrPct of 0.025 should be fine in general and wasn't the issue. You should (almost) never use 0.0 on the field type because that means your indexed non-point shapes (rectangles, you said) will use a ton of indexed terms unless they are very small rectangles (relative to your grid resolution -- 1 meter in your case). Using distErrPct=0 in the query is safe, on the other hand. Cheers, David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Intersect-Circle-is-matching-points-way-outside-the-radius-Solr-4-Spatial-tp4025609p4025704.html Sent from the Solr - User mailing list archive at Nabble.com.
How to parse XML attributes with prefix using DIH?
Hi there, I'm new to Solr and DIH; recently I've been planning to use Solr/DIH to index some local XML files. Following the DIH example page on the Solr wiki, most things work fine, but I found that XML attributes with a prefix cannot be parsed. Take the following XML file to be indexed, for instance:
---
<book xmlns:bk='urn:samples' bk:genre='novel' self='test1'>
  <id>test</id>
  <title>Pride And Prejudice</title>
</book>
---
The data-config.xml is like:
---
<field column="tsip.action" xpath="/book/@xmlns:bk" />
<field column="tsip.cc" xpath="/book/@bk:genre" />
<field column="tsip.se" xpath="/book/@self" />
<field column="tsip.ki" xpath="/book/id" />
---
And all the columns have corresponding field definitions in schema.xml. But in the index result, only the following fields contain values:
---
<doc>
  <str name="tsip.se">test</str>
  <str name="tsip.ki">test</str>
  <date name="timestamp">2012-12-11T09:26:42.716Z</date>
</doc>
---
Which means I cannot get the values for the attributes with prefixes: tsip.action and tsip.cc. What configuration do I need to let DIH parse these prefixed attributes? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-parse-XML-attributes-with-prefix-using-DIH-tp4025888.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCell takes InputStream
: However my raw files are stored on some remote storage devices. I am able to : get an InputStream object for the file to be indexed. To me it may seem : awkward to have the file temporarily stored locally. Is there a way of : directly passing the InputStream in (e.g. constructing ContentStream using : the InputStream)? Sure, go right ahead. ContentStream is a really simple abstraction designed to make it easy to add some common pieces of information to either an InputStream or a Reader. Take a look at ContentStreamBase as a starting point for creating your own subclass that can point to whatever InputStream you want. -Hoss
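A minimal sketch of what Hoss suggests. The abstract base below is a local stand-in mirroring the shape of Solr's `org.apache.solr.common.util.ContentStreamBase` (protected `name`/`sourceInfo`/`contentType` fields, abstract `getStream()`), included only so the sketch is self-contained; in a real project you would extend the Solr class directly.

```java
import java.io.IOException;
import java.io.InputStream;

// Stand-in for org.apache.solr.common.util.ContentStreamBase, included only
// to keep this sketch self-contained; extend the real Solr class in practice.
abstract class ContentStreamBase {
    protected String name;
    protected String sourceInfo;
    protected String contentType;

    public abstract InputStream getStream() throws IOException;

    public String getName() { return name; }
    public String getContentType() { return contentType; }
}

// A ContentStream backed directly by an InputStream from remote storage,
// so the raw file never has to be written to a local temporary file first.
class RemoteContentStream extends ContentStreamBase {
    private final InputStream stream;

    RemoteContentStream(InputStream stream, String name, String contentType) {
        this.stream = stream;
        this.name = name;
        this.contentType = contentType;
        this.sourceInfo = "remote storage";
    }

    @Override
    public InputStream getStream() {
        return stream; // hand back the remote stream as-is
    }
}
```

The instance can then be passed wherever Solr(J) accepts a ContentStream, e.g. attached to an extract request for SolrCell.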
Re: Different schema.xml versions in the binary distribution
Seems like a good idea. Could you open a JIRA issue for this task? Mark Sent from my iPhone On Dec 10, 2012, at 6:44 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Hello, I lost a good several hours on this, so I wanted to check whether this is fixable. In the (binary) distribution of Solr 4, there is a large number of schema.xml files, but their version numbers are all over the place. Some are 1.1, others 1.2, and - I think - only one is on 1.5. I am talking about: <schema name="rss" version="1.1"> Given that this tiny number strongly affects defaults, and I think in some cases flips them, getting it wrong (e.g. by copying the DIH example instead of the main one) could really get a newbie confused. Would it be possible to make them all use the latest version? What would the testing process be? Would it be sufficient to index once with the existing value, change the value, build a second index, and do a binary file compare? Or is Lucene not as predictable as that (e.g. due to internal timestamps)? Or is there a way to export a normative encoding of the field definitions and compare what changed and which fields/properties need to be set explicitly after a version change? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: - Solr 4.0 - How do I enable JSP support ? ...
For anyone else looking to run JSPs on Solr 4.0, note that supplying OPTIONS=jsp to the server etc. doesn't work (check out the startup config in start.jar and you'll see why) - don't bother with all that. Instead, do the following: create a directory ext under $SOLR_HOME\example\lib, then copy the following jar files to this new folder ($SOLR_HOME\example\lib\ext): ant-1.8.2.jar, ant-launcher.jar, jsp-2.1-glassfish-2.1.v20091210.jar, jsp-api-2.1-glassfish-2.1.v20091210.jar, tools.jar. The ant and glassfish jars can be downloaded from http://search.maven.org; copy tools.jar from your JDK 1.6+ installation. Restart Solr.
Re: SolrCloud - Query performance degrades with multiple servers
I missed this bug report! https://issues.apache.org/jira/browse/SOLR-3912 Will fix this very shortly. It's a problem with numShards=1. - Mark On Sun, Dec 9, 2012 at 4:21 PM, sausarkar sausar...@ebay.com wrote: Thank you very much, will wait for the results from your tests. From: Mark Miller-3 [via Lucene] ml-node+s472066n4025457...@n3.nabble.com Date: Saturday, December 8, 2012 11:08 PM To: Sarkar, Sauvik sausar...@ebay.com Subject: Re: SolrCloud - Query performance degrades with multiple servers If that's true, we will fix it for 4.1. I can look closer tomorrow. Mark Sent from my iPhone On Dec 9, 2012, at 2:04 AM, sausarkar [hidden email] wrote: Spoke too early; it seems that SolrCloud is still distributing queries to all the servers even if numShards=1. We are seeing POST requests to all servers in the cluster; please let me know what the solution is. Here is an example (the variable isShard should be false in our case as it is a single shard, please help): POST /solr/core0/select HTTP/1.1 Content-Charset: UTF-8 Content-Type: application/x-www-form-urlencoded; charset=UTF-8 User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0 Content-Length: 991 Host: server1 Connection: Keep-Alive lowercaseOperators=true&mm=70%&fl=EntityId&df=EntityId&q.op=AND&q.alt=*:*&qs=10&stopwords=true&defType=edismax&rows=3000&q=*:*&start=0&fsv=true&distrib=false&isShard=true&shard.url=server1:9090/solr/core0/|server2:9090/solr/core0/|server3:9090/solr/core0/&NOW=1354918880447&wt=javabin&version=2 Re: SolrCloud - Query performance degrades with multiple servers Dec 06, 2012; 6:29pm — by Mark Miller-3 On Dec 6, 2012, at 5:08 PM, sausarkar [hidden email] wrote: We solved the issue by explicitly adding the numShards=1 argument to the Solr startup script. Is this a bug? Sounds like it…perhaps related to SOLR-3971…not sure though.
- Mark
Re: difference these two queries
Hi, The fq one is a FilterQuery that only does matching, not scoring. Its results are stored in the filter cache, while q uses the query cache. Otis -- SOLR Performance Monitoring - http://sematext.com/spm/index.html On Mon, Dec 10, 2012 at 10:11 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Sorry for spamming if this question has already been asked. What's the main difference between q=fieldA:value AND fieldB:value and q=fieldA:value&fq=fieldB:value? Both queries will give me the same result; I wonder what the main difference is and, in practice, which is the better way? Thanks in advance, Floyd
Re: difference these two queries
Thanks Otis. When talking about query performance (ignoring scoring), is it better to use fq? Floyd 2012/12/11 Otis Gospodnetic otis.gospodne...@gmail.com Hi, The fq one is a FilterQuery that only does matching, not scoring. Its results are stored in the filter cache, while q uses the query cache. Otis -- SOLR Performance Monitoring - http://sematext.com/spm/index.html On Mon, Dec 10, 2012 at 10:11 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Sorry for spamming if this question has already been asked. What's the main difference between q=fieldA:value AND fieldB:value and q=fieldA:value&fq=fieldB:value? Both queries will give me the same result; I wonder what the main difference is and, in practice, which is the better way? Thanks in advance, Floyd
Re: difference these two queries
If you don't need scoring on it then yes, just use fq. Otis -- SOLR Performance Monitoring - http://sematext.com/spm/index.html On Mon, Dec 10, 2012 at 10:34 PM, Floyd Wu floyd...@gmail.com wrote: Thanks Otis. When talking about query performance (ignoring scoring), is it better to use fq? Floyd 2012/12/11 Otis Gospodnetic otis.gospodne...@gmail.com Hi, The fq one is a FilterQuery that only does matching, not scoring. Its results are stored in the filter cache, while q uses the query cache. Otis -- SOLR Performance Monitoring - http://sematext.com/spm/index.html On Mon, Dec 10, 2012 at 10:11 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Sorry for spamming if this question has already been asked. What's the main difference between q=fieldA:value AND fieldB:value and q=fieldA:value&fq=fieldB:value? Both queries will give me the same result; I wonder what the main difference is and, in practice, which is the better way? Thanks in advance, Floyd
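Concretely, the two forms Otis is contrasting look like this (hypothetical field names; the path prefix depends on your core setup):

```
# Both clauses are part of the main query: both match AND score, and the
# whole result set is cached as a single entry in the queryCache.
/solr/select?q=fieldA:value+AND+fieldB:value

# fieldB:value becomes a filter: it restricts the result set but contributes
# nothing to the score, and its matching document set is cached on its own
# in the filterCache, where later queries can reuse it.
/solr/select?q=fieldA:value&fq=fieldB:value
```

Because each fq clause is cached independently, clauses that recur across many different searches (categories, languages, access restrictions, date windows) tend to be the best candidates for fq.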
Re: Intersect Circle is matching points way outside the radius ( Solr 4 Spatial)
Javier, I want to expand upon what I said; you might already get this point, but others may come along and read this who might not. Naturally you are using a 2D map, as most applications do (Google Earth is the stand-out exception), and fundamentally this means the map is projected -- it has to be. There isn't a right (correct) projection, generally speaking. Most/all web-based map APIs are strictly web mercator. If you have a map GUI selection tool in which a circle is drawn, a perfect-looking round circle, then it's a lie unless you're looking directly at the equator. If the intent is for the user to draw a distance-based circle, then ideally your map tool should draw an elliptical-looking circle if it's to be accurate. This is why you got confused; you saw a circle, yet the point wasn't drawn in the circle, because that circle *should have been* stretched vertically to barely pass it. If on the other hand you intend for the query shape to be exactly what it displays to be (what appears to be a perfect circle), even though this means the true geodetic shape is not a perfect circle, then you could use geo=false (and configure some other attributes) so that you are using standard planar math, not geodetic. Then your query shape would appear to work correctly, but IMO it's misleading compared to the first option (draw an ellipse, not a circle). The circle misleads the user; it misled you. ~ David Javier Molina wrote: Hi David, As it happens the points are using the right projection; I can see them in the same position using the page you just provided. There is something wrong with the radius of the circle, though, which I need to investigate, but it is a relief to know that there is nothing wrong with Solr and that I didn't mix up the concepts; as in many cases, the problem is somewhere you would never imagine. Thanks for the hint.
Cheers, Javier On 11 December 2012 02:47, David Smiley (@MITRE.org) wrote: Javi, The center point of your query circle and the indexed point is just under 49.9km apart (just under your query radius); this is why it matched. I plugged in your numbers here: http://www.movable-type.co.uk/scripts/latlong.html Perhaps you are misled by the projection you are using to view the map, on how far away the points are. FYI the default distErrPct of 0.025 should be fine in general and wasn't the issue. You should (almost) never use 0.0 on the field type, because that means your indexed non-point shapes (rectangles, you said) will use a ton of indexed terms unless they are very small rectangles (relative to your grid resolution -- 1 meter in your case). Using distErrPct=0 in the query is safe, on the other hand. Cheers, David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
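For David's second option (treating the drawn circle as the literal planar query shape), the field type can be switched to planar math with geo=false. A hypothetical Solr 4 schema.xml sketch; the field type name and attribute values here are assumptions to adapt to your data:

```
<!-- Planar (non-geodetic) spatial field: shapes and distances are computed
     with standard 2D math in the units of worldBounds, not in kilometers. -->
<fieldType name="location_planar"
           class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="false"
           worldBounds="-180 -90 180 90"
           distErrPct="0.025" />
```

Note that with geo=false, a circle's radius in queries is expressed in the same planar units as worldBounds (degrees here), so the trade-off David describes applies: the shape matches what the flat map shows, but it no longer corresponds to a true ground distance.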
Re: Update / replication of offline indexes
You do not need to manage online and offline indexes. Commit when you are done with your updates and Solr will take care of it for you. The changes are not live until you commit. wunder On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote: Hi, How can we do a delta update of offline indexes? We have the master index on which data import will be done. The index directory will be copied to the slave machine in case of a full update, via CD, as the slave/client machine is offline. So, what should be the approach for getting the delta to the slave? I can think of two approaches. 1. Create separate indexes of the delta on the master machine, copy them to the slave machine, and merge. Before merging the indexes on the client machine, delete all the updated and deleted documents on the client machine, else the merge will add duplicates. So along with the index, we need to transfer the list of documents that have been updated/deleted. 2. Extract all the documents that have changed since a particular time as XML/JSON and index them on the client machine. The indexes are huge, so we cannot roll over the index every time. Please help me with your take and the challenges you see in the above approaches, and please suggest any other approach you think would be better. Thanks a ton! Regards, Dikchant -- Walter Underwood wun...@wunderwood.org
Re: Update / replication of offline indexes
Hi Walter, Thanks for the response. Commit will help to reflect changes on Box1; we are able to achieve this. We want the changes to be reflected on Box2. We have two indexes. Say Box1: master; the DB has been set up and data import runs on this. Box2: slave. We want all the updates on Box1 to be merged into/present in the index on Box2. The two boxes are not connected over a network. How can we achieve this? Please let me know if I am not being clear. Thanks again! Regards, Dikchant On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wun...@wunderwood.org wrote: You do not need to manage online and offline indexes. Commit when you are done with your updates and Solr will take care of it for you. The changes are not live until you commit. wunder On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote: Hi, How can we do a delta update of offline indexes? We have the master index on which data import will be done. The index directory will be copied to the slave machine in case of a full update, via CD, as the slave/client machine is offline. So, what should be the approach for getting the delta to the slave? I can think of two approaches. 1. Create separate indexes of the delta on the master machine, copy them to the slave machine, and merge. Before merging the indexes on the client machine, delete all the updated and deleted documents on the client machine, else the merge will add duplicates. So along with the index, we need to transfer the list of documents that have been updated/deleted. 2. Extract all the documents that have changed since a particular time as XML/JSON and index them on the client machine. The indexes are huge, so we cannot roll over the index every time. Please help me with your take and the challenges you see in the above approaches, and please suggest any other approach you think would be better. Thanks a ton! Regards, Dikchant -- Walter Underwood wun...@wunderwood.org
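For approach 1 in the quoted message, once the updated/deleted documents have been purged from the slave's index, the shipped delta index can be folded in with Solr's CoreAdmin MERGEINDEXES action (hypothetical host, core name, and path below):

```
# Merge the delta index (copied from CD) into the slave's core0.
# NOTE: mergeindexes is a raw segment append and does NOT deduplicate,
# which is exactly why the purge step must happen first.
http://slave:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/media/cd/delta-index
```

A commit on core0 afterwards makes the merged documents visible to searches.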