Re: Clone (or Restore) SolrCloud
Hi David, The parent metadata persists only until the sub-shards become active. The logic that makes the sub-shards active depends on knowing when all 'sibling' sub-shards' replicas have recovered successfully, and we store the parent to make that easier to look up. Once all replicas of all sub-shards have recovered, the shard states are updated. The 'updateshardstate' command also removes the 'parent' key from the sub-shards while switching them to 'active'. If you're seeing the 'parent' key on an 'active' sub-shard then it may be a bug. Please paste your clusterstate and I'll look into why it was left over. On Mon, Feb 3, 2014 at 10:19 AM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: I think I figured this out; I hope people find this useful. It may not be possible to declare what the hash ranges are when you create the collection, but you *can* do so when you split, via the 'ranges' parameter, which is a comma-delimited list. So this means you can create a new collection with one shard and then immediately split it to the desired ranges to line up with those of your backup. I also observed that if you create a collection and then split every shard (in 2), the result is equivalent to a collection created with twice as many shards to begin with. I hoped that was so and verified that the ranges end up the same both ways. The only thing that seems like it may be benign, though I'm not 100% certain, is that if you split a shard, the new shards have a 'parent' reference to the name of the shard they were split from. That reference remains even if you delete the parent shard (it's not needed anymore; it becomes inactive). I'm not sure why this metadata is recorded because, at least after the split, I can't see why it's pertinent to anything. ~ David David Smiley (@MITRE.org) wrote: Hi, I'm attempting to come up with a SolrCloud restore/clone process, either to recover to a known good state or to clone the environment for experimentation. At the moment my process involves either creating a new ZooKeeper environment or at least deleting the existing collection so that I can create a new one. This works; I use the Core API: the first command defines the collection parameters, and I invoke it once for each replica. I don't use the Collection API because I don't want SolrCloud to go off creating all the replicas on its own -- I know where each one should be pre-positioned. What I'm concerned about is what happens once I start wanting to use shard splitting, *especially* if I don't want to split all shards because the shards are uneven due to custom routing (e.g. id:customer!myid). In that case I don't know how to create the collection with the post-split hash ranges. Solr doesn't have an API for me to explicitly say what the hash ranges should be on each shard (to match up with a backup). And I'm concerned about undocumented pitfalls that may exist in manually constructing a clusterstate.json, as another approach. Any ideas? ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- Regards, Shalin Shekhar Mangar.
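[Editor's note] As an illustration of the create-then-split trick David describes, here is a minimal SolrJ sketch. It is a sketch only, assuming a Solr 4.x SolrJ client, a local Solr at localhost:8983, and a hypothetical one-shard collection named 'restored' whose shard1 covers the full hash range:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class SplitToRanges {
      public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // SPLITSHARD with an explicit 'ranges' list so the resulting
        // sub-shard hash ranges line up with those of the backup.
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("action", "SPLITSHARD");
        params.set("collection", "restored");                  // hypothetical collection name
        params.set("shard", "shard1");
        params.set("ranges", "80000000-ffffffff,0-7fffffff");  // example target ranges
        QueryRequest request = new QueryRequest(params);
        request.setPath("/admin/collections");                 // Collections API endpoint
        server.request(request);
        server.shutdown();
      }
    }

The same call can of course be made over plain HTTP; the ranges string here is just the two halves a one-shard collection would normally split into.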
Apache Solr.
Hi Team, I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files, and I am now trying to index PDF files but am not able to. Can you give me the steps to carry out PDF indexing? It would be very useful. Kindly guide me through this process. Thanks Regards. Vignesh.V Ninestars Information Technologies Limited., 72, Greams Road, Thousand Lights, Chennai - 600 006. India. Landline : +91 44 2829 4226 / 36 / 56 X: 144 http://www.ninestars.in
Solr and SDL Tridion Integration
Hi, I want to index SDL Tridion content into Solr. Can you suggest how this can be achieved? Is there any documentation/tutorial for this? Thanks, Prasi
Fwd: Need help for integrating solr-4.5.1 with UIMA
Hi, I'm trying to integrate Solr 4.5.1 with UIMA, following the steps in solr-4.5.1\contrib\uima\readme.txt. I edited solrconfig.xml as given in the readme, and I have registered the required keys. But each time I index data, Solr returns this error:

Feb 3, 2014 2:04:32 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(405)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException
  at org.apache.uima.annotator.calais.OpenCalaisAnnotator.process(OpenCalaisAnnotator.java:206)
  at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:56)
  at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
  at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
  at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
  at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.init(ASB_impl.java:409)
  at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
  at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
  at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
  at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)
  at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:173)
  at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:79)
  at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
  at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
  at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection timed out: connect
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
  at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
  at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
  at java.net.Socket.connect(Socket.java:529) at
Re: Solr and SDL Tridion Integration
This is a new one. You may want to start from Tridion's list and ask about APIs, exports, or any other ways to get at the data. Then come back with a more specific question once you know what the data looks like and the granularity of updates (hook on document change vs. full export only). Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Mon, Feb 3, 2014 at 4:16 PM, Prasi S prasi1...@gmail.com wrote: Hi, I want to index SDL Tridion content into Solr. Can you suggest how this can be achieved? Is there any documentation/tutorial for this? Thanks, Prasi
Re: Apache Solr.
Hi Vignesh, a few keywords for further investigation: * Solr Data Import Handler * Apache Tika * Apache PDFBox Cheers, Siegfried Goeschl On 03.02.14 09:15, vignesh wrote: Hi Team, I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files, and I am now trying to index PDF files but am not able to. Can you give me the steps to carry out PDF indexing? It would be very useful. Kindly guide me through this process. Thanks Regards. Vignesh.V Ninestars Information Technologies Limited., 72, Greams Road, Thousand Lights, Chennai - 600 006. India. Landline : +91 44 2829 4226 / 36 / 56 X: 144 http://www.ninestars.in
Special NGRAMish requirement
Hi, we need to use something very similar to EdgeNGram (minGramSize=1 maxGramSize=50 side=front). The only thing missing is that we would like to reduce the number of matches: the requirement is to return only those matches with the longest tokens (or terms, if that is the right word). Is there a way to do this in Solr (not necessarily with EdgeNGram)? Thanks, Alexander
Re: Solr and SDL Tridion Integration
If SDL Tridion can export to CSV format, Solr can then import from CSV format. Otherwise, you may have to write a custom script, or maybe even Java code, to read from SDL Tridion and output a supported Solr format such as Solr XML, Solr JSON, or CSV. -- Jack Krupansky -Original Message- From: Prasi S Sent: Monday, February 3, 2014 4:16 AM To: solr-user@lucene.apache.org Subject: Solr and SDL Tridion Integration Hi, I want to index SDL Tridion content into Solr. Can you suggest how this can be achieved? Is there any documentation/tutorial for this? Thanks, Prasi
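[Editor's note] If the CSV route is taken, posting the export to Solr's CSV handler is straightforward. Here is a hedged SolrJ sketch, assuming a Solr 4.x client, a core at localhost:8983/solr/collection1, a hypothetical export file path, and a CSV whose first row holds the field names:

    import java.io.File;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class CsvImport {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Stream the CSV export to the CSV update handler.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
        req.addFile(new File("/data/tridion-export.csv"), "text/csv"); // hypothetical path
        req.setParam("header", "true");                    // first row contains the field names
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // commit when done
        server.request(req);
        server.shutdown();
      }
    }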
weird exception on update
Hello! We are hitting a really strange and nasty issue when trying to delete by query, but not when just adding documents. The exception says: http://pastebin.com/B1x5dAF7 Any ideas as to what is going on? The delete by query references the unique field, and the core's index does not contain the value that is being deleted. Solr: 4.3.1. -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
Score of Search Term for every character remove
Hi, I'm new to using Solr and I'm curious whether it is capable of doing the following or something similar. Sample:

Query: ABCDEF
Returns:
ABCDEF 0 hits
ABCDE 2 hits
ABCD 3 hits
ABC 10 hits
AB 20 hits
A 100 hits

In one request only. Thanks. Abner G. Lusung Jr. | Java Web Development, Internet and Commerce, Global Web Services | Vishay Philippines Inc. 10th Floor Pacific Star Building, Makati Avenue corner Buendia Avenue, Makati City, Philippines 1200 Phone: +63 2 8387421 loc. 7995 | Mobile: +63 9169674514 | Website: http://www.vishay.com/
Re: Import data from MySQL to Solr
I've been using DIH to import large databases into XML file batches, and it's blazing fast. alexei martchenko Facebook http://www.facebook.com/alexeiramone | LinkedIn http://br.linkedin.com/in/alexeimartchenko | Steam http://steamcommunity.com/id/alexeiramone/ | 4sq https://pt.foursquare.com/alexeiramone | Skype: alexeiramone | Github https://github.com/alexeiramone | (11) 9 7613.0966 | 2014-02-03 rachun rachun.c...@gmail.com: Dear all gurus, I would like to import my data (MySQL), about 4 million rows, into Solr 4.6. What is the best way to do it? Please suggest me. Million thanks, Chun.
Re: Geospatial clustering + zoom in/out help
Hi David, I was hoping to get an answer on a geospatial topic from you :). These links basically confirm that the approach I wanted to take should work OK with a similar (or even bigger) amount of data than I plan to have. Instead of my custom NxM division of the world, I'll try the existing geohash encoding; it may be good enough (and will be quicker to implement). Thanks! Bojan On Fri, Jan 31, 2014 at 8:27 PM, Smiley, David W. dsmi...@mitre.org wrote: Hi Bojan. You've got some good ideas here, along the lines of some that others have tried. I threw together a page on the wiki about this subject some time ago that I'm sure you will find interesting. It references a relevant Stack Overflow post, and also a presentation at DrupalCon which had a segment from a guy using the same approach you suggest here, involving field collapsing and/or the stats component. The video shows it in action. http://wiki.apache.org/solr/SpatialClustering It would be helpful for everyone if you share your experience with whatever you choose, once you give an approach a try. ~ David From: Bojan Šmid [bos...@gmail.com] Sent: Thursday, January 30, 2014 1:15 PM To: solr-user@lucene.apache.org Subject: Geospatial clustering + zoom in/out help Hi, I have an index with 300K docs with lat,lon. I need to cluster the docs based on lat,lon for display in the UI. The user then needs to be able to click on any cluster and zoom in (up to 11 levels deep). I'm using Solr 4.6 and I'm wondering how best to implement this efficiently. A few more specific questions below. I need to: 1) cluster data points at different zoom levels 2) click on a specific cluster and zoom in 3) be able to select a region (bounding box or polygon) and show clusters in the selected area. What's the best way to implement this so that queries are fast? What I thought I would try (but maybe there are better ways): * divide the world into NxM large squares, then each of these squares into 4 more squares, and so on - 11 levels deep * at index time, figure out all the squares (at all 11 levels) each data point belongs to and index that info into 11 different fields: e.g. id=1 name=foo lat=x lon=y zoom1=square1_62 zoom2=square1_62_47 zoom3=square1_62_47_33 * at search time, use field collapsing on the zoomX field to get which docs belong to which square at a particular level * calculate the center point of each square (the mean of the positions of all points in that square) using StatsComponent (facet on the zoomX field, avg on the lat and lon fields) - I would consider those squares as separate clusters (one square is one cluster) and the center points of those squares as the center points of the clusters derived from them. I *think* the problems with this approach are: * there will be many unique values for the bigger zoom levels, which means field collapsing / StatsComponent may not work fast enough * clusters will not look very natural, because I would have many clusters at each zoom level, and what is really one geographical cluster could be displayed as multiple clusters, since its points would in some cases be dispersed across multiple squares. But that may be OK * a lot depends on how the squares are calculated - linearly dividing 360 degrees by N to get squares of equal size in degrees would produce issues with real square sizes and the counts of points in each of them. So I'm wondering if there is a better way? Thanks, Bojan
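[Editor's note] For readers who want to experiment with the index-time grid idea from this thread, here is a small self-contained Java sketch of one way to compute per-zoom-level cell ids. The field naming and the simple linear division are illustrative assumptions only (and, as Bojan notes, linear division has known drawbacks that geohash encoding avoids):

    /**
     * Hedged sketch of the index-time "zoom cell" idea from this thread:
     * assign each (lat, lon) a cell id per zoom level, then facet or
     * field-collapse on the zoomN field at query time.
     */
    public class ZoomCells {
      // Cell id for one zoom level; each level quadruples the grid.
      static String cellId(double lat, double lon, int zoom) {
        int cells = 1 << zoom;                       // cells per axis at this zoom
        int x = (int) ((lon + 180.0) / 360.0 * cells);
        int y = (int) ((lat + 90.0) / 180.0 * cells);
        x = Math.min(x, cells - 1);                  // clamp the edge case lon == 180
        y = Math.min(y, cells - 1);
        return zoom + "_" + x + "_" + y;
      }

      public static void main(String[] args) {
        double lat = 45.81, lon = 15.98;             // example point
        for (int zoom = 1; zoom <= 11; zoom++) {
          // e.g. index this value into a field named "zoom" + zoom
          System.out.println("zoom" + zoom + " = " + cellId(lat, lon, zoom));
        }
      }
    }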
Re: Apache Solr.
That's right, Solr doesn't import PDFs the way it imports XML. You'll need to use Tika to import binary/specific file types. http://tika.apache.org/1.4/formats.html alexei martchenko Facebook http://www.facebook.com/alexeiramone | LinkedIn http://br.linkedin.com/in/alexeimartchenko | Steam http://steamcommunity.com/id/alexeiramone/ | 4sq https://pt.foursquare.com/alexeiramone | Skype: alexeiramone | Github https://github.com/alexeiramone | (11) 9 7613.0966 | 2014-02-03 Siegfried Goeschl sgoes...@gmx.at: Hi Vignesh, a few keywords for further investigation: * Solr Data Import Handler * Apache Tika * Apache PDFBox Cheers, Siegfried Goeschl On 03.02.14 09:15, vignesh wrote: Hi Team, I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files, and I am now trying to index PDF files but am not able to. Can you give me the steps to carry out PDF indexing? It would be very useful. Kindly guide me through this process. Thanks Regards. Vignesh.V Ninestars Information Technologies Limited., 72, Greams Road, Thousand Lights, Chennai - 600 006. India. Landline : +91 44 2829 4226 / 36 / 56 X: 144 http://www.ninestars.in
Re: Apache Solr.
PDF files can be directly imported into Solr using Solr Cell (AKA ExtractingRequestHandler). See: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika Internally, Solr Cell uses Tika, which in turn uses PDFBox. -- Jack Krupansky -Original Message- From: Alexei Martchenko Sent: Monday, February 3, 2014 8:04 AM To: solr-user@lucene.apache.org Subject: Re: Apache Solr. That's right, Solr doesn't import PDFs the way it imports XML. You'll need to use Tika to import binary/specific file types. http://tika.apache.org/1.4/formats.html alexei martchenko Facebook http://www.facebook.com/alexeiramone | LinkedIn http://br.linkedin.com/in/alexeimartchenko | Steam http://steamcommunity.com/id/alexeiramone/ | 4sq https://pt.foursquare.com/alexeiramone | Skype: alexeiramone | Github https://github.com/alexeiramone | (11) 9 7613.0966 | 2014-02-03 Siegfried Goeschl sgoes...@gmx.at: Hi Vignesh, a few keywords for further investigation: * Solr Data Import Handler * Apache Tika * Apache PDFBox Cheers, Siegfried Goeschl On 03.02.14 09:15, vignesh wrote: Hi Team, I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files, and I am now trying to index PDF files but am not able to. Can you give me the steps to carry out PDF indexing? It would be very useful. Kindly guide me through this process. Thanks Regards. Vignesh.V Ninestars Information Technologies Limited., 72, Greams Road, Thousand Lights, Chennai - 600 006. India. Landline : +91 44 2829 4226 / 36 / 56 X: 144 http://www.ninestars.in
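[Editor's note] To make the Solr Cell route concrete, here is a hedged SolrJ sketch posting a PDF to /update/extract. It assumes a Solr 4.x SolrJ client (on Solr 3.6 the client class names differ, e.g. CommonsHttpSolrServer), a server at localhost:8983/solr with the extract handler configured, and a hypothetical file path, unique key, and field mapping:

    import java.io.File;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class PdfIndexer {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // Send the PDF to Solr Cell (ExtractingRequestHandler); Tika and
        // PDFBox do the text extraction on the Solr side.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("sample.pdf"), "application/pdf");
        req.setParam("literal.id", "doc1");          // hypothetical unique key
        req.setParam("fmap.content", "text");        // map extracted body to a 'text' field
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);
        server.shutdown();
      }
    }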
Announce list
Hi, Is there a mailing list for getting just announcements about new versions? Thanks, Arie
Writing a custom updateRequestHandler
Hi, I want to write a custom updateRequestHandler. Can you please guide me through the steps I need to perform for that?
Re: weird exception on update
This exception is similar to what is talked about here: https://gist.github.com/mbklein/6367133 http://irc.projecthydra.org/2013-08-28.html We found out that: 1. this happens iff, on two cores inside the same container, there is a query parser defined via defType. 2. after removing the index files on one of the cores, the delete by query works just fine. Right after restarting the container, the same query fails. Is there a JIRA for this? Should I create one? Dmitry On Mon, Feb 3, 2014 at 2:03 PM, Dmitry Kan solrexp...@gmail.com wrote: Hello! We are hitting a really strange and nasty issue when trying to delete by query, but not when just adding documents. The exception says: http://pastebin.com/B1x5dAF7 Any ideas as to what is going on? The delete by query references the unique field, and the core's index does not contain the value that is being deleted. Solr: 4.3.1. -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
Re: Writing a custom updateRequestHandler
In the book Apache Solr Beginner’s Guide there is a section dedicated to writing new Solr plugins; perhaps it would be a good place to start. There is also a page about this in the wiki, but it’s a light introduction. I’ve found that a very good starting point is just to browse through the code of some standard components similar to the one you’re trying to customize. On Feb 3, 2014, at 9:00 AM, neerajp neeraj_star2...@yahoo.com wrote: Hi, I want to write a custom updateRequestHandler. Can you please guide me through the steps I need to perform for that?
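[Editor's note] For many update customizations, the natural extension point is an UpdateRequestProcessor in the update chain rather than a whole new handler. As a hedged illustration (the class name, field name, and behavior are made up; the factory and processor signatures are from the Solr 4.x plugin API), a minimal plugin might look like this:

    import java.io.IOException;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    // Minimal sketch: stamps each incoming document with an indexing timestamp.
    // The field name "indexed_at_dt" is hypothetical.
    public class TimestampUpdateProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            cmd.getSolrInputDocument().setField("indexed_at_dt", new java.util.Date());
            super.processAdd(cmd); // hand off to the rest of the chain
          }
        };
      }
    }

The factory would then be referenced from an updateRequestProcessorChain in solrconfig.xml.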
Strange Error Message during Full Import
Hallo, when I do a full import of a Solr index I get a strange error message: org.apache.solr.handler.dataimport.DataImportHandlerException: java.sql.SQLRecoverableException: Closed Resultset: next It is only a simple query: select FIRMEN_ID, FIRMIERUNG, FIRMENKENNUNG, PZN, DEBITORNUMMER, ADRESS_ID from DAT_FIRMA This seems to be a follow-on error, but there is no other cause in the stack trace. Thanks for any hints. Ciao Peter Schütt P.S. The error stack trace:

Feb 03, 2014 2:11:01 PM org.apache.solr.common.SolrException log
SEVERE: getNext() failed for query 'select FIRMEN_ID, FIRMIERUNG, FIRMENKENNUNG, PZN, DEBITORNUMMER, ADRESS_ID from DAT_FIRMA': org.apache.solr.handler.dataimport.DataImportHandlerException: java.sql.SQLRecoverableException: Closed Resultset: next
  at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
  at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator.hasnext(PreparedStatementJdbcDataSource.java:404)
  at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator.access$600(PreparedStatementJdbcDataSource.java:256)
  at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator$1.hasNext(PreparedStatementJdbcDataSource.java:324)
  at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:116)
  at org.apache.solr.handler.dataimport.PreparedStatementSqlEntityProcessor.handleQuery(PreparedStatementSqlEntityProcessor.java:119)
  at org.apache.solr.handler.dataimport.PreparedStatementSqlEntityProcessor.nextRow(PreparedStatementSqlEntityProcessor.java:124)
  at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
  at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
  at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
Caused by: java.sql.SQLRecoverableException: Closed Resultset: next
  at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:214)
  at org.apache.tomcat.dbcp.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
  at org.apache.tomcat.dbcp.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
  at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator.hasnext(PreparedStatementJdbcDataSource.java:396)
  ... 13 more
Re: Announce list
I don't think so. What would be the value? Would you be upgrading every 6-8 weeks as the new versions come out? Or are you downstream of Solr and want to check compatibility? Curious what the use case would be. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Mon, Feb 3, 2014 at 8:59 PM, Arie Zilberstein azilberst...@salesforce.com wrote: Hi, Is there a mailing list for getting just announcements about new versions? Thanks, Arie
Re: Announce list
There's always http://projects.apache.org/feeds/rss.xml. L On 03/02/2014 14:59, Arie Zilberstein wrote: Hi, Is there a mailing list for getting just announcements about new versions? Thanks, Arie
Re: weird exception on update
The solution (or workaround?) is to drop defType from one of the cores and use a {!qparser} local param on every query, including the delete by query. It would be really great if this could be handled on the Solr config side only, without involving client changes. On Mon, Feb 3, 2014 at 4:02 PM, Dmitry Kan solrexp...@gmail.com wrote: This exception is similar to what is talked about here: https://gist.github.com/mbklein/6367133 http://irc.projecthydra.org/2013-08-28.html We found out that: 1. this happens iff, on two cores inside the same container, there is a query parser defined via defType. 2. after removing the index files on one of the cores, the delete by query works just fine. Right after restarting the container, the same query fails. Is there a JIRA for this? Should I create one? Dmitry On Mon, Feb 3, 2014 at 2:03 PM, Dmitry Kan solrexp...@gmail.com wrote: Hello! We are hitting a really strange and nasty issue when trying to delete by query, but not when just adding documents. The exception says: http://pastebin.com/B1x5dAF7 Any ideas as to what is going on? The delete by query references the unique field, and the core's index does not contain the value that is being deleted. Solr: 4.3.1. -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
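[Editor's note] To illustrate the per-query local-param workaround Dmitry describes, here is a hedged SolrJ sketch; the core URL, parser choices, and field names are assumptions:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class LocalParamQueries {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");

        // Pick the parser per request with a {!parser} local param
        // instead of a core-wide defType in solrconfig.xml.
        SolrQuery query = new SolrQuery("{!edismax qf=text}some terms");
        server.query(query);

        // The same applies to delete-by-query; force the plain Lucene
        // parser so the unique-key syntax is interpreted as intended.
        server.deleteByQuery("{!lucene}id:\"doc-42\"");
        server.commit();
        server.shutdown();
      }
    }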
Re: shard1 gone missing ... (upgrade to 4.6.1)
Mark, I am testing the upgrade and indexing gives me this error: 914379 [http-apr-8080-exec-4] ERROR org.apache.solr.core.SolrCore ? org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe0 (at char #1, byte #-1) ... and a bunch of these: request: http://xx.xx.xx.xx/col1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fxx.xx.xx.xx%3A8080%2Fcol1%2F&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 1581335 [updateExecutor-1-thread-7] ERROR org.apache.solr.update.StreamingSolrServers ? error org.apache.solr.common.SolrException: Bad Request Nothing else in the process chain has changed. Does this have anything to do with the deprecation warnings: WARN org.apache.solr.handler.UpdateRequestHandler ? Using deprecated class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler thanks David On 01/31/2014 11:22 AM, Mark Miller wrote: On Jan 31, 2014, at 11:15 AM, David Santamauro david.santama...@gmail.com wrote: On 01/31/2014 10:22 AM, Mark Miller wrote: I’d also highly recommend you try moving to Solr 4.6.1 when you can though. We have fixed many, many, many bugs around SolrCloud in the 4 releases since 4.4. You can follow the progress in the CHANGES file we update for each release. Can I do a drop-in replacement of 4.4.0 ? It should be a drop-in replacement. For some that use deep APIs in plugins, sometimes you might have to make a couple small changes to your code. Always best to do a test with a copy of your index, but for most, it should be a drop-in replacement. - Mark http://about.me/markrmiller
SolrCloud query results order master vs replica
Greetings, My setup is:
- SolrCloud v4.3
- one collection, one shard
- 1 master, 1 replica, so each instance contains the entire index. The index is rather small and the replica is used for robustness. There is no need (IMHO) to split the index into shards (yet, until the index gets bigger).
My question:
- if I do a query on a product name (that is what the index is about) on the master, I get a certain number of results and the documents.
- if I do the same query on the replica, I get the same number of results, but the docs are in a different order.
- I do not specify a sort parameter in my query, simply q=product name.
- obviously, if I force a sort order, everything is OK: same results, same order from both instances.
- am I wrong in expecting the same results, in the SAME order?
Follow-up questions if the order is not guaranteed:
- should I force the devs to use an explicit sort order?
- if we force the sort, we then bypass the ranking / score order, do we not?
- should I force all queries to go to the master and fall back on the replica only in the context of a total loss of the master?
Other useful information:
- the admin page shows the same number of documents in both instances.
- the logs are clean; load, replication, and queries worked OK.
- the web application that queries Solr round-robins between the two instances, so getting results in a different order is bad for consistency.
Thank you for your help! Nic
Elevation and nested queries
I have a simple query 'q=hurco' (parser type edismax). Elevation is properly configured, so I get the expected results:

...
<doc>
  <str name="id">7HURCO</str>
  <arr name="debtoritem">
    <str>0~*</str>
  </arr>
  <bool name="[elevated]">true</bool>
</doc>

A similar query with a nested query 'q=(hurco AND _query_:"{!field f=debtoritem v=0~*}")' returns the same document but without elevation:

...
<doc>
  <str name="id">7HURCO</str>
  <arr name="debtoritem">
    <str>0~*</str>
  </arr>
  <bool name="[elevated]">false</bool>
</doc>

Does a nested query disable elevation? There is an additional spellcheck component added to the query, which is working as expected:

<arr name="last-components">
  <str>spellcheck</str>
  <str>elevator</str>
</arr>

Thanks, Holger
Re: Need help for integrating solr-4.5.1 with UIMA
On Mon, Feb 3, 2014 at 10:20 AM, rashi gandhi gandhirash...@gmail.com wrote: Hi, I'm trying to integrate Solr 4.5.1 with UIMA, following the steps in solr-4.5.1\contrib\uima\readme.txt. I edited solrconfig.xml as given in the readme, and I have registered the required keys. [...] at java.lang.Thread.run(Thread.java:619) *Caused by: java.net.ConnectException: Connection timed out:* *connect* [...] What is going wrong? Please help me on this. In principle I've never integrated UIMA and Solr, but quickly looking at your exception (please send only the meaningful part of the stack trace), it seems you have a problem connecting. I would start from there. Regards Luca -- Luca Foppiano Software Engineer +31615253280 l...@foppiano.org www.foppiano.org
Re: SolrCloudServer questions
I've seen best throughput while indexing by sending batches of documents rather than individual documents per request. You might try queueing on your indexing machines for a bit, then sending off a batch every N documents. Thanks, Greg On Feb 1, 2014, at 6:49 PM, Software Dev static.void@gmail.com wrote: Also, if we are seeing a huge CPU spike on the leader when doing a bulk index, would changing any of the options help? On Sat, Feb 1, 2014 at 2:59 PM, Software Dev static.void@gmail.com wrote: Our use case is we have 3 indexing machines pulling off a Kafka queue and they are all sending individual updates. On Fri, Jan 31, 2014 at 12:54 PM, Mark Miller markrmil...@gmail.com wrote: Just make sure parallel updates is set to true. If you want to load even faster, you can use the bulk add methods, or if you need more fine-grained responses, use the single add from multiple threads (though bulk add can also be done via multiple threads if you really want to try and push the max). - Mark http://about.me/markrmiller On Jan 31, 2014, at 3:50 PM, Software Dev static.void@gmail.com wrote: Which, if any, of these settings would be beneficial when bulk uploading? On Fri, Jan 31, 2014 at 11:05 AM, Mark Miller markrmil...@gmail.com wrote: On Jan 31, 2014, at 1:56 PM, Greg Walters greg.walt...@answers.com wrote: I'm assuming you mean CloudSolrServer here. If I'm wrong please ignore my response. -updatesToLeaders Only send documents to shard leaders while indexing. This saves cross-talk between slaves and leaders, which results in more efficient document routing. Right, but recently this has less of an effect because CloudSolrServer can now hash documents and directly send them to the right place. This option has become more historical. Just make sure you set the correct id field on the CloudSolrServer instance for this hashing to work (I think it defaults to id). shutdownLBHttpSolrServer CloudSolrServer uses an LBHttpSolrServer behind the scenes to distribute requests (that aren't updates directly to leaders). Where did you find this? I don't see this in the javadoc anywhere, but it is a boolean in the CloudSolrServer class. It looks like when you create a new CloudSolrServer and pass it your own LBHttpSolrServer, the boolean gets set to false and the CloudSolrServer won't shut down the LBHttpSolrServer when it gets shut down. parallelUpdates The javadocs don't have any description for this one, but I checked out the code for CloudSolrServer, and if parallelUpdates is set it looks like it executes update statements to multiple shards at the same time. Right, we should def add some javadoc, but this sends updates to shards in parallel rather than with a single thread. Can really increase update speed. Still not as powerful as using CloudSolrServer from multiple threads, but a nice improvement nonetheless. - Mark http://about.me/markrmiller I'm no dev, but I can read, so please excuse any errors on my part. Thanks, Greg On Jan 31, 2014, at 11:40 AM, Software Dev static.void@gmail.com wrote: Can someone clarify what the following options are: - updatesToLeaders - shutdownLBHttpSolrServer - parallelUpdates Also, I remember in older versions of Solr there was an efficient format used between SolrJ and Solr that is more compact. Does this still exist in the latest version of Solr? If so, is it the default? Thanks
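[Editor's note] To make the batching suggestion concrete, here is a hedged SolrJ sketch of the flush-every-N pattern; the ZooKeeper hosts, collection name, batch size, and the loop standing in for the Kafka consumer are all assumptions:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
      private static final int BATCH_SIZE = 500; // illustrative; tune for your documents

      public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("col1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 10000; i++) {        // stands in for the consumer loop
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "doc-" + i);
          batch.add(doc);
          if (batch.size() >= BATCH_SIZE) {      // flush a whole batch per request
            server.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) server.add(batch); // flush the remainder
        server.commit();
        server.shutdown();
      }
    }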
Duplicate Facet.Fields cause same results, should dedupe?
If we add: facet.field=prac_spec_heir&facet.field=prac_spec_heir we get it twice in the results. This breaks deserialization on wt=json since you cannot have the same name twice. Thoughts? Seems like a new bug in 4.6? facet.field: [prac_spec_heir,all_proc_name_code,all_cond_name_code, prac_spec_heir,{!ex=exgender}gender,{!ex=expayor}payor_code_name], -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: need help in understanding solr cloud stats data
I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded and then registers itself as an mbean. When called, it polls all the per-core mbeans, then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get JVM-wide stats via JMX, but it is *a* way to get it done. Thanks, Greg On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote: I'm sending all Solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime, I see that each core in a single collection has different values. Is the value for each core the time that specific core spent on a request? So, to get an idea of total request time, should I sum the values across all the cores? 2. update_handler/commits - does this include auto-commits? Because I'm pretty sure I'm not doing any manual commits and yet I see a number there. 3. update_handler/docs pending - what does this mean? Pending for what? For flush to disk? thanks.
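[Editor's note] For anyone curious what that aggregation looks like in-process, here is a hedged sketch of summing one attribute across the per-core beans from inside the Solr JVM (for example, from a custom handler like the one Greg describes). The ObjectName pattern follows the solr/<coreName> bean naming used elsewhere in this thread, and the attribute name is an assumption:

    import java.lang.management.ManagementFactory;
    import java.util.Set;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class AggregateSolrStats {
      public static void main(String[] args) throws Exception {
        MBeanServer mbeans = ManagementFactory.getPlatformMBeanServer();

        // Match the per-core /select handler beans (domain is "solr/<coreName>").
        Set<ObjectName> names =
            mbeans.queryNames(new ObjectName("solr/*:type=/select,*"), null);

        long totalRequests = 0;
        for (ObjectName name : names) {
          Object value = mbeans.getAttribute(name, "requests");
          if (value instanceof Number) {
            totalRequests += ((Number) value).longValue(); // sum across cores
          }
        }
        System.out.println("requests across all cores: " + totalRequests);
      }
    }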
Re: need help in understanding solr cloud stats data
You should contribute that and spread the dev load with others :) We need something like that at some point; it’s just that no one has done it. We currently expect you to aggregate in the monitoring layer, and that’s a lot to ask IMO. - Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote: I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded and then registers itself as an mbean. When called, it polls all the per-core mbeans, then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get JVM-wide stats via JMX, but it is *a* way to get it done. Thanks, Greg On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote: I'm sending all Solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime, I see that each core in a single collection has different values. Is the value for each core the time that specific core spent on a request? So, to get an idea of total request time, should I sum the values across all the cores? 2. update_handler/commits - does this include auto-commits? Because I'm pretty sure I'm not doing any manual commits and yet I see a number there. 3. update_handler/docs pending - what does this mean? Pending for what? For flush to disk? thanks.
Re: need help in understanding solr cloud stats data
The code I wrote is currently a bit of an ugly hack, so I'm a bit reluctant to share it, and there are some legal concerns with open-sourcing code within my company. That being said, I wouldn't mind rewriting it on my own time. Where can I find a starter kit for contributors with coding guidelines and the like? Spruced up some, I'd be OK with submitting a patch. Thanks, Greg On Feb 3, 2014, at 10:08 AM, Mark Miller markrmil...@gmail.com wrote: You should contribute that and spread the dev load with others :) We need something like that at some point; it’s just that no one has done it. We currently expect you to aggregate in the monitoring layer, and that’s a lot to ask IMO. - Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote: I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded and then registers itself as an mbean. When called, it polls all the per-core mbeans, then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get JVM-wide stats via JMX, but it is *a* way to get it done. Thanks, Greg On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote: I'm sending all Solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime, I see that each core in a single collection has different values. Is the value for each core the time that specific core spent on a request? So, to get an idea of total request time, should I sum the values across all the cores? 2. update_handler/commits - does this include auto-commits? Because I'm pretty sure I'm not doing any manual commits and yet I see a number there. 3. update_handler/docs pending - what does this mean? Pending for what? For flush to disk? thanks.
SolrCloud multiple data center support
Hello, we are using Solr in a SolrCloud configuration, with two Solr instances and three ZooKeepers running in a single data center. We presently have a single search index with about 35 million entries in it, using about 60GB of disk space on each of the two Solr servers (120GB total). I expect our usage of Solr to grow to include other search indexes, and likely larger data volumes. I'm writing because we need to grow beyond a single data center, with two (potentially incompatible) goals:
1. We need to be able to have a hot disaster recovery site, in a completely separate data center, that has a near-realtime replica of the search index.
2. We'd like to have the option of multiple active/active data centers that each see and update the same search index, distributed across data centers.
The options I'm aware of from reading the archives:
a. Simply set up the remote Solr instances as active parts of the same SolrCloud cluster. This essentially involves standing up multiple ZooKeepers and multiple Solr instances in the second data center, and they will all keep each other in sync magically. This would also solve both of our goals. However, I'm concerned about performance and whether SolrCloud is smart enough to route local search queries only to local Solr servers ... ? Also, how does such a cluster tolerate and recover from network partitions?
b. The remote Solr instances form their own completely unrelated SolrCloud cluster, and I invent some kind of replication logic of my own to sync data between them. This replication would have to be bidirectional to satisfy both of our goals. I strongly dislike this option, since the application really should not concern itself with data distribution. But I'll do it if I must.
So my questions are:
- Can anyone give me any guidance as to option a? Is anyone using this in a real production setting? Words of wisdom? Does it work?
- Are there any other options that I'm not considering?
- What is Solr's answer to such configurations (we can't be alone in needing one)? Any big enhancements coming on the Solr roadmap to deal with this?
Thanks! Darrell Burgan
Darrell Burgan | Chief Architect, PeopleAnswers office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | darrell.bur...@infor.com | http://www.infor.com
Re: Announce list
I have seen other projects that have a releases mailing list. The only use cases I can think of are: 1) users who want notifications about new releases but don't want the flood of the full user list; 2) historical searching to see how often releases were made. Given there isn't an official timetable, it's not really going to be useful as a forward planner, but it might have some value for looking at how often patch releases come out. One could attempt to infer some degree of stability (or, more accurately, lack of stability) if lots of patches for a given release came out quickly. I wasn't aware of the RSS feed; that's useful as an indicator for use case 1, at least. Use case 2 is probably too vague and has lots of assumptions/inferences that mean it's a bad idea anyway :) On 3 February 2014 14:37, Lajos la...@protulae.com wrote: There's always http://projects.apache.org/feeds/rss.xml. L On 03/02/2014 14:59, Arie Zilberstein wrote: Hi, Is there a mailing list for getting just announcements about new versions? Thanks, Arie
Re: Solr and SDL Tridion Integration
There are many ways to do this, Prasi. You have a lot of thinking to do on the subject. You could decide to publish your content to a database, and then index that database in Solr. You could publish XML or CSV files of your content for Solr to read and index. You could use Nutch or some other tool to crawl your web server. There are probably many more methods, these being some of the more common. Does your site have dynamic content presentation? If so, you may want to consider having Solr examine your broker database. Static pages on your site? You may want to go with either a crawler or publishing a special file for Solr. Please check out https://tridion.stackexchange.com/ for more on this topic. -- chris_war...@yahoo.com On Monday, February 3, 2014 3:54 AM, Jack Krupansky j...@basetechnology.com wrote: If SDL Tridion can export to CSV format, Solr can then import from CSV format. Otherwise, you may have to write a custom script, or maybe even Java code, to read from SDL Tridion and output a supported Solr format such as Solr XML, Solr JSON, or CSV. -- Jack Krupansky -Original Message- From: Prasi S Sent: Monday, February 3, 2014 4:16 AM To: solr-user@lucene.apache.org Subject: Solr and SDL Tridion Integration Hi, I want to index SDL Tridion content into Solr. Can you suggest how this can be achieved? Is there any documentation/tutorial for this? Thanks, Prasi
[ANN] Heliosearch 0.03 with off-heap field cache
A new Heliosearch pre-release has been cut for people to try out: https://github.com/Heliosearch/heliosearch/releases
Release Notes: This is Heliosearch v0.03. Heliosearch is forked from Apache Solr and includes the following additional features:
- Off-Heap Filters to reduce garbage collection pauses and overhead. http://www.heliosearch.org/off-heap-filters
- Removed the 1024 limit on the number of clauses in a boolean query. For example, q=id:(doc1 doc2 doc3 doc4 doc5 ... doc2000) will now work correctly without throwing an exception.
- Deep Paging with cursorMark. This is not yet in a current release of Apache Solr, but should be in Solr 4.7. http://heliosearch.org/solr/paging-and-deep-paging/
- nCache - the new Off-Heap FieldCache to reduce garbage collection overhead and accelerate sorting, faceting, and function queries. http://heliosearch.org/solr-off-heap-fieldcache
-Yonik http://heliosearch.com -- making solr shine
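[Editor's note] As a taste of the cursorMark feature, here is a hedged SolrJ paging loop; the getNextCursorMark() accessor and the cursorMark parameter are per the Solr 4.7 SolrJ API described at the link above, and the URL, sort field, and page size are assumptions (the sort must impose a total order that includes the unique key):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class DeepPaging {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery query = new SolrQuery("*:*");
        query.setRows(100);
        query.setSort("id", SolrQuery.ORDER.asc); // "id" assumed to be the unique key

        String cursorMark = "*";                  // "*" starts the cursor
        boolean done = false;
        while (!done) {
          query.set("cursorMark", cursorMark);
          QueryResponse rsp = server.query(query);
          for (SolrDocument doc : rsp.getResults()) {
            // process each page of results here
          }
          String next = rsp.getNextCursorMark();
          done = cursorMark.equals(next);         // an unchanged mark means we're past the last page
          cursorMark = next;
        }
        server.shutdown();
      }
    }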
Re: Apache Solr.
You can have this kind of configuration in a Data Import Handler XML file to index different types of files:

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="files" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="(enter the file repository path)"
            fileName=".*.(doc)|(pdf)|(docx)|(txt)|(ppt)|(xls)|(xlsx)|(sql)|(vsd)|(zip)"
            onError="skip" recursive="true">
      <field column="fileAbsolutePath" name="id" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastModified" />
      <entity name="tika-documentimport" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text">
        <field column="File" name="fileName" />
        <field column="Author" name="author" meta="true" />
      </entity>
    </entity>
  </document>
</dataConfig>

Hope this helps.
RE: JVM heap constraints and garbage collection
i2.xlarge looks vastly better than m2.2xlarge at about the same price, so I must be missing something: Is it the 120 IPs that explains why anyone would choose m2.2xlarge? i2.xlarge is a relatively new instance type (December 2013). In our case, we're partway through a yearlong reservation of m2.2xlarges and won't be up for reconsidering that for a few months. I don't think that Amazon has ever dropped a legacy instance type, so there's bound to be some overlap as they roll out new ones. And I imagine someone setting up a huge memcached pool might rather have the extra RAM over the SSD, so it still makes sense for the m2.2xlarge to be around. It can be kind of hard to understand how the various parameters that make up an instance type get decided on, though. I have to consult that ec2instances.info link all the time to make sure I'm not missing something regarding what types we should be using. On Feb 1, 2014 1:51 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Michael Della Bitta [michael.della.bi...@appinions.com] wrote: Here at Appinions, we use mostly m2.2xlarges, but the new i2.xlarges look pretty tasty primarily because of the SSD, and I'll probably push for a switch to those when our reservations run out. http://www.ec2instances.info/ i2.xlarge looks vastly better than m2.2xlarge at about the same price, so I must be missing something: Is it the 120 IPs that explains why anyone would choose m2.2xlarge? Anyhow, it is good to see that Amazon now has 11 different setups with SSD. The IOPS looks solid at around 40K/s (estimated) for the i2.xlarge and they even have TRIM ( http://aws.amazon.com/about-aws/whats-new/2013/12/19/announcing-the-next-generation-of-amazon-ec2-high-i/o-instance/). - Toke Eskildsen
Getting index schema in SolrCloud mode
I'm indexing data with a SolrJ client via SolrServer. Currently, I parse the schema returned by an HTTP GET on: localhost:8983/solr/collection/schema/fields What is the recommended way to read the schema with CloudSolrServer? Can it be done with a single HTTP GET to a ZK server? Thanks, Peter
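[Editor's note] One approach that should work (a hedged sketch, not a definitive answer): let CloudSolrServer route the schema REST call to a live node for its default collection, so no host needs to be hardcoded. The ZooKeeper address and collection name here are assumptions, and the API shown is 4.x SolrJ:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class SchemaFields {
      public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
        server.setDefaultCollection("collection1");

        // Route the schema REST call through whichever live node the
        // cloud client picks, instead of hardcoding a host and port.
        QueryRequest request = new QueryRequest(new ModifiableSolrParams());
        request.setPath("/schema/fields");
        NamedList<Object> response = server.request(request);
        System.out.println(response.get("fields"));
        server.shutdown();
      }
    }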
Re: SolrCloud multiple data center support
SolrCloud has not tackled multi data center yet. I don’t think a or b are very good options yet. Honestly, I think the best current bet is to use something like Apache Flume to send data to both data centers - it will handle retries and keeping things in sync and splitting the stream. Doesn’t satisfy all use cases though. At some point, multi data center support will happen. I can’t remember where ZooKeeper’s support for it is at, but with that and some logic to favor nodes in your data center, that might be a viable route. - Mark http://about.me/markrmiller On Feb 3, 2014, at 11:48 AM, Darrell Burgan darrell.bur...@infor.com wrote: Hello, we are using Solr in a SolrCloud configuration, with two Solr instances running with three Zookeepers in a single data center. We presently have a single search index with about 35 million entries in it, about 60GB disk space on each of the two Solr servers (120GB total). I would expect our usage of Solr to grow to include other search indexes, and likely larger data volumes. I’m writing because we’re needing to grow beyond a single data center, with two (potentially incompatible) goals: 1. We need to be able to have a hot disaster recovery site, in a completely separate data center, that has a near-realtime replica of the search index. 2. We’d like to have the option to have multiple active/active data centers that each see and update the same search index, distributed across data centers. The options I’m aware of from reading archives: a. Simply set up the remote Solr instances as active parts of the same SolrCloud cluster. This will essentially involve us standing up multiple Zookeepers in the second data center, and multiple Solr instances, and they will all keep each other in sync magically. This will also solve both of our goals. However, I’m concerned about performance and whether SolrCloud is smart enough to route local search queries only to local Solr servers … ? Also, how does such a cluster tolerate and recover from network partitions? b. The remote Solr instances form their own completely unrelated SolrCloud cluster. I have to invent some kind of replication logic of my own to sync data between them. This replication would have to be bidirectional to satisfy both of our goals. I strongly dislike this option since the application really should not concern itself with data distribution. But I’ll do it if I must. So my questions are: - Can anyone give me any guidance as to option a? Anyone using this in a real production setting? Words of wisdom? Does it work? - Are there any other options that I’m not considering? - What is Solr’s answer to such configurations (we can’t be alone in needing one)? Any big enhancements coming on the Solr road map to deal with this? Thanks! Darrell Burgan Darrell Burgan | Chief Architect, PeopleAnswers office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | darrell.bur...@infor.com | http://www.infor.com
Re: need help in understanding solr cloud stats data
I had to come up with some Solr stats monitoring for my Zabbix instance. I found that using JMX was the easiest way for us. There is a command-line JMX client that works quite well for me. http://crawler.archive.org/cmdline-jmxclient/ I wrote a shell script to wrap around that and shove the data back to Zabbix for ingestion and monitoring. I've listed the stats that I am gathering and the mbean that is called. My shell script is rather simplistic:

#!/bin/bash
cmdLineJMXJar=/usr/local/lib/cmdline-jmxclient.jar
jmxHost=$1
port=$2
query=$3
value=$4
java -jar ${cmdLineJMXJar} user:pass ${jmxHost}:${port} ${query} ${value} 2>&1 | awk '{print $NF}'

The script is called as so: jmxstats.sh <solr server name or IP> <jmx port> <name of mbean> <value to query from mbean>

My collection name is productCatalog, so swap that with yours.

*select requests*: solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select requests
*select errors*: solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select errors
*95th percentile request time*: solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select 95thPcRequestTime
*update requests*: solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update requests
*update errors*: solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update errors
*95th percentile update time*: solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update 95thPcRequestTime
*query result cache lookups*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache cumulative_lookups
*query result cache inserts*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache cumulative_inserts
*query result cache evictions*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache cumulative_evictions
*query result cache hit ratio*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache cumulative_hitratio
*document cache lookups*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache cumulative_lookups
*document cache inserts*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache cumulative_inserts
*document cache evictions*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache cumulative_evictions
*document cache hit ratio*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache cumulative_hitratio
*filter cache lookups*: solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache cumulative_lookups
*filter cache inserts*: solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache cumulative_inserts
*filter cache evictions*: solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache cumulative_evictions
*filter cache hit ratio*: solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache cumulative_hitratio
*field value cache lookups*: solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache cumulative_lookups
*field value cache inserts*: solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache cumulative_inserts
*field value cache evictions*: solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache cumulative_evictions
*field value cache hit ratio*: solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache cumulative_hitratio

This set of stats gets me a pretty
good idea of what's going on with my SolrCloud at any time. Anyone have any thoughts or suggestions? Joel Cohen Senior System Engineer Bluefly, Inc. On Mon, Feb 3, 2014 at 11:25 AM, Greg Walters greg.walt...@answers.comwrote: The code I wrote is currently a bit of an ugly hack so I'm a bit reluctant to share it and there's some legal concerns with open-sourcing code within my company. That being said, I wouldn't mind rewriting it on my own time. Where can I find a starter kit for contributors with coding guidelines and the like? Spruced up some I'd be OK with submitting a patch. Thanks, Greg On Feb 3, 2014, at 10:08 AM, Mark Miller markrmil...@gmail.com wrote: You should contribute that and spread the dev load with others :) We need something like that at some point, it's just no one has done it. We currently expect you to aggregate in the monitoring layer and it's a lot to ask IMO. - Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote: I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded then registers itself as an mbean. When called it polls all the per-core mbeans then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get jvm-wide stats via jmx but it is *a* way to get it
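(For anyone who'd rather pull those same numbers from code than shell out to cmdline-jmxclient: the stock JDK JMX API can read the mbeans listed above directly. A minimal sketch, assuming Solr's JMX connector is exposed on port 9999 with the same user:pass credentials; host, port, and core name are placeholders.)

import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrJmxPoll {
    public static void main(String[] args) throws Exception {
        // Standard RMI connector URL; swap in your Solr host and JMX port.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://solr-host:9999/jmxrmi");
        Map<String, Object> env = new HashMap<String, Object>();
        env.put(JMXConnector.CREDENTIALS, new String[] { "user", "pass" });
        JMXConnector connector = JMXConnectorFactory.connect(url, env);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Same mbean/attribute pair as the "select requests" entry above.
            ObjectName selectHandler = new ObjectName(
                    "solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select");
            System.out.println("select requests = "
                    + conn.getAttribute(selectHandler, "requests"));
        } finally {
            connector.close();
        }
    }
}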
Re: need help in understating solr cloud stats data
Zabbix 2.2 has a JMX client built in, as well as a few JVM templates. I wrote my own templates for my Solr instance, and monitoring and graphing is wonderful. David On 02/03/2014 12:55 PM, Joel Cohen wrote: [...]
Re: Announce list
: Is there a mailing list for getting just announcements about new versions? This is the primary use case for the general list, although it does occasionally get other traffic from people with questions/discussion about the project as a whole... https://lucene.apache.org/solr/discussion.html#general-discussion-generallucene https://mail-archives.apache.org/mod_mbox/lucene-general/ If you are looking for a really low-volume list where release announcements are made, that's the place to start. -Hoss http://www.lucidworks.com/
Solr and Polygon/Radius based spatial searches
We have a public property search site that we are looking to replace the back end index server on, and we are looking at Solr as a possible replacement (ElasticSearch is another possibility). One of the key search components of our site is to search on a bounding box (rectangle), custom multi-point polygon, and/or a radius from a point. It appears that Solr3 and Solr4 both supported spatial searching, but using different methods. Also, per this link, http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4, it appears that Solr only supports point, rectangle and circle shapes and needs JTS and/or WKT to support non-rectangular polygon shapes. Our indexed data will include the long/lat values for all property records. If someone can provide sample queries for the following situations, it would be appreciated: - All properties/points that fall within a multi-point polygon (ie: Polygon points: Lo1 La1, Lo2 La2, Lo3 La3, Lo4 La4, Lo5 La5, Lo1 La1) - All properties that fall within 1.5 miles (radius) of point: Lo1 La1 Other spatial search type functionality that may be targeted includes: - Ability to search within multiple polygons (intersecting, non-intersecting, and combinations) - Ability to search for properties that fall outside of a polygon Thanks Lee -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-Polygon-Radius-based-spatial-searches-tp4115121.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Score of Search Term for every character remove
Maybe edgeNgram tokenizer? You haven't told us what the fields in the docs you care about are... Best, Erick On Mon, Feb 3, 2014 at 4:48 AM, Lusung, Abner abner.lus...@vishay.com wrote: Hi, I'm new with using SOLR and I'm curious if this is capable of doing the following or similar. Sample: Query: ABCDEF Returns: ABCDEF 0 hits ABCDE 2 hits ABCD 3 hits ABC 10 hits AB 20 hits A 100 hits In one request only. Thanks. *Abner G. Lusung Jr.* | Java Web Development, Internet and Commerce, Global Web Services | Vishay Philippines Inc. 10th Floor Pacific Star Building, Makati Avenue corner Buendia Avenue, Makati City, Philippines 1200 Phone: +63 2 8387421 loc. 7995 | Mobile: +63 9169674514 Website: www.vishay.com [image: Vishay] http://www.vishay.com/
Re: Score of Search Term for every character remove
I think he wants to do a bunch of separate queries and return separate result sets for each. Hmmm... maybe it would be nice to allow multiple q parameters in one query request, each returning a separate set of results. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Monday, February 3, 2014 2:08 PM To: solr-user@lucene.apache.org Subject: Re: Score of Search Term for every character remove Maybe edgeNgram tokenizer? You haven't told us what the fields in the docs you care about are... Best, Erick On Mon, Feb 3, 2014 at 4:48 AM, Lusung, Abner abner.lus...@vishay.com wrote: Hi, I'm new with using SOLR and I'm curious if this is capable of doing the following or similar. Sample: Query: ABCDEF Returns: ABCDEF 0 hits ABCDE 2 hits ABCD 3 hits ABC 10 hits AB 20 hits A 100 hits In one request only. Thanks. *Abner G. Lusung Jr.* | Java Web Development, Internet and Commerce, Global Web Services | Vishay Philippines Inc. 10th Floor Pacific Star Building, Makati Avenue corner Buendia Avenue, Makati City, Philippines 1200 Phone: +63 2 8387421 loc. 7995 | Mobile: +63 9169674514 Website: www.vishay.com [image: Vishay] http://www.vishay.com/
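(Until something like multiple q parameters exists, Jack's reading - a bunch of separate queries - is easy to script client-side. A SolrJ sketch under that assumption, issuing one rows=0 count query per left-anchored prefix, longest first; the core URL and the field name "code" are hypothetical.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PrefixCounts {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder
        String term = "ABCDEF";
        for (int len = term.length(); len >= 1; len--) {
            String prefix = term.substring(0, len);
            SolrQuery q = new SolrQuery("code:" + prefix); // "code" is a hypothetical field
            q.setRows(0); // we only need numFound, not documents
            long hits = solr.query(q).getResults().getNumFound();
            System.out.println(prefix + " " + hits + " hits");
        }
    }
}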
Re: SolrCloud query results order master vs replica
This should only be happening if the scores are _exactly_ the same, which is actually quite rare. In that case, the tied scores are broken by the internal Lucene document ID, and the relative order of the docs on the two machines isn't guaranteed to be the same; the internal ID can change during segment merging, which is NOT the same on both machines. But this should be relatively rare. If you're doing *:* queries or other such, then they aren't scored (see ConstantScoreQuery). So in practical terms, I suspect you're seeing some kind of test artifact. Try adding debug=all to the query and you'll see how documents are scored. Best, Erick On Mon, Feb 3, 2014 at 6:57 AM, M. Flatterie nicflatte...@yahoo.com wrote: Greetings, My setup is: - SolrCloud V4.3 - one collection - one shard - 1 master, 1 replica, so each instance contains the entire index. The index is rather small and the replica is used for robustness. There is no need (IMHO) to split shard the index (yet, until the index gets bigger). My question: - if I do a query on a product name (that is what the index is about) on the master I get a certain number of results and the documents. - if I do the same query on the replica, I get the same number of results but the docs are in a different order. - I do not specify a sort parameter in my query, simply a q=product name. - obviously if I force a sort order, everything is ok, same results, same order from both instances. - am I wrong in expecting the same results, in the SAME order? Follow up question if the order is not guaranteed: - should I force the dev. to use an explicit sort order? - if we force the sort, we then bypass the ranking / score order, do we not? - should I force all queries to go to the master and fall back on the replica only in the context of a total loss of the master? Other useful information: - the admin page shows the same number of documents in both instances. - logs are clean, load and replication and queries worked ok. - the web application that queries SOLR round-robins between the two instances, so getting results in a different order is bad for consistency. Thank you for your help! Nic
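(If identical ordering across instances matters more than the rare exact-score tie, adding the uniqueKey as a secondary sort makes tie-breaking deterministic without bypassing relevance, since it only applies among equal scores. A SolrJ sketch, assuming the uniqueKey field is "id".)

import org.apache.solr.client.solrj.SolrQuery;

public class DeterministicOrder {
    public static SolrQuery build(String userQuery) {
        SolrQuery q = new SolrQuery(userQuery);
        // Primary sort stays on score; the uniqueKey only decides exact ties,
        // so master and replica return tied docs in the same order.
        q.set("sort", "score desc,id asc");
        return q;
    }
}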
Re: need help in understating solr cloud stats data
See: http://wiki.apache.org/solr/HowToContribute It outlines how to get the code, how to work with patches, how to set up IntelliJ and Eclipse IDEs (links near the bottom?). There are formatting files for both IntelliJ and Eclipse that'll do the right thing in terms of indents and such. Legal issues aside, you don't need to be very compulsive about cleaning up the code before posting the first patch! Just let people know you don't consider it ready to commit. You'll want to open a JIRA to attach it to. People often put in //nocommit in places they especially don't like, and the precommit ant target takes care of keeping these from getting into the code. People are quite happy to see hack, first-cut patches. You'll often get suggestions on approaches that may be easier, and nobody will complain about bad code when they know that _you_ don't consider it submittable. Google for Yonik's law of half-baked patches. One thing that escapes people often... When attaching a patch to a JIRA, just call it SOLR-<nnnn>.patch, where <nnnn> is the JIRA number. Successive versions of the patch should have the _same_ name; they'll all be listed and the newest one will be live. It's easier to know what is the right patch that way. No big deal either way. Best, Erick On Mon, Feb 3, 2014 at 8:25 AM, Greg Walters greg.walt...@answers.com wrote: The code I wrote is currently a bit of an ugly hack so I'm a bit reluctant to share it, and there are some legal concerns with open-sourcing code within my company. That being said, I wouldn't mind rewriting it on my own time. Where can I find a starter kit for contributors, with coding guidelines and the like? Spruced up some, I'd be OK with submitting a patch. Thanks, Greg On Feb 3, 2014, at 10:08 AM, Mark Miller markrmil...@gmail.com wrote: You should contribute that and spread the dev load with others :) We need something like that at some point, it's just no one has done it. We currently expect you to aggregate in the monitoring layer and it's a lot to ask IMO. - Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote: I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded, then registers itself as an mbean. When called it polls all the per-core mbeans, then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get jvm-wide stats via jmx but it is *a* way to get it done. Thanks, Greg On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote: I'm sending all solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime - I see that each core in a single collection has different values. Is the value of each core the time that specific core spent on a request? So to get an idea of total request time, should I sum the values of all the cores? 2. update_handler/commits - does this include auto_commits? Because I'm pretty sure I'm not doing any manual commits and yet I see a number there. 3. update_handler/docs pending - what does this mean? pending for what? for flush to disk? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-in-understating-solr-cloud-stats-data-tp4114992.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Not finding part of fulltext field when word ends in dot
That was a complicated answer, but ultimately the right one. Thank you very much. 2014-01-30 Jack Krupansky j...@basetechnology.com: The word delimiter filter will turn 26KA into two tokens, as if you had written 26 KA without the quotes. The autoGeneratePhraseQueries option will cause the multiple terms to be treated as if they actually were enclosed within quotes; otherwise they will be treated as separate and unquoted terms. If you do enclose 26KA in quotes in your query, then autoGeneratePhraseQueries is not relevant. Ah... maybe the problem is that you have preserveOriginal=true in your query analyzer. Do you have your default query operator set to AND? If so, it would treat 26KA as 26 AND KA AND 26KA, which requires 26KA (without the trailing dot) to be in the index. It seems counter-intuitive, but the attributes of the index and query word delimiter filters need to be slightly asymmetric. -- Jack Krupansky -Original Message- From: Thomas Michael Engelke Sent: Thursday, January 30, 2014 2:16 AM To: solr-user@lucene.apache.org Subject: Re: Not finding part of fulltext field when word ends in dot I'm not sure I got my problem across. If I understand the snippet of documentation right, autoGeneratePhraseQueries only affects queries that result in multiple tokens, which mine does not. The version also is 3.6.0.1, and we're not planning on upgrading to any 4.x version. 2014-01-29 Jack Krupansky j...@basetechnology.com: You might want to add autoGeneratePhraseQueries=true to your field type, but I don't think that would cause a break when going from 3.6 to 4.x. The default for that attribute changed in Solr 3.5. What release was your data indexed using? There may have been some subtle word delimiter filter changes between 3.x and 4.x. Read: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%3CC0551C512C863540BC59694A118452AA0764A434@ITS-EMBX-03.adsroot.itcs.umich.edu%3E -Original Message- From: Thomas Michael Engelke Sent: Wednesday, January 29, 2014 11:16 AM To: solr-user@lucene.apache.org Subject: Re: Not finding part of fulltext field when word ends in dot The fieldType definition is a tad on the longer side:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" catenateAll="0" preserveOriginal="1" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="german/synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="german/german-common-nouns.txt" minWordSize="5" minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
    <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="german/protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateWords="0" catenateNumbers="0" generateWordParts="1" splitOnCaseChange="1" generateNumberParts="1" catenateAll="0" preserveOriginal="1" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
Re: Solr and Polygon/Radius based spatial searches
Hi Lee, On 2/3/14, 1:59 PM, leevduhl ld...@corp.realcomp.com wrote: We have a public property search site that we are looking to replace the back end index server on, and we are looking at Solr as a possible replacement (ElasticSearch is another possibility). Both should work equally well. One of the key search components of our site is to search on a bounding box (rectangle), custom multi-point polygon, and/or a radius from a point. It appears that Solr3 and Solr4 both supported spatial searching, but using different methods. Also, per this link, http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4, it appears that Solr only supports point, rectangle and circle shapes and needs JTS and/or WKT to support non-rectangular polygon shapes. Yup. I'm not sure what you mean by a "multi-point" polygon though... is that somehow different than a polygon that isn't multi-point? All polygons are comprised of at least 3 distinct points (a triangle). Our indexed data will include the long/lat values for all property records. If someone can provide sample queries for the following situations, it would be appreciated: - All properties/points that fall within a multi-point polygon (ie: Polygon points: Lo1 La1, Lo2 La2, Lo3 La3, Lo4 La4, Lo5 La5, Lo1 La1) mygeorptfieldname:"Intersects(POLYGON((x1 y1, x2 y2, x3 y3, ..., x1 y1)))" Inside of the immediate parenthesis of Intersects is a standard WKT formatted polygon. Note "x y" order (longitude space latitude). - All properties that fall within 1.5 miles (radius) of point: Lo1 La1 Just use Solr's standard "geofilt" query parser: fq={!geofilt}&pt=lat,lon&d=2.414016 I got the distance value by converting miles to kilometers, which is what geofilt expects (1.5 * 1.60934400061469). Other spatial search type functionality that may be targeted includes: - Ability to search within multiple polygons (intersecting, non-intersecting, and combinations) No problem for union: use standard WKT: MULTIPOLYGON or GEOMETRYCOLLECTION. If you want to combine them in interesting ways then you're going to have to compute that client-side and send the resulting polygon(s) to Solr (or ElasticSearch). You could use JTS to do that, which has a trove of spatial functionality for such things. I'm thinking of some day adding some basic operator extensions to the WKT so you don't have to do this on the client end. Leveraging JTS server-side it would be particularly easy, but it would also be pretty easy to do it as a custom shape aggregate, similar to Spatial4j 0.4's ShapeCollection. - Ability to search for properties that fall outside of a polygon You could use "IsDisjointTo" (instead of "Intersects") but you'll generally get faster results by negating intersects. For an example, simply precede the first polygonal example with a "NOT ". Thanks Lee ~ David
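(Expressed through SolrJ, the two examples above look like the sketch below - assuming a 4.x core at a placeholder URL and an RPT field named "geo"; the coordinates are made up.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SpatialExamples {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/properties");

        // Polygon containment: WKT in x y (lon lat) order, ring closed by repeating the first point.
        SolrQuery polygon = new SolrQuery("*:*");
        polygon.addFilterQuery(
                "geo:\"Intersects(POLYGON((-83.5 42.2, -83.1 42.2, -83.1 42.5, -83.5 42.5, -83.5 42.2)))\"");
        QueryResponse inPolygon = solr.query(polygon);

        // Radius: geofilt takes pt=lat,lon and d in km (1.5 mi ~= 2.414016 km).
        SolrQuery radius = new SolrQuery("*:*");
        radius.addFilterQuery("{!geofilt sfield=geo pt=42.33,-83.04 d=2.414016}");
        QueryResponse inRadius = solr.query(radius);

        System.out.println("in polygon: " + inPolygon.getResults().getNumFound()
                + ", in radius: " + inRadius.getResults().getNumFound());
    }
}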
Re: SolrCloud multiple data center support
Option a) doesn't really work out of the box, *if you need NRT support*. The main reason (for us at least) is the ZK ensemble and maintaining quorum. If you have a single ensemble, say 3 ZKs in 1 DC and 2 in another, then if you lose DC 2, you lose 2 ZKs and the rest are fine. But if you lose the main DC that has 3 ZKs, you lose quorum. Searches will be ok, but if you are an NRT setup, your updates will all stall until you get another ZK started (and reload the whole Solr Cloud to give them the ID of that new ZK). For us, availability is more important than consistency, so we currently have 2 independent setups, 1 ZK ensemble and Solr Cloud per DC. We already had an indexing system that serviced DCs so we didn't need something like Flume. We also have external systems that handle routing to some extent, so we can route locally to each Cloud, and not have to worry about cross-DC traffic. One solution to that is to have a 3rd DC with a few instances in it, say another 2 ZKs. That would take your total ensemble to 7, and you can lose 3 whilst still maintaining quorum. Since ZK is relatively light-weight, that 3rd Data Centre doesn't have to be as robust, or contain Solr replicas; it's just a place to house 1 or 2 machines for holding ZKs. We will probably migrate to this kind of setup soon, as it ticks more of our boxes. One other option, in ZK trunk but not yet in a release, is the ability to dynamically reconfigure ZK ensembles ( https://issues.apache.org/jira/browse/ZOOKEEPER-107). That would give the ability to create new ZK instances in the event of a DC failure, and reconfigure the Solr Cloud without having to reload everything. That would help to some extent. If you don't need NRT, then the solution is somewhat easier, as you don't have to worry as much about ZK quorum; a single ZK ensemble across DCs might be sufficient for you in that case. On 3 February 2014 17:44, Mark Miller markrmil...@gmail.com wrote: SolrCloud has not tackled multi data center yet. I don't think a or b are very good options yet. Honestly, I think the best current bet is to use something like Apache Flume to send data to both data centers - it will handle retries and keeping things in sync and splitting the stream. Doesn't satisfy all use cases though. At some point, multi data center support will happen. I can't remember where ZooKeeper's support for it is at, but with that and some logic to favor nodes in your data center, that might be a viable route. - Mark http://about.me/markrmiller On Feb 3, 2014, at 11:48 AM, Darrell Burgan darrell.bur...@infor.com wrote: Hello, we are using Solr in a SolrCloud configuration, with two Solr instances running with three Zookeepers in a single data center. We presently have a single search index with about 35 million entries in it, about 60GB disk space on each of the two Solr servers (120GB total). I would expect our usage of Solr to grow to include other search indexes, and likely larger data volumes. I'm writing because we're needing to grow beyond a single data center, with two (potentially incompatible) goals: 1. We need to be able to have a hot disaster recovery site, in a completely separate data center, that has a near-realtime replica of the search index. 2. We'd like to have the option to have multiple active/active data centers that each see and update the same search index, distributed across data centers. The options I'm aware of from reading archives: a. Simply set up the remote Solr instances as active parts of the same SolrCloud cluster.
This will essentially involve us standing up multiple Zookeepers in the second data center, and multiple Solr instances, and they will all keep each other in sync magically. This will also solve both of our goals. However, I'm concerned about performance and whether SolrCloud is smart enough to route local search queries only to local Solr servers ... ? Also, how does such a cluster tolerate and recover from network partitions? b. The remote Solr instances form their own completely unrelated SolrCloud cluster. I have to invent some kind of replication logic of my own to sync data between them. This replication would have to be bidirectional to satisfy both of our goals. I strongly dislike this option since the application really should not concern itself with data distribution. But I'll do it if I must. So my questions are: - Can anyone give me any guidance as to option a? Anyone using this in a real production setting? Words of wisdom? Does it work? - Are there any other options that I'm not considering? - What is Solr's answer to such configurations (we can't be alone in needing one)? Any big enhancements coming on the Solr road map to deal with this? Thanks! Darrell Burgan Darrell Burgan | Chief Architect, PeopleAnswers office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692
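(As a worked example of the quorum arithmetic above: seven ZKs laid out 3/2/2 across three data centers survive the loss of any one DC, because at most three nodes vanish and the remaining four are still a majority of seven. A sketch of the zoo.cfg server list; hostnames are hypothetical.)

# 7-node ensemble, 3/2/2 across three data centers; quorum is 4.
# DC1
server.1=zk1.dc1.example.com:2888:3888
server.2=zk2.dc1.example.com:2888:3888
server.3=zk3.dc1.example.com:2888:3888
# DC2
server.4=zk1.dc2.example.com:2888:3888
server.5=zk2.dc2.example.com:2888:3888
# DC3: lightweight site, ZK only, no Solr replicas required
server.6=zk1.dc3.example.com:2888:3888
server.7=zk2.dc3.example.com:2888:3888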
Adding HTTP Request Header in SolrJ
Our web services are using PKI authentication, so we have a user DN; however, we're querying an external Solr which is managed via a proxy which is expecting our server DN proxying the user DN. My question is, how do we add an HTTP header to the request being made by SolrJ? I looked through the source code and I see that we can specify an HttpClient when we create a new instance of an HttpSolrServer. I can set the header there, but that seems slightly hacky to me. I'd prefer to use a servlet filter if possible. Do you have any other suggestions? Thanks! -- Andrew Doyle Software Engineer II 10620 Guilford Road, Suite 200, Jessup, MD 20794 direct: 410 854 5560 cell: 410 440 8478 ado...@clearedgeit.com www.ClearEdgeIT.com
Re: Adding HTTP Request Header in SolrJ
On 2/3/2014 3:40 PM, Andrew Doyle wrote: Our web services are using PKI authentication, so we have a user DN; however, we're querying an external Solr which is managed via a proxy which is expecting our server DN proxying the user DN. My question is, how do we add an HTTP header to the request being made by SolrJ? I looked through the source code and I see that we can specify an HttpClient when we create a new instance of an HttpSolrServer. I can set the header there, but that seems slightly hacky to me. I'd prefer to use a servlet filter if possible. Do you have any other suggestions? I don't think there's any servlet information (like the filters you mentioned) available in SolrJ. There is in Solr itself, which uses SolrJ, but unless you're writing a servlet or custom server-side code for Solr, you won't have access to any of that. If you are writing a servlet or custom server-side code, then they'll be available -- but not from SolrJ. I could be wrong about what I just said, but just now when I looked through the code for HttpSolrServer and SolrServer, I did not see anything about servlets or filters. In my own SolrJ application, I create an HttpClient instance that is used across dozens of HttpSolrServer instances. The following is part of the constructor code for my custom Core class:

/*
 * If this is the first time a Core has been created, create the shared
 * httpClient with some increased connection properties. Synchronized to
 * ensure thread safety.
 */
synchronized (firstInstance) {
    if (firstInstance) {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.add(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, "200");
        params.add(HttpClientUtil.PROP_MAX_CONNECTIONS, "5000");
        httpClient = HttpClientUtil.createClient(params);
        firstInstance = false;
    }
}

These are the static class members used in the above code:

/**
 * A static boolean value indicating whether this is the first instance of
 * this object. Also used for thread synchronization.
 */
private static Boolean firstInstance = true;

/**
 * A static http client to use on all Solr server objects.
 */
private static HttpClient httpClient = null;

Just so you know, the deprecations introduced by the recent upgrade to HttpClient 4.3 might complicate things further when it comes to user code. See SOLR-5604. I have some ideas about how to proceed on that issue, but haven't had a lot of time to look into it, and before I do anything, I need to discuss it with people who are smarter than me. https://issues.apache.org/jira/browse/SOLR-5604 Thanks, Shawn
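(To make the shared-HttpClient route concrete: with the pre-4.3 HttpClient API that SolrJ 4.x builds against, a request interceptor registered on the client stamps a header onto every outgoing request, including Solr queries. A sketch only - the header name and DN value are placeholders for whatever the proxy expects, and note that DefaultHttpClient is among the classes deprecated by the 4.3 upgrade discussed above.)

import java.io.IOException;
import org.apache.http.HttpException;
import org.apache.http.HttpRequest;
import org.apache.http.HttpRequestInterceptor;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.protocol.HttpContext;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class HeaderStampingClient {
    public static HttpSolrServer create() {
        DefaultHttpClient httpClient = new DefaultHttpClient();
        // Every request sent through this client gets the extra header.
        httpClient.addRequestInterceptor(new HttpRequestInterceptor() {
            public void process(HttpRequest request, HttpContext context)
                    throws HttpException, IOException {
                // Placeholder header name and DN value.
                request.addHeader("X-Proxied-User-DN", "cn=someuser,ou=people,o=example");
            }
        });
        return new HttpSolrServer("http://solr-host:8983/solr/collection1", httpClient);
    }
}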
Re: Special NGRAMish requirement
Hi, Can you provide an example, Alexander? Otis Solr ElasticSearch Support http://sematext.com/ On Feb 3, 2014 5:28 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote: Hi, we need to use something very similar to EdgeNGram (minGramSize=1 maxGramSize=50 side=front). The only thing missing is that we would like to reduce the number of matches. The request we need to implement is returning only those matches with the longest tokens (or terms if that is the right word). Is there a way to do this in Solr (not necessarily with EdgeNGram)? Thanks, Alexander
Re: how to write an efficient query with a subquery to restrict the search space?
Hi, Sounds like a possible document and query routing use case. Otis Solr ElasticSearch Support http://sematext.com/ On Jan 31, 2014 7:11 AM, svante karlsson s...@csi.se wrote: It seems to be faster to first restrict the search space and then do the scoring, compared to just using the full query and letting solr handle everything. For example, in my application one of the scoring fields effectively hits 1/12 of the database (a month field), and if we have 100'' items in the database then this matters. /svante 2014-01-30 Jack Krupansky j...@basetechnology.com: Lucene's default scoring should give you much of what you want - ranking hits of low-frequency terms higher - without any special query syntax - just list out your terms and use OR as your default operator. -- Jack Krupansky -Original Message- From: svante karlsson Sent: Thursday, January 23, 2014 6:42 AM To: solr-user@lucene.apache.org Subject: how to write an efficient query with a subquery to restrict the search space? I have a solr db containing 1 billion records that I'm trying to use in a NoSQL fashion. What I want to do is find the best matches using all search terms but restrict the search space to the most unique terms. In this example I know that val2 and val4 are rare terms and val1 and val3 are more common. In my real scenario I'll have 20 fields that I want to include or exclude in the inner query depending on the uniqueness of the requested value. my first approach was: q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2 OR field4:val4)&rows=100&fl=* but what I think I get is ... field4:val4 AND (field2:val2 OR field4:val4), and this result is then OR'ed with the rest. if I write q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND (field2:val2 OR field4:val4)&rows=100&fl=* then what I think I get is two sub-queries that are evaluated separately and then joined - performance-wise this is bad. What's the best way to write these types of queries? Are there any performance issues when running it on several solrcloud nodes vs a single instance, or should it scale? /svante
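(One common way to get the restrict-then-score behavior described above is to put the restriction in a filter query: fq matches without contributing to the score and is cached independently of q. A SolrJ sketch using the field names from the example; the core URL is a placeholder.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RestrictThenScore {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder
        SolrQuery q = new SolrQuery("field1:val1 OR field2:val2 OR field3:val3 OR field4:val4");
        // The rare terms go into fq: it narrows the candidate set without
        // affecting scores, and the filter is cached separately from q.
        q.addFilterQuery("field2:val2 OR field4:val4");
        q.setRows(100);
        q.setFields("*");
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults().getNumFound() + " matches");
    }
}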
Re: Adding DocValues in an existing field
Hi, You can change the field definition and then reindex. Otis Solr ElasticSearch Support http://sematext.com/ On Jan 30, 2014 1:12 PM, yriveiro yago.rive...@gmail.com wrote: Hi, Can I add docValues to an existing field without wiping the current index? The modification on the schema will be something like this:

<field name="surrogate_id" type="tlong" indexed="true" stored="true" multiValued="false"/>
<field name="surrogate_id" type="tlong" indexed="true" stored="true" multiValued="false" docValues="true"/>

I want to use the existing data to reindex it again into the same collection, but in the process create the docValues too; is that possible? I'm using solr 4.6.1 - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-DocValues-in-an-existing-field-tp4114462.html Sent from the Solr - User mailing list archive at Nabble.com.
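(A sketch of the reindex-in-place step, under two assumptions: all fields are stored, so each document can be read back whole, and paging uses start/rows because cursorMark only arrives in 4.7. Best run against an index that isn't being written to concurrently; the core URL is a placeholder.)

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class ReindexInPlace {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder
        int rows = 500;
        for (int start = 0; ; start += rows) {
            SolrQuery q = new SolrQuery("*:*");
            q.set("sort", "surrogate_id asc"); // stable page order
            q.setStart(start);
            q.setRows(rows);
            SolrDocumentList page = solr.query(q).getResults();
            if (page.isEmpty()) break;
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (SolrDocument doc : page) {
                SolrInputDocument in = ClientUtils.toSolrInputDocument(doc);
                in.removeField("_version_"); // avoid optimistic-locking clashes on re-add
                batch.add(in);
            }
            solr.add(batch); // rewriting each document builds docValues under the new schema
        }
        solr.commit();
    }
}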
Re: need help in understating solr cloud stats data
Hi, Oh, I just saw Greg's email on dev@ about this. IMHO aggregating in the search engine is not the way to go. Leave that to external tools, which are likely to be more flexible when it comes to this. For example, our SPM for Solr can do all kinds of aggregations and filtering by a number of Solr and SolrCloud-specific dimensions already, without Solr having to do any sort of aggregation that it thinks Ops people will really want. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Feb 3, 2014 at 11:08 AM, Mark Miller markrmil...@gmail.com wrote: You should contribute that and spread the dev load with others :) We need something like that at some point, it's just no one has done it. We currently expect you to aggregate in the monitoring layer and it's a lot to ask IMO. - Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote: I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded, then registers itself as an mbean. When called it polls all the per-core mbeans, then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get jvm-wide stats via jmx but it is *a* way to get it done. Thanks, Greg On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote: I'm sending all solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime - I see that each core in a single collection has different values. Is the value of each core the time that specific core spent on a request? So to get an idea of total request time, should I sum the values of all the cores? 2. update_handler/commits - does this include auto_commits? Because I'm pretty sure I'm not doing any manual commits and yet I see a number there. 3. update_handler/docs pending - what does this mean? pending for what? for flush to disk? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-in-understating-solr-cloud-stats-data-tp4114992.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Duplicate Facet.Fields cause same results, should dedupe?
Hi, I don't know if this is an old or new problem, but it does feel like a bug to me. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Feb 3, 2014 at 10:48 AM, William Bell billnb...@gmail.com wrote: If we add facet.field=prac_spec_heir&facet.field=prac_spec_heir we get it twice in the results. This breaks deserialization on wt=json since you cannot have the same name twice. Thoughts? Seems like a new bug in 4.6? facet.field: [prac_spec_heir,all_proc_name_code,all_cond_name_code,prac_spec_heir,{!ex=exgender}gender,{!ex=expayor}payor_code_name], -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: Duplicate Facet.Fields cause same results, should dedupe?
This is in 4.6.1. On Mon, Feb 3, 2014 at 9:11 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I don't know if this is an old or new problem, but it does feel like a bug to me. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Feb 3, 2014 at 10:48 AM, William Bell billnb...@gmail.com wrote: If we add facet.field=prac_spec_heir&facet.field=prac_spec_heir we get it twice in the results. This breaks deserialization on wt=json since you cannot have the same name twice. Thoughts? Seems like a new bug in 4.6? facet.field: [prac_spec_heir,all_proc_name_code,all_cond_name_code,prac_spec_heir,{!ex=exgender}gender,{!ex=expayor}payor_code_name], -- Bill Bell billnb...@gmail.com cell 720-256-8076 -- Bill Bell billnb...@gmail.com cell 720-256-8076
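(Until the duplicate key is fixed server-side, it's easy to guard against at the client by de-duplicating the facet fields before the request is built. A trivial SolrJ sketch.)

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.solr.client.solrj.SolrQuery;

public class FacetDedupe {
    public static SolrQuery build(String query, String... facetFields) {
        // LinkedHashSet drops repeats while keeping first-seen order.
        Set<String> unique = new LinkedHashSet<String>(Arrays.asList(facetFields));
        SolrQuery q = new SolrQuery(query);
        q.setFacet(true);
        for (String field : unique) {
            q.addFacetField(field);
        }
        return q;
    }
}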
Re: Solr and SDL Tridion Integration
Thanks a lot for the options. Our site has dynamic content as well. I will look into what suits best. Thanks, Prasi On Mon, Feb 3, 2014 at 10:34 PM, Chris Warner chris_war...@yahoo.com wrote: There are many ways to do this, Prasi. You have a lot of thinking to do on the subject. You could decide to publish your content to database, and then index that database in Solr. You could publish XML or CSV files of your content for Solr to read and index. You could use nutch or some other tool to crawl your web server. There are many more methods, probably. These being some of the more common. Does your site have dynamic content presentation? If so, you may want to consider having Solr examine your broker database. Static pages on your site? You may want to go with either a crawler or publishing a special file for Solr. Please check out https://tridion.stackexchange.com/ for more on this topic. -- chris_war...@yahoo.com On Monday, February 3, 2014 3:54 AM, Jack Krupansky j...@basetechnology.com wrote: If SDL Tridion can export to CSV format, Solr can then import from CSV format. Otherwise, you may have to write a custom script or even maybe Java code to read from SDL Tridion and output a supported Solr format, such as Solr XML, Solr JSON, or CSV. -- Jack Krupansky -Original Message- From: Prasi S Sent: Monday, February 3, 2014 4:16 AM To: solr-user@lucene.apache.org Subject: Solr and SDL Tridion Integration Hi, I want to index sdl tridion content to solr. Can you suggest how this can be achieved? Is there any document/tutorial for this? Thanks Thanks, Prasi
Solr ranking query..
Hi, I have a document structure that looks like the below. I would like to implement something like:

(urlKeywords:+keyword+ AND domainRank:[3 TO 10000] AND adultFlag:N)^60
OR (title:+keyword+ AND domainRank:[3 TO 10000] AND adultFlag:N)^20
OR (title:+keyword+ AND domainRank:[10001 TO *] AND adultFlag:N)^2
OR (fulltxt:+keyword+)

In case we have multiple words in keywords - A B C D - then the documents that have all the words should rank highest (Group 1), then 3 words (Group 2), then 2 words (Group 3), etc. AND within each group (Group 1, 2, 3) I would want the ones with the lowest domainRank value to rank higher (but within the group). How can I do this in a single query? Please advise on the fastest way possible (open to implementing fq and other techniques to speed it up). Document structure in XML:

<doc>
  <str name="subDomain">www</str>
  <str name="domain">ncoah.com</str>
  <str name="path">/links.html</str>
  <str name="urlFull">http://www.ncoah.com/links.html</str>
  <str name="title">North Carolina Office of Administrative Hearings - Links</str>
  <arr name="text">
    <str>North Carolina Office of Administrative Hearings - Links</str>
  </arr>
  <str name="relatedLinks">
    <a href="http://www.ncoah.com/links.html" title="Hearings">Hearings</a>
    <a href="http://www.ncoah.com/links.html" title="Rules">Rules</a>
    <a href="http://www.ncoah.com/links.html" title="Civil Rights">Civil Rights</a>
    <a href="http://www.nc.gov/" title="Visit the North Carolina State web portal">Visit the North Carolina State web portal</a>
    <a href="http://www.nccourts.org/" title="Administrative Office of the Courts">Administrative Office of the Courts</a>
    <a href="http://www.ncleg.net/" title="North Carolina General Assembly">North Carolina General Assembly</a>
    ...
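(A sketch of the tiered query from SolrJ, reading the ranges as [3 TO 10000] / [10001 TO *] per the original post and assuming a single-term keyword. Since adultFlag:N applies to every tier, it can be factored into a filter query, which is cached and doesn't affect scores.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

public class TieredRankingQuery {
    public static SolrQuery build(String rawKeyword) {
        String keyword = ClientUtils.escapeQueryChars(rawKeyword);
        SolrQuery q = new SolrQuery(
                "(urlKeywords:" + keyword + " AND domainRank:[3 TO 10000])^60"
                + " OR (title:" + keyword + " AND domainRank:[3 TO 10000])^20"
                + " OR (title:" + keyword + " AND domainRank:[10001 TO *])^2"
                + " OR fulltxt:" + keyword);
        // Common to all tiers, so filter it once instead of repeating it per clause.
        q.addFilterQuery("adultFlag:N");
        q.setRows(10);
        return q;
    }
}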