Re: No servers hosting shard
Hi,

Any suggestion will be really helpful. Kindly provide your inputs.

Thanks,
Modassar

On Thu, Apr 16, 2015 at 4:27 PM, Modassar Ather modather1...@gmail.com wrote:

Hi,

I have a setup of a 5-node SolrCloud (Lucene/Solr version 5.1.0) without replicas. When I am executing complex and large queries with wild-cards, after some time I get the following exceptions. The index size on each node is around 170GB and the memory is set to -Xms20g -Xmx24g on each node.

Empty shard!
org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:214)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:184)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

There is no OutOfMemory or any other major lead for me to understand what caused it. Maybe I am missing something.

There are the following other exceptions:

SEVERE: null:org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: http://server:8080/solr/collection
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:342)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:193)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

WARNING: listener throws error
org.apache.solr.common.SolrException: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /configs/collection/params.json
    at org.apache.solr.core.RequestParams.getFreshRequestParams(RequestParams.java:163)
    at org.apache.solr.core.SolrConfig.refreshRequestParams(SolrConfig.java:919)
    at org.apache.solr.core.SolrCore$11.run(SolrCore.java:2500)
    at org.apache.solr.cloud.ZkController$4.run(ZkController.java:2366)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /configs/collection/params.json
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
    at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:294)
    at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:291)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
    at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:291)
    at org.apache.solr.core.RequestParams.getFreshRequestParams(RequestParams.java:153)
    ... 3 more

The ZooKeeper session timeout is set to 3. In the log file I can see logs of the following pattern for all the queries I fired. INFO:
Re: SolrCloud Core Reload
Optimize will be distributed to all shards/replicas. I believe reload will only reload the specific core. To reload the complete collection, use the Collections API: https://cwiki.apache.org/confluence/display/solr/Collections+API

On Thu, Apr 16, 2015 at 5:15 PM, Vincenzo D'Amore v.dam...@gmail.com wrote: Hi all, I have a SolrCloud cluster with 3 servers and there are many cores. Using the SolrCloud admin UI Core page, if I execute a core optimize (or reload), will all the cores in the cluster be optimized or reloaded, or only the selected core?

Best regards,
Vincenzo
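For example, assuming a collection named collection1 on the default port, the collection-wide reload via the Collections API is a single HTTP request:

```
http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1
```

Every core of every replica of collection1, on all nodes, will be reloaded by this one call.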
Re: 5.1 'unique' facet function / calcDistinct
Is there a way to use the stats.calcdistinct functionality and only return the countDistinct portion of the response, and not the full list of distinct values as provided in the distinctValues portion of the response? In a field with high cardinality the response size becomes too large.

I don't think this is currently supported.

If there is no such option, could someone point me in the right direction for implementing a custom solution?

The problem is how to calculate this in distributed requests. Even if the final response doesn't include the distinct values, the shard responses will probably have to. Look at StatsComponent.java and AbstractStatsValues in StatsValuesFactory.java.

Tomás

Thank you for your time,
Levan
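To illustrate the merge problem Tomás describes: per-shard distinct counts alone cannot be combined exactly, because the same value may occur on several shards. A small sketch, with plain Python sets standing in for shard responses:

```python
# Why exact countDistinct across shards needs the values, not just the counts.
shard1_values = {"red", "green", "blue"}
shard2_values = {"blue", "yellow"}

# Summing per-shard distinct counts overcounts values shared between shards:
naive = len(shard1_values) + len(shard2_values)   # 5, but "blue" is counted twice

# The exact merge has to union the actual values from each shard:
exact = len(shard1_values | shard2_values)        # 4
```

This is why the shard responses still have to carry the distinct values even if the coordinator strips them from the final response.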
Bad contentType for search handler :text/xml; charset=UTF-8
Hi, we have migrated Solr from 5.0 to 5.1 and we can't search now; we get an ERROR for SolrCore like in the subject. I can't find any info through Google. Please, can someone help with what is going on?

Thanks,
Pavel
Re: SolrCloud Core Reload
I don't think there is any Collection-level support at this point in the Solr admin UI. Whatever you do via the UI would be core level, unless I'm forgetting something.

-- Anshum Gupta
HttpSolrServer and CloudSolrServer
Hi All, Good Morning!!

For a SolrCloud deployment, for indexing data through SolrJ, which is the preferred / correct SolrServer class to use: HttpSolrServer or CloudSolrServer? In case both can be used, when to use which? Any help please.

Thanks & Regards,
Vijay
spellcheck enabled but not getting any suggestions.
Hi,

I have enabled spellcheck but am not getting any suggestions with incorrectly spelled keywords. I added the spellcheck into the /select request handler. What steps did I miss out?

spellcheck list in the returned result:

<lst name="spellcheck">
  <lst name="suggestions"/>
</lst>

solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <!-- Spell checking defaults -->
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>
  <!-- append spellchecking to our list of components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
Re: Merge indexes in MapReduce
Thank you for the reply. Our schema is:

1) Index in real time (on a separate machine).
2) The NRT index becomes large.
3) Copy the NRT index to another machine.
4) Merge the NRT-made indexes with the large (all-time) index.
5) Remove the NRT index (until now it was available for searching).

At the end we have a big, optimized index with all the data, and we're ready to index more data, and indexing will be fast. Excuse me if I'm describing this unclearly.

About optimization: indexing with a low merge factor results in a lot of segments, which results in slow search, so we have to do it.
Re: Merge indexes in MapReduce
Hi Norgorn,

I think there is no ready-made tool out of the box, but you have the spare parts in the MapReduceIndexerTool :-) With little effort you can decouple the index-merging component from MRIndexerTool and use it based on your needs. I did the same.

-- Ariya
Re: HttpSolrServer and CloudSolrServer
If you're using SolrCloud then you should use CloudSolrServer, as it is able to abstract / hide the interaction with the cluster. HttpSolrServer communicates directly with a single Solr instance.

Best,
Andrea
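As a toy illustration of the difference (this is not Solr's actual code; Solr's compositeId router uses a MurmurHash over the document id): CloudSolrServer reads the cluster state from ZooKeeper and routes each request to the right shard itself, while HttpSolrServer always talks to the one URL you gave it.

```python
# Simplified sketch of client-side routing, the job CloudSolrServer does for you.
# The cluster state below is a hand-written stand-in for what ZooKeeper provides.
cluster_state = {
    "shard1": "http://node1:8983/solr/collection1",
    "shard2": "http://node2:8983/solr/collection1",
}

def route(doc_id: str) -> str:
    """Pick the target shard URL from the document id (toy hash router)."""
    shards = sorted(cluster_state)
    shard = shards[hash(doc_id) % len(shards)]
    return cluster_state[shard]

leader_url = route("doc-42")  # a CloudSolrServer-style client would POST here
```

An HttpSolrServer-style client skips all of this and simply sends every update to its single configured URL, leaving Solr to forward documents between nodes.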
search ignoring accents
Hello,

What is the best way to search in a field ignoring accents? The field has the type:

<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

I've tried adding the filter:

<filter class="solr.ASCIIFoldingFilterFactory"/>

but some strange results happened. For example, searching for "Mourao" gave the results:

Mourão - OK
Monteiro - NOT OK
Morais - NOT OK

Thanks in advance,

Pedro Figueiredo
Senior Engineer
pjlfigueir...@criticalsoftware.com
M. 934058150
Rua Engº Frederico Ulrich, nº 2650, 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com
solr 4.8.0 update synonyms in zookeeper splitted files
Hi All,

I have Solr synonyms stored in multiple files, as defined in the schema:

<!ENTITY sinonimi_freeling "sinonimi_freeling/sfaa,sinonimi_freeling/sfab,sinonimi_freeling/sfac,sinonimi_freeling/sfad,sinonimi_freeling/sfae,sinonimi_freeling/sfaf,sinonimi_freeling/sfag,sinonimi_freeling/sfah,sinonimi_freeling/sfai,sinonimi_freeling/sfaj,sinonimi_freeling/sfak">

so that I can specify the synonym resource this way:

<filter class="solr.SynonymFilterFactory" synonyms="&sinonimi_freeling;" expand="false" ignoreCase="true"/>

I'm quite worried because I tried to update one synonym file, adding the new synonyms at the end. SolrCloud didn't update its synonyms list. So I reloaded the core, and then I started to get floating results when querying SolrCloud. I had to stop and restart all the Tomcat instances to stop this strange behaviour.

Is there a best practice for updating synonyms when you are using SynonymFilterFactory? How can I update the synonym resources? Why can't I simply upload the new file into ZooKeeper?

Best regards,
Vincenzo
Re: SolrCloud 4.8.0 upgrade
Vincenzo D'Amore v.dam...@gmail.com wrote: I have a SolrCloud cluster with 3 servers. I would like to use stats.facet, but this feature is available only if I upgrade to 4.10. May I simply redeploy the new SolrCloud version in Tomcat, or should I reload all the documents? Are there other drawbacks?

Support for the Disk format for DocValues was removed after 4.8, so you should check whether you use that: docValuesFormat="Disk" on the field in the schema, if I remember correctly.

- Toke Eskildsen
Re: spellcheck enabled but not getting any suggestions.
Shouldn't you specify a spellcheck.dictionary in your request handler?

Best regards,
Elisabeth
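For reference, a minimal sketch of what the handler would pair with: a spellcheck search component whose dictionary name is then referenced from the handler defaults. The component, dictionary name, and the field used to build it are illustrative here, loosely following the sample solrconfig.xml:

```xml
<!-- Illustrative spellcheck component; "default" and field "text" are example names -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>
```

The /select handler defaults would then include <str name="spellcheck.dictionary">default</str> alongside the existing spellcheck.* parameters, so the handler knows which dictionary to consult.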
Re: solr 4.8.0 update synonyms in zookeeper splitted files
On 4/17/2015 6:02 AM, Vincenzo D'Amore wrote: I have solr synonyms stored in multiple files as defined in the schema ... Is there a best practice to update synonyms when you are using SynonymFilterFactory? How can I update the synonym resources, why can't I simply upload the new file into zookeeper?

I've not encountered the <!ENTITY> syntax or used more than one synonym file, so I'll have to take your word for it that this works.

When you update a config resource, you must reload or restart for it to take effect. If the resource is used in index analysis, you must also reindex after reloading; resources used in query analysis will take effect immediately. With SolrCloud, you should reload the entire collection (with the Collections API), not just a core (with the CoreAdmin API).

I don't know what you mean by "floating results" above.

Thanks,
Shawn
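As a sketch of the upload-then-reload cycle described above (the zkcli.sh location differs between Solr versions, and the ZooKeeper host, config path, and file names here are placeholders for this setup):

```
# Push the edited synonym file back into ZooKeeper...
./server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd putfile /configs/collection/sinonimi_freeling/sfaa sinonimi_freeling/sfaa

# ...then reload the whole collection so every core re-reads its config:
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection"
```

Uploading alone is not enough: the running cores keep their in-memory synonym map until the collection is reloaded.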
Re: search ignoring accents
Hi Pedro,

solr.ASCIIFoldingFilterFactory is one way to remove diacritics. The confusion comes from EdgeNGram: why do you need it?

Ahmet
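The EdgeNGram filter is indeed the likely cause of the "strange" hits: with the field type above, both index and query sides emit edge n-grams of 2 to 15 characters, so any shared prefix gram of length >= 2 is enough for a match. A toy sketch (not Solr's tokenizer code) of why "Mourao" overlaps with "Monteiro" and "Morais" on the gram "mo":

```python
def edge_ngrams(term, lo=2, hi=15):
    """Prefix grams of lengths lo..hi, lowercased, mimicking EdgeNGramFilterFactory."""
    term = term.lower()
    return {term[:n] for n in range(lo, min(hi, len(term)) + 1)}

query_grams = edge_ngrams("Mourao")          # {"mo", "mou", "mour", "moura", "mourao"}

# Both names share the 2-character prefix gram "mo" with the query:
assert query_grams & edge_ngrams("Monteiro") == {"mo"}
assert query_grams & edge_ngrams("Morais") == {"mo"}
```

So the accent folding is working; it is the short minGramSize on both analyzers that makes unrelated names match.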
RE: search ignoring accents
Hi Ahmet,

Yes, the EdgeNGram filter is what produces those results. I need it to improve search-by-name for the application's users.

Thanks,
Pedro
Re: Solr 5.x deployment in production
On 4/16/2015 2:07 PM, Steven White wrote: In my case, I have to deploy Solr on Windows, AIX, and Linux (all server editions). We are a WebSphere shop; moving away from it means I have to deal with politics and culture.

You *can* run Solr 5.0 (and 5.1) in another container, just like you could with all previous Solr versions. There are additional steps that have to be taken, such as correctly installing the logging jars and the logging config, but if you've used Solr 4.3 or later, you already know this: http://wiki.apache.org/solr/SolrLogging

Eventually, hopefully before we reach the 6.0 release, that kind of deployment won't be possible, because Solr will be a true application (like Jetty itself), not a webapp contained in a .war file. It may take us quite a while to reach that point. If you are already using the scripts that come with Solr 5.x, you will have a seamless transition to the new implementation.

The docs for 5.0 say that we aren't supporting deployment in a third-party servlet container, even though that still is possible. There are several reasons for this:

* Eventually it won't be possible, because Solr's implementation will change.
* We now have scripts that will start Solr in a consistent manner.
** This means that our instructions won't have to change for a new implementation.
* There are a LOT of containers available.
** Each one requires different instructions.
** Are problems caused by the container, or Solr? We may not know.
* Jetty is the only container that gets tested.
** Bugs with other containers have happened.
** User feedback is usually the only way such bugs can be found.

Thanks,
Shawn
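For reference, the 5.x scripts mentioned above look like this (the port and heap values here are only examples):

```
bin/solr start -c -p 8983 -m 4g   # start in SolrCloud mode on port 8983 with a 4g heap
bin/solr status                   # report on running Solr instances
bin/solr stop -p 8983             # stop the instance listening on port 8983
```

Because these scripts are the supported entry point, they should keep working unchanged when Solr stops shipping as a .war.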
Re: Range facets in sharded search
Thanks for the fast turnaround; you beat me to opening the Jira and fixed it too! Much appreciated.

Thanks,
Will

From: Tomás Fernández Löbbe tomasflo...@gmail.com
Sent: Thursday, April 16, 2015 10:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Range facets in sharded search

Should be fixed in 5.2. See https://issues.apache.org/jira/browse/SOLR-7412

On Thu, Apr 16, 2015 at 3:18 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: This looks like a bug. The logic to merge range facets from shards seems to only be merging counts, not the first-level elements. Could you create a Jira?

On Thu, Apr 16, 2015 at 2:38 PM, Will Miller wmil...@fbbrands.com wrote: I am seeing some odd behavior with range facets across multiple shards. When querying each node directly with distrib=false, the facet returned matches what is expected. When doing the same query against the collection, and it spans the two shards, the facet "after" and "between" buckets are wrong. I can re-create a similar problem using the out-of-the-box example scripts and data. I am running on Windows and tested both Solr 5.0.0 and 5.1.0. These are the steps to reproduce:

c:\solr-5.1.0\solr -e cloud

These are the selections I made:

(specify 1-4 nodes) [2]: 2
Please enter the port for node1 [8983]: 8983
Please enter the port for node2 [7574]: 7574
Please provide a name for your new collection: [gettingstarted] gettingstarted
How many shards would you like to split gettingstarted into? [2] 2
How many replicas per shard would you like to create? [2] 1
Please choose a configuration ...
[data_driven_schema_configs] sample_techproducts_configs

I then posted some of the sample XMLs:

C:\solr-5.1.0\example\exampledocs> java -Dc=gettingstarted -jar post.jar vidcard.xml hd.xml ipod_other.xml ipod_video.xml mem.xml monitor.xml monitor2.xml mp500.xml sd500.xml

This first query is against node1 with distrib=false:

http://localhost:8983/solr/gettingstarted/select/?q=*:*&wt=json&indent=true&distrib=false&facet=true&facet.range=price&f.price.facet.range.start=0.00&f.price.facet.range.end=100.00&f.price.facet.range.gap=20&f.price.facet.range.other=all&defType=edismax&q.op=AND

There are 7 results (results omitted):

facet_ranges:{ price:{ counts:[ 0.0,1, 20.0,0, 40.0,0, 60.0,0, 80.0,1], gap:20.0, start:0.0, end:100.0, before:0, after:5, between:2}},

This second query is against node2 with distrib=false:

http://localhost:7574/solr/gettingstarted/select/?q=*:*&wt=json&indent=true&distrib=false&facet=true&facet.range=price&f.price.facet.range.start=0.00&f.price.facet.range.end=100.00&f.price.facet.range.gap=20&f.price.facet.range.other=all&defType=edismax&q.op=AND

7 results (one product does not have a price):

facet_ranges:{ price:{ counts:[ 0.0,1, 20.0,0, 40.0,0, 60.0,1, 80.0,0], gap:20.0, start:0.0, end:100.0, before:0, after:4, between:2}},

Finally, querying the entire collection:

http://localhost:7574/solr/gettingstarted/select/?q=*:*&wt=json&indent=true&facet=true&facet.range=price&f.price.facet.range.start=0.00&f.price.facet.range.end=100.00&f.price.facet.range.gap=20&f.price.facet.range.other=all&defType=edismax&q.op=AND

14 results (one without a price):

facet_ranges:{ price:{ counts:[ 0.0,2, 20.0,0, 40.0,0, 60.0,1, 80.0,1], gap:20.0, start:0.0, end:100.0, before:0, after:5, between:2}},

Notice that both the after and the between are wrong here. The actual buckets do correctly represent the right values, but summing the shard responses I would expect between to be 4 and after to be 9.
There appears to be a recently fixed issue (https://issues.apache.org/jira/browse/SOLR-6154) with range facets in distributed queries, but it was related to buckets not always appearing with mincount=1 for the field. This looks like a different problem. Does anyone have any suggestions, or notice anything wrong with my query parameters? I can open a Jira ticket but wanted to run it by the larger audience first to see if I am missing anything obvious.

Thanks,
Will
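To make the expected numbers concrete: a correct distributed merge has to sum every element of the shard range-facet responses, the before/after/between values included, not only the per-bucket counts (merging only the counts is exactly what SOLR-7412 later fixed). A sketch in Python, fed with the two shard responses quoted above (zero-count buckets omitted for brevity):

```python
def merge(shard_responses):
    """Sum per-bucket counts and the before/after/between tallies across shards."""
    out = {"counts": {}, "before": 0, "after": 0, "between": 0}
    for resp in shard_responses:
        for bucket_start, n in resp["counts"].items():
            out["counts"][bucket_start] = out["counts"].get(bucket_start, 0) + n
        for key in ("before", "after", "between"):
            out[key] += resp[key]
    return out

shard1 = {"counts": {0.0: 1, 80.0: 1}, "before": 0, "after": 5, "between": 2}
shard2 = {"counts": {0.0: 1, 60.0: 1}, "before": 0, "after": 4, "between": 2}

merged = merge([shard1, shard2])
# merged: counts {0.0: 2, 80.0: 1, 60.0: 1}, before 0, after 9, between 4
```

The buggy coordinator effectively kept one shard's before/after/between instead of summing them, which matches the after:5, between:2 seen in the collection-wide response.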
Re: 1:M connectivity
On 4/16/2015 2:27 PM, Oded Sofer wrote: The issue is the firewall setting needed for the cloud. We do not want to open all nodes to all others nodes. However, we found that add-index to a specific node tries to access all other nodes though we set it to index locally on that node only. That is basic SolrCloud operation. If the nodes cannot communicate with each other, they cannot keep replicas in sync, they will not know when another node goes down (required to keep the clusterstate current), multi-shard distributed search will not work, Solr cannot load balance queries across the cloud, collection creation will not work, and so on. There are probably several other fundamental SolrCloud operations that require inter-node communication. If you don't want the nodes to talk to each other, you probably need to stop using SolrCloud, plus give up distributed search and replication entirely. Thanks, Shawn
Re: No servers hosting shard
Hi, sounds like you hit a full GC. Check your GC log.

Ugo

On 17 Apr 2015 08:24, Modassar Ather modather1...@gmail.com wrote: Hi, Any suggestion will be really helpful. Kindly provide your inputs. ...
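To follow up on checking the GC log: if GC logging isn't already enabled, these HotSpot options (Java 7/8 era, matching this Solr/Tomcat setup), added to the JVM arguments, will show whether a long stop-the-world pause lines up with the ZooKeeper session expiry. The log path is only an example:

```
-Xloggc:/var/log/solr/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
```

A full-GC pause longer than the ZooKeeper session timeout would explain both the expired session and the "no servers hosting shard" errors: the node looks dead to the cluster while the JVM is paused.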
Re: SolrCloud Core Reload
Hi, this morning I optimised my SolrCloud cluster (3 instances). I have many collections; all have a shard and a replica on each node. At the end of the optimisation task (about 10 minutes) all cores are optimised on every node. But how can I be sure that a reload also affects all the cores?

On Fri, Apr 17, 2015 at 9:31 AM, Anshum Gupta ans...@anshumgupta.net wrote: I don't think there is any Collection level support at this point in the Solr admin UI. Whatever you do via the UI would be core level, unless I'm forgetting something. ...

-- Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251
Re: SolrCloud Core Reload
On 4/17/2015 7:21 AM, Vincenzo D'Amore wrote: this morning I optimised my SolrCloud cluster (3 instances). I have many collections, each with a shard and a replica on every node. At the end of the optimisation task (about 10 minutes) all cores were optimised on every node. How can I be sure that a reload likewise affects all the cores? The optimize command is sent at the core level, to a specific machine, but it sets in motion an optimize of the entire collection, one core at a time. The optimize update command ignores distrib=false -- it always optimizes the entire collection. If you send a RELOAD action to a core in a collection, it will only affect that core. There is a separate RELOAD action on the Collections API which will reload every core in the collection on all servers. Perhaps we should change how optimize works and provide an OPTIMIZE action on the Collections API, so it works much the same as RELOAD. I remember seeing an issue in Jira about adding distrib=false support to optimize, but now I can't find it. Changing optimize to work like RELOAD would fix that issue. Thanks, Shawn
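To make the core-level vs. collection-level distinction above concrete, here is a sketch of the two HTTP calls involved. The host, core, and collection names are hypothetical placeholders:

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr"

# CoreAdmin RELOAD: affects only the one named core on the node you contact.
core_reload = base + "/admin/cores?" + urlencode(
    {"action": "RELOAD", "core": "collection1_shard1_replica1"})

# Collections API RELOAD: reloads every core of the collection, cluster-wide.
collection_reload = base + "/admin/collections?" + urlencode(
    {"action": "RELOAD", "name": "collection1"})

print(core_reload)
print(collection_reload)
```

The same split applies to the Admin UI: its Core Admin screen issues the first kind of call, which is why a reload triggered there touches only the selected core.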
RE: search ignoring accents
And for this example, what filter should I use? Filtering by "edr" should return the result "Pedro". Do the NGram filters create tokens starting at the beginning, at the end, or in the middle? Thanks! Pedro Figueiredo Senior Engineer pjlfigueir...@criticalsoftware.com M. 934058150 Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU -Original Message- From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com] Sent: 17 April 2015 12:22 To: solr-user@lucene.apache.org; 'Ahmet Arslan' Subject: RE: search ignoring accents Hi Ahmet, Yes... the EdgeNGram is what produces those results... I need it to improve search by name for the application's users. Thanks. Pedro Figueiredo -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: 17 April 2015 12:01 To: solr-user@lucene.apache.org Subject: Re: search ignoring accents Hi Pedro, solr.ASCIIFoldingFilterFactory is one way to remove diacritics. The confusion comes from EdgeNGram. Why do you need it? Ahmet On Friday, April 17, 2015 1:38 PM, Pedro Figueiredo pjlfigueir...@criticalsoftware.com wrote: Hello, What is the best way to search in a field ignoring accents?
The field has the type:

<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

I've tried adding the filter <filter class="solr.ASCIIFoldingFilterFactory"/>, but some strange results happened, like: searching for "Mourao" returned: Mourão - OK; Monteiro - NOT OK; Morais - NOT OK. Thanks in advance, Pedro Figueiredo
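For reference, a hedged sketch of how a field type like the one above is often rearranged: fold accents before building grams, and (as suggested later in this thread) drop the EdgeNGram filter from the query analyzer, so the query "Mourao" is matched as one prefix rather than as many short grams like "mo" -- short grams are the likely source of the Monteiro/Morais hits. This is an illustration, not a confirmed fix:

```xml
<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- Fold accents first, so "Mourão" and "Mourao" produce identical grams -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- Fold only; no gramming at query time, the whole query term
         must match one of the indexed edge grams -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

The analysis page in the Admin UI is the quickest way to verify what each chain emits for a given input.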
Re: facets on external field
Hi Jainam, One workaround is to use facet.query and the frange query parser: facet.query={!frange l=50 u=100}field(price) Ahmet On Thursday, April 16, 2015 1:01 PM, jainam vora jainam.v...@gmail.com wrote: Hi, I am using an external field for the price field since it changes frequently. How can I generate facets using an external field? I understand that faceting requires indexing, and external file fields are not actually indexed. Is there any solution for this problem? -- Thanks Regards, Jainam Vora
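Ahmet's workaround expressed as the request parameters a client would send -- the bounds 50 and 100 come from his example; the rest is illustrative:

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "facet": "true",
    # frange over the external file field's value via the field() function;
    # one facet.query per bucket you want counted
    "facet.query": "{!frange l=50 u=100}field(price)",
}
query_string = urlencode(params)
print(query_string)
```

Each frange facet.query contributes one count to the facet response, so several of them together emulate range faceting over a field that is not indexed.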
Re: SolrCloud 4.8.0 upgrade
Solr/Lucene are supposed to _always_ read one major version back. Thus your 4.10 should be able to read indexes produced all the way back to (and including) 3.x. Sometimes experimental formats are excepted. In your case you should be fine, since you're upgrading from 4.8. As always, though, I'd recommend copying your indexes someplace safe, just to be paranoid, before upgrading. Best, Erick On Fri, Apr 17, 2015 at 10:28 AM, Vincenzo D'Amore v.dam...@gmail.com wrote: Thanks for your answers, I looked at the changes and we don't use DocValuesFormat. The question is: if I upgrade the SolrCloud version to 4.10, should I reload all documents entirely? Is there binary compatibility between these two versions reading the solr home? On Fri, Apr 17, 2015 at 7:04 PM, Erick Erickson erickerick...@gmail.com wrote: Look at CHANGES.txt for both Lucene and Solr; there's always an upgrading section for each release. Best, Erick On Fri, Apr 17, 2015 at 5:31 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Vincenzo D'Amore v.dam...@gmail.com wrote: I have a SolrCloud cluster with 3 servers. I would like to use stats.facet, but this feature is available only if I upgrade to 4.10. May I simply redeploy the new SolrCloud version in Tomcat, or should I reload all the documents? Are there other drawbacks? Support for the Disk format for DocValues was removed after 4.8, so you should check if you use that: DocValuesFormat=Disk for the field in the schema, if I remember correctly. - Toke Eskildsen -- Vincenzo D'Amore email: v.dam...@gmail.com skype: free.dev mobile: +39 349 8513251
Re: JSON Facet Analytics API in Solr 5.1
I like the first way. It matches how elasticsearch does it http://www.elastic.co/guide/en/elasticsearch/reference/1.x/search-aggregations-bucket-range-aggregation.html Can we specify explicit ranges in Solr now like we can in elasticsearch? I do like how Solr's version of aggs can be much shorter though! elasticsearch : { aggs : { min_price : { min : { field : price } } } } solr : { facet : { min_price : min(price) } } Great work! On Fri, Apr 17, 2015 at 12:20 PM, Yonik Seeley ysee...@gmail.com wrote: Does anyone have any thoughts on the current general structure of JSON facets? The current general form of a facet command is: facet_name : { facet_type : facet_args } For example: top_authors : { terms : { field : author, limit : 5, }} One alternative I considered in the past is having the type in the args: top_authors : { type : terms, field : author, limit : 5 } It's a flatter structure... probably better in some ways, but worse in other ways. Thoughts / preferences? -Yonik On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote: Folks, there's a new JSON Facet API in the just released Solr 5.1 (actually, a new facet module under the covers too). It's marked as experimental so we have time to change the API based on your feedback. So let us know what you like, what you would change, what's missing, or any other ideas you may have! I've just started the documentation for the reference guide (on our confluence wiki), so for now the best doc is on my blog: http://yonik.com/json-facet-api/ http://yonik.com/solr-facet-functions/ http://yonik.com/solr-subfacets/ I'll also be hanging out more on the #solr-dev IRC channel on freenode if you want to hit me up there about any development ideas. -Yonik --Mike
Re: JSON Facet Analytics API in Solr 5.1
I prefer the second way. I find it more readable and shorter. Thanks for making Solr even better ;) From: Yonik Seeley ysee...@gmail.com Sent: Friday, April 17, 2015 12:20 PM To: solr-user@lucene.apache.org Subject: Re: JSON Facet Analytics API in Solr 5.1 Does anyone have any thoughts on the current general structure of JSON facets? The current general form of a facet command is: facet_name : { facet_type : facet_args } For example: top_authors : { terms : { field : author, limit : 5, }} One alternative I considered in the past is having the type in the args: top_authors : { type : terms, field : author, limit : 5 } It's a flatter structure... probably better in some ways, but worse in other ways. Thoughts / preferences? -Yonik On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote: Folks, there's a new JSON Facet API in the just released Solr 5.1 (actually, a new facet module under the covers too). It's marked as experimental so we have time to change the API based on your feedback. So let us know what you like, what you would change, what's missing, or any other ideas you may have! I've just started the documentation for the reference guide (on our confluence wiki), so for now the best doc is on my blog: http://yonik.com/json-facet-api/ http://yonik.com/solr-facet-functions/ http://yonik.com/solr-subfacets/ I'll also be hanging out more on the #solr-dev IRC channel on freenode if you want to hit me up there about any development ideas. -Yonik
Re: JSON Facet Analytics API in Solr 5.1
Personally I find the second form easier to read. The second level of nesting in the first example confuses me at first glance. I don't have a really strong preference here, but I vote for the second form. On Fri, Apr 17, 2015 at 9:20 AM, Yonik Seeley ysee...@gmail.com wrote: Does anyone have any thoughts on the current general structure of JSON facets? The current general form of a facet command is: facet_name : { facet_type : facet_args } For example: top_authors : { terms : { field : author, limit : 5, }} One alternative I considered in the past is having the type in the args: top_authors : { type : terms, field : author, limit : 5 } It's a flatter structure... probably better in some ways, but worse in other ways. Thoughts / preferences? -Yonik On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote: Folks, there's a new JSON Facet API in the just released Solr 5.1 (actually, a new facet module under the covers too). It's marked as experimental so we have time to change the API based on your feedback. So let us know what you like, what you would change, what's missing, or any other ideas you may have! I've just started the documentation for the reference guide (on our confluence wiki), so for now the best doc is on my blog: http://yonik.com/json-facet-api/ http://yonik.com/solr-facet-functions/ http://yonik.com/solr-subfacets/ I'll also be hanging out more on the #solr-dev IRC channel on freenode if you want to hit me up there about any development ideas. -Yonik
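For comparison, the two candidate shapes discussed in this thread, as a client would build them -- a sketch showing the structure only:

```python
import json

# Form 1: facet type as the key of a nested object
nested = {"top_authors": {"terms": {"field": "author", "limit": 5}}}

# Form 2: facet type flattened into the argument object
flat = {"top_authors": {"type": "terms", "field": "author", "limit": 5}}

# Both carry the same information; form 2 trades one nesting level
# for a reserved "type" key inside the arguments.
print(json.dumps(nested))
print(json.dumps(flat))
```

The trade-off is purely about readability and about reserving a key: the flat form forbids a facet argument literally named "type", while the nested form keeps the argument namespace clean.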
Re: Solr 5.0, defaultSearchField, defaultOperator ?
Hi, df and q.op are the parameters you are looking for. You can define them in the defaults section. Ahmet On Friday, April 17, 2015 9:18 PM, Bruno Mannina bmann...@free.fr wrote: Dear Solr users, Since today I use SOLR 5.0 (I used Solr 3.6 before), so I am trying to adapt my old schema for Solr 5.0. I have two questions: - How can I set the defaultSearchField? I don't want to use the df parameter in the query because that would mean a lot of modification on my web project. - How can I set the defaultOperator (and|or)? It seems that these options are now deprecated in the SOLR 5.0 schema. Thanks a lot for your comments, Regards, Bruno --- This e-mail contains no virus or malware because avast! Antivirus protection is active. http://www.avast.com
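A sketch of Ahmet's suggestion as a solrconfig.xml fragment -- the handler name and the default field "text" are placeholders for whatever the actual schema uses:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- replaces the deprecated <defaultSearchField> from schema.xml -->
    <str name="df">text</str>
    <!-- replaces <solrQueryParser defaultOperator="AND"/> -->
    <str name="q.op">AND</str>
  </lst>
</requestHandler>
```

Setting these in the handler's defaults means existing queries need no change, which addresses Bruno's concern about modifying the web project.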
RE: Spurious _version_ conflict?
Thanks for getting back to me. Something like that crossed my mind, but I checked that the values on the way into the SolrJ SolrInputDocument match the values printed in the Admin Query interface, and they both match the expected value in the error message exactly. Besides, the difference is only in the last few bits ... Error executing update: version conflict for 553d0f5d320c4321b13f4312ff907218 expected=1498643112821522400 actual=1498643112821522432 Note, all my _version_ values have zeroes in the last two digits. But, again, there is agreement between the Admin UI and every stage of my client (from the query in my REST service, to the REST client in the browser, back to the update in my REST service). -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, April 16, 2015 5:04 PM To: solr-user@lucene.apache.org Subject: Re: Spurious _version_ conflict? : I notice that the expected value in the error message matches both what : I pass in and the index contents. But the actual value in the error : message is different only in the last (low order) two digits. : Consistently. What does your client code look like? Are you sure you aren't being bitten by a JSON parsing library that can't handle long values and winds up truncating them? https://issues.apache.org/jira/browse/SOLR-6364 -Hoss http://www.lucidworks.com/ * This e-mail may contain confidential or privileged information. If you are not the intended recipient, please notify the sender immediately and then delete it. TIAA-CREF *
RE: Spurious _version_ conflict?
Here's another data point. To work around this issue, I am converting all non-null _version_ values to the constant 1 on the way into Solr. As a result, updates work fine. Immediately after the update+commit, a /select?q=*:* returns the _version_ value 1498715798795976700 for id == '553d0f5d320c4321b13f4312ff907218'. Looking in solr.log, however, the LogUpdateProcessor displays the following: DEBUG - 2015-04-17 16:06:04.918; org.apache.solr.update.processor.LogUpdateProcessor; PRE_UPDATE FINISH {versions=true&wt=javabin&version=2} INFO - 2015-04-17 16:06:04.918; org.apache.solr.update.LoggingInfoStream; [DW][commitScheduler-12-thread-1]: commitScheduler-12-thread-1 finishFullFlush success=true INFO - 2015-04-17 16:06:04.918; org.apache.solr.update.processor.LogUpdateProcessor; [bb] webapp=/solr path=/update params={versions=true&wt=javabin&version=2} {add=[553d0f5d320c4321b13f4312ff907218 (1498715798795976704), ... ]} 0 15 Note: 1498715798795976700 is returned from the update to SolrJ with versions=true. I.e. the last two digits disagree, with the client showing only zeroes. So, yes, it appears some truncation is taking place, but it looks to be upstream from my client code (which is seeing the same thing as the Admin UI). I am running 4.10.3 on a 64-bit Windows desktop. Java is jdk1.7.0_67, 64-bit. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Friday, April 17, 2015 11:37 AM To: solr-user@lucene.apache.org Subject: RE: Spurious _version_ conflict? Thanks for getting back to me. Something like that crossed my mind, but I checked that the values on the way into the SolrJ SolrInputDocument match the values printed in the Admin Query interface, and they both match the expected value in the error message exactly. Besides, the difference is only in the last few bits ...
Error executing update: version conflict for 553d0f5d320c4321b13f4312ff907218 expected=1498643112821522400 actual=1498643112821522432 Note, all my _version_ values have zeroes in the last two digits. But, again, there is agreement between the Admin UI and every stage of my client (from the query in my REST service, to the REST client in the browser, back to the update in my REST service). -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, April 16, 2015 5:04 PM To: solr-user@lucene.apache.org Subject: Re: Spurious _version_ conflict? : I notice that the expected value in the error message matches both what : I pass in and the index contents. But the actual value in the error : message is different only in the last (low order) two digits. : Consistently. What does your client code look like? Are you sure you aren't being bitten by a JSON parsing library that can't handle long values and winds up truncating them? https://issues.apache.org/jira/browse/SOLR-6364 -Hoss http://www.lucidworks.com/
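The pattern in these reports -- client and Admin UI agreeing on a value ending in 00 while Solr holds one ending in 32 -- is exactly what happens when a 64-bit _version_ passes through an IEEE-754 double somewhere in the chain (the SOLR-6364 scenario Hoss references). A quick demonstration with the values from the error message:

```python
# _version_ values are 64-bit longs; an IEEE-754 double has only 53
# mantissa bits, so near 1.5e18 adjacent doubles are 256 apart.
stored = 1498643112821522432   # the long Solr actually assigned
shown = 1498643112821522400    # what the client / Admin UI displayed

# The displayed decimal and the stored long round to the *same* double:
assert float(str(shown)) == float(str(stored))
assert int(float(str(shown))) == stored

# A JS-based JSON view prints that double with its shortest round-trip
# decimal, which is the "...400" form:
print(repr(float(stored)))
```

Sending the displayed "...400" value back as an optimistic-concurrency check then fails against the stored "...432", producing exactly the reported conflict; fetching the raw response with curl, as Hoss suggests, bypasses the lossy display path.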
Re: search ignoring accents
Pedro: For your example, don't use EdgeNGrams; use plain NGrams. That'll index tokens like (in the 2-gram case) pe ed dr ro, and searching for edr would look for ed dr, which would match. However, this isn't in line with your first example, where you got results you didn't expect. You'll have to be careful to search for these pairwise tokens as _phrases_ to prevent false matches. Best, Erick On Fri, Apr 17, 2015 at 4:50 AM, Pedro Figueiredo pjlfigueir...@criticalsoftware.com wrote: And for this example, what filter should I use? Filtering by "edr" should return the result "Pedro". Do the NGram filters create tokens starting at the beginning, at the end, or in the middle? Thanks! Pedro Figueiredo -Original Message- From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com] Sent: 17 April 2015 12:22 To: solr-user@lucene.apache.org; 'Ahmet Arslan' Subject: RE: search ignoring accents Hi Ahmet, Yes... the EdgeNGram is what produces those results... I need it to improve search by name for the application's users. Thanks. Pedro Figueiredo -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: 17 April 2015 12:01 To: solr-user@lucene.apache.org Subject: Re: search ignoring accents Hi Pedro, solr.ASCIIFoldingFilterFactory is one way to remove diacritics. The confusion comes from EdgeNGram. Why do you need it?
Ahmet On Friday, April 17, 2015 1:38 PM, Pedro Figueiredo pjlfigueir...@criticalsoftware.com wrote: Hello, What is the best way to search in a field ignoring accents? The field has the type:

<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

I've tried adding the filter <filter class="solr.ASCIIFoldingFilterFactory"/>, but some strange results happened, like: searching for "Mourao" returned: Mourão - OK; Monteiro - NOT OK; Morais - NOT OK. Thanks in advance, Pedro Figueiredo
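The gram arithmetic in Erick's reply can be checked with a toy sketch -- plain Python stand-ins for the Lucene filters, not Solr's actual implementation:

```python
def ngrams(term, n=2):
    """All contiguous n-character substrings, as an NGram filter would emit."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]

def edge_ngrams(term, min_n=2, max_n=15):
    """Prefix grams only, as a front-side EdgeNGram filter would emit."""
    return [term[:n] for n in range(min_n, min(max_n, len(term)) + 1)]

# Inner bigrams of the indexed name contain both grams of the query "edr"...
assert set(ngrams("edr")) <= set(ngrams("pedro"))        # {"ed", "dr"}
# ...but edge (prefix) grams do not, which is why EdgeNGram misses "edr".
assert not set(ngrams("edr")) <= set(edge_ngrams("pedro"))
print(ngrams("pedro"), edge_ngrams("pedro"))
```

It also shows why phrase matching matters: the bigrams ed and dr each occur in many names, so only their adjacency identifies "edr" as a substring of "pedro".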
Re: SolrCloud 4.8.0 upgrade
Thanks for your answers, I looked at the changes and we don't use DocValuesFormat. The question is: if I upgrade the SolrCloud version to 4.10, should I reload all documents entirely? Is there binary compatibility between these two versions reading the solr home? On Fri, Apr 17, 2015 at 7:04 PM, Erick Erickson erickerick...@gmail.com wrote: Look at CHANGES.txt for both Lucene and Solr; there's always an upgrading section for each release. Best, Erick On Fri, Apr 17, 2015 at 5:31 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Vincenzo D'Amore v.dam...@gmail.com wrote: I have a SolrCloud cluster with 3 servers. I would like to use stats.facet, but this feature is available only if I upgrade to 4.10. May I simply redeploy the new SolrCloud version in Tomcat, or should I reload all the documents? Are there other drawbacks? Support for the Disk format for DocValues was removed after 4.8, so you should check if you use that: DocValuesFormat=Disk for the field in the schema, if I remember correctly. - Toke Eskildsen -- Vincenzo D'Amore email: v.dam...@gmail.com skype: free.dev mobile: +39 349 8513251
Solr 5.0, defaultSearchField, defaultOperator ?
Dear Solr users, Since today I use SOLR 5.0 (I used Solr 3.6 before), so I am trying to adapt my old schema for Solr 5.0. I have two questions: - How can I set the defaultSearchField? I don't want to use the df parameter in the query because that would mean a lot of modification on my web project. - How can I set the defaultOperator (and|or)? It seems that these options are now deprecated in the SOLR 5.0 schema. Thanks a lot for your comments, Regards, Bruno
RE: Spurious _version_ conflict?
You still haven't provided any details on what your client code looks like -- i.e.: what code is talking to Solr? What response format is it asking for? Is it JSON? What is parsing that JSON? As for the admin UI: if you are looking at a JSON response in the Query screen of the Admin UI, then the JavaScript engine of your web browser is being used to parse the JSON and pretty-print it for you. What does the _version_ in the *RAW* response from your /get or /select request return when you use something like curl that does *NO* processing of the response data? : Date: Fri, 17 Apr 2015 15:37:21 + : From: Reitzel, Charles charles.reit...@tiaa-cref.org : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: RE: Spurious _version_ conflict? : : Thanks for getting back to me. Something like that crossed my mind, but I checked that the values on the way into the SolrJ SolrInputDocument match the values printed in the Admin Query interface, and they both match the expected value in the error message exactly. : : Besides, the difference is only in the last few bits ... : : Error executing update: version conflict for 553d0f5d320c4321b13f4312ff907218 expected=1498643112821522400 actual=1498643112821522432 : : Note, all my _version_ values have zeroes in the last two digits. But, again, there is agreement between the Admin UI and every stage of my client (from the query in my REST service, to the REST client in the browser, back to the update in my REST service). -Hoss http://www.lucidworks.com/
Re: Merge indexes in MapReduce
The core admin MERGEINDEXES will work for you, I'm pretty sure. You copy the NRT index over to the all-the-time box; MERGEINDEXES just takes the path to the index you want to add to the existing core. Note the warnings in the reference guide about taking care that the indexes aren't changing, and commit at the very end of the operation. I suspect this is one of the cases where optimizing is called for: I don't believe the MERGEINDEXES call triggers any kind of segment merging, and since your all-the-time index isn't getting incremental updates (I'm assuming), there's no event to trigger incremental merges. Best, Erick On Fri, Apr 17, 2015 at 2:24 AM, ariya bala ariya...@gmail.com wrote: Hi Norgorn, I think there is no ready-made tool out of the box, but you have the spare parts in the MapReduceIndexerTool :-) With little effort you can decouple the index-merging component from the MRIndexerTool and use it based on your needs. I did the same. On Fri, Apr 17, 2015 at 10:40 AM, Norgorn lsunnyd...@mail.ru wrote: Thank you for the reply. Our schema is: 1) Index in real time (on a separate machine). 2) The NRT index becomes large. 3) Copy the NRT index to another machine. 4) Merge the NRT-made indexes with the large (all-the-time) index. 5) Remove the NRT index (until now it was available for searching). At the end we have a big, optimized index with data for all time, and we're ready to index more data, and indexing will be fast. Excuse me if I'm describing this unclearly. About optimization: indexing with a low merge factor results in lots of segments, which results in slow search, so we have to do it. -- View this message in context: http://lucene.472066.n3.nabble.com/Merge-indexes-in-MapReduce-tp4200106p4200346.html Sent from the Solr - User mailing list archive at Nabble.com. -- *Ariya *
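The CoreAdmin call Erick describes, sketched as the URL a client might build -- host, core name, and index path are hypothetical placeholders:

```python
from urllib.parse import urlencode

params = {
    "action": "MERGEINDEXES",
    "core": "alltime_core",              # target core that receives the data
    "indexDir": "/data/nrt-copy/index",  # copied NRT index to merge in
}
url = "http://localhost:8983/solr/admin/cores?" + urlencode(params)
print(url)
```

Per the reference-guide caveats quoted above, the source index must not be changing while this runs, and the target core needs a commit once the merge finishes before the merged documents become searchable.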
Re: search ignoring accents
Hi Pedro, The requirement that filtering by "edr" should return the result "Pedro" can be handled by expanding terms at index time only: you can remove the ngram filter from the query analyzer. But remember that the ngram filter produces a lot of tokens; try it on the analysis page. Regarding starting at the beginning or the ending: there is an EdgeNGramTokenFilter where you can specify the side, front or back. Ahmet On Friday, April 17, 2015 2:50 PM, Pedro Figueiredo pjlfigueir...@criticalsoftware.com wrote: And for this example, what filter should I use? Filtering by "edr" should return the result "Pedro". Do the NGram filters create tokens starting at the beginning, at the end, or in the middle? Thanks! Pedro Figueiredo -Original Message- From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com] Sent: 17 April 2015 12:22 To: solr-user@lucene.apache.org; 'Ahmet Arslan' Subject: RE: search ignoring accents Hi Ahmet, Yes... the EdgeNGram is what produces those results... I need it to improve search by name for the application's users. Thanks. Pedro Figueiredo -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: 17 April 2015 12:01 To: solr-user@lucene.apache.org Subject: Re: search ignoring accents Hi Pedro, solr.ASCIIFoldingFilterFactory is one way to remove diacritics. The confusion comes from EdgeNGram. Why do you need it?
Ahmet On Friday, April 17, 2015 1:38 PM, Pedro Figueiredo pjlfigueir...@criticalsoftware.com wrote: Hello, What is the best way to search in a field ignoring accents? The field has the type:

<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

I've tried adding the filter <filter class="solr.ASCIIFoldingFilterFactory"/>, but some strange results happened, like: searching for "Mourao" returned: Mourão - OK; Monteiro - NOT OK; Morais - NOT OK. Thanks in advance, Pedro Figueiredo
Re: HttpSolrServer and CloudSolrServer
Additionally, when indexing, CloudSolrServer collects up the documents for each shard and routes them to the leader for that shard, moving that processing away from whatever node you happen to contact using HttpSolrServer. Finally, HttpSolrServer is a single point of failure if the node you point to goes down, whereas CloudSolrServer will compensate if any node goes down. Best, Erick On Fri, Apr 17, 2015 at 2:39 AM, Andrea Gazzarini a.gazzar...@gmail.com wrote: If you're using SolrCloud then you should use CloudSolrServer, as it is able to abstract / hide the interaction with the cluster. HttpSolrServer communicates directly with a single Solr instance. Best, Andrea On 04/17/2015 10:59 AM, Vijay Bhoomireddy wrote: Hi All, Good Morning!! For a SolrCloud deployment, for indexing data through SolrJ, which is the preferred / correct SolrServer class to use: HttpSolrServer or CloudSolrServer? In case both can be used, when should each be used? Any help please. Thanks Regards Vijay
Re: Solr 5.x deployment in production
Thanks Shawn, this makes a lot of sense. With the WAR going away and no mention of a Solr deployment strategy (see: https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production), there is a gap in Solr's release. It feels as if Solr 5.x was rushed out, ignoring Windows Server deployment. -- George On Fri, Apr 17, 2015 at 9:24 AM, Shawn Heisey apa...@elyograg.org wrote: On 4/16/2015 2:07 PM, Steven White wrote: In my case, I have to deploy Solr on Windows, AIX, and Linux (all server editions). We are a WebSphere shop; moving away from it means I have to deal with politics and culture. You *can* run Solr 5.0 (and 5.1) in another container, just like you could with all previous Solr versions. There are additional steps that have to be taken, such as correctly installing the logging jars and the logging config, but if you've used Solr 4.3 or later, you already know this: http://wiki.apache.org/solr/SolrLogging Eventually, hopefully before we reach the 6.0 release, that kind of deployment won't be possible, because Solr will be a true application (like Jetty itself), not a webapp contained in a .war file. It may take us quite a while to reach that point. If you are already using the scripts that come with Solr 5.x, you will have a seamless transition to the new implementation. The docs for 5.0 say that we aren't supporting deployment in a third-party servlet container, even though that still is possible. There are several reasons for this: * Eventually it won't be possible, because Solr's implementation will change. * We now have scripts that will start Solr in a consistent manner. ** This means that our instructions won't have to change for a new implementation. * There are a LOT of containers available. ** Each one requires different instructions. ** Are problems caused by the container, or Solr? We may not know. * Jetty is the only container that gets tested. ** Bugs with other containers have happened.
** User feedback is usually the only way such bugs can be found. Thanks, Shawn
Re: Bad contentType for search handler :text/xml; charset=UTF-8
Off the cuff, it sounds like you are making a POST request to the SearchHandler (i.e. /search or /query) and the Content-Type you are sending is text/xml; charset=UTF-8. In the past, SearchHandler might have ignored that Content-Type, but now that structured queries can be sent as POST data, it's trying to parse the POST body and it can't make sense of your XML data. As Erick said: without more details on what your client code looks like, it's hard to give you additional advice. The first big question you want to ask yourself, though, is *why*, in Solr 5.0, you were POSTing XML data to Solr -- what was the purpose of that POSTed XML data? : Date: Thu, 16 Apr 2015 22:57:30 -0700 (MST) : From: Pavel Hladik pavel.hla...@profimedia.cz : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Bad contentType for search handler :text/xml; charset=UTF-8 : : Hi, : : we have migrated Solr from 5.0 to 5.1 and we can't search now; we get an ERROR for SolrCore like the one in the subject. I can't find any info through Google. : : Please, can someone help with what is going on? : : Thanks, : : Pavel : : -- : View this message in context: http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314.html : Sent from the Solr - User mailing list archive at Nabble.com. -Hoss http://www.lucidworks.com/
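To make the diagnosis above concrete, here is a hedged sketch of the kind of POST body SearchHandler expects -- form-encoded parameters rather than an XML document (the endpoint and query are illustrative):

```python
from urllib.parse import urlencode

# Form-encoded body: SearchHandler parses this as query parameters.
body = urlencode({"q": "title:solr", "wt": "json"})
headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}

# By contrast, a body labelled text/xml is what 5.1's SearchHandler now
# tries -- and fails -- to parse, producing the "Bad contentType" error.
print(headers["Content-Type"], body)
```

If the client genuinely needs to send XML (e.g. update documents), that belongs on the /update handler, not on a search handler.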
Re: SolrCloud 4.8.0 upgrade
Look at CHANGES.txt for both Lucene and Solr, there's always an upgrading section for each release. Best, Erick On Fri, Apr 17, 2015 at 5:31 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Vincenzo D'Amore v.dam...@gmail.com wrote: I have a SolrCloud cluster with 3 server, I would like to use stats.facet, but this feature is available only if I upgrade to 4.10. May I simply redeploy new solr cloud version in tomcat or should reload all the documents? There are other drawbacks? Support for the Disk-format for DocValues was removed after 4.8, so you should check if you use that: DocValuesFormat=Disk for the field in the schema, if I remember correctly. - Toke Eskildsen
Solr Cloud reclaiming disk space from deleted documents
Hi All, Running into an issue and wanted to see if anyone had some suggestions. We are seeing this with both Solr 4.6 and 4.10.3 code. We are running an extremely update-heavy application, with millions of writes and deletes happening to our indexes constantly. The issue we are seeing is that Solr Cloud is not reclaiming the disk space that could be used for new inserts by cleaning up deletes. We used to run optimize periodically with our old multicore setup; not sure if that works for Solr Cloud.

Num Docs: 28762340
Max Doc: 48079586
Deleted Docs: 19317246
Version: 1429299216227
Gen: 16525463
Size: 109.92 GB

In our solrconfig.xml we use the following configs:

<indexConfig>
  <!-- Values here affect all index writers and act as a default unless overridden. -->
  <useCompoundFile>false</useCompoundFile>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <mergeFactor>10</mergeFactor>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">3</int>
    <int name="maxMergeCount">15</int>
  </mergeScheduler>
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexConfig>

Any suggestions on which tunables to adjust (mergeFactor, mergeScheduler thread counts, etc.) would be great. Thanks, Rishi.
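The core stats above already quantify how bad the problem is: deleted documents are the difference between maxDoc and numDocs. A quick back-of-the-envelope check with the numbers from the stats:

```python
# Values copied from the core stats in the message above.
num_docs = 28762340
max_doc = 48079586

# Deleted docs = maxDoc - numDocs; should match the reported Deleted Docs.
deleted = max_doc - num_docs
print(deleted)  # 19317246

# Roughly what fraction of the index is dead weight awaiting merge cleanup.
print(f"{deleted / max_doc:.0%} of maxDoc is deleted documents")
```

At around 40% deleted, a large share of that 109.92 GB is reclaimable by merging.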
Re: Differentiating user search term in Solr
: It looks to me that f with qq is doing phrase search, that's not what I : want. The data in the field title is Apache Solr Release Notes If you don't want phrase queries then you don't want phrase queries and that's fine -- but it wasn't clear from any of your original emails, because you never provided (that I saw) any concrete examples of the types of queries you expected, the types of matches you wanted, and the types of matches you did *NOT* want. Details matter: https://wiki.apache.org/solr/UsingMailingLists Based on the one concrete example I've now seen of what you *do* want to match: it seems that maybe a general description of your objective is that each of the words in your user input should be treated as a mandatory clause in a boolean query -- but the concept of a word is already something that violates your earlier statement about not wanting the query parser to treat any reserved characters as special -- in order to recognize that Apache, Solr and Notes should each be treated as independent mandatory clauses in a boolean query, some query parser needs to recognize that *whitespace* is a syntactically significant character in your query string: it's what separates the words in your input. The reason the field parser produces phrase queries in the example URLs you mentioned is because that parser doesn't have *ANY* special reserved characters -- not even whitespace. It passes the entire input string to the analyzer of the configured (f) field.
If you are using TextField with a Tokenizer, that means the input gets split on whitespace, resulting in multiple *sequential* tokens, which will result in a phrase query (on the other hand, using something like StrField will cause the entire input string, spaces and all, to be searched as one single Term). : I looked over the links you provided and tried out the examples, in each : case if the user-typed-text contains any reserved characters, it will fail : with a syntax error (the exception is when I used f and qq but like I : said, that gave me 0 hits). As I said: details matter. Which examples did you try? What configs were you using? What data were you using? Which version of Solr are you using? What exactly was the syntax error? Etc. f and qq are not magic -- saying you used them just means you used *some* parser that supports an f param ... if you tried it with the term or field parser then I don't know why you would have gotten a SyntaxError, but based on your goal it sounds like those parsers aren't really useful to you. (see below) : If you can give me a concrete example, please do. My need is to pass to : Solr the text Apache: Solr Notes (without quotes) and get a hit as if I : passed Apache\: Solr Notes ? To re-iterate: saying you want the same behavior as if you passed Apache\: Solr Notes is a vague statement -- as if you passed that string to *what*? To the standard parser? To the dismax parser? Using what request options? (q.op? qf? df?) ... query strings don't exist in a vacuum; the details of the context matter. (I'm sorry if it feels like I keep hitting you over the head about this; I'm just trying to help you realize the breadth and scope of the variables involved in a question like the one you are asking, so you consider the full context and understand *how* to think about the problem you are trying to solve, and what questions to ask yourself / this list.) My *BEST* guess as to a parser that might help you is the simple parser...
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SimpleQueryParser ...by default it supports several syntactically significant operators (which can be escaped), but those can be disabled using the q.operators option. As the documentation notes, Any errors in syntax are ignored and the query parser will interpret as best it can. This can mean, however, odd results in some cases. So a lot of experimentation with a large sample of expected good/bad queries is important, to make sure you understand what types of query structures / search results you'll get out of them. A trivial example of using the simple parser, with the Solr 5.1 bin/solr -e techproducts example configs/data, would be... http://localhost:8983/solr/techproducts/select?fl=id,name&debug=query&defType=simple&q.op=AND&q.operators=&df=name&q=apple%20-ipod ...which matches the name Apple 60 GB iPod with Video Playback Black even though there is a - in front of ipod, because the q.operators= param tells the parser to ignore all of its operators (at which point the literal string -ipod is passed to the analyzer for the name field, and it's stripped off by the tokenizer). On the other hand it does not match the name Belkin Mobile Power Cord for iPod w/ Dock because it doesn't contain apple. That was a trivial good example query -- it's important to remember however that localparam parsing happens *before* the actual query parser is given the input string (it
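For readers who want to keep the standard parser but have user input treated literally, the usual alternative to a lenient parser is escaping the reserved characters client-side; SolrJ ships ClientUtils.escapeQueryChars for exactly this. A rough Python equivalent follows -- the character set is written from memory, so double-check it against the escapeQueryChars source for your Solr version:

```python
import re

# Metacharacters of the Lucene/standard query parser (approximate set;
# verify against ClientUtils.escapeQueryChars for your Solr version).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')

def escape_query_chars(s: str) -> str:
    """Backslash-escape query-parser metacharacters in user input."""
    return _SPECIAL.sub(r"\\\1", s)

print(escape_query_chars("Apache: Solr Notes"))  # Apache\: Solr Notes
```

This turns the thread's example input Apache: Solr Notes into Apache\: Solr Notes before it ever reaches the parser, so the colon is matched literally instead of being read as a field separator.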
Re: Java.net.socketexception: broken pipe Solr 4.10.2
I haven't had time to really take a look at this. But read a couple of articles regarding the hard commit and it actually makes sense. We were seeing tlogs in the multiple GBs during ingest. I will have some time in a couple of weeks to come back to testing indexing. Thanks for the help. Vy -- View this message in context: http://lucene.472066.n3.nabble.com/Java-net-socketexception-broken-pipe-Solr-4-10-2-tp4199484p4200498.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: 5.1 'unique' facet function / calcDistinct
Perfect, thank you for the information -- will have a look through those classes. Thank you, Levan -- View this message in context: http://lucene.472066.n3.nabble.com/5-1-unique-facet-function-calcDistinct-tp4200110p4200535.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 5.0, defaultSearchField, defaultOperator ?
: df and q.op are the ones you are looking for. : You can define them in defaults section. specifically... https://cwiki.apache.org/confluence/display/solr/InitParams+in+SolrConfig : : Ahmet : : : : On Friday, April 17, 2015 9:18 PM, Bruno Mannina bmann...@free.fr wrote: : Dear Solr users, : : Since today I used SOLR 5.0 (I used solr 3.6) so i try to adapt my old : schema for solr 5.0. : : I have two questions: : - how can I set the defaultSearchField ? : I don't want to use in the query the df tag because I have a lot of : modification to do for that on my web project. : : - how can I set the defaultOperator (and|or) ? : : It seems that these options are now deprecated in SOLR 5.0 schema. : : Thanks a lot for your comment, : : Regards, : Bruno : : --- : Ce courrier électronique ne contient aucun virus ou logiciel malveillant parce que la protection avast! Antivirus est active. : http://www.avast.com : -Hoss http://www.lucidworks.com/
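To make that concrete, here is a sketch of an initParams block for solrconfig.xml that restores the old schema-level defaults via df and q.op; the field name text and the handler paths are placeholders for Bruno's own setup:

```xml
<!-- Replaces <defaultSearchField> and <solrQueryParser defaultOperator="..."/>
     from the 3.x schema; applies the defaults to the listed handlers. -->
<initParams path="/select,/query">
  <lst name="defaults">
    <str name="df">text</str>
    <str name="q.op">AND</str>
  </lst>
</initParams>
```

With this in place, no per-query df or q.op parameters are needed, so the existing web project's queries can stay unchanged.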
help with schema containing nested documents
Hi, I need some documentation/samples on how to create a Solr schema with nested documents. I have been looking online but could not find anything. Thank you in advance, Nick Pandrea
RE: Enrich search results with external data
Hi Sujit, Many thanks for your blog post, for responding to my question, and for suggesting the alternative option ☺ I think I prefer your approach because we can supply our own Comparator. The reason is that we need to meet some strict requirements: we can only call the external system once to retrieve extra fields (price, inventory, etc.), probably for a subset of the search result. Therefore we need to be able to sort and facet on a list of items, some of which may not have the external fields. I think using the Comparator would help with the sorting, but let me know if you have different ideas. Do you have a suggestion for how we should deal with the facet requirement? I am thinking about adding another facet component that will be executed after the standard FacetComponent. Let me know if you think we should consider other options. Thanks, -Ha -----Original Message----- From: sujitatgt...@gmail.com [mailto:sujitatgt...@gmail.com] On Behalf Of Sujit Pal Sent: Saturday, April 11, 2015 10:23 AM To: solr-user@lucene.apache.org; Ahmet Arslan Subject: Re: Enrich search results with external data Hi Ha, I am the author of the blog post you mention. To your question, I don't know if the code will work without change (since the Lucene/Solr API has evolved so much over the last few years), but a more preferred way, using Function Queries, may be found in the slides for Timothy Potter's talk here: http://www.slideshare.net/thelabdude/boosting-documents-in-solr-lucene-revolution-2011 Here he speaks of external fields stored in a database and accessed using a custom component (rather than from a flat file as in ExternalFieldField), and using function queries to influence the ranking based on the external field. However, per this document on function queries, you can use the output of a function query to sort as well, by passing the function to the sort parameter.
https://wiki.apache.org/solr/FunctionQuery#Sort_By_Function Hope this helps, Sujit On Fri, Apr 10, 2015 at 10:38 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Why don't you include/add/index those additional fields, at least the one used in sorting? Also, you may find https://stanbol.apache.org/docs/trunk/components/enhancer/ relevant. Ahmet On Saturday, April 11, 2015 1:04 AM, ha.p...@arvatosystems.com ha.p...@arvatosystems.com wrote: This ticket seems to address the problem I have https://issues.apache.org/jira/browse/SOLR-1566 and, as a result of that ticket, DocTransformer has been available since Solr 4.0. I wrote a simple DocTransformer and found that the transformer is executed AFTER pagination. In our application, we need the external fields added before sorting/pagination. I've looked around for an option to change the execution order but haven't had any luck. Does anyone know the solution? The ticket also states it is not possible for components to add fields to outgoing documents which are not in the stored fields of the document. Does anyone know if this is still true? Thanks, -Ha -----Original Message----- From: Pham, Ha Sent: Thursday, April 09, 2015 11:41 PM To: solr-user@lucene.apache.org Subject: Enrich search results with external data Hi everyone, We have a requirement to append external data (e.g. price/inventory of a product, retrieved from an ERP via web services) to query results and support sorting and pagination based on those external fields. For example, if Solr returns 100 records and the page size the user selects is 20, the sorting on the external fields is still on 100 records. This limits us from enriching search results outside of Solr. I guess this is a common problem, so hopefully someone could share their experience.
I am considering using a PostFilter and enriching documents in the collect() method as below:

@Override
public void collect(int docId) throws IOException {
    DoubleField price = new DoubleField(PRICE, 1.23, Field.Store.YES);
    Document currentDoc = context.reader().document(docId);
    currentDoc.add(price);
}

but the result documents don't have PRICE fields. Did I miss anything here? I also did some research and it seems the approach mentioned here http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html is close to what we need to achieve, but since the document is 4 years old, I don't know if there's a better approach for our problem (we are using Solr 5.0)? Thanks, -Ha
RE: Spurious _version_ conflict?
Ah, starting to see the light ... thanks for your patience. First, this is a Java REST service using SolrJ. I am using the default transport (wt=javabin, I think). But right-clicking the URL at the top of the Admin query page and selecting open in new tab displays the non-truncated _version_ values. Also, I am getting the non-truncated values from the SolrJ QueryResponse. I think I short-circuited my diagnosis when I saw matching truncated values in the browser. So, my bad. To be safe, I will transport _version_ values as strings. Thanks for your help! -----Original Message----- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Friday, April 17, 2015 12:50 PM To: solr-user@lucene.apache.org Subject: RE: Spurious _version_ conflict? You still haven't provided any details on what your client code looks like -- ie: what code is talking to Solr? What response format is it asking for? Is it JSON? What is parsing that JSON? As for the admin UI: if you are looking at a JSON response in the Query screen of the Admin UI, then the Javascript engine of your web browser is being used to parse the JSON and pretty-print it for you. What does the _version_ in the *RAW* response from your /get or /select request return when you use something like curl that does *NO* processing of the response data? : Date: Fri, 17 Apr 2015 15:37:21 + : From: Reitzel, Charles charles.reit...@tiaa-cref.org : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org solr-user@lucene.apache.org : Subject: RE: Spurious _version_ conflict? : : Thanks for getting back. Something like that crossed my mind, but I checked: the values on the way into the SolrJ SolrInputDocument match the values printed in the Admin Query interface, and they both match the expected value in the error message exactly. : : Besides, the difference is only in the last few bits ...
: : Error executing update: version conflict for 553d0f5d320c4321b13f4312ff907218 expected=1498643112821522400 actual=1498643112821522432 : : Note, all my _version_ values have zeroes in the last two digits. But, again, there is agreement between the Admin UI and every stage of my client (from query in my REST service, to REST client in browser, back to update in my REST service). : : -----Original Message----- : From: Chris Hostetter [mailto:hossman_luc...@fucit.org] : Sent: Thursday, April 16, 2015 5:04 PM : To: solr-user@lucene.apache.org : Subject: Re: Spurious _version_ conflict? : : : : I notice that the expected value in the error message matches both what : : I pass in and the index contents. But the actual value in the error : : message is different only in the last (low order) two digits. : : Consistently. : : What does your client code look like? Are you sure you aren't being bit by a JSON parsing library that can't handle long values and winds up truncating them? : : https://issues.apache.org/jira/browse/SOLR-6364 : : : : -Hoss : http://www.lucidworks.com/ : : * : This e-mail may contain confidential or privileged information. : If you are not the intended recipient, please notify the sender immediately and then delete it. : : TIAA-CREF : * : : -Hoss http://www.lucidworks.com/
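The two numbers in that error message are consistent with exactly the JSON/JavaScript truncation Hoss describes: at this magnitude, IEEE-754 doubles are spaced 256 apart, so the browser-displayed value and the value Solr actually stored collapse to the same double. A quick check with the values copied from the error message above:

```python
expected = 1498643112821522400  # value displayed by the browser / Admin UI
actual = 1498643112821522432    # value Solr actually stored

# Both integers round to the same 64-bit double, so any client that parses
# the long _version_ as a JavaScript Number (or Python float) cannot tell
# them apart -- which is why transporting _version_ as a string is safer.
assert float(expected) == float(actual)
print(int(float(expected)))  # 1498643112821522432
```

This also explains the observation that all the _version_ values "have zeroes in the last two digits": that is the shortest decimal rendering of the underlying double, not the stored value.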
Re: 5.1 'unique' facet function / calcDistinct
I've posted the issue here, please let me know if any additional information needs to be provided. https://issues.apache.org/jira/browse/SOLR-7417 Happy to provide the feedback, using the sub-facets has been a lot of fun, the nested facet query is especially useful. -- View this message in context: http://lucene.472066.n3.nabble.com/5-1-unique-facet-function-calcDistinct-tp4200110p4200534.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud reclaiming disk space from deleted documents
On 4/17/2015 2:15 PM, Rishi Easwaran wrote: Running into an issue and wanted to see if anyone had some suggestions. We are seeing this with both Solr 4.6 and 4.10.3 code. We are running an extremely update-heavy application, with millions of writes and deletes happening to our indexes constantly. The issue we are seeing is that Solr Cloud is not reclaiming the disk space that could be used for new inserts by cleaning up deletes. We used to run optimize periodically with our old multicore setup; not sure if that works for Solr Cloud.

Num Docs: 28762340, Max Doc: 48079586, Deleted Docs: 19317246, Version: 1429299216227, Gen: 16525463, Size: 109.92 GB

In our solrconfig.xml we use the following configs:

<indexConfig>
  <!-- Values here affect all index writers and act as a default unless overridden. -->
  <useCompoundFile>false</useCompoundFile>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <mergeFactor>10</mergeFactor>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">3</int>
    <int name="maxMergeCount">15</int>
  </mergeScheduler>
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexConfig>

This part of my response won't help the issue you wrote about, but it can affect performance, so I'm going to mention it. If your indexes are stored on regular spinning disks, reduce mergeScheduler/maxThreadCount to 1. If they are stored on SSD, then a value of 3 is OK. Spinning disks cannot do seeks (read/write head moves) fast enough to handle multiple merging threads properly. All the seek activity required will really slow down merging, which is a very bad thing when your indexing load is high. SSD disks do not have to seek, so multiple threads are OK there. An optimize is the only way to reclaim all of the disk space held by deleted documents.
Over time, as segments are merged automatically, deleted doc space will be automatically recovered, but it won't be perfect, especially as segments are merged multiple times into very large segments. If you send an optimize command to a core/collection in SolrCloud, the entire collection will be optimized ... the cloud will do one shard replica (core) at a time until the entire collection has been optimized. There is no way (currently) to ask it to only optimize a single core, or to do multiple cores simultaneously, even if they are on different servers. Thanks, Shawn
Re: Solr Cloud reclaiming disk space from deleted documents
Thanks Shawn for the quick reply. Our indexes are running on SSD, so 3 should be ok. Any recommendation on bumping it up? I guess will have to run optimize for entire solr cloud and see if we can reclaim space. Thanks, Rishi. -----Original Message----- From: Shawn Heisey apa...@elyograg.org To: solr-user solr-user@lucene.apache.org Sent: Fri, Apr 17, 2015 6:22 pm Subject: Re: Solr Cloud reclaiming disk space from deleted documents
Re: JSON Facet Analytics API in Solr 5.1
Agreed, I also prefer the second way. I find it more readable, less verbose while communicating the same information, less confusing to mentally parse (is 'terms' the name of my facet, or the type of my facet?...), and less prone to syntactically valid, but logically invalid inputs. Let's break those topics down.

*1) Less verbose while communicating the same information:* The flatter structure is particularly useful when you have nested facets, to reduce unnecessary verbosity / extra levels. Let's contrast the two approaches with just 2 levels of subfacets:

** Current Format **
top_genres: {
  terms: {
    field: genre,
    limit: 5,
    facet: {
      top_authors: {
        terms: {
          field: author,
          limit: 4,
          facet: {
            top_books: {
              terms: {
                field: title,
                limit: 5
              }
            }
          }
        }
      }
    }
  }
}

** Flat Format **
top_genres: {
  type: terms,
  field: genre,
  limit: 5,
  facet: {
    top_authors: {
      type: terms,
      field: author,
      limit: 4,
      facet: {
        top_books: {
          type: terms,
          field: title,
          limit: 5
        }
      }
    }
  }
}

The flat format is clearly shorter and more succinct, while communicating the same information. What value do the extra levels add?

*2) Less confusing to mentally parse* I also find the flatter structure less confusing, as I consistently have to take a mental pause with the current format to verify whether terms is the name of my facet or the type of my facet, and have to count the curly braces to figure this out. Not that I would name my facets like this, but to give an extreme example of why that extra mental calculation is necessary, due to the name of an attribute in the structure being able to represent both a facet name and a facet type:

terms: {
  terms: {
    field: genre,
    limit: 5,
    facet: {
      terms: {
        terms: {
          field: author,
          limit: 4
        }
      }
    }
  }
}

In this example, the first terms is a facet name, the second terms is a facet type, the third is a facet name, etc. Even if you don't name your facets like this, it still requires parsing someone else's query mentally to ensure that's not what was done.
3) *Less prone to syntactically valid, but logically invalid inputs* Also, given the first format (where the type is indicated by one of several possible attributes: terms, range, etc.), what happens if I pass in multiple of the valid JSON attributes? The flatter structure prevents this from being possible (which is a good thing!):

top_authors: {
  terms: {
    field: author,
    limit: 5
  },
  range: {
    field: price,
    start: 0,
    end: 100,
    gap: 20
  }
}

I don't think the response format can currently handle this without adding extra levels to make it look like the input side, so this is an exception case even though it seems syntactically valid. So in conclusion, I'd give a strong vote to the flatter structure. Can someone enumerate the benefits of the current format over the flatter structure (I'm probably dense and just failing to see them currently)? Thanks, -Trey On Fri, Apr 17, 2015 at 2:28 PM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: I prefer the second way. I find it more readable and shorter. Thanks for making Solr even better ;) From: Yonik Seeley ysee...@gmail.com Sent: Friday, April 17, 2015 12:20 PM To: solr-user@lucene.apache.org Subject: Re: JSON Facet Analytics API in Solr 5.1 Does anyone have any thoughts on the current general structure of JSON facets? The current general form of a facet command is: facet_name : { facet_type : facet_args } For example: top_authors : { terms : { field : author, limit : 5 }} One alternative I considered in the past is having the type in the args: top_authors : { type : terms, field : author, limit : 5 } It's a flatter structure... probably better in some ways, but worse in others. Thoughts / preferences? -Yonik On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote: Folks, there's a new JSON Facet API in the just released Solr 5.1 (actually, a new facet module under the covers too). It's marked as experimental so we have time to change the API based on your feedback.
So let us know what you like, what you would change, what's missing, or any other ideas you may have! I've just started the documentation for the reference guide (on our confluence wiki), so
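For anyone weighing the two forms in this thread: the mapping between them is mechanical, which is itself a point in favor of the flat form (a client can normalize either way). A small sketch, using the facet names and types from the examples above, and assuming "facet" is the only key whose value nests:

```python
def flatten(facets):
    """Rewrite the nested {name: {type: args}} facet form into the
    flat {name: {"type": ..., **args}} form proposed in this thread."""
    out = {}
    for name, spec in facets.items():
        (ftype, args), = spec.items()   # nested form has exactly one type key
        flat = {"type": ftype}
        for key, value in args.items():
            # sub-facets nest recursively; every other arg copies through
            flat[key] = flatten(value) if key == "facet" else value
        out[name] = flat
    return out

nested = {"top_authors": {"terms": {"field": "author", "limit": 5}}}
print(flatten(nested))
# {'top_authors': {'type': 'terms', 'field': 'author', 'limit': 5}}
```

Note the single-item unpacking on spec.items(): it fails loudly if a facet carries two type keys (terms and range at once), which is exactly the ambiguity the flat form rules out.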
Re: MoreLikeThis (mlt) in sharded SolrCloud
Ah, I meant SOLR-7418 https://issues.apache.org/jira/browse/SOLR-7418. On Fri, Apr 17, 2015 at 4:30 PM, Anshum Gupta ans...@anshumgupta.net wrote: Hi Ere, Those seem like valid issues. I've created an issue : SOLR-7275 https://issues.apache.org/jira/browse/SOLR-7275 and will create more as I find more of those. I plan to get to them and fix over the weekend. On Wed, Apr 15, 2015 at 5:13 AM, Ere Maijala ere.maij...@helsinki.fi wrote: Hi, I'm trying to gather information on how mlt works or is supposed to work with SolrCloud and a sharded collection. I've read issues SOLR-6248, SOLR-5480 and SOLR-4414, and docs at https://wiki.apache.org/solr/MoreLikeThis, but I'm still struggling with multiple issues. I've been testing with Solr 5.1 and the Getting Started sample cloud. So, with a freshly extracted Solr, these are the steps I've done: bin/solr start -e cloud -noprompt bin/post -c gettingstarted docs/ bin/post -c gettingstarted example/exampledocs/books.json After this I've tried different variations of queries with limited success: http://localhost:8983/solr/gettingstarted/select?q={!mlt}non-existing causes java.lang.NullPointerException at org.apache.solr.search.mlt.CloudMLTQParser.parse(CloudMLTQParser.java:80) http://localhost:8983/solr/gettingstarted/select?q={!mlt}978-0641723445 causes java.lang.NullPointerException at org.apache.solr.search.mlt.CloudMLTQParser.parse(CloudMLTQParser.java:84) http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=title}978-0641723445 http://localhost:8983/solr/gettingstarted/select?q=%7B!mlt%20qf=title%7D978-0641723445 causes java.lang.NullPointerException at org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:759) http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=cat}978-0641723445 http://localhost:8983/solr/gettingstarted/select?q=%7B!mlt%20qf=cat%7D978-0641723445 actually gives results http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=author,cat}978-0641723445 
http://localhost:8983/solr/gettingstarted/select?q=%7B!mlt%20qf=author,cat%7D978-0641723445 again causes java.lang.NullPointerException at org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:759) I guess the actual question is: how am I supposed to use the handler to replicate the behavior of non-distributed mlt that was formerly used with qt=morelikethis and the following configuration in solrconfig.xml:

<requestHandler name="morelikethis" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">title,title_short,callnumber-label,topic,language,author,publishDate</str>
    <str name="mlt.qf">
      title^75 title_short^100 callnumber-label^400 topic^300 language^30 author^75 publishDate
    </str>
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">1</int>
    <str name="mlt.boost">true</str>
    <int name="mlt.count">5</int>
    <int name="rows">5</int>
  </lst>
</requestHandler>

Real-life full schema and config can be found at https://github.com/NatLibFi/NDL-VuFind-Solr/tree/master/vufind/biblio/conf . --Ere -- Ere Maijala Kansalliskirjasto / The National Library of Finland -- Anshum Gupta
Highlighting
Hello All, I am new to Solr and trying to configure highlighting. If I look at the result in XML or JSON format, I can see the highlighting part of the data and it looks good. However, the Velocity page does not show the highlighted words on my result page. Do I need to do something extra for the highlighting results to show up on the page that is generated by Velocity? Here is my hl setting in solrconfig.xml:

<str name="hl">on</str>
<str name="hl.fl">seriesTitle</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">seriesTitle</str>

Here are those fields in schema.xml:

<field name="seriesTitle" type="text" indexed="true" stored="true"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Thank you in advance. -- Misagh Karimi
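Part of the answer to questions like this: Solr returns highlighting in a separate response section keyed by document id, so whatever renders the page (Velocity or otherwise) must look the snippets up per document and field; printing the raw stored field will never show the <em> markup. A sketch of that merge over a hypothetical JSON-style response -- the doc id and field values here are made up, not from the original message:

```python
# Hypothetical Solr response shape: docs plus a parallel highlighting map.
response = {
    "response": {"docs": [{"id": "1", "seriesTitle": "Solr in Action"}]},
    "highlighting": {"1": {"seriesTitle": ["<em>Solr</em> in Action"]}},
}

def merged_docs(resp):
    """Return docs with highlighted snippets substituted for raw field values."""
    hl = resp.get("highlighting", {})
    docs = []
    for doc in resp["response"]["docs"]:
        merged = dict(doc)
        # Prefer the first highlighted fragment when one exists for a field.
        for field, frags in hl.get(doc["id"], {}).items():
            merged[field] = frags[0]
        docs.append(merged)
    return docs

print(merged_docs(response)[0]["seriesTitle"])  # <em>Solr</em> in Action
```

One thing worth double-checking in the config above: hl.fl names seriesTitle, but the f.name.hl.* parameters are scoped to a field called name, so they may not be taking effect on the highlighted field at all.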
Multilevel nested level support using Solr
Hi folks, In my DB, my records are nested in a folder-based hierarchy:

Root
  Level_1
    record_1
    record_2
    Level_2
      record_3
      record_4
      Level_3
        record_5
  Level_1
    Level_2
      Level_3
        record_6
        record_7
        record_8

You get the idea. Is there anything in Solr that will let me preserve this structure and thus, when I'm searching, tell it in which level to narrow down the search? I have four search-level needs:
1) Be able to search inside only one level: Root.Level_1.Level_2.* (and everything under Level_2 from this path).
2) Be able to search inside a level regardless of its path: Level_2.* (no matter where Level_2 is, I want to search all records under Level_2 and everything under its path).
3) Same as #1 but limit the search to within that level (nothing below it is searched).
4) Same as #2 but limit the search to within that level (nothing below it is searched).
I found this: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments but it looks like it supports one level only and requires both levels of the nest to be updated even if only one doc in the nest changes. Thanks Steve
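One common way to model this in Solr, as an alternative to nested child documents, is a materialized-path field: each record stores its full folder path, and the four needs become prefix, level-name, and exact-path matches (Solr's PathHierarchyTokenizerFactory exists for exactly the prefix case). A toy sketch of the matching rules, with made-up record names mirroring the layout above:

```python
# Toy model: each record carries its materialized folder path.
records = {
    "record_1": "Root/Level_1",
    "record_3": "Root/Level_1/Level_2",
    "record_5": "Root/Level_1/Level_2/Level_3",
}

def under(prefix):
    """Need #1: everything at or below a full path."""
    return sorted(r for r, p in records.items()
                  if p == prefix or p.startswith(prefix + "/"))

def in_level(level):
    """Need #2: everything under a level name, wherever it occurs."""
    return sorted(r for r, p in records.items() if level in p.split("/"))

def directly_in(path):
    """Needs #3/#4: only records whose immediate parent is the given path."""
    return sorted(r for r, p in records.items() if p == path)

print(under("Root/Level_1/Level_2"))   # ['record_3', 'record_5']
print(in_level("Level_2"))             # ['record_3', 'record_5']
print(directly_in("Root/Level_1"))     # ['record_1']
```

In Solr terms, need #1 roughly maps to an index-time PathHierarchyTokenizerFactory on the path field (query with the exact path), #2 to a tokenized field holding the level names, and #3/#4 to an untokenized string copy of the path queried for exact matches -- a sketch, not a complete schema.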
Re: JSON Facet Analytics API in Solr 5.1
Does anyone have any thoughts on the current general structure of JSON facets? The current general form of a facet command is: facet_name : { facet_type : facet_args } For example: top_authors : { terms : { field : author, limit : 5, }} One alternative I considered in the past is having the type in the args: top_authors : { type : terms, field : author, limit : 5 } It's a flatter structure... probably better in some ways, but worse in other ways. Thoughts / preferences? -Yonik On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote: Folks, there's a new JSON Facet API in the just released Solr 5.1 (actually, a new facet module under the covers too). It's marked as experimental so we have time to change the API based on your feedback. So let us know what you like, what you would change, what's missing, or any other ideas you may have! I've just started the documentation for the reference guide (on our confluence wiki), so for now the best doc is on my blog: http://yonik.com/json-facet-api/ http://yonik.com/solr-facet-functions/ http://yonik.com/solr-subfacets/ I'll also be hanging out more on the #solr-dev IRC channel on freenode if you want to hit me up there about any development ideas. -Yonik
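For comparison, the two candidate shapes written out as complete, valid JSON bodies (using the field names from the example above):

```python
import json

# Shape 1 (current API): the facet type is a key wrapping the args.
nested = {"top_authors": {"terms": {"field": "author", "limit": 5}}}

# Shape 2 (flatter alternative): the type moves into the args.
flat = {"top_authors": {"type": "terms", "field": "author", "limit": 5}}

# Both round-trip as JSON; the flat form is one nesting level shallower.
print(json.dumps(nested))
print(json.dumps(flat))
```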
Re: Bad contentType for search handler :text/xml; charset=UTF-8
Not unless you provide a lot more details. Specifically, anything in your Solr logs that looks suspicious _and_ in your container logs (Tomcat? Jetty?). Plus the message you sent. Please review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Thu, Apr 16, 2015 at 10:57 PM, Pavel Hladik pavel.hla...@profimedia.cz wrote: Hi, we have migrated Solr from 5.0 to 5.1 and we can't search now; we get an ERROR for SolrCore like in the subject. I can't find any info through Google. Please, can someone help with what is going on? Thanks, Pavel -- View this message in context: http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 4.8.0 update synonyms in zookeeper splitted files
On 4/17/2015 7:45 PM, Vincenzo D'Amore wrote: Hi Shawn, thanks for your answer. I apologise for my English, for floating results I meant random results in queries. As far as I know, we should split the synonyms file because of zookeeper, there is a limit in the size of files (1MB). All my synonyms are about 10MB. That's a very large synonyms file. If your synonyms happen at index time, that might slow down indexing, and as I said before in my previous reply, a full reindex would be required after updating the synonyms. If your synonyms are at query time, a reindex wouldn't be required. Such a large synonym file at query time could add noticeable time to query parsing, because every term in the query would need to be checked against every synonym. Regarding the 1MB limit in zookeeper, you might find it more useful to increase the limit instead of trying to use multiple files. Adding -Djute.maxbuffer= to the java commandline on all Solr (Tomcat) instances and all Zookeeper instances will increase this limit. http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#Experimental+Options%2FFeatures As a general rule, storing very large stuff in zookeeper is not recommended, but synonyms will only be read when a core first starts up or is reloaded, so I do not think it is a big problem in this case. I have tried again in dev environment these steps: 1. put into zookeeper an updated synonym file sinonimi_freeling/sfak (added just one new synonym ) 2. reload the core using Core Admin UI Then I started to receive random results executing a simple query like: http://src-dev-3:8080/solr/0bis/select/?q=smartphone&fl=*&rows=24 There are random numFound values in the result <result name="response" numFound="641" start="0" maxScore="4.653946"> and the order of documents varies. If numFound is changing when you run the same query multiple times, there is one of two things happening: 1) You have documents with the same uniqueKey value in more than one shard. 
This can happen if you are using implicit (manual) document routing for multiple shards. 2) Different replicas of your index have different settings (such as the synonyms), or different documents in the index. Different settings can happen if you update the config and then only reload/restart some of your cores. Different documents in different replicas is usually an indication of a bug, or of something going very wrong, such as OutOfMemory errors. Thanks, Shawn
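The sfaa...sfak chunk names in this thread suggest the files were produced with the Unix split command, which (with byte-based options such as -b) can cut a synonym rule in half at a chunk boundary. A sketch of splitting on line boundaries instead, so each chunk stays under ZooKeeper's default 1MB znode limit while keeping every rule intact (the demo uses an artificially tiny limit):

```python
def split_synonyms(lines, max_bytes=1024 * 1024):
    """Split synonym lines into chunks, each under max_bytes when encoded,
    never breaking inside a line (a synonym rule must stay whole)."""
    chunks, current, size = [], [], 0
    for line in lines:
        b = len(line.encode("utf-8")) + 1  # +1 for the newline
        if current and size + b > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += b
    if current:
        chunks.append(current)
    return chunks

# Tiny demo with an artificially small limit; real rules would use 1MB.
rules = ["phone,smartphone,mobile", "tv,television", "laptop,notebook"]
parts = split_synonyms(rules, max_bytes=30)
print(len(parts))  # 2
```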
Solr Performance with Ram size variation
Hi, As per this article, a Linux machine is preferred to have RAM equal to 1.5 times the index size. So, to verify this, I tried testing Solr performance with different volumes of RAM allocation, keeping the other configuration (i.e. Solid State Drives, 8-core processor, 64-bit) the same in both cases. I am using Solr 4.8.1 with a Tomcat server. https://wiki.apache.org/solr/SolrPerformanceProblems 1) Initially, the Linux machine had 32 GB RAM, out of which I allocated 14GB to Solr. export CATALINA_OPTS="-Xms2048m -Xmx14336m -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:./logs/info_error/tomcat_gcdetails.log" The average search time for 1000 queries was 300ms. 2) After that, RAM was increased to 68 GB, out of which I allocated 40GB to Solr. Now, strangely, the average search time for the same set of queries was 3000ms. After this, I reduced the Solr-allocated RAM to 25GB on the 68GB machine, but the search time was still higher than in the first case. What am I missing? Please suggest.
Re: Solr Performance with Ram size variation
Hi, This may be irrelevant but your machine configuration reminded me of some reading I had done some time back on memory vs ssd. Do a search on solr ssd and you should get some meaningful posts. Like this one https://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/ Regards Puneet On 18 Apr 2015 07:45, Kamal Kishore Aggarwal kkroyal@gmail.com wrote: Hi, As per this article, the linux machine is preferred to have 1.5 times RAM with respect to index size. So, to verify this, I tried testing the solr performance in different volumes of RAM allocation keeping other configuration (i.e Solid State Drives, 8 core processor, 64-Bit) to be same in both the cases. I am using solr 4.8.1 with tomcat server. https://wiki.apache.org/solr/SolrPerformanceProblems 1) Initially, the linux machine had 32 GB RAM, out of which I allocated 14GB to solr. export CATALINA_OPTS=-Xms2048m -Xmx14336m -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:./logs/info_error/tomcat_gcdetails.log The average search time for 1000 queries 300ms. 2) After that, RAM was increased to 68 GB, out of which I allocated 40GB to Solr. Now, on a strange note, the average search time for the same set of queries was 3000ms. Now, after this, I reduced solr allocated RAM to 25GB on 68GB machine. But, still the search time was higher as compared to first case. What am I missing. Please suggest.
Re: Solr Performance with Ram size variation
Hi, Because you went over 31-32 GB heap you lost the benefit of compressed pointers and even though you gave the JVM more memory the GC may have had to work harder. This is a relatively well educated guess, which you can confirm if you run tests and look at GC counts, times, JVM heap memory pool utilization, etc. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Apr 17, 2015 at 10:14 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote: Hi, As per this article, the linux machine is preferred to have 1.5 times RAM with respect to index size. So, to verify this, I tried testing the solr performance in different volumes of RAM allocation keeping other configuration (i.e Solid State Drives, 8 core processor, 64-Bit) to be same in both the cases. I am using solr 4.8.1 with tomcat server. https://wiki.apache.org/solr/SolrPerformanceProblems 1) Initially, the linux machine had 32 GB RAM, out of which I allocated 14GB to solr. export CATALINA_OPTS=-Xms2048m -Xmx14336m -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:./logs/info_error/tomcat_gcdetails.log The average search time for 1000 queries 300ms. 2) After that, RAM was increased to 68 GB, out of which I allocated 40GB to Solr. Now, on a strange note, the average search time for the same set of queries was 3000ms. Now, after this, I reduced solr allocated RAM to 25GB on 68GB machine. But, still the search time was higher as compared to first case. What am I missing. Please suggest.
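Otis's point in concrete numbers: with compressed oops the JVM stores object references as 32-bit offsets scaled by the default 8-byte object alignment, which caps the addressable heap at about 32GB. A quick arithmetic sketch:

```python
# Compressed oops: a 32-bit reference shifted by 3 bits (8-byte object
# alignment) can address at most 2^32 * 8 bytes of heap.
oop_bits = 32
alignment = 8  # default HotSpot object alignment in bytes

max_heap_bytes = (2 ** oop_bits) * alignment
print(max_heap_bytes // 2 ** 30, "GiB")  # 32 GiB

# A 40GB heap (-Xmx40g, as in the second test) exceeds this, so the JVM
# falls back to full 64-bit pointers: every reference takes more memory
# and the GC has more work to do.
```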
Re: Enrich search results with external data
Hi Ha, Yes, I think if you want to facet on the external field, the custom component seems to be the best option IMO. -sujit On Fri, Apr 17, 2015 at 3:02 PM, ha.p...@arvatosystems.com wrote: Hi Sujit, Many thanks for your blog post, responding to my question, and suggesting the alternative option ☺ I think I prefer your approach because we can supply our own Comparator. The reason is that we need to meet some strict requirements: we can only call the external system once to retrieve extra fields (price, inventory, etc.) for probably a subset of the search result. Therefore we need to be able to sort and facet on the list of items that some of them may not have external fields. I think using the Comparator would help with the sorting but let me know if you have different ideas. Do you have suggestion how we should deal with the facet requirement? I am thinking about adding another Facet Component that will be executed after the standard FacetComponent. Let me know if you think we should consider other options. Thanks, -Ha -Original Message- From: sujitatgt...@gmail.com [mailto:sujitatgt...@gmail.com] On Behalf Of Sujit Pal Sent: Saturday, April 11, 2015 10:23 AM To: solr-user@lucene.apache.org; Ahmet Arslan Subject: Re: Enrich search results with external data Hi Ha, I am the author of the blog post you mention. To your question, I don't know if the code will work without change (since the Lucene/Solr API has evolved so much over the last few years), but a more preferred way using Function Queries way may be found in slides for Timothy Potter's talk here: http://www.slideshare.net/thelabdude/boosting-documents-in-solr-lucene-revolution-2011 Here he speaks of external fields stored in a database and accessed using a custom component (rather than from a flat file as in ExternalFieldField), and using function queries to influence the ranking based on the external field. 
However, per this document on function queries, you can use the output of a function query to sort as well, by passing the function to the sort parameter. https://wiki.apache.org/solr/FunctionQuery#Sort_By_Function Hope this helps, Sujit On Fri, Apr 10, 2015 at 10:38 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Why don't you include/add/index those additional fields, at least the one used in sorting? Also, you may find https://stanbol.apache.org/docs/trunk/components/enhancer/ relevant. Ahmet On Saturday, April 11, 2015 1:04 AM, ha.p...@arvatosystems.com ha.p...@arvatosystems.com wrote: This ticket seems to address the problem I have https://issues.apache.org/jira/browse/SOLR-1566 and as a result of that ticket, DocTransformer was added in Solr 4.0. I wrote a simple DocTransformer and found that the transformer is executed AFTER pagination. In our application, we need the external fields added before sorting/pagination. I've looked around for an option to change the execution order but haven't had any luck. Does anyone know the solution? The ticket also states it is not possible for components to add fields to outgoing documents which are not in the stored fields of the document. Does anyone know if this is still true? Thanks, -Ha -Original Message- From: Pham, Ha Sent: Thursday, April 09, 2015 11:41 PM To: solr-user@lucene.apache.org Subject: Enrich search results with external data Hi everyone, We have a requirement to append external data (e.g. price/inventory of a product, retrieved from an ERP via web services) to the query result and support sorting and pagination based on those external fields. For example, if Solr returns 100 records and the page size the user selects is 20, the sorting on the external fields is still on 100 records. This limits us from enriching search results outside of Solr. I guess this is a common problem, so hopefully someone can share their experience. 
I am considering using a PostFilter and enriching documents in the collect() method as below:

@Override
public void collect(int docId) throws IOException {
    DoubleField price = new DoubleField(PRICE, 1.23, Field.Store.YES);
    Document currentDoc = context.reader().document(docId);
    currentDoc.add(price);
}

but the result documents don't have PRICE fields. Did I miss anything here? I also did some research and it seems the approach mentioned here http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html is close to what we need to achieve, but since the document is 4 years old, I don't know if there's a better approach for our problem (we are using Solr 5.0)? Thanks, -Ha
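Since (per the thread) a PostFilter cannot add stored fields to outgoing documents, one workaround, not discussed above and sketched here with hypothetical names and data, is to do the merge client-side: fetch the full match set from Solr, join the external fields with one bulk ERP call, then sort and paginate on the merged records:

```python
# Hypothetical sketch: merge external fields (e.g. price from an ERP) into
# Solr results *before* sorting/pagination, which a DocTransformer cannot do.
solr_docs = [{"id": "a"}, {"id": "b"}, {"id": "c"}]        # full result set
external = {"a": {"price": 9.99}, "c": {"price": 1.23}}    # one bulk ERP call

for doc in solr_docs:
    doc.update(external.get(doc["id"], {}))

# Sort with docs lacking a price last (the custom-Comparator requirement
# from the thread), then paginate the merged list.
ranked = sorted(solr_docs, key=lambda d: ("price" not in d, d.get("price", 0.0)))
page_size, page = 2, 0
print([d["id"] for d in ranked[page * page_size:(page + 1) * page_size]])  # ['c', 'a']
```

The trade-off is that the full match set must fit in client memory; past a few thousand matches this stops scaling, which is why the thread looks for an in-Solr solution.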
Re: MoreLikeThis (mlt) in sharded SolrCloud
The other issue that would fix half of your problems is: https://issues.apache.org/jira/browse/SOLR-7143 On Fri, Apr 17, 2015 at 4:35 PM, Anshum Gupta ans...@anshumgupta.net wrote: Ah, I meant SOLR-7418 https://issues.apache.org/jira/browse/SOLR-7418. On Fri, Apr 17, 2015 at 4:30 PM, Anshum Gupta ans...@anshumgupta.net wrote: Hi Ere, Those seem like valid issues. I've created an issue: SOLR-7275 https://issues.apache.org/jira/browse/SOLR-7275 and will create more as I find more of those. I plan to get to them and fix over the weekend. On Wed, Apr 15, 2015 at 5:13 AM, Ere Maijala ere.maij...@helsinki.fi wrote: Hi, I'm trying to gather information on how mlt works or is supposed to work with SolrCloud and a sharded collection. I've read issues SOLR-6248, SOLR-5480 and SOLR-4414, and docs at https://wiki.apache.org/solr/MoreLikeThis, but I'm still struggling with multiple issues. I've been testing with Solr 5.1 and the Getting Started sample cloud. So, with a freshly extracted Solr, these are the steps I've done: bin/solr start -e cloud -noprompt bin/post -c gettingstarted docs/ bin/post -c gettingstarted example/exampledocs/books.json After this I've tried different variations of queries with limited success: http://localhost:8983/solr/gettingstarted/select?q={!mlt}non-existing causes java.lang.NullPointerException at org.apache.solr.search.mlt.CloudMLTQParser.parse(CloudMLTQParser.java:80) http://localhost:8983/solr/gettingstarted/select?q={!mlt}978-0641723445 causes java.lang.NullPointerException at org.apache.solr.search.mlt.CloudMLTQParser.parse(CloudMLTQParser.java:84) http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=title}978-0641723445 causes java.lang.NullPointerException at org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:759) http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=cat}978-0641723445 
actually gives results. http://localhost:8983/solr/gettingstarted/select?q={!mlt%20qf=author,cat}978-0641723445 again causes java.lang.NullPointerException at org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:759) I guess the actual question is, how am I supposed to use the handler to replicate the behavior of non-distributed mlt that was formerly used with qt=morelikethis and the following configuration in solrconfig.xml:

<requestHandler name="morelikethis" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">title,title_short,callnumber-label,topic,language,author,publishDate</str>
    <str name="mlt.qf">
      title^75 title_short^100 callnumber-label^400 topic^300 language^30 author^75 publishDate
    </str>
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">1</int>
    <str name="mlt.boost">true</str>
    <int name="mlt.count">5</int>
    <int name="rows">5</int>
  </lst>
</requestHandler>

Real-life full schema and config can be found at https://github.com/NatLibFi/NDL-VuFind-Solr/tree/master/vufind/biblio/conf . --Ere -- Ere Maijala Kansalliskirjasto / The National Library of Finland -- Anshum Gupta
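The pairs of plain and %7B-encoded URLs in this thread are the same queries; the local-params braces and spaces simply need URL encoding, which a client library does automatically. A sketch with Python's standard urllib (the host and collection names are from the Getting Started example above):

```python
from urllib.parse import urlencode

# Build one of Ere's {!mlt} local-params queries; urlencode() percent-encodes
# the braces, spaces, and commas that appear as %7B / %20 / %2C in the thread.
q = "{!mlt qf=author,cat}978-0641723445"
params = urlencode({"q": q})
url = "http://localhost:8983/solr/gettingstarted/select?" + params

print(url)
```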
Re: SolrCloud 4.8.0 upgrade
Great!! Thank you very much. On Fri, Apr 17, 2015 at 7:36 PM, Erick Erickson erickerick...@gmail.com wrote: Solr/Lucene is supposed to _always_ read one major version back. Thus your 4.10 should be able to read indexes produced all the way back to (and including) 3.x. Sometimes experimental formats are excepted. In your case you should be fine since you're upgrading from 4.8. As always, though, I'd recommend copying your indexes someplace just to be paranoid before upgrading. Best, Erick On Fri, Apr 17, 2015 at 10:28 AM, Vincenzo D'Amore v.dam...@gmail.com wrote: Thanks for your answers, I looked at the changes and we don't use DocValuesFormat. The question is: if I upgrade the SolrCloud version to 4.10, should I reload all the documents entirely? Is there binary compatibility between these two versions when reading the Solr home? On Fri, Apr 17, 2015 at 7:04 PM, Erick Erickson erickerick...@gmail.com wrote: Look at CHANGES.txt for both Lucene and Solr; there's always an upgrading section for each release. Best, Erick On Fri, Apr 17, 2015 at 5:31 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Vincenzo D'Amore v.dam...@gmail.com wrote: I have a SolrCloud cluster with 3 servers. I would like to use stats.facet, but this feature is available only if I upgrade to 4.10. May I simply redeploy the new SolrCloud version in Tomcat, or should I reload all the documents? Are there other drawbacks? Support for the disk format for DocValues was removed after 4.8, so you should check if you use that: DocValuesFormat=Disk for the field in the schema, if I remember correctly. - Toke Eskildsen -- Vincenzo D'Amore email: v.dam...@gmail.com skype: free.dev mobile: +39 349 8513251
Re: solr 4.8.0 update synonyms in zookeeper splitted files
Hi Shawn, thanks for your answer. I apologise for my English; by floating results I meant random results in queries. As far as I know, we had to split the synonyms file because of ZooKeeper: there is a limit on the size of files (1MB), and all my synonyms together are about 10MB. I have tried again in the dev environment these steps: 1. put into ZooKeeper an updated synonym file sinonimi_freeling/sfak (added just one new synonym) 2. reload the core using the Core Admin UI Then I started to receive random results executing a simple query like: http://src-dev-3:8080/solr/0bis/select/?q=smartphone&fl=*&rows=24 There are random numFound values in the result <result name="response" numFound="641" start="0" maxScore="4.653946"> and the order of documents varies. So, now I'm pretty afraid to update these synonyms because I cannot stop and start all instances in production. I'll take a look at how to reload the entire collection through the Collections API. Thanks again for your suggestions. On Fri, Apr 17, 2015 at 3:04 PM, Shawn Heisey apa...@elyograg.org wrote: On 4/17/2015 6:02 AM, Vincenzo D'Amore wrote: I have Solr synonyms stored in multiple files, as defined in the schema:

<!ENTITY sinonimi_freeling "sinonimi_freeling/sfaa,sinonimi_freeling/sfab,sinonimi_freeling/sfac,sinonimi_freeling/sfad,sinonimi_freeling/sfae,sinonimi_freeling/sfaf,sinonimi_freeling/sfag,sinonimi_freeling/sfah,sinonimi_freeling/sfai,sinonimi_freeling/sfaj,sinonimi_freeling/sfak">

so that I can specify the synonym resource this way:

<filter class="solr.SynonymFilterFactory" synonyms="&sinonimi_freeling;" expand="false" ignoreCase="true"/>

I'm quite worried because I tried to update one synonym file, adding the new synonyms at the end. SolrCloud didn't update its synonyms list, so I reloaded the core and then started to get floating results querying SolrCloud. I had to stop and restart all the Tomcat instances to stop this strange behaviour. Is there a best practice for updating synonyms when using SynonymFilterFactory? 
How can I update the synonym resources? Why can't I simply upload the new file into ZooKeeper? I've not encountered the <!ENTITY> syntax or used more than one synonym file, so I'll have to take your word for it that this works. When you update a config resource, you must reload or restart for it to take effect. If the resource is used in index analysis, you must reindex after reloading. Resources used in query analysis will take effect immediately. With SolrCloud, you should reload the entire collection (with the Collections API), not just a core (with the CoreAdmin API). I don't know what you mean by floating results above. Thanks, Shawn