Re: Local Solr and Webserver-Solr act differently (and treated like or)
Unfortunately, I don't really know what stopwords are. I would like it to not ignore any words of my query. How/where can I change this stopwords behaviour?

Am 16.10.2013 23:45, schrieb Jack Krupansky: So, the stopwords.txt file is different between the two systems - the first has stop words but the second does not. Did you expect stop words to be removed, or not? -- Jack Krupansky

-Original Message- From: Stavros Delsiavas Sent: Wednesday, October 16, 2013 5:02 PM To: solr-user@lucene.apache.org Subject: Re: Local Solr and Webserver-Solr act differently (and treated like or)

Okay, I understand. Here's the rawquerystring. It was at about line 3000:

<lst name="debug">
  <str name="rawquerystring">title:(into AND the AND wild*)</str>
  <str name="querystring">title:(into AND the AND wild*)</str>
  <str name="parsedquery">+title:wild*</str>
  <str name="parsedquery_toString">+title:wild*</str>

At this place the debug output DOES differ from the one on my local system. But I don't understand why... This is the local debug output:

<lst name="debug">
  <str name="rawquerystring">title:(into AND the AND wild*)</str>
  <str name="querystring">title:(into AND the AND wild*)</str>
  <str name="parsedquery">+title:into +title:the +title:wild*</str>
  <str name="parsedquery_toString">+title:into +title:the +title:wild*</str>

Why is that? Any ideas?

Am 16.10.2013 21:03, schrieb Shawn Heisey: On 10/16/2013 4:46 AM, Stavros Delisavas wrote: My local solr gives me: http://pastebin.com/Q6d9dFmZ and my webserver this: http://pastebin.com/q87WEjVA I copied only the first few hundred lines (of more than 8000) because the webserver output was too big even for pastebin.

On 16.10.2013 12:27, Erik Hatcher wrote: What does the debug output from debugQuery=true say between the two?

What's really needed here is the first part of the debug section, which has rawquerystring, querystring, parsedquery, and parsedquery_toString. The info from your local solr has this part, but what you pasted from the webserver one didn't include those parts, because it's further down than the first few hundred lines. Thanks, Shawn
Re: SolrCloud Performance Issue
The query result cache hit rate might be low due to using NOW in bf. NOW is always translated to the current time, and that of course changes from ms to ms... :) Primoz

From: Shamik Bandopadhyay sham...@gmail.com To: solr-user@lucene.apache.org Date: 17.10.2013 00:14 Subject: SolrCloud Performance Issue

Hi, I'm in the process of transitioning to SolrCloud from a conventional master-slave model. I'm using Solr 4.4 and have set up 2 shards with 1 replica each. I have a 3-node zookeeper ensemble. All the nodes are running on AWS EC2 instances. The shards are on m1.xlarge and share a zookeeper instance (mounted on a separate volume). 6 GB of memory is allocated to each Solr instance. I have around 10 million documents in the index. With the previous standalone model, queries averaged around 100 ms. The SolrCloud query response has been abysmal so far. The query response time is over 1000 ms, often reaching 2000 ms. I expected some surge due to additional servers, network latency, etc., but this difference is really baffling. The hardware is similar in both cases, except for the fact that a couple of the SolrCloud nodes share a zookeeper as well. m1.xlarge I/O is high, so that shouldn't be a bottleneck either. The other difference from the old setup is that I'm using the new CloudSolrServer class, which has the 3 zookeeper references for load balancing. But I don't think it has any major impact, as queries executed from the Solr admin query panel confirm the slowness.
Here is some of my configuration setup:

<autoCommit>
  <maxTime>3</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" initialSize="8192" autowarmCount="4096"/>
<documentCache class="solr.LRUCache" size="32768" initialSize="16384" autowarmCount="0"/>
<fieldValueCache class="solr.FastLRUCache" size="16384" autowarmCount="8192" showItems="4096"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<queryResultWindowSize>200</queryResultWindowSize>
<queryResultMaxDocsCached>400</queryResultMaxDocsCached>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">line</str></lst>
    <lst><str name="q">xref</str></lst>
    <lst><str name="q">draw</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">line</str></lst>
    <lst><str name="q">draw</str></lst>
    <lst><str name="q">line</str><str name="fq">language:english</str></lst>
    <lst><str name="q">line</str><str name="fq">Source2:documentation</str></lst>
    <lst><str name="q">line</str><str name="fq">Source2:CloudHelp</str></lst>
    <lst><str name="q">draw</str><str name="fq">language:english</str></lst>
    <lst><str name="q">draw</str><str name="fq">Source2:documentation</str></lst>
    <lst><str name="q">draw</str><str name="fq">Source2:CloudHelp</str></lst>
  </arr>
</listener>
<maxWarmingSearchers>2</maxWarmingSearchers>

The custom request handler:

<requestHandler name="/adskcloudhelp" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.contentType">text/html;charset=UTF-8</str>
    <str name="v.layout">layout</str>
    <str name="v.channel">cloudhelp</str>
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">15</str>
    <str name="fl">id,url,Description,Source2,text,filetype,title,LastUpdateDate,PublishDate,ViewCount,TotalMessageCount,Solution,LastPostAuthor,Author,Duration,AuthorUrl,ThumbnailUrl,TopicId,score</str>
    <str name="qf">text^1.5 title^2 IndexTerm^.9 keywords^1.2 ADSKCommandSrch^2 ADSKContextId^1</str>
    <str name="bq">Source2:CloudHelp^3 Source2:youtube^0.85</str>
    <str name="bf">recip(ms(NOW,PublishDate),3.16e-11,1,1)^2.0</str>
    <str name="df">text</str>
    <str name="facet">on</str>
    <str name="facet.mincount">1</str>
    <str name="facet.limit">100</str>
    <str name="facet.field">language</str>
    <str name="facet.field">Source2</str>
    <str name="facet.field">DocumentationBook</str>
    <str name="facet.field">ADSKProductDisplay</str>
    <str name="facet.field">audience</str>
    <str
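Regarding Primoz's point about NOW in the bf: one common workaround (a sketch only; whether day granularity is acceptable depends on this index's freshness requirements) is to round NOW with Solr date math so that repeated queries hash to the same cache entry:

```
<!-- hypothetical variant of the bf above: NOW/DAY rounds the current
     time down to midnight, so every query issued on the same day
     produces an identical function and the query result cache can
     actually get hits -->
<str name="bf">recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0</str>
```

Coarser rounding (NOW/HOUR, NOW/DAY) trades boost precision for cacheability.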
Re: Local Solr and Webserver-Solr act differently (and treated like or)
Stopwords are small words such as "and", "the" or "is" that we might choose to exclude from our documents and queries because they are such common terms. Once you have stripped stop words from your above query, all that is left is the word "wild", or so is being suggested. Somewhere in your config, close to solrconfig.xml, you will find a file called something like stopwords.txt. Compare these files between your two systems. Upayavira

On Thu, Oct 17, 2013, at 07:18 AM, Stavros Delsiavas wrote: Unfortunately, I don't really know what stopwords are. I would like it to not ignore any words of my query. How/where can I change this stopwords behaviour?
Re: Local Solr and Webserver-Solr act differently (and treated like or)
Thank you, I found the file with the stopwords and noticed that my local file is empty (comments only) while the one on my webserver has a big list of English stopwords. That seems to be the problem. I think in general it is a good idea to use stopwords for random searches, but it is not useful in my special case. Is there a way to (de)activate stopwords query-wise? For example, I would like to ignore stopwords when searching in titles, but use stopwords when users do a fulltext search on whole articles, etc. Thanks again, Stavros

On 17.10.2013 09:13, Upayavira wrote: Stopwords are small words such as "and", "the" or "is" that we might choose to exclude from our documents and queries because they are such common terms. Once you have stripped stop words from your above query, all that is left is the word "wild", or so is being suggested. Somewhere in your config, close to solrconfig.xml, you will find a file called something like stopwords.txt. Compare these files between your two systems. Upayavira
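Stopword handling in Solr is configured per field type in schema.xml rather than per query, so the usual way to get "no stopwords for titles, stopwords for article bodies" is to define two field types, one with StopFilterFactory and one without. A sketch (the type and field names below are made up for illustration; only the factories are standard Solr classes):

```
<!-- illustrative schema.xml fragment -->
<fieldType name="text_nostop" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_stop" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- only this type strips stopwords -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- title keeps every word; the article body drops stopwords -->
<field name="title" type="text_nostop" indexed="true" stored="true"/>
<field name="article" type="text_stop" indexed="true" stored="true"/>
```

A reindex is needed after changing the analysis chain, since stopword removal happens at both index and query time.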
Change config set for a collection
This question was also asked some 10 months ago in http://lucene.472066.n3.nabble.com/SolrCloud-4-1-change-config-set-for-a-collection-td4037456.html, and the answer then was negative, but here it goes again; maybe now it's different. Is it possible to change the config set of a collection to another one (stored in zookeeper) using the Collections API? If not, is it possible to do it using zkCli? Also, how can somebody check which config set a collection is using? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Change-config-set-for-a-collection-tp4096032.html Sent from the Solr - User mailing list archive at Nabble.com.
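For the zkCli route: Solr's ZkCLI does have a linkconfig command that points a collection at a named config set stored in ZooKeeper. A sketch (script path, zkhost, and names below are illustrative, and the collection typically still needs a RELOAD afterwards to pick the config up):

```
# illustrative invocation from a Solr 4.x install
cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd linkconfig \
  -collection mycollection -confname myconfigset
```

The link itself is stored in ZooKeeper under the /collections/<collection> znode, which is also where you can check which config set a collection currently uses.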
measure result set quality
Hi, Imagine the following situation. You have a corpus of documents and a list of queries extracted from a production environment. The corpus hasn't been manually annotated with relevant/non-relevant tags for every query. You then configure various Solr instances, changing the schema (adding synonyms, stopwords...). After indexing, you prepare and execute the test over the different schema configurations. How do you compare the quality of your search results in order to decide which schema is better? Regards.
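Without relevance judgments you cannot directly say which configuration is "better", but you can at least quantify how much the configurations disagree, to decide where manual judging effort is worth spending. A minimal sketch (not from the thread; the query names and doc ids are made up) comparing two configurations by top-k Jaccard overlap per query:

```python
def jaccard_at_k(run_a, run_b, k=10):
    """Overlap of the top-k document ids returned by two configurations.

    1.0 means the configs return the same top-k set (order ignored);
    0.0 means completely disjoint results.
    """
    a, b = set(run_a[:k]), set(run_b[:k])
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# hypothetical per-query results (doc ids) from two schema configurations
config_a = {"laptop": ["d1", "d2", "d3"], "phone": ["d9", "d4"]}
config_b = {"laptop": ["d2", "d1", "d7"], "phone": ["d4", "d9"]}

# queries with low overlap are the ones where the schema change actually
# mattered, and therefore the ones worth judging by hand
scores = {q: jaccard_at_k(config_a[q], config_b[q]) for q in config_a}
```

To actually rank the configurations you would then judge (pool) only the low-overlap queries and compute a standard metric such as precision@k over those judgments.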
A few questions about solr and tika
Hello everyone! Please tell me how and where to set Tika options in Solr. Where is the Tika conf? I want to know how I can eliminate response attributes I don't need (such as links or images). Also, I am interested in how I can get and index only the metadata for several file formats.
Status of wiki documentation on grouping under distributed search
On the SolrCloud wiki page (https://wiki.apache.org/solr/SolrCloud), I found this statement: The Grouping feature only works if groups are in the same shard. You must use the custom sharding feature to use the Grouping feature. However, the Distributed Search page (https://wiki.apache.org/solr/DistributedSearch) implies that grouping largely works, and the actual grouping page (or rather Field Collapsing: http://wiki.apache.org/solr/FieldCollapsing) goes into much more detail, outlining limitations on specific features. Am I right in assuming that the statement on the SolrCloud page is out of date? I'm happy to replace it with some text that links to https://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations if that makes more sense? Similarly, on the Distributed Search page, we find: Doesn't support MoreLikeThis -- (see https://issues.apache.org/jira/browse/SOLR-788) Looking at the issue, it seems this has been (largely?) resolved since Solr 4.1 and 5.0. Can I update the text to reflect that? Thanks for your time. Best wishes, Andy Jackson -- Dr Andrew N Jackson Web Archiving Technical Lead The British Library Tel: 01937 546602 Mobile: 07765 897948 Web: www.webarchive.org.uk http://www.webarchive.org.uk/ Twitter: @UKWebArchive
Re: Timeout Errors while using Collections API
On 16 October 2013 11:48, RadhaJayalakshmi rlakshminaraya...@inautix.co.in wrote:

Hi, My setup is: Zookeeper ensemble - running with 3 nodes. Tomcats - 9 Tomcat instances are brought up, registering with zookeeper.

Steps:
1) I uploaded the solr configuration (db_data_config, solrconfig, schema xmls) into zookeeper.
2) Now, I am trying to create a collection with the Collections API like below:

http://miadevuser001.albridge.com:7021/solr/admin/collections?action=CREATE&name=Schwab_InvACC_Coll&numShards=1&replicationFactor=2&createNodeSet=localhost:7034_solr,localhost:7036_solr&collection.configName=InvestorAccountDomainConfig

When I execute this command, I get the following error:

<response><lst name="responseHeader"><int name="status">500</int><int name="QTime">60015</int></lst><lst name="error"><str name="msg">createcollection the collection time out:60s</str><str name="trace">org.apache.solr.common.SolrException: createcollection the collection time out:60s
at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:175)
at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:156)
at org.apache.solr.handler.admin.CollectionsHandler.handleCreateAction(CollectionsHandler.java:290)
at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:112)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
</str><int name="code">500</int></lst></response>

After I got this error, I am not able to do any operation on these instances with the Collections API. It repeatedly gives the same timeout error. This setup was working fine 5 minutes ago; suddenly it started throwing these exceptions. Any ideas please?

-- View this message in context: http://lucene.472066.n3.nabble.com/Timeout-Errors-while-using-Collections-API-tp4095852.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grzegorz Sobczyk
Re: Timeout Errors while using Collections API
Sorry for the previous spam (something ate my message). I have the same problem, but with the reload action.

ENV: - 3x Solr 4.2.1 with 4 cores each - ZK

Before the error I have:
- 14, 2013 5:25:36 AM CollectionsHandler handleReloadAction INFO: Reloading Collection : name=products action=RELOAD
- hundreds of (with the same timestamp): 14, 2013 5:25:36 AM DistributedQueue$LatchChildWatcher process INFO: Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
- 13 times (from 2013 5:25:39 to 5:25:45):
-- 14, 2013 5:25:39 AM SolrDispatchFilter handleAdminRequest INFO: [admin] webapp=null path=/admin/cores params={action=STATUS&wt=ruby} status=0 QTime=2
-- 14, 2013 5:25:39 AM SolrDispatchFilter handleAdminRequest INFO: [admin] webapp=null path=/admin/cores params={action=STATUS&wt=ruby} status=0 QTime=1
-- 14, 2013 5:25:39 AM SolrCore execute INFO: [forum] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2
-- 14, 2013 5:25:39 AM SolrCore execute INFO: [knowledge] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2
-- 14, 2013 5:25:39 AM SolrCore execute INFO: [products] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2
-- 14, 2013 5:25:39 AM SolrCore execute INFO: [shops] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=1
- 14, 2013 5:26:21 AM SolrCore execute INFO: [products] webapp=/solr path=/select/ params={q=solrpingquery} hits=0 status=0 QTime=0
- 14, 2013 5:26:36 AM DistributedQueue$LatchChildWatcher process INFO: Watcher fired on path: /overseer/collection-queue-work/qnr-000806 state: SyncConnected type NodeDeleted
- 14, 2013 5:26:36 AM SolrException log SEVERE: org.apache.solr.common.SolrException: reloadcollection the collection time out:60s
at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:162)
at org.apache.solr.handler.admin.CollectionsHandler.handleReloadAction(CollectionsHandler.java:184)
at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:120)

What are the possibilities of such behaviour? When is this error thrown? Does anybody have the same issue?

On 17 October 2013 13:08, Grzegorz Sobczyk gsobc...@gmail.com wrote:
On 16 October 2013 11:48, RadhaJayalakshmi rlakshminaraya...@inautix.co.in wrote: Hi, My setup is: Zookeeper ensemble - running with 3 nodes. Tomcats - 9 Tomcat instances are brought up, registering with zookeeper.
Re: SolrCloud on SSL
Tim, if a separate VLAN was an option, I wouldn't be trying to use SSL. -- Chris

On Wed, Oct 16, 2013 at 7:27 PM, Tim Vaillancourt t...@elementspace.com wrote: Not important, but I'm also curious why you would want SSL on Solr (adds overhead, complexity, harder-to-troubleshoot, etc)? To avoid the overhead, could you put Solr on a separate VLAN (with ACLs to client servers)? Cheers, Tim

On 12 October 2013 17:30, Shawn Heisey s...@elyograg.org wrote: On 10/11/2013 9:38 AM, Christopher Gross wrote: On Fri, Oct 11, 2013 at 11:08 AM, Shawn Heisey s...@elyograg.org wrote: On 10/11/2013 8:17 AM, Christopher Gross wrote: Is there a spot in a Solr configuration that I can set this up to use HTTPS? From what I can tell, not yet. https://issues.apache.org/jira/browse/SOLR-3854 https://issues.apache.org/jira/browse/SOLR-4407 https://issues.apache.org/jira/browse/SOLR-4470 Dang.

Christopher, I was just looking through Solr source code for a completely different issue, and it seems that there *IS* a way to do this in your configuration. If you were to use "https://hostname" or "https://ipaddress" as the host parameter in your solr.xml file on each machine, it should do what you want. The parameter is described here, but not the behavior that I have discovered: http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params Boring details: In the org.apache.solr.cloud package, there is a ZkController class. The getHostAddress method is where I discovered that you can do this. If you could try this out and confirm that it works, I will get the wiki page updated and look into the Solr reference guide as well. Thanks, Shawn
RE: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID
Thanks Shalin. Regards, Bharat Akkinepalli

-Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Thursday, October 17, 2013 1:18 AM To: solr-user@lucene.apache.org Subject: Re: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID

Thanks Bharat. This is a bug. I've opened LUCENE-5289. https://issues.apache.org/jira/browse/LUCENE-5289

On Wed, Oct 16, 2013 at 9:35 PM, Akkinepalli, Bharat (ELS-CON) b.akkinepa...@elsevier.com wrote: Hi Shalin, I am not sure why the log says "No uncommitted changes". The data is available in Solr at the time I perform the delete. Please find below the steps I performed:
1. Inserted a document in master (with id=change.me.1)
2. Issued a commit on master
3. Triggered replication on slave
4. Ensured that the document replicated successfully
5. Issued a delete by ID
6. Issued a commit on master
7. Replication did NOT happen

The logs are as follows: Master - http://pastebin.com/265CtCEp Slave - http://pastebin.com/Qx0xLwmK Regards, Bharat Akkinepalli.

-Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, October 16, 2013 11:28 AM To: solr-user@lucene.apache.org Subject: Re: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID

The only delete I see in the master logs is:

INFO - 2013-10-11 14:06:54.793; org.apache.solr.update.processor.LogUpdateProcessor; [annotation] webapp=/solr path=/update params={} {delete=[change.me(-1448623278425899008)]} 0 60

When you commit, we have the following:

INFO - 2013-10-11 14:07:03.809; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO - 2013-10-11 14:07:03.813; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
That suggests that the id you are trying to delete never existed in the first place and hence there was nothing to commit. Hence replication was not triggered. Am I missing something?

On Wed, Oct 16, 2013 at 5:06 PM, Akkinepalli, Bharat (ELS-CON) b.akkinepa...@elsevier.com wrote: Hi Otis, Did you get a chance to look into the logs? Please let me know if you need more information. Thank you. Regards, Bharat Akkinepalli

-Original Message- From: Akkinepalli, Bharat (ELS-CON) [mailto:b.akkinepa...@elsevier.com] Sent: Friday, October 11, 2013 2:16 PM To: solr-user@lucene.apache.org Subject: RE: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID

Hi Otis, Thanks for the response. The log files can be found here. Master Log: http://pastebin.com/DPLKMPcF Slave Log: http://pastebin.com/DX9sV6Jx One more point worth mentioning here is that when we issue the commit with expungeDeletes=true, the delete-by-id replication is successful, i.e. http://localhost:8983/solr/annotation/update?commit=true&expungeDeletes=true Regards, Bharat Akkinepalli

-Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Wednesday, October 09, 2013 6:35 PM To: solr-user@lucene.apache.org Subject: Re: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID

Bharat, Can you look at the logs on the Master when you issue the delete and the subsequent commits and share that? Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm

On Tue, Oct 8, 2013 at 3:57 PM, Akkinepalli, Bharat (ELS-CON) b.akkinepa...@elsevier.com wrote: Hi, We have recently migrated from Solr 3.6 to Solr 4.4. We are using the Master/Slave configuration in Solr 4.4 (not SolrCloud). We have noticed the following behavior/defect.

Configuration:
===
1. The hard commit and soft commit are disabled in the configuration (we control the commits from the application).
2. We have 1 master and 2 slaves configured, and the pollInterval is configured to 10 minutes.
3. The master is configured to have replicateAfter set to commit and startup.

Steps to reproduce the problem:
==
1. Delete a document in Solr (using delete by id). URL - http://localhost:8983/solr/annotation/update with body as <delete><id>change.me</id></delete>
2. Issue a commit on master (http://localhost:8983/solr/annotation/update?commit=true).
3. The replication of the DELETE WILL NOT happen. The master and slave have the same index version.
4. If we try to issue another commit in Master, we see that it replicates
Solr errors
Hello everyone! Please tell my wy Solr freezes when I adding this file http://yadi.sk/d/dy-RtcHXB7KZU The response from the server does not come. curl http://localhost:8085/solr/myCollection/update/extract?literal.id=doc1literal.fileName=asuprefix=attr_commit=true; -F myfile=@/media/PENDRIVE/Out/www-http/159/8696_6_5_5535.mp3 Second question: When I adding this file http://yadi.sk/d/OpLW2JTTB7Ms4 Solr returns: wonder@wonder:~$ curl http://localhost:8085/solr/myCollection/update/extract?literal.id=doc1literal.fileName=asuprefix=attr_commit=true; -F myfile=@/media/PENDRIVE/Out/www-http/152/8696_6_5_5528.jpeg ?xml version=1.0 encoding=UTF-8? response lst name=errorstr name=msgjava.lang.NoClassDefFoundError: com/adobe/xmp/XMPException/strstr name=tracejava.lang.RuntimeException: java.lang.NoClassDefFoundError: com/adobe/xmp/XMPException at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:673) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1489) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:517) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:540) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1097) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:446) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1031) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136) at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:200) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:317) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:445) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:269) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532) at java.lang.Thread.run(Thread.java:722) Caused by: java.lang.NoClassDefFoundError: com/adobe/xmp/XMPException at com.drew.imaging.jpeg.JpegMetadataReader.extractMetadataFromJpegSegmentReader(JpegMetadataReader.java:112) at com.drew.imaging.jpeg.JpegMetadataReader.readMetadata(JpegMetadataReader.java:71) at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:91) at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241) at 
org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) ... 23 more Caused by: java.lang.ClassNotFoundException: com.adobe.xmp.XMPException at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) ... 37 more </str><int name="code">500</int></lst></response>
ExtractRequestHandler, skipping errors
Hi, I helped a customer deploy Solr+ManifoldCF and everything is going quite smoothly, but every time Solr raises an exception, the ManifoldCF job feeding Solr aborts. I would like to know if it is possible to configure the ExtractRequestHandler to ignore errors, like it seems to be possible with the DataImportHandler and entity processors. I know that it is possible to configure the ExtractRequestHandler to ignore Tika exceptions (we already do that), but the errors that now stop the MCF jobs are generated by Solr itself. While it would be interesting to have such an option in Solr, I plan to post to the ManifoldCF mailing list anyway, to know if it is possible to configure ManifoldCF to be less picky about Solr errors. Regards, Roland.
Re: Solr errors
Does anybody know how to index files in zip archives?
Re: A few questions about solr and tika
Thanks for the answer. If I don't want to store or index any of these fields, I do:
<field name="links" type="string" indexed="false" stored="false" multiValued="true"/> <!-- removal of extra TIKA fields -->
<field name="link" type="string" indexed="false" stored="false" multiValued="true"/> <!-- removal of extra TIKA fields -->
<field name="img" type="string" indexed="false" stored="false" multiValued="true"/> <!-- removal of extra TIKA fields -->
<field name="iframe" type="string" indexed="false" stored="false" multiValued="true"/> <!-- removal of extra TIKA fields -->
<field name="area" type="string" indexed="false" stored="false" multiValued="true"/> <!-- removal of extra TIKA fields -->
<field name="map" type="string" indexed="false" stored="false" multiValued="true"/> <!-- removal of extra TIKA fields -->
<field name="pragma" type="string" indexed="false" stored="false" multiValued="true"/> <!-- removal of extra TIKA fields -->
<field name="expires" type="string" indexed="false" stored="false" multiValued="true"/> <!-- removal of extra TIKA fields -->
<field name="keywords" type="string" indexed="false" stored="false" multiValued="true"/> <!-- removal of extra TIKA fields -->
<field name="stream_source_info" type="string" indexed="false" stored="false" multiValued="true"/> <!-- removal of extra TIKA fields -->
The other questions are still open for me. On 17.10.2013 14:26, primoz.sk...@policija.si wrote: Why don't you check these: - Content extraction with Apache Tika (http://www.youtube.com/watch?v=ifgFjAeTOws) - ExtractingRequestHandler (http://wiki.apache.org/solr/ExtractingRequestHandler) - Uploading Data with Solr Cell using Apache Tika (https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika) Primož From: wonder a-wonde...@rambler.ru To: solr-user@lucene.apache.org Date: 17.10.2013 12:23 Subject: A few questions about solr and tika Hello everyone! Please tell me how and where to set Tika options in Solr? Where is the Tika conf? I want to know how I can eliminate response attributes that I don't need (such as links or images). Also, I am interested in how I can get and index only metadata for several file formats.
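Besides declaring throwaway fields in the schema, an alternative is to drop the unwanted Tika-extracted attributes client-side before sending the document to Solr. This is a minimal sketch under the assumption that documents are handled as plain dicts before indexing; the field names are simply the ones from the schema snippet above, and the helper name is made up:

```python
# Sketch: drop unwanted Tika-extracted attributes client-side instead of
# declaring unused fields in schema.xml. Field names mirror the schema
# snippet above; strip_tika_fields is a hypothetical helper.

UNWANTED_TIKA_FIELDS = {"links", "link", "img", "iframe", "area", "map",
                        "pragma", "expires", "keywords", "stream_source_info"}

def strip_tika_fields(doc):
    """Return a copy of the document dict without the unwanted fields."""
    return {k: v for k, v in doc.items() if k not in UNWANTED_TIKA_FIELDS}

doc = {"id": "1", "title": "hello", "links": ["http://example.com"]}
clean = strip_tika_fields(doc)
print(clean)  # {'id': '1', 'title': 'hello'}
```

The trade-off is that the schema no longer needs to enumerate every field Tika might emit, but every indexing client has to apply the filter.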
Re: Regarding Solr Cloud issue...
Wow, thanks for all that. I just upgraded and linked my plugins; it seems fine so far, but I have run into another issue. While adding a document to the Solr cloud it says: org.apache.solr.common.SolrException: Unknown document router '{name=compositeId}'. In clusterstate.json I can see: "shard5":{"range":"4ccc-7fff","state":"active","replicas":{"core_node4":{"state":"active","base_url":"http://64.251.14.47:1984/solr","core":"web_shard5_replica1","node_name":"64.251.14.47:1984_solr","leader":"true","maxShardsPerNode":"2","router":{"name":"compositeId"},"replicationFactor":"1"}, I am using this to add: CloudSolrServer solrCoreCloud = new CloudSolrServer(cloudURL); solrCoreCloud.setDefaultCollection("web"); UpdateResponse up = solrCoreCloud.addBean(resultItem); UpdateResponse upr = solrCoreCloud.commit(); Please advise. On Wed, Oct 16, 2013 at 9:49 PM, Shawn Heisey s...@elyograg.org wrote: On 10/16/2013 4:51 AM, Chris wrote: Also, is there any easy way of upgrading to 4.5 without having to change most of my plugins' configuration files? Upgrading is something that should be done carefully. If you can, it's always recommended that you try it out on dev hardware with your real index data beforehand, so you can deal with any problems that arise without causing problems for your production cluster. Upgrading SolrCloud is particularly tricky, because for a while you will be running different versions on different machines in your cluster. If you're using your own custom software to go with Solr, or you're using third-party plugins that aren't included in the Solr download, upgrading might take more effort than usual. Also, if you are doing anything in your config/schema that changes the format of the Lucene index, you may find that it can't be upgraded without completely rebuilding the index. Examples of this are changing the postings format or docValues format. This is a very nasty complication with SolrCloud, because those configurations affect the entire cluster.
In that case, the whole index may need to be rebuilt without custom formats before upgrading is attempted. If you don't have any of the complications mentioned in the preceding paragraph, upgrading is usually a very simple process: *) Shut down Solr. *) Delete the extracted WAR file directory. *) Replace solr.war with the new war from dist/ in the download. **) Usually it must actually be named solr.war, which means renaming it. *) Delete and replace other jars copied from the download. *) Change luceneMatchVersion in all solrconfig.xml files. ** *) Start Solr back up. ** With SolrCloud, you can't actually change the luceneMatchVersion until all of your servers have been upgraded. A full reindex is strongly recommended. With SolrCloud, it normally needs to wait until all servers are upgraded. In situations where it won't work at all without a reindex, upgrading SolrCloud can be very challenging. It's strongly recommended that you look over CHANGES.txt and compare the new example config/schema with the example from the old version, to see if there are any changes that you might want to incorporate into your own config. As with luceneMatchVersion, if you're running SolrCloud, those changes might need to wait until you're fully upgraded. Side note: When upgrading to a new minor version, config changes aren't normally required. They will usually be required when upgrading major versions, such as 3.x to 4.x. If you *do* have custom plugins that aren't included in the Solr download, you may have to recompile them for the new version, or wait for the vendor to create a new version before you upgrade. This is only the tip of the iceberg, but a lot of the rest of it depends greatly on your configurations. Thanks, Shawn
Re: Local Solr and Webserver-Solr act differently (and treated like or)
The default Solr stopwords.txt file is empty, so SOMEBODY created that non-empty stop words file. The StopFilterFactory token filter in the field type analyzer controls stop word processing. You can remove that step entirely, or different field types can reference different stop word files, or some field type analyzers can use the stop filter and some would not have it. This does mean that you would have to use different field types for fields that want different stop word processing. -- Jack Krupansky -Original Message- From: Stavros Delisavas Sent: Thursday, October 17, 2013 3:27 AM To: solr-user@lucene.apache.org Subject: Re: Local Solr and Webserver-Solr act differently (and treated like or) Thank you, I found the file with the stopwords and noticed that my local file is empty (comments only) and the one on my webserver has a big list of English stopwords. That seems to be the problem. I think in general it is a good idea to use stopwords for random searches, but it is not useful in my special case. Is there a way to (de)activate stopwords query-wise? Like, I would like to ignore stopwords when searching in titles, but I would like to use stopwords when users do a fulltext search on whole articles, etc. Thanks again, Stavros On 17.10.2013 09:13, Upayavira wrote: Stopwords are small words such as 'and', 'the' or 'is' that we might choose to exclude from our documents and queries because they are such common terms. Once you have stripped stop words from your above query, all that is left is the word wild, or so is being suggested. Somewhere in your config, close to solrconfig.xml, you will find a file called something like stopwords.txt. Compare these files between your two systems. Upayavira On Thu, Oct 17, 2013, at 07:18 AM, Stavros Delsiavas wrote: Unfortunately, I don't really know what stopwords are. I would like it to not ignore any words of my query. How/where can I change this stopword behaviour?
On 16.10.2013 23:45, Jack Krupansky wrote: So, the stopwords.txt file is different between the two systems - the first has stop words but the second does not. Did you expect stop words to be removed, or not? -- Jack Krupansky -Original Message- From: Stavros Delsiavas Sent: Wednesday, October 16, 2013 5:02 PM To: solr-user@lucene.apache.org Subject: Re: Local Solr and Webserver-Solr act differently (and treated like or) Okay, I understand. Here's the rawquerystring. It was at about line 3000: <lst name="debug"> <str name="rawquerystring">title:(into AND the AND wild*)</str> <str name="querystring">title:(into AND the AND wild*)</str> <str name="parsedquery">+title:wild*</str> <str name="parsedquery_toString">+title:wild*</str> At this place the debug output DOES differ from the one on my local system. But I don't understand why... This is the local debug output: <lst name="debug"> <str name="rawquerystring">title:(into AND the AND wild*)</str> <str name="querystring">title:(into AND the AND wild*)</str> <str name="parsedquery">+title:into +title:the +title:wild*</str> <str name="parsedquery_toString">+title:into +title:the +title:wild*</str> Why is that? Any ideas? On 16.10.2013 21:03, Shawn Heisey wrote: On 10/16/2013 4:46 AM, Stavros Delisavas wrote: My local solr gives me: http://pastebin.com/Q6d9dFmZ and my webserver this: http://pastebin.com/q87WEjVA I copied only the first few hundred lines (of more than 8000) because the webserver output was too big even for pastebin. On 16.10.2013 12:27, Erik Hatcher wrote: What does the debug output from debugQuery=true say between the two? What's really needed here is the first part of the debug section, which has rawquerystring, querystring, parsedquery, and parsedquery_toString. The info from your local solr has this part, but what you pasted from the webserver one didn't include those parts, because it's further down than the first few hundred lines. Thanks, Shawn
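A minimal sketch of the effect being discussed, assuming a StopFilter-style analysis step (the stopword set below is illustrative, not the actual contents of either stopwords.txt): with a populated list, "into" and "the" are dropped before the query is built, which is why the webserver's parsed query collapses to just +title:wild*.

```python
# Illustrative sketch of stop-word removal during query analysis.
# STOPWORDS is a guess at typical English entries, not the real file.

STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "into",
             "is", "it", "of", "on", "or", "the", "to"}

def analyze(terms, stopwords):
    """Drop stop words, as a StopFilter-style step would."""
    return [t for t in terms if t.lower() not in stopwords]

query_terms = ["into", "the", "wild*"]

# Webserver (non-empty stopwords.txt): only wild* survives.
print(analyze(query_terms, STOPWORDS))   # ['wild*']

# Local system (empty stopwords.txt): all terms survive.
print(analyze(query_terms, set()))       # ['into', 'the', 'wild*']
```

With stop words removed, the two AND-ed terms simply vanish from the parsed query, so both systems are behaving consistently with their own stopword files.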
Re: Solr errors
Even if I haven't tested it myself, you can use Tika; it is able to extract documents from zip archives and index them, but of course it depends on the file type in the archive. Regards, Roland. On Thu, Oct 17, 2013 at 2:36 PM, wonder a-wonde...@rambler.ru wrote: Does anybody know how to index files in zip archives?
Re: Solr errors
Thanks for the answer. Yes, Tika extracts the content but does not index it. Here is the Solr response: ... content: [ 9118_xmessengereu_v18ximpda.jar dimonvideo.ru.txt ], ... None of these files is in the index. Any ideas? On 17.10.2013 17:20, Roland Everaert wrote: Even if I haven't tested it myself, you can use Tika; it is able to extract documents from zip archives and index them, but of course it depends on the file type in the archive.
Re: Regarding Solr Cloud issue...
I am also trying with something like: java -Durl=http://domainname.com:1981/solr/web/update -Dtype=application/json -jar /solr4RA/example1/exampledocs/post.jar /root/Desktop/web/*.json but it is giving this error: 19:06:22 ERROR SolrCore org.apache.solr.common.SolrException: Unknown command: subDomain [12] org.apache.solr.common.SolrException: Unknown command: subDomain [12] at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:152) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:101) at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:65) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722) On Thu, Oct 17, 2013 at 6:31 PM, Chris christu...@gmail.com wrote: Wow thanks for all that, i just upgraded, linked my plugins it seems fine so far, but i have run into another issue while adding a document to the solr cloud it says - org.apache.solr.common.SolrException: Unknown document router '{name=compositeId}' in the clusterstate.json i can see - shard5:{ range:4ccc-7fff, state:active, replicas:{core_node4:{ state:active, base_url:http://64.251.14.47:1984/solr;, core:web_shard5_replica1, node_name:64.251.14.47:1984_solr, leader:true, maxShardsPerNode:2, router:{name:compositeId}, replicationFactor:1}, I am using this to add - CloudSolrServer 
solrCoreCloud = new CloudSolrServer(cloudURL); solrCoreCloud.setDefaultCollection("web"); UpdateResponse up = solrCoreCloud.addBean(resultItem); UpdateResponse upr = solrCoreCloud.commit(); Please advise. On Wed, Oct 16, 2013 at 9:49 PM, Shawn Heisey s...@elyograg.org wrote: On 10/16/2013 4:51 AM, Chris wrote: Also, is there any easy way of upgrading to 4.5 without having to change most of my plugins' configuration files? Upgrading is something that should be
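One common cause of the "Unknown command: subDomain" error, sketched here under the assumption that the *.json files contain bare document objects: when the body posted to /update is a top-level JSON object, its keys are interpreted as update commands ("add", "commit", "delete", ...), so a document field like subDomain is rejected as an unknown command. Posting an array of documents, or wrapping each document in an explicit "add" command, avoids that. The field names below are made up:

```python
import json

# A bare document object like {"subDomain": ...} posted to /update has
# its keys read as commands, producing "Unknown command: subDomain".
doc = {"id": "1", "subDomain": "example", "title": "hello"}

# Valid shape 1: a JSON array of documents.
payload_array = json.dumps([doc])

# Valid shape 2: an explicit "add" command wrapping the document.
payload_add = json.dumps({"add": {"doc": doc}})

print(payload_array)
print(payload_add)
```

So, one thing worth checking is whether each file under /root/Desktop/web/ starts with `[` (an array of docs) rather than `{` followed directly by field names.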
Re: Solr errors
I have just found this JIRA report, which could explain your problem: https://issues.apache.org/jira/browse/SOLR-2416 Regards, Roland. On Thu, Oct 17, 2013 at 3:30 PM, wonder a-wonde...@rambler.ru wrote: Thanks for the answer. Yes, Tika extracts the content but does not index it. Here is the Solr response ... content: [ 9118_xmessengereu_v18ximpda.jar dimonvideo.ru.txt ], ... None of these files is in the index. Any ideas? On 17.10.2013 17:20, Roland Everaert wrote: Even if I haven't tested it myself, you can use Tika; it is able to extract documents from zip archives and index them, but of course it depends on the file type in the archive.
Re: Timeout Errors while using Collections API
There was a reload bug in SolrCloud that was fixed in 4.4 - https://issues.apache.org/jira/browse/SOLR-4805 Mark On Oct 17, 2013, at 7:18 AM, Grzegorz Sobczyk gsobc...@gmail.com wrote: Sorry for the previous spam (something ate my message). I have the same problem, but with the reload action. ENV: - 3x Solr 4.2.1 with 4 cores each - ZK Before the error I have: - 14, 2013 5:25:36 AM CollectionsHandler handleReloadAction INFO: Reloading Collection : name=products&action=RELOAD - hundreds of (with the same timestamp) 14, 2013 5:25:36 AM DistributedQueue$LatchChildWatcher process INFO: Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged - 13 times (from 2013 5:25:39 to 5:25:45): -- 14, 2013 5:25:39 AM SolrDispatchFilter handleAdminRequest INFO: [admin] webapp=null path=/admin/cores params={action=STATUS&wt=ruby} status=0 QTime=2 -- 14, 2013 5:25:39 AM SolrDispatchFilter handleAdminRequest INFO: [admin] webapp=null path=/admin/cores params={action=STATUS&wt=ruby} status=0 QTime=1 -- 14, 2013 5:25:39 AM SolrCore execute INFO: [forum] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2 -- 14, 2013 5:25:39 AM SolrCore execute INFO: [knowledge] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2 -- 14, 2013 5:25:39 AM SolrCore execute INFO: [products] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2 -- 14, 2013 5:25:39 AM SolrCore execute INFO: [shops] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=1 - 14, 2013 5:26:21 AM SolrCore execute INFO: [products] webapp=/solr path=/select/ params={q=solrpingquery} hits=0 status=0 QTime=0 - 14, 2013 5:26:36 AM DistributedQueue$LatchChildWatcher process INFO: Watcher fired on path: /overseer/collection-queue-work/qnr-000806 state: SyncConnected type NodeDeleted - 14, 2013 5:26:36 AM SolrException log SEVERE: org.apache.solr.common.SolrException: reloadcollection the collection time out:60s at
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:162) at org.apache.solr.handler.admin.CollectionsHandler.handleReloadAction(CollectionsHandler.java:184) at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:120) What are the possibilities for such behaviour? When is this error thrown? Does anybody have the same issue? On 17 October 2013 13:08, Grzegorz Sobczyk gsobc...@gmail.com wrote: On 16 October 2013 11:48, RadhaJayalakshmi rlakshminaraya...@inautix.co.in wrote: Hi, My setup is Zookeeper ensemble - running with 3 nodes Tomcats - 9 Tomcat instances are brought up by registering with ZooKeeper. Steps: 1) I uploaded the Solr configuration (db_data_config, solrconfig, schema XMLs) into ZooKeeper 2) Now, I am trying to create a collection with the Collections API like below: http://miadevuser001.albridge.com:7021/solr/admin/collections?action=CREATE&name=Schwab_InvACC_Coll&numShards=1&replicationFactor=2&createNodeSet=localhost:7034_solr,localhost:7036_solr&collection.configName=InvestorAccountDomainConfig Now, when I execute this command, I am getting the following error: <response><lst name="responseHeader"><int name="status">500</int><int name="QTime">60015</int></lst><lst name="error"><str name="msg">createcollection the collection time out:60s</str><str name="trace">org.apache.solr.common.SolrException: createcollection the collection time out:60s at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:175) at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:156) at org.apache.solr.handler.admin.CollectionsHandler.handleCreateAction(CollectionsHandler.java:290) at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:112) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611) at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at
Re: limiting deep pagination
Yes, right now this constraint could be implemented in either the web app or Solr. I see now that many of the QTimes on these queries are 10 ms (probably due to caching), so I'm a bit less concerned. On Wed, Oct 16, 2013 at 2:13 AM, Furkan KAMACI furkankam...@gmail.com wrote: I just wonder: don't you implement a custom API that interacts with Solr and limits such kinds of requests? (I know that you are asking about how to do that in Solr, but I handle such situations in my custom search APIs and want to learn what fellows do.) On Wednesday, October 9, 2013, Michael Sokolov msoko...@safaribooksonline.com wrote: On 10/8/13 6:51 PM, Peter Keegan wrote: Is there a way to configure Solr 'defaults/appends/invariants' such that the product of the 'start' and 'rows' parameters doesn't exceed a given value? This would be to prevent deep pagination. Or would this require a custom requestHandler? Peter Just wondering -- isn't it the sum that you should be concerned about rather than the product? Actually, I think what we usually do is limit both independently, with slightly different concerns, since e.g. start=1, rows=1000 causes memory problems if you have large fields in your results, whereas start=1000, rows=1 may not actually be a problem. -Mike
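A minimal sketch of the web-app-side guard discussed above, capping start and rows independently rather than by their product; the thresholds below are placeholders that would be tuned to the index:

```python
# Sketch of a front-end guard against deep pagination. The limits are
# assumptions for illustration, not recommended values.

MAX_START = 10000   # assumed cap on paging depth
MAX_ROWS = 200      # assumed cap on page size

def check_pagination(start, rows):
    """Return True if the request is within both independent limits."""
    return 0 <= start <= MAX_START and 0 <= rows <= MAX_ROWS

print(check_pagination(start=1, rows=1000))    # False: page too large
print(check_pagination(start=1000, rows=10))   # True: deep but small page
```

Checking the two parameters separately matches Mike's point: a huge rows value is risky on its own, while a large start with a small rows often is not.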
Re: ExtractRequestHandler, skipping errors
Hi Roland, (13/10/17 20:44), Roland Everaert wrote: Hi, I helped a customer deploy Solr+ManifoldCF and everything is going quite smoothly, but every time Solr raises an exception, the ManifoldCF job feeding Solr aborts. I would like to know if it is possible to configure the ExtractRequestHandler to ignore errors, like it seems to be possible with the DataImportHandler and entity processors. I know that it is possible to configure the ExtractRequestHandler to ignore Tika exceptions (we already do that), but the errors that now stop the MCF jobs are generated by Solr itself. While it would be interesting to have such an option in Solr, I plan to post to the ManifoldCF mailing list anyway, to know if it is possible to configure ManifoldCF to be less picky about Solr errors. The ignoreTikaException flag might help you? https://issues.apache.org/jira/browse/SOLR-2480 koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: SolrCloud Performance Issue
Thanks Primoz, I was suspecting that too. But then, it's hard to imagine that the query cache alone is contributing to the big performance hit. The same setting applies to the old configuration, and it works pretty well even with the query cache's low hit rate. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Performance-Issue-tp4095971p4096123.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Change config set for a collection
On 10/17/2013 2:36 AM, michael.boom wrote: The question also asked some 10 months ago in http://lucene.472066.n3.nabble.com/SolrCloud-4-1-change-config-set-for-a-collection-td4037456.html, and then the answer was negative, but here it goes again, maybe now it's different. Is it possible to change the config set of a collection using the Collection API to another one (stored in zookeeper)? If not, is it possible to do it using zkCli ? Also how can somebody check which config set a collection is using ? Thanks! The zkcli command linkconfig should take care of that. You'd need to reload the collection after making the change. If you're using a version prior to 4.4, reloading doesn't work, you need to restart Solr completely. You can see what config a collection is using with the Cloud-Tree section of the admin UI. Open /collections and click on the collection. At the bottom of the right-hand window, it has a small JSON string with configName in it. I don't know of a way to easily get this information from Solr with a program. If your program is Java, you could very likely grab the zookeeper object from CloudSolrServer and find it that way, but I have no idea how to write that code. Thanks, Shawn
Re: Timeout Errors while using Collections API
Thanks, I'll try to upgrade. On 17 October 2013 15:55, Mark Miller markrmil...@gmail.com wrote: There was a reload bug in SolrCloud that was fixed in 4.4 - https://issues.apache.org/jira/browse/SOLR-4805 Mark On Oct 17, 2013, at 7:18 AM, Grzegorz Sobczyk gsobc...@gmail.com wrote: Sorry for the previous spam (something ate my message). I have the same problem, but with the reload action. ENV: - 3x Solr 4.2.1 with 4 cores each - ZK Before the error I have: - 14, 2013 5:25:36 AM CollectionsHandler handleReloadAction INFO: Reloading Collection : name=products&action=RELOAD - hundreds of (with the same timestamp) 14, 2013 5:25:36 AM DistributedQueue$LatchChildWatcher process INFO: Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged - 13 times (from 2013 5:25:39 to 5:25:45): -- 14, 2013 5:25:39 AM SolrDispatchFilter handleAdminRequest INFO: [admin] webapp=null path=/admin/cores params={action=STATUS&wt=ruby} status=0 QTime=2 -- 14, 2013 5:25:39 AM SolrDispatchFilter handleAdminRequest INFO: [admin] webapp=null path=/admin/cores params={action=STATUS&wt=ruby} status=0 QTime=1 -- 14, 2013 5:25:39 AM SolrCore execute INFO: [forum] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2 -- 14, 2013 5:25:39 AM SolrCore execute INFO: [knowledge] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2 -- 14, 2013 5:25:39 AM SolrCore execute INFO: [products] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2 -- 14, 2013 5:25:39 AM SolrCore execute INFO: [shops] webapp=/solr path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=1 - 14, 2013 5:26:21 AM SolrCore execute INFO: [products] webapp=/solr path=/select/ params={q=solrpingquery} hits=0 status=0 QTime=0 - 14, 2013 5:26:36 AM DistributedQueue$LatchChildWatcher process INFO: Watcher fired on path: /overseer/collection-queue-work/qnr-000806 state: SyncConnected type NodeDeleted - 14, 2013 5:26:36 AM SolrException log SEVERE:
org.apache.solr.common.SolrException: reloadcollection the collection time out:60s at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:162) at org.apache.solr.handler.admin.CollectionsHandler.handleReloadAction(CollectionsHandler.java:184) at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:120) What are the possibilities for such behaviour? When is this error thrown? Does anybody have the same issue? On 17 October 2013 13:08, Grzegorz Sobczyk gsobc...@gmail.com wrote: On 16 October 2013 11:48, RadhaJayalakshmi rlakshminaraya...@inautix.co.in wrote: Hi, My setup is Zookeeper ensemble - running with 3 nodes Tomcats - 9 Tomcat instances are brought up by registering with ZooKeeper. Steps: 1) I uploaded the Solr configuration (db_data_config, solrconfig, schema XMLs) into ZooKeeper 2) Now, I am trying to create a collection with the Collections API like below: http://miadevuser001.albridge.com:7021/solr/admin/collections?action=CREATE&name=Schwab_InvACC_Coll&numShards=1&replicationFactor=2&createNodeSet=localhost:7034_solr,localhost:7036_solr&collection.configName=InvestorAccountDomainConfig Now, when I execute this command, I am getting the following error: <response><lst name="responseHeader"><int name="status">500</int><int name="QTime">60015</int></lst><lst name="error"><str name="msg">createcollection the collection time out:60s</str><str name="trace">org.apache.solr.common.SolrException: createcollection the collection time out:60s at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:175) at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:156) at org.apache.solr.handler.admin.CollectionsHandler.handleCreateAction(CollectionsHandler.java:290) at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:112) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at
Re: Change config set for a collection
Thank you, Shawn! linkconfig - that's exactly what i was looking for! -- View this message in context: http://lucene.472066.n3.nabble.com/Change-config-set-for-a-collection-tp4096032p4096134.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Change config set for a collection
But if you're working with multiple configs in zookeeper, be aware that 4.5 currently has an issue creating multiple collections in a cloud that has multiple configs. It's targeted to be fixed whenever 4.5.1 comes out. https://issues.apache.org/jira/browse/SOLR-5306 -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, October 17, 2013 10:24 AM To: solr-user@lucene.apache.org Subject: Re: Change config set for a collection On 10/17/2013 2:36 AM, michael.boom wrote: The question was also asked some 10 months ago in http://lucene.472066.n3.nabble.com/SolrCloud-4-1-change-config-set-for-a-collection-td4037456.html, and then the answer was negative, but here it goes again, maybe now it's different. Is it possible to change the config set of a collection using the Collection API to another one (stored in zookeeper)? If not, is it possible to do it using zkCli? Also, how can somebody check which config set a collection is using? Thanks! The zkcli command linkconfig should take care of that. You'd need to reload the collection after making the change. If you're using a version prior to 4.4, reloading doesn't work, you need to restart Solr completely. You can see what config a collection is using with the Cloud-Tree section of the admin UI. Open /collections and click on the collection. At the bottom of the right-hand window, it has a small JSON string with configName in it. I don't know of a way to easily get this information from Solr with a program. If your program is Java, you could very likely grab the zookeeper object from CloudSolrServer and find it that way, but I have no idea how to write that code. Thanks, Shawn
Chegg is looking for a search engineer
I work at Chegg.com and I really like it, but we have more search work than I can do by myself, so we are hiring a senior software engineer for search. Most of our search services are on Solr. http://www.chegg.com/jobs/listings/?jvi=oAQGXfwN,Job If you'd like to know a lot more about Chegg's business, you can read the S1 that we filed recently in preparation for an IPO. wunder -- Walter Underwood wun...@wunderwood.org Search Guy Chegg.com
RE: Change config set for a collection
Thanks Garth! Yes, indeed, I know that issue. I had set up my SolrCloud using 4.5.0 and then encountered this problem, so I rolled back to 4.4.0 - Thanks, Michael
Re: Switching indexes
OK, super confused now. http://index1:8080/solr/admin/cores?action=CREATE&name=test2&collection=test2&numShards=1&replicationFactor=3 Nets me this: <response><lst name="responseHeader"><int name="status">400</int><int name="QTime">15007</int></lst><lst name="error"><str name="msg">Error CREATEing SolrCore 'test2': Could not find configName for collection test2 found:[xxx, xxx, , x, xx]</str><int name="code">400</int></lst></response> For that node (test2), in my solr data directory, I have a folder with the conf files and an existing data dir (copied the index from another location). Right now it seems like the only way that I can add in a collection is to load the configs into zookeeper, stop tomcat, add it to the solr.xml file, and restart tomcat. Is there a primer that I'm missing for how to do this? Thanks. -- Chris On Wed, Oct 16, 2013 at 2:59 PM, Christopher Gross cogr...@gmail.com wrote: Thanks Shawn, the explanations help bring me forward to the SolrCloud mentality. So it sounds like going forward that I should have a more complicated name (ex: coll1-20131015) aliased to coll1, to make it easier to switch in the future. Now, if I already have an index (copied from one location to another), it sounds like I should just remove my existing (bad/old data) coll1, create the replicated one (calling it coll1-date), then alias coll1 to that one. This type of information would have been awesome to know before I got started, but I can make do with what I've got going now. Thanks again! -- Chris On Wed, Oct 16, 2013 at 2:40 PM, Shawn Heisey s...@elyograg.org wrote: On 10/16/2013 11:51 AM, Christopher Gross wrote: Ok, so I think I was confusing the terminology (still in a 3.X mindset I guess.) From the Cloud-Tree, I do see that I have collections for what I was calling core1, core2, etc. So, to redo the above, Servers: index1, index2, index3 Collections: (on each) coll1, coll2 Collection (core?) on index1: coll1new Each Collection has 1 shard (too small to make sharding worthwhile). So should I run something like this: http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=coll1&collections=coll1new Or will I need coll1new to be on each of the index1, index2 and index3 instances of Solr? I don't think you can create an alias if a collection already exists with that name - so having a collection named core1 means you wouldn't want an alias named core1. I could be wrong, but just to keep things clean, I wouldn't recommend it, even if it's possible. That CREATEALIAS command will only work if coll1new shows up in /collections and shows green on the cloud graph. If it does, and you're using an alias name that doesn't already exist as a collection, then you're good. Whether coll1new is living on one server, two servers, or all three servers doesn't matter for CREATEALIAS, or for most other collection-related topics. Any query or update can be sent to any server in the cloud and it will be routed to the correct place according to the clusterstate. Where things live and how many replicas there are *does* matter for a discussion about redundancy. Generally speaking, you're going to want your shards to have at least two replicas, so that if a Solr instance goes down, or is taken down for maintenance, your cloud remains fully operational. In your situation, you probably want three replicas - so each collection lives on all three servers. So my general advice: Decide what name you want your application to use, make sure none of your existing collections are using that name, and set up an alias with that name pointing to whichever collection is current. Then change your application configurations or code to point at the alias instead of directly at the collection. When you want to do your reindex, first create a new collection using the collections API. Index to that new collection. When it's ready to go, use CREATEALIAS to update the alias, and your application will start using the new index. Thanks, Shawn
Re: Switching indexes
Also, when I make an alias: http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=test1-alias&collections=test1 I get a pretty useless response: <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst></response> So I'm not sure if it is made. I tried going to: http://index1:8080/solr/test1-alias/select?q=*:* but that didn't work. How do I use an alias when it gets made? -- Chris On Thu, Oct 17, 2013 at 2:51 PM, Christopher Gross cogr...@gmail.com wrote: OK, super confused now. http://index1:8080/solr/admin/cores?action=CREATE&name=test2&collection=test2&numShards=1&replicationFactor=3 Nets me this: <response><lst name="responseHeader"><int name="status">400</int><int name="QTime">15007</int></lst><lst name="error"><str name="msg">Error CREATEing SolrCore 'test2': Could not find configName for collection test2 found:[xxx, xxx, , x, xx]</str><int name="code">400</int></lst></response> For that node (test2), in my solr data directory, I have a folder with the conf files and an existing data dir (copied the index from another location). Right now it seems like the only way that I can add in a collection is to load the configs into zookeeper, stop tomcat, add it to the solr.xml file, and restart tomcat. Is there a primer that I'm missing for how to do this? Thanks. -- Chris On Wed, Oct 16, 2013 at 2:59 PM, Christopher Gross cogr...@gmail.com wrote: Thanks Shawn, the explanations help bring me forward to the SolrCloud mentality. So it sounds like going forward that I should have a more complicated name (ex: coll1-20131015) aliased to coll1, to make it easier to switch in the future. Now, if I already have an index (copied from one location to another), it sounds like I should just remove my existing (bad/old data) coll1, create the replicated one (calling it coll1-date), then alias coll1 to that one. This type of information would have been awesome to know before I got started, but I can make do with what I've got going now. Thanks again!
-- Chris On Wed, Oct 16, 2013 at 2:40 PM, Shawn Heisey s...@elyograg.org wrote: On 10/16/2013 11:51 AM, Christopher Gross wrote: Ok, so I think I was confusing the terminology (still in a 3.X mindset I guess.) From the Cloud-Tree, I do see that I have collections for what I was calling core1, core2, etc. So, to redo the above, Servers: index1, index2, index3 Collections: (on each) coll1, coll2 Collection (core?) on index1: coll1new Each Collection has 1 shard (too small to make sharding worthwhile). So should I run something like this: http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=coll1&collections=coll1new Or will I need coll1new to be on each of the index1, index2 and index3 instances of Solr? I don't think you can create an alias if a collection already exists with that name - so having a collection named core1 means you wouldn't want an alias named core1. I could be wrong, but just to keep things clean, I wouldn't recommend it, even if it's possible. That CREATEALIAS command will only work if coll1new shows up in /collections and shows green on the cloud graph. If it does, and you're using an alias name that doesn't already exist as a collection, then you're good. Whether coll1new is living on one server, two servers, or all three servers doesn't matter for CREATEALIAS, or for most other collection-related topics. Any query or update can be sent to any server in the cloud and it will be routed to the correct place according to the clusterstate. Where things live and how many replicas there are *does* matter for a discussion about redundancy. Generally speaking, you're going to want your shards to have at least two replicas, so that if a Solr instance goes down, or is taken down for maintenance, your cloud remains fully operational. In your situation, you probably want three replicas - so each collection lives on all three servers.
So my general advice: Decide what name you want your application to use, make sure none of your existing collections are using that name, and set up an alias with that name pointing to whichever collection is current. Then change your application configurations or code to point at the alias instead of directly at the collection. When you want to do your reindex, first create a new collection using the collections API. Index to that new collection. When it's ready to go, use CREATEALIAS to update the alias, and your application will start using the new index. Thanks, Shawn
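Put together, the reindex-and-swap cycle described above might look like the sketch below. The collection name, config name, and three-replica layout are assumptions taken from this thread, and the URLs are printed rather than executed:

```shell
SOLR="http://index1:8080/solr"
NEW="coll1-20131017"   # date-stamped collection, as suggested above
ALIAS="coll1"          # the stable name the application uses

# 1) Create the new collection (collection.configName must name a
#    config set already uploaded to ZooKeeper):
CREATE="$SOLR/admin/collections?action=CREATE&name=$NEW&numShards=1&replicationFactor=3&collection.configName=myconf"

# 2) ...reindex everything into $NEW...

# 3) Flip the alias; the application starts reading the new index at once:
SWAP="$SOLR/admin/collections?action=CREATEALIAS&name=$ALIAS&collections=$NEW"

echo "$CREATE"
echo "$SWAP"
```

Because CREATEALIAS repoints an existing alias atomically, the application never sees a half-built index - it reads the old collection until the moment of the swap.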
RE: Switching indexes
Go to the admin screen for Cloud/Tree, and then click the node for aliases.json. To the lower right, you should see something like: {"collection":{"AdWorksQuery":"AdWorks"}} Or access the Zookeeper instance, and do a 'get /aliases.json'. -Original Message- From: Christopher Gross [mailto:cogr...@gmail.com] Sent: Thursday, October 17, 2013 2:40 PM To: solr-user Subject: Re: Switching indexes Also, when I make an alias: http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=test1-alias&collections=test1 I get a pretty useless response: <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst></response> So I'm not sure if it is made. I tried going to: http://index1:8080/solr/test1-alias/select?q=*:* but that didn't work. How do I use an alias when it gets made? -- Chris On Thu, Oct 17, 2013 at 2:51 PM, Christopher Gross cogr...@gmail.com wrote: OK, super confused now. http://index1:8080/solr/admin/cores?action=CREATE&name=test2&collection=test2&numShards=1&replicationFactor=3 Nets me this: <response><lst name="responseHeader"><int name="status">400</int><int name="QTime">15007</int></lst><lst name="error"><str name="msg">Error CREATEing SolrCore 'test2': Could not find configName for collection test2 found:[xxx, xxx, , x, xx]</str><int name="code">400</int></lst></response> For that node (test2), in my solr data directory, I have a folder with the conf files and an existing data dir (copied the index from another location). Right now it seems like the only way that I can add in a collection is to load the configs into zookeeper, stop tomcat, add it to the solr.xml file, and restart tomcat. Is there a primer that I'm missing for how to do this? Thanks. -- Chris On Wed, Oct 16, 2013 at 2:59 PM, Christopher Gross cogr...@gmail.com wrote: Thanks Shawn, the explanations help bring me forward to the SolrCloud mentality. So it sounds like going forward that I should have a more complicated name (ex: coll1-20131015) aliased to coll1, to make it easier to switch in the future.
Now, if I already have an index (copied from one location to another), it sounds like I should just remove my existing (bad/old data) coll1, create the replicated one (calling it coll1-date), then alias coll1 to that one. This type of information would have been awesome to know before I got started, but I can make do with what I've got going now. Thanks again! -- Chris On Wed, Oct 16, 2013 at 2:40 PM, Shawn Heisey s...@elyograg.org wrote: On 10/16/2013 11:51 AM, Christopher Gross wrote: Ok, so I think I was confusing the terminology (still in a 3.X mindset I guess.) From the Cloud-Tree, I do see that I have collections for what I was calling core1, core2, etc. So, to redo the above, Servers: index1, index2, index3 Collections: (on each) coll1, coll2 Collection (core?) on index1: coll1new Each Collection has 1 shard (too small to make sharding worthwhile). So should I run something like this: http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=coll1&collections=coll1new Or will I need coll1new to be on each of the index1, index2 and index3 instances of Solr? I don't think you can create an alias if a collection already exists with that name - so having a collection named core1 means you wouldn't want an alias named core1. I could be wrong, but just to keep things clean, I wouldn't recommend it, even if it's possible. That CREATEALIAS command will only work if coll1new shows up in /collections and shows green on the cloud graph. If it does, and you're using an alias name that doesn't already exist as a collection, then you're good. Whether coll1new is living on one server, two servers, or all three servers doesn't matter for CREATEALIAS, or for most other collection-related topics. Any query or update can be sent to any server in the cloud and it will be routed to the correct place according to the clusterstate. Where things live and how many replicas there are *does* matter for a discussion about redundancy.
Generally speaking, you're going to want your shards to have at least two replicas, so that if a Solr instance goes down, or is taken down for maintenance, your cloud remains fully operational. In your situation, you probably want three replicas - so each collection lives on all three servers. So my general advice: Decide what name you want your application to use, make sure none of your existing collections are using that name, and set up an alias with that name pointing to whichever collection is current. Then change your application configurations or code to point at the alias instead of directly at the collection. When you want to do your reindex, first create a new collection using the collections API. Index to that new collection. When it's ready to go, use CREATEALIAS to update the alias, and your application will start using the new index.
Re: Switching indexes
I can't find it in the Admin-Cloud-Tree part of the UI. Trying to get the file: [zk: localhost:2181(CONNECTED) 0] get /aliases.json Node does not exist: /aliases.json So it didn't stick -- I'm guessing. I don't see an error message regarding the alias in my logs either. Anywhere else I should look? -- Chris On Thu, Oct 17, 2013 at 3:50 PM, Garth Grimm garthgr...@averyranchconsulting.com wrote: Go to the admin screen for Cloud/Tree, and then click the node for aliases.json. To the lower right, you should see something like: {"collection":{"AdWorksQuery":"AdWorks"}} Or access the Zookeeper instance, and do a 'get /aliases.json'. -Original Message- From: Christopher Gross [mailto:cogr...@gmail.com] Sent: Thursday, October 17, 2013 2:40 PM To: solr-user Subject: Re: Switching indexes Also, when I make an alias: http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=test1-alias&collections=test1 I get a pretty useless response: <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst></response> So I'm not sure if it is made. I tried going to: http://index1:8080/solr/test1-alias/select?q=*:* but that didn't work. How do I use an alias when it gets made? -- Chris On Thu, Oct 17, 2013 at 2:51 PM, Christopher Gross cogr...@gmail.com wrote: OK, super confused now. http://index1:8080/solr/admin/cores?action=CREATE&name=test2&collection=test2&numShards=1&replicationFactor=3 Nets me this: <response><lst name="responseHeader"><int name="status">400</int><int name="QTime">15007</int></lst><lst name="error"><str name="msg">Error CREATEing SolrCore 'test2': Could not find configName for collection test2 found:[xxx, xxx, , x, xx]</str><int name="code">400</int></lst></response> For that node (test2), in my solr data directory, I have a folder with the conf files and an existing data dir (copied the index from another location). Right now it seems like the only way that I can add in a collection is to load the configs into zookeeper, stop tomcat, add it to the solr.xml file, and restart tomcat.
Is there a primer that I'm missing for how to do this? Thanks. -- Chris On Wed, Oct 16, 2013 at 2:59 PM, Christopher Gross cogr...@gmail.com wrote: Thanks Shawn, the explanations help bring me forward to the SolrCloud mentality. So it sounds like going forward that I should have a more complicated name (ex: coll1-20131015) aliased to coll1, to make it easier to switch in the future. Now, if I already have an index (copied from one location to another), it sounds like I should just remove my existing (bad/old data) coll1, create the replicated one (calling it coll1-date), then alias coll1 to that one. This type of information would have been awesome to know before I got started, but I can make do with what I've got going now. Thanks again! -- Chris On Wed, Oct 16, 2013 at 2:40 PM, Shawn Heisey s...@elyograg.org wrote: On 10/16/2013 11:51 AM, Christopher Gross wrote: Ok, so I think I was confusing the terminology (still in a 3.X mindset I guess.) From the Cloud-Tree, I do see that I have collections for what I was calling core1, core2, etc. So, to redo the above, Servers: index1, index2, index3 Collections: (on each) coll1, coll2 Collection (core?) on index1: coll1new Each Collection has 1 shard (too small to make sharding worthwhile). So should I run something like this: http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=coll1&collections=coll1new Or will I need coll1new to be on each of the index1, index2 and index3 instances of Solr? I don't think you can create an alias if a collection already exists with that name - so having a collection named core1 means you wouldn't want an alias named core1. I could be wrong, but just to keep things clean, I wouldn't recommend it, even if it's possible. That CREATEALIAS command will only work if coll1new shows up in /collections and shows green on the cloud graph. If it does, and you're using an alias name that doesn't already exist as a collection, then you're good.
Whether coll1new is living on one server, two servers, or all three servers doesn't matter for CREATEALIAS, or for most other collection-related topics. Any query or update can be sent to any server in the cloud and it will be routed to the correct place according to the clusterstate. Where things live and how many replicas there are *does* matter for a discussion about redundancy. Generally speaking, you're going to want your shards to have at least two replicas, so that if a Solr instance goes down, or is taken down for maintenance, your cloud remains fully operational. In your situation, you probably want three replicas - so each collection lives on all three servers. So my general advice: Decide what name you want your application to use, make sure none of your existing collections are using that name, and set up an alias with that name pointing to whichever collection is current.
Check if dynamic columns exists and query else ignore
I'm trying to do this: if (US_offers_i exists): fq=US_offers_i:[1 TO *] else: fq=offers_count:[1 TO *] Where: US_offers_i is a dynamic field containing an int; offers_count is a status field containing an int. I have tried this so far, but it doesn't work: http://solr_server/solr/col1/select? q=iphone+5s& fq=if(exist(US_offers_i),US_offers_i:[1 TO *], offers_count:[1 TO *]) Also, is there a heavy performance penalty for this condition? I am planning to use this for all my queries. -- Thanks, -Utkarsh
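An fq cannot branch per document the way the pseudocode above does, but the same effect can usually be had with plain boolean logic, since a document "has" the dynamic field exactly when it matches a full range query on it. A sketch, using the field names from the question and untested against a live index (the query string is built and printed; the curl line is left commented):

```shell
# "if US_offers_i exists, require US_offers_i >= 1; else require offers_count >= 1"
FQ='US_offers_i:[1 TO *] OR (*:* -US_offers_i:[* TO *] AND offers_count:[1 TO *])'
echo "$FQ"
# curl "http://solr_server/solr/col1/select" --data-urlencode 'q=iphone 5s' --data-urlencode "fq=$FQ"
```

The `*:*` guard is needed because a purely negative clause inside parentheses matches nothing on its own. Since this is a single fq, it should land in the filterCache once and be cheap on repeated queries after the first.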
Re: Switching indexes
"load the configs into zookeeper" - Yes. "stop tomcat, add it to the solr.xml file, and restart tomcat" - To your CREATE URL, add the parameter collection.configName=blah http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Thu, Oct 17, 2013 at 2:51 PM, Christopher Gross cogr...@gmail.com wrote: OK, super confused now. http://index1:8080/solr/admin/cores?action=CREATE&name=test2&collection=test2&numShards=1&replicationFactor=3 Nets me this: <response><lst name="responseHeader"><int name="status">400</int><int name="QTime">15007</int></lst><lst name="error"><str name="msg">Error CREATEing SolrCore 'test2': Could not find configName for collection test2 found:[xxx, xxx, , x, xx]</str><int name="code">400</int></lst></response> For that node (test2), in my solr data directory, I have a folder with the conf files and an existing data dir (copied the index from another location). Right now it seems like the only way that I can add in a collection is to load the configs into zookeeper, stop tomcat, add it to the solr.xml file, and restart tomcat. Is there a primer that I'm missing for how to do this? Thanks. -- Chris On Wed, Oct 16, 2013 at 2:59 PM, Christopher Gross cogr...@gmail.com wrote: Thanks Shawn, the explanations help bring me forward to the SolrCloud mentality. So it sounds like going forward that I should have a more complicated name (ex: coll1-20131015) aliased to coll1, to make it easier to switch in the future.
Now, if I already have an index (copied from one location to another), it sounds like I should just remove my existing (bad/old data) coll1, create the replicated one (calling it coll1-date), then alias coll1 to that one. This type of information would have been awesome to know before I got started, but I can make do with what I've got going now. Thanks again! -- Chris On Wed, Oct 16, 2013 at 2:40 PM, Shawn Heisey s...@elyograg.org wrote: On 10/16/2013 11:51 AM, Christopher Gross wrote: Ok, so I think I was confusing the terminology (still in a 3.X mindset I guess.) From the Cloud-Tree, I do see that I have collections for what I was calling core1, core2, etc. So, to redo the above, Servers: index1, index2, index3 Collections: (on each) coll1, coll2 Collection (core?) on index1: coll1new Each Collection has 1 shard (too small to make sharding worthwhile). So should I run something like this: http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=coll1&collections=coll1new Or will I need coll1new to be on each of the index1, index2 and index3 instances of Solr? I don't think you can create an alias if a collection already exists with that name - so having a collection named core1 means you wouldn't want an alias named core1. I could be wrong, but just to keep things clean, I wouldn't recommend it, even if it's possible. That CREATEALIAS command will only work if coll1new shows up in /collections and shows green on the cloud graph. If it does, and you're using an alias name that doesn't already exist as a collection, then you're good. Whether coll1new is living on one server, two servers, or all three servers doesn't matter for CREATEALIAS, or for most other collection-related topics. Any query or update can be sent to any server in the cloud and it will be routed to the correct place according to the clusterstate. Where things live and how many replicas there are *does* matter for a discussion about redundancy.
Generally speaking, you're going to want your shards to have at least two replicas, so that if a Solr instance goes down, or is taken down for maintenance, your cloud remains fully operational. In your situation, you probably want three replicas - so each collection lives on all three servers. So my general advice: Decide what name you want your application to use, make sure none of your existing collections are using that name, and set up an alias with that name pointing to whichever collection is current. Then change your application configurations or code to point at the alias instead of directly at the collection. When you want to do your reindex, first create a new collection using the collections API. Index to that new collection. When it's ready to go, use CREATEALIAS to update the alias, and your application will start using the new index. Thanks, Shawn
Re: Skipping caches on a /select
Thanks Yonik, Does cache=false apply to all caches? The docs make it sound like it is for filterCache only, but I could be misunderstanding. When I force a commit and perform a /select query many times with cache=false, I notice my query still gets cached, my guess is in the queryResultCache. At first the query takes 500ms+, then all subsequent requests take 0-1ms. I'll confirm this queryResultCache assumption today. Cheers, Tim On 16/10/13 06:33 PM, Yonik Seeley wrote: On Wed, Oct 16, 2013 at 6:18 PM, Tim Vaillancourt t...@elementspace.com wrote: I am debugging some /select queries on my Solr tier and would like to see if there is a way to tell Solr to skip the caches on a given /select query if it happens to ALREADY be in the cache. Live queries are being inserted and read from the caches, but I want my debug queries to bypass the cache entirely. I do know about the cache=false param (that causes the results of a select to not be INSERTED in to the cache), but what I am looking for instead is a way to tell Solr to not read the cache at all, even if there actually is a cached result for my query. Yeah, cache=false for q or fq should already not use the cache at all (read or write). -Yonik
Re: Skipping caches on a /select
There isn't a global cache=false... it's a local param that can be applied to any fq or q parameter independently. -Yonik On Thu, Oct 17, 2013 at 4:39 PM, Tim Vaillancourt t...@elementspace.com wrote: Thanks Yonik, Does cache=false apply to all caches? The docs make it sound like it is for filterCache only, but I could be misunderstanding. When I force a commit and perform a /select a query many times with cache=false, I notice my query gets cached still, my guess is in the queryResultCache. At first the query takes 500ms+, then all subsequent requests take 0-1ms. I'll confirm this queryResultCache assumption today. Cheers, Tim On 16/10/13 06:33 PM, Yonik Seeley wrote: On Wed, Oct 16, 2013 at 6:18 PM, Tim Vaillancourtt...@elementspace.com wrote: I am debugging some /select queries on my Solr tier and would like to see if there is a way to tell Solr to skip the caches on a given /select query if it happens to ALREADY be in the cache. Live queries are being inserted and read from the caches, but I want my debug queries to bypass the cache entirely. I do know about the cache=false param (that causes the results of a select to not be INSERTED in to the cache), but what I am looking for instead is a way to tell Solr to not read the cache at all, even if there actually is a cached result for my query. Yeah, cache=false for q or fq should already not use the cache at all (read or write). -Yonik
Re: Skipping caches on a /select
: Does cache=false apply to all caches? The docs make it sound like it is for : filterCache only, but I could be misunderstanding. it's per *query* -- not per cache, or per request... /select?q={!cache=true}foo&fq={!cache=false}bar&fq={!cache=true}baz ...should cause 1 lookup/insert in the filterCache (baz) and 1 lookup/insert into the queryResultCache (for the main query with its associated filters + pagination) -Hoss
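As a concrete request for the debugging use case in this thread, the per-query flags could be attached like this (host and core name are placeholders; the params are built and printed, with the curl left commented):

```shell
CORE="http://localhost:8983/solr/collection1"
Q='{!cache=false}foo'     # skip caching for the main query
FQ1='{!cache=false}bar'   # skip the filterCache for this filter
FQ2='{!cache=true}baz'    # this filter is cached as usual

echo "q=$Q"
echo "fq=$FQ1"
echo "fq=$FQ2"
# curl "$CORE/select" --data-urlencode "q=$Q" --data-urlencode "fq=$FQ1" --data-urlencode "fq=$FQ2"
```

Each q/fq carries its own cache flag as a local param; there is no request-wide switch.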
Re: Skipping caches on a /select
Awesome, this makes a lot of sense now. Thanks a lot guys. Currently the only mention of this setting in the docs is under filterQuery on the "SolrCaching" page as: " Solr3.4 Adding the localParam flag of {!cache=false} to a query will prevent the filterCache from being consulted for that query. " I will update the docs sometime soon to reflect that this can apply to any query (q or fq). Cheers, Tim On 17/10/13 01:44 PM, Chris Hostetter wrote: : Does "cache=false" apply to all caches? The docs make it sound like it is for : filterCache only, but I could be misunderstanding. it's per *query* -- not per cache, or per request... /select?q={!cache=true}foo&fq={!cache=false}bar&fq={!cache=true}baz ...should cause 1 lookup/insert in the filterCache (baz) and 1 lookup/insert into the queryResultCache (for the main query with its associated filters + pagination) -Hoss
solrconfig.xml carrot2 params
Would someone help me out with the syntax for setting Tokenizer.documentFields in the ClusteringComponent engine definition in solrconfig.xml? Carrot2 is expecting a Collection of Strings. There's no schema definition for this XML file and a big TODO on the Wiki wrt init params. Every permutation I have tried results in an error stating: Cannot set java.util.Collection field ... to java.lang.String. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Re: Skipping caches on a /select
But global on a qt would be awesome !!! Bill Bell Sent from mobile On Oct 17, 2013, at 2:43 PM, Yonik Seeley ysee...@gmail.com wrote: There isn't a global cache=false... it's a local param that can be applied to any fq or q parameter independently. -Yonik On Thu, Oct 17, 2013 at 4:39 PM, Tim Vaillancourt t...@elementspace.com wrote: Thanks Yonik, Does cache=false apply to all caches? The docs make it sound like it is for filterCache only, but I could be misunderstanding. When I force a commit and perform a /select a query many times with cache=false, I notice my query gets cached still, my guess is in the queryResultCache. At first the query takes 500ms+, then all subsequent requests take 0-1ms. I'll confirm this queryResultCache assumption today. Cheers, Tim On 16/10/13 06:33 PM, Yonik Seeley wrote: On Wed, Oct 16, 2013 at 6:18 PM, Tim Vaillancourtt...@elementspace.com wrote: I am debugging some /select queries on my Solr tier and would like to see if there is a way to tell Solr to skip the caches on a given /select query if it happens to ALREADY be in the cache. Live queries are being inserted and read from the caches, but I want my debug queries to bypass the cache entirely. I do know about the cache=false param (that causes the results of a select to not be INSERTED in to the cache), but what I am looking for instead is a way to tell Solr to not read the cache at all, even if there actually is a cached result for my query. Yeah, cache=false for q or fq should already not use the cache at all (read or write). -Yonik
Re: Switching indexes
On 10/17/2013 12:51 PM, Christopher Gross wrote: OK, super confused now. http://index1:8080/solr/admin/cores?action=CREATE&name=test2&collection=test2&numShards=1&replicationFactor=3 Nets me this: <response><lst name="responseHeader"><int name="status">400</int><int name="QTime">15007</int></lst><lst name="error"><str name="msg">Error CREATEing SolrCore 'test2': Could not find configName for collection test2 found:[xxx, xxx, , x, xx]</str><int name="code">400</int></lst></response> For that node (test2), in my solr data directory, I have a folder with the conf files and an existing data dir (copied the index from another location). Right now it seems like the only way that I can add in a collection is to load the configs into zookeeper, stop tomcat, add it to the solr.xml file, and restart tomcat. The config does need to be loaded into zookeeper. That's how SolrCloud works. Because you have existing collections, you're going to have at least one config set already uploaded; you may be able to use that directly. You don't need to stop anything, though. Michael Della Bitta's response indicates the part you're missing on your create URL - the collection.configName parameter. The basic way to get things done with collections is this: 1) Upload one or more named config sets to zookeeper. This can be done with zkcli and its upconfig command, or with the bootstrap startup options that are intended to be used once. 2) Create the collection, referencing the proper collection.configName. You can have many collections that all share one config name. You can also change which config an existing collection uses with the zkcli linkconfig command, followed by a collection reload. If you upload a new configuration with an existing name, a collection reload (or Solr restart) is required to use the new config. For uploading configs, I find zkcli to be a lot cleaner than the bootstrap options - it doesn't require stopping Solr or giving it different startup options.
Actually, it doesn't even require Solr to be started - it talks only to zookeeper, and we strongly recommend standalone zookeeper, not the zk server that can be run embedded in Solr. Thanks, Shawn
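The two-step workflow Shawn describes might look roughly like the commands below. The zookeeper host, config directory path, and the config name "myconf" are placeholder assumptions; the collection name and shard/replica counts are taken from the thread:

```text
# 1) Upload a named config set to zookeeper with zkcli:
./zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /path/to/conf -confname myconf

# 2) Create the collection via the Collections API, referencing that config name:
http://index1:8080/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=3&collection.configName=myconf

# Later, to point an existing collection at a different config (reload afterwards):
./zkcli.sh -zkhost zk1:2181 -cmd linkconfig -collection test2 -confname myconf
```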
Re: SolrDocumentList - bitwise operation
Hi,

Regrets, I was confused with bit-set. I already have Shawn's suggested approach in my system. I want to try other ways and test performance. How can I use a join? I have 2 different solr indexes:

localhost:8080/solr_1/select?q=content:test&fl=id,name,type
localhost:8081/solr_1_1/select?q=text:test&fl=id

After getting results - join by id. How do I do this? Please suggest other ways to do this; the current method is taking a lot of time.

Thanks
Michael.

On Tue, Oct 15, 2013 at 11:41 PM, Erick Erickson erickerick...@gmail.com wrote:

Why do you think a bitset would help? Bitsets have a bit set on for every document that matches based on the _internal_ Lucene document ID; it has nothing to do with the uniqueKey you have defined. Nor does it have anything to do with the foreign key relationship. So either I don't understand the problem at all or pursuing bitsets is a red herring. You might be substantially faster by sorting the results and then doing a skip-list sort of thing.

FWIW,
Erick

On Mon, Oct 14, 2013 at 1:47 PM, Michael Tyler michaeltyler1...@gmail.com wrote:

Hi Shawn,

This is a time consuming operation. I already have this in my application. I was pondering whether I can get a bitset from both the solr indexes, bitset.and them, and then retrieve only those that matched? I don't know how I'd retrieve a bitset - wanted to try this and test the performance.

Regards
Michael

On Sun, Oct 13, 2013 at 8:54 PM, Shawn Heisey s...@elyograg.org wrote:

On 10/13/2013 8:34 AM, Michael Tyler wrote:

Hello,

I have 2 different solr indexes returning 2 different sets of SolrDocumentList. Doc Id is the foreign key relation. After obtaining them, I want to perform an AND operation between them and then return results to the user. Can you tell me how I do this?
I am using solr 4.3.

SolrDocumentList results1 = responseA.getResults();
SolrDocumentList results2 = responseB.getResults();

results1 : d1, d2, d3
results2 : d1, d2, d4

The SolrDocumentList class extends ArrayList<SolrDocument>, which means that it inherits all ArrayList functionality. Unfortunately, there's no built-in way of eliminating duplicates with a java List. It's very easy to combine the two results into another object, but that object will contain both of the d1 and both of the d2 SolrDocument objects. The following code is a reasonably fast way to handle this. It assumes that results1 is the list that should win when there are duplicates, so it gets added first. It assumes that the uniqueKey field is named id and that it contains a String value. If these are incorrect assumptions, you can adjust the code accordingly.

SolrDocumentList results1 = responseA.getResults();
SolrDocumentList results2 = responseB.getResults();

List<SolrDocumentList> tmpList = new ArrayList<SolrDocumentList>();
tmpList.add(results1);
tmpList.add(results2);

Set<String> tmpSet = new HashSet<String>();
SolrDocumentList newList = new SolrDocumentList();

for (SolrDocumentList l : tmpList) {
  for (SolrDocument d : l) {
    String id = (String) d.get("id");
    if (tmpSet.contains(id)) {
      continue;
    }
    tmpSet.add(id);
    newList.add(d);
  }
}

Thanks,
Shawn
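Shawn's merge loop can be exercised in a self-contained form. The sketch below substitutes plain Map objects for SolrDocument so it runs without SolrJ on the classpath; the class and method names (MergeExample, merge, doc) are illustrative, not SolrJ API. The dedup logic is the same shape: first list wins on duplicate ids.

```java
import java.util.*;

public class MergeExample {

    // Merge two result lists, keeping only the first occurrence of each id.
    static List<Map<String, Object>> merge(List<Map<String, Object>> a,
                                           List<Map<String, Object>> b) {
        Set<String> seen = new HashSet<>();
        List<Map<String, Object>> merged = new ArrayList<>();
        for (List<Map<String, Object>> l : Arrays.asList(a, b)) {
            for (Map<String, Object> d : l) {
                String id = (String) d.get("id");
                if (seen.add(id)) {      // add() returns false for duplicates
                    merged.add(d);
                }
            }
        }
        return merged;
    }

    // Tiny helper standing in for a SolrDocument with only an id field.
    static Map<String, Object> doc(String id) {
        Map<String, Object> m = new HashMap<>();
        m.put("id", id);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> r1 = Arrays.asList(doc("d1"), doc("d2"), doc("d3"));
        List<Map<String, Object>> r2 = Arrays.asList(doc("d1"), doc("d2"), doc("d4"));
        for (Map<String, Object> d : merge(r1, r2)) {
            System.out.println(d.get("id"));   // prints d1, d2, d3, d4 - one per line
        }
    }
}
```

Because the lookup set is a HashSet, the merge is linear in the combined size of the two lists, which matters little for two result pages but keeps the approach usable for larger merges.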
Re: Different document types in different collections OR same collection without sharing fields?
Hi,

Logically, maintaining will be easy, as both collections are in different folders. Next, even though you make separate fields in one collection, at search time if a field list is not mentioned then the results will be a combination of both domains. If that is mandatorily taken care of at the search/query level, that should be fine. Otherwise, in the case of 2 collections, the search word can be queried at the specified collection level easily, with or without a field list.

regards,
Shrikanth

On Wed, Oct 16, 2013 at 4:32 PM, user 01 user...@gmail.com wrote:

@Shrikanth: how do you manage multiple redundant configurations (isn't it?)? I thought indexes would be separate when fields aren't shared. I don't need to import any data or re-index, if those are the only benefits of separate collections. I just index when a request comes / a new item is added to the DB.

On Wed, Oct 16, 2013 at 4:12 PM, shrikanth k jconsult.s...@gmail.com wrote:

Hi,

Please refer to the link below for clarification on fields having null values:

http://stackoverflow.com/questions/7332122/solr-what-are-the-default-values-for-fields-which-does-not-have-a-default-value

Logically it is better to have different collections for different domain data. Having 2 collections will improve overall performance. Currently I am holding 2 collections for different domain data. It eases importing data and re-indexing.

regards,
Shrikanth

On Wed, Oct 16, 2013 at 3:48 PM, user 01 user...@gmail.com wrote:

Can some expert users please leave a comment on this?

On Sun, Oct 6, 2013 at 2:54 AM, user 01 user...@gmail.com wrote:

Using a single node Solr instance, I need to search for, let's say, electronics items and grocery items. But I never want to search both of them together. When I search for electronics I don't expect a grocery item, and vice versa. Should I be defining both these document types within a single schema.xml, or should I use a different collection for each of these two (maintaining a separate schema.xml and solrconfig.xml for each of the two)?
I believe that if I add both to a single collection, without sharing fields between these two document types, I should be equally well off as separating them into two collections (in terms of performance and all), as their indexes/filter caches would be totally independent of each other when they don't share fields?

Also posted at SO: http://stackoverflow.com/q/19202882/530153
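If the single-collection route is taken, one common way to keep the two domains apart at query time is a discriminator field constrained with a filter query, rather than relying on the field list alone. A sketch of what such requests might look like (the type field name and its values are assumptions, not from the thread):

```text
# Electronics-only search; the fq result is cached and reused across queries:
...&q=name:camera&fq=type:electronics

# Grocery-only search:
...&q=name:apples&fq=type:grocery
```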
Re: SolrCloud Performance Issue
I tried commenting out NOW in the bq, but it didn't make any difference in performance. I do see a minor entry in the queryfiltercache rate, which is a meager 0.02. I'm really struggling to figure out the bottleneck; any known pain points I should be checking?

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Performance-Issue-tp4095971p4096277.html
Sent from the Solr - User mailing list archive at Nabble.com.