Re: Solr CPU Usage
Yes, we do complex queries with a lot of clauses and facets, and the data grows bigger every day. I agree with you that it might not be a hardware issue; maybe I need to tune the Solr/OS/Jetty configuration to optimize the Solr process. Thank you so much for the help. Best regards, Hendra
Null pointer on multi-core search
Hi, I'm using Solr 4.8.1, and in the following scenario I get a NullPointerException:

1. I search over multiple cores with grouping enabled.

2. SearchHandler is called and executes:

    for (SearchComponent c : components) {
      c.finishStage(rb);
    }

3. In QueryComponent.finishStage, this code is called:

    if (rb.grouping()) {
      groupedFinishStage(rb);
    } else {
      regularFinishStage(rb);
    }

4. And then:

    private void groupedFinishStage(final ResponseBuilder rb) {
      // To have same response as non-distributed request.
      GroupingSpecification groupSpec = rb.getGroupingSpec();
      if (rb.mergedTopGroups.isEmpty()) {
        for (String field : groupSpec.getFields()) {
          rb.mergedTopGroups.put(field, new TopGroups(null, null, 0, 0, new GroupDocs[]{}, Float.NaN));
        }
        rb.resultIds = new HashMap();
      }

5. As you can see, the marked line initializes resultIds to an empty map.

6. Then HighlightComponent.finishStage tries to execute:

    ShardDoc sdoc = rb.resultIds.get(id);
    int idx = sdoc.positionInResponse;
    arr[idx] = new NamedList.NamedListEntry(id, hl.getVal(i));

7. resultIds is empty, so sdoc is null and sdoc.positionInResponse throws a NullPointerException.

Hope that my description is clear. Thanks, Shay.
RE: Solr CPU Usage
Do you index and search from this box? How many documents do you have?

From: Shawn Heisey [s...@elyograg.org] Sent: Thursday, August 28, 2014 7:48 AM To: solr-user@lucene.apache.org Subject: Re: Solr CPU Usage

On 8/27/2014 8:42 PM, hendra_budiawan wrote: Yes, I'm just worried about the load average reported by the OS, because last week the server suddenly couldn't be accessed and we had to hard reboot it. I'm still investigating the problem; this server is dedicated to Solr only, so we suspect the problem comes from the Solr process, but I'm still looking at other possibilities. Can you suggest what else I should check?

What kind of query volume is your Solr server supporting? Are you doing complex queries with a lot of clauses, facets, or something else that's CPU intensive? Is your update volume high? The numbers that you've shown, assuming that the htop info is accurate and you really do have 16 or 32 CPU cores, do not look like any major problem. Solr is working hard, but there's a lot more CPU capacity left. The top output shows that the iowait percentage is not a problem, so it's not stuck in disk I/O. Memory usage indicates that OS disk caching is working well.

It looks like you were running Jetty, but possibly not the one included in the Solr example. If it's not the one included in the example, then its configuration is not well-tuned for Solr. If you have a high request volume, you may need to increase the maxThreads parameter in the Jetty config.

The only possible thing that I can think of which might cause a complete inability to access the server via ssh or other means is that you are hitting the open file limit in the operating system. Most Linux distros use /etc/security/limits.conf to configure the open file limit for each user.

Thanks, Shawn
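For reference, generic examples of the two knobs mentioned above (values are illustrative, not taken from this server's config). The open file limit for the user running Solr, in /etc/security/limits.conf:

    # /etc/security/limits.conf -- user running Solr is assumed to be "solr"
    solr  soft  nofile  49152
    solr  hard  nofile  65535

And the thread pool size in jetty.xml, roughly as shipped in the Solr 4.x example:

    <Set name="ThreadPool">
      <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
        <Set name="minThreads">10</Set>
        <Set name="maxThreads">10000</Set>
      </New>
    </Set>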
RE: Solr CPU Usage
Hi Jacques, Yes, we index and search from this box. We have 6 cores with almost 4,000K documents each, and each core gets bigger every day. Regards, Hendra Budiawan
RE: Solr CPU Usage
Hi Hendra, That doesn't seem overly huge... I agree with the other poster: from the top/htop graph it doesn't look too bad. I would maybe try to split the searching and indexing, and also try to schedule the delta indexing for the cores at different times. PS. We had a nice little bump in efficiency by going with Tomcat 7 and Java 8. Jacques

From: hendra_budiawan [hendra.budiawan...@gmail.com] Sent: Thursday, August 28, 2014 9:46 AM To: solr-user@lucene.apache.org Subject: RE: Solr CPU Usage

Hi Jacques, Yes, we index and search from this box. We have 6 cores with almost 4,000K documents each, and each core gets bigger every day. Regards, Hendra Budiawan
RE: Solr CPU Usage
Hi Jacques, I will try your advice to schedule the indexing at different times, and I will also start researching Tomcat 7 and Java 8. Thank you so much, Hendra
Indexing documents with ContentStreamUpdateRequest (SolrJ) asynchronously
I am using the SolrJ 4.8 API to index rich documents into Solr, but I want to index these documents asynchronously. The function I wrote sends documents synchronously, and I don't know how to change it to be asynchronous. Any idea?

Function:

    public Boolean indexDocument(HttpSolrServer server, String PathFile, InputReader external) {
      ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
      try {
        up.addFile(new File(PathFile), "text");
      } catch (IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
      }
      up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
      try {
        server.request(up);
      } catch (SolrServerException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
      } catch (IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
      }
      return true;
    }

Solr server: version 4.8.
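One way to make this asynchronous (a sketch, untested): hand the existing synchronous call to a thread pool so the caller does not block. Here "indexer" is a hypothetical instance of the class containing indexDocument, and the captured variables must be (effectively) final:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    final ExecutorService pool = Executors.newFixedThreadPool(4);

    // queue one upload; submit() returns immediately
    Future<Boolean> result = pool.submit(new Callable<Boolean>() {
        public Boolean call() {
            return indexer.indexDocument(server, pathFile, external);
        }
    });

    // ... submit more files, then, when you need the outcome:
    // result.get() blocks and throws InterruptedException/ExecutionException
    // call pool.shutdown() once everything has been submitted

Note also that committing on every request (ACTION.COMMIT) gets expensive with many concurrent uploads; issuing a single commit after the batch is usually cheaper.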
redo log for solr
Hello Solr users! We have a case where any action a user performs against the Solr shard should be recorded for possible later replay. We are looking at a per-user replay feature such that, if the user did something wrong accidentally or because of a system-level bug, we could restore a previous state. Two actions are available: 1. INSERT a new Solr document 2. DELETE an existing Solr document. If the user wants to update an existing document, we first delete it and insert a new one with modified fields. Are there any existing components / solutions in the Solr universe that could help implement this? Dmitry -- Dmitry Kan Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Solr issue
Hi, Version - 4.8.1. While executing this Solr query (from the Solr web UI):

    http://localhost:8983/solr/Global_A/select?q=%2Btext%3A%28shay*%29&rows=100&fl=id%2CobjId%2Cnull&shards=http%3A%2F%2F127.0.0.1%3A8983%2Fsolr%2F0_A%2Chttp%3A%2F%2F127.0.0.1%3A8983%2Fsolr%2FGlobal_A&group=true&group.query=name__s%3Ashay&sort=name__s_sort+asc&hl=true

we got a NullPointerException:

java.lang.NullPointerException
    at org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:189)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:330)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:722)

It seems like the combination of grouping + shards + highlighting causes this NullPointerException. Is anyone familiar with this issue? Thanks, Shay.
Re: Query regarding URL Analysers
Gentle reminder. On 21 August 2014 18:05, Sathyam sathyam.dorasw...@gmail.com wrote:

Hi, I need to generate tokens out of a URL such that I get hierarchical units of the URL as well as each individual entity as tokens. For example, given the URL:

    http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz

the tokens that I need are:

Hierarchical subsets of the URL:
1. http://
2. http://www.google.com/
3. http://www.google.com/abcd/
4. http://www.google.com/abcd/efgh/
5. http://www.google.com/abcd/efgh/ijkl/
6. http://www.google.com/abcd/efgh/ijkl/mnop.php

Individual elements in the path to the resource:
7. abcd
8. efgh
9. ijkl
10. mnop.php

Query terms:
11. a=10
12. b=20
13. c=30

Fragment:
14. xyz

This comes to a total of 14 tokens for the given URL -- basically a URL analyzer that creates tokens based on the categories above, plus a separate token for the port (if present). I would like to know how this can be achieved using a single analyzer that combines the tokenizers and filters provided by Solr. I am also curious why there is a restriction of only one tokenizer per analyzer. Looking forward to a response suggesting the best possible way to achieve the closest to what I need. Thanks.

-- Sathyam Doraswamy
Re: Help with StopFilterFactory
Hello, Any thoughts on this? Should I open a JIRA ticket? Or how can we engage at least one of the Solr devs on this issue? Best, Alex
Re: Query regarding URL Analysers
Sorry for the delay... take a look at the URLClassify update processor, which parses a URL and distributes the components to various fields:

http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessorFactory.html
http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html

The official doc is... pitiful, but I have doc and examples in my e-book: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-Original Message- From: Sathyam Sent: Thursday, August 28, 2014 6:21 AM To: solr-user@lucene.apache.org Subject: Re: Query regarding URL Analysers

Gentle reminder. On 21 August 2014 18:05, Sathyam sathyam.dorasw...@gmail.com wrote: Hi, I need to generate tokens out of a URL such that I get hierarchical units of the URL as well as each individual entity as tokens. [...] -- Sathyam Doraswamy
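For the hierarchical-subset tokens specifically (items 1-6 in the question), one stock building block that may help -- an assumption about fit, since it does not produce the query-string, fragment, or port tokens -- is solr.PathHierarchyTokenizerFactory, which emits every prefix of a delimiter-separated path:

    <fieldType name="url_hierarchy" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
      </analyzer>
    </fieldType>

The remaining token categories would still need custom filters or an update processor that splits the URL into separate fields before analysis.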
Re: redo log for solr
On 8/28/2014 3:10 AM, Dmitry Kan wrote: We have a case when any actions a user did to the solr shard should be recorded for a possible later replay. This way we are looking at per user replay feature such that if the user did something wrong accidentally or because of a system level bug, we could restore a previous state. Two actions are available: 1. INSERT new solr document 2. DELETE existing solr document If user wants to perform an update on the existing document, we first delete it and insert a new one with modified fields. Are there any existing components / solutions in the Solr universe that could help implement this? I'm wondering what functionality you need beyond what Solr already provides ... because it sounds like Solr already does a lot of what you are implementing. Solr already includes a transaction log that records all changes to the index. Each individual log is closed when you do a hard commit. Enough transaction logs are kept so that Solr can replay at least the last 100 transactions. The entire transaction log is replayed when Solr is restarted or a core is reloaded. What you describe where you delete an existing document before inserting a new one ... Solr already has that functionality built in, using the uniqueKey. That capability is further extended by the Atomic Update functionality. You're not new around here, so I don't think I'm telling you anything you don't already know ... which may mean that I'm missing something. :) Thanks, Shawn
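For illustration, a minimal SolrJ sketch of the atomic-update route mentioned above (field names are made up; this assumes the uniqueKey is "id", that fields are stored, and that the updateLog is enabled in solrconfig.xml):

    import java.util.Collections;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc42");                                 // uniqueKey of the existing doc
    doc.addField("price", Collections.singletonMap("set", 99));  // "set" replaces just this field
    server.add(doc);    // throws SolrServerException/IOException
    server.commit();

Solr rewrites the rest of the document from its stored fields, so there is no need for the delete-then-insert dance.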
Re: redo log for solr
It may mean that I wasn't clear enough :) The idea is to build a paper-trail system (without negative connotation!), such that, for instance, if a user deleted some data _by mistake_ and we have hard-committed to Solr (upon which the tlog has been truncated), we have paper-trailed the document before the delete so we can provide restore functionality. So where the tlog is meant to make soft commits durable, this feature would serve more like an undo facility and persist the _history_ of modifications. I'm currently investigating what you suggested over IRC -- the UpdateProcessor. Looks like the way to go. Thanks, Dmitry

On Thu, Aug 28, 2014 at 4:16 PM, Shawn Heisey s...@elyograg.org wrote: I'm wondering what functionality you need beyond what Solr already provides ... because it sounds like Solr already does a lot of what you are implementing. [...] Thanks, Shawn

-- Dmitry Kan Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
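A rough, untested sketch of that UpdateProcessor direction -- AuditLog is a hypothetical append-only store, and the matching ProcessorFactory class is omitted:

    import java.io.IOException;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.DeleteUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class PaperTrailProcessor extends UpdateRequestProcessor {
      private final AuditLog audit; // hypothetical durable append-only log

      public PaperTrailProcessor(AuditLog audit, UpdateRequestProcessor next) {
        super(next);
        this.audit = audit;
      }

      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        audit.recordInsert(cmd.getSolrInputDocument()); // full doc, so it can be replayed
        super.processAdd(cmd);
      }

      @Override
      public void processDelete(DeleteUpdateCommand cmd) throws IOException {
        audit.recordDelete(cmd.getId()); // uniqueKey; delete-by-query would need getQuery()
        super.processDelete(cmd);
      }
    }

For the delete case, restoring requires the document's last state, so the processor (or the audit store) would have to fetch the doc by id before the delete goes through.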
Using a RequestHandler to expand query parameter
I would like to send only one query to my custom request handler and have the request handler expand that query into a more complicated query. For example, */myHandler?q=kids+books* would turn into a more complicated edismax query of: *"kids books" kids books*. Is this achievable via a request handler definition in solrconfig.xml? Thanks! Jim
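Something along these lines should work -- a sketch with made-up field names and boosts, showing a handler whose edismax defaults expand a bare q=kids+books server-side (the pf parameter adds the implicit phrase match):

    <requestHandler name="/myHandler" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">title^2 description</str>
        <str name="pf">title^5</str>
        <str name="mm">2&lt;75%</str>
      </lst>
    </requestHandler>

Anything beyond what edismax's parameters can express would need a custom QParserPlugin rather than a handler definition.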
Problem with SOLR Collection creation
Hello, We have deployed a solr.war file to a WebLogic server. The web.xml has been modified to have the path to the Solr home as follows:

    <env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-type>java.lang.String</env-entry-type>
      <env-entry-value>D:\SOLR\4.7.0\RegulatoryReview</env-entry-value>
    </env-entry>

The deployment of Solr comes up fine. In the D:\SOLR\4.7.0\RegulatoryReview directory we have an RR folder, under which the conf directory with the required config files is present (solrconfig.xml, schema.xml, etc). But when I try to add the collection to Solr through the admin console, I get the following error:

Thursday, August 28, 2014 10:06:37 AM ERROR SolrCore org.apache.solr.common.SolrException: Error CREATEing SolrCore 'RegulatoryReview': Unable to create core: RegulatoryReview Caused by: class org.apache.solr.search.LRUCache

org.apache.solr.common.SolrException: Error CREATEing SolrCore 'RR': Unable to create core: RR
Caused by: class org.apache.solr.search.LRUCache
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:546)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:733)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)
    at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:57)
    at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3730)
    at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3696)
    at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
    at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
    at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2273)
    at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2179)
    at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1490)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)
Caused by: org.apache.solr.common.SolrException: Unable to create core: RR
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:989)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:606)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:509)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:732)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
    at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
    ... 9 more
Caused by: org.apache.solr.common.SolrException: Could not load config file D:\SOLR\4.7.0\RegulatoryReview\RR\solrconfig.xml
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:530)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:597)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:509)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:733)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)
    at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:57)
    ... 9 more
Caused by: java.lang.ClassCastException: class org.apache.solr.search.LRUCache
    at java.lang.Class.asSubclass(Class.java:3027)
    at ...
incomplete proximity boost for fielded searches
Consider this query:

    http://10.208.152.231:8080/solr/wkustaldocsphc_A/search?q=title:(Michigan Corporate Income Tax)&debugQuery=true&pf=title&ps=255&defType=edismax

The intention is to perform a search in field title and to apply a proximity boost within a window of 255 words. If I look at the debug information, I see:

    <str name="parsedquery">BoostedQuery(boost(+((title:michigan title:corporate title:income title:tax)~4) (title:"corporate income tax"~255)~1.0))</str>

Note that the first search term (michigan) is missing from the proximity boost clause. I can't believe this is intended behavior. Why is edismax splitting (title:Michigan) and (Corporate Income Tax) while determining what to use for the proximity boost? Thanks, Tom
Issue with multivalued fields in UIMA
Hi all, I am trying to integrate the UIMA Dictionary Annotator with Solr to find genotypes in a multivalued field. It seems that it only works on the first value of the multivalued field. I tried using SentenceAnnotation as well, and the same problem occurs.
Re: Problem with SOLR Collection creation
On 8/28/2014 8:28 AM, Kaushik wrote:

> Hello, We have deployed a solr.war file to a WebLogic server. [...] But when I try to add the collection to Solr through the admin console, I get the following error: Thursday, August 28, 2014 10:06:37 AM ERROR SolrCore org.apache.solr.common.SolrException: Error CREATEing SolrCore 'RegulatoryReview': Unable to create core: RegulatoryReview Caused by: class org.apache.solr.search.LRUCache

It would seem there's a problem with the cache config in your solrconfig.xml, or that there's some kind of problem with the Solr jars contained within the war. No testing is done with WebLogic, so it's always possible it's a class conflict with WebLogic itself, but I would bet on a config problem first.

> The issue I believe is that it is trying to find D:\SOLR\4.7.0\RegulatoryReview\RR\solrconfig.xml, ignoring the conf directory in which it should be finding it. What am I doing wrong?

This is SOLR-5814, a bug in the log messages, not the program logic. I thought it had been fixed by 4.8, but the issue is still unresolved. https://issues.apache.org/jira/browse/SOLR-5814

Thanks, Shawn
RE: solr query gives different numFound upon refreshing
Hi Shawn, Thanks for your reply. We did some tests enabling shards.info=true and confirmed that there is no duplicate copy of our index. We have one replica, but many times we see three versions on the Admin GUI / Overview tab, all with different version and gen values. Is that a problem?

Master (Searching)
Master (Replicable)
Slave (Searching)

We constantly see the maxWarmingSearchers exceeded exception. The warmup time is 1.5 minutes, but the difference between the openedAt date and the registeredAt date is at times more than 4-5 minutes. Is the true searcher warm time the difference between the two dates rather than warmupTime?

openedAt: 2014-08-28T16:17:24.829Z
registeredAt: 2014-08-28T16:21:02.278Z
warmupTime: 65727

Thanks for all the help.

-Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Wednesday, August 27, 2014 2:37 PM To: solr-user@lucene.apache.org Subject: Re: solr query gives different numFound upon refreshing

On 8/27/2014 10:44 AM, Bryan Bende wrote: Theoretically this shouldn't happen, but is it possible that the two replicas for a given shard are not fully in sync? Say shard1 replica1 is missing a document that is in shard1 replica2... if you run a query that would hit on that document and run it a bunch of times, sometimes replica 1 will handle the request and sometimes replica 2 will handle it, and it would change your number of results if one of them is missing a document. You could write a program that compares each replica's documents by querying them with distrib=false. If there was a replica out of sync, I would think it would detect that on a restart when comparing itself against the leader for that shard, but I'm not sure.

A replica out of sync is a possibility, but the most common reason for a changing numFound is because the overall distributed index has more than one document with the same uniqueKey value -- different versions of the same document in more than one shard. SolrCloud tries really hard to never end up with replicas out of sync, but either due to highly unusual circumstances or bugs, it could still happen.

Thanks, Shawn
FileListEntityProcessor still ignores onError-Attribute!? (SOLR-2897?)
Hello, it looks like I have run into an old problem: I configured an entity for data import with FileListEntityProcessor in data-config.xml. If the baseDir attribute points to a non-existing directory, the whole import process gets aborted, no matter which value I provide in the onError attribute. I did some searching today; several threads describe the same scenario, but I could not find a solution. It looks like the issue is already acknowledged as SOLR-2897, but I'm not sure, since it's an almost 3-year-old ticket. Any suggestions? Btw: what is the intended behaviour/difference between the onError options "skip" and "continue"? H.
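For reference, roughly what such an entity looks like (paths and patterns below are examples only). As I read the DIH docs, onError="abort" stops the import, "skip" skips the current document, and "continue" logs the error and keeps going -- though per SOLR-2897 a missing baseDir appears to abort regardless:

    <entity name="files"
            processor="FileListEntityProcessor"
            baseDir="/data/import/incoming"
            fileName=".*\.xml"
            recursive="true"
            rootEntity="false"
            onError="skip">
      <entity name="doc"
              processor="XPathEntityProcessor"
              url="${files.fileAbsolutePath}"
              forEach="/record"
              onError="continue">
        <field column="id" xpath="/record/id"/>
      </entity>
    </entity>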
Re: Problem with SOLR Collection creation
The issue I was facing was that there were additional libraries on the classpath that were conflicting and not required. I removed those and the problem disappeared. Thank you, Kaushik

On Thu, Aug 28, 2014 at 11:50 AM, Shawn Heisey s...@elyograg.org wrote: It would seem there's a problem with the cache config in your solrconfig.xml, or that there's some kind of problem with the Solr jars contained within the war. [...] Thanks, Shawn
How to accommodate huge data
Our index size is 110GB and growing; it has crossed our RAM capacity of 96GB, and we are seeing a lot of disk and network IO, resulting in huge latencies and instability (one of the servers used to shut down and stay in recovery mode when restarted). Our admin added swap space, and that seems to have mitigated the issue. But what is the usual practice in such a scenario? The index size eventually outgrows RAM and is pushed onto disk. Is it advisable to shard (the Solr forum says no)? Or is there a different mechanism? System config: we have a 3-node cluster with RAID1 SSDs. Two nodes are running Solr and the other is there to maintain quorum. -E
Re: How to accommodate huge data
On 8/28/2014 11:57 AM, Ethan wrote: Our index size is 110GB and growing, crossed RAM capacity of 96GB, and we are seeing a lot of disk and network IO resulting in huge latencies and instability(one of the server used to shutdown and stay in recovery mode when restarted). Our admin added swap space and that seemed to have mitigated the issue. Adding swap space doesn't seem like it would actually fix anything. If the system is actively swapping, performance will be terrible. Assuming your heap size and query volume are not enormous, 96GB of RAM for an index size of 110GB seems like it would actually be pretty good. Remember that you have to subtract all heap requirements (java and otherwise) from the total RAM in order to determine how much RAM is left for caching the index. The ideal setup has enough extra RAM (beyond what's required for the software itself) to cache the entire index, but that ideal is usually not required. In most cases, getting between half and two thirds of the index into RAM is enough. One thing to note: If you don't have the entire index fitting into RAM, the server will probably not be able to handle an extreme query volume. But what is the usual practice in such scenario? Index size eventually outgrows RAM and is pushed on to disk. Is it advisable to shard(solr forum says no)? Or is there a different mechanism? System config: We have 3 node cluster with RAID1 SSD. Two nodes are running solr and the other is to maintain Quorum. Whether or not to shard depends on several factors, not the least of which is whether or not the features that you are using will work on a distributed index. My index is slightly larger than yours, and it's sharded. I don't run SolrCloud, the sharding is completely manual. Thanks, Shawn
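A rough worked example of that subtraction (the 7GB heap and ~2GB OS overhead are assumed figures for illustration):

    96GB total RAM
    - 7GB Java heap for Solr
    - ~2GB OS and other processes
    = ~87GB left for the OS page cache

    87GB / 110GB index = roughly 80% of the index cacheable

which is comfortably above the half-to-two-thirds rule of thumb above, so RAM alone probably isn't the whole story here.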
re: How to accommodate huge data
Look into SolrCloud.

From: Ethan eh198...@gmail.com Sent: Thursday, August 28, 2014 1:59 PM To: solr-user solr-user@lucene.apache.org Subject: How to accommodate huge data

Our index size is 110GB and growing; it has crossed our RAM capacity of 96GB, and we are seeing a lot of disk and network IO, resulting in huge latencies and instability. [...] -E
RE: How to accommodate huge data
kokatnur.vi...@gmail.com [kokatnur.vi...@gmail.com] On Behalf Of Ethan [eh198...@gmail.com] wrote: Our index size is 110GB and growing; it has crossed our RAM capacity of 96GB, and we are seeing a lot of disk and network IO, resulting in huge latencies and instability (one of the servers used to shut down and stay in recovery mode when restarted). Our admin added swap space and that seemed to have mitigated the issue.

Something is off here. I can understand disk IO going up when the index size increases, but why would it cause more network IO? Are you using networked storage or performing aggressive synchronization? Can you describe how hard you are hitting your indexes, both for updates and queries? What do the huge latencies look like? Have you tried profiling the running Solrs to see if the heap size is large enough?

We have a 3-node cluster with RAID1 SSDs. Two nodes are running Solr and the other is there to maintain quorum.

Is that on the same physical hardware or on three separate machines? - Toke Eskildsen
Re: How to accommodate huge data
On Thu, Aug 28, 2014 at 11:12 AM, Shawn Heisey s...@elyograg.org wrote:

Adding swap space doesn't seem like it would actually fix anything. If the system is actively swapping, performance will be terrible. Assuming your heap size and query volume are not enormous, 96GB of RAM for an index size of 110GB seems like it would actually be pretty good.

*E: Before adding swap space, nodes used to shut down due to OOM or crash after 2-5 minutes of uptime. After bumping the swap space, the server came up cleanly. We have 7GB of heap. I'll need to ask the admin more questions to know how it was solved.

[...] One thing to note: If you don't have the entire index fitting into RAM, the server will probably not be able to handle an extreme query volume.

*E: Our query volume is low right now, about 30 TPS for /select, but /update is at 80 and /get around 100 TPS. In our SolrCloud setup we don't have a separate replication node that handles select traffic. The server currently has a 12-40ms TP99, as we don't have any facets or complex queries.

[...] My index is slightly larger than yours, and it's sharded. I don't run SolrCloud; the sharding is completely manual.

*E: Interesting. What's your select and update TPS/TP99? We index around 6-8GB of data every month. I think we will need more than one server to handle our index in the long run without degrading performance.
RE: How to accommodate huge data
kokatnur.vi...@gmail.com [kokatnur.vi...@gmail.com] On Behalf Of Ethan [eh198...@gmail.com] wrote: Before adding swap space, nodes used to shut down due to OOM or crash after 2-5 minutes of uptime. After bumping the swap space, the server came up cleanly. We have 7GB of heap. I'll need to ask the admin more questions to know how it was solved.

Yes, please. What you are describing is not solved by adding swap, unless the system has very little free RAM. - Toke Eskildsen
Re: Solr CPU Usage
: Yes i'm just worried about load average reported by OS, because last week
: suddenly server can't accessed so we have to hard reboot. I'm still
: investigating what is the problem, because this server is dedicated to solr

ok - so here is the key bit. basically, nothing else you've mentioned in this thread indicates any sort of problem -- your load (now, the one you've observed) is fine. the question is: what happened last week?

do you have any metrics/monitoring information from the server when you actually had a problem? do you have any logs (from Solr, or from jetty, or from the OS, or any OS/hardware monitoring tools) from last week when the problem happened?

define "server can't be accessed" -- do you mean solr wasn't responding to queries, or do you mean "i couldn't even ping the machine, let alone ssh to it"? ... because there is a big difference. if you can ssh to a machine, but solr is not responding, then generating thread dumps would help see what solr is doing.

-Hoss http://www.lucidworks.com/
Re: Solr issue
Hi Shay, I'm not quite sure about this, but I think it got fixed by one of these:

https://issues.apache.org/jira/browse/SOLR-6223
https://issues.apache.org/jira/browse/SOLR-4186
https://issues.apache.org/jira/browse/SOLR-4049

Could you try 4.10 from an svn branch and see if your problem is fixed?

Thanks, Patanachai

On 08/28/2014 03:23 AM, Shay Sofer wrote: Hi, Version - 4.8.1. While executing this Solr query (from the Solr web UI): [...] We got a NullPointerException: java.lang.NullPointerException at org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:189) [...] Seems like the integration of grouping + shards + highlighting causes this NullPointerException. Anyone familiar with this issue? Thanks, Shay.
CopyField Wildcard Exception possible?
I have hundreds of fields of the following form in my schema.xml:

    <field name="F10434" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="B20215" type="string" indexed="true" stored="true" multiValued="true"/>
    ...

I also have a field 'text' that is set as the default search field:

    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

I populate this 'text' field using copyField:

    <copyField source="*" dest="text"/>

This '*' has worked so far. However, I now want to exclude some of the fields, i.e. I would like 'text' to contain everything (hundreds of fields) except a few. Is there any way to do this? One way would be to replace the '*' with explicit directives, e.g.:

    <copyField source="F10434" dest="text"/>
    <copyField source="B20215" dest="text"/>

and leave the fields I do not want out of this list. Is there an alternative? (I would like an alternative because maintaining these copyFields would be long and too difficult.) Thank you, O. O.
Two (or more) uniqueKey fields?
An odd requirement has come my way. One of our indexes has uniqueness on two different fields, but because Solr only allows one uniqueKey field, we cannot have automatic document replacement on both of the fields. This means that the indexing code must handle it, which (for reasons I don't fully understand) currently results in some *massive* delete requests being sent frequently -- one such request was over 130KB in size. They look like field:(x || y || z) -- but with a LOT of different values. How much pain would it take to implement multiple uniqueKeys? I have not searched Jira for an existing issue. Thanks, Shawn
Re: Two (or more) uniqueKey fields?
Can't you do a composite unique key? Combine them during indexing in a URP stage. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Thu, Aug 28, 2014 at 4:41 PM, Shawn Heisey s...@elyograg.org wrote: An odd requirement has come my way. One of our indexes has uniqueness on two different fields, but because Solr only allows one uniqueKey field, we cannot have automatic document replacement on both of the fields. [...] Thanks, Shawn
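A sketch of that URP-stage combination using stock processors (the field names keyA/keyB and the "!" delimiter are made up; this assumes the schema's uniqueKey is "id"): CloneFieldUpdateProcessorFactory copies both values into id, then ConcatFieldUpdateProcessorFactory joins them into a single string:

    <updateRequestProcessorChain name="composite-key">
      <processor class="solr.CloneFieldUpdateProcessorFactory">
        <str name="source">keyA</str>
        <str name="dest">id</str>
      </processor>
      <processor class="solr.CloneFieldUpdateProcessorFactory">
        <str name="source">keyB</str>
        <str name="dest">id</str>
      </processor>
      <processor class="solr.ConcatFieldUpdateProcessorFactory">
        <str name="fieldName">id</str>
        <str name="delimiter">!</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

This only helps when uniqueness is defined by the pair; two fields that are each independently unique can't be collapsed this way.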
Re: Two (or more) uniqueKey fields?
On 8/28/2014 2:46 PM, Alexandre Rafalovitch wrote: Can't you do a composite unique key? Combine them during indexing in URP stage. That's an interesting idea. If they aren't *independently* unique (which would make it impossible to treat them as a single unit together), that might work. Thanks for the idea! I'll chase it down on this end. Shawn
Re: Two (or more) uniqueKey fields?
: That's an interesting idea. If they aren't *independently* unique
: (which would make it impossible to treat them as a single unit
: together), that might work. Thanks for the idea! I'll chase it down on

if they are independently unique, check out the SignatureUpdateProcessorFactory, but be aware of SOLR-3473.

https://cwiki.apache.org/confluence/display/solr/De-Duplication

-Hoss http://www.lucidworks.com/
Using Update Handler to Combine Data from 2 Cores
Hi, Say I have an index of product types and a different index of products, each of which belongs to one of the types in the other index. Users will search for attributes of types and products combined, so the two distinct but related indices must be combined into a single, flattened index so that searches and relevancy ranking can be done appropriately. Let's call this third index the type+product index.

I've been asked by a customer to implement a custom update processor chain for the third index that will get as input two values defining a relationship between a product and its corresponding type. In other words, the documents posted to the type+product index would simply carry a value that corresponds to the uniqueKey of a product-type doc and another value that represents the uniqueKey of the specific product of that type. An update processor would then read all the fields stored in the product-type index and append them to the document; another update processor would take the other key, read the stored fields from the products index, and also append them to the doc, which would then be ready to be indexed into the third core of merged content.

I explained to the customer already that this would be custom development, for which we would need to extend various classes and implement the desired logic ourselves (preferably without modifying anything in trunk). Has anyone implemented something similar? Is there anything that would prevent this from being possible in Solr?

Here is an example scenario to illustrate what I've been asked to implement.

Product Types:
T1  car
T2  truck
T3  motorcycle

Products:
1  white  $14500
2  red    $5600
3  white  $3300
4  blue   $88000

Possible searches: "white car", "red motorcycle", "white truck"

Notice that with the two independent data sets above it is not possible to implement this. Hence the idea to create a third index (core) which will take the relationships:

typeId = T1, prodId = 1
typeId = T3, prodId = 2
typeId = T3, prodId = 3
typeId = T2, prodId = 4

and generate, through a custom update processing chain, an index consisting of:

Type+Product:
T1+1  car         white  $14500
T3+2  motorcycle  red    $5600
T3+3  motorcycle  white  $3300
T2+4  truck       blue   $88000

Thanks, Carlos
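Nothing prevents this in principle. A rough, untested SolrJ-4.x-style sketch of the processor described above -- the matching factory class, error handling for missing ids, and batching are omitted, and all names are hypothetical:

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class JoinTypeAndProductProcessor extends UpdateRequestProcessor {
      private final SolrServer typesCore;     // e.g. HttpSolrServer pointed at the types core
      private final SolrServer productsCore;  // e.g. HttpSolrServer pointed at the products core

      public JoinTypeAndProductProcessor(SolrServer typesCore, SolrServer productsCore,
                                         UpdateRequestProcessor next) {
        super(next);
        this.typesCore = typesCore;
        this.productsCore = productsCore;
      }

      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        try {
          // pull stored fields from each source core and append them
          copyStoredFields(typesCore, (String) doc.getFieldValue("typeId"), doc);
          copyStoredFields(productsCore, (String) doc.getFieldValue("prodId"), doc);
        } catch (SolrServerException e) {
          throw new IOException(e);
        }
        super.processAdd(cmd); // hand the flattened doc to the rest of the chain
      }

      private void copyStoredFields(SolrServer core, String id, SolrInputDocument target)
          throws SolrServerException {
        SolrDocument source = core.query(new SolrQuery("id:" + id)).getResults().get(0);
        for (String name : source.getFieldNames()) {
          if (!"id".equals(name)) {
            target.addField(name, source.getFieldValue(name));
          }
        }
      }
    }

The main caveat is that a per-document query against two cores makes indexing of the third core only as fast as those lookups, so caching or batched lookups may be needed at volume.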
Re: CopyField Wildcard Exception possible?
We would enjoy this feature as well, if you'd like to create a JIRA ticket.

On Thu, Aug 28, 2014 at 4:21 PM, O. Olson olson_...@yahoo.it wrote: I have hundreds of fields of the following form in my schema.xml... [...] Is there an alternative? Thank you, O. O.

-- I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being content in any and every situation, whether well fed or hungry, whether living in plenty or in want. I can do all this through him who gives me strength. *-Philippians 4:12-13*
After zk restart SOLR can't update its clusterstate.json
Hi, just after we finished restarting our ZooKeeper cluster, Solr started to fail with tons of zk events. We shut down all the nodes and restarted them one by one, but it looks like the clusterstate.json does not get updated properly. Example:

    "core_node11": {
      "state": "active",
      "base_url": "http://10.140.4.161:9765",
      "core": "sku_shard1_replica11",
      ...

Solr on the above node is actually down :/ and correctly does not appear in live_nodes. Any clue? Ugo
Re: After zk restart SOLR can't update its clusterstate.json
Just adding some info: when I do

    curl -v 'http://10.140.3.25:9765/zookeeper?wt=json'

it takes ages to come back, and on the Admin UI I can't see the Cloud graph. Ugo

On Fri, Aug 29, 2014 at 12:52 AM, Ugo Matrangolo ugo.matrang...@gmail.com wrote: Hi, just after we finished restarting our ZooKeeper cluster, Solr started to fail with tons of zk events. We shut down all the nodes and restarted them one by one, but it looks like the clusterstate.json does not get updated properly. [...] Any clue? Ugo
Re: Solr CPU Usage
I think the configs are not tuned well. Can you use JMX to monitor what it is doing?
Re: After zk restart SOLR can't update its clusterstate.json
On 8/28/2014 5:52 PM, Ugo Matrangolo wrote: just after we finished to restart our zk cluster SOLR started to fail with tons of zk events. We shut down all the nodes and restarted them one by one but looks like the clusterstate.json does not get updated properly. On IRC, you mentioned you were on 4.7.2. I wonder if maybe the overseer queue is not being processed? Can you look in that section of zookeeper? The big overseer queue bug (SOLR-5811) was fixed in 4.7.1, but I know there was at least one more bug fixed in 4.8 or later. Thanks, Shawn
Re: Solr CPU Usage
Here is a quick way you can identify which thread is taking up all your CPU:

1) Look at top (or htop) sorted by CPU usage and with threads toggled on - hit capital 'H'.
2) Get the native process ids of the threads taking up a lot of CPU.
3) Convert that number to hex using a converter: http://www.mathsisfun.com/binary-decimal-hexadecimal-converter.html
4) Use the hex number to identify the problematic threads in the thread dump via the nid= value. So, for example, nid=0x549 would equate to the native thread id of 1353 in top.

Take a thread dump and identify any problematic threads so you can see the stack trace. However, Chris has pointed out that there is as yet no evidence your outage is related to CPU overload.

Greg

On Thu, Aug 28, 2014 at 6:45 PM, rulinma ruli...@gmail.com wrote: I think the configs are not tuned well. Can you use JMX to monitor what it is doing?
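On a Linux box with the JDK installed, those steps boil down to something like this (reusing Greg's 1353 / nid=0x549 example; the converter website isn't needed):

    top -H -p <solr-pid>                          # per-thread view; note the PID of the hot thread
    printf '%x\n' 1353                            # convert the decimal thread id to hex -> 549
    jstack <solr-pid> | grep -A 20 'nid=0x549'    # find that thread's stack in the dump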
Re: Using wild characters in query doesn't work with my configuraiton
OK. Do not, repeat NOT, use different tokenizers at index and query time unless you are _very_ sure that you know exactly what the consequences are. Take a look at the admin/analysis page for the field in question and put your values in. You'll see that what's in your index is very different from what's being looked for at query time. Nothing is worth trying until you straighten this out. The other great resource is adding debug=query to your URL and examining the parsed query.

Best, Erick

On Wed, Aug 27, 2014 at 12:08 PM, Romain Pigeyre rpige...@gmail.com wrote:

Hi, I have a small problem using Solr. I can query this: lastName:HK+IE. The result contains this record:

    {
      "customerId": "0003500226598",
      "countryLibelle": "HONG KONG",
      "firstName1": "lC /o",
      "countryCode": "HK",
      "address1": "1F0/",
      "address2": "11-35",
      "storeId": "100",
      "lastName1": "HK IE",
      "city": "HONG KONG",
      "_version_": 1477612965227135000
    }

NB: lastName contains the lastName1 field. When I add * to the same query: lastName:*HK*+*IE*, there is no result. I expected the * character to match 0 to n characters. Here is my configuration:

    <field name="lastName" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <copyField source="lastName1" dest="lastName"/>
    <copyField source="lastName2" dest="lastName"/>

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

I'm using a WhitespaceTokenizerFactory at indexing time in order to keep special characters: /?... After this configuration change, I restarted Solr and re-indexed the data. Does somebody have any idea how to resolve this issue? Thanks a lot -- *-Romain PIGEYRE*
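A minimal illustration of Erick's point -- not a drop-in fix for this particular schema, just the shape of a field type where index and query analysis use the same tokenizer, so that what is in the index lines up with what the query produces:

    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>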
Re: incomplete proximity boost for fielded searches
Feels like a JIRA to me. This _does_ seem weird. If I omit the field qualification, i.e. my query is:

http://10.208.152.231:8080/solr/wkustaldocsphc_A/search?q=title:(Michigan Corporate Income Tax)&debugQuery=true&pf=title&ps=255&defType=edismax

it works fine. I can get the results I think you expect by omitting the field qualifier and defining my default search field:

http://10.208.152.231:8080/solr/wkustaldocsphc_A/search?q=title:(Michigan Corporate Income Tax)&debugQuery=true&pf=title&ps=255&defType=edismax&df=title

But the fact that you get these results feels like a bug, or at least something that I don't understand. Feels like a bug to me - do others agree? Can you raise a JIRA on this? Best, Erick

On Thu, Aug 28, 2014 at 7:41 AM, Burgmans, Tom tom.burgm...@wolterskluwer.com wrote: Consider this query:

http://10.208.152.231:8080/solr/wkustaldocsphc_A/search?q=title:(Michigan Corporate Income Tax)&debugQuery=true&pf=title&ps=255&defType=edismax

The intention is to perform a search in the field title and to apply a proximity boost within a window of 255 words. If I look at the debug information, I see:

<str name="parsedquery">BoostedQuery(boost(+((title:michigan title:corporate title:income title:tax)~4) (title:"corporate income tax"~255)~1.0))</str>

Note that the first search term (michigan) is missing from the proximity boost clause. I can't believe this is intended behavior. Why is edismax splitting off title:Michigan from "Corporate Income Tax" when determining what to use for the proximity boost? Thanks, Tom
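For anyone wanting to reproduce the comparison locally, here is a sketch; the host, core name, and field are placeholders, and it assumes a title field containing the documents of interest:

  # Field-qualified query: the first term gets dropped from the pf phrase
  curl 'http://localhost:8983/solr/docs/select?q=title:(Michigan%20Corporate%20Income%20Tax)&defType=edismax&pf=title&ps=255&debugQuery=true'

  # Workaround: unqualified terms plus df=title keep all four terms in the pf phrase
  curl 'http://localhost:8983/solr/docs/select?q=Michigan%20Corporate%20Income%20Tax&defType=edismax&df=title&pf=title&ps=255&debugQuery=true'

In both cases, compare the parsedquery entries in the debug section of the two responses.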
Re: solr query gives different numFound upon refreshing
First, I want to be sure you're not mixing old-style replication and SolrCloud; your mention of Master/Slave prompts this question. Second, your maxWarmingSearchers error indicates that your commit interval is too short relative to your autowarm times. Try lengthening your autocommit settings (probably soft commit) until you no longer see that error message, and see if the problem goes away. If it doesn't, let us know. Best, Erick

On Thu, Aug 28, 2014 at 9:39 AM, Joshi, Shital shital.jo...@gs.com wrote: Hi Shawn, Thanks for your reply. We did some tests enabling shards.info=true and confirmed that there is no duplicate copy of our index. We have one replica, but many times we see three versions on the Admin GUI/Overview tab: Master (Searching), Master (Replicable), Slave (Searching). All three have different versions and gens. Is that a problem? We constantly see the maxWarmingSearchers exception. The warmup time is 1.5 minutes, but the difference between the openedAt and registeredAt dates is at times more than 4-5 minutes. Is the true searcher time the difference between the two dates rather than the warmupTime?

openedAt: 2014-08-28T16:17:24.829Z
registeredAt: 2014-08-28T16:21:02.278Z
warmupTime: 65727

Thanks for all the help.

-----Original Message----- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Wednesday, August 27, 2014 2:37 PM To: solr-user@lucene.apache.org Subject: Re: solr query gives different numFound upon refreshing

On 8/27/2014 10:44 AM, Bryan Bende wrote: Theoretically this shouldn't happen, but is it possible that the two replicas for a given shard are not fully in sync? Say shard1 replica1 is missing a document that is in shard1 replica2. If you run a query that would hit on that document a bunch of times, sometimes replica1 will handle the request and sometimes replica2 will, and your number of results would change whenever the replica missing the document handles it. You could write a program that compares each replica's documents by querying them with distrib=false. If a replica were out of sync, I would think it would detect that on a restart when comparing itself against the leader for that shard, but I'm not sure.

A replica out of sync is a possibility, but the most common reason for a changing numFound is that the overall distributed index has more than one document with the same uniqueKey value -- different versions of the same document in more than one shard. SolrCloud tries really hard to never end up with replicas out of sync, but either due to highly unusual circumstances or bugs, it could still happen. Thanks, Shawn
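Bryan's per-replica comparison is easy to script. A minimal sketch, assuming two replica cores reachable at the URLs below (hosts, ports, and core names are placeholders) and that comparing total document counts is a sufficient first pass:

  # Query each replica directly, bypassing distributed search with distrib=false
  for core in \
      'http://host1:8983/solr/collection1_shard1_replica1' \
      'http://host2:8983/solr/collection1_shard1_replica2'; do
    printf '%s -> ' "$core"
    curl -s "$core/select?q=*:*&rows=0&distrib=false&wt=json" \
      | grep -o '"numFound":[0-9]*'
  done

If the counts differ, the next step is diffing the uniqueKey values returned by each replica (fl=id with a large rows value, or cursorMark on 4.7+) to find the missing documents.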
Re: Problem with SOLR Collection creation
Ahhh, thanks for bringing closure to this! Whew! Erick

On Thu, Aug 28, 2014 at 10:47 AM, Kaushik kaushika...@gmail.com wrote: The issue I was facing was that there were additional libraries on the classpath that were conflicting and not required. Removed those and the problem disappeared. Thank you, Kaushik

On Thu, Aug 28, 2014 at 11:50 AM, Shawn Heisey s...@elyograg.org wrote: On 8/28/2014 8:28 AM, Kaushik wrote: Hello, We have deployed a solr.war file to a weblogic server. The web.xml has been modified to set the path to the SOLR home as follows:

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-type>java.lang.String</env-entry-type>
  <env-entry-value>D:\SOLR\4.7.0\RegulatoryReview</env-entry-value>
</env-entry>

The deployment of Solr comes up fine. In the D:\SOLR\4.7.0\RegulatoryReview directory we have an RR folder, under which the conf directory with the required config files is present (solrconfig.xml, schema.xml, etc.). But when I try to add the collection to SOLR through the admin console, I get the following error:

Thursday, August 28, 2014 10:06:37 AM ERROR SolrCore org.apache.solr.common.SolrException: Error CREATEing SolrCore 'RegulatoryReview': Unable to create core: RegulatoryReview Caused by: class org.apache.solr.search.LRUCache

It would seem there's a problem with the cache config in your solrconfig.xml, or that there's some kind of problem with the Solr jars contained within the war. No testing is done with weblogic, so it's always possible it's a class conflict with weblogic itself, but I would bet on a config problem first.

The issue, I believe, is that it is trying to find D:\SOLR\4.7.0\RegulatoryReview\RR\solrconfig.xml, ignoring the conf directory in which it should be finding it. What am I doing wrong?

This is SOLR-5814, a bug in the log messages, not the program logic. I thought it had been fixed by 4.8, but the issue is still unresolved. https://issues.apache.org/jira/browse/SOLR-5814 Thanks, Shawn
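For reference, the admin console's Add Core action goes through the CoreAdmin CREATE API, which can also be exercised directly to take the console out of the picture. A sketch, assuming Solr is deployed under /solr at localhost:7001 (host and port are placeholders for your weblogic instance):

  # Create the core explicitly; instanceDir is resolved relative to the solr home
  curl 'http://localhost:7001/solr/admin/cores?action=CREATE&name=RegulatoryReview&instanceDir=RR'

With instanceDir=RR, Solr should look for RR\conf\solrconfig.xml under D:\SOLR\4.7.0\RegulatoryReview, so the path in the error message is just the misleading log output that SOLR-5814 describes.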