Re: SolrJ dependencies
Done, see: https://issues.apache.org/jira/browse/SOLR-3541

On 12-6-2012 18:39, Sami Siren wrote:

On Tue, Jun 12, 2012 at 4:22 PM, Thijs vonk.th...@gmail.com wrote:

Hi, I just checked out and built Solr/Lucene from branches/lucene_4x. I wanted to upgrade my custom client to this new version (using SolrJ), so I copied lucene/solr/dist/apache-solr-solrj-4.0-SNAPSHOT.jar and lucene/solr/dist/apache-solr-core-4.0-SNAPSHOT.jar to my project, and I updated the other libs from the jars in /solr/dist/solrj-lib. However, when I wanted to run my client I got exceptions indicating that I was missing the HttpClient jars (httpclient, httpcore, httpmime). Shouldn't those go into lucene/solr/dist/solrj-lib as well?

Yes, they should.

Do I need to create a ticket for this?

Please do so.

--
Sami Siren
Re: Solr PHP highload search
How much memory are you giving the JVM? Have you put a performance monitor on the running process to see what resources have been exhausted (i.e. are you I/O bound? CPU bound?)

Best
Erick

On Tue, Jun 12, 2012 at 3:40 AM, Alexandr Bocharov bocharov.alexa...@gmail.com wrote:

Hi, all. I need advice on configuring Solr search for high-load production use. I've written a user search engine (a PHP class) that uses over 70 parameters for searching users. The user database is over 30 million records. The total index size is 6.4G when I use 1 node and 3.2G when I use 2 nodes. The previous search engine could handle 700,000 queries per day for searching users - that is ~8 queries/sec (4 MySQL servers with manual sharding via Gearman).

An example query:

[responseHeader] => SolrObject Object
(
    [status] => 0
    [QTime] => 517
    [params] => SolrObject Object
    (
        [bq] => Array
        (
            [0] => bool_field1:1^30
            [1] => str_field1:str_value1^15
            [2] => tint_field1:tint_field1^5
            [3] => bool_field2:1^6
            [4] => date_field1:[NOW-14DAYS TO NOW]^20
            [5] => date_field2:[NOW-14DAYS TO NOW]^5
        )
        [indent] => on
        [start] => 0
        [q.alt] => *:*
        [wt] => xml
        [fq] => Array
        (
            [0] => tint_field2:[tint_value2 TO tint_value22]
            [1] => str_field1:str_value1
            [2] => str_field2:str_value2
            [3] => tint_field3:(tint_value3 OR tint_value32 OR tint_value33 OR tint_value34 OR tint_value5)
            [4] => tint_field4:tint_value4
            [5] => -bool_field1:[* TO *]
        )
        [version] => 2.2
        [defType] => dismax
        [rows] => 10
    )
)

I tested my PHP search API and found that concurrent random queries - for example, 10 queries at one time - increase QTime from an average of 500 ms to 3000 ms on 2 nodes.

1. How can I tweak my queries, parameters, or Solr's config to decrease QTime?
2. What if I put my index data in an emulated RAM directory - can that greatly increase performance?
3. Sorting by boost queries has a great influence on QTime; how can I optimize boost queries?
4. If I split my 2 nodes on 2 machines into 6 nodes on 2 machines (3 nodes per machine), will it increase performance?
5. What is a multi-core query, how can I configure it, and will it increase performance?

Thank you!
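The bq and fq parameters shown in the dump above can be assembled programmatically; here is a minimal sketch (field values are hypothetical placeholders, not taken from a real index) of how the same dismax request maps to a query string. Stable restrictions belong in fq, which Solr caches independently in its filterCache, while bq entries only influence ranking:

```python
from urllib.parse import urlencode

# fq clauses are cached filters; bq clauses are additive boost queries.
params = [
    ("q.alt", "*:*"),
    ("defType", "dismax"),
    ("rows", "10"),
    ("fq", "tint_field2:[10 TO 20]"),              # hypothetical values
    ("fq", "-bool_field1:[* TO *]"),
    ("bq", "bool_field1:1^30"),
    ("bq", "date_field1:[NOW-14DAYS TO NOW]^20"),
]
query_string = urlencode(params)
print(query_string)
```

Repeating the tuple key, as done for fq and bq above, is how multi-valued Solr parameters are expressed in a single request.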
Re: Solr PHP highload search
Thank you for the help :) I'm giving the JVM 2048M for each node. CPU load jumps between 70-90%. Memory usage increases to the max during testing (probably the cache is filling). I didn't monitor I/O. I'd also like to see answers to my other questions.

2012/6/13 Erick Erickson erickerick...@gmail.com:

How much memory are you giving the JVM? Have you put a performance monitor on the running process to see what resources have been exhausted (i.e. are you I/O bound? CPU bound?)

Best
Erick
Re: Exception when optimizing index
On Thu, Jun 7, 2012 at 5:50 AM, Rok Rejc rokrej...@gmail.com wrote:

java.runtime.name: OpenJDK Runtime Environment
java.runtime.version: 1.6.0_22-b22
...
As far as I can see from the JIRA issue, I have the patch attached (as mentioned, I have a trunk version from May 12). Any ideas?

It's not guaranteed that the patch will work around all hotspot bugs related to http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5091921. Since you can reproduce this, is it possible for you to re-test the scenario with a newer JVM (e.g. 1.7.0_04), just to rule that out?

--
lucidimagination.com
Re: Solr PHP highload search
Consider just looking at it with jconsole (it should be in your Java release) to get a sense of the memory usage/collection. How much physical memory do you have overall? This is not what I'd expect. Your CPU load is actually reasonably high, so it doesn't look like you're swapping. By and large, trying to use RAMDirectories isn't a good solution; between the OS and Solr, they read the necessary parts of your index into memory and use that.

Best
Erick

On Wed, Jun 13, 2012 at 7:13 AM, Alexandr Bocharov bocharov.alexa...@gmail.com wrote:

Thank you for the help :) I'm giving the JVM 2048M for each node. CPU load jumps between 70-90%. Memory usage increases to the max during testing (probably the cache is filling). I didn't monitor I/O. I'd also like to see answers to my other questions.
Re: Sharding in SolrCloud
Mark Miller markrmil...@gmail.com schrieb am 12.06.2012 19:19:01:

On Jun 12, 2012, at 3:39 AM, lenz...@gfi.ihk.de wrote:

Hello, we tested SolrCloud in a setup with one collection, two shards, and one replica per shard, and it works quite well with some example data. Now we plan to set up our own collection and determine how many shards we should divide it into. We can estimate the size of the collection quite exactly, but we don't know what the best approach for sharding is, even if we know the size and the volume of queries and updates. Is there any documentation or any kind of design guidelines for sharding a collection in SolrCloud?

Thanks, regards, Norman Lenzner

It's hard to tell - I think you want to start with an idea of how many docs you can fit on a single node. This can vary wildly depending on many factors. Generally you have to do some testing with your particular config and data. You can search the mailing lists and perhaps dig up a little info, but there is really no replacement for running some tests with real data. Then you have to plan in your growth rate - resharding is naturally a relatively expensive operation. Once you have an idea of how many docs per machine seems comfortable, figure out how many machines you need given your estimated doc growth rate, plus perhaps some padding. You might not get it right, but if you expect the possibility of a lot of growth, erring on the side of more shards is obviously better.

- Mark Miller
lucidimagination.com

Hello, and thanks for your reply. We will run some tests to determine the size of our collection, but I think there won't be a need for a second shard at all. The problem is not the size or growth of the docs; rather, there will be a quite high update frequency. So if we have many bulk updates, is it reasonable to distribute the update load across multiple shards?

Thanks, regards, Norman Lenzner
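The sizing advice in this thread - measure a comfortable docs-per-node figure, then plan for growth plus padding - reduces to simple arithmetic. A minimal sketch, where every number is a made-up illustration rather than a measured value:

```python
import math

# Hypothetical figures - measure these against your own config and data.
docs_per_node = 30_000_000   # comfortable docs per node, found by load testing
current_docs  = 50_000_000   # docs in the corpus today
yearly_growth = 0.5          # expected growth rate over the planning horizon
padding       = 1.2          # headroom, since resharding is expensive

docs_in_one_year = current_docs * (1 + yearly_growth)
shards_needed = math.ceil(docs_in_one_year * padding / docs_per_node)
print(shards_needed)  # shards to provision up front
```

With these example numbers the calculation lands on 3 shards; the point is to err on the high side, since adding shards later is the expensive direction.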
Re: Different sort for each facet
Hmm, it seems that if I leave off the initial facet.sort=index then it will sort each facet by index by default, and I can use f.people.facet.sort=count as expected. I thought I tried that yesterday, but I suppose it slipped my mind in my sleep-deprived state. Thanks Jack!

-- Chris

On Tue, Jun 12, 2012 at 10:58 PM, Jack Krupansky j...@basetechnology.com wrote:

f.people.facet.sort=count should work. Make sure you don't have a conflicting setting for that same field and attribute. Does the people facet sort by count correctly with facet.sort=index? What are the attributes and field type for the people field?

-- Jack Krupansky

-----Original Message----- From: Christopher Gross Sent: Tuesday, June 12, 2012 11:05 AM To: solr-user Subject: Different sort for each facet

In Solr 3.4, is there a way I can sort two facets differently in the same query? If I have:

http://mysolrsrvr/solr/select?q=*:*&facet=true&facet.field=people&facet.field=category

is there a way that I can sort people by count and category by name, all in one query? Or do I need to do that in separate queries? I tried using f.people.facet.sort=count while also having facet.sort=index, but both came back in alphabetical order. Doing more queries is OK; I'm just trying to avoid having to do too many.

-- Chris
LockObtainFailedException after trying to create cores on second SolrCloud instance
Hi,

I am struggling with creating multiple collections on a 4-instance SolrCloud setup: I have 4 virtual OpenVZ instances, with SolrCloud installed on each, and on one a standalone ZooKeeper is also running.

Loading the Solr configuration into ZK works fine. Then I start up the 4 instances and everything runs smoothly. After that I add one core with the name e.g. '123'. This core is correctly visible on the instance I used for creating it. It maps like '123' shard1 -> virtual-instance-1.

After that I create a core with the same name '123' on the second instance. It creates it, but an exception is thrown after a while and the cluster state of the newly created core goes to 'recovering':

123:{shard1:{
    virtual-instance-1:8983_solr_123:{
      shard:shard1,
      roles:null,
      leader:true,
      state:active,
      core:123,
      collection:123,
      node_name:virtual-instance-1:8983_solr,
      base_url:http://virtual-instance-1:8983/solr},
    virtual-instance-2:8983_solr_123:{
      shard:shard1,
      roles:null,
      state:recovering,
      core:123,
      collection:123,
      node_name:virtual-instance-2:8983_solr,
      base_url:http://virtual-instance-2:8983/solr}}},

The exception is thrown on the first virtual instance:

Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:607)
    at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:58)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
BTW: I am running the Solr instances using -Xms512M -Xmx1024M, so not so little memory.

Daniel

On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge daniel.brue...@googlemail.com wrote:

Hi, I am struggling with creating multiple collections on a 4-instance SolrCloud setup...
Re: Different sort for each facet
I'm glad that you have something working, but you shouldn't have to remove that facet.sort=index. I tried the following and it works with the Solr 3.6 example after indexing exampledocs/books.json:

http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=name&facet.field=genre_s&facet.sort=index&f.name.facet.sort=count

I see the name field sorted by count and the genre_s field sorted in lexical order (note: IT comes before fantasy because upper case sorts before lower case - it would be nice to have a case-neutral sort). Could you try it, just to see whether maybe we are not communicating about what exactly is not working for you? What release of Solr are you using? I am not aware of any fixes/changes that would make this behave differently as of 3.6.

BTW, the default sort is index IFF facet.limit <= 0. The default for facet.limit is 100, so sort should default to count. I presume you have facet.limit set to -1 or 0. You might also check what facet parameters might be set in your request handler, as opposed to on the actual query request.

-- Jack Krupansky

-----Original Message----- From: Christopher Gross Sent: Wednesday, June 13, 2012 9:19 AM To: solr-user@lucene.apache.org Subject: Re: Different sort for each facet

Hmm, it seems that if I leave off the initial facet.sort=index then it will sort each facet by index by default, and I can use f.people.facet.sort=count as expected. I thought I tried that yesterday, but I suppose it slipped my mind in my sleep-deprived state. Thanks Jack!

-- Chris
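The per-field override discussed above (f.&lt;field&gt;.facet.sort beats the global facet.sort) can be seen in the shape of the request itself; a minimal sketch using the field names from this thread:

```python
from urllib.parse import urlencode

# The global facet.sort applies to every facet field unless a
# per-field f.<field>.facet.sort override is present.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "people"),
    ("facet.field", "category"),
    ("facet.sort", "index"),            # category falls back to this
    ("f.people.facet.sort", "count"),   # people is overridden to count
]
facet_qs = urlencode(params)
print(facet_qs)
```

The same pattern generalizes to the other per-field facet parameters (facet.limit, facet.mincount, and so on).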
Re: [DIH] Multiple repeat XPath stmts
TNX. A lifesaver... -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-Multiple-repeat-XPath-stmts-tp499770p3989439.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Getting maximum / minimum field value - slow query
What is more, I tried to get the maximum value using a stats query. This time the response time was about 30 seconds and the server ate 1.5 GB of memory while calculating the response. But there were no statistics in the response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">27578</int>
    <lst name="params">
      <str name="q">*.*</str>
      <str name="stats">true</str>
      <str name="stats.field">Id</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="stats">
    <lst name="stats_fields">
      <null name="Id"/>
    </lst>
  </lst>
</response>

What's wrong here?
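Note that the response shows numFound=0, and the q parameter in it reads *.* rather than the match-all *:* - so the stats component had no documents to aggregate over, which is a plausible explanation for the empty stats block. A minimal sketch (hypothetical host; the Id field name is taken from the thread) of a stats request over all documents:

```python
from urllib.parse import urlencode

# StatsComponent needs matching documents: q=*:* matches everything,
# while a query like *.* can match nothing (hence null stats above).
params = [
    ("q", "*:*"),
    ("stats", "true"),
    ("stats.field", "Id"),   # field to aggregate; should be numeric/indexed
    ("rows", "0"),           # only the stats block is wanted, no documents
]
stats_url = "http://localhost:8983/solr/select?" + urlencode(params)
print(stats_url)
```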
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
That's an interesting data dir location: NativeFSLock@/home/myuser/data/index/write.lock

Where are the other data dirs located? Are you sharing one drive or something? It looks like something already has a writer lock - are you sure another Solr instance is not running somehow?

On Wed, Jun 13, 2012 at 11:11 AM, Daniel Brügge daniel.brue...@googlemail.com wrote:

BTW: I am running the Solr instances using -Xms512M -Xmx1024M, so not so little memory.

Daniel

On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge daniel.brue...@googlemail.com wrote:

Hi, I am struggling with creating multiple collections on a 4-instance SolrCloud setup...
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
What command are you using to create the cores? I had this sort of problem, and it was because I'd accidentally created two cores with the same instanceDir within the same Solr process. Make sure you don't have that kind of collision. The easiest way is to specify an explicit instanceDir and dataDir. Best, Casey Callendrello

On 6/13/12 7:28 AM, Daniel Brügge wrote: Hi, I am struggling with creating multiple collections on a 4-instance SolrCloud setup: I have 4 virtual OpenVZ instances, on each of which I have installed SolrCloud, and on one of them a standalone Zookeeper is also running. Loading the Solr configuration into ZK works fine. Then I start up the 4 instances and everything runs smoothly. After that I add one core with the name e.g. '123'. This core is correctly visible on the instance I used for creating it. It maps like '123' shard1 - virtual-instance-1. After that I create a core with the same name '123' on the second instance; it is created, but after a while an exception is thrown and the cluster state of the newly created core goes to 'recovering':

123:{shard1:{
    virtual-instance-1:8983_solr_123:{
      shard:shard1,
      roles:null,
      leader:true,
      state:active,
      core:123,
      collection:123,
      node_name:virtual-instance-1:8983_solr,
      base_url:http://virtual-instance-1:8983/solr},
    virtual-instance-2:8983_solr_123:{
      shard:shard1,
      roles:null,
      state:recovering,
      core:123,
      collection:123,
      node_name:virtual-instance-2:8983_solr,
      base_url:http://virtual-instance-2:8983/solr}}}

The exception thrown on the first virtual instance is:

Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:607)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:58)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
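Casey's advice about explicit per-core directories can be sketched as a legacy-style solr.xml fragment. This is purely illustrative (the core names and paths are assumptions, not taken from the thread); the point is that no two cores on the same node may share a dataDir, since that is exactly what produces the write.lock contention above.

```xml
<!-- Hypothetical solr.xml sketch: every core gets its own instanceDir and
     dataDir, so no two cores ever contend for the same index write.lock -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="123" instanceDir="cores/123" dataDir="/home/myuser/data/123/index"/>
    <core name="456" instanceDir="cores/456" dataDir="/home/myuser/data/456/index"/>
  </cores>
</solr>
```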
Re: Getting maximum / minimum field value - slow query
Try the query without the sort to get the number of rows, then do a second query using a start equal to the number of rows. That should get you the last row/document. -- Jack Krupansky

-Original Message- From: rafal.gwizd...@gmail.com Sent: Wednesday, June 13, 2012 3:07 PM To: solr-user@lucene.apache.org Subject: Getting maximum / minimum field value - slow query

Hi, I have an index with about 9 million documents. Every document has an integer 'Id' field (it's not the Solr document identifier) and I want to get the maximum value of that field. Therefore I'm doing a search with the following parameters: q=*:*, sort=Id desc, rows=1

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2672</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="rows">1</str>
      <str name="sort">Id desc</str>
    </lst>
  </lst>
  <result name="response" numFound="8747779" start="0">
    <doc>
      <str name="Uid">CRQIncident#45165891</str>
    </doc>
  </result>
</response>

The problem is that it takes quite a long time to get the response (2-10 seconds). Why is it so slow - isn't it a simple index lookup? Best regards RG -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-maximum-minimum-field-value-slow-query-tp3989467.html Sent from the Solr - User mailing list archive at Nabble.com.
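Jack's two-step approach can be read as two concrete /select requests. The sketch below only builds the URLs (host, port, and core path are assumptions); note that with numFound documents, the last one sits at start = numFound - 1.

```java
// Sketch of the two-step approach: first ask for numFound only, then page
// directly to the last document. Host/port are assumptions, not from the thread.
public class LastDocQuery {
    // Build a /select URL; q=*:* matches everything, rows limits the page size.
    static String buildUrl(String base, long start, int rows) {
        return base + "/select?q=*:*&start=" + start + "&rows=" + rows;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        // Step 1: rows=0 returns only the header and numFound, no documents.
        System.out.println(buildUrl(base, 0, 0));
        // Step 2: the thread's numFound was 8747779, so the last doc is at start 8747778.
        System.out.println(buildUrl(base, 8747779L - 1, 1));
    }
}
```

Without a sort, "last" here means last in index order, which is only the maximum Id if documents were indexed in Id order.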
Solr1.4 and threads ....
We've got a tokenizer which is quite explicitly coded on the assumption that it will only be called from one thread at a time. After all, what would it mean for two threads to make interleaved calls to the hasNext() function? Yet, a customer of ours with a gigantic instance of Solr 1.4 reports incidents in which we throw an exception that indicates (we think) that two different threads made interleaved calls. Does this suggest anything to anyone? Other than that we've misanalyzed the logic in the tokenizer and there's a way to make it burp on one thread?
Re: Sharding in SolrCloud
Hmmm, are you sure SolrCloud fits your needs? You say that you think everything will fit on one shard and are worried about bulk updates. In that case I should think regular Solr master/slave (rather than cloud) might be a better fit. Using Cloud and all that goes with it for a single shard is certainly possible, but I question whether it's your best option here. Of course, if NRT is a requirement, then SolrCloud is a much better option. With typical master/slave setups, since your bulk updates are happening on a separate machine, having multiple slaves that query at a given interval seems like it would work, but you'd have to be able to stand, say, 5-10 minute latency... Best Erick

On Wed, Jun 13, 2012 at 7:47 AM, lenz...@gfi.ihk.de wrote: Mark Miller markrmil...@gmail.com wrote on 12.06.2012 19:19:01: On Jun 12, 2012, at 3:39 AM, lenz...@gfi.ihk.de wrote: Hello, we tested SolrCloud in a setup with one collection, two shards and one replica per shard, and it works quite fine with some example data. Now we plan to set up our own collection and determine how many shards we should divide it into. We can estimate quite exactly the size of the collection, but we don't know what the best approach for sharding is, even if we know the size and the amount of queries and updates. Is there any documentation or a kind of design guideline for sharding a collection in SolrCloud? Thanks regards, Norman Lenzner It's hard to tell - I think you want to start with an idea of how many docs you can fit on a single node. This can vary wildly depending on many factors. Generally you have to do some testing with your particular config and data. You can search the mailing lists and perhaps dig up a little info, but there is really no replacement for running some tests with real data. Then you have to plan in your growth rate - resharding is naturally a relatively expensive operation.
Once you have an idea of how many docs per machine you think seems comfortable, figure out how many machines you need given your estimated doc growth rate and perhaps some padding. You might not get it right, but if you expect the possibility of a lot of growth, erring on the more-shards side is obviously better. - Mark Miller lucidimagination.com

Hello and thanks for your reply, We will run some tests to determine the size of our collection, but I think there won't be the need for a second shard at all. The problem is not the size or the growth of the docs, but there will be a quite high update frequency. So, if we have many bulk updates, is it reasonable to distribute the update load over multiple shards? Thanks regards, Norman Lenzner
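Mark's sizing advice reduces to simple arithmetic once you have numbers. The sketch below is illustrative only — the per-node capacity and growth figures are made-up assumptions, since the thread stresses that real numbers only come from testing your own config and data:

```java
// Back-of-envelope shard count: projected docs divided by what one node
// can comfortably hold (both inputs must come from your own load tests).
public class ShardEstimate {
    static int shardsNeeded(long currentDocs, double growthFactor, long docsPerNode) {
        long projected = (long) Math.ceil(currentDocs * (1.0 + growthFactor));
        return (int) ((projected + docsPerNode - 1) / docsPerNode); // ceiling division
    }

    public static void main(String[] args) {
        // Assumed figures: 30M docs today, 50% growth expected,
        // 10M docs/node found comfortable in testing.
        System.out.println(shardsNeeded(30_000_000L, 0.5, 10_000_000L)); // 45M docs -> 5 shards
    }
}
```

Padding on top of this (Mark's "erring on the more shards side") is cheap insurance, since resharding later is expensive.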
Re: FilterCache - maximum size of document set
Hmmm, I think you may be looking at the wrong thing here. Generally, a filterCache entry will be maxDocs/8 (plus some overhead), so in your case they really shouldn't be all that large, on the order of 3M/filter. That shouldn't vary based on the number of docs that match the fq, it's just a bitset. To see if that makes any sense, take a look at the admin page and the number of evictions in your filterCache. If that is 0, you're probably using all the memory you're going to in the filterCache during the day. But you haven't indicated what version of Solr you're using; I'm going from a relatively recent 3.x knowledge base. Have you put a memory analyzer against your Solr instance to see where the memory is being used? Best Erick

On Wed, Jun 13, 2012 at 1:05 PM, Pawel pawelmis...@gmail.com wrote: Hi, I have a Solr index with about 25M documents. I optimized the filterCache size to reach the best performance (considering the traffic characteristics that my Solr handles). I see that the only way to limit the size of a filterCache is to set the number of document sets that Solr can cache. There is no way to set a memory limit (e.g. 2GB, 4GB or something like that). When I process standard traffic (during the day) everything is fine. But when Solr handles night traffic (and the characteristics of the requests change) some problems appear: there is a JVM out-of-memory error. I know what the reason is. Some filters on some fields are quite poor filters; they return 15M documents or even more. You could say 'Just put that into q'. I tried to put those filters into the query part, but then the statistics of request processing time (during the day) became much worse. Reducing the filterCache maxSize is also not a good solution because during the day the cached filters are very, very helpful. You might be interested in the type of filters that I use. These are range filters (I tried standard range filters and frange), e.g. price:[* TO 1]. Some fq with price can return a few thousand results (e.g. price:[40 TO 50]), but some (e.g. price:[* TO 1]) can return millions of documents. I'd also like to avoid a solution which introduces strict ranges that the user can choose from. Have you any suggestions what I can do? Is there any way to limit, for example, the maximum size of a docSet which is cached in the filterCache? -- Pawel
Re: Solr1.4 and threads ....
On Wed, Jun 13, 2012 at 4:38 PM, Benson Margulies bimargul...@gmail.com wrote: Does this suggest anything to anyone? Other than that we've misanalyzed the logic in the tokenizer and there's a way to make it burp on one thread? It might suggest that the different tokenstream instances refer to some shared object that is not thread-safe: we had bugs like this before (e.g. sharing a JDK collator is ok, but ICU ones are not thread-safe, so you must clone them). Because of this we beefed up our base analysis class (http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java) to find thread-safety bugs like this. I recommend just grabbing the test-framework.jar (we release it as an artifact), extending that class, and writing a test like: public void testRandomStrings() throws Exception { checkRandomData(random, analyzer, 10); } (or use the one in the branch, it's even been improved since 3.6) -- lucidimagination.com
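The clone-per-thread fix Robert alludes to can be illustrated outside Lucene with a plain JDK Collator. This is an invented sketch, not Solr code — the JDK Collator stands in for any object that must not be shared across threads (per the thread, it's the ICU collators that actually need this treatment):

```java
// Sketch: keep one prototype, hand each thread its own clone via ThreadLocal,
// so no single instance is ever touched by two threads at once.
import java.text.Collator;
import java.util.Locale;

public class PerThreadCollator {
    private static final Collator PROTOTYPE = Collator.getInstance(Locale.ENGLISH);
    private static final ThreadLocal<Collator> LOCAL =
            ThreadLocal.withInitial(() -> (Collator) PROTOTYPE.clone());

    // Every caller compares with its own thread-confined clone.
    public static int compare(String a, String b) {
        return LOCAL.get().compare(a, b);
    }
}
```

A tokenizer factory can apply the same pattern: clone (or construct) the stateful helper per tokenstream instance instead of referencing one shared object.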
Re: Getting maximum / minimum field value - slow query
A large start value is probably worse performing than the sort (see SOLR-1726). Once the sort field is cached, it'll be quick from then on. Put a warming query in solrconfig for newSearcher and/or firstSearcher that does this sort, and the cache will be built in advance of queries at least. Erik

On Jun 13, 2012, at 16:09, Jack Krupansky wrote: Try the query without the sort to get the number of rows, then do a second query using a start equal to the number of rows. That should get you the last row/document. -- Jack Krupansky
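Erik's warming suggestion looks roughly like the following solrconfig.xml fragment (a sketch: the field name Id and the query come from this thread; the listener syntax is the standard QuerySenderListener form):

```xml
<!-- Run the expensive sort once at startup (firstSearcher) and after each
     commit (newSearcher) so the Id sort cache is warm before user queries -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">Id desc</str><str name="rows">1</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">Id desc</str><str name="rows">1</str></lst>
  </arr>
</listener>
```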
Re: FilterCache - maximum size of document set
Thanks for your response. Yes, maybe you are right. I thought that filters could be larger than 3M. Do all kinds of filters use a BitSet? Moreover, maxSize of the filterCache is set to 16000 in my case. There are evictions during day traffic but not during night traffic. The version of Solr which I use is 3.5. I haven't used Memory Analyzer yet. Could you write more details about it? -- Regards, Pawel

On Wed, Jun 13, 2012 at 10:55 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, I think you may be looking at the wrong thing here. Generally, a filterCache entry will be maxDocs/8 (plus some overhead), so in your case they really shouldn't be all that large, on the order of 3M/filter. That shouldn't vary based on the number of docs that match the fq, it's just a bitset. To see if that makes any sense, take a look at the admin page and the number of evictions in your filterCache. If that is 0, you're probably using all the memory you're going to in the filterCache during the day. But you haven't indicated what version of Solr you're using; I'm going from a relatively recent 3.x knowledge base. Have you put a memory analyzer against your Solr instance to see where the memory is being used? Best Erick
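Erick's maxDocs/8 figure, combined with the maxSize of 16000 mentioned in this thread, makes the night-time OOM easy to reproduce on paper. This is only a worst-case sketch of the arithmetic: real entries carry extra overhead, small filters may be stored more compactly, and the cache rarely fills completely.

```java
// Worst-case filterCache footprint: each cached entry can be a bitset of
// maxDocs bits (maxDocs/8 bytes), regardless of how many docs match the fq.
public class FilterCacheMath {
    static long bytesPerEntry(long maxDocs) {
        return maxDocs / 8; // one bit per document in the index
    }

    static long worstCaseBytes(long maxDocs, long maxSize) {
        return bytesPerEntry(maxDocs) * maxSize; // every slot holds a full bitset
    }

    public static void main(String[] args) {
        long perEntry = bytesPerEntry(25_000_000L);        // ~3.1 MB: Erick's "3M/filter"
        long total = worstCaseBytes(25_000_000L, 16_000L); // ~50 GB if the cache fills
        System.out.println(perEntry + " bytes/entry, " + total + " bytes worst case");
    }
}
```

At roughly 3.1 MB per entry, 16000 full entries would be on the order of 50 GB, which is why a heap sized for daytime hit patterns can blow up when night traffic caches many distinct large filters.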
Regarding number of documents
Hi, I have a data config file that contains the data import query. If I just run the import query against MySQL, I get a certain number of results. I assume that if I run the full-import, I should get the same number of documents added to the index, but I see that it's not the case and the number of documents added to the index is less than what I see from the MySQL query result. Can anyone tell me if my assumption is correct and why the number of documents would be off? Thanks, Swetha
Re: Regarding number of documents
Note: I don't see any errors in the logs when I run the index. On Wed, Jun 13, 2012 at 5:48 PM, Swetha Shenoy sshe...@gmail.com wrote: Hi, I have a data config file that contains the data import query. If I just run the import query against MySQL, I get a certain number of results. I assume that if I run the full-import, I should get the same number of documents added to the index, but I see that it's not the case and the number of documents added to the index are less than what I see from the MySQL query result. Can any one tell me if my assumption is correct and why the number of documents would be off? Thanks, Swetha
Re: Regarding number of documents
Could it be that you are getting records that are not unique? If so, then Solr would just overwrite the non-unique documents. Thanks Afroz On Wed, Jun 13, 2012 at 4:50 PM, Swetha Shenoy sshe...@gmail.com wrote: Note: I don't see any errors in the logs when I run the index. On Wed, Jun 13, 2012 at 5:48 PM, Swetha Shenoy sshe...@gmail.com wrote: Hi, I have a data config file that contains the data import query. If I just run the import query against MySQL, I get a certain number of results. I assume that if I run the full-import, I should get the same number of documents added to the index, but I see that it's not the case and the number of documents added to the index are less than what I see from the MySQL query result. Can any one tell me if my assumption is correct and why the number of documents would be off? Thanks, Swetha
Re: Regarding number of documents
That makes sense. But I added a new entry that showed up in the MySQL results and not in the Solr search results. The count of documents also did not increase after the addition. How can a new entry show up in MySQL results and not as a new document? On Wed, Jun 13, 2012 at 6:26 PM, Afroz Ahmad ahmad@gmail.com wrote: Could it be that you are getting records that are not unique. If so then SOLR would just overwrite the non unique documents. Thanks Afroz On Wed, Jun 13, 2012 at 4:50 PM, Swetha Shenoy sshe...@gmail.com wrote: Note: I don't see any errors in the logs when I run the index. On Wed, Jun 13, 2012 at 5:48 PM, Swetha Shenoy sshe...@gmail.com wrote: Hi, I have a data config file that contains the data import query. If I just run the import query against MySQL, I get a certain number of results. I assume that if I run the full-import, I should get the same number of documents added to the index, but I see that it's not the case and the number of documents added to the index are less than what I see from the MySQL query result. Can any one tell me if my assumption is correct and why the number of documents would be off? Thanks, Swetha
Re: Regarding number of documents
Check the ID for that latest record and try to query it in Solr. One way you can get multiple records in an RDBMS query is via join. In that case, each of the records could have the same value in the column(s) that you are using for your unique key field in Solr. -- Jack Krupansky -Original Message- From: Swetha Shenoy Sent: Wednesday, June 13, 2012 7:21 PM To: solr-user@lucene.apache.org Subject: Re: Regarding number of documents That makes sense. But I added a new entry that showed up in the MySQL results and not in the Solr search results. The count of documents also did not increase after the addition. How can a new entry show up in MySQL results and not as a new document? On Wed, Jun 13, 2012 at 6:26 PM, Afroz Ahmad ahmad@gmail.com wrote: Could it be that you are getting records that are not unique. If so then SOLR would just overwrite the non unique documents. Thanks Afroz On Wed, Jun 13, 2012 at 4:50 PM, Swetha Shenoy sshe...@gmail.com wrote: Note: I don't see any errors in the logs when I run the index. On Wed, Jun 13, 2012 at 5:48 PM, Swetha Shenoy sshe...@gmail.com wrote: Hi, I have a data config file that contains the data import query. If I just run the import query against MySQL, I get a certain number of results. I assume that if I run the full-import, I should get the same number of documents added to the index, but I see that it's not the case and the number of documents added to the index are less than what I see from the MySQL query result. Can any one tell me if my assumption is correct and why the number of documents would be off? Thanks, Swetha
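Jack's join explanation is easy to simulate: if several SQL rows share the value used as Solr's uniqueKey, the index keeps one document per key, exactly like a map overwrite. The sketch below is illustrative only (the row data and class name are invented):

```java
// Simulate DIH + uniqueKey semantics: the last row with a given key wins,
// so 5 SQL rows with 3 distinct keys yield only 3 indexed "documents".
import java.util.LinkedHashMap;
import java.util.Map;

public class UniqueKeyDemo {
    static int docCount(String[][] rows) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String[] row : rows) {
            index.put(row[0], row[1]); // row[0] plays the uniqueKey role
        }
        return index.size();
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"1", "doc one"}, {"2", "doc two"}, {"2", "doc two, tag B"},
            {"3", "doc three"}, {"3", "doc three, tag C"},
        };
        // MySQL reports 5 rows, but Solr would end up with 3 documents.
        System.out.println(docCount(rows));
    }
}
```

This is why a join that fans one entity out into multiple rows makes the MySQL row count exceed the Solr document count without any error in the logs.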
Re: Regarding number of documents
On 14 June 2012 04:51, Swetha Shenoy sshe...@gmail.com wrote: That makes sense. But I added a new entry that showed up in the MySQL results and not in the Solr search results. The count of documents also did not increase after the addition. How can a new entry show up in MySQL results and not as a new document? Sorry, but this is not very clear: Are you running a full-import, or a delta-import after adding the new entry in mysql? By any chance, does the new entry have an ID that already exists in the Solr index? What is the number of records that DIH reports after an import is completed? Regards, Gora
Re: Unexpected DIH behavior for onError attribute
On 13 June 2012 10:45, Pranav Prakash pra...@gmail.com wrote: My DIH Config file goes as follows. We have two db hosts, one of which contains blocks of content and the other contain transcripts of those content blocks. The makeDynamicTranscript function is used to create row names like transcript_en, transcript_es and so on, which are dynamic fields in Solr with appropriate tokenizers. [...] This looks fine. Have you looked in the Solr logs for more information? Is it possible that the error is causing some connection issue? What is the error exactly, and is it happening on the SELECT in the inner entity, or on the outer one? Regards, Gora