Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
Will check later whether using different data dirs for the core on each instance helps. But because each Solr sits in its own OpenVZ instance (a virtual server, respectively), they should be totally separated, at least from my understanding of virtualization. Will check and get back here... Thanks.

On Wed, Jun 13, 2012 at 8:10 PM, Mark Miller markrmil...@gmail.com wrote:

That's an interesting data dir location: NativeFSLock@/home/myuser/data/index/write.lock

Where are the other data dirs located? Are you sharing one drive or something? It looks like something already has a writer lock - are you sure another Solr instance is not running somehow?

On Wed, Jun 13, 2012 at 11:11 AM, Daniel Brügge daniel.brue...@googlemail.com wrote:

BTW: I am running the Solr instances using -Xms512M -Xmx1024M, so not so little memory.

Daniel

On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge daniel.brue...@googlemail.com wrote:

Hi,

I am struggling with creating multiple collections on a 4-instance SolrCloud setup: I have 4 virtual OpenVZ instances with SolrCloud installed on each, and a standalone ZooKeeper running on one of them. Loading the Solr configuration into ZK works fine. Then I start up the 4 instances and everything runs smoothly.

After that I add one core with the name e.g. '123'. This core is correctly visible on the instance I used for creating it. It maps like '123' / shard1 -> virtual-instance-1.

After that I create a core with the same name '123' on the second instance. It is created, but after a while an exception is thrown and the cluster state of the newly created core goes to 'recovering':

    123:{shard1:{
        virtual-instance-1:8983_solr_123:{
          shard:shard1,
          roles:null,
          leader:true,
          state:active,
          core:123,
          collection:123,
          node_name:virtual-instance-1:8983_solr,
          base_url:http://virtual-instance-1:8983/solr},
        virtual-instance-2:8983_solr_123:{
          shard:shard1,
          roles:null,
          state:recovering,
          core:123,
          collection:123,
          node_name:virtual-instance-2:8983_solr,
          base_url:http://virtual-instance-2:8983/solr}}},

The exception thrown is on the first virtual instance:

    Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
    SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:607)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:58)
        at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
        at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
        at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
        ...
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
OK, I think I have found it. When starting the 4 Solr instances via start.jar, I always provided the data directory property via

    -Dsolr.data.dir=/home/myuser/data

After removing this it worked fine. What is weird is that all 4 instances are totally separated, so instance-2 should never conflict with instance-1; they could also be on totally different physical servers.

Thanks. Daniel

On Wed, Jun 13, 2012 at 8:10 PM, Mark Miller markrmil...@gmail.com wrote:

That's an interesting data dir location: NativeFSLock@/home/myuser/data/index/write.lock

Where are the other data dirs located? Are you sharing one drive or something? It looks like something already has a writer lock - are you sure another solr instance is not running somehow?
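[Editor's note: a follow-up below suggests the VMs' filesystems may have been mapped to the same host location. Either way, if solr.data.dir is set explicitly, each instance needs its own path. A minimal sketch of per-instance launch lines; the paths are hypothetical, not from the thread:

    # instance 1
    java -Dsolr.data.dir=/home/myuser/data/instance1 -jar start.jar
    # instance 2
    java -Dsolr.data.dir=/home/myuser/data/instance2 -jar start.jar

With distinct directories, each core writes its own index/write.lock, so NativeFSLock never sees a lock file held by another instance.]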
Starts with Query
I want to find documents whose title starts with a digit. What would the Solr query for this be? I have tried many queries but could not work out the proper one. Note: title is a field in my index.
DIH idle in transaction forever
Hi all,

It seems that DIH always holds two connections open to the database. One of them is almost always 'idle in transaction'. It may sometimes seem to do a little work, but then it goes idle again.

Datasource definition:

    <dataSource name="df-stream-store-ds" jndiName="java:ext_solr_datafeeds_dba"
                type="JdbcDataSource" autoCommit="false" batchSize="1" />

We have a datasource defined in JNDI:

    <no-tx-datasource>
      <jndi-name>ext_solr_datafeeds_dba</jndi-name>
      <security-domain>ext_solr_datafeeds_dba_realm</security-domain>
      <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
      <min-pool-size>0</min-pool-size>
      <max-pool-size>5</max-pool-size>
      <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
      <driver-class>org.postgresql.Driver</driver-class>
      <blocking-timeout-millis>3</blocking-timeout-millis>
      <idle-timeout-minutes>5</idle-timeout-minutes>
      <new-connection-sql>SELECT 1</new-connection-sql>
      <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
    </no-tx-datasource>

If we set autoCommit to true, then we get an OOM on indexing, so that is not an option.

Does anyone have any idea why this happens? I would guess that DIH doesn't close the connection, but reading the code I can't be sure of this. The ResultSet object should close itself once it reaches the end.

Regards,
Jasper
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
On 6/14/2012 2:05 AM, Daniel Brügge wrote:

Will check later whether using different data dirs for the core on each instance helps. But because each Solr sits in its own OpenVZ instance (a virtual server, respectively), they should be totally separated. At least from my understanding of virtualization.

Depending on how your VMs are configured, their filesystems could be mapped to the same place on the host's filesystem. What you describe sounds like this is the case.
Re: Starts with Query
I want to find documents whose title starts with a digit. What would the Solr query for this be? I have tried many queries but could not work out the proper one. Note: title is a field in my index.

Something like this?

    q=title:(1* 2* 3* 4* ... 9*)&q.op=OR
Re: Starts with Query
Are you trying to query for any numeric term at the start of a title, or a specific numeric term at the start of a title? Unless you are using a query parser that supports Lucene's SpanFirstQuery or SpanPositionRangeQuery, you have two choices:

1. Explicitly (or implicitly via a custom update processor) add a marker term so you can match the beginning of the title if you are looking for a specific numeric term, such as "markertext 123".

2. Add a second title field that is a string field type, say title_s, with a copyField from title to title_s, and then do a regex query to check for a digit at the beginning of the string form of the title, or use a trailing wildcard if you know the exact leading numeric value, such as "123 *".

-- Jack Krupansky

-----Original Message----- From: nutchsolruser
Sent: Thursday, June 14, 2012 8:42 AM
To: solr-user@lucene.apache.org
Subject: Starts with Query

I want to find documents whose title starts with a digit. What would the Solr query for this be? I have tried many queries but could not work out the proper one. Note: title is a field in my index.
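[Editor's note: a minimal sketch of option 2 in schema.xml; title_s and the copyField are illustrative, assuming a standard string field type already exists:

    <field name="title_s" type="string" indexed="true" stored="false"/>
    <copyField source="title" dest="title_s"/>

With that in place, the trailing-wildcard variant for a known leading value escapes the embedded space (the value 123 is hypothetical):

    q=title_s:123\ *

This parses as a single wildcard term against the untokenized string field, matching any title that begins with "123 ".]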
RE: DIH idle in transaction forever
Try readOnly="true" in the dataSource configuration. This causes several defaults to get set in the JDBC connection, and often will solve problems like this. (See http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)

Also, try a batch size of 0 to let your JDBC driver pick what it thinks is optimal. This might be better than 1.

There is also an issue in that DIH doesn't explicitly close the ResultSet but relies on closing the connection to implicitly close the child objects. I know when I tried using DIH with Derby a while back this at the least caused some log warnings, and it wouldn't work at all without readOnly="false". Not sure about PostgreSQL.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message----- From: Jasper Floor [mailto:jasper.fl...@m4n.nl]
Sent: Thursday, June 14, 2012 8:21 AM
To: solr-user@lucene.apache.org
Subject: DIH idle in transaction forever

Hi all, It seems that DIH always holds two connections open to the database. One of them is almost always 'idle in transaction'. ...
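[Editor's note: as a concrete sketch, both suggestions applied to the dataSource element from the original mail; the two attribute values are the only changes:

    <dataSource name="df-stream-store-ds" jndiName="java:ext_solr_datafeeds_dba"
                type="JdbcDataSource" readOnly="true" batchSize="0" autoCommit="false" />

readOnly and batchSize are documented JdbcDataSource attributes on the wiki page linked above. Note, though, that the follow-up below reports readOnly="true" backfiring against PostgreSQL in this setup.]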
Re: DIH idle in transaction forever
Actually, readOnly=true makes things worse. What it does (among other things) is:

    c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);

which leads to:

    Caused by: org.postgresql.util.PSQLException: Cannot change transaction isolation level in the middle of a transaction.

because the connection is idle in transaction.

I found this issue: https://issues.apache.org/jira/browse/SOLR-2045. Patching DIH with the code suggested there seems to work.

Regards,
Jasper

On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James james.d...@ingrambook.com wrote:

Try readOnly="true" in the dataSource configuration. This causes several defaults to get set in the JDBC connection, and often will solve problems like this. ...
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
Aha, OK. That was new to me. Will check this. Thanks.

On Thu, Jun 14, 2012 at 3:52 PM, Yury Kats yuryk...@yahoo.com wrote:

On 6/14/2012 2:05 AM, Daniel Brügge wrote: ...

Depending on how your VMs are configured, their filesystems could be mapped to the same place on the host's filesystem. What you describe sounds like this is the case.
Re: Regarding number of documents
I am running a full-import. DIH reported that 1125 documents were added after indexing. This number did not change even after I added the new entries. How do I check the ID for an entry and query it against Solr?

On Wed, Jun 13, 2012 at 10:33 PM, Gora Mohanty g...@mimirtech.com wrote:

On 14 June 2012 04:51, Swetha Shenoy sshe...@gmail.com wrote:

That makes sense. But I added a new entry that showed up in the MySQL results and not in the Solr search results. The count of documents also did not increase after the addition. How can a new entry show up in MySQL results and not as a new document?

Sorry, but this is not very clear: Are you running a full-import, or a delta-import after adding the new entry in MySQL? By any chance, does the new entry have an ID that already exists in the Solr index? What is the number of records that DIH reports after an import is completed?

Regards,
Gora
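[Editor's note: for the "how do I query an ID" part of the question, a direct lookup against the uniqueKey field works; a minimal sketch, assuming the uniqueKey field is named id and using a hypothetical value:

    http://localhost:8983/solr/select?q=id:12345

If the document comes back, the entry made it into the index; if not, it was either skipped during import or replaced by a document with the same key.]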
phrase query and string/keyword tokenizer
I have documents that are word definitions (basically an online dictionary) that can have alternate titles. For example, the document entitled "Read-only memory" might have an alternate title of "ROM". In search results, I want to boost documents with an alternate title that is a case-insensitive exact match for the query text -- e.g. "rom" should work as well. I'm running Solr 3.6 and using edismax. I've gone through a few iterations of this. What I have working best so far is a multi-valued text field for the alternate titles with a big boost:

    <fieldType name="lowerCaseSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="bestMatchTitle" type="lowerCaseSort" indexed="true" stored="false" multiValued="true"/>

This produces great results with single-word searches like the "ROM" example above. It runs into problems with a multi-word alternate title like "Blue Tooth". I have read some of the prior discussions about this, regarding how the query is parsed based on spaces before it gets to the keyword tokenizer for the field type. The question I have is about phrase queries in this case. My request handler has:

    <str name="qf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5 body^1 author^0.5</str>
    <str name="pf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5 body^1 author^0.5</str>

When I run a query, I get this:

    +((DisjunctionMaxQuery((metaDescription:blue^1.5 | summary:blue^3.0 | author:blue^0.5 | body:blue | title:blue^5.0 | bestMatchTitle:blue^20.0)~0.01) DisjunctionMaxQuery((metaDescription:tooth^1.5 | summary:tooth^3.0 | author:tooth^0.5 | body:tooth | title:tooth^5.0 | bestMatchTitle:tooth^20.0)~0.01))~2) DisjunctionMaxQuery((metaDescription:"blue tooth"~100^1.5 | summary:"blue tooth"~100^3.0 | body:"blue tooth"~100 | title:"blue tooth"~100^5.0)~0.01)

It looks like the phrase isn't being matched against my bestMatchTitle field. It also isn't matched against author, which is type string. So do phrases only get matched against certain field types? When I put quotes in the query text:

    /select/?qt=best-match&q="blue+tooth"&debugQuery=on

it builds the query I was hoping to get:

    +DisjunctionMaxQuery((metaDescription:"blue tooth"^1.5 | summary:"blue tooth"^3.0 | author:"blue tooth"^0.5 | body:"blue tooth" | title:"blue tooth"^5.0 | bestMatchTitle:"blue tooth"^20.0)~0.01)

But I still need the query on the individual tokens, otherwise it eliminates results that may be good hits. So far, any way I have tried to combine the two queries either opens up matching a ton of documents that shouldn't really match (e.g. total found goes from 24 to 4800+ documents) or doesn't match the one I want, giving poor results.

Does anyone have suggestions for how I can convince the phrase query to match against my bestMatchTitle field, or change the query text I'm passing in to combine these two queries and get the boost I want? Or is there another approach altogether that I'm missing? Thanks for any help with this.

-Cat Bieber
Re: FilterCache - maximum size of document set
Hmmm, your maxSize is pretty high; it may just be that you've set this much higher than is wise. The maxSize setting governs the number of entries. I'd start with a much lower number here, and monitor the solr/admin page for both hit ratio and evictions. Well, and size too: 16,000 entries puts a ceiling of, what, 48G on it? Ouch! It sounds like what's happening here is that you're just accumulating more and more fqs over the course of the evening and blowing memory. Not all fqs will be that big; there are some heuristics in there to just store the document numbers for sparse filters, but maxDocs/8 is pretty much the upper bound.

Evictions are not necessarily a bad thing; the hit ratio is important here. And if you're using a bare NOW in your filter queries, you're probably never re-using them anyway, see:
http://www.lucidimagination.com/blog/2012/02/23/date-math-now-and-filter-queries/

I really question whether this limit is reasonable, but you know your situation best.

Best
Erick

On Wed, Jun 13, 2012 at 5:40 PM, Pawel Rog pawelro...@gmail.com wrote:

Thanks for your response. Yes, maybe you are right. I thought that filters could be larger than 3M. Do all kinds of filters use a BitSet? Moreover, maxSize of filterCache is set to 16000 in my case. There are evictions during day traffic but not during night traffic. The version of Solr which I use is 3.5. I haven't used Memory Analyzer yet. Could you write more details about it?

--
Regards, Pawel

On Wed, Jun 13, 2012 at 10:55 PM, Erick Erickson erickerick...@gmail.com wrote:

Hmmm, I think you may be looking at the wrong thing here. Generally, a filterCache entry will be maxDocs/8 (plus some overhead), so in your case they really shouldn't be all that large, on the order of 3M/filter. That shouldn't vary based on the number of docs that match the fq; it's just a bitset. To see if that makes any sense, take a look at the admin page and the number of evictions in your filterCache. If that is 0, you're probably using all the memory you're going to in the filterCache during the day. But you haven't indicated what version of Solr you're using; I'm going from a relatively recent 3.x knowledge base. Have you put a memory analyzer against your Solr instance to see where the memory is being used?

Best
Erick

On Wed, Jun 13, 2012 at 1:05 PM, Pawel pawelmis...@gmail.com wrote:

Hi, I have a Solr index with about 25M documents. I optimized the filterCache size to reach the best performance (considering the traffic characteristics that my Solr handles). I see that the only way to limit the size of a filter cache is to set the number of document sets that Solr can cache; there is no way to set a memory limit (e.g. 2GB, 4GB, or something like that). When I process standard traffic (during the day) everything is fine, but when Solr handles night traffic (and the characteristics of the requests change) some problems appear: a JVM out-of-memory error. I know what the reason is. Some filters on some fields are quite poor filters; they return 15M documents or even more. You could say 'Just put that into q'. I tried to put those filters into the query part, but then the statistics of request processing time (during the day) become much worse. Reducing the filterCache maxSize is also not a good solution because during the day the cached filters are very, very helpful. You could be interested in the type of filters that I use. These are range filters (I tried standard range filters and frange), e.g. price:[* TO 1]. Some fqs with price can return a few thousand results (e.g. price:[40 TO 50]), but some (e.g. price:[* TO 1]) can return millions of documents. I'd also like to avoid a solution which introduces strict ranges that the user can choose. Have you any suggestions what I can do? Is there any way to limit, for example, the maximum size of a docSet which is cached in the filterCache?

--
Pawel
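[Editor's note: the bound Erick is describing is the size attribute of the filterCache element in solrconfig.xml; a minimal sketch with illustrative numbers, not a recommendation for this index:

    <filterCache class="solr.FastLRUCache" size="2000" initialSize="512" autowarmCount="128"/>

Each cached entry for a 25M-document index can occupy up to roughly 25M/8 ≈ 3 MB as a bitset, so the memory ceiling scales linearly with size, which is where the ~48G figure above comes from.]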
Re: Regarding number of documents
Here's a quick thing to check. Delete your index and do a fresh import. Then go to admin/statistics and check the numDocs and maxDocs entries. If they're different, it means that some of your documents have been deleted. Deleted, you say? What's that about? Well, if more than one record has the same uniqueKey (see schema.xml), then the first doc is overwritten by the second. But this is really a delete of the old doc followed by an add.

NOTE: This won't show any difference if you optimize, so don't optimize for this test.

The fact that this isn't changing even after you add new entries probably means you're indexing documents with the same uniqueKey.

Hope this helps
Erick

On Thu, Jun 14, 2012 at 12:03 PM, Swetha Shenoy sshe...@gmail.com wrote:

I am running a full-import. DIH reported that 1125 documents were added after indexing. This number did not change even after I added the new entries. How do I check the ID for an entry and query it against Solr?

On Wed, Jun 13, 2012 at 10:33 PM, Gora Mohanty g...@mimirtech.com wrote: ...
Re: solrj library requirements: slf4j-jdk14-1.5.5.jar
What is the version of SolrJ you are trying to get working? If you download version 3.6 of Solr, there's a directory dist/solrj-lib in the binary release artifact that includes the required dependencies. I would start with those.

--
Sami Siren

On Wed, Jun 6, 2012 at 5:34 PM, Welty, Richard rwe...@ltionline.com wrote:

The section of the SolrJ wiki page on setting up the classpath calls for slf4j-jdk14-1.5.5.jar, which is supposed to be in a lib/ subdirectory. I don't see this jar, or any like it with a different version, anywhere in either the 3.5.0 or 3.6.0 distributions. Is it really needed, or is this just slightly outdated documentation? The top of the page (which references Solr 1.4) suggests this is true, and I see other docs on the web suggesting this is the case, but the first result that pops out of Google for solrj is the apparently outdated wiki page, so I imagine others will encounter the same issue. The other, more recent pages are not without issues as well; for example, this page:

http://lucidworks.lucidimagination.com/display/solr/Using+SolrJ

references apache-solr-common, which I'm not finding either.

thanks,
richard
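[Editor's note: another way to avoid chasing individual jars, for projects built with Maven, is to declare the solr-solrj artifact and let the transitive dependencies come in through dependency management; a sketch, assuming version 3.6.0 (you still choose an slf4j binding yourself):

    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-solrj</artifactId>
      <version>3.6.0</version>
    </dependency>
]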
Re: defaultSearchField not working after upgrade to solr3.6
: Correct. In 3.6 it is simply ignored. In 4.x it currently does work.

That's not true. The example configs in Solr 3.6 no longer mention defaultSearchField, but Solr 3.6 will still respect a <defaultSearchField/> declaration if it exists in your schema.xml -- I just verified this by running Solr 3.6 using the Solr 3.5 example configs.

The only change SOLR-2274 made to the *CODE* in 3.6 was to improve the wording in the logs/error messages to better distinguish when it was referring to the df param vs the <defaultSearchField/>.

Rohit: if you are running 3.6 with a schema.xml that contains a defaultSearchField and you are seeing a failure related to not finding the default field, please post your schema.xml and the stack trace of the error.

-Hoss
Re: defaultSearchField and param df are messed up in 3.6.x
: So if defaultSearchField has been removed (deprecated) from schema.xml, then why
: are there still calls to org.apache.solr.schema.IndexSchema.getDefaultSearchFieldName()?

Because even though the syntax is deprecated/discouraged in schema.xml, we don't want things to break for existing users who have it in their schema.xml -- hence the method is still called. If you upgrade from a previous version, your old configs should still work; if you start from scratch with the Solr 3.6 example, then you should follow the lead of the Solr 3.6 example and specify df/qf as appropriate for your use case.

There are certainly improvements that can be made in how the chain of defaults works (hence SOLR-3534), but I don't see any way that this change broke anything for existing users. If you can provide an example of a query + configs that worked in Solr 3.5 but doesn't work in Solr 3.6, then please, please, please file a bug with that information so we can understand what happened.

-Hoss
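[Editor's note: for anyone following along, the 3.6-style replacement for <defaultSearchField/> is a df default on the request handler in solrconfig.xml; a minimal sketch, with "text" standing in for whatever your default field actually is:

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="df">text</str>
      </lst>
    </requestHandler>
]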
How to boost a field with another field's value?
I have 2 fields in my schema, e.g. a long field field1 and a long field field2. I'd like my boost query to be such that field1 is boosted by the value of field2 for each document. What should the query-time boost for this look like? I was able to do this using index-time boosting with the DataImportHandler, but couldn't figure out how to do this using query-time boosting. Thanks!
Re: How to boost a field with another field's value?
See FunctionQuery: http://wiki.apache.org/solr/FunctionQuery

If you are using the dismax or edismax query parser, you can use the bf request parameter, e.g.:

    q=foo&bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3

-- Jack Krupansky

-----Original Message----- From: smita
Sent: Thursday, June 14, 2012 4:40 PM
To: solr-user@lucene.apache.org
Subject: How to boost a field with another field's value?

I have 2 fields in my schema, e.g. a long field field1 and a long field field2. I'd like my boost query to be such that field1 is boosted by the value of field2 for each document. ...
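[Editor's note: applied to the original question, a function of the two fields can go directly into bf; a minimal sketch, assuming field1 and field2 are numeric and that multiplying them expresses the intended boost:

    q=foo&defType=edismax&bf=product(field1,field2)

bf adds the function's value to the relevancy score; edismax's boost parameter (boost=product(field1,field2)) is the multiplicative counterpart if you want to scale the whole score instead.]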
Re: defaultSearchField not working after upgrade to solr3.6
Hmmm... how could I have gotten so confused?!?! Actually, I recognized my mistake yesterday (after reading the code some more for David's Jira issue) but hadn't gotten around to correcting myself.

In any case, the original problematic scenario may have been simply copying a 3.5 request handler/params to the 3.6 example solrconfig but not realizing that the deprecated defaultSearchField element needed to be uncommented in the 3.6 schema. A second scenario was in fact setting defaultSearchField but it not working, because the request handler for 3.6 had set df to "text" and the code won't check defaultSearchField if df is set.

-- Jack Krupansky

-----Original Message----- From: Chris Hostetter
Sent: Thursday, June 14, 2012 4:05 PM
To: solr-user@lucene.apache.org
Subject: Re: defaultSearchField not working after upgrade to solr3.6

That's not true. The example configs in Solr 3.6 no longer mention defaultSearchField, but Solr 3.6 will still respect a <defaultSearchField/> declaration if it exists in your schema.xml. ...
Re: Regarding number of documents
Thanks all, for your inputs. We found what the problem was: the reason certain entries were missing from the index but not from the MySQL results was that we had some customized transformers in the data config that skipped entries when a particular field was missing.

On Thu, Jun 14, 2012 at 1:28 PM, Erick Erickson erickerick...@gmail.com wrote:

Here's a quick thing to check. Delete your index and do a fresh import. Then go to admin/statistics and check the numDocs and maxDocs entries. ...
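[Editor's note: this kind of silent skipping is easy to reproduce with DIH's ScriptTransformer; a hypothetical sketch (the field name and script are illustrative, not the poster's actual config) of a transformer that drops rows via the $skipDoc flag:

    <script><![CDATA[
      function skipIfMissing(row) {
        // Hypothetical rule: drop any row that has no 'title' value.
        if (row.get('title') == null) {
          row.put('$skipDoc', 'true');
        }
        return row;
      }
    ]]></script>

    <entity name="item" transformer="script:skipIfMissing" query="...">

Rows skipped this way never reach the index, and DIH's "documents added" count simply excludes them, which matches the symptom described above.]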
Re: PageRanking with DIH
: I have computed PageRank offline for a document set dump. I ideally
: want to use PageRank and the Solr relevancy score together in a formula to
: sort Solr search results. I have already looked at
: http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_increase_the_score_for_specific_documents
: and found that index-time boost is useful. I want to know how I can use
: index-time boost.

I would strongly suggest that instead of using an index-time boost you use a boost function on a numeric field (the very next section of that SolrRelevancyFAQ).

I've updated the page to try and make this alternative method more obvious, and mentioned the use of ExternalFileField (for the case where you want to be able to update these rankings w/o reindexing).

http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_increase_the_score_for_specific_documents
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_change_the_score_of_a_document_based_on_the_.2Avalue.2A_of_a_field_.28say.2C_.22popularity.22.29

-Hoss
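[Editor's note: a minimal sketch of the ExternalFileField route, with illustrative field and file names. Declare the type and field in schema.xml:

    <fieldType name="pagerankFile" class="solr.ExternalFileField"
               keyField="id" defVal="0" valType="pfloat"/>
    <field name="pagerank" type="pagerankFile" indexed="false" stored="false"/>

Then put the scores in a file named external_pagerank in the data directory, one "uniqueKey=value" line per document:

    doc1=0.82
    doc2=0.13

and reference the field in a boost function (e.g. bf=pagerank with dismax/edismax). Updating the rankings is then a matter of replacing the file and reopening the searcher, with no reindexing.]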
Re: FilterCache - maximum size of document set
It may be true that the filter cache maxSize is set to too high a value. We looked at evictions and hit rate earlier, and maybe you are right that evictions are not always unwanted. Some time ago we ran tests: there is not such a big difference in hit rate between a filterCache maxSize of 4000 (hit rate about 85%) and 16000 (hit rate about 91%). I think that using the LFU cache could also be helpful, but that requires me to migrate to 3.6. Do you think it is reasonable to run a slave on version 3.6 and the master on 3.5?

Once again, thanks for your help

--
Pawel

On Thu, Jun 14, 2012 at 7:22 PM, Erick Erickson erickerick...@gmail.com wrote:

Hmmm, your maxSize is pretty high; it may just be that you've set this much higher than is wise. The maxSize setting governs the number of entries. ...
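[Editor's note: if the migration happens, the LFU alternative introduced in 3.6 is a drop-in class swap in solrconfig.xml; a sketch using the lower bound discussed above:

    <filterCache class="solr.LFUCache" size="4000" initialSize="512" autowarmCount="128"/>

LFU evicts the least-frequently-used entries rather than the least-recently-used ones, which should tend to protect heavily reused daytime filters from being pushed out by one-off nighttime queries.]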