Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-14 Thread Daniel Brügge
Will check later to use different data dirs for the core on
each instance.
But because each Solr sits in its own OpenVZ instance (a virtual
server, respectively) they should be totally separated. At least
that's my understanding of virtualization.

Will check and get back here...

Thanks.

On Wed, Jun 13, 2012 at 8:10 PM, Mark Miller markrmil...@gmail.com wrote:

 That's an interesting data dir location:
 NativeFSLock@/home/myuser/data/index/write.lock

 Where are the other data dirs located? Are you sharing one drive or
 something? It looks like something already has a writer lock - are you sure
 another solr instance is not running somehow?

 On Wed, Jun 13, 2012 at 11:11 AM, Daniel Brügge 
 daniel.brue...@googlemail.com wrote:

  BTW: I am running the Solr instances using -Xms512M -Xmx1024M
 
  so not so little memory.
 
  Daniel
 
  On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge 
  daniel.brue...@googlemail.com wrote:
 
   Hi,
  
    I am struggling with creating multiple collections on a 4-instance
    SolrCloud setup:
  
    I have 4 virtual OpenVZ instances, with SolrCloud installed on each, and
    a standalone ZooKeeper running on one of them.
  
   Loading the Solr configuration into ZK works fine.
  
    Then I start up the 4 instances and everything also runs smoothly.
  
   After that I am adding one core with the name e.g. '123'.
  
   This core is correctly visible on the instance I have used for creating
   it.
  
    It maps like:

    '123' -> shard1 -> virtual-instance-1
  
  
    After that I create a core with the same name '123' on the second
    instance. It creates it, but an exception is thrown after a while and the
    cluster state of the newly created core goes to 'recovering':
  
  
 "123":{"shard1":{
   "virtual-instance-1:8983_solr_123":{
     "shard":"shard1",
     "roles":null,
     "leader":"true",
     "state":"active",
     "core":"123",
     "collection":"123",
     "node_name":"virtual-instance-1:8983_solr",
     "base_url":"http://virtual-instance-1:8983/solr"},
   "virtual-instance-2:8983_solr_123":{
     "shard":"shard1",
     "roles":null,
     "state":"recovering",
     "core":"123",
     "collection":"123",
     "node_name":"virtual-instance-2:8983_solr",
     "base_url":"http://virtual-instance-2:8983/solr"}}},
  
  
    The exception thrown is on the first virtual instance:
  
    Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
    SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock
    obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:607)
        at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:58)
        at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
        at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
        at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
  
 
 

Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-14 Thread Daniel Brügge
OK, I think I have found it. When starting the 4 Solr instances via
start.jar I always provided the data directory property via

-Dsolr.data.dir=/home/myuser/data

After removing this it worked fine. What is weird is that all 4 instances
are totally separated, so instance-2 should never conflict with
instance-1. They could also be on totally different physical servers.
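
For anyone hitting the same thing, a minimal sketch of a per-instance setup
(the paths and start commands are illustrative assumptions, not the exact
layout from this thread):

# instance 1
java -Dsolr.data.dir=/home/myuser/solr1/data -jar start.jar
# instance 2
java -Dsolr.data.dir=/home/myuser/solr2/data -jar start.jar

With a distinct data dir per instance (or no property at all, letting each
core default to its own instanceDir/data) there is no shared write.lock left
to fight over.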

Thanks. Daniel

On Wed, Jun 13, 2012 at 8:10 PM, Mark Miller markrmil...@gmail.com wrote:

 That's an interesting data dir location:
 NativeFSLock@/home/myuser/data/index/write.lock

 Where are the other data dirs located? Are you sharing one drive or
 something? It looks like something already has a writer lock - are you sure
 another solr instance is not running somehow?

 On Wed, Jun 13, 2012 at 11:11 AM, Daniel Brügge 
 daniel.brue...@googlemail.com wrote:

  BTW: I am running the Solr instances using -Xms512M -Xmx1024M
 
  so not so little memory.
 
  Daniel
 
  On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge 
  daniel.brue...@googlemail.com wrote:
 
   Hi,
  
    I am struggling with creating multiple collections on a 4-instance
    SolrCloud setup:
  
    I have 4 virtual OpenVZ instances, with SolrCloud installed on each, and
    a standalone ZooKeeper running on one of them.
  
   Loading the Solr configuration into ZK works fine.
  
    Then I start up the 4 instances and everything also runs smoothly.
  
   After that I am adding one core with the name e.g. '123'.
  
   This core is correctly visible on the instance I have used for creating
   it.
  
    It maps like:

    '123' -> shard1 -> virtual-instance-1
  
  
    After that I create a core with the same name '123' on the second
    instance. It creates it, but an exception is thrown after a while and the
    cluster state of the newly created core goes to 'recovering':
  
  
 "123":{"shard1":{
   "virtual-instance-1:8983_solr_123":{
     "shard":"shard1",
     "roles":null,
     "leader":"true",
     "state":"active",
     "core":"123",
     "collection":"123",
     "node_name":"virtual-instance-1:8983_solr",
     "base_url":"http://virtual-instance-1:8983/solr"},
   "virtual-instance-2:8983_solr_123":{
     "shard":"shard1",
     "roles":null,
     "state":"recovering",
     "core":"123",
     "collection":"123",
     "node_name":"virtual-instance-2:8983_solr",
     "base_url":"http://virtual-instance-2:8983/solr"}}},
  
  
    The exception thrown is on the first virtual instance:
  
    Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
    SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock
    obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:607)
        at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:58)
        at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
        at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
        at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
 

Starts with Query

2012-06-14 Thread nutchsolruser
I want to find documents whose title starts with a digit. What would the
Solr query for this be? I have tried many queries but have not been able to
construct a proper one.
Note: title is a field in my index.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Starts-with-Query-tp3989627.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH idle in transaction forever

2012-06-14 Thread Jasper Floor
Hi all,

It seems that DIH always holds two connections open to the database.
One of them is almost always 'idle in transaction'. It may sometimes
seem to do a little work but then it goes idle again.


datasource definition:

<dataSource name="df-stream-store-ds"
    jndiName="java:ext_solr_datafeeds_dba" type="JdbcDataSource"
    autoCommit="false" batchSize="1" />

We have a datasource defined in the JNDI:

<no-tx-datasource>
    <jndi-name>ext_solr_datafeeds_dba</jndi-name>
    <security-domain>ext_solr_datafeeds_dba_realm</security-domain>
    <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
    <min-pool-size>0</min-pool-size>
    <max-pool-size>5</max-pool-size>
    <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
    <driver-class>org.postgresql.Driver</driver-class>
    <blocking-timeout-millis>3</blocking-timeout-millis>
    <idle-timeout-minutes>5</idle-timeout-minutes>
    <new-connection-sql>SELECT 1</new-connection-sql>
    <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
</no-tx-datasource>


If we set autocommit to true then we get an OOM on indexing so that is
not an option.

Does anyone have any idea why this happens? I would guess that DIH
doesn't close the connection, but reading the code I can't be sure of
this. The ResultSet object should close itself once it reaches the
end.

mvg,
JAsper


Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-14 Thread Yury Kats
On 6/14/2012 2:05 AM, Daniel Brügge wrote:
 Will check later to use different data dirs for the core on
 each instance.
 But because each Solr sits in its own OpenVZ instance (a virtual
 server, respectively) they should be totally separated. At least
 that's my understanding of virtualization.

Depending on how your VMs are configured, their filesystems could
be mapped to the same place on the host's filesystem. What you describe
sounds like that is the case.
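
One quick way to check (a sketch, assuming stock OpenVZ on the host; the
paths below are the usual defaults, not something confirmed in this thread)
is to look on the hardware node for bind mounts that point both containers
at the same host directory:

# on the OpenVZ host
grep -r myuser /etc/vz/conf/*.mount /etc/vz/conf/*.conf 2>/dev/null
grep /home/myuser /proc/mounts

If both containers resolve /home/myuser/data to the same host path, the two
Solr instances really are sharing one index directory, which would explain
the write.lock conflict.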


Re: Starts with Query

2012-06-14 Thread Ahmet Arslan
 I want to find documents whose title starts with a digit. What would the
 Solr query for this be? I have tried many queries but have not been able
 to construct a proper one.
 Note: title is a field in my index.

Something like this?  q=title:(1* 2* 3* 4* ... 9*)&q.op=OR


Re: Starts with Query

2012-06-14 Thread Jack Krupansky
Are you trying to query for any numeric term at the start of a title or a 
specific numeric term at the start of a title?


Unless you are using a query parser that supports Lucene's SpanFirstQuery or 
SpanPositionRangeQuery, you have two choices:


1. Explicitly (or implicitly via a custom update processor) add a marker 
term so you can match the beginning of the title if you are looking for a 
specific numeric term, such as "markertext 123".


2. Add a second title field that is a string field type, say title_s, with 
a copyField from title to title_s, and then do a regex query to check for a 
digit at the beginning of the string form of the title, or use a trailing 
wildcard if you know the exact leading numeric value, such as "123 *" (a 
rough sketch of this follows below).
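
A sketch of option 2 (field and type names are illustrative; the /regex/
query syntax needs Solr/Lucene 4.x, so on 3.x fall back to the wildcard form
mentioned elsewhere in this thread):

<!-- schema.xml -->
<field name="title_s" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_s"/>

<!-- query time: any title beginning with a digit (4.x regex syntax) -->
q=title_s:/[0-9].*/
<!-- or, on 3.x: -->
q=title_s:(0* 1* 2* 3* 4* 5* 6* 7* 8* 9*)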


-- Jack Krupansky

-Original Message- 
From: nutchsolruser

Sent: Thursday, June 14, 2012 8:42 AM
To: solr-user@lucene.apache.org
Subject: Starts with Query

I want to find documents whose title starts with a digit. What would the
Solr query for this be? I have tried many queries but have not been able to
construct a proper one.
Note: title is a field in my index.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Starts-with-Query-tp3989627.html
Sent from the Solr - User mailing list archive at Nabble.com. 



RE: DIH idle in transaction forever

2012-06-14 Thread Dyer, James
Try readOnly=true in the dataSource configuration.  This causes several 
defaults to get set in the JDBC connection, and often will solve problems like 
this. (see 
http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  
Also, try a batch size of 0 to let your jdbc driver pick what it thinks is 
optimal.  This might be better than 1.
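
In config terms that would be something like the following (a sketch based
on the dataSource definition quoted below, changing only readOnly and
batchSize):

<dataSource name="df-stream-store-ds"
    jndiName="java:ext_solr_datafeeds_dba" type="JdbcDataSource"
    readOnly="true" autoCommit="false" batchSize="0" />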

There is also an issue in that it doesn't explicitly close the ResultSet but 
relies on closing the connection to implicitly close the child objects.  I know 
when I tried using DIH with Derby a while back this had at the least caused 
some log warnings, and it wouldn't work at all without readOnly=false.  Not 
sure about PostgreSQL.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Jasper Floor [mailto:jasper.fl...@m4n.nl] 
Sent: Thursday, June 14, 2012 8:21 AM
To: solr-user@lucene.apache.org
Subject: DIH idle in transaction forever

Hi all,

It seems that DIH always holds two connections open to the database.
One of them is almost always 'idle in transaction'. It may sometimes
seem to do a little work but then it goes idle again.


datasource definition:

<dataSource name="df-stream-store-ds"
    jndiName="java:ext_solr_datafeeds_dba" type="JdbcDataSource"
    autoCommit="false" batchSize="1" />

We have a datasource defined in the JNDI:

<no-tx-datasource>
    <jndi-name>ext_solr_datafeeds_dba</jndi-name>
    <security-domain>ext_solr_datafeeds_dba_realm</security-domain>
    <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
    <min-pool-size>0</min-pool-size>
    <max-pool-size>5</max-pool-size>
    <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
    <driver-class>org.postgresql.Driver</driver-class>
    <blocking-timeout-millis>3</blocking-timeout-millis>
    <idle-timeout-minutes>5</idle-timeout-minutes>
    <new-connection-sql>SELECT 1</new-connection-sql>
    <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
</no-tx-datasource>


If we set autocommit to true then we get an OOM on indexing so that is
not an option.

Does anyone have any idea why this happens? I would guess that DIH
doesn't close the connection, but reading the code I can't be sure of
this. The ResultSet object should close itself once it reaches the
end.

mvg,
JAsper


Re: DIH idle in transaction forever

2012-06-14 Thread Jasper Floor
Actually, the readOnly=true makes things worse.
What it does (among other things) is:
c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);

which leads to:
Caused by: org.postgresql.util.PSQLException: Cannot change
transaction isolation level in the middle of a transaction.

because the connection is idle in transaction.

I found this issue:
https://issues.apache.org/jira/browse/SOLR-2045

Patching DIH with the code they suggest seems to work.
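
For reference, the general JDBC ordering that avoids this particular
PostgreSQL error (a minimal sketch against plain java.sql, not the actual
DIH code from SOLR-2045): isolation level and read-only mode have to be
changed while no transaction is open, i.e. before autoCommit is switched
off.

import java.sql.Connection;
import java.sql.DriverManager;

// URL taken from the datasource config earlier in the thread; credentials are placeholders
String jdbcUrl = "jdbc:postgresql://db1.live.mbuyu.nl/datafeeds";
Connection c = DriverManager.getConnection(jdbcUrl, "user", "password");
// change session settings first, while autoCommit is still on
c.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
c.setReadOnly(true);
// only now start transactional behaviour
c.setAutoCommit(false);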

mvg,
Jasper

On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James james.d...@ingrambook.com wrote:
 Try readOnly=true in the dataSource configuration.  This causes several 
 defaults to get set in the JDBC connection, and often will solve problems 
 like this. (see 
 http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  
 Also, try a batch size of 0 to let your jdbc driver pick what it thinks is 
 optimal.  This might be better than 1.

 There is also an issue in that it doesn't explicitly close the ResultSet but 
 relies on closing the connection to implicitly close the child objects.  I 
 know when I tried using DIH with Derby a while back this had at the least 
 caused some log warnings, and it wouldn't work at all without readOnly=false. 
  Not sure about PostgreSQL.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Jasper Floor [mailto:jasper.fl...@m4n.nl]
 Sent: Thursday, June 14, 2012 8:21 AM
 To: solr-user@lucene.apache.org
 Subject: DIH idle in transaction forever

 Hi all,

 It seems that DIH always holds two connections open to the database.
 One of them is almost always 'idle in transaction'. It may sometimes
 seem to do a little work but then it goes idle again.


 datasource definition:

 <dataSource name="df-stream-store-ds"
     jndiName="java:ext_solr_datafeeds_dba" type="JdbcDataSource"
     autoCommit="false" batchSize="1" />

 We have a datasource defined in the JNDI:

 <no-tx-datasource>
     <jndi-name>ext_solr_datafeeds_dba</jndi-name>
     <security-domain>ext_solr_datafeeds_dba_realm</security-domain>
     <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
     <min-pool-size>0</min-pool-size>
     <max-pool-size>5</max-pool-size>
     <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
     <driver-class>org.postgresql.Driver</driver-class>
     <blocking-timeout-millis>3</blocking-timeout-millis>
     <idle-timeout-minutes>5</idle-timeout-minutes>
     <new-connection-sql>SELECT 1</new-connection-sql>
     <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
 </no-tx-datasource>


 If we set autocommit to true then we get an OOM on indexing so that is
 not an option.

 Does anyone have any idea why this happens? I would guess that DIH
 doesn't close the connection, but reading the code I can't be sure of
 this. The ResultSet object should close itself once it reaches the
 end.

 mvg,
 JAsper


Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-14 Thread Daniel Brügge
Aha, OK. That was new to me. Will check this. Thanks.

On Thu, Jun 14, 2012 at 3:52 PM, Yury Kats yuryk...@yahoo.com wrote:

 On 6/14/2012 2:05 AM, Daniel Brügge wrote:
   Will check later to use different data dirs for the core on
   each instance.
   But because each Solr sits in its own OpenVZ instance (a virtual
   server, respectively) they should be totally separated. At least
   that's my understanding of virtualization.

  Depending on how your VMs are configured, their filesystems could
  be mapped to the same place on the host's filesystem. What you describe
  sounds like that is the case.



Re: Regarding number of documents

2012-06-14 Thread Swetha Shenoy
I am running a full-import. DIH reported that 1125 documents were added
after indexing. This number did not change even after I added the new
entries.

How do I check the ID for an entry and query it against Solr?

On Wed, Jun 13, 2012 at 10:33 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 14 June 2012 04:51, Swetha Shenoy sshe...@gmail.com wrote:
  That makes sense. But I added a new entry that showed up in the MySQL
  results and not in the Solr search results. The count of documents also
 did
  not increase after the addition. How can a new entry show up in MySQL
  results and not as a new document?

 Sorry, but this is not very clear: Are you running a
 full-import, or a delta-import after adding the new
 entry in mysql? By any chance, does the new entry
 have an ID that already exists in the Solr index?

 What is the number of records that DIH reports
 after an import is completed?

 Regards,
 Gora



phrase query and string/keyword tokenizer

2012-06-14 Thread Cat Bieber
I have documents that are word definitions (basically an online 
dictionary) that can have alternate titles. For example, the document 
entitled "Read-only memory" might have an alternate title of "ROM". In 
search results, I want to boost documents with an alternate title that 
is a case-insensitive exact match for the query text -- e.g. "rom" 
should work as well.


I'm running solr 3.6 and using edismax.

I've gone through a few iterations of this. What I have working best so 
far is a multi-valued text field for the alternate titles with a big boost:


<fieldType name="lowerCaseSort" class="solr.TextField" 
    sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="bestMatchTitle" type="lowerCaseSort" indexed="true" 
    stored="false" multiValued="true"/>


This produces great results with single-word searches like the "ROM" 
example above. It runs into problems with a multi-word alternate title 
like "Blue Tooth". I have read some of the prior discussions about this, 
regarding how the query is parsed based on spaces before it gets to the 
keyword tokenizer for the field type.


The question I have is about phrase queries in this case. My request 
handler has:


<str name="qf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5 
body^1 author^0.5</str>
<str name="pf">bestMatchTitle^20 title^5 summary^3 metaDescription^1.5 
body^1 author^0.5</str>


When I run a query, I get this:

+((DisjunctionMaxQuery((metaDescription:blue^1.5 | summary:blue^3.0 | 
author:blue^0.5 | body:blue | title:blue^5.0 | 
bestMatchTitle:blue^20.0)~0.01) 
DisjunctionMaxQuery((metaDescription:tooth^1.5 | summary:tooth^3.0 | 
author:tooth^0.5 | body:tooth | title:tooth^5.0 | 
bestMatchTitle:tooth^20.0)~0.01))~2) 
DisjunctionMaxQuery((metaDescription:"blue tooth"~100^1.5 | 
summary:"blue tooth"~100^3.0 | body:"blue tooth"~100 | 
title:"blue tooth"~100^5.0)~0.01)


It looks like the phrase isn't being matched against my bestMatchTitle 
field. It also isn't matched against author, which is type string. So do 
phrases only get matched against certain field types?


When I put the quotes in the query text:

/select/?qt=best-match&q="blue+tooth"&debugQuery=on

It builds the query I was hoping to get:

+DisjunctionMaxQuery((metaDescription:"blue tooth"^1.5 | summary:"blue 
tooth"^3.0 | author:"blue tooth"^0.5 | body:"blue tooth" | title:"blue 
tooth"^5.0 | bestMatchTitle:"blue tooth"^20.0)~0.01)


But I still need the query on the individual tokens, otherwise it 
eliminates results that may be good hits. So far, any way I have tried 
to combine the two queries either opens up matching a ton of documents 
that shouldn't really match (e.g. total found goes from 24 to 4800+ 
documents) or doesn't match the one I want, giving poor results.


Does anyone have suggestions for how I can convince the phrase query to 
match against my bestMatchTitle field, or change the query text I'm 
passing in to combine these two queries and get the boost I want? Or is 
there another approach altogether that I'm missing?


Thanks for any help with this.
-Cat Bieber



Re: FilterCache - maximum size of document set

2012-06-14 Thread Erick Erickson
Hmmm, your maxSize is pretty high, it may just be that you've set this
much higher
than is wise. The maxSize setting governs the number of entries. I'd start with
a much lower number here, and monitor the solr/admin page for both
hit ratio and evictions. Well, and size too. 16,000 entries puts a
ceiling of, what,
48G on it? Ouch! It sounds like what's happening here is you're just
accumulating
more and more fqs over the course of the evening and blowing memory.

Not all FQs will be that big, there's some heuristics in there to just store the
document numbers for sparse filters, maxDocs/8 is pretty much the upper
bound though.
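
To spell the arithmetic out (assuming the roughly 25M-document index
mentioned earlier in this thread): 25,000,000 docs / 8 bits per doc is
about 3 MB per cached bitset, and 16,000 entries x ~3 MB is on the order
of the 48G worst case mentioned above.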

Evictions are not necessarily a bad thing, the hit-ratio is important here. And
if you're using a bare NOW in your filter queries, you're probably never
re-using them anyway, see:
http://www.lucidimagination.com/blog/2012/02/23/date-math-now-and-filter-queries/

I really question whether this limit is reasonable, but you know your
situation best.

Best
Erick

On Wed, Jun 13, 2012 at 5:40 PM, Pawel Rog pawelro...@gmail.com wrote:
 Thanks for your response
  Yes, maybe you are right. I thought that filters can be larger than 3M. Do
  all kinds of filters use BitSets?
 Moreover maxSize of filterCache is set to 16000 in my case. There are
 evictions during day traffic
 but not during night traffic.

 Version of Solr which I use is 3.5

  I haven't used Memory Analyzer yet. Could you write more details about it?

 --
 Regards,
 Pawel

 On Wed, Jun 13, 2012 at 10:55 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 Hmmm, I think you may be looking at the wrong thing here. Generally, a
 filterCache
 entry will be maxDocs/8 (plus some overhead), so in your case they really
 shouldn't be all that large, on the order of 3M/filter. That shouldn't
 vary based
 on the number of docs that match the fq, it's just a bitset. To see if
 that makes any
 sense, take a look at the admin page and the number of evictions in
 your filterCache. If
 that is  0, you're probably using all the memory you're going to in
 the filterCache during
 the day..

 But you haven't indicated what version of Solr you're using, I'm going
 from a
 relatively recent 3x knowledge-base.

 Have you put a memory analyzer against your Solr instance to see where
 the memory
 is being used?

 Best
 Erick

 On Wed, Jun 13, 2012 at 1:05 PM, Pawel pawelmis...@gmail.com wrote:
  Hi,
  I have solr index with about 25M documents. I optimized FilterCache size
 to
  reach the best performance (considering traffic characteristic that my
 Solr
   handles). I see that the only way to limit the size of a Filter Cache is to
 set
  number of document sets that Solr can cache. There is no way to set
 memory
  limit (eg. 2GB, 4GB or something like that). When I process a standard
   traffic (during the day) everything is fine. But when Solr handles night
 traffic
   (and the characteristic of requests change) some problems appear. There is
   a JVM out-of-memory error. I know what is the reason. Some filters on some
   fields are quite poor filters. They return 15M documents or even
 more.
  You could say 'Just put that into q'. I tried to put that filters into
  Query part but then, the statistics of request processing time (during
  day) become much worse. Reduction of Filter Cache maxSize is also not
 good
  solution because during day cache filters are very very helpful.
  You could be interested in type of filters that I use. These are range
  filters (I tried standard range filters and frange) - eg. price:[* TO
  1]. Some fq with price can return few thousands of results (eg.
   price:[40 TO 50]), but some (eg. price:[* TO 1]) can return millions
 of
  documents. I'd also like to avoid solution which will introduce strict
  ranges that user can choose.
  Have you any suggestions what can I do? Is there any way to limit for
  example maximum size of docSet which is cached in FilterCache?
 
  --
  Pawel



Re: Regarding number of documents

2012-06-14 Thread Erick Erickson
Here's a quick thing to check. Delete your index and do a fresh import. Then
go to the admin/statistics. Check the numDocs and maxDocs entries. If
they're different, it means that some of your documents have been deleted.
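
(A concrete place to see those numbers, assuming a default single-core setup
on the standard port -- adjust host/port/core to your install: the searcher
section of http://localhost:8983/solr/admin/stats.jsp shows numDocs and
maxDoc.)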

Deleted you say? What's that about? Well, if more than one record has the
same uniqueKey (see schema.xml), then the first doc is overwritten by the
second. But this is really a delete of the old doc followed by an add.

NOTE: This won't show any difference if you optimize, so don't optimize for this
test.

The fact that this isn't changing even after you add new entries probably means
you're indexing documents with the same uniqueKey.

Hope this helps
Erick

On Thu, Jun 14, 2012 at 12:03 PM, Swetha Shenoy sshe...@gmail.com wrote:
 I am running a full-import. DIH reported that 1125 documents were added
 after indexing. This number did not change even after I added the new
 entries.

 How do I check the ID for an entry and query it against Solr?

 On Wed, Jun 13, 2012 at 10:33 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 14 June 2012 04:51, Swetha Shenoy sshe...@gmail.com wrote:
  That makes sense. But I added a new entry that showed up in the MySQL
  results and not in the Solr search results. The count of documents also
 did
  not increase after the addition. How can a new entry show up in MySQL
  results and not as a new document?

 Sorry, but this is not very clear: Are you running a
 full-import, or a delta-import after adding the new
 entry in mysql? By any chance, does the new entry
 have an ID that already exists in the Solr index?

 What is the number of records that DIH reports
 after an import is completed?

 Regards,
 Gora



Re: solrj library requirements: slf4j-jdk14-1.5.5.jar

2012-06-14 Thread Sami Siren
What is the version of solrj you are trying to get working?

If you download version 3.6 of solr there's a directory dist/solrj-lib
in the binary release artifact that includes the required
dependencies. I would start with those.
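
A minimal sketch of getting going with those jars (assuming SolrJ 3.6, the
jars from dist/ plus dist/solrj-lib/ on the classpath, and a Solr instance
on the default port -- all of that is illustrative, not from the original
post):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SolrJSmokeTest {
    public static void main(String[] args) throws Exception {
        // CommonsHttpSolrServer is the 3.x HTTP client implementation
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("numFound: " + rsp.getResults().getNumFound());
    }
}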

--
 Sami Siren

On Wed, Jun 6, 2012 at 5:34 PM, Welty, Richard rwe...@ltionline.com wrote:
 the section of the solrj wiki page on setting up the class path calls for
 slf4j-jdk14-1.5.5.jar which is supposed to be in a lib/ subdirectory.

 I don't see this jar, or any like it with a different version, anywhere
 in either the 3.5.0 or 3.6.0 distributions.

 Is it really needed, or is this just slightly outdated documentation? The top 
 of the page (which references Solr 1.4) suggests this is true, and I see 
 other docs on the web suggesting this is the case, but the first result that 
 pops out of Google for solrj is the apparently outdated wiki page, so I 
 imagine others will encounter the same issue.

 The other, more recent pages are not without issues as well; for example, this 
 page:

 http://lucidworks.lucidimagination.com/display/solr/Using+SolrJ

 references apache-solr-common, which I'm not finding either.

 thanks,
   richard


Re: defaultSearchField not working after upgrade to solr3.6

2012-06-14 Thread Chris Hostetter

: Correct. In 3.6 it is simply ignored. In 4.x it currently does work.

That's not true.

the example configs in Solr 3.6 no longer mention defaultSearchField, 
but Solr 3.6 will still respect a <defaultSearchField/> declaration if 
it exists in your schema.xml -- I just verified this by running Solr 3.6 
using the Solr 3.5 example configs.

The only change SOLR-2274 made to the *CODE* in 3.6 was to improve the 
wording in the logs/error messages to better distinguish when it was 
referring to the df param vs the <defaultSearchField/>.
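
For anyone skimming, the two mechanisms look roughly like this (a sketch; the
field name "text" is only an illustration):

<!-- schema.xml: deprecated, but still honored in 3.6 -->
<defaultSearchField>text</defaultSearchField>

<!-- solrconfig.xml: the preferred way, per request handler -->
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="df">text</str>
  </lst>
</requestHandler>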

Rohit: if you are running 3.6 with a schema.xml that contains a 
defaultSearchField and you are seeing a failure related to not finding 
the default field, please post your schema.xml and the stack trace of the 
error.


-Hoss


Re: defaultSearchField and param df are messed up in 3.6.x

2012-06-14 Thread Chris Hostetter

: So if defaultSearchField has been removed (deprecated) from schema.xml then why
: are there still calls to 
: org.apache.solr.schema.IndexSchema.getDefaultSearchFieldName()?

Because even though the syntax is deprecated/discouraged in schema.xml, we 
don't want things to break for existing users who have it in their 
schema.xml -- hence the method is still called.

If you upgrade from a previous version, your old configs should still work 
-- if you start from scratch with the Solr 3.6 example, then you should 
follow the lead of the Solr 3.6 example and specify df/qf as appropriate 
for your usecase.  There are certainly improvements that can be made in 
how the chain of defaults works (hence SOLR-3534) but I don't see any way 
that this change broke anything for existing users -- if you can provide 
an example of a query + configs that worked in Solr 3.5 but don't work in 
Solr 3.6 then please, please, please file a bug with that information so 
we can understand what happened.


-Hoss


How to boost a field with another field's value?

2012-06-14 Thread smita
I have 2 fields in my schema - e.g. 

a long field field1 and a long field field2.
I'd like my boost query to be such that field1 is boosted by the value of
field2 for each document.
What should the query-time boost for this look like? I was able to do this
using index-time boosting with the DataImportHandler, but couldn't figure
out how to do this using query-time boosting.

Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-boost-a-field-with-another-field-s-value-tp3989706.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to boost a field with another field's value?

2012-06-14 Thread Jack Krupansky

See Function Query:
http://wiki.apache.org/solr/FunctionQuery

If you are using the dismax or edismax query parser you can use the bf 
request parameter.


e.g.,

   q=foo&bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
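
For the specific case in the question (boosting by another field's value),
something along these lines should work (a sketch; "field2" is the
questioner's field name, the rest is illustrative, and the multiplicative
"boost" parameter is edismax-only):

   q=your query&defType=edismax&bf=field2
   q=your query&defType=edismax&boost=field(field2)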

-- Jack Krupansky

-Original Message- 
From: smita

Sent: Thursday, June 14, 2012 4:40 PM
To: solr-user@lucene.apache.org
Subject: How to boost a field with another field's value?

I have 2 fields in my schema - e.g.

a long field field1 and a long field field2.
I'd like my boost query to be such that field1 is boosted by the value of
field 2 for each document.
What should the query time boost for this look like? I was able to do this
using Index time boosting with the DataImportHandler, but couldn't figure
out how to do this using query time boosting.

Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-boost-a-field-with-another-field-s-value-tp3989706.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: defaultSearchField not working after upgrade to solr3.6

2012-06-14 Thread Jack Krupansky
Hmmm... how could I have gotten so confused?!?! Actually, I recognized my 
mistake yesterday (after reading the code some more for David's Jira) but 
hadn't gotten around to correcting myself.


In any case, the original problematic scenario may have been simply copying 
3.5 request handler/params to the 3.6 example solrconfig but not realizing 
that the deprecated defaultSearchField element needed to be uncommented in 
the 3.6 schema.


A second scenario was one where defaultSearchField was in fact set but wasn't 
working - because the request handler for 3.6 had set df to "text" and the 
code won't check the defaultSearchField if df is set.


-- Jack Krupansky

-Original Message- 
From: Chris Hostetter

Sent: Thursday, June 14, 2012 4:05 PM
To: solr-user@lucene.apache.org
Subject: Re: defaultSearchField not working after upgrade to solr3.6


: Correct. In 3.6 it is simply ignored. In 4.x it currently does work.

That's not true.

the example configs in Solr 3.6 no longer mention defaultSearchField,
but Solr 3.6 will still respect a <defaultSearchField/> declaration if
it exists in your schema.xml -- I just verified this by running Solr 3.6
using the Solr 3.5 example configs.

The only change SOLR-2274 made to the *CODE* in 3.6 was to improve the
wording in the logs/error messages to better distinguish when it was
referring to the df param vs the <defaultSearchField/>

Rohit: if you are running 3.6 with a schema.xml that contains a
defaultSearchField and you are seeing a failure related to not finding
the default field, please post your schema.xml and the stack trace of the
error.


-Hoss 



Re: Regarding number of documents

2012-06-14 Thread Swetha Shenoy
Thanks all, for your inputs.

We found what the problem was: the reason certain entries were missing from
the index but not from the MySQL search results was that we had some
customized transformers in the data config that skipped entries when a
particular field was missing.

On Thu, Jun 14, 2012 at 1:28 PM, Erick Erickson erickerick...@gmail.comwrote:

 Here's a quick thing to check. Delete your index and do a fresh import.
 Then
 go to the admin/statistics. Check the numDocs and maxDocs entries. If
 they're different, it means that some of your documents have been deleted.

 Deleted you say? What's that about? Well, if more than one record has the
 same uniqueKey (see schema.xml), then the first doc is overwritten by the
 second. But this is really a delete of the old doc followed by an add.

 NOTE: This won't show any difference if you optimize, so don't optimize
 for this
 test.

 The fact that this isn't changing even after you add new entries probably
 means
 you're indexing documents with the same uniqueKey.

 Hope this helps
 Erick

 On Thu, Jun 14, 2012 at 12:03 PM, Swetha Shenoy sshe...@gmail.com wrote:
  I am running a full-import. DIH reported that 1125 documents were added
  after indexing. This number did not change even after I added the new
  entries.
 
  How do I check the ID for an entry and query it against Solr?
 
  On Wed, Jun 13, 2012 at 10:33 PM, Gora Mohanty g...@mimirtech.com
 wrote:
 
  On 14 June 2012 04:51, Swetha Shenoy sshe...@gmail.com wrote:
   That makes sense. But I added a new entry that showed up in the MySQL
   results and not in the Solr search results. The count of documents
 also
  did
   not increase after the addition. How can a new entry show up in MySQL
   results and not as a new document?
 
  Sorry, but this is not very clear: Are you running a
  full-import, or a delta-import after adding the new
  entry in mysql? By any chance, does the new entry
  have an ID that already exists in the Solr index?
 
  What is the number of records that DIH reports
  after an import is completed?
 
  Regards,
  Gora
 



Re: PageRanking with DIH

2012-06-14 Thread Chris Hostetter

: I have computed pagerank offline for a document set dump.  I ideally
: want to use pagerank and the solr relevancy score together in a formula to
: sort solr search results.  I have already looked at
: 
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_increase_the_score_for_specific_documents
: and found that index-time boost is useful. I want to know how I can use
: index-time boost.

I would strongly suggest that instead of using an index-time boost you use 
a boost function on a numeric field (the very next section of that 
SolrRelevancyFAQ).

I've updated the page to try and make this alternative method more 
obvious, and mentioned the use of ExternalFileField (for the case where 
you want to be able to update these rankings w/o reindexing)

http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_increase_the_score_for_specific_documents
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_change_the_score_of_a_document_based_on_the_.2Avalue.2A_of_a_field_.28say.2C_.22popularity.22.29
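
A rough sketch of the ExternalFileField variant (names and file locations
here are illustrative assumptions based on the standard example configs;
check the docs for your exact version):

<!-- schema.xml: keyField is assumed to be the uniqueKey field "id" -->
<fieldType name="externalRank" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="pagerank" type="externalRank" indexed="false" stored="false"/>

The values then live in a file named external_pagerank in the data
directory, one "docid=value" line per document, and can be applied at query
time as a boost function, e.g. bf=pagerank with dismax/edismax. Updating the
rankings is then just a matter of replacing that file and reopening the
searcher, with no reindex needed.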

-Hoss


Re: FilterCache - maximum size of document set

2012-06-14 Thread Pawel Rog
It may well be true that the filter cache maxSize is set too high. We looked
at evictions and hit rate earlier. Maybe you are right that evictions are
not always unwanted. Some time ago we ran tests: there is not a big
difference in hit rate between a filter maxSize of 4000 (hit rate about 85%)
and 16000 (hit rate about 91%). I think using an LFU cache could also be
helpful, but that would require migrating to 3.6. Do you think it is
reasonable to run a slave on version 3.6 and the master on 3.5?

Once again, Thanks for your help

--
Pawel

On Thu, Jun 14, 2012 at 7:22 PM, Erick Erickson erickerick...@gmail.comwrote:

 Hmmm, your maxSize is pretty high, it may just be that you've set this
 much higher
 than is wise. The maxSize setting governs the number of entries. I'd start
 with
 a much lower number here, and monitor the solr/admin page for both
 hit ratio and evictions. Well, and size too. 16,000 entries puts a
 ceiling of, what,
 48G on it? Ouch! It sounds like what's happening here is you're just
 accumulating
 more and more fqs over the course of the evening and blowing memory.

 Not all FQs will be that big, there's some heuristics in there to just
 store the
 document numbers for sparse filters, maxDocs/8 is pretty much the upper
 bound though.

 Evictions are not necessarily a bad thing, the hit-ratio is important
 here. And
 if you're using a bare NOW in your filter queries, you're probably never
 re-using them anyway, see:

 http://www.lucidimagination.com/blog/2012/02/23/date-math-now-and-filter-queries/

 I really question whether this limit is reasonable, but you know your
 situation best.

 Best
 Erick

 On Wed, Jun 13, 2012 at 5:40 PM, Pawel Rog pawelro...@gmail.com wrote:
  Thanks for your response
   Yes, maybe you are right. I thought that filters can be larger than 3M. Do
   all kinds of filters use BitSets?
  Moreover maxSize of filterCache is set to 16000 in my case. There are
  evictions during day traffic
  but not during night traffic.
 
  Version of Solr which I use is 3.5
 
   I haven't used Memory Analyzer yet. Could you write more details about it?
 
  --
  Regards,
  Pawel
 
  On Wed, Jun 13, 2012 at 10:55 PM, Erick Erickson 
 erickerick...@gmail.comwrote:
 
  Hmmm, I think you may be looking at the wrong thing here. Generally, a
  filterCache
  entry will be maxDocs/8 (plus some overhead), so in your case they
 really
  shouldn't be all that large, on the order of 3M/filter. That shouldn't
  vary based
  on the number of docs that match the fq, it's just a bitset. To see if
  that makes any
  sense, take a look at the admin page and the number of evictions in
  your filterCache. If
   that is > 0, you're probably using all the memory you're going to in
  the filterCache during
  the day..
 
  But you haven't indicated what version of Solr you're using, I'm going
  from a
  relatively recent 3x knowledge-base.
 
  Have you put a memory analyzer against your Solr instance to see where
  the memory
  is being used?
 
  Best
  Erick
 
  On Wed, Jun 13, 2012 at 1:05 PM, Pawel pawelmis...@gmail.com wrote:
   Hi,
   I have solr index with about 25M documents. I optimized FilterCache
 size
  to
   reach the best performance (considering traffic characteristic that my
  Solr
    handles). I see that the only way to limit the size of a Filter Cache is to
  set
   number of document sets that Solr can cache. There is no way to set
  memory
   limit (eg. 2GB, 4GB or something like that). When I process a standard
    traffic (during the day) everything is fine. But when Solr handles night
  traffic
    (and the characteristic of requests change) some problems appear.
  There is
    a JVM out-of-memory error. I know what is the reason. Some filters on
 some
    fields are quite poor filters. They return 15M documents or even
  more.
   You could say 'Just put that into q'. I tried to put that filters into
   Query part but then, the statistics of request processing time
 (during
   day) become much worse. Reduction of Filter Cache maxSize is also not
  good
   solution because during day cache filters are very very helpful.
   You could be interested in type of filters that I use. These are range
   filters (I tried standard range filters and frange) - eg. price:[* TO
   1]. Some fq with price can return few thousands of results (eg.
   price:[40 TO 50]), but some (eg. price:[* TO 1]) can return
  millions
  of
   documents. I'd also like to avoid solution which will introduce strict
   ranges that user can choose.
   Have you any suggestions what can I do? Is there any way to limit for
   example maximum size of docSet which is cached in FilterCache?
  
   --
   Pawel