Re: SolrJ dependencies

2012-06-13 Thread Thijs

Done, see:
https://issues.apache.org/jira/browse/SOLR-3541

On 12-6-2012 18:39, Sami Siren wrote:

On Tue, Jun 12, 2012 at 4:22 PM, Thijs vonk.th...@gmail.com wrote:

Hi
I just checked out and built Solr/Lucene from branches/lucene_4x.

I wanted to upgrade my custom client to this new version (using SolrJ).
So I copied lucene/solr/dist/apache-solr-solrj-4.0-SNAPSHOT.jar and
lucene/solr/dist/apache-solr-core-4.0-SNAPSHOT.jar to my project, and I
updated the other libs from the libs in /solr/dist/solrj-lib.

However, when I wanted to run my client I got exceptions indicating that I
was missing the HttpClient jars (httpclient, httpcore, httpmime).
Shouldn't those go into lucene/solr/dist/solrj-lib as well?

Yes they should.


Do I need to create a ticket for this?

Please do so.

--
  Sami Siren
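
For anyone hitting the same missing-class errors, here is a minimal SolrJ
smoke test; it fails with NoClassDefFoundError for org/apache/http/... when
httpclient, httpcore, and httpmime are absent from the classpath. The URL is
an assumption - point it at any running core.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class SolrJSmokeTest {
    public static void main(String[] args) throws Exception {
      // HttpSolrServer pulls in Apache HttpComponents at class-load time.
      SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
      QueryResponse rsp = server.query(new SolrQuery("*:*"));
      System.out.println("numFound: " + rsp.getResults().getNumFound());
    }
  }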





Re: Solr PHP highload search

2012-06-13 Thread Erick Erickson
How much memory are you giving the JVM? Have you put a performance
monitor on the running process to see what resources have been
exhausted (i.e. are you I/O bound? CPU bound?)

Best
Erick

On Tue, Jun 12, 2012 at 3:40 AM, Alexandr Bocharov
bocharov.alexa...@gmail.com wrote:
 Hi, all.

 I need advice on configuring Solr search for use in highload production.

 I've written a user search engine (a PHP class) that uses over 70
 parameters for searching users.
 The user database has over 30 million records.
 The total index size is 6.4G when I use 1 node and 3.2G when 2 nodes.
 The previous search engine can handle 700,000 queries per day for searching
 users - that is ~8 queries/sec (4 MySQL servers with manual sharding via
 Gearman).

 An example query response:

 [responseHeader] => SolrObject Object
        (
            [status] => 0
            [QTime] => 517
            [params] => SolrObject Object
                (
                    [bq] => Array
                        (
                            [0] => bool_field1:1^30
                            [1] => str_field1:str_value1^15
                            [2] => tint_field1:tint_field1^5
                            [3] => bool_field2:1^6
                            [4] => date_field1:[NOW-14DAYS TO NOW]^20
                            [5] => date_field2:[NOW-14DAYS TO NOW]^5
                        )

                    [indent] => on
                    [start] => 0
                    [q.alt] => *:*
                    [wt] => xml
                    [fq] => Array
                        (
                            [0] => tint_field2:[tint_value2 TO tint_value22]
                            [1] => str_field1:str_value1
                            [2] => str_field2:str_value2
                            [3] => tint_field3:(tint_value3 OR tint_value32 OR tint_value33 OR tint_value34 OR tint_value5)
                            [4] => tint_field4:tint_value4
                            [5] => -bool_field1:[* TO *]
                        )

                    [version] => 2.2
                    [defType] => dismax
                    [rows] => 10
                )

        )


 I tested my PHP search API and found that concurrent random queries (for
 example, 10 queries at a time) increase QTime from an average of 500 ms to
 3000 ms on 2 nodes.

 1. How can I tweak my queries, parameters, or Solr's config to decrease
 QTime?
 2. What if I put my index data in an emulated RAM directory - can that
 greatly increase performance?
 3. Sorting by boost queries has a great influence on QTime; how can I
 optimize the boost queries?
 4. If I split my 2 nodes on 2 machines into 6 nodes on 2 machines, 3 nodes
 per machine, will it increase performance?
 5. What is a multi-core query, how can I configure it, and will it
 increase performance?

 Thank you!


Re: Solr PHP highload search

2012-06-13 Thread Alexandr Bocharov
Thank you for help :)

I'm giving the JVM 2048M for each node.
CPU load jumps to 70-90%.
Memory usage increases to its maximum during testing (probably the caches
are filling).
I didn't monitor I/O.

I'd also like to see answers to my other questions.

2012/6/13 Erick Erickson erickerick...@gmail.com

 How much memory are you giving the JVM? Have you put a performance
 monitor on the running process to see what resources have been
 exhausted (i.e. are you I/O bound? CPU bound?)

 Best
 Erick

 On Tue, Jun 12, 2012 at 3:40 AM, Alexandr Bocharov
 bocharov.alexa...@gmail.com wrote:
  Hi, all.

  I need advice on configuring Solr search for use in highload production.

  I've written a user search engine (a PHP class) that uses over 70
  parameters for searching users.
  The user database has over 30 million records.
  The total index size is 6.4G when I use 1 node and 3.2G when 2 nodes.
  The previous search engine can handle 700,000 queries per day for
  searching users - that is ~8 queries/sec (4 MySQL servers with manual
  sharding via Gearman).

  An example query response:

  [responseHeader] => SolrObject Object
         (
             [status] => 0
             [QTime] => 517
             [params] => SolrObject Object
                 (
                     [bq] => Array
                         (
                             [0] => bool_field1:1^30
                             [1] => str_field1:str_value1^15
                             [2] => tint_field1:tint_field1^5
                             [3] => bool_field2:1^6
                             [4] => date_field1:[NOW-14DAYS TO NOW]^20
                             [5] => date_field2:[NOW-14DAYS TO NOW]^5
                         )

                     [indent] => on
                     [start] => 0
                     [q.alt] => *:*
                     [wt] => xml
                     [fq] => Array
                         (
                             [0] => tint_field2:[tint_value2 TO tint_value22]
                             [1] => str_field1:str_value1
                             [2] => str_field2:str_value2
                             [3] => tint_field3:(tint_value3 OR tint_value32 OR tint_value33 OR tint_value34 OR tint_value5)
                             [4] => tint_field4:tint_value4
                             [5] => -bool_field1:[* TO *]
                         )

                     [version] => 2.2
                     [defType] => dismax
                     [rows] => 10
                 )

         )


  I tested my PHP search API and found that concurrent random queries (for
  example, 10 queries at a time) increase QTime from an average of 500 ms
  to 3000 ms on 2 nodes.

  1. How can I tweak my queries, parameters, or Solr's config to decrease
  QTime?
  2. What if I put my index data in an emulated RAM directory - can that
  greatly increase performance?
  3. Sorting by boost queries has a great influence on QTime; how can I
  optimize the boost queries?
  4. If I split my 2 nodes on 2 machines into 6 nodes on 2 machines, 3
  nodes per machine, will it increase performance?
  5. What is a multi-core query, how can I configure it, and will it
  increase performance?

  Thank you!



Re: Exception when optimizing index

2012-06-13 Thread Robert Muir
On Thu, Jun 7, 2012 at 5:50 AM, Rok Rejc rokrej...@gmail.com wrote:
   - java.runtime.name: OpenJDK Runtime Environment
   - java.runtime.version: 1.6.0_22-b22
...

 As far as I see from the JIRA issue I have the patch attached (as mentioned
 I have a trunk version from May 12). Any ideas?


It's not guaranteed that the patch will work around all hotspot bugs
related to http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5091921

Since you can reproduce, is it possible for you to re-test the
scenario with a newer JVM (e.g. 1.7.0_04) just to rule that out?

-- 
lucidimagination.com


Re: Solr PHP highload search

2012-06-13 Thread Erick Erickson
Consider just looking at it with jconsole (it should be included in your Java
release) to get a sense of the memory usage/collection. How much physical
memory do you have overall?

Because this is not what I'd expect. Your CPU load is actually reasonably high,
so it doesn't look like you're swapping.

By and large, trying to use RAMDirectories isn't a good solution; between the OS
and Solr, the necessary parts of your index are read into memory and used from
there anyway.

Best
Erick

On Wed, Jun 13, 2012 at 7:13 AM, Alexandr Bocharov
bocharov.alexa...@gmail.com wrote:
 Thank you for help :)

 I'm giving the JVM 2048M for each node.
 CPU load jumps to 70-90%.
 Memory usage increases to its maximum during testing (probably the caches
 are filling).
 I didn't monitor I/O.

 I'd also like to see answers to my other questions.

 2012/6/13 Erick Erickson erickerick...@gmail.com

 How much memory are you giving the JVM? Have you put a performance
 monitor on the running process to see what resources have been
 exhausted (i.e. are you I/O bound? CPU bound?)

 Best
 Erick

 On Tue, Jun 12, 2012 at 3:40 AM, Alexandr Bocharov
 bocharov.alexa...@gmail.com wrote:
  Hi, all.

  I need advice on configuring Solr search for use in highload production.

  I've written a user search engine (a PHP class) that uses over 70
  parameters for searching users.
  The user database has over 30 million records.
  The total index size is 6.4G when I use 1 node and 3.2G when 2 nodes.
  The previous search engine can handle 700,000 queries per day for
  searching users - that is ~8 queries/sec (4 MySQL servers with manual
  sharding via Gearman).

  An example query response:

  [responseHeader] => SolrObject Object
         (
             [status] => 0
             [QTime] => 517
             [params] => SolrObject Object
                 (
                     [bq] => Array
                         (
                             [0] => bool_field1:1^30
                             [1] => str_field1:str_value1^15
                             [2] => tint_field1:tint_field1^5
                             [3] => bool_field2:1^6
                             [4] => date_field1:[NOW-14DAYS TO NOW]^20
                             [5] => date_field2:[NOW-14DAYS TO NOW]^5
                         )

                     [indent] => on
                     [start] => 0
                     [q.alt] => *:*
                     [wt] => xml
                     [fq] => Array
                         (
                             [0] => tint_field2:[tint_value2 TO tint_value22]
                             [1] => str_field1:str_value1
                             [2] => str_field2:str_value2
                             [3] => tint_field3:(tint_value3 OR tint_value32 OR tint_value33 OR tint_value34 OR tint_value5)
                             [4] => tint_field4:tint_value4
                             [5] => -bool_field1:[* TO *]
                         )

                     [version] => 2.2
                     [defType] => dismax
                     [rows] => 10
                 )

         )


  I tested my PHP search API and found that concurrent random queries (for
  example, 10 queries at a time) increase QTime from an average of 500 ms
  to 3000 ms on 2 nodes.

  1. How can I tweak my queries, parameters, or Solr's config to decrease
  QTime?
  2. What if I put my index data in an emulated RAM directory - can that
  greatly increase performance?
  3. Sorting by boost queries has a great influence on QTime; how can I
  optimize the boost queries?
  4. If I split my 2 nodes on 2 machines into 6 nodes on 2 machines, 3
  nodes per machine, will it increase performance?
  5. What is a multi-core query, how can I configure it, and will it
  increase performance?

  Thank you!



Re: Sharding in SolrCloud

2012-06-13 Thread Lenzner
Mark Miller markrmil...@gmail.com wrote on 12.06.2012 19:19:01:
 
 
 On Jun 12, 2012, at 3:39 AM, lenz...@gfi.ihk.de wrote:
 
  Hello,

  we tested SolrCloud in a setup with one collection, two shards and one
  replica per shard, and it works quite fine with some example data.
  Now, we plan to set up our own collection and determine how many shards
  we should divide it into.
  We can estimate the size of the collection quite exactly, but we don't
  know what the best approach for sharding is,
  even if we know the size and the amount of queries and updates.
  Is there any documentation or a kind of design guidelines for sharding a
  collection in SolrCloud?


  Thanks & regards,
  Norman Lenzner
 
 
 It's hard to tell - I think you want to start with an idea of how
 many docs you can fit on a single node. This can vary wildly
 depending on many factors. Generally you have to do some testing
 with your particular config and data. You can search the mailing
 lists and perhaps dig up a little info, but there is really no
 replacement for running some tests with real data.

 Then you have to plan in your growth rate - resharding is naturally
 a relatively expensive operation. Once you have an idea of how many
 docs per machine you think seems comfortable, figure out how many
 machines you need given your estimated doc growth rate and perhaps
 some padding. You might not get it right, but if you expect the
 possibility of a lot of growth, erring on the more-shards side is
 obviously better.
 
 - Mark Miller
 lucidimagination.com
 

Hello and thanks for your reply,

We will run some tests to determine the size of our collection, but I
think there won't be a need for a second shard at all. The problem is not
the size or the growth of the docs, but that there will be a quite high
update frequency. So, if we have many bulk updates, is it reasonable to
distribute the update load across multiple shards?

Thanks & regards,
Norman Lenzner

Re: Different sort for each facet

2012-06-13 Thread Christopher Gross
Hmm, it seems that if I leave off the initial facet.sort=index then
it will sort each facet by index by default, and I can use
f.people.facet.sort=count as expected.

I thought I tried that yesterday, but I suppose it slipped my mind in
my sleep-deprived state.

Thanks Jack!

-- Chris
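
For reference, the working combination in URL form (host and field names are
taken from the original question below):

  http://mysolrsrvr/solr/select?q=*:*&facet=true&facet.field=people&facet.field=category&f.people.facet.sort=count

With no global facet.sort, people is explicitly sorted by count while
category comes back in the default order (index order, in this setup).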


On Tue, Jun 12, 2012 at 10:58 PM, Jack Krupansky
j...@basetechnology.com wrote:
 f.people.facet.sort=count should work.

 Make sure you don't have a conflicting setting for that same field and
 attribute.

 Does the people facet sort by count correctly with f.sort=index?

 What are the attributes and field type for the people field?

 -- Jack Krupansky

 -Original Message- From: Christopher Gross
 Sent: Tuesday, June 12, 2012 11:05 AM
 To: solr-user
 Subject: Different sort for each facet


 In Solr 3.4, is there a way I can sort two facets differently in the same
 query?

 If I have:

 http://mysolrsrvr/solr/select?q=*:*&facet=true&facet.field=people&facet.field=category

 is there a way that I can sort people by count and category by name,
 all in one query? Or do I need to do that in separate queries?
 I tried using f.people.facet.sort=count while also having
 facet.sort=index, but both came back in alphabetical order.

 Doing more queries is OK, I'm just trying to avoid having to do too many.

 -- Chris


LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-13 Thread Daniel Brügge
Hi,

I am struggling with creating multiple collections on a 4-instance
SolrCloud setup:

I have 4 virtual OpenVZ instances, where I have installed SolrCloud on each,
and on one a standalone ZooKeeper is also running.

Loading the Solr configuration into ZK works fine.

Then I start up the 4 instances and everything runs smoothly.

After that I am adding one core with the name e.g. '123'.

This core is correctly visible on the instance I used to create it.

It maps like:

'123' -> shard1 -> virtual-instance-1


After that I am creating a core with the same name '123' on the second
instance, and it creates it, but an exception is thrown after a while and
the cluster state of the newly created core goes to 'recovering':


  "123":{"shard1":{
      "virtual-instance-1:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "leader":"true",
        "state":"active",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-1:8983_solr",
        "base_url":"http://virtual-instance-1:8983/solr"},
      "virtual-instance-2:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "state":"recovering",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-2:8983_solr",
        "base_url":"http://virtual-instance-2:8983/solr"}}},


The exception thrown is on the first virtual instance:

Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:607)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:58)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)

Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-13 Thread Daniel Brügge
BTW: I am running the Solr instances with -Xms512M -Xmx1024M,

so not too little memory.

Daniel

On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge 
daniel.brue...@googlemail.com wrote:

 Hi,

 I am struggling with creating multiple collections on a 4-instance
 SolrCloud setup:

 I have 4 virtual OpenVZ instances, where I have installed SolrCloud on
 each, and on one a standalone ZooKeeper is also running.

 Loading the Solr configuration into ZK works fine.

 Then I start up the 4 instances and everything runs smoothly.

 After that I am adding one core with the name e.g. '123'.

 This core is correctly visible on the instance I used to create it.

 It maps like:

 '123' -> shard1 -> virtual-instance-1


 After that I am creating a core with the same name '123' on the second
 instance, and it creates it, but an exception is thrown after a while and
 the cluster state of the newly created core goes to 'recovering':


   "123":{"shard1":{
       "virtual-instance-1:8983_solr_123":{
         "shard":"shard1",
         "roles":null,
         "leader":"true",
         "state":"active",
         "core":"123",
         "collection":"123",
         "node_name":"virtual-instance-1:8983_solr",
         "base_url":"http://virtual-instance-1:8983/solr"},
       "virtual-instance-2:8983_solr_123":{
         "shard":"shard1",
         "roles":null,
         "state":"recovering",
         "core":"123",
         "collection":"123",
         "node_name":"virtual-instance-2:8983_solr",
         "base_url":"http://virtual-instance-2:8983/solr"}}},


 The exception thrown is on the first virtual instance:

 Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
 SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
     at org.apache.lucene.store.Lock.obtain(Lock.java:84)
     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:607)
     at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:58)
     at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
     at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
     at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
     at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
     at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
     at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
     at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
     at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
     at org.eclipse.jetty.server.Server.handle(Server.java:351)
     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
     at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
     ...

Re: Different sort for each facet

2012-06-13 Thread Jack Krupansky
I'm glad that you have something working, but you shouldn't have to remove 
that facet.sort=index.


I tried the following and it works with the Solr 3.6 example after I indexed 
with exampledocs/books.json:


http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=name&facet.field=genre_s&facet.sort=index&f.name.facet.sort=count

I see the name field sorted by count and the genre_s field sorted in lexical
order (note: "IT" comes before "fantasy" because upper case sorts before
lower case - it would be nice to have a case-neutral sort).


Could you try it, just to see if maybe we are not communicating about what 
exactly is not working for you?


What release of Solr are you using? I am not aware of any fixes/changes that 
would make this behave differently as of 3.6.


BTW, the default sort is index IFF facet.limit <= 0. The default for
facet.limit is 100, so sort should default to count. I presume you have
facet.limit set to -1 or 0.


You might also check to see what facet parameters might be set in your 
request handler as opposed to on the actual query request.


-- Jack Krupansky

-Original Message- 
From: Christopher Gross

Sent: Wednesday, June 13, 2012 9:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Different sort for each facet

Hmm, it seems that if I leave off the initial facet.sort=index then
it will sort each by index by default, and I can use the
f.people.facet.sort=count as expected.

I thought I tried that yesterday, but I suppose it slipped my mind in
my sleep-deprived state.

Thanks Jack!

-- Chris


On Tue, Jun 12, 2012 at 10:58 PM, Jack Krupansky
j...@basetechnology.com wrote:

f.people.facet.sort=count should work.

Make sure you don't have a conflicting setting for that same field and
attribute.

Does the people facet sort by count correctly with f.sort=index?

What are the attributes and field type for the people field?

-- Jack Krupansky

-Original Message- From: Christopher Gross
Sent: Tuesday, June 12, 2012 11:05 AM
To: solr-user
Subject: Different sort for each facet


In Solr 3.4, is there a way I can sort two facets differently in the same
query?

If I have:

http://mysolrsrvr/solr/select?q=*:*&facet=true&facet.field=people&facet.field=category

is there a way that I can sort people by count and category by name,
all in one query? Or do I need to do that in separate queries?
I tried using f.people.facet.sort=count while also having
facet.sort=index, but both came back in alphabetical order.

Doing more queries is OK, I'm just trying to avoid having to do too many.

-- Chris 




Re: [DIH] Multiple repeat XPath stmts

2012-06-13 Thread alesp
TNX. A lifesaver...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Multiple-repeat-XPath-stmts-tp499770p3989439.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting maximum / minimum field value - slow query

2012-06-13 Thread rafal.gwizd...@gmail.com
What's more, I tried to get the maximum value using a stats query.
This time the response time was about 30 seconds and the server ate 1.5 GB of
memory when calculating the response. But there were no statistics in the
response:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">27578</int>
  <lst name="params">
    <str name="q">*.*</str>
    <str name="stats">true</str>
    <str name="stats.field">Id</str>
    <str name="rows">0</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="stats">
  <lst name="stats_fields">
    <null name="Id"/>
  </lst>
</lst>
</response>

What's wrong here?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-maximum-minimum-field-value-slow-query-tp3989467p3989468.html
Sent from the Solr - User mailing list archive at Nabble.com.
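
A side note on the response above: the request went out with q=*.* (a literal
term query) rather than the match-all q=*:*, which is why numFound is 0 and
the stats come back null. A minimal sketch of the intended request, assuming
a default core on localhost:

  http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=Id

With matching documents, the stats_fields section reports min and max for
the Id field directly, with no sorting needed.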


Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-13 Thread Mark Miller
That's an interesting data dir location:
NativeFSLock@/home/myuser/data/index/write.lock

Where are the other data dirs located? Are you sharing one drive or
something? It looks like something already has a writer lock - are you sure
another Solr instance is not running somehow?

On Wed, Jun 13, 2012 at 11:11 AM, Daniel Brügge 
daniel.brue...@googlemail.com wrote:

 BTW: I am running the Solr instances with -Xms512M -Xmx1024M,

 so not too little memory.

 Daniel

 On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge 
 daniel.brue...@googlemail.com wrote:

  Hi,

  I am struggling with creating multiple collections on a 4-instance
  SolrCloud setup:

  I have 4 virtual OpenVZ instances, where I have installed SolrCloud on
  each, and on one a standalone ZooKeeper is also running.

  Loading the Solr configuration into ZK works fine.

  Then I start up the 4 instances and everything runs smoothly.

  After that I am adding one core with the name e.g. '123'.

  This core is correctly visible on the instance I used to create it.

  It maps like:

  '123' -> shard1 -> virtual-instance-1


  After that I am creating a core with the same name '123' on the second
  instance, and it creates it, but an exception is thrown after a while and
  the cluster state of the newly created core goes to 'recovering':
 
 
    "123":{"shard1":{
        "virtual-instance-1:8983_solr_123":{
          "shard":"shard1",
          "roles":null,
          "leader":"true",
          "state":"active",
          "core":"123",
          "collection":"123",
          "node_name":"virtual-instance-1:8983_solr",
          "base_url":"http://virtual-instance-1:8983/solr"},
        "virtual-instance-2:8983_solr_123":{
          "shard":"shard1",
          "roles":null,
          "state":"recovering",
          "core":"123",
          "collection":"123",
          "node_name":"virtual-instance-2:8983_solr",
          "base_url":"http://virtual-instance-2:8983/solr"}}},
 
 
  The exception thrown is on the first virtual instance:
 
  Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
  SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
      at org.apache.lucene.store.Lock.obtain(Lock.java:84)
      at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:607)
      at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:58)
      at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
      at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
      at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
      at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
      at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
      at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
      at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
      at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
      at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
      at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
      ...

Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-13 Thread Casey Callendrello
What command are you using to create the cores?

I had this sort of problem, and it was because I'd accidentally created
two cores with the same instanceDir within the same SOLR process. Make
sure you don't have that kind of collision. The easiest way is to
specify an explicit instanceDir and dataDir.

Best,
Casey Callendrello
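
For example, core creation with explicit directories looks like this via the
CoreAdmin API (a sketch; the paths and names here are illustrative):

  http://virtual-instance-2:8983/solr/admin/cores?action=CREATE&name=123&collection=123&instanceDir=/home/myuser/solr/123&dataDir=/home/myuser/solr/123/data

Giving each core its own dataDir avoids two cores contending for the same
index directory and its write.lock.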


On 6/13/12 7:28 AM, Daniel Brügge wrote:
 Hi,

 I am struggling with creating multiple collections on a 4-instance
 SolrCloud setup:

 I have 4 virtual OpenVZ instances, where I have installed SolrCloud on
 each, and on one a standalone ZooKeeper is also running.

 Loading the Solr configuration into ZK works fine.

 Then I start up the 4 instances and everything runs smoothly.

 After that I am adding one core with the name e.g. '123'.

 This core is correctly visible on the instance I used to create it.

 It maps like:

 '123' -> shard1 -> virtual-instance-1


 After that I am creating a core with the same name '123' on the second
 instance, and it creates it, but an exception is thrown after a while and
 the cluster state of the newly created core goes to 'recovering':


   "123":{"shard1":{
       "virtual-instance-1:8983_solr_123":{
         "shard":"shard1",
         "roles":null,
         "leader":"true",
         "state":"active",
         "core":"123",
         "collection":"123",
         "node_name":"virtual-instance-1:8983_solr",
         "base_url":"http://virtual-instance-1:8983/solr"},
       "virtual-instance-2:8983_solr_123":{
         "shard":"shard1",
         "roles":null,
         "state":"recovering",
         "core":"123",
         "collection":"123",
         "node_name":"virtual-instance-2:8983_solr",
         "base_url":"http://virtual-instance-2:8983/solr"}}},


 The exception thrown is on the first virtual instance:

 Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
 SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
     at org.apache.lucene.store.Lock.obtain(Lock.java:84)
     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:607)
     at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:58)
     at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
     at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
     at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
     at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
     at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
     at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
     at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
     at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
     at org.eclipse.jetty.server.Server.handle(Server.java:351)
     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
     ...

Re: Getting maximum / minimum field value - slow query

2012-06-13 Thread Jack Krupansky
Try the query without the sort to get the number of rows, then do a second
query using a start equal to the number of rows minus one. That should get
you the last row/document.


-- Jack Krupansky

-Original Message- 
From: rafal.gwizd...@gmail.com

Sent: Wednesday, June 13, 2012 3:07 PM
To: solr-user@lucene.apache.org
Subject: Getting maximum / minimum field value - slow query

Hi, I have an index with about 9 million documents. Every document has an
integer 'Id' field (it's not the SOLR document identifier) and I want to get
the maximum value of that field.
Therefore I'm doing a search with the following parameters:
q=*:*, sort=Id desc, rows=1

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">2672</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="rows">1</str>
    <str name="sort">Id desc</str>
  </lst>
</lst>
<result name="response" numFound="8747779" start="0">
  <doc>
    <str name="Uid">CRQIncident#45165891</str>
  </doc>
</result>
</response>

The problem is that it takes quite a long time to get the response (2-10
seconds). Why is it so slow - isn't it a simple index lookup?

Best regards
RG

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-maximum-minimum-field-value-slow-query-tp3989467.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Solr1.4 and threads ....

2012-06-13 Thread Benson Margulies
We've got a tokenizer which is quite explicitly coded on the
assumption that it will only be called from one thread at a time.
After all, what would it mean for two threads to make interleaved
calls to the hasNext() function?

Yet, a customer of ours with a gigantic instance of Solr 1.4 reports
incidents in which we throw an exception that indicates (we think)
that two different threads made interleaved calls.

Does this suggest anything to anyone? Other than that we've
misanalyzed the logic in the tokenizer and there's a way to make it
burp on one thread?


Re: Sharding in SolrCloud

2012-06-13 Thread Erick Erickson
Hmmm, are you sure SolrCloud fits your needs? You say that you think
everything will fit on one shard and are worried about bulk updates. In
that case I should think regular Solr master/slave (rather than cloud)
might be a better fit. Using Cloud and all that goes with it for a single
shard is certainly possible, but I question whether it's your best option
here.

Of course, if NRT is a requirement, then SolrCloud is a much better option.

With typical master/slave setups, since your bulk updates are happening on
a separate machine, having multiple slaves that poll at a given interval
seems like it would work, but you'd have to be able to stand, say, 5-10
minute latency...

Best
Erick

On Wed, Jun 13, 2012 at 7:47 AM,  lenz...@gfi.ihk.de wrote:
 Mark Miller markrmil...@gmail.com wrote on 12.06.2012 19:19:01:


 On Jun 12, 2012, at 3:39 AM, lenz...@gfi.ihk.de wrote:

  Hello,

  we tested SolrCloud in a setup with one collection, two shards and one
  replica per shard, and it works quite fine with some example data.
  Now, we plan to set up our own collection and determine how many shards
  we should divide it into.
  We can estimate the size of the collection quite exactly, but we don't
  know what the best approach for sharding is,
  even if we know the size and the amount of queries and updates.
  Is there any documentation or a kind of design guidelines for sharding a
  collection in SolrCloud?


  Thanks & regards,
  Norman Lenzner


 It's hard to tell - I think you want to start with an idea of how
 many docs you can fit on a single node. This can vary wildly
 depending on many factors. Generally you have to do some testing
 with your particular config and data. You can search the mailing
 lists and perhaps dig up a little info, but there is really no
 replacement for running some tests with real data.

 Then you have to plan in your growth rate - resharding is naturally
 a relatively expensive operation. Once you have an idea of how many
 docs per machine you think seems comfortable, figure out how many
 machines you need given your estimated doc growth rate and perhaps
 some padding. You might not get it right, but if you expect the
 possibility of a lot of growth, erring on the more-shards side is
 obviously better.

 - Mark Miller
 lucidimagination.com


 Hello and thanks for your reply,

 We will run some tests to determine the size of our collection, but I
 think there won't be a need for a second shard at all. The problem is not
 the size or the growth of the docs, but that there will be a quite high
 update frequency. So, if we have many bulk updates, is it reasonable to
 distribute the update load across multiple shards?

 Thanks & regards,
 Norman Lenzner


Re: FilterCache - maximum size of document set

2012-06-13 Thread Erick Erickson
Hmmm, I think you may be looking at the wrong thing here. Generally, a
filterCache entry will be maxDocs/8 bytes (plus some overhead), so in your
case they really shouldn't be all that large, on the order of 3M per filter.
That shouldn't vary based on the number of docs that match the fq; it's just
a bitset. To see if that makes any sense, take a look at the admin page and
the number of evictions in your filterCache. If that is > 0, you're probably
already using all the memory you're going to use in the filterCache during
the day...

But you haven't indicated what version of Solr you're using; I'm going from a
relatively recent 3.x knowledge base.

Have you put a memory analyzer against your Solr instance to see where the
memory is being used?

Best
Erick

On Wed, Jun 13, 2012 at 1:05 PM, Pawel pawelmis...@gmail.com wrote:
 Hi,
 I have solr index with about 25M documents. I optimized FilterCache size to
 reach the best performance (considering traffic characteristic that my Solr
 handles). I see that the only way to limit size of a Filter Cace is to set
 number of document sets that Solr can cache. There is no way to set memory
 limit (eg. 2GB, 4GB or something like that). When I process a standard
 trafiic (during day) everything is fine. But when Solr handle night traffic
 (and the charateristic of requests change) some problems appear. There is
 JVM out of memory error. I know what is the reason. Some filters on some
 fields are quite poor filters. They returns 15M of documents or even more.
 You could say 'Just put that into q'. I tried to put that filters into
 Query part but then, the statistics of request processing time (during
 day) become much worse. Reduction of Filter Cache maxSize is also not good
 solution because during day cache filters are very very helpful.
 You could be interested in type of filters that I use. These are range
 filters (I tried standard range filters and frange) - eg. price:[* TO
 1]. Some fq with price can return few thousands of results (eg.
 price:[40 TO 50]), but some (eg. price:[* TO 1]) can return milions of
 documents. I'd also like to avoid solution which will introduce strict
 ranges that user can choose.
 Have you any suggestions what can I do? Is there any way to limit for
 example maximum size of docSet which is cached in FilterCache?

 --
 Pawel


Re: Solr1.4 and threads ....

2012-06-13 Thread Robert Muir
On Wed, Jun 13, 2012 at 4:38 PM, Benson Margulies bimargul...@gmail.com wrote:

 Does this suggest anything to anyone? Other than that we've
 misanalyzed the logic in the tokenizer and there's a way to make it
 burp on one thread?

it might suggest the different tokenstream instances refer to some
shared object that is not thread safe: we had bugs like this before
(e.g. sharing a JDK collator is ok, but ICU ones are not thread-safe,
so you must clone them).

Because of this we beefed up our base analysis class
(http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java)
to find thread safety bugs like this.

I recommend just grabbing the test-framework.jar (we release it as an
artifact), extend that class and write a test like:
  public void testRandomStrings() throws Exception {
checkRandomData(random, analyzer, 10);
  }

(or use the one in the branch, its even been improved since 3.6)

-- 
lucidimagination.com


Re: Getting maximum / minimum field value - slow query

2012-06-13 Thread Erik Hatcher
A large start value probably performs worse than the sort (see SOLR-1726).
Once the sort field is cached, it'll be quick from then on. Put a warming
query in solrconfig.xml for newSearcher and/or firstSearcher that does this
sort, and the cache will be built in advance of queries, at least.

Erik

On Jun 13, 2012, at 16:09 , Jack Krupansky wrote:

 Try the query without the sort to get the number of rows, then do a second
 query using a start equal to the number of rows minus one. That should get
 you the last row/document.
 
 -- Jack Krupansky
 
 -Original Message- From: rafal.gwizd...@gmail.com
 Sent: Wednesday, June 13, 2012 3:07 PM
 To: solr-user@lucene.apache.org
 Subject: Getting maximum / minimum field value - slow query
 
 Hi, I have an index with about 9 million documents. Every document has an
 integer 'Id' field (it's not the SOLR document identifier) and I want to get
 the maximum value of that field.
 Therefore I'm doing a search with the following parameters:
 q=*:*, sort=Id desc, rows=1
 
 <response>
 <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">2672</int>
   <lst name="params">
     <str name="q">*:*</str>
     <str name="rows">1</str>
     <str name="sort">Id desc</str>
   </lst>
 </lst>
 <result name="response" numFound="8747779" start="0">
   <doc>
     <str name="Uid">CRQIncident#45165891</str>
   </doc>
 </result>
 </response>
 
 The problem is that it takes quite a long time to get the response (2-10
 seconds). Why is it so slow - isn't it a simple index lookup?
 
 Best regards
 RG
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Getting-maximum-minimum-field-value-slow-query-tp3989467.html
 Sent from the Solr - User mailing list archive at Nabble.com. 
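
A minimal sketch of such a warming entry in solrconfig.xml (the query mirrors
the one from this thread; a newSearcher listener takes the same form):

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="sort">Id desc</str>
        <str name="rows">1</str>
      </lst>
    </arr>
  </listener>

This runs the sort once at startup, so the first user query finds the Id
field cache already populated.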



Re: FilterCache - maximum size of document set

2012-06-13 Thread Pawel Rog
Thanks for your response.
Yes, maybe you are right. I thought that filters could be larger than 3M. Do
all kinds of filters use a BitSet?
Moreover, maxSize of the filterCache is set to 16000 in my case. There are
evictions during day traffic,
but not during night traffic.

The version of Solr which I use is 3.5.

I haven't used a memory analyzer yet. Could you write more details about it?

--
Regards,
Pawel

On Wed, Jun 13, 2012 at 10:55 PM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, I think you may be looking at the wrong thing here. Generally, a
 filterCache entry will be maxDocs/8 bytes (plus some overhead), so in your
 case they really shouldn't be all that large, on the order of 3M per
 filter. That shouldn't vary based on the number of docs that match the fq;
 it's just a bitset. To see if that makes any sense, take a look at the
 admin page and the number of evictions in your filterCache. If that is >
 0, you're probably already using all the memory you're going to use in the
 filterCache during the day...

 But you haven't indicated what version of Solr you're using; I'm going from
 a relatively recent 3.x knowledge base.

 Have you put a memory analyzer against your Solr instance to see where the
 memory is being used?

 Best
 Erick

 On Wed, Jun 13, 2012 at 1:05 PM, Pawel pawelmis...@gmail.com wrote:
  Hi,
  I have a Solr index with about 25M documents. I optimized the FilterCache
  size to reach the best performance (considering the traffic
  characteristics that my Solr handles). I see that the only way to limit
  the size of a FilterCache is to set the number of document sets that Solr
  can cache. There is no way to set a memory limit (e.g. 2GB, 4GB or
  something like that). When I process standard traffic (during the day)
  everything is fine. But when Solr handles night traffic (and the
  characteristics of the requests change) some problems appear. There is a
  JVM out-of-memory error. I know what the reason is. Some filters on some
  fields are quite poor filters. They return 15M documents or even more.
  You could say 'Just put that into q'. I tried to put those filters into
  the query part, but then the statistics of request processing time
  (during the day) became much worse. Reducing the FilterCache maxSize is
  also not a good solution, because during the day cached filters are very,
  very helpful.
  You could be interested in the type of filters that I use. These are
  range filters (I tried standard range filters and frange) - e.g.
  price:[* TO 1]. Some fq with price can return a few thousand results
  (e.g. price:[40 TO 50]), but some (e.g. price:[* TO 1]) can return
  millions of documents. I'd also like to avoid a solution which introduces
  strict ranges that the user can choose from.
  Have you any suggestions what I can do? Is there any way to limit, for
  example, the maximum size of a docSet which is cached in the FilterCache?
 
  --
  Pawel
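
For scale, Erick's bitset estimate combined with the numbers in this thread
gives a rough worst case (a back-of-envelope calculation, using the reported
25M docs and filterCache maxSize of 16000):

  25,000,000 docs / 8 bits per byte  ~  3.1 MB per cached filter
  16,000 entries x 3.1 MB            ~  50 GB if the cache ever filled

which is why a shift in traffic pattern that touches many distinct fq values
can exhaust the heap even when daytime traffic fits comfortably.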



Regarding number of documents

2012-06-13 Thread Swetha Shenoy
Hi,

I have a data config file that contains the data import query. If I just
run the import query against MySQL, I get a certain number of results. I
assume that if I run the full-import, I should get the same number of
documents added to the index, but I see that this is not the case: the
number of documents added to the index is less than what I see from the
MySQL query result. Can anyone tell me if my assumption is correct, and why
the number of documents would be off?

Thanks,
Swetha


Re: Regarding number of documents

2012-06-13 Thread Swetha Shenoy
Note: I don't see any errors in the logs when I run the index.

On Wed, Jun 13, 2012 at 5:48 PM, Swetha Shenoy sshe...@gmail.com wrote:

 Hi,

 I have a data config file that contains the data import query. If I just
 run the import query against MySQL, I get a certain number of results. I
 assume that if I run the full-import, I should get the same number of
 documents added to the index, but I see that this is not the case: the
 number of documents added to the index is less than what I see from the
 MySQL query result. Can anyone tell me if my assumption is correct, and
 why the number of documents would be off?

 Thanks,
 Swetha



Re: Regarding number of documents

2012-06-13 Thread Afroz Ahmad
Could it be that you are getting records that are not unique? If so, then
Solr would just overwrite the non-unique documents.

Thanks
Afroz

On Wed, Jun 13, 2012 at 4:50 PM, Swetha Shenoy sshe...@gmail.com wrote:

 Note: I don't see any errors in the logs when I run the index.

 On Wed, Jun 13, 2012 at 5:48 PM, Swetha Shenoy sshe...@gmail.com wrote:

  Hi,
 
  I have a data config file that contains the data import query. If I just
  run the import query against MySQL, I get a certain number of results. I
  assume that if I run the full-import, I should get the same number of
  documents added to the index, but I see that this is not the case: the
  number of documents added to the index is less than what I see from the
  MySQL query result. Can anyone tell me if my assumption is correct, and
  why the number of documents would be off?
 
  Thanks,
  Swetha
 



Re: Regarding number of documents

2012-06-13 Thread Swetha Shenoy
That makes sense. But I added a new entry that showed up in the MySQL
results and not in the Solr search results. The count of documents also did
not increase after the addition. How can a new entry show up in MySQL
results and not as a new document?

On Wed, Jun 13, 2012 at 6:26 PM, Afroz Ahmad ahmad@gmail.com wrote:

 Could it be that you are getting records that are not unique? If so, then
 Solr would just overwrite the non-unique documents.

 Thanks
 Afroz

 On Wed, Jun 13, 2012 at 4:50 PM, Swetha Shenoy sshe...@gmail.com wrote:

  Note: I don't see any errors in the logs when I run the index.
 
  On Wed, Jun 13, 2012 at 5:48 PM, Swetha Shenoy sshe...@gmail.com
 wrote:
 
   Hi,
  
   I have a data config file that contains the data import query. If I just
   run the import query against MySQL, I get a certain number of results. I
   assume that if I run the full-import, I should get the same number of
   documents added to the index, but I see that this is not the case: the
   number of documents added to the index is less than what I see from the
   MySQL query result. Can anyone tell me if my assumption is correct, and
   why the number of documents would be off?
  
   Thanks,
   Swetha
  
 



Re: Regarding number of documents

2012-06-13 Thread Jack Krupansky

Check the ID for that latest record and try to query it in Solr.

One way you can get multiple records from an RDBMS query is via a join. In
that case, each of the records could have the same value in the column(s)
that you are using for your unique key field in Solr.
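
A hypothetical join of that shape (table and column names are illustrative,
not from the original thread), where id is the Solr uniqueKey:

  SELECT p.id, p.title, c.comment
  FROM posts p
  JOIN comments c ON c.post_id = p.id;

Several MySQL rows share the same p.id here, so on import they collapse into
a single Solr document.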


-- Jack Krupansky

-Original Message- 
From: Swetha Shenoy

Sent: Wednesday, June 13, 2012 7:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Regarding number of documents

That makes sense. But I added a new entry that showed up in the MySQL
results and not in the Solr search results. The count of documents also did
not increase after the addition. How can a new entry show up in MySQL
results and not as a new document?

On Wed, Jun 13, 2012 at 6:26 PM, Afroz Ahmad ahmad@gmail.com wrote:


Could it be that you are getting records that are not unique? If so, then
Solr would just overwrite the non-unique documents.

Thanks
Afroz

On Wed, Jun 13, 2012 at 4:50 PM, Swetha Shenoy sshe...@gmail.com wrote:

 Note: I don't see any errors in the logs when I run the index.

 On Wed, Jun 13, 2012 at 5:48 PM, Swetha Shenoy sshe...@gmail.com
wrote:

  Hi,
 
  I have a data config file that contains the data import query. If I just
  run the import query against MySQL, I get a certain number of results. I
  assume that if I run the full-import, I should get the same number of
  documents added to the index, but I see that this is not the case: the
  number of documents added to the index is less than what I see from the
  MySQL query result. Can anyone tell me if my assumption is correct, and
  why the number of documents would be off?
 
  Thanks,
  Swetha
 






Re: Regarding number of documents

2012-06-13 Thread Gora Mohanty
On 14 June 2012 04:51, Swetha Shenoy sshe...@gmail.com wrote:
 That makes sense. But I added a new entry that showed up in the MySQL
 results and not in the Solr search results. The count of documents also did
 not increase after the addition. How can a new entry show up in MySQL
 results and not as a new document?

Sorry, but this is not very clear: Are you running a
full-import, or a delta-import after adding the new
entry in mysql? By any chance, does the new entry
have an ID that already exists in the Solr index?

What is the number of records that DIH reports
after an import is completed?

Regards,
Gora


Re: Unexpected DIH behavior for onError attribute

2012-06-13 Thread Gora Mohanty
On 13 June 2012 10:45, Pranav Prakash pra...@gmail.com wrote:
 My DIH config file goes as follows. We have two db hosts, one of which
 contains blocks of content and the other contains transcripts of those
 content blocks. The makeDynamicTranscript function is used to create row
 names like transcript_en, transcript_es and so on, which are dynamic fields
 in Solr with appropriate tokenizers.
[...]

This looks fine. Have you looked in the Solr logs
for more information? Is it possible that the error
is causing some connection issue? What is the
error exactly, and is it happening on the SELECT
in the inner entity, or on the outer one?

Regards,
Gora
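
For reference, onError is set per entity in the DIH config; a minimal sketch
(the entity name and query here are hypothetical, not from the elided config):

  <entity name="transcripts" onError="continue"
          query="SELECT id, transcript FROM transcripts">
    ...
  </entity>

Valid values are abort (the default), skip, and continue; with continue, a
failing row is logged and the import moves on, which is worth checking
against what the logs actually show for the inner entity.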