Distributed search results in SocketException: Connection reset

2013-06-30 Thread Shahar Davidson
Hi all,

We're getting the below exception sporadically when using distributed search. 
(using Solr 4.2.1)
Note that 'core_3' is one of the cores mentioned in the 'shards' parameter.

Any ideas anyone?

Thanks,

Shahar.


Jun 03, 2013 5:27:38 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:8210/solr/core_3
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:300)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:365)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:8210/solr/core_3
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	... 1 more
Caused by: java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(Unknown Source)
	at java.net.SocketInputStream.read(Unknown Source)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:149)

RE: Distributed search results in SocketException: Connection reset

2013-06-30 Thread Shahar Davidson
Thanks Lance.

If that is the case, are there any timeout mechanisms defined by Solr other 
than Jetty timeout definitions?

Thanks,

Shahar.

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Monday, July 01, 2013 4:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Distributed search results in SocketException: Connection reset

This usually means the end server timed out.
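For the shard requests themselves, Solr's distributed-search HTTP client has timeouts of its own, separate from Jetty's: the shard handler factory accepts socketTimeout and connTimeout parameters in solrconfig.xml. A sketch for a 4.x setup (the values are illustrative, not recommendations):

```xml
<!-- solrconfig.xml: shard-request timeouts for a search handler (values illustrative) -->
<requestHandler name="/select" class="solr.SearchHandler">
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="socketTimeout">30000</int> <!-- ms to wait for a shard's response -->
    <int name="connTimeout">5000</int>    <!-- ms to establish the connection -->
  </shardHandlerFactory>
</requestHandler>
```

If the remote core genuinely needs longer than the socket timeout to answer, raising socketTimeout is usually the relevant knob for "Connection reset"-style failures during distributed requests.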

On 06/30/2013 06:31 AM, Shahar Davidson wrote:
 Hi all,

 We're getting the below exception sporadically when using distributed 
 search. (using Solr 4.2.1) Note that 'core_3' is one of the cores mentioned 
 in the 'shards' parameter.

 Any ideas anyone?

 Thanks,

 Shahar.



RE: Preparing Solr 4.2.1 for IntelliJ fails - invalid sha1

2013-04-28 Thread Shahar Davidson
Hi Steve,

Your help is much appreciated.

Turns out that all the problems I had were proxy-related: I had to explicitly provide the proxy configuration (host/port) to Ant. (Though I had already been using ivy-2.3.0, IVY-1194 was a good tip!)
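For reference, one common way to hand proxy settings to Ant (and through it to Ivy's resolvers) is via the standard JVM proxy properties in ANT_OPTS. The host and port below are placeholders:

```shell
# Placeholder proxy host/port; Ant passes ANT_OPTS JVM properties on to Ivy.
export ANT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 \
-Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080"
echo "$ANT_OPTS"
# Then run the build as usual:
# ant ivy-bootstrap   # re-download ivy-2.3.0.jar
# ant idea            # generate the IntelliJ project
```

Ant also has an -autoproxy switch that picks up the system proxy, but explicit -D properties are easier to verify.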

That solved everything.

Thanks again,

Shahar.

-Original Message-
From: Steve Rowe [mailto:sar...@gmail.com] 
Sent: Thursday, April 25, 2013 4:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Preparing Solr 4.2.1 for IntelliJ fails - invalid sha1

Hi Shahar,

I suspect you may have an older version of Ivy installed - the errors you're 
seeing look like IVY-1194 https://issues.apache.org/jira/browse/IVY-1194, 
which was fixed in Ivy 2.2.0.  Lucene/Solr uses Ivy 2.3.0.  Take a look in 
C:\Users\account\.ant\lib\ and remove older versions of ivy-*.jar, then run 
'ant ivy-bootstrap' from the Solr source code to download ivy-2.3.0.jar to 
C:\Users\account\.ant\lib\.

Just now on a Windows 7 box, I downloaded solr-4.2.1-src.tgz from one of the 
Apache mirrors, unpacked it, deleted my C:\Users\account\.ivy2\ directory (so 
that ivy would re-download everything), and ran 'ant idea' from a cmd window.  
BUILD SUCCESSFUL.

Steve



Email secured by Check Point


Preparing Solr 4.2.1 for IntelliJ fails - invalid sha1

2013-04-25 Thread Shahar Davidson
Hi all,

I'm trying to run 'ant idea' on 4.2.* and I'm getting invalid sha1 error 
messages. (see below)

I'll appreciate any help,

Shahar
===
.
.
.
resolve
ivy:retrieve

:: problems summary ::
 WARNINGS
problem while downloading module descriptor: 
http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid 
sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (72ms)
problem while downloading module descriptor: 
http://oss.sonatype.org/content/repositories/releases/org/apache/ant/ant/1.8.2/ant-1.8.2.pom:
 invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 
(53ms)
problem while downloading module descriptor: 
http://maven.restlet.org/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid sha1: 
expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (53ms)
problem while downloading module descriptor: 
http://mirror.netcologne.de/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: 
invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 
(58ms)

module not found: org.apache.ant#ant;1.8.2
.
.
.
 public: tried
  http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
 sonatype-releases: tried
  
http://oss.sonatype.org/content/repositories/releases/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
 maven.restlet.org: tried
  http://maven.restlet.org/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
 working-chinese-mirror: tried
  
http://mirror.netcologne.de/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
problem while downloading module descriptor: 
http://repo1.maven.org/maven2/junit/junit/4.10/junit-4.10.pom: invalid sha1: 
expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (74ms)
problem while downloading module descriptor: 
http://oss.sonatype.org/content/repositories/releases/junit/junit/4.10/junit-4.10.pom:
 invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 
(60ms)
problem while downloading module descriptor: 
http://maven.restlet.org/junit/junit/4.10/junit-4.10.pom: invalid sha1: 
expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (58ms)
problem while downloading module descriptor: 
http://mirror.netcologne.de/maven2/junit/junit/4.10/junit-4.10.pom: invalid 
sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (60ms)

module not found: junit#junit;4.10
.
.
.
.
 ::
::  UNRESOLVED DEPENDENCIES ::
::
:: org.apache.ant#ant;1.8.2: not found
:: junit#junit;4.10: not found
:: com.carrotsearch.randomizedtesting#junit4-ant;2.0.8: not 
found
:: 
com.carrotsearch.randomizedtesting#randomizedtesting-runner;2.0.8: not found
::

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
D:\apache_solr_4.2.1\lucene\common-build.xml:348: impossible to resolve 
dependencies:
resolve failed - see output for details


Solr maven install - authorization problem when downloading maven.restlet.org dependencies

2013-04-25 Thread Shahar Davidson
Hi,

I'm trying to build Solr 4.2.x with Maven and I'm getting the following error 
in solr-core:

[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 1.341s
[INFO] Finished at: Thu Apr 25 15:33:09 IDT 2013
[INFO] Final Memory: 12M/174M
[INFO] 
[ERROR] Failed to execute goal on project solr-core: Could not resolve 
dependencies for project org.apache.solr:solr-core:jar:4.2.1-SNAPSHOT: Failed 
to collect dependencies for [org.apache.solr:solr-solrj:jar:4.2.1-SNAPSHOT 
(compile), org.apache.lucene:lucene-core:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-codecs:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-common:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-kuromoji:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-morfologik:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-phonetic:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-highlighter:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-memory:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-misc:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-queryparser:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-spatial:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-suggest:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-grouping:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-queries:jar:4.2.1-SNAPSHOT (compile), 
commons-codec:commons-codec:jar:1.7 (compile), commons-cli:commons-cli:jar:1.2 
(compile), commons-fileupload:commons-fileupload:jar:1.2.1 (compile), 
org.restlet.jee:org.restlet:jar:2.1.1 (compile), 
org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1 (compile), 
org.slf4j:jcl-over-slf4j:jar:1.6.4 (compile), org.slf4j:slf4j-jdk14:jar:1.6.4 
(compile), commons-io:commons-io:jar:2.1 (compile), 
commons-lang:commons-lang:jar:2.6 (compile), com.google.guava:guava:jar:13.0.1 
(compile), org.eclipse.jetty:jetty-server:jar:8.1.8.v20121106 (compile?), 
org.eclipse.jetty:jetty-util:jar:8.1.8.v20121106 (compile?), 
org.eclipse.jetty:jetty-webapp:jar:8.1.8.v20121106 (compile?), 
org.codehaus.woodstox:wstx-asl:jar:3.2.7 (runtime), 
javax.servlet:servlet-api:jar:2.4 (provided), 
org.apache.httpcomponents:httpclient:jar:4.2.3 (compile), 
org.apache.httpcomponents:httpmime:jar:4.2.3 (compile), 
org.slf4j:slf4j-api:jar:1.6.4 (compile), junit:junit:jar:4.10 (test)]: Failed 
to read artifact descriptor for org.restlet.jee:org.restlet:jar:2.1.1: Could 
not transfer artifact org.restlet.jee:org.restlet:pom:2.1.1 from/to 
maven-restlet (http://maven.restlet.org): Not authorized, 
ReasonPhrase:Unauthorized. - [Help 1]


Has anyone encountered this issue?

Thanks,

Shahar.
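Given that this poster's other build problems turned out to be proxy-related, one thing worth ruling out is an intercepting corporate proxy answering for maven.restlet.org with a 401. A sketch of declaring the proxy to Maven in ~/.m2/settings.xml (host and port are placeholders):

```xml
<!-- ~/.m2/settings.xml: route Maven's HTTP traffic through the corporate proxy -->
<settings>
  <proxies>
    <proxy>
      <id>corp-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.example.com</host>
      <port>8080</port>
    </proxy>
  </proxies>
</settings>
```

If the proxy requires credentials, the proxy element also accepts username and password children.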


RE: CoreAdmin STATUS performance

2013-01-14 Thread Shahar Davidson
Hi Stefan,

I have opened issue SOLR-4302 and attached the suggested patch.

Regards,

Shahar.

-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@gmail.com] 
Sent: Sunday, January 13, 2013 3:11 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

Shahar


Would you mind opening a JIRA issue for that, attaching your changes as a typical patch? Perhaps we could use that for the UI, in those cases where we don't need the full set of information.

Stefan 




RE: CoreAdmin STATUS performance

2013-01-13 Thread Shahar Davidson
Thanks for sharing this info, Per - this info may prove to be valuable for me 
in the future.

Shahar.

-Original Message-
From: Per Steffensen [mailto:st...@designware.dk] 
Sent: Thursday, January 10, 2013 6:10 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

The collections are created dynamically, though not on update. We use one collection per month and we have a timer-job running (every hour or so) which checks whether all collections that need to exist actually do exist - if not, it creates the collection(s). The rule is that the collection for next month has to exist as soon as we enter the current month, so the first time the timer-job runs on e.g. 1 July it will create the August-collection. We never get data with a timestamp in the future.
Therefore, if the timer-job just gets to run once within every month, we will always have the needed collections ready.

We create collections using the new Collection API in Solr. We used to manage creation of every single Shard/Replica/Core of the collections through the Core Admin API in Solr, but since a Collection API was introduced we decided that we'd better use that. In 4.0 it did not have the features we needed, which triggered SOLR-4114, SOLR-4120 and SOLR-4140, which will be available in 4.1. With those features we are now using the Collection API.

BTW, our timer-job also handles deletion of old collections. In our system you can configure how many historic month-collections to keep before it is OK to delete them. Let's say this is configured to 3: as soon as it becomes 1 July, the timer-job will delete the March-collection (the historic collections to keep will just have become the April-, May- and June-collections). This way we will always have at least 3 months of historic data, and late in a month close to 4 months of history. It does not matter that we have a little too much history, as long as we do not go below the lower limit on the length of historic data. We also use the new Collection API for deletion.

Regards, Per Steffensen



RE: CoreAdmin STATUS performance

2013-01-13 Thread Shahar Davidson
Shawn, Per and anyone else who has participated in this thread - thank you!

I have finally resorted to applying a minor patch to the Solr code.
I noticed that most of the time of the STATUS request is spent collecting index-related info (such as segmentCount, sizeInBytes, numDocs, etc.).
In the STATUS request I added support for a new parameter which, if present, skips collection of the index info (hence only general static info, among it the core name, is returned) - this cuts the request time by two orders of magnitude!
In my case, it decreased the request time from around 800ms to around 1ms-4ms.
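This patch was contributed upstream as SOLR-4302; in released Solr versions the switch is exposed as the indexInfo parameter on the CoreAdmin STATUS action. A sketch of the call (host and port are placeholders):

```shell
# CoreAdmin STATUS without per-core index statistics.
# indexInfo=false skips segmentCount/sizeInBytes/numDocs collection per core.
STATUS_URL='http://localhost:8983/solr/admin/cores?action=STATUS&indexInfo=false'
echo "$STATUS_URL"
# curl "$STATUS_URL"
```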

Regards,

Shahar.

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Thursday, January 10, 2013 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

On 1/10/2013 2:09 AM, Shahar Davidson wrote:
 As for your first question, the core info needs to be gathered upon every 
 search request because cores are created dynamically.
 When a user initiates a search request, the system must be aware of 
 all available cores in order to execute distributed search on _all_ relevant 
 cores. (the user must get reliable and most up to date data) The reason that 
 800ms seems a lot to me is because the overall execution time takes about 
 2500ms and a large part of it is due to the STATUS request.

 The minimal interval concept is a good idea and indeed we've considered it, 
 yet it poses a slight problem when building a RT system which needs to return 
 to most up to date data.
 I am just trying to understand if there's some other way to hasten the 
 STATUS reply (for example, by asking the STATUS request to return just 
 certain core attributes, such as name, instead of collecting 
 everything)

Are there a *huge* number of SolrJ clients in the wild, or is it something like 
a server farm where you are in control of everything?  If it's the latter, what 
I think I would do is have an asynchronous thread that periodically (every few 
seconds) updates the client's view of what cores exist.  When a query is made, 
it will use that information, speeding up your queries by 800 milliseconds and 
ensuring that new cores will not have long delays before they become 
searchable.  If you have a huge number of clients in the wild, it would still 
be possible, but ensuring that those clients get updated might be hard.
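Shawn's periodic-refresh idea can be sketched as a small client-side cache; fetchCoreNames() below is a stub standing in for the real CoreAdmin STATUS call (via SolrJ or plain HTTP), and the 5-second interval is an assumption:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Background refresh of the client's view of existing cores: queries read
// the cached list instead of paying for a STATUS call on every request.
public class CoreListCache {
    private final AtomicReference<List<String>> cores =
            new AtomicReference<List<String>>(Arrays.<String>asList());
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {                       // refresh every 5 seconds
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() { cores.set(fetchCoreNames()); }
        }, 0, 5, TimeUnit.SECONDS);
    }

    public List<String> currentCores() { return cores.get(); } // per-query read

    // Stub: replace with a real CoreAdmin STATUS request.
    protected List<String> fetchCoreNames() {
        return Arrays.asList("core_1", "core_2");
    }

    public void stop() { scheduler.shutdownNow(); }
}
```

The AtomicReference swap keeps query threads lock-free: they always see the last complete core list, never a half-updated one.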

If you also delete cores as well as add them, that complicates things.  
You'd have to have the clients be smart enough to exclude the last core on the 
list (by whatever sorting mechanism you require), and you'd have to wait long 
enough (30 seconds, maybe?) before *actually* deleting the last core to be sure 
that no clients are accessing it.

Or you could use SolrCloud, as Per suggested, but with 4.1, not the released 
4.0.  SolrCloud manages your cores for you automatically.  
You'd probably be using a slightly customized SolrCloud, including the custom 
hashing capability added by SOLR-2592.  I don't know what other customizations 
you might need.

Thanks,
Shawn




RE: CoreAdmin STATUS performance

2013-01-10 Thread Shahar Davidson
Thanks Per.

I'm currently not using SolrCloud but that's a good tip to keep in mind.

Thanks,

Shahar.

-Original Message-
From: Per Steffensen [mailto:st...@designware.dk] 
Sent: Thursday, January 10, 2013 10:02 AM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

If you are using ZK-coordinated Solr (SolrCloud - you need 4.0+) you can maintain an in-memory, always-up-to-date data structure containing the information - ClusterState. You can get it through CloudSolrServer or ZkStateReader: you connect to ZK once and it will automatically update the in-memory ClusterState with changes.

Regards, Per Steffensen



RE: CoreAdmin STATUS performance

2013-01-10 Thread Shahar Davidson
Hi Per,

Thanks for your reply!

That's a very interesting approach.

In your system, how are the collections created? In other words, are the 
collections created dynamically upon an update (for example, per new day)?
If they are created dynamically, who handles their creation (client/server)  
and how is it done?

I'd love to hear more about it!

Appreciate your help,

Shahar.

-Original Message-
From: Per Steffensen [mailto:st...@designware.dk]
Sent: Thursday, January 10, 2013 1:23 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

On 1/10/13 10:09 AM, Shahar Davidson wrote:
 search request, the system must be aware of all available cores in 
 order to execute distributed search on_all_  relevant cores
For this purpose I would definitely recommend that you go SolrCloud.

Furthermore, we do something extra:
We have several collections, each containing data from a specific period in 
time - the timestamp of incoming data decides which collection it is indexed 
into. One important search criterion for our clients is a search on a 
timestamp interval, so most searches can be restricted to only a subset of all 
our collections. Instead of having the logic that calculates the subset of 
collections to search (given the timestamp search-interval) in clients, we 
just let clients do dumb searches by supplying the timestamp-interval. The 
subset of collections to search is calculated on the server side from the 
timestamp-interval in the search query. We handle this in a Solr 
SearchComponent which we place early in the chain of SearchComponents. 
Maybe you can draw some inspiration from this approach, if it is also relevant 
for you.
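[Editor's note: the server-side routing described above can be illustrated with a small sketch. Given a timestamp interval, compute the subset of collections that could contain matching documents. The monthly granularity and the `coll_YYYY_MM` naming scheme are invented for illustration; in Per's setup this logic lives inside a custom Java SearchComponent, not in Python.]

```python
from datetime import date

def collections_for_interval(start: date, end: date, prefix: str = "coll"):
    """Map a [start, end] date interval to the monthly collections that can
    contain matching documents (one collection per month, named e.g.
    'coll_2013_01' - the naming scheme is purely illustrative)."""
    names = []
    y, m = start.year, start.month
    # Walk month by month from the start of the interval to its end.
    while (y, m) <= (end.year, end.month):
        names.append(f"{prefix}_{y:04d}_{m:02d}")
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return names

print(collections_for_interval(date(2012, 11, 5), date(2013, 1, 20)))
# ['coll_2012_11', 'coll_2012_12', 'coll_2013_01']
```

A SearchComponent placed early in the chain would run logic like this against the query's timestamp parameters and rewrite the effective shard/collection list before the query component executes.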

Regards, Per Steffensen



CoreAdmin STATUS performance

2013-01-09 Thread Shahar Davidson
Hi All,

I have a client app that uses SolrJ and which requires to collect the names 
(and just the names) of all loaded cores.
I have about 380 Solr Cores on a single Solr server (net indices size is about 
220GB).

Running the STATUS action takes about 800ms - that seems a bit too long, given 
my requirements.

So here are my questions:
1) Is there any way to get _only_ the core Name of all cores?
2) Why does the STATUS request take such a long time and is there a way to 
improve its performance?

Thanks,

Shahar.
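[Editor's note: for question (1), the usual advice is to ask the STATUS action to skip the per-core index statistics, which is what typically makes it slow on hundreds of cores; newer 4.x releases accept an `indexInfo=false` parameter on the CoreAdmin STATUS request for exactly this (check your version's reference guide). A Python sketch of pulling just the names out of a JSON STATUS response - the sample payload is abbreviated and illustrative, not a real server reply:]

```python
import json

# Abbreviated sample of /solr/admin/cores?action=STATUS&indexInfo=false&wt=json
# The real response nests one entry per core under "status".
sample = json.loads("""
{
  "responseHeader": {"status": 0, "QTime": 3},
  "status": {
    "core_1": {"name": "core_1", "instanceDir": "/solr/core_1"},
    "core_2": {"name": "core_2", "instanceDir": "/solr/core_2"},
    "core_3": {"name": "core_3", "instanceDir": "/solr/core_3"}
  }
}
""")

def core_names(status_response):
    """Extract just the core names from a CoreAdmin STATUS response."""
    return sorted(status_response["status"].keys())

print(core_names(sample))  # ['core_1', 'core_2', 'core_3']
```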


RE: Invalid version (expected 2, but 60) or the data in not in 'javabin'

2012-12-26 Thread Shahar Davidson
Thanks for the prompt reply Mark.

Just to give you some background, I'm simulating a multi-shard environment by 
running more than 200 Solr Cores on a single machine (machine does not seem to 
be stressed) and I'm running a distributed facet.
The Solr server is running trunk 1404975 with SOLR-2894 patch applied over it 
(the one from Nov. 12th).
While I'm running the distributed request, other clients are sending various 
search requests to the Solr server.
This issue happens randomly and does not reproduce consistently.
As I wrote earlier, I applied the Debugging.patch from SOLR-3258 to see the 
actual response and noticed that an actual XML reply was received and the XML 
itself was corrupt (as if a chunk of text was taken out right from the middle 
of it).

Since this reproduces randomly, the only thing that comes to mind is some kind 
of concurrency related problem.

Any help would be greatly appreciated,

Shahar.

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, December 26, 2012 4:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Invalid version (expected 2, but 60) or the data in not in 
'javabin'

The problem is not necessarily xml - it seems to be anything that is not valid 
javabin - I've just most often seen it with 404s that return an html error.

I'm not sure if there is a jira issue or not, but this type of thing should be 
failing in a more user friendly way.

As to why your response is corrupt, I have no guesses.

This is easily repeatable? It's happening every time, or randomly?

- Mark

On Dec 25, 2012, at 4:23 AM, Shahar Davidson shah...@checkpoint.com wrote:

 Thanks Otis.
 
 I went through every piece of info that I could lay my hands on.
 Most of them are about incompatible SolrJ versions (that's not my case) and 
 there was one message from Mark Miller saying that Solr may respond with XML 
 instead of javabin in case some kind of http error is being returned 
 (that's not my case either).
 
 I'm using distributed search.
 I added some debug output to print out the response once the Invalid 
 version exception is caught (in JavaBinCodec.unmarshal()).
 What I saw is that the response actually contains the facet response in XML 
 format, yet I also noticed that the response is corrupt (i.e. as if a chunk 
 of text has been taken out of the middle of the reply - some kind of overrun 
 perhaps?).
 
 Any help would be appreciated.
 
 Thanks,
 
 Shahar.
 
 
 -Original Message-
 From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
 Sent: Friday, December 21, 2012 6:23 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Invalid version (expected 2, but 60) or the data in not in 
 'javabin'
 
 Hi,
 
 Have a look at http://search-lucene.com/?q=invalid+version+javabin
 
 Otis
 --
 Solr Monitoring - http://sematext.com/spm/index.html
 Search Analytics - http://sematext.com/search-analytics/index.html
 
 
 
 
 On Wed, Dec 19, 2012 at 11:23 AM, Shahar Davidson 
 shah...@checkpoint.comwrote:
 
 Hi,
 
 I'm encountering this error randomly when running a distributed facet.
 (i.e. I'm sending the exact same request, yet this does not reproduce
 consistently)
 I have about  180 shards that are being queried.
 It seems that when Solr distributes the request to the shards, one, or 
 perhaps more, shards return an XML reply instead of javabin.
 
 I added some debug output to JavaBinCodec.unmarshal (as done in the 
 debugging.patch of SOLR-3258) to check whether the XML reply holds an 
 error or not, and I noticed that the XML actually holds the response 
 from one of the shards.
 
 I'm using the patch provided in SOLR-2894 on top of trunk 1404975.
 
 Has anyone encountered such an issue? Any ideas?
 
 Thanks,
 
 Shahar.
 
 
 


RE: Invalid version (expected 2, but 60) or the data in not in 'javabin'

2012-12-25 Thread Shahar Davidson
Thanks Otis.

I went through every piece of info that I could lay my hands on.
Most of them are about incompatible SolrJ versions (that's not my case) and 
there was one message from Mark Miller saying that Solr may respond with XML 
instead of javabin in case some kind of http error is being returned 
(that's not my case either).

I'm using distributed search.
I added some debug output to print out the response once the Invalid version 
exception is caught (in JavaBinCodec.unmarshal()).
What I saw is that the response actually contains the facet response in XML 
format, yet I also noticed that the response is corrupt (i.e. as if a chunk of 
text has been taken out of the middle of the reply - some kind of overrun 
perhaps?).

Any help would be appreciated.

Thanks,

Shahar.


-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Friday, December 21, 2012 6:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Invalid version (expected 2, but 60) or the data in not in 
'javabin'

Hi,

Have a look at http://search-lucene.com/?q=invalid+version+javabin

Otis
--
Solr Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Wed, Dec 19, 2012 at 11:23 AM, Shahar Davidson shah...@checkpoint.comwrote:

 Hi,

 I'm encountering this error randomly when running a distributed facet.
  (i.e. I'm sending the exact same request, yet this does not reproduce
 consistently)
 I have about  180 shards that are being queried.
 It seems that when Solr distributes the request to the shards, one, or 
 perhaps more, shards return an XML reply instead of javabin.

 I added some debug output to JavaBinCodec.unmarshal (as done in the 
 debugging.patch of SOLR-3258) to check whether the XML reply holds an 
 error or not, and I noticed that the XML actually holds the response 
 from one of the shards.

 I'm using the patch provided in SOLR-2894 on top of trunk 1404975.

 Has anyone encountered such an issue? Any ideas?

 Thanks,

 Shahar.





Invalid version (expected 2, but 60) or the data in not in 'javabin'

2012-12-19 Thread Shahar Davidson
Hi,

I'm encountering this error randomly when running a distributed facet.  (i.e. 
I'm sending the exact same request, yet this does not reproduce consistently)
I have about 180 shards that are being queried.
It seems that when Solr distributes the request to the shards, one, or perhaps 
more, shards return an XML reply instead of javabin.

I added some debug output to JavaBinCodec.unmarshal (as done in the 
debugging.patch of SOLR-3258) to check whether the XML reply holds an error or 
not, and I noticed that the XML actually holds the response from one of the 
shards.

I'm using the patch provided in SOLR-2894 on top of trunk 1404975.

Has anyone encountered such an issue? Any ideas?

Thanks,

Shahar.
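[Editor's note: the numbers in the error message are themselves a clue. Javabin payloads begin with a version byte of 2, and 60 is the ASCII code of '<', so "expected 2, but 60" means the client received a body that starts with XML (or HTML) markup instead of a javabin stream. A tiny self-contained illustration of that check:]

```python
def looks_like_javabin(body: bytes) -> bool:
    """Javabin payloads start with a version byte of 2; anything else
    (e.g. b'<' == 60, the first byte of an XML/HTML body) is not javabin."""
    return len(body) > 0 and body[0] == 2

# The '60' in "Invalid version (expected 2, but 60)" is simply ord('<').
print(ord('<'))                                       # 60
print(looks_like_javabin(b'\x02some-javabin-bytes'))  # True
print(looks_like_javabin(b'<?xml version="1.0"?>'))   # False
```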


Partitioning data to Cores using a Solr plugin

2012-10-23 Thread Shahar Davidson
Hi all,

I would like to partition my data (by date, for example) into Solr Cores by 
implementing some sort of *pluggable component* for Solr.
In other words, I want Solr to handle distribution to partitions (rather than 
implementing an external Solr proxy that sends requests to the right Solr 
Core).

Initially I thought that DistributedUpdateProcessor might help here but, as I 
understand it, it is not intended for partitioning into Cores but rather into 
shards across several machines. In addition, one cannot control the logic by 
which distribution is done.
I thought about implementing an UpdateRequestProcessor that forwards document 
updates to the right Cores (DistributedUpdateProcessor), yet I want to check 
with you (all Solr users out there) whether this can be avoided by doing it 
differently.

In other words, is there any other way of implementing a pluggable component 
for Solr that can forward/route updates (using predefined logic) to Cores?
Is there, for instance, a way to catch an update request before it enters the 
update-request processor-chain?

Thanks,

Shahar.
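[Editor's note: whatever component ends up hosting it (a custom UpdateRequestProcessor placed early in the update chain is the usual suggestion), the routing decision itself is simple. A hedged sketch of the core-selection logic only - the per-day `core_YYYYMMDD` naming scheme, the `date` field, and the fallback core name are all invented for illustration:]

```python
from datetime import date

def target_core(doc: dict, default: str = "core_other") -> str:
    """Pick the destination core for a document from its 'date' field
    (one core per day here; scheme and field name are illustrative).
    In Solr, this decision would live in a custom UpdateRequestProcessor
    that forwards the document to the chosen core's update chain."""
    d = doc.get("date")
    if not isinstance(d, date):
        return default  # documents without a usable date go to a catch-all core
    return f"core_{d:%Y%m%d}"

print(target_core({"id": "1", "date": date(2012, 10, 23)}))  # core_20121023
print(target_core({"id": "2"}))                              # core_other
```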