Re: Distributing Collections across Shards

2016-03-29 Thread Erick Erickson
Absolutely. You haven't said which version of Solr you're using,
but there are several possibilities:
1> create the collection with replicationFactor=1, then use the
ADDREPLICA command with the 'node' parameter to specify exactly which
node each shard's replicas are created on (see the sketch below).
2> For recent versions of Solr, you can create a collection with _no_
replicas and then ADDREPLICA as you choose.
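
For example, the first approach might look like this (a sketch only; the
collection, configset and node names below are placeholders, not taken
from this thread):

# create with one replica per shard, restricted to the nodes you want
curl 'http://server1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&collection.configName=myconf&createNodeSet=server1:8983_solr,server2:8983_solr'

# then add the second replica of each shard on the node you want it on
curl 'http://server1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=server2:8983_solr'
curl 'http://server1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard2&node=server1:8983_solr'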

Best,
Erick

On Tue, Mar 29, 2016 at 5:10 AM, Salman Ansari  wrote:
> Hi,
>
> I believe the default behavior of creating collections distributed across
> shards through the following command
>
> http://
> [solrlocation]:8983/solr/admin/collections?action=CREATE&name=[collection_name]&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=[configuration_name]
>
> is that Solr will create the collection as follows
>
> *shard1: *leader in server1 and replica in server2
> *shard2:* leader in server2 and replica in server1
>
> However, I have seen cases when running the above command that it creates
> both the leader and replica on the same server.
>
> Wondering if there is a way to control this behavior (I mean control where
> the leader and the replica of each shard will reside)?
>
> Regards,
> Salman


Re: High Cpu sys usage

2016-03-29 Thread Erick Erickson
Do not, repeat NOT try to "cure" the "Overlapping onDeckSearchers"
by bumping this limit! What that means is that your commits
(either hard commit with openSearcher=true or softCommit) are
happening far too frequently and your Solr instance is trying to do
all sorts of work that is immediately thrown away and chewing up
lots of CPU. Perhaps this will help:

https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

I'd guess that you're committing every second, or perhaps your indexing
client is committing after each add. If the latter, do not do this; rely
on the autocommit settings instead. If the former, make those intervals
as long as you can stand.

Another possibility: you may have your autowarm counts in your
solrconfig.xml file set at very high numbers (let's see the filterCache
settings, the queryResultCache settings etc.).

I'd _strongly_ recommend that you put the on deck searchers back to
2 and figure out why you have so many overlapping searchers.
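
As a rough sketch of what "rely on the autocommit settings" can look like
(the collection name and intervals below are examples, not from this
thread), stop committing from the client and let Solr commit on a
schedule, e.g. via the Config API in recent Solr versions:

# index without an explicit commit from the client
curl 'http://localhost:8983/solr/mycollection/update' \
  -H 'Content-Type: application/json' \
  --data-binary '[{"id":"doc1","title_t":"example"}]'

# hard commit every 15s (assuming openSearcher=false in solrconfig.xml),
# soft commit (visibility) every 60s -- stretch both as far as you can stand
curl 'http://localhost:8983/solr/mycollection/config' \
  -H 'Content-Type: application/json' \
  --data-binary '{"set-property":{"updateHandler.autoCommit.maxTime":15000,"updateHandler.autoSoftCommit.maxTime":60000}}'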

Best,
Erick

On Tue, Mar 29, 2016 at 8:57 PM, YouPeng Yang  wrote:
> Hi Toke
>   The number of collections is just 10. One collection has 43 shards, each
> shard has two replicas. We keep importing data from Oracle all the time
> while our systems provide searching service.
>   There are "Overlapping onDeckSearchers" in my solr.logs. What is the
> meaning of "Overlapping onDeckSearchers"? We set <maxWarmingSearchers>20</maxWarmingSearchers>
> and <useColdSearcher>true</useColdSearcher>. Is it right?
>
>
>
> Best Regard.
>
>
> 2016-03-29 22:31 GMT+08:00 Toke Eskildsen :
>
>> On Tue, 2016-03-29 at 20:12 +0800, YouPeng Yang wrote:
>> >   Our system still goes down as times going.We found lots of threads are
>> > WAITING.Here is the threaddump that I copy from the web page.And 4
>> pictures
>> > for it.
>> >   Is there any relationship with my problem?
>>
>> That is a lot of commitScheduler-threads. Do you have hundreds of
>> collections in your cloud?
>>
>>
>> Try grepping for "Overlapping onDeckSearchers" in your solr.logs to see
>> if you got caught in a downwards spiral of concurrent commits.
>>
>> - Toke Eskildsen, State and University Library, Denmark
>>
>>
>>


Re: Solr not working on new environment

2016-03-29 Thread Erick Erickson
Good to meet you!

It looks like you've tried to start Solr a time or two. When you start
up the "cloud" example it creates
/opt/solr-5.5.0/example/cloud
and puts your SolrCloud stuff under there. It also automatically uploads
your configuration sets to Zookeeper. When I get this kind of thing, I
usually:

> stop Zookeeper (if running externally)

> rm -rf /opt/solr-5.5.0/example/cloud

> delete all the Zookeeper data. It may take a bit of poking to find out where
the Zookeeper data is. It's usually in /tmp/zookeeper if you're running ZK
standalone, or in a subdirectory under Solr if you're using embedded ZK.
NOTE: if you're running standalone Zookeeper, you should _definitely_
change the data dir, because it may disappear from /tmp/zookeeper. One
of Zookeeper's little quirks.

> try it all over again.

Here's the problem. The examples (-e cloud) try to do a bunch of stuff for
you to get the installation up and running without having to wend your way
through all of the individual commands. Sometimes getting partway through
leaves you in an ambiguous state, or at least a state where you don't quite
know what all the moving parts are.

Here are the steps you need to follow if you're doing them yourself rather
than relying on the canned example:
1> start Zookeeper externally. For experimentation, a single ZK is quite
sufficient; I don't bother with 3 ZK instances and a quorum unless I'm
in a production situation.
2> start Solr with the bin/solr script, using the -c and -z options. At
this point you have a functioning Solr, but no collections. You should be
able to see the Solr admin UI at http://node:8982/solr.
3> use the bin/solr zk -upconfig command to put a configset in ZK.
4> use the Collections API to create and maintain collections (a minimal
sketch follows).
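
Put together, a minimal sketch of those steps might look like this (the
ZK address, configset path and names are examples only, not from this
thread):

# 1-2> start ZK, then start Solr in cloud mode pointing at it
bin/solr start -c -z localhost:2181 -p 8983

# 3> upload a configset to Zookeeper
bin/solr zk -upconfig -z localhost:2181 -n myconf -d /path/to/my/configset

# 4> create a collection from that configset
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&collection.configName=myconf'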

And one more note. When you use the '-e cloud' option, you'll see
messages go by about starting nodes with commands like:

bin/solr start -c -z localhost:2181 -p 8981 -s example/cloud/node1/solr
bin/solr start -c -z localhost:2181 -p 8982 -s example/cloud/node2/solr

Once the structure is created, then you just use these commands to
restart an existing set of Solr instances on your machine.

Remember I said that the canned examples create ...example/cloud?
What the canned examples are doing is creating solr instances that are
independent, but on the same machine in order to get people started. When
you specify the '-e cloud' option, those directories are created or, as you've
seen, messages are printed that essentially indicate you're running the
one-time example... more than once.

HTH,
Erick


On Tue, Mar 29, 2016 at 8:06 AM, Jarus Bosman  wrote:
> Hi,
>
> Introductions first (as I was taught): My name is Jarus Bosman, I am a
> software developer from South Africa, doing development in Java, PHP and
> Delphi. I have been programming for 19 years and find out more every day
> that I don't actually know anything about programming ;).
>
> My problem:
>
> We recently moved our environment to a new server. I've installed 5.5.0 on
> the new environment. When I want to start the server, I get the following:
>
> *Welcome to the SolrCloud example!*
>
> *Starting up 2 Solr nodes for your example SolrCloud cluster.*
>
> *Solr home directory /opt/solr-5.5.0/example/cloud/node1/solr already
> exists.*
> */opt/solr-5.5.0/example/cloud/node2 already exists.*
>
> *Starting up Solr on port 8983 using command:*
> */opt/solr-5.5.0/bin/solr start -cloud -p 8983 -s
> "/opt/solr-5.5.0/example/cloud/node1/solr"*
>
> *Waiting up to 30 seconds to see Solr running on port 8983 [/]  Still not
> seeing Solr listening on 8983 after 30 seconds!*
> *INFO  - 2016-03-29 14:22:14.356; [   ] org.eclipse.jetty.util.log.Log;
> Logging initialized @463ms*
> *INFO  - 2016-03-29 14:22:14.717; [   ] org.eclipse.jetty.server.Server;
> jetty-9.2.13.v20150730*
> *WARN  - 2016-03-29 14:22:14.752; [   ]
> org.eclipse.jetty.server.handler.RequestLogHandler; !RequestLog*
> *INFO  - 2016-03-29 14:22:14.757; [   ]
> org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor
> [file:/opt/solr-5.5.0/server/contexts/] at interval 0*
> *INFO  - 2016-03-29 14:22:15.768; [   ]
> org.eclipse.jetty.webapp.StandardDescriptorProcessor; NO JSP Support for
> /solr, did not find org.apache.jasper.servlet.JspServlet*
> *WARN  - 2016-03-29 14:22:15.790; [   ]
> org.eclipse.jetty.security.ConstraintSecurityHandler;
> ServletContext@o.e.j.w.WebAppContext@7a583307{/solr,file:/opt/solr-5.5.0/server/solr-webapp/webapp/,STARTING}{/opt/solr-5.5.0/server/solr-webapp/webapp}
> has uncovered http methods for path: /*
> *INFO  - 2016-03-29 14:22:15.809; [   ]
> org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init():
> WebAppClassLoader=1287618844@4cbf811c*
> *INFO  - 2016-03-29 14:22:15.848; [   ]
> org.apache.solr.core.SolrResourceLoader; JNDI not configured for solr
> (NoInitialContextEx)*
> *INFO  - 2016-03-29 14:22:15.849; [   ]
> org.apache.solr.core.SolrResourceLoader; using system 

Re: High Cpu sys usage

2016-03-29 Thread YouPeng Yang
Hi Toke
  The number of collections is just 10. One collection has 43 shards, each
shard has two replicas. We keep importing data from Oracle all the time
while our systems provide searching service.
   There are "Overlapping onDeckSearchers" in my solr.logs. What is the
meaning of "Overlapping onDeckSearchers"? We set <maxWarmingSearchers>20</maxWarmingSearchers>
and <useColdSearcher>true</useColdSearcher>. Is it right?



Best Regard.


2016-03-29 22:31 GMT+08:00 Toke Eskildsen :

> On Tue, 2016-03-29 at 20:12 +0800, YouPeng Yang wrote:
> >   Our system still goes down as times going.We found lots of threads are
> > WAITING.Here is the threaddump that I copy from the web page.And 4
> pictures
> > for it.
> >   Is there any relationship with my problem?
>
> That is a lot of commitScheduler-threads. Do you have hundreds of
> collections in your cloud?
>
>
> Try grepping for "Overlapping onDeckSearchers" in your solr.logs to see
> if you got caught in a downwards spiral of concurrent commits.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Re: Solr not working on new environment

2016-03-29 Thread Shyam R
Hi Jarus,

Have you tried stopping the solr process and restarting the cluster again?

Thanks
Shyam

On Tue, Mar 29, 2016 at 8:36 PM, Jarus Bosman  wrote:

> Hi,
>
> Introductions first (as I was taught): My name is Jarus Bosman, I am a
> software developer from South Africa, doing development in Java, PHP and
> Delphi. I have been programming for 19 years and find out more every day
> that I don't actually know anything about programming ;).
>
> My problem:
>
> We recently moved our environment to a new server. I've installed 5.5.0 on
> the new environment. When I want to start the server, I get the following:
>
> *Welcome to the SolrCloud example!*
>
> *Starting up 2 Solr nodes for your example SolrCloud cluster.*
>
> *Solr home directory /opt/solr-5.5.0/example/cloud/node1/solr already
> exists.*
> */opt/solr-5.5.0/example/cloud/node2 already exists.*
>
> *Starting up Solr on port 8983 using command:*
> */opt/solr-5.5.0/bin/solr start -cloud -p 8983 -s
> "/opt/solr-5.5.0/example/cloud/node1/solr"*
>
> *Waiting up to 30 seconds to see Solr running on port 8983 [/]  Still not
> seeing Solr listening on 8983 after 30 seconds!*
> *INFO  - 2016-03-29 14:22:14.356; [   ] org.eclipse.jetty.util.log.Log;
> Logging initialized @463ms*
> *INFO  - 2016-03-29 14:22:14.717; [   ] org.eclipse.jetty.server.Server;
> jetty-9.2.13.v20150730*
> *WARN  - 2016-03-29 14:22:14.752; [   ]
> org.eclipse.jetty.server.handler.RequestLogHandler; !RequestLog*
> *INFO  - 2016-03-29 14:22:14.757; [   ]
> org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor
> [file:/opt/solr-5.5.0/server/contexts/] at interval 0*
> *INFO  - 2016-03-29 14:22:15.768; [   ]
> org.eclipse.jetty.webapp.StandardDescriptorProcessor; NO JSP Support for
> /solr, did not find org.apache.jasper.servlet.JspServlet*
> *WARN  - 2016-03-29 14:22:15.790; [   ]
> org.eclipse.jetty.security.ConstraintSecurityHandler;
> ServletContext@o.e.j.w.WebAppContext
> @7a583307{/solr,file:/opt/solr-5.5.0/server/solr-webapp/webapp/,STARTING}{/opt/solr-5.5.0/server/solr-webapp/webapp}
> has uncovered http methods for path: /*
> *INFO  - 2016-03-29 14:22:15.809; [   ]
> org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init():
> WebAppClassLoader=1287618844@4cbf811c*
> *INFO  - 2016-03-29 14:22:15.848; [   ]
> org.apache.solr.core.SolrResourceLoader; JNDI not configured for solr
> (NoInitialContextEx)*
> *INFO  - 2016-03-29 14:22:15.849; [   ]
> org.apache.solr.core.SolrResourceLoader; using system property
> solr.solr.home: /opt/solr-5.5.0/example/cloud/node1/solr*
> *INFO  - 2016-03-29 14:22:15.850; [   ]
> org.apache.solr.core.SolrResourceLoader; new SolrResourceLoader for
> directory: '/opt/solr-5.5.0/example/cloud/node1/solr'*
> *INFO  - 2016-03-29 14:22:15.851; [   ]
> org.apache.solr.core.SolrResourceLoader; JNDI not configured for solr
> (NoInitialContextEx)*
> *INFO  - 2016-03-29 14:22:15.852; [   ]
> org.apache.solr.core.SolrResourceLoader; using system property
> solr.solr.home: /opt/solr-5.5.0/example/cloud/node1/solr*
> *INFO  - 2016-03-29 14:22:15.880; [   ] org.apache.solr.core.SolrXmlConfig;
> Loading container configuration from
> /opt/solr-5.5.0/example/cloud/node1/solr/solr.xml*
> *INFO  - 2016-03-29 14:22:16.051; [   ]
> org.apache.solr.core.CorePropertiesLocator; Config-defined core root
> directory: /opt/solr-5.5.0/example/cloud/node1/solr*
> *INFO  - 2016-03-29 14:22:16.104; [   ] org.apache.solr.core.CoreContainer;
> New CoreContainer 1211012646*
> *INFO  - 2016-03-29 14:22:16.104; [   ] org.apache.solr.core.CoreContainer;
> Loading cores into CoreContainer
> [instanceDir=/opt/solr-5.5.0/example/cloud/node1/solr]*
> *WARN  - 2016-03-29 14:22:16.109; [   ] org.apache.solr.core.CoreContainer;
> Couldn't add files from /opt/solr-5.5.0/example/cloud/node1/solr/lib to
> classpath: /opt/solr-5.5.0/example/cloud/node1/solr/lib*
> *INFO  - 2016-03-29 14:22:16.133; [   ]
> org.apache.solr.handler.component.HttpShardHandlerFactory; created with
> socketTimeout : 60,connTimeout : 6,maxConnectionsPerHost :
> 20,maxConnections : 1,corePoolSize : 0,maximumPoolSize :
> 2147483647,maxThreadIdleTime : 5,sizeOfQueue : -1,fairnessPolicy :
> false,useRetries : false,*
> *INFO  - 2016-03-29 14:22:16.584; [   ]
> org.apache.solr.update.UpdateShardHandler; Creating UpdateShardHandler HTTP
> client with params: socketTimeout=60=6=true*
> *INFO  - 2016-03-29 14:22:16.590; [   ] org.apache.solr.logging.LogWatcher;
> SLF4J impl is org.slf4j.impl.Log4jLoggerFactory*
> *INFO  - 2016-03-29 14:22:16.592; [   ] org.apache.solr.logging.LogWatcher;
> Registering Log Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]*
> *INFO  - 2016-03-29 14:22:16.603; [   ]
> org.apache.solr.cloud.SolrZkServerProps; Reading configuration from:
> /opt/solr-5.5.0/example/cloud/node1/solr/zoo.cfg*
> *INFO  - 2016-03-29 14:22:16.605; [   ] org.apache.solr.cloud.SolrZkServer;
> STARTING EMBEDDED STANDALONE ZOOKEEPER SERVER at port 9983*

Re: Deleted documents and expungeDeletes

2016-03-29 Thread Erick Erickson
bq: where I see that the number of deleted documents just
keeps on growing and growing, but they never seem to be deleted

This shouldn't be happening.  The default TieredMergePolicy weights
segments to be merged (which happens automatically) heavily as per
the percentage of deleted docs. Here's a great visualization:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

It may be that when you say "growing and growing", the number of deleted
docs simply hasn't reached the threshold where they get merged away.

Please quantify "growing and growing". I wouldn't start to worry until it
gets to 15% or more of the total, and then only if it kept growing after that.
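
One way to put a number on it (a sketch; the core name is just an
example) is the Luke request handler, which reports live and total doc
counts per core:

curl 'http://localhost:8983/solr/coreName/admin/luke?numTerms=0&wt=json'
# compare index.numDocs (live docs) with index.maxDoc in the response;
# deleted ratio = (maxDoc - numDocs) / maxDoc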

To your questions:
1> This is automatic. It'll "just happen", but you will probably always carry
some deleted docs around in your index.

2> You always need at least as much free space as your index occupies on disk.
In the worst case of normal merging, _all_ the segments will be merged, and
they're copied first. Only once that's successful are the originals deleted.

3> Not really. Normally there should be no need.

4> True, but usually the effect is so minuscule that nobody notices.
People spend
endless time obsessing about this and unless and until you can show that your
_users_ notice, I'd ignore it.

Best,
Erick

On Tue, Mar 29, 2016 at 8:16 AM, Jostein Elvaker Haande
 wrote:
> Hello everyone,
>
> I apologise beforehand if this is a question that has been visited
> numerous times on this list, but after hours spent on Google and
> talking to SOLR savvy people on #solr @ Freenode I'm still a bit at a
> loss about SOLR and deleted documents.
>
> I have quite a few indexes in both production and development
> environments, where I see that the number of deleted documents just
> keeps on growing and growing, but they never seem to be deleted. From
> my understanding, this can be controlled in the merge policy set for
> the current core, but I've not been able to find any specifics on the
> topic.
>
> The general consensus on most search hits I've found is to perform an
> optimize of the core, however this is both an expensive operation,
> both in terms of CPU cycles as well as disk I/O, and also requires you
> to have anywhere from 2 times to 3 times the size of the index
> available on disk to be guaranteed to complete fully. Given these
> criteria, it's often not something that is a viable option in certain
> environments, both to it being a resource hog and often that you just
> don't have the needed available disk space to perform the optimize.
>
> After having spoken with a couple of people on IRC (thanks tokee and
> elyograg), I was made aware of an optional parameter for <commit>
> called 'expungeDeletes' that can explicitly make sure that deleted
> documents are deleted from the index, i.e:
>
> curl http://localhost:8983/solr/coreName/update -H "Content-Type:
> text/xml" --data-binary '<commit expungeDeletes="true"/>'
>
> Now my questions are as follows:
>
> 1) How can I make sure that this is dealt with in my merge policy, if
> at all possible?
> 2) I've tried to find some disk space guidelines for 'expungeDeletes',
> however I've not been able to find any. What are the general
> guidelines here? Does it require as much space as an optimize, or is
> it less "aggressive" compared to an optimize?
> 3) Is 'expungeDeletes' the recommended method to make sure your
> deleted documents are actually removed from the index, or should you
> deal with this in your merge policy?
> 4) I have also heard from talks on #SOLR that deleted documents have an
> impact on the relevancy of performed searches. Is this correct, or
> just misinformation?
>
> If you require any additional information, like snippets from my
> configuration (solrconfig.xml), I'm more than happy to provide this.
>
> Again, if this is an issue that's being revisited for the Nth time, I
> apologize, I'm just trying to get my head around this with my somewhat
> limited SOLR knowledge.
>
> --
> Yours sincerely Jostein Elvaker Haande
> "A free society is a society where it is safe to be unpopular"
> - Adlai Stevenson
>
> http://tolecnal.net -- tolecnal at tolecnal dot net


Solr response error 403 when I try to index medium.com articles

2016-03-29 Thread Jeferson dos Anjos
I'm trying to index some pages from Medium, but I get error 403. I
believe it is because Medium does not accept the Solr user-agent. Has
anyone ever experienced this? Do you know how to change it?

I appreciate any help


500
94



Server returned HTTP response code: 403 for URL:
https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1


java.io.IOException: Server returned HTTP response code: 403 for URL:
https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1
at sun.reflect.GeneratedConstructorAccessor314.newInstance(Unknown
Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
Source) at java.lang.reflect.Constructor.newInstance(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method) at
sun.net.www.protocol.http.HttpURLConnection.getChainedException(Unknown
Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown
Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown
Source) at 
sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown
Source) at 
org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:87)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368) at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source) Caused by:
java.io.IOException: Server returned HTTP response code: 403 for URL:
https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown
Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown
Source) at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(Unknown
Source) at java.net.URLConnection.getContentType(Unknown Source) at
sun.net.www.protocol.https.HttpsURLConnectionImpl.getContentType(Unknown
Source) at 
org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:84)
... 33 more

500




-- 
Jeferson M. dos Anjos
CEO of Packdocs
P.S.: Keep your files alive with Packdocs (www.packdocs.com)


Re: Solr response error 403 when I try to index medium.com articles

2016-03-29 Thread Jack Krupansky
Medium switches from http to https, so you would need the logic for dealing
with https security handshakes.
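
One possible work-around (a sketch, not a tested recipe; the handler
path, id and user-agent below are assumptions) is to fetch the page
yourself with a browser-like User-Agent over https, then post the saved
file to the extracting handler instead of letting Solr fetch the URL via
stream.url:

curl -sL -A 'Mozilla/5.0' 'https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1' -o page.html
curl 'http://localhost:8983/solr/coreName/update/extract?literal.id=medium-page&commit=true' -F 'file=@page.html'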

-- Jack Krupansky

On Tue, Mar 29, 2016 at 7:54 PM, Jeferson dos Anjos <
jefersonan...@packdocs.com> wrote:

> I'm trying to index some pages of the medium. But I get error 403. I
> believe it is because the medium does not accept the user-agent solr. Has
> anyone ever experienced this? You know how to change?
>
> I appreciate any help
>
> 
> 500
> 94
> 
> 
> 
> Server returned HTTP response code: 403 for URL:
>
> https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1
> 
> 
> java.io.IOException: Server returned HTTP response code: 403 for URL:
>
> https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1
> at sun.reflect.GeneratedConstructorAccessor314.newInstance(Unknown
> Source) at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
> Source) at java.lang.reflect.Constructor.newInstance(Unknown Source)
> at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source)
> at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method) at
> sun.net.www.protocol.http.HttpURLConnection.getChainedException(Unknown
> Source) at
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown
> Source) at
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown
> Source) at
> sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown
> Source) at
> org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:87)
> at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) at
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368) at
>
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Unknown Source) Caused by:
> java.io.IOException: Server returned HTTP response code: 403 for URL:
>
> https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown
> Source) at
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown
> Source) at
> 



Re: Setting up a two nodes Solr Cloud 5.4.1 environment

2016-03-29 Thread Shawn Heisey
I thought I had sent this reply over the weekend.  I had it all ready to
go, but it's still here waiting in my Drafts folder, so I'll send it now.

On 3/25/2016 11:05 AM, Victor D'agostino wrote:
> I am trying to set up a Solr Cloud environment of two Solr 5.4.1 nodes
> but the data are always indexed on the first node although the unique
> id is a GUID.
>
> It looks like I can't add an additional node. Could you tell me where
> i'm wrong ?
>
> I try to set up a collection named "db" with two shards on each node.
> Without replica. The config is named "copiemail3".



> On node n°2
> I start Solr and create the two shards with the cores API (collections
> API won't work because i use compositeId routing mode) :
>  wget --no-proxy
> "http://$HOSTNAME:8983/solr/admin/cores?action=CREATE=schema.xml=shard3=db_shard3_replica1=false=db_shard3_replica1=solrconfig.xml=db=data;
>  wget --no-proxy
> "http://$HOSTNAME:8983/solr/admin/cores?action=CREATE=schema.xml=shard4=db_shard4_replica1=false=db_shard4_replica1=solrconfig.xml=db=data;
> Like node 1 i activate the ping and restart Solr.

This is why it's a VERY bad idea to use CoreAdmin in cloud mode unless
you understand *EXACTLY* what you are doing and how SolrCloud functions
internally.  There's no polite way to tell you that you don't have this
expert-level understanding.

The CoreAdmin calls that you executed have added two new shards to your
collection.  This might be what you intended, but as you have
discovered, the true effects are not what you *wanted*.

Your interaction with SolrCloud collections should always be through the
Collections API.  Any other method may not work as expected.

When you first create your compositeId-routed collection, you need to
tell Solr exactly what you want (number of shards, number of replicas). 
If you had used replicationFactor=2, then your second node would have
had replicas of both shards from the beginning.  You can add replicas
later with the ADDREPLICA action on the Collections API.
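
For instance (a sketch using the collection and config names from this
thread; the host names are assumptions), creating the collection with two
replicas up front through the Collections API would look something like:

curl "http://node1:8983/solr/admin/collections?action=CREATE&name=db&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=copiemail3"

# or, if it was created with replicationFactor=1, add the second copies later:
curl "http://node1:8983/solr/admin/collections?action=ADDREPLICA&collection=db&shard=shard1&node=node2:8983_solr"
curl "http://node1:8983/solr/admin/collections?action=ADDREPLICA&collection=db&shard=shard2&node=node2:8983_solr"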

The implicit router means 100% manual routing, and you probably do NOT
want that.  A collection using implicit routing is one that lets you add
shards with no problems.  This is because indexing to such a collection
requires that you choose which shard will receive every indexing request
-- nothing will be automatically routed.

If you want Solr to automatically handle shard routing (compositeId) you
can't just add shards to your collection and expect them to be used. 
This is why the collections API refuses to add shards when you're using
compositeId.

The shard routing and the number of total shards for compositeId is
established when the collection is created, and can only be changed by
splitting shards (a Collections API action) or manually changing the
hash ranges in the clusterstate in zookeeper.  Manual clusterstate
editing is only recommended as a *last* resort for fixing a completely
broken collection.  In normal situations even *experts* shouldn't edit
the clusterstate.  It's extremely easy to break SolrCloud with these edits.
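
If you later need more shards on a compositeId-routed collection, the
supported route is SPLITSHARD (a sketch; the host name is an assumption,
the collection and shard names follow this thread):

curl "http://node1:8983/solr/admin/collections?action=SPLITSHARD&collection=db&shard=shard1"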

Thanks,
Shawn



Re: Load Resource from within Solr Plugin

2016-03-29 Thread Ahmet Arslan


Hi Max,

Why not implement org.apache.lucene.analysis.util.ResourceLoaderAware?
Existing implementations all load/read text files.

Ahmet

On Wednesday, March 30, 2016 12:14 AM, Max Bridgewater 
 wrote:



HI,

I am facing the exact issue described here:
http://stackoverflow.com/questions/25623797/solr-plugin-classloader.

Basically I'm writing a solr plugin by extending SearchComponent class. My
new class is part of a.jar archive. Also my class depends on a jar b.jar. I
placed both jars in my own folder and declared it in solrconfig.xml with:



I also declared my new component in solrconfig.xml. The component is
invoked correctly up to a point where a class ClassFromB from b.jar
attempts to load a classpath resource personal-words.txt from classpath.

The piece of code in class ClassFromB looks like this:

Thread.currentThread().getContextClassLoader().getResources("personal-words.txt")


Unfortunately, this returns an empty list. Any recommendation?


Thanks,

Max.


Re[5]: [nesting] JSON Facet API vs. BlockJoin Faceting: need help on queries (Facet API facets by wrong doc level VS. BlockJoin Faceting does not return top 10 most frequent)

2016-03-29 Thread Alisa Z .
 Alright, based on  https://issues.apache.org/jira/browse/SOLR-5743 I can 
assume that limit and mincount for the BlockJoin part stay an open issue for 
some time ...  
Therefore, the answer is no as of Solr 5.5.0. 

Thanks to Mikhail Khludnev for working on the subject. 

>Tuesday, March 29, 2016, 14:38 -04:00 from Alisa Z. :
>
>Mikhail, 
>
>I totally see the point: the corresponding wiki page (  
>https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting ) does not 
>mention it and says it's an experimental feature. 
>
>Is it correct that no additional options ( limit, mincount, etc.) can  be set 
>anyhow?  
>
>Or more specifically, is there any work-around to control the output of the 
>query at hand (maybe anything beyond faceting options): 
>
>/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true
>> >>
>> >>RETURNS:
>> >>
>> >>{
>> >> "responseHeader":{
>> >> "status":0,
>> >> "QTime":1},
>> >> "response":{"numFound":19,"start":0,"docs":[]
>> >> },
>> >> "facet_counts":[
>> >> "facet_fields",[
>> >> "text_t",[
>> >> "128x",1,
>> >> "18xx",1,
>> >> "1x",1,
>> >> "2",2,
>> >> "30",1,
>> >> "60",1,
>> >> "78xx",1,
>> >> "82xx",1,
>> >> "ab",2,
>> >> "access",5,
>> >> "account",1,
>> >> "accounts",1,
>> >>...
>> >>"california",13,
>> >>...
>> >>"enron",9,
>> >>...
>> >>]]]}
>> >>  
>
>
>>Tuesday, March 29, 2016, 13:40 -04:00 from Mikhail Khludnev < 
>>mkhlud...@griddynamics.com >:
>>
>>Alisa,
>>
>>There is no such thing as child.facet.limit, etc
>>
>>On Tue, Mar 29, 2016 at 6:27 PM, Alisa Z. <  prol...@mail.ru > wrote:
>>
>>>  So the first issue eventually solved by adding facet: {top_terms_by_doc:
>>> "unique(_root_)"} AND sorting the outer facet buckets by this faceting:
>>>
>>> curl http://localhost:8985/solr/enron_path_w_ts/query -d
>>> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california=0&
>>> json.facet={
>>>   filter_by_child_type :{
>>> type:query,
>>> q:"type_s:doc.enriched.text.keywords",
>>> domain: { blockChildren : "type_s:doc" },
>>> facet:{
>>>   top_keywords_text : {
>>> type: terms,
>>> field: text_t,
>>> limit: 10,
>>> sort: "top_terms_by_doc desc",
>>>  facet: {
>>>top_terms_by_doc: "unique(_root_)"
>>>  }
>>>   }
>>> }
>>>   }
>>> }'
>>>
>>>
>>> The  BlockJoin Faceting  part is still open:  I've tried all conventional
>>> faceting parameters:  facet.limit, child.facet.limit, f.text_t.facet.limit
>>> ... nothing worked :(
>>>
>>>
>>> >Monday, March 28, 2016, 17:20 -04:00 from Alisa Z. <  prol...@mail.ru >:
>>> >
>>> >Ok, so for the 1st question, I think I'm getting closer:  adding  facet:
>>> {top_terms_by_doc: "unique(_root_)"}  as indicated in
>>>  http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev returns
>>> correct counts. However, sorting is done by the upper faceting not by the
>>> unique(_root_):
>>> >
>>> >
>>> >curl  http://localhost:8985/solr/my_collection /query -d
>>> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california=0&
>>> >json.facet={
>>> >  filter_by_child_type :{
>>> >type:query,
>>> >q:"type_s:doc.enriched.text.keywords",
>>> >domain: { blockChildren : "type_s:doc" },
>>> >facet:{
>>> >  top_keywords_text : {
>>> >type: terms,
>>> >field: text_t,
>>> >limit: 10,
>>> >facet: {
>>> >   top_terms_by_doc: "unique(_root_)"
>>> > }
>>> >  }
>>> >}
>>> >  }
>>> >}'
>>> >
>>> >RETURNS
>>> >
>>> >{
>>> >  "responseHeader":{
>>> >"status":0,
>>> >"QTime":25,
>>> >"params":{
>>> >  "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData
>>> +Subject_t:california",
>>> >  "json.facet":"{\n  filter_by_child_type :{\ntype:query,\n
>>> q:\"type_s:doc.enriched.text.keywords\",\ndomain: { blockChildren :
>>> \"type_s:doc\" },\nfacet:{\n  top_keywords_text : {\ntype:
>>> terms,\nfield: text_t,\nlimit: 10,\nfacet:
>>> {\n   top_terms_by_doc: \"unique(_root_)\"\n }\n
>>> }\n}\n  }\n}",
>>> >  "rows":"0"}},
>>> >  "response":{"numFound":19,"start":0,"docs":[]
>>> >  },
>>> >  "facets":{
>>> >"count":19,
>>> >"filter_by_child_type":{
>>> >  "count":686,
>>> >  "top_keywords_text":{
>>> >"buckets":[{
>>> >"val":"enron",
>>> >"count":57,
>>> >"top_terms_by_doc":9},
>>> >  {
>>> >"val":"california",
>>> >"count":22,
>>> >"top_terms_by_doc":13},
>>> >  {
>>> >"val":"power",
>>> >"count":21,
>>> >"top_terms_by_doc":7},
>>> >  {
>>> >"val":"rate",
>>> >"count":15,
>>> >"top_terms_by_doc":5},
>>> >  {
>>> >"val":"plan",
>>> >

Load Resource from within Solr Plugin

2016-03-29 Thread Max Bridgewater
HI,

I am facing the exact issue described here:
http://stackoverflow.com/questions/25623797/solr-plugin-classloader.

Basically I'm writing a solr plugin by extending SearchComponent class. My
new class is part of a.jar archive. Also my class depends on a jar b.jar. I
placed both jars in my own folder and declared it in solrconfig.xml with:



I also declared my new component in solrconfig.xml. The component is
invoked correctly up to a point where a class ClassFromB from b.jar
attempts to load a classpath resource personal-words.txt from classpath.

The piece of code in class ClassFromB looks like this:

Thread.currentThread().getContextClassLoader().getResources("personal-words.txt")


Unfortunately, this returns an empty list. Any recommendation?


Thanks,

Max.


Re: Setting up a two nodes Solr Cloud 5.4.1 environment

2016-03-29 Thread Shawn Heisey
On 3/29/2016 1:58 AM, Victor D'agostino wrote:
> Thanks for your help, here is what I've done.
>
> 1. I deleted zookeepers and Solr installations.
> 2. I setup zookeepers on my two servers.
> 3. I successfully setup Solr Cloud node 1 with the same API call (1
> collection named db and two cores) :
>  wget --no-proxy
> "http://$HOSTNAME:8983/solr/admin/collections?numShards=2=copiemail3=compositeId=2=mail_id=db=1=CREATE;

FYI: You can't build a redundant SolrCloud with only two physical servers.

Solr itself only needs two servers for redundancy, but Zookeeper, which
is essential for SolrCloud, needs three.  This is documented
explicitly.  See the "Note" box here:

http://zookeeper.apache.org/doc/r3.4.8/zookeeperStarted.html#sc_RunningReplicatedZooKeeper

Thanks,
Shawn



Re[4]: [nesting] JSON Facet API vs. BlockJoin Faceting: need help on queries (Facet API facets by wrong doc level VS. BlockJoin Faceting does not return top 10 most frequent)

2016-03-29 Thread Alisa Z .
 Mikhail, 

I totally see the point: the corresponding wiki page ( 
https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting ) does not 
mention it and says it's an experimental feature. 

Is it correct that no additional options ( limit, mincount, etc.) can  be set 
anyhow?  

Or more specifically, is there any work-around to control the output of the 
query at hand (maybe anything beyond faceting options): 

/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true
> >>
> >>RETURNS:
> >>
> >>{
> >> "responseHeader":{
> >> "status":0,
> >> "QTime":1},
> >> "response":{"numFound":19,"start":0,"docs":[]
> >> },
> >> "facet_counts":[
> >> "facet_fields",[
> >> "text_t",[
> >> "128x",1,
> >> "18xx",1,
> >> "1x",1,
> >> "2",2,
> >> "30",1,
> >> "60",1,
> >> "78xx",1,
> >> "82xx",1,
> >> "ab",2,
> >> "access",5,
> >> "account",1,
> >> "accounts",1,
> >>...
> >>"california",13,
> >>...
> >>"enron",9,
> >>...
> >>]]]}
> >>  


>Tuesday, March 29, 2016, 13:40 -04:00 from Mikhail Khludnev 
>:
>
>Alisa,
>
>There is no such thing as child.facet.limit, etc
>
>On Tue, Mar 29, 2016 at 6:27 PM, Alisa Z. < prol...@mail.ru > wrote:
>
>>  So the first issue eventually solved by adding facet: {top_terms_by_doc:
>> "unique(_root_)"} AND sorting the outer facet buckets by this faceting:
>>
>> curl http://localhost:8985/solr/enron_path_w_ts/query -d
>> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california=0&
>> json.facet={
>>   filter_by_child_type :{
>> type:query,
>> q:"type_s:doc.enriched.text.keywords",
>> domain: { blockChildren : "type_s:doc" },
>> facet:{
>>   top_keywords_text : {
>> type: terms,
>> field: text_t,
>> limit: 10,
>> sort: "top_terms_by_doc desc",
>>  facet: {
>>top_terms_by_doc: "unique(_root_)"
>>  }
>>   }
>> }
>>   }
>> }'
>>
>>
>> The  BlockJoin Faceting  part is still open:  I've tried all conventional
>> faceting parameters:  facet.limit, child.facet.limit, f.text_t.facet.limit
>> ... nothing worked :(
>>
>>
>> >Monday, March 28, 2016, 17:20 -04:00 from Alisa Z. < prol...@mail.ru >:
>> >
>> >Ok, so for the 1st question, I think I'm getting closer:  adding  facet:
>> {top_terms_by_doc: "unique(_root_)"}  as indicated in
>>  http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev returns
>> correct counts. However, sorting is done by the upper faceting not by the
>> unique(_root_):
>> >
>> >
>> >curl  http://localhost:8985/solr/my_collection /query -d
>> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california=0&
>> >json.facet={
>> >  filter_by_child_type :{
>> >type:query,
>> >q:"type_s:doc.enriched.text.keywords",
>> >domain: { blockChildren : "type_s:doc" },
>> >facet:{
>> >  top_keywords_text : {
>> >type: terms,
>> >field: text_t,
>> >limit: 10,
>> >facet: {
>> >   top_terms_by_doc: "unique(_root_)"
>> > }
>> >  }
>> >}
>> >  }
>> >}'
>> >
>> >RETURNS
>> >
>> >{
>> >  "responseHeader":{
>> >"status":0,
>> >"QTime":25,
>> >"params":{
>> >  "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData
>> +Subject_t:california",
>> >  "json.facet":"{\n  filter_by_child_type :{\ntype:query,\n
>> q:\"type_s:doc.enriched.text.keywords\",\ndomain: { blockChildren :
>> \"type_s:doc\" },\nfacet:{\n  top_keywords_text : {\ntype:
>> terms,\nfield: text_t,\nlimit: 10,\nfacet:
>> {\n   top_terms_by_doc: \"unique(_root_)\"\n }\n
>> }\n}\n  }\n}",
>> >  "rows":"0"}},
>> >  "response":{"numFound":19,"start":0,"docs":[]
>> >  },
>> >  "facets":{
>> >"count":19,
>> >"filter_by_child_type":{
>> >  "count":686,
>> >  "top_keywords_text":{
>> >"buckets":[{
>> >"val":"enron",
>> >"count":57,
>> >"top_terms_by_doc":9},
>> >  {
>> >"val":"california",
>> >"count":22,
>> >"top_terms_by_doc":13},
>> >  {
>> >"val":"power",
>> >"count":21,
>> >"top_terms_by_doc":7},
>> >  {
>> >"val":"rate",
>> >"count":15,
>> >"top_terms_by_doc":5},
>> >  {
>> >"val":"plan",
>> >"count":13,
>> >"top_terms_by_doc":3},
>> >  {
>> >"val":"hou",
>> >"count":12,
>> >"top_terms_by_doc":5},
>> >  {
>> >"val":"energy",
>> >"count":11,
>> >"top_terms_by_doc":5},
>> >  {
>> >"val":"na",
>> >"count":11,
>> >"top_terms_by_doc":5},
>> >  {
>> >"val":"mckinsey",
>> >"count":10,
>> >"top_terms_by_doc":1},

Re: Re[2]: [nesting] JSON Facet API vs. BlockJoin Faceting: need help on queries (Facet API facets by wrong doc level VS. BlockJoin Faceting does not return top 10 most frequent)

2016-03-29 Thread Mikhail Khludnev
Alisa,

There is no such thing as child.facet.limit, etc

On Tue, Mar 29, 2016 at 6:27 PM, Alisa Z.  wrote:

>  So the first issue eventually solved by adding facet: {top_terms_by_doc:
> "unique(_root_)"} AND sorting the outer facet buckets by this faceting:
>
> curl http://localhost:8985/solr/enron_path_w_ts/query -d
> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california=0&
> json.facet={
>   filter_by_child_type :{
> type:query,
> q:"type_s:doc.enriched.text.keywords",
> domain: { blockChildren : "type_s:doc" },
> facet:{
>   top_keywords_text : {
> type: terms,
> field: text_t,
> limit: 10,
> sort: "top_terms_by_doc desc",
>  facet: {
>top_terms_by_doc: "unique(_root_)"
>  }
>   }
> }
>   }
> }'
>
>
> The  BlockJoin Faceting  part is still open:  I've tried all conventional
> faceting parameters:  facet.limit, child.facet.limit, f.text_t.facet.limit
> ... nothing worked :(
>
>
> >Monday, March 28, 2016, 17:20 -04:00 from Alisa Z. :
> >
> >Ok, so for the 1st question, I think I'm getting closer:  adding  facet:
> {top_terms_by_doc: "unique(_root_)"}  as indicated in
> http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev returns
> correct counts. However, sorting is done by the upper faceting not by the
> unique(_root_):
> >
> >
> >curl  http://localhost:8985/solr/my_collection /query -d
> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california=0&
> >json.facet={
> >  filter_by_child_type :{
> >type:query,
> >q:"type_s:doc.enriched.text.keywords",
> >domain: { blockChildren : "type_s:doc" },
> >facet:{
> >  top_keywords_text : {
> >type: terms,
> >field: text_t,
> >limit: 10,
> >facet: {
> >   top_terms_by_doc: "unique(_root_)"
> > }
> >  }
> >}
> >  }
> >}'
> >
> >RETURNS
> >
> >{
> >  "responseHeader":{
> >"status":0,
> >"QTime":25,
> >"params":{
> >  "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData
> +Subject_t:california",
> >  "json.facet":"{\n  filter_by_child_type :{\ntype:query,\n
> q:\"type_s:doc.enriched.text.keywords\",\ndomain: { blockChildren :
> \"type_s:doc\" },\nfacet:{\n  top_keywords_text : {\ntype:
> terms,\nfield: text_t,\nlimit: 10,\nfacet:
> {\n   top_terms_by_doc: \"unique(_root_)\"\n }\n
> }\n}\n  }\n}",
> >  "rows":"0"}},
> >  "response":{"numFound":19,"start":0,"docs":[]
> >  },
> >  "facets":{
> >"count":19,
> >"filter_by_child_type":{
> >  "count":686,
> >  "top_keywords_text":{
> >"buckets":[{
> >"val":"enron",
> >"count":57,
> >"top_terms_by_doc":9},
> >  {
> >"val":"california",
> >"count":22,
> >"top_terms_by_doc":13},
> >  {
> >"val":"power",
> >"count":21,
> >"top_terms_by_doc":7},
> >  {
> >"val":"rate",
> >"count":15,
> >"top_terms_by_doc":5},
> >  {
> >"val":"plan",
> >"count":13,
> >"top_terms_by_doc":3},
> >  {
> >"val":"hou",
> >"count":12,
> >"top_terms_by_doc":5},
> >  {
> >"val":"energy",
> >"count":11,
> >"top_terms_by_doc":5},
> >  {
> >"val":"na",
> >"count":11,
> >"top_terms_by_doc":5},
> >  {
> >"val":"mckinsey",
> >"count":10,
> >"top_terms_by_doc":1},
> >  {
> >"val":"socal",
> >"count":10,
> >"top_terms_by_doc":4}]
> >
> >Nice, but I want them to be ordered by "top_terms_by_doc" frequencies,
> not by the "count" frequencies.
> >Any suggestions?
> >
> >Thanks,
> >Alisa
> >
> >
> >
> >
> >
> >>Monday, March 28, 2016, 15:39 -04:00 from Alisa Z. < prol...@mail.ru
> >:
> >>
> >>Hi all,
> >>
> >>I am trying to perform faceting of parent docs by nested document
> fields. I've tried 2 approaches as in subject, yet in first the results are
> not quite correct and in the 2nd I cannot get the query right. So I need
> help on either of them and any explication or documentation or blogs on the
> behavior is much appreciated.
> >>
> >>Verbally the query is as follows: "Find top 10 keywords for all
> documents with "california" in email subject line"
> >>
> >>Here is the query with responses:
> >>
> >> Json Facet API 
> >>
> >>curl http://localhost:8985/solr/my_collection/query -d
> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california=0&
> >>json.facet={
> >>  filter_by_child_type :{
> >>type:query,
> >>q:"type_s:doc.enriched.text.keywords",
> >>domain: { blockChildren : "type_s:doc" },
> >>facet:{
> >>  

Re[2]: [nesting] JSON Facet API vs. BlockJoin Faceting: need help on queries (Facet API facets by wrong doc level VS. BlockJoin Faceting does not return top 10 most frequent)

2016-03-29 Thread Alisa Z .
 So the first issue eventually solved by adding facet: {top_terms_by_doc: 
"unique(_root_)"} AND sorting the outer facet buckets by this faceting:  

curl http://localhost:8985/solr/enron_path_w_ts/query -d 
'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
json.facet={
  filter_by_child_type :{
    type:query,
    q:"type_s:doc.enriched.text.keywords",
    domain: { blockChildren : "type_s:doc" },
    facet:{
  top_keywords_text : {
    type: terms,
    field: text_t,
    limit: 10,
    sort: "top_terms_by_doc desc",
     facet: {
   top_terms_by_doc: "unique(_root_)"
 }
  }
    }
  }
}'


The  BlockJoin Faceting  part is still open:  I've tried all conventional 
faceting parameters:  facet.limit, child.facet.limit, f.text_t.facet.limit ... 
nothing worked :( 


>Monday, March 28, 2016, 17:20 -04:00 from Alisa Z. :
>
>Ok, so for the 1st question, I think I'm getting closer:  adding  facet: 
>{top_terms_by_doc: "unique(_root_)"}  as indicated in  
>http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev returns correct 
>counts. However, sorting is done by the upper faceting not by the 
>unique(_root_):  
>
>
>curl  http://localhost:8985/solr/my_collection /query -d 
>'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california=0&
>json.facet={
>  filter_by_child_type :{
>    type:query,
>    q:"type_s:doc.enriched.text.keywords",
>    domain: { blockChildren : "type_s:doc" },
>    facet:{
>  top_keywords_text : {
>    type: terms,
>    field: text_t,
>    limit: 10,
>    facet: {
>   top_terms_by_doc: "unique(_root_)"
> }
>  }
>    }
>  }
>}'
>
>RETURNS 
>
>{
>  "responseHeader":{
>    "status":0,
>    "QTime":25,
>    "params":{
>  "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData 
>+Subject_t:california",
>  "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n    
>q:\"type_s:doc.enriched.text.keywords\",\n    domain: { blockChildren : 
>\"type_s:doc\" },\n    facet:{\n  top_keywords_text : {\n    type: 
>terms,\n    field: text_t,\n    limit: 10,\n    facet: {\n 
>  top_terms_by_doc: \"unique(_root_)\"\n }\n  }\n    }\n  }\n}",
>  "rows":"0"}},
>  "response":{"numFound":19,"start":0,"docs":[]
>  },
>  "facets":{
>    "count":19,
>    "filter_by_child_type":{
>  "count":686,
>  "top_keywords_text":{
>    "buckets":[{
>    "val":"enron",
>    "count":57,
>    "top_terms_by_doc":9},
>  {
>    "val":"california",
>    "count":22,
>    "top_terms_by_doc":13},
>  {
>    "val":"power",
>    "count":21,
>    "top_terms_by_doc":7},
>  {
>    "val":"rate",
>    "count":15,
>    "top_terms_by_doc":5},
>  {
>    "val":"plan",
>    "count":13,
>    "top_terms_by_doc":3},
>  {
>    "val":"hou",
>    "count":12,
>    "top_terms_by_doc":5},
>  {
>    "val":"energy",
>    "count":11,
>    "top_terms_by_doc":5},
>  {
>    "val":"na",
>    "count":11,
>    "top_terms_by_doc":5},
>  {
>    "val":"mckinsey",
>    "count":10,
>    "top_terms_by_doc":1},
>  {
>    "val":"socal",
>    "count":10,
>    "top_terms_by_doc":4}]
>
>Nice, but I want them to be ordered by "top_terms_by_doc" frequencies,  not by 
>the "count" frequencies. 
>Any suggestions?
>
>Thanks,
>Alisa 
>
>
>
>
>
>>Monday, 28 March 2016, 15:39 -04:00 from Alisa Z. < prol...@mail.ru >:
>>
>>Hi all, 
>>
>>I am trying to perform faceting of parent docs by nested-document fields. 
>>I've tried the 2 approaches in the subject line, yet with the first the results are not quite 
>>correct and with the 2nd I cannot get the query right. So I need help on either 
>>of them; any explanation, documentation or blog posts on this behavior would be much 
>>appreciated.   
>>
>>In words, the query is as follows: "Find the top 10 keywords for all documents 
>>with 'california' in the email subject line"
>>
>>Here is the query with responses: 
>>
>> Json Facet API   
>>
>>curl http://localhost:8985/solr/my_collection/query -d 
>>'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>>json.facet={
>>  filter_by_child_type :{
>>    type:query,
>>    q:"type_s:doc.enriched.text.keywords",
>>    domain: { blockChildren : "type_s:doc" },
>>    facet:{
>>  top_keywords_text : {
>>    type: terms,
>>    field: text_t,
>>    limit: 10
>>  }
>>    }
>>  }
>>}'
>>
>>RETURNS:  
>>
>>{
>>  "responseHeader":{
>>    "status":0,
>>    "QTime":134,
>>    "params":{
>>  "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData 
>>+Subject_t:california",
>>  "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n    

Deleted documents and expungeDeletes

2016-03-29 Thread Jostein Elvaker Haande
Hello everyone,

I apologise beforehand if this is a question that has been visited
numerous times on this list, but after hours spent on Google and
talking to SOLR savvy people on #solr @ Freenode I'm still a bit at a
loss about SOLR and deleted documents.

I have quite a few indexes in both production and development
environments, where I see that the number of deleted documents just
keeps growing and growing, and they never seem to be purged from the
index. From my understanding, this can be controlled in the merge policy
set for the current core, but I've not been able to find any specifics on
the topic.
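
For context, this is the kind of merge policy tuning I have been looking at in
solrconfig.xml. It is only a sketch, and reclaimDeletesWeight is the setting I
assume is relevant here; I have not verified it:

<indexConfig>
  <!-- Sketch: bias merge selection towards segments carrying many deletes -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <double name="reclaimDeletesWeight">3.0</double>
  </mergePolicy>
</indexConfig>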

The general consensus on most search hits I've found is to perform an
optimize of the core. However, this is an expensive operation, both in
terms of CPU cycles and disk I/O, and it also requires anywhere from 2 to
3 times the size of the index to be available on disk to be guaranteed to
complete fully. Given these criteria, it is often not a viable option in
certain environments, both because it is a resource hog and because you
often just don't have the disk space available to perform the optimize.

After having spoken with a couple of people on IRC (thanks tokee and
elyograg), I was made aware of an optional parameter for <commit>
called 'expungeDeletes' that can explicitly make sure that deleted
documents are removed from the index, i.e.:

curl http://localhost:8983/solr/coreName/update -H "Content-Type:
text/xml" --data-binary '<commit expungeDeletes="true"/>'

Now my questions are as follows:

1) How can I make sure that this is dealt with in my merge policy, if
at all possible?
2) I've tried to find some disk space guidelines for 'expungeDeletes',
however I've not been able to find any. What are the general
guidelines here? Does it require as much space as an optimize, or is
it less "aggressive" compared to an optimize?
3) Is 'expungeDeletes' the recommended method to make sure your
deleted documents are actually removed from the index, or should you
deal with this in your merge policy?
4) I have also heard from talks on #SOLR that deleted documents have an
impact on the relevancy of performed searches. Is this correct, or
just misinformation?

If you require any additional information, like snippets from my
configuration (solrconfig.xml), I'm more than happy to provide this.

Again, if this is an issue that's being revisited for the Nth time, I
apologize, I'm just trying to get my head around this with my somewhat
limited SOLR knowledge.

-- 
Yours sincerely Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular"
- Adlai Stevenson

http://tolecnal.net -- tolecnal at tolecnal dot net


Solr not working on new environment

2016-03-29 Thread Jarus Bosman
Hi,

Introductions first (as I was taught): My name is Jarus Bosman, I am a
software developer from South Africa, doing development in Java, PHP and
Delphi. I have been programming for 19 years and find out more every day
that I don't actually know anything about programming ;).

My problem:

We recently moved our environment to a new server. I've installed 5.5.0 on
the new environment. When I want to start the server, I get the following:

*Welcome to the SolrCloud example!*

*Starting up 2 Solr nodes for your example SolrCloud cluster.*

*Solr home directory /opt/solr-5.5.0/example/cloud/node1/solr already
exists.*
*/opt/solr-5.5.0/example/cloud/node2 already exists.*

*Starting up Solr on port 8983 using command:*
*/opt/solr-5.5.0/bin/solr start -cloud -p 8983 -s
"/opt/solr-5.5.0/example/cloud/node1/solr"*

*Waiting up to 30 seconds to see Solr running on port 8983 [/]  Still not
seeing Solr listening on 8983 after 30 seconds!*
*INFO  - 2016-03-29 14:22:14.356; [   ] org.eclipse.jetty.util.log.Log;
Logging initialized @463ms*
*INFO  - 2016-03-29 14:22:14.717; [   ] org.eclipse.jetty.server.Server;
jetty-9.2.13.v20150730*
*WARN  - 2016-03-29 14:22:14.752; [   ]
org.eclipse.jetty.server.handler.RequestLogHandler; !RequestLog*
*INFO  - 2016-03-29 14:22:14.757; [   ]
org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor
[file:/opt/solr-5.5.0/server/contexts/] at interval 0*
*INFO  - 2016-03-29 14:22:15.768; [   ]
org.eclipse.jetty.webapp.StandardDescriptorProcessor; NO JSP Support for
/solr, did not find org.apache.jasper.servlet.JspServlet*
*WARN  - 2016-03-29 14:22:15.790; [   ]
org.eclipse.jetty.security.ConstraintSecurityHandler;
ServletContext@o.e.j.w.WebAppContext@7a583307{/solr,file:/opt/solr-5.5.0/server/solr-webapp/webapp/,STARTING}{/opt/solr-5.5.0/server/solr-webapp/webapp}
has uncovered http methods for path: /*
*INFO  - 2016-03-29 14:22:15.809; [   ]
org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init():
WebAppClassLoader=1287618844@4cbf811c*
*INFO  - 2016-03-29 14:22:15.848; [   ]
org.apache.solr.core.SolrResourceLoader; JNDI not configured for solr
(NoInitialContextEx)*
*INFO  - 2016-03-29 14:22:15.849; [   ]
org.apache.solr.core.SolrResourceLoader; using system property
solr.solr.home: /opt/solr-5.5.0/example/cloud/node1/solr*
*INFO  - 2016-03-29 14:22:15.850; [   ]
org.apache.solr.core.SolrResourceLoader; new SolrResourceLoader for
directory: '/opt/solr-5.5.0/example/cloud/node1/solr'*
*INFO  - 2016-03-29 14:22:15.851; [   ]
org.apache.solr.core.SolrResourceLoader; JNDI not configured for solr
(NoInitialContextEx)*
*INFO  - 2016-03-29 14:22:15.852; [   ]
org.apache.solr.core.SolrResourceLoader; using system property
solr.solr.home: /opt/solr-5.5.0/example/cloud/node1/solr*
*INFO  - 2016-03-29 14:22:15.880; [   ] org.apache.solr.core.SolrXmlConfig;
Loading container configuration from
/opt/solr-5.5.0/example/cloud/node1/solr/solr.xml*
*INFO  - 2016-03-29 14:22:16.051; [   ]
org.apache.solr.core.CorePropertiesLocator; Config-defined core root
directory: /opt/solr-5.5.0/example/cloud/node1/solr*
*INFO  - 2016-03-29 14:22:16.104; [   ] org.apache.solr.core.CoreContainer;
New CoreContainer 1211012646*
*INFO  - 2016-03-29 14:22:16.104; [   ] org.apache.solr.core.CoreContainer;
Loading cores into CoreContainer
[instanceDir=/opt/solr-5.5.0/example/cloud/node1/solr]*
*WARN  - 2016-03-29 14:22:16.109; [   ] org.apache.solr.core.CoreContainer;
Couldn't add files from /opt/solr-5.5.0/example/cloud/node1/solr/lib to
classpath: /opt/solr-5.5.0/example/cloud/node1/solr/lib*
*INFO  - 2016-03-29 14:22:16.133; [   ]
org.apache.solr.handler.component.HttpShardHandlerFactory; created with
socketTimeout : 60,connTimeout : 6,maxConnectionsPerHost :
20,maxConnections : 1,corePoolSize : 0,maximumPoolSize :
2147483647,maxThreadIdleTime : 5,sizeOfQueue : -1,fairnessPolicy :
false,useRetries : false,*
*INFO  - 2016-03-29 14:22:16.584; [   ]
org.apache.solr.update.UpdateShardHandler; Creating UpdateShardHandler HTTP
client with params: socketTimeout=60=6=true*
*INFO  - 2016-03-29 14:22:16.590; [   ] org.apache.solr.logging.LogWatcher;
SLF4J impl is org.slf4j.impl.Log4jLoggerFactory*
*INFO  - 2016-03-29 14:22:16.592; [   ] org.apache.solr.logging.LogWatcher;
Registering Log Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]*
*INFO  - 2016-03-29 14:22:16.603; [   ]
org.apache.solr.cloud.SolrZkServerProps; Reading configuration from:
/opt/solr-5.5.0/example/cloud/node1/solr/zoo.cfg*
*INFO  - 2016-03-29 14:22:16.605; [   ] org.apache.solr.cloud.SolrZkServer;
STARTING EMBEDDED STANDALONE ZOOKEEPER SERVER at port 9983*
*INFO  - 2016-03-29 14:22:17.106; [   ] org.apache.solr.core.ZkContainer;
Zookeeper client=localhost:9983*
*ERROR: Did not see Solr at http://localhost:8983/solr
 come online within 30*



However, when I do a ps -ef | grep solr, I can see it is running:

*root  23835  1  0 16:22 pts/500:00:11 java -server 

Re: High Cpu sys usage

2016-03-29 Thread Toke Eskildsen
On Tue, 2016-03-29 at 20:12 +0800, YouPeng Yang wrote:
>   Our system still goes down as times going.We found lots of threads are
> WAITING.Here is the threaddump that I copy from the web page.And 4 pictures
> for it.
>   Is there any relationship with my problem?

That is a lot of commitScheduler-threads. Do you have hundreds of
collections in your cloud?


Try grepping for "Overlapping onDeckSearchers" in your solr.logs to see
if you got caught in a downwards spiral of concurrent commits.
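
A quick way to check is something like the sketch below (the log path is only
an example; adjust it to wherever your Solr logs live):

grep -c "Overlapping onDeckSearchers" /var/solr/logs/solr.log

A non-zero count means new searchers are being opened while earlier ones are
still warming.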

- Toke Eskildsen, State and University Library, Denmark




Re: High Cpu sys usage

2016-03-29 Thread YouPeng Yang
Hi
  Our system still goes down as time goes on. We found that lots of threads are
WAITING. Here is the thread dump that I copied from the web page, plus 4 screenshots
of it.
  Is there any relationship with my problem?


https://www.dropbox.com/s/h3wyez091oouwck/threaddump?dl=0
https://www.dropbox.com/s/p3ctuxb3t1jgo2e/threaddump1.jpg?dl=0
https://www.dropbox.com/s/w0uy15h6z984ntw/threaddump2.jpg?dl=0
https://www.dropbox.com/s/0frskxdllxlz9ha/threaddump3.jpg?dl=0
https://www.dropbox.com/s/46ptnly1ngi9nb6/threaddump4.jpg?dl=0
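
For reference, the dump above was taken from the Solr admin UI. A minimal way to
capture the same thing from the shell, assuming jstack is on the PATH and
<solr-pid> stands for the Solr process id, would be:

jstack -l <solr-pid> > threaddump.txt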


Best Regards

2016-03-18 14:35 GMT+08:00 YouPeng Yang :

> Hi
>   To Patrick: Never mind. Thank you for your suggestion all the same.
>   To Otis: We do not use SPM. We monitor the JVM just using jstat, because
> our system went well before, so we did not need other tools.
> But SPM is really awesome.
>
>   Still looking for help.
>
> Best Regards
>
> 2016-03-18 6:01 GMT+08:00 Patrick Plaatje :
>
>> Yeah, I didn't pay attention to the cached memory at all, my bad!
>>
>> I remember running into a similar situation a couple of years ago; one of
>> the things we did to investigate our memory profile was to produce a full heap
>> dump and manually analyse it using a tool like MAT.
>>
>> Cheers,
>> -patrick
>>
>>
>>
>>
>> On 17/03/2016, 21:58, "Otis Gospodnetić" 
>> wrote:
>>
>> >Hi,
>> >
>> >On Wed, Mar 16, 2016 at 10:59 AM, Patrick Plaatje 
>> >wrote:
>> >
>> >> Hi,
>> >>
>> >> From the sar output you supplied, it looks like you might have a memory
>> >> issue on your hosts. The memory usage just before your crash seems to be
>> >> *very* close to 100%. Even the slightest increase (Solr itself, or possibly
>> >> by a system service) could have caused the system crash. What are the
>> >> specifications of your hosts and how much memory are you allocating?
>> >
>> >
>> >That's normal actually - http://www.linuxatemyram.com/
>> >
>> >You *want* Linux to be using all your memory - you paid for it :)
>> >
>> >Otis
>> >--
>> >Monitoring - Log Management - Alerting - Anomaly Detection
>> >Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> >
>> >
>> >
>> >
>> >>
>> >
>> >
>> >>
>> >>
>> >> On 16/03/2016, 14:52, "YouPeng Yang" 
>> wrote:
>> >>
>> >> >Hi
>> >> > It happened again, and worse, our system crashed; we could
>> >> > not even connect to it with ssh.
>> >> > I used the sar command to capture statistics about it. Here are
>> >> > my details:
>> >> >
>> >> >
>> >> >[1] CPU (from sar -u); we had to restart our system, as marked by the
>> >> >LINUX RESTART entry in the logs.
>> >>
>> >>
>> >--
>> >> >03:00:01 PM     all      7.61      0.00      0.92      0.07      0.00     91.40
>> >> >03:10:01 PM     all      7.71      0.00      1.29      0.06      0.00     90.94
>> >> >03:20:01 PM     all      7.62      0.00      1.98      0.06      0.00     90.34
>> >> >03:30:35 PM     all      5.65      0.00     31.08      0.04      0.00     63.23
>> >> >03:42:40 PM     all     47.58      0.00     52.25      0.00      0.00      0.16
>> >> >Average:        all      8.21      0.00      1.57      0.05      0.00     90.17
>> >> >
>> >> >04:42:04 PM   LINUX RESTART
>> >> >
>> >> >04:50:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
>> >> >05:00:01 PM     all      3.49      0.00      0.62      0.15      0.00     95.75
>> >> >05:10:01 PM     all      9.03      0.00      0.92      0.28      0.00     89.77
>> >> >05:20:01 PM     all      7.06      0.00      0.78      0.05      0.00     92.11
>> >> >05:30:01 PM     all      6.67      0.00      0.79      0.06      0.00     92.48
>> >> >05:40:01 PM     all      6.26      0.00      0.76      0.05      0.00     92.93
>> >> >05:50:01 PM     all      5.49      0.00      0.71      0.05      0.00     93.75
>> >>
>> >>
>> >--
>> >> >
>> >> >[2] Memory (from sar -r)
>> >>
>> >>
>> >--
>> >> >03:00:01 PM   1519272 196633272     99.23    361112  76364340 143574212     47.77
>> >> >03:10:01 PM   1451764 196700780     99.27    361196  76336340 143581608     47.77
>> >> >03:20:01 PM   1453400 196699144     99.27    361448  76248584 143551128     47.76
>> >> >03:30:35 PM   1513844 196638700     99.24    361648  76022016 143828244     47.85
>> >> >03:42:40 PM   1481108 196671436     99.25    361676  75718320 144478784     48.07
>> >> >Average:      5051607 193100937     97.45    362421  81775777 142758861     47.50
>> >> >
>> >> >04:42:04 PM   LINUX RESTART
>> >> >
>> >> >04:50:01 PM kbmemfree kbmemused  %memused kbbuffers  

Distributing Collections across Shards

2016-03-29 Thread Salman Ansari
Hi,

I believe the default behavior of creating collections distributed across
shards through the following command

http://
[solrlocation]:8983/solr/admin/collections?action=CREATE&name=[collection_name]&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=[configuration_name]

is that Solr will create the collection as follows

*shard1: *leader in server1 and replica in server2
*shard2:* leader in server2 and replica in server1

However, I have seen cases where running the above command creates
both the leader and the replica of a shard on the same server.

Wondering if there is a way to control this behavior (I mean control where
the leader and the replica of each shard will reside)?

Regards,
Salman


Re: Setting up a two nodes Solr Cloud 5.4.1 environment

2016-03-29 Thread Victor D'agostino

Hi guys

It seems I tried to add two additional shards to an existing Solr 
ensemble, and this is not supported (or I didn't find out how).


So after setting up ZooKeeper, I first set up node n°2 and then set up 
node n°1 with
wget --no-proxy 
"http://node1:8983/solr/admin/collections?=x=db=1=CREATE=4=2;


Because node n°2 was already up, two shards were created on each node.
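
For anyone checking where the shards actually landed, a Collections API
CLUSTERSTATUS call like the sketch below shows the placement (the host name is
only an example):

wget --no-proxy "http://node1:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=db"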

Regards
Victor

 Original message 
*Subject: *Re: Setting up a two nodes Solr Cloud 5.4.1 environment
*From: *Victor D'agostino 
*To: *solr-user@lucene.apache.org
*Cc: *Erick Erickson 
*Date: *29/03/2016 09:58

Hi Erick

Thanks for your help, here is what I've done.

1. I deleted the ZooKeeper and Solr installations.
2. I set up ZooKeeper on my two servers.
3. I successfully set up Solr Cloud node 1 with the same API call (1 
collection named db and two cores):
 wget --no-proxy 
"http://$HOSTNAME:8983/solr/admin/collections?numShards=2=copiemail3=compositeId=2=mail_id=db=1=CREATE;


4. I didn't use the Core API anymore.
I tried to set up node 2 with the Collections API, 
and here is the error message (shards can be added only to 'implicit' 
collections):


*Request :*
wget --no-proxy 
"http://$HOSTNAME:8983/solr/admin/collections?action=CREATESHARD=db=db_shard3_replica1;

*
**Error log**:*
2016-03-29 08:49:09.422 INFO  (qtp2085805465-13) [   ] 
o.a.s.h.a.CollectionsHandler Invoked Collection Action :createshard 
with params shard=db_shard3_replica1&action=CREATESHARD&collection=db
2016-03-29 08:49:09.425 ERROR (qtp2085805465-13) [   ] 
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: 
shards can be added only to 'implicit' collections
at 
org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation$10.call(CollectionsHandler.java:468)
at 
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:176)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664)
at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:438)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)

at org.eclipse.jetty.server.Server.handle(Server.java:499)
at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)

at java.lang.Thread.run(Thread.java:745)


If I do a status check on Solr node 2 (lxlyosol31) I can see ZooKeeper 
is OK, but node 2 is not in the cluster:


/etc/init.d/solr status

Found 1 Solr nodes:

Solr process 3883 running on port 8983
{
  "solr_home":"/data/solr-5.4.1/server/solr",
  "version":"5.4.1 1725212 - jpountz - 2016-01-18 11:51:45",
  "startTime":"2016-03-29T07:49:15.192Z",
  "uptime":"0 days, 0 hours, 7 minutes, 48 seconds",
  "memory":"259.7 MB (%10.8) of 2.4 GB",
  "cloud":{
"ZooKeeper":"lxlyosol30:2181,lxlyosol31:2181",
"liveNodes":"2",
"collections":"1"}}


Regards
Victor


 Original message 
*Subject: *Re: Setting up a two nodes Solr Cloud 5.4.1 environment
*From: *Erick Erickson 
*To: 

Re: Problem in Issuing a Command to Upload Configuration

2016-03-29 Thread Salman Ansari
Moreover, I created those new collections as a workaround because my old
collections were not coming up after a complete restart of the machines
hosting ZooKeeper and Solr. I would be interested to know the proper
procedure for bringing old collections back up after a restart of the
ZooKeeper ensemble and the Solr instances.

Appreciate any feedback and comments.

Regards,
Salman


On Tue, Mar 29, 2016 at 11:53 AM, Salman Ansari 
wrote:

> Thanks Reth for your response. It did work.
>
> Regards,
> Salman
>
> On Mon, Mar 28, 2016 at 8:01 PM, Reth RM  wrote:
>
>> I think it should be "zkcli.bat" (all in lower case) that is shipped with
>> solr not zkCli.cmd(that is shipped with zookeeper)
>>
>> solr_home/server/scripts/cloud-scripts/zkcli.bat -zkhost 127.0.0.1:9983 \
>>-cmd upconfig -confname my_new_config -confdir
>> server/solr/configsets/basic_configs/conf
>>
>> On Mon, Mar 28, 2016 at 8:18 PM, Salman Ansari 
>> wrote:
>>
>> > Hi,
>> >
>> > I am facing issue uploading configuration to Zookeeper ensemble. I am
>> > running this on Windows as
>> >
>> > *Command*
>> > **
>> > zkCli.cmd -cmd upconfig -zkhost
>> > "[localserver]:2181,[second_server]:2181,[third_server]:2181" -confname
>> > [config_name]  -confdir "[config_dir]"
>> >
>> > and I got the following result
>> >
>> > *Result*
>> > =
>> > Connecting to localhost:2181
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:host.name=SabrSolrServer1.SabrSolrServer1.a2.internal.cloudapp.net
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:java.version=1.8.0_77
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:java.vendor=Oracle Corporation
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:java.home=C:\Program Files\Java\jre1.8.0_77
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> >
>> >
>> ent:java.class.path=C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\build\classes;C:\So
>> >
>> >
>> lr\Zookeeper\zookeeper-3.4.6\bin\..\build\lib\*;C:\Solr\Zookeeper\zookeeper-3.4.
>> >
>> >
>> 6\bin\..\zookeeper-3.4.6.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\jline-
>> >
>> >
>> 0.9.94.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\log4j-1.2.16.jar;C:\Solr
>> >
>> >
>> \Zookeeper\zookeeper-3.4.6\bin\..\lib\netty-3.7.0.Final.jar;C:\Solr\Zookeeper\zo
>> >
>> >
>> okeeper-3.4.6\bin\..\lib\slf4j-api-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\b
>> >
>> >
>> in\..\lib\slf4j-log4j12-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\conf
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> >
>> >
>> ent:java.library.path=C:\ProgramData\Oracle\Java\javapath;C:\Windows\Sun\Java\bi
>> >
>> >
>> n;C:\Windows\system32;C:\Windows;C:\ProgramData\Oracle\Java\javapath;C:\Windows\
>> >
>> >
>> system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShe
>> > ll\v1.0\;C:\Program Files\Java\JDK\bin;.
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:java.io.tmpdir=C:\Users\ADMIN_~1\AppData\Local\Temp\2\
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:java.compiler=
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:os.name=Windows Server 2012 R2
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:os.arch=amd64
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:os.version=6.3
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:user.name=admin_user
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:user.home=C:\Users\admin_user
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:user.dir=C:\Solr\Zookeeper\zookeeper-3.4.6\bin
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:ZooKeeper@438] -
>> Initiating
>> > client
>> >  connection, connectString=localhost:2181 sessionTimeout=3
>> > watcher=org.apach
>> > e.zookeeper.ZooKeeperMain$MyWatcher@506c589e
>> >
>> > It looks like that it is not even calling the command. Any idea why is
>> that
>> > happening?
>> >
>> > Regards,
>> > Salman
>> >
>>
>
>


Re: Problem in Issuing a Command to Upload Configuration

2016-03-29 Thread Salman Ansari
Thanks Reth for your response. It did work.

Regards,
Salman

On Mon, Mar 28, 2016 at 8:01 PM, Reth RM  wrote:

> I think it should be "zkcli.bat" (all in lower case) that is shipped with
> solr not zkCli.cmd(that is shipped with zookeeper)
>
> solr_home/server/scripts/cloud-scripts/zkcli.bat -zkhost 127.0.0.1:9983 \
>-cmd upconfig -confname my_new_config -confdir
> server/solr/configsets/basic_configs/conf
>
> On Mon, Mar 28, 2016 at 8:18 PM, Salman Ansari 
> wrote:
>
> > Hi,
> >
> > I am facing issue uploading configuration to Zookeeper ensemble. I am
> > running this on Windows as
> >
> > *Command*
> > **
> > zkCli.cmd -cmd upconfig -zkhost
> > "[localserver]:2181,[second_server]:2181,[third_server]:2181" -confname
> > [config_name]  -confdir "[config_dir]"
> >
> > and I got the following result
> >
> > *Result*
> > =
> > Connecting to localhost:2181
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:host.name=SabrSolrServer1.SabrSolrServer1.a2.internal.cloudapp.net
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:java.version=1.8.0_77
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:java.vendor=Oracle Corporation
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:java.home=C:\Program Files\Java\jre1.8.0_77
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> >
> >
> ent:java.class.path=C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\build\classes;C:\So
> >
> >
> lr\Zookeeper\zookeeper-3.4.6\bin\..\build\lib\*;C:\Solr\Zookeeper\zookeeper-3.4.
> >
> >
> 6\bin\..\zookeeper-3.4.6.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\jline-
> >
> >
> 0.9.94.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\log4j-1.2.16.jar;C:\Solr
> >
> >
> \Zookeeper\zookeeper-3.4.6\bin\..\lib\netty-3.7.0.Final.jar;C:\Solr\Zookeeper\zo
> >
> >
> okeeper-3.4.6\bin\..\lib\slf4j-api-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\b
> >
> >
> in\..\lib\slf4j-log4j12-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\conf
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> >
> >
> ent:java.library.path=C:\ProgramData\Oracle\Java\javapath;C:\Windows\Sun\Java\bi
> >
> >
> n;C:\Windows\system32;C:\Windows;C:\ProgramData\Oracle\Java\javapath;C:\Windows\
> >
> >
> system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShe
> > ll\v1.0\;C:\Program Files\Java\JDK\bin;.
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:java.io.tmpdir=C:\Users\ADMIN_~1\AppData\Local\Temp\2\
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:java.compiler=
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:os.name=Windows Server 2012 R2
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:os.arch=amd64
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:os.version=6.3
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:user.name=admin_user
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:user.home=C:\Users\admin_user
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:user.dir=C:\Solr\Zookeeper\zookeeper-3.4.6\bin
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:ZooKeeper@438] -
> Initiating
> > client
> >  connection, connectString=localhost:2181 sessionTimeout=3
> > watcher=org.apach
> > e.zookeeper.ZooKeeperMain$MyWatcher@506c589e
> >
> > It looks like that it is not even calling the command. Any idea why is
> that
> > happening?
> >
> > Regards,
> > Salman
> >
>


Re: Setting up a two nodes Solr Cloud 5.4.1 environment

2016-03-29 Thread Victor D'agostino

Hi Erick

Thanks for your help, here is what I've done.

1. I deleted the ZooKeeper and Solr installations.
2. I set up ZooKeeper on my two servers.
3. I successfully set up Solr Cloud node 1 with the same API call (1 
collection named db and two cores):
 wget --no-proxy 
"http://$HOSTNAME:8983/solr/admin/collections?numShards=2=copiemail3=compositeId=2=mail_id=db=1=CREATE;


4. I didn't use the Core API anymore.
I tried to set up node 2 with the Collections API, 
and here is the error message (shards can be added only to 'implicit' 
collections):


*Request :*
wget --no-proxy 
"http://$HOSTNAME:8983/solr/admin/collections?action=CREATESHARD=db=db_shard3_replica1;

*
**Error log**:*
2016-03-29 08:49:09.422 INFO  (qtp2085805465-13) [   ] 
o.a.s.h.a.CollectionsHandler Invoked Collection Action :createshard with 
params shard=db_shard3_replica1=CREATESHARD=db
2016-03-29 08:49:09.425 ERROR (qtp2085805465-13) [   ] 
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: shards 
can be added only to 'implicit' collections
at 
org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation$10.call(CollectionsHandler.java:468)
at 
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:176)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664)

at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:438)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)

at org.eclipse.jetty.server.Server.handle(Server.java:499)
at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)

at java.lang.Thread.run(Thread.java:745)


If I do a status check on Solr node 2 (lxlyosol31) I can see ZooKeeper 
is OK, but node 2 is not in the cluster:


/etc/init.d/solr status

Found 1 Solr nodes:

Solr process 3883 running on port 8983
{
  "solr_home":"/data/solr-5.4.1/server/solr",
  "version":"5.4.1 1725212 - jpountz - 2016-01-18 11:51:45",
  "startTime":"2016-03-29T07:49:15.192Z",
  "uptime":"0 days, 0 hours, 7 minutes, 48 seconds",
  "memory":"259.7 MB (%10.8) of 2.4 GB",
  "cloud":{
"ZooKeeper":"lxlyosol30:2181,lxlyosol31:2181",
"liveNodes":"2",
"collections":"1"}}


Regards
Victor


 Original message 
*Subject: *Re: Setting up a two nodes Solr Cloud 5.4.1 environment
*From: *Erick Erickson 
*To: *solr-user 
*Date: *25/03/2016 19:44

bq:  (collections API won't work because i use compositeId routing mode)

This had better NOT be true or SolrCloud is horribly broken. compositeId is
the default and it's tested all the time by unit tests. So is implicit, for
that matter.

One question I have is that you've specified a route field with this param:

router.field=mail_id

so the data is being routed based on a hash of that field, your GUID-based
id field is totally ignored for routing purposes. That may be what you intend,
but it's confusing that you mentioned the GUID in that context.

As far as Solr is concerned, you only have a 2 shard