RE: How to make SolrCloud more elastic

2015-02-12 Thread Matt Kuiper
Toke,

Thanks for your reply.  Yes, I believe I will be working with a write once 
archive.  However, my understanding is that all shards are defined up front, 
with the option to split later.

Can you describe, or point me to documentation, on how to create shards one at 
a time?  

Thanks,
Matt

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Wednesday, February 11, 2015 11:47 PM
To: solr-user@lucene.apache.org
Subject: Re: How to make SolrCloud more elastic

On Wed, 2015-02-11 at 21:32 +0100, Matt Kuiper wrote:
 I am starting a new project and one of the requirements is that Solr 
 must scale to handle increasing load (both search performance and 
 index size).

[...]

 Before I got too deep, I wondered if anyone has any tips or warnings 
 on these approaches, or has scaled Solr in a different manner.

If your corpus only contains static content (e.g. log files or a write-once 
archive), you can create shards one at a time and optimize them. This lowers 
the requirements for your searchers.
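For a write-once setup like this, optimizing a finished shard down to one segment can be done with a plain XML update message posted to that core's /update handler; the host, port, and core name below are placeholders, not from this thread:

```xml
<!-- POST to http://localhost:8983/solr/<core>/update
     (host/port/core are placeholders). Forces a merge of the
     core's index down to a single segment. -->
<optimize maxSegments="1" waitSearcher="false"/>
```

Doing this once per finished shard, before adding it to the cloud, keeps each shard's search-time structures compact.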

- Toke Eskildsen, State and University Library, Denmark




RE: How to make SolrCloud more elastic

2015-02-12 Thread Toke Eskildsen
Matt Kuiper [matt.kui...@issinc.com] wrote:
 Thanks for your reply.  Yes, I believe I will be working with a write
 once archive.  However, my understanding is that all shards are
 defined up front, with the option to split later.

Our situation might be a bit special as a few minutes downtime - preferably at 
off-peak hours - now and then is acceptable.

We basically maintain a SolrCloud with static shards and use a completely 
separate builder to generate new shards, one at a time. When the builder has 
finished a shard, we add it to the cloud the hard way (re-configuration and 
restarting, hence the downtime). There's a description at 
https://sbdevel.wordpress.com/net-archive-search/

To avoid too much ZooKeeper hassle, we have a bunch of empty shards, ready to 
be switched with newly built ones. We have contemplated making the shard under 
construction part of the SolrCloud, but have yet to experiment with that 
setup.

Static shards, optimized down to a single segment and using DocValues for 
faceting, are a very potent mix: a Solr instance serving a non-static index needs more 
memory, as it must be capable of having more than one version of the 
index open at a time, plus handling the indexing itself. Faceting on many unique values 
is more efficient with a single segment, as there is no need for an internal 
structure mapping the terms between the segments.
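As a sketch of the schema side of this, a facet field backed by DocValues might be declared like so (the field name `category` is only an illustrative assumption, not from this thread):

```xml
<!-- schema.xml sketch: "category" is a hypothetical facet field -->
<field name="category" type="string" indexed="true" stored="false"
       docValues="true"/>
```

With docValues="true", facet counts are read from the on-disk column-oriented structure instead of un-inverting the field into heap at search time, which is what keeps memory needs low for static shards.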

- Toke Eskildsen


RE: How to make SolrCloud more elastic

2015-02-12 Thread Matt Kuiper
Thanks Alex. Per your recommendation I checked out the presentation and it was 
very informative.

While my problem space will not reach the scale addressed in this talk, some of 
the topics may still be helpful, namely the improvements to shard splitting and 
the new 'migrate' API.

Thanks,
Matt

Matt Kuiper - Software Engineer
Intelligent Software Solutions
p. 719.452.7721 | matt.kui...@issinc.com 
www.issinc.com | LinkedIn: intelligent-software-solutions

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Wednesday, February 11, 2015 2:31 PM
To: solr-user
Subject: Re: How to make SolrCloud more elastic

Did you have a look at the presentations from the recent SolrRevolution? E.g.
https://www.youtube.com/watch?v=nxRROble76A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 11 February 2015 at 15:32, Matt Kuiper matt.kui...@issinc.com wrote:
 I am starting a new project and one of the requirements is that Solr must 
 scale to handle increasing load (both search performance and index size).

 My understanding is that one way to address search performance is by adding 
 more replicas.

 I am more concerned about handling a growing index size.  I have already been 
 given some good input on this topic and am considering a shard splitting 
 approach, but am more focused on a rebalancing approach that includes 
 defining many shards up front and then moving these existing shards on to new 
 Solr servers as needed.  Plan to experiment with this approach first.

 Before I got too deep, I wondered if anyone has any tips or warnings on these 
 approaches, or has scaled Solr in a different manner.

 Thanks,
 Matt


Re: Multi-tenancy and guarantee of service per application (tenant)

2015-02-12 Thread Otis Gospodnetic
Not really, not 100%, if tenants share the same hardware and there is no
isolation through things like containers (in which case they don't share
the same SolrCloud cluster, really).

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Feb 12, 2015 at 11:17 AM, Victor Rondel rondelvic...@gmail.com
wrote:

 Hi everyone,

 I am wondering about multi-tenancy and guarantee of service in SolrCloud:

 *Multi-tenant cluster*: Is there a way to *guarantee a level of service* /
 capacity planning for *each tenant* using the cluster (its *own
 collections*)?


 Thanks,



Re: 43sec commit duration - blocked by index merge events?

2015-02-12 Thread Otis Gospodnetic
If you are using Solr and SPM for Solr, you can check a report that shows
the # of files in an index, and the report that shows you the maxDocs -
numDocs delta.  If you see the # of files drop during a commit, that's a
merge.  If you see a big delta change, that's probably a merge, too.

You could also jstack or kill -3 the JVM and see where it's spending its
time, to give you some ideas of what's going on inside.

HTH.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachum gilinac...@gmail.com wrote:

 Hello,

 During a load test I noticed a commit that took 43 seconds to complete
 (a client hard commit).
 Is this to be expected? What's causing it?
 I have a pair of machines hosting a 128M docs collection (8 shards,
 replication factor=2).

 Could it be merges? In Lucene, merges happen async of commit statements, but
 reading Solr's doc for the Update Handler
 
 https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig
 
 it sounds like hard commits do wait for merges to occur: * The tradeoff is
 that a soft commit gives you faster visibility because it's not waiting for
 background merges to finish.*
 Thanks.



Re: Index directory containing only segments.gen

2015-02-12 Thread Zisis Tachtsidis
Well, I don't know if I'm being helpful, but here goes.
My clusterstate.json actually has no leader for the shard in question. I
have 2 nodes as recovery_failed and one as down. No leaders there. I've
not used the core admin or collections API to create anything. Everything was
set up using the - now deprecated - <cores> and <core> tags inside solr.xml.

Also the index directories are different, since I ended up copying the index from
the one node that still had it to the other two and restarting again.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-directory-containing-only-segments-gen-tp4186045p4186107.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi-tenancy and guarantee of service per application (tenant)

2015-02-12 Thread Jack Krupansky
There are two main, distinct forms of multi-tenancy:

1. The service provider controls the app and the Solr server and the app is
carefully coded to isolate the data and load of the various tenants, such
as adding a filter query with the tenant ID and throttling requests in an
app server.
2. Each tenant has their own app and the service provider controls the Solr
server but has no control over the app or load.

The first is supported by Solr. The second is not, other than the service
provider spinning up separate instances of Solr on separate physical
servers.


-- Jack Krupansky

On Thu, Feb 12, 2015 at 1:30 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Not really, not 100%, if tenants share the same hardware and there is no
 isolation through things like containers (in which case they don't share
 the same SolrCloud cluster, really).

 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr & Elasticsearch Support * http://sematext.com/


 On Thu, Feb 12, 2015 at 11:17 AM, Victor Rondel rondelvic...@gmail.com
 wrote:

  Hi everyone,
 
  I am wondering about multi-tenancy and guarantee of service in SolrCloud:

  *Multi-tenant cluster*: Is there a way to *guarantee a level of service* /
  capacity planning for *each tenant* using the cluster (its *own
  collections*)?
 
 
  Thanks,
 



Re: Index directory containing only segments.gen

2015-02-12 Thread Zisis Tachtsidis
From the logs I've got one instance failing as described in my first comment
and the other two failing during PeerSync recovery when trying to
communicate with the server that was missing the segments_* files. The
exception follows


org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://server:host/solr/core
at
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:157)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.http.client.ClientProtocolException
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
... 10 more
Caused by: org.apache.http.ProtocolException: Invalid header: ,code=500}
at
org.apache.http.impl.io.AbstractMessageParser.parseHeaders(AbstractMessageParser.java:232)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:267)
at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:713)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:518)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
... 13 more




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-directory-containing-only-segments-gen-tp4186045p4186113.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrcloud performance issues

2015-02-12 Thread Timothy Potter
Hi Vijay,


We're working on SOLR-6816 ... would love for you to be a test site for any
improvements we make ;-)

Curious if you've experimented with changing the mergeFactor to a higher
value, such as 25, and what happens if you set soft auto-commits to
something lower, like 15 seconds? Also, make sure your indexing clients are
not sending hard commits as well, i.e. just rely on auto-commits.
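The mergeFactor suggestion would be expressed in solrconfig.xml roughly as follows (a sketch for the Solr 4.x indexConfig section; 25 is just the value floated above, not a tested recommendation):

```xml
<indexConfig>
  <!-- Larger values defer merges: fewer merge pauses during bulk
       indexing, at the cost of more segments to search. -->
  <mergeFactor>25</mergeFactor>
</indexConfig>
```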

re: When the number of replicas increases the bulk indexing time increases
almost exponentially ... ugh ... I'm wondering what your CPU utilization /
thread counts are? The leader sends updates to all replicas in parallel, so
it shouldn't be a huge impact whether you're doing 1 replica or 15 (probably a
little more overhead with 15, but not exponential for sure) ... what are
threads waiting on when this huge slowdown occurs? jstack -l PID should
give you some idea.

Lastly, do you have GC logging enabled and have you ruled out GC pauses
causing the big slow down?

On Thu, Feb 12, 2015 at 4:07 PM, Vijay Sekhri sekhrivi...@gmail.com wrote:

 Hi Erick,
 We have following configuration of our solr cloud

1. 10 Shards
2. 15 replicas per shard
3. 9 GB of index size per shard
4. a total of around 90 mil documents
5. 2 collection viz search1 serving live traffic and search 2 for
indexing. We swap collection when indexing finishes
6. On 150 hosts we have 2 JVMs running one for search1 collection and
other for search2 collection
7. Each jvm has 12 GB of heap assigned to it while the host has 50GB in
total
8. Each host has 16 processors
9. Linux XXX 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
10. We have two ways to index data.
1. Bulk indexing . All 90 million docs pumped in from 14 parallel
   process (on 14 different client hosts). This is done on
 collection that is
   not serving live traffic
   2.  Incremental indexing . Only delta changes (Range from 100K to 5
   Mil) every two hours. This is done on collection also serving live
 traffic
11. The request per second count on live collection is around 300 TPS
12. Hard commit setting is every 30 seconds with openSearcher false, and
the soft commit setting is every 15 minutes. We have tried a lot of
different settings here, BTW.
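The commit cadence described above maps to a solrconfig.xml fragment along these lines (a sketch; times are in milliseconds, element placement may vary slightly by Solr version):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>30000</maxTime>           <!-- hard commit every 30 s -->
    <openSearcher>false</openSearcher> <!-- do not open a new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>900000</maxTime>          <!-- soft commit every 15 min -->
  </autoSoftCommit>
</updateHandler>
```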




 Now we have two issues with indexing
 1) Solr just could not keep up with the bulk indexing when replicas are
 also active. We have concluded this by changing the number of replicas to
 just 2, to 4, and then to 15. When the number of replicas increases, the
 bulk indexing time increases almost exponentially.
 We seem to have encountered the same issue reported here
 https://issues.apache.org/jira/browse/SOLR-6816
 It gets to a point that even to index 100 docs the Solr cluster would take
 300 seconds. It would start off indexing 100 docs in 55 milliseconds and
 slowly increase over time, and within an hour and a half just could not keep
 up. We have a workaround for this, i.e. we stop all the replicas, do the
 bulk indexing, and bring all the replicas up one by one. This sort of
 defeats the purpose of SolrCloud, but we can still work with this
 workaround. We can do this because bulk indexing happens on the collection
 that is not serving live traffic. However, we would love to have a solution
 from SolrCloud itself, like asking it to stop replication and start it via an
 API at the end of indexing.

 2) This issue is related to soft commit with incremental indexing. When
 we do incremental indexing, it is done on the same collection serving live
 traffic with 300 requests per second throughput.  Everything is fine except
 whenever the soft commit happens. Each time a soft commit (autoSoftCommit in
 solrconfig.xml) happens, which BTW happens almost at the same time
 throughout the cluster, there is a spike in the response times and
 throughput decreases almost to 150 tps. The spike continues for 2 minutes
 and then it happens again at the exact interval when the soft commit
 happens. We have monitored the logs and found a direct correlation between
 when the soft commit happens and when the response time tanks.

 Now the latter issue is quite disturbing, because it is serving live
 traffic and we cannot sustain these periodic degradations. We have played
 around with different soft commit settings: intervals ranging from 2 minutes
 to 30 minutes; auto-warming half the cache, the full cache, or only 10% of
 it; doing warm-up queries on every new searcher, and doing no warm-up
 queries on every new searcher. All the different settings yield the same
 results: as and when the soft commit happens, the response time tanks
 and throughput decreases. The difference is almost 50% in response times
 and 50% in throughput.


 Our workaround for this is to also do incremental delta indexing
 on the collection not serving live traffic and swap when it is done. As you
 can see, this also defeats the purpose of solr 

RE: How to make SolrCloud more elastic

2015-02-12 Thread Matt Kuiper
Otis,

Thanks for your reply.  I see your point about too many shards and search 
efficiency.  I also agree that I need to get a better handle on customer 
requirements and expected loads.

Initially I figured that with the shard splitting option, I would need to 
double my Solr nodes every time I split (as I would want to split every shard 
within the collection).  Whereas actually only the number of shards would double, 
and then I would have the opportunity to rebalance the shards over the existing 
Solr nodes plus a number of new nodes that make sense at the time.  This may be 
preferable to defining many micro-shards up front.

The time-based collections may be an option for this project.  I am not familiar 
with query routing; can you point me to any documentation on how this might be 
implemented?

Thanks,
Matt

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Wednesday, February 11, 2015 9:13 PM
To: solr-user@lucene.apache.org
Subject: Re: How to make SolrCloud more elastic

Hi Matt,

You could create extra shards up front, but if your queries are fanned out to 
all of them, you can run into situations where there are too many concurrent 
queries per node, causing lots of context switching and ultimately being less 
efficient than if you had fewer shards.  So while this is an approach to take, 
I'd personally first try to run tests to see how much a single node can handle 
in terms of volume, expected query rates, and target latency, and then use 
monitoring/alerting/whatever-helps tools to keep an eye on the cluster so that 
when you start approaching the target limits you are ready with additional 
nodes and shard splitting if needed.

Of course, if your data and queries are such that newer documents are queried 
more, you should look into time-based collections... and if your queries only 
ever touch a subset of the data, you should look into query routing.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Feb 11, 2015 at 3:32 PM, Matt Kuiper matt.kui...@issinc.com wrote:

 I am starting a new project and one of the requirements is that Solr 
 must scale to handle increasing load (both search performance and index size).

 My understanding is that one way to address search performance is by 
 adding more replicas.

 I am more concerned about handling a growing index size.  I have 
 already been given some good input on this topic and am considering a 
 shard splitting approach, but am more focused on a rebalancing 
 approach that includes defining many shards up front and then moving 
 these existing shards on to new Solr servers as needed.  Plan to 
 experiment with this approach first.

 Before I got too deep, I wondered if anyone has any tips or warnings 
 on these approaches, or has scaled Solr in a different manner.

 Thanks,
 Matt



Re: Solrcloud performance issues

2015-02-12 Thread Otis Gospodnetic
Hi,

Did you say you have 150 servers in this cluster?  And 10 shards for just
90M docs?  If so, 150 hosts sounds like too much given all the other numbers
I see here.  I'd love to see some metrics here.  e.g. what happens with
disk IO around those commits?  How about GC time/size info?  Are JVM memory
pools full-ish and is the CPU jumping like crazy?  Can you share more info
to give us a more complete picture of your system? SPM for Solr
http://sematext.com/spm/ will help if you don't already capture these
types of things.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Feb 12, 2015 at 11:07 AM, Vijay Sekhri sekhrivi...@gmail.com
wrote:

 Hi Erick,
 We have following configuration of our solr cloud

1. 10 Shards
2. 15 replicas per shard
3. 9 GB of index size per shard
4. a total of around 90 mil documents
5. 2 collection viz search1 serving live traffic and search 2 for
indexing. We swap collection when indexing finishes
6. On 150 hosts we have 2 JVMs running one for search1 collection and
other for search2 collection
7. Each jvm has 12 GB of heap assigned to it while the host has 50GB in
total
8. Each host has 16 processors
9. Linux XXX 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
10. We have two ways to index data.
1. Bulk indexing . All 90 million docs pumped in from 14 parallel
   process (on 14 different client hosts). This is done on
 collection that is
   not serving live traffic
   2.  Incremental indexing . Only delta changes (Range from 100K to 5
   Mil) every two hours. This is done on collection also serving live
 traffic
11. The request per second count on live collection is around 300 TPS
12. Hard commit setting is every 30 seconds with openSearcher false, and
the soft commit setting is every 15 minutes. We have tried a lot of
different settings here, BTW.




 Now we have two issues with indexing
 1) Solr just could not keep up with the bulk indexing when replicas are
 also active. We have concluded this by changing the number of replicas to
 just 2, to 4, and then to 15. When the number of replicas increases, the
 bulk indexing time increases almost exponentially.
 We seem to have encountered the same issue reported here
 https://issues.apache.org/jira/browse/SOLR-6816
 It gets to a point that even to index 100 docs the Solr cluster would take
 300 seconds. It would start off indexing 100 docs in 55 milliseconds and
 slowly increase over time, and within an hour and a half just could not keep
 up. We have a workaround for this, i.e. we stop all the replicas, do the
 bulk indexing, and bring all the replicas up one by one. This sort of
 defeats the purpose of SolrCloud, but we can still work with this
 workaround. We can do this because bulk indexing happens on the collection
 that is not serving live traffic. However, we would love to have a solution
 from SolrCloud itself, like asking it to stop replication and start it via an
 API at the end of indexing.

 2) This issue is related to soft commit with incremental indexing. When
 we do incremental indexing, it is done on the same collection serving live
 traffic with 300 requests per second throughput.  Everything is fine except
 whenever the soft commit happens. Each time a soft commit (autoSoftCommit in
 solrconfig.xml) happens, which BTW happens almost at the same time
 throughout the cluster, there is a spike in the response times and
 throughput decreases almost to 150 tps. The spike continues for 2 minutes
 and then it happens again at the exact interval when the soft commit
 happens. We have monitored the logs and found a direct correlation between
 when the soft commit happens and when the response time tanks.

 Now the latter issue is quite disturbing, because it is serving live
 traffic and we cannot sustain these periodic degradations. We have played
 around with different soft commit settings: intervals ranging from 2 minutes
 to 30 minutes; auto-warming half the cache, the full cache, or only 10% of
 it; doing warm-up queries on every new searcher, and doing no warm-up
 queries on every new searcher. All the different settings yield the same
 results: as and when the soft commit happens, the response time tanks
 and throughput decreases. The difference is almost 50% in response times
 and 50% in throughput.


 Our workaround for this is to also do incremental delta indexing
 on the collection not serving live traffic and swap when it is done. As you
 can see, this also defeats the purpose of SolrCloud. We cannot do
 bulk indexing because the replicas cannot keep up, and we cannot do incremental
 indexing because of soft commit performance.

 Is there a way to make the cluster not do soft commits all at the same time,
 or is there a way to make soft commits not cause this degradation?
 We are open 

Re: Index directory containing only segments.gen

2015-02-12 Thread Erick Erickson
So after adding some docs to the index (and committing) with those two
nodes active,
do segment files magically appear?

My _guess_ is that there's something radially wrong with you set up
the collection. Did
you by any chance use the core admin API to create the cores? That can lead to
interesting results of you don't get everything just right. For
instance, if you point
the data dir for all three nodes at the same directory...

What does your clusterstate.json file look like?

Best,
Erick

On Thu, Feb 12, 2015 at 8:30 AM, Zisis Tachtsidis zist...@runbox.com wrote:
 I'm using SolrCloud 4.10.3 and the current setup is simple: 3 nodes with
 1 shard. After a rolling restart of the Solr cluster I've ended up with 2
 failing nodes reporting the following

 org.apache.solr.servlet.SolrDispatchFilter
 null:org.apache.solr.common.SolrException: SolrCore 'core' is not available
 due to init failure: Error opening new searcher
 Caused by: org.apache.solr.common.SolrException: Error opening new searcher
 at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1574)
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1686)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:853)
 ... 8 more
 Caused by: java.nio.file.NoSuchFileException: /path/to/index/segments_1
 at
 sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
 at
 sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
 at
 sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
 at
 sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
 at java.nio.channels.FileChannel.open(FileChannel.java:287)
 at java.nio.channels.FileChannel.open(FileChannel.java:334)
 at
 org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:196)
 at
 org.apache.lucene.store.Directory.openChecksumInput(Directory.java:113)
 at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:341)
 at
 org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
 at
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
 at
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
 at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:450)
 at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:792)
 at
 org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
 at
 org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
 at
 org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
 at
 org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
 at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1537)
 ... 10 more

 Checking the index directory of each node I found out that only
 *segments.gen* was inside. What I could not determine is how I ended up with
 this single file. Looking at the logs I could not find anything related. The
 3rd node had its index intact.
 Has anyone else encountered something similar?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Index-directory-containing-only-segments-gen-tp4186045.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: American British Dictionary for Solr

2015-02-12 Thread François Schiettecatte
Dinesh


See this:

http://wordlist.aspell.net/varcon/

You will need to do some work to convert it to a Solr-friendly format, though.
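The target format would be a plain synonyms.txt file for Solr's SynonymFilterFactory; a hand-written sketch of what converted VarCon output could look like (these example pairs are illustrative, not taken from VarCon):

```text
# synonyms.txt sketch: comma-separated equivalent terms, one group per line
colour, color
analyse, analyze
theatre, theater
```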

Cheers

François

 On Feb 12, 2015, at 12:22 AM, dinesh naik dineshkumarn...@gmail.com wrote:
 
 Hi ,
 We are looking for a dictionary to support American/British English synonyms.
 Could you please let us know what all dictionaries are available ?
 -- 
 Best Regards,
 Dinesh Naik



Re: ApacheCon 2015 at Austin, TX

2015-02-12 Thread Dmitry Kan
Hi,

Looks like I'll be there. So if you want to discuss luke / lucene / solr,
will be happy to de-virtualize.

Dmitry

On Mon, Jan 12, 2015 at 6:32 PM, CP Mishra mishr...@gmail.com wrote:

 Hi,

 I am planning to attend ApacheCon 2015 at Austin, TX (Apr 13-16th) and
 wondering if there will be lucene/solr sessions in it.

 Anyone else planning to attend?

 Thanks,
 CP




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: SASL with zkcli.sh

2015-02-12 Thread Chris Hostetter

: I'm trying to start a SolrCloud cluster with a kerberized Zookeeper. I'm not
: sure if it is possible, I have a Hadoop Cluster with an already running
: zookeeper and I do not think running two zoo in parallel would be the wise
: choice.

: Is there a way to use SASL with SolrCloud ?

Work has been done along these lines, but it won't be available until Solr 
5.1...

https://issues.apache.org/jira/browse/SOLR-6915

...but you could certainly start experimenting with it using the 5x 
branch.

i don't know of any docs on how to use it yet -- but check the svn commits 
for details on what the test configs look like.


-Hoss
http://www.lucidworks.com/


Re: creating a new collection fails as SearchHandler can't be found

2015-02-12 Thread Lee Carroll
Hi
it was jars copied into a solr-zk-cli directory to allow easy running of
solr zk cmd line client. well i think that is what fixed tomcat! I've also
tried with jetty with a clean solr home and that also works and seems a
much cleaner way of running multiple instances  (probably more to do with
rubbish tomcat sys admin skills on my part than anything else)

anyway cheers for help.

On 11 February 2015 at 22:38, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : The collection fails to be created (shard_replica dir and data and index
 : across the servers get created but collection creation fails)
 :
 : The full log is appended below. I thought it should be a straight forward
 : class not found problem but I just can't seem to fix this (few hours
 now).
 : I've even placed all the libs from solr.war into a directory and
 referenced
 : these from the solrconfig. You can see these being loaded in the below
 log.

 what exactly did the logs & errors look like *BEFORE* you started copying
 libs around?

 as things stand right now, you've got at least 2 (possibly 3, i'm not
 certain) copies of every Solr class in your classpath -- which can cause a
 lot more classloader type errors (including ClassNotFound) than it will
 ever solve, because of how the classloader hierarchy works -- if you have
 two copies of ClassX loaded by two different classloaders, and ClassX
 refers to SearchHandler but SearchHandler is not available in the same
 ClassLoader that loaded ClassX, all sorts of not-fun ClassNotFound type
 exceptions can happen.

 Go back to a clean install, w/ a single solr.war file, and no manually
 copied jars anywhere, and see if you still get problems with SearchHandler
 when you create a collection.

 if you do, then take another step back and try a single (tomcat) solr node
 setup (no solrcloud) with an instanceDir already in place with a single
 SolrCore using the example configs included in Solr 4.10.3 ... if that *STILL*
 doesn't work, and you still get ClassNotFound errors, then something is
 jacked up with your tomcat setup/classpath (unless of course you can still
 reproduce the same problem with the same example configs using the
 Solr provided jetty -- in that case, the problem gets a lot more
 interesting and your logs from *that* would be helpful to diagnose it)





 : Any help would be appreciated.
 :
 : Cheers Lee C
 :
 :
 : INFO  - 2015-02-11 19:22:42.494;
 : org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null
 : path=/admin/collections
 : params={numShards=4&name=hotelPackage&replicationFactor=1&action=CREATE}
 : status=0 QTime=3519
 : INFO  - 2015-02-11 19:22:42.594;
 : org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change:
 : WatchedEvent state:SyncConnected type:NodeDataChanged
 : path:/clusterstate.json, has occurred - updating... (live nodes size: 4)
 : INFO  - 2015-02-11 19:33:52.275;
 : org.apache.solr.handler.admin.CollectionsHandler; Creating Collection :
 : numShards=4&name=hotelPackage&replicationFactor=1&action=CREATE
 : INFO  - 2015-02-11 19:33:52.280;
 : org.apache.solr.cloud.DistributedQueue$LatchChildWatcher;
 LatchChildWatcher
 : fired on path: /overseer/collection-queue-work state: SyncConnected type
 : NodeChildrenChanged
 : INFO  - 2015-02-11 19:33:52.282;
 : org.apache.solr.cloud.OverseerCollectionProcessor; Overseer Collection
 : Processor: Get the message
 id:/overseer/collection-queue-work/qn-78
 : message:{
 :   operation:createcollection,
 :   fromApi:true,
 :   name:hotelPackage,
 :   replicationFactor:1,
 :   numShards:4}
 : WARN  - 2015-02-11 19:33:52.283;
 : org.apache.solr.cloud.OverseerCollectionProcessor;
 : OverseerCollectionProcessor.processMessage : createcollection , {
 :   operation:createcollection,
 :   fromApi:true,
 :   name:hotelPackage,
 :   replicationFactor:1,
 :   numShards:4}
 : INFO  - 2015-02-11 19:33:52.284;
 : org.apache.solr.cloud.OverseerCollectionProcessor; Only one config set
 : found in zk - using it:hotelPackageConf
 : INFO  - 2015-02-11 19:33:52.285;
 : org.apache.solr.cloud.OverseerCollectionProcessor; creating collections
 : conf node /collections/hotelPackage
 : INFO  - 2015-02-11 19:33:52.285;
 org.apache.solr.common.cloud.SolrZkClient;
 : makePath: /collections/hotelPackage
 : INFO  - 2015-02-11 19:33:52.297;
 : org.apache.solr.cloud.DistributedQueue$LatchChildWatcher;
 LatchChildWatcher
 : fired on path: /overseer/queue state: SyncConnected type
 NodeChildrenChanged
 : INFO  - 2015-02-11 19:33:52.300;
 : org.apache.solr.cloud.Overseer$ClusterStateUpdater; building a new
 : collection: hotelPackage
 : INFO  - 2015-02-11 19:33:52.300;
 : org.apache.solr.cloud.Overseer$ClusterStateUpdater; Create collection
 : hotelPackage with shards [shard1, shard2, shard3, shard4]
 : INFO  - 2015-02-11 19:33:52.313;
 : org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change:
 : WatchedEvent state:SyncConnected type:NodeDataChanged
 : path:/clusterstate.json, has occurred - updating... (live nodes 

Re: Index directory containing only segments.gen

2015-02-12 Thread Erick Erickson
OK, I think this is the root of your problem:

bq:  Everything was setup using the - now deprecated - tags <cores>
and <core> inside solr.xml.

There are a bunch of ways this could go wrong. I'm pretty sure you
have something that would take quite a while to untangle, so unless
you have a _very_ good reason for making this work, I'd blow
everything away.

First stop Zookeeper and all your Solr instances.

If you're using an external Zookeeper, shut it off and 'rm -rf
/tmp/zookeeper'. If using embedded, you can remove zoo_data under your
SOLR_HOME.

Completely remove all of your cores as in 'rm -rf corename' on all the
nodes. Nuke the entries in solr.xml and use the cloud-friendly
solr.xml; I'd just copy the one in '.../4.10/solr/example/solr'. You
get the idea. Or you can brute-force this and just remove all of Solr
and re-install 4.10.3.

OK, now use the Collections API to create your collection, see:
https://cwiki.apache.org/confluence/display/solr/Collections+API
(don't forget to push your configs to Zookeeper first)
and go from there.

Note that when I suggest you uninstall/reinstall Solr it's simply
because that's conceptually easier, of course you can spend some time
untangling things with your existing setup, but I really question
whether it's worth the effort.

Best,
Erick

On Thu, Feb 12, 2015 at 11:53 AM, Zisis Tachtsidis zist...@runbox.com wrote:
 From the logs I've got one instance failing as described in my first comment
 and the other two failing during PeerSync recovery when trying to
 communicate with the server that was missing the segments_* files. The
 exception follows


 org.apache.solr.client.solrj.SolrServerException: IOException occured when
 talking to server at: http://server:host/solr/core
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:157)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: org.apache.http.client.ClientProtocolException
 at
 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
 at
 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
 at
 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
 ... 10 more
 Caused by: org.apache.http.ProtocolException: Invalid header: ,code=500}
 at
 org.apache.http.impl.io.AbstractMessageParser.parseHeaders(AbstractMessageParser.java:232)
 at
 org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:267)
 at
 org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
 at
 org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
 at
 org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
 at
 org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
 at
 org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
 at
 org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:713)
 at
 org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:518)
 at
 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
 ... 13 more




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Index-directory-containing-only-segments-gen-tp4186045p4186113.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi words query

2015-02-12 Thread melb
I am using the ruby gem rsolr and querying the collection with this simple
query:

response = solr.get 'select', :params => {
  :q => query,
  :fl => 'id,title,description,body',
  :rows => 10
}

response["response"]["docs"].each { |doc| puts doc["id"] }

I created a text field to copy all the other fields to, and the query
handler requests this field.

rgds,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-words-query-tp4185625p4185922.html


Batch updates and separate update request processor chain for atomic document updates

2015-02-12 Thread Andreas Hubold

Hi,

we're using a SolrJ client which either adds (or overwrites) existing 
documents or updates some meta-data fields of existing documents.


Our default update request processor chain is configured with a 
processor for language detection. To avoid setting a wrong language, 
we're using a different chain without that processor for partial updates 
by setting the update.chain request parameter.


Our SolrJ client uses batch indexing and I'm wondering if I can use a 
single update request with both document additions and partial updates. 
Is it possible to specify the update.chain parameter per document? 
Currently our client sends two requests, one for partial updates and one 
for document additions, removals and the commit. This works fine, I'm 
just wondering if there's a better way.
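As far as I know, update.chain is a request-level parameter rather than a per-document one, so one sketch of keeping a single client-side batch is to partition it by whether each document is an atomic update before sending two requests. The document contents and field names below are made-up examples; atomic updates are recognizable by their modifier maps (set/add/inc/remove):

```python
# Split a mixed batch into full adds and partial (atomic) updates, so each
# group can be sent with its own update.chain parameter.
docs = [
    {"id": "1", "title": "a brand new document"},       # full add
    {"id": "2", "title": {"set": "corrected title"}},   # atomic update
    {"id": "3", "body": "another full document"},       # full add
]

def is_atomic_update(doc):
    # Atomic updates use modifier maps like {"set": ...} as field values.
    return any(isinstance(value, dict) for value in doc.values())

partial_updates = [d for d in docs if is_atomic_update(d)]
full_adds = [d for d in docs if not is_atomic_update(d)]

print(len(full_adds), len(partial_updates))
```

This mirrors the two-request approach already in use; it just keeps the split mechanical instead of tracked by the caller.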


(We could of course move language detection from the update request 
processor chain to the SolrJ client.)


Regards,
Andreas




RE: American /British Dictionary for solr-4.10.2

2015-02-12 Thread Markus Jelsma
There are no dictionaries that sum up all possible conjugations; a
heuristics-based normalizer would be more appropriate. There are nevertheless
some good sources to start:

Contains lots of useful spelling issues, incl. 
british/american/canadian/australian
http://grammarist.com/spelling

Very useful
http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences#Acronyms_and_abbreviations

A handy list
http://www.avko.org/free/reference/british-vs-american-spelling.html

There are some more lists but it seems the other one's tab is no longer open!
Good luck

-Original message-
 From:dinesh naik dineshkumarn...@gmail.com
 Sent: Thursday 12th February 2015 7:17
 To: solr-user@lucene.apache.org
 Subject: American /British Dictionary for solr-4.10.2
 
 Hi,
 
 What are the dictionaries available for Solr 4.10.2?
 We are looking for a dictionary to support American/British English synonyms.
 
 
 -- 
 Best Regards,
 Dinesh Naik
 


variaton on boosting recent documents gives exception

2015-02-12 Thread Michael Lackhoff
Since my field to measure recency is not a date field but a string field
(with only year-numbers in it), I tried a variation on the suggested
boost function for recent documents:
  recip(sub(2015,min(sortyear,2015)),1,10,10)
But this gives an exception when used in a boost or bf parameter.
I guess the reason is that all the mathematics doesn't work with a
string field even if it only contains numbers. Am I right with this
guess? And if so, is there a function I can use to change the type to
something numeric? Or are there other problems with my function?
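For reference, Solr's recip(x, m, a, b) computes a / (m*x + b), so with a numeric year field the intended boost would decay like this (a quick Python sketch, assuming sortyear were numeric):

```python
def recip(x, m, a, b):
    # Solr's recip function query: a / (m*x + b)
    return a / (m * x + b)

def recency_boost(sortyear, current_year=2015):
    # Models recip(sub(2015, min(sortyear, 2015)), 1, 10, 10)
    age = current_year - min(sortyear, current_year)
    return recip(age, 1, 10, 10)

print(recency_boost(2015))  # 1.0: no decay for the current year
print(recency_boost(2005))  # 0.5: halved after ten years
```

The min() clamp means future-dated years get the full boost of 1.0 rather than a value above it.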

Another related question: as you can see the current year (2015) is hard
coded. Is there an easy way to get the current year within the function?
Messing around with NOW looks very complicated.

-Michael


Solr suggest is related to second letter, not to initial letter

2015-02-12 Thread Volkan Altan
Hello Everyone,

What I want from the Solr suggester is that the suggestions offered for the
second term (typed after the first one) are actually related to that first
term. But, just like the first term, the second term is completed
independently.


Example; 
http://localhost:8983/solr/vitringez/suggest?q=facet_suggest_data:%22adidas+s%22

adidas s

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="s">
        <int name="numFound">1</int>
        <int name="startOffset">27</int>
        <int name="endOffset">28</int>
        <arr name="suggestion">
          <str>samsung</str>
        </arr>
      </lst>
      <lst name="collation">
        <str name="collationQuery">facet_suggest_data:adidas samsung</str>
        <int name="hits">0</int>
        <lst name="misspellingsAndCorrections">
          <str name="adidas">adidas</str>
          <str name="s">samsung</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>


The terms "adidas" and "samsung" occur in separate documents; there is no
document in which both of them appear.

How can I solve that problem?  



schema.xml

<fieldType name="suggestions_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="facet_suggest_data" type="suggestions_type" indexed="true" multiValued="true" stored="false" omitNorms="true"/>


Best



Re: Collations are not working fine.

2015-02-12 Thread Nitin Solanki
Hi James Dyer,
  I did the same as you told me: used
WordBreakSolrSpellChecker instead of shingles. But collations are still not
coming back.
For instance, I tried to get the collation "gone with the wind" by searching
"gone wthh thes wint" on field=gram_ci, but didn't succeed. I do get the
suggestions *with* for wthh, *the* for thes, and *wind* for wint.
Also, the phrase "gone with the wind" occurs 167 times in my documents. I
don't know whether I am missing something or not.
Please check my solr configuration below:

*URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:gone wthh thes
wint&wt=json&indent=true&shards.qt=/spell
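Note that the spaces in the query and the / in shards.qt need URL-encoding when the request is sent programmatically; a small sketch of building that request string (host and core names taken from the URL above):

```python
from urllib.parse import urlencode

# Parameters from the /spell request above; the query text is the
# misspelled phrase being tested.
params = {
    "q": "gram_ci:gone wthh thes wint",
    "wt": "json",
    "indent": "true",
    "shards.qt": "/spell",
}
url = "http://localhost:8983/solr/wikingram/spell?" + urlencode(params)
print(url)
```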

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

*Schema.xml: *

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true" multiValued="false"/>

<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Need autocomplete on whole phrase for multiple words .

2015-02-12 Thread vamshi kiran
 Hi solrExperts,

Need autocomplete on whole phrase for multiple words .

When I type *br*, the results are "brad" and "brad pitt", but I need only
"brad pitt" to come back.

I'm using ShingleFilterFactory plus the terms component for the autocomplete
feature; the query is something like

http://localhost:8080/solr/actors/terms?terms.fl=content_autosuggest&omitHeader=true&terms.sort=index&indent=true&wt=json&json.nl=map&terms.prefix=bra



below is my schema configuration:



<field name="actors" type="text_auto" indexed="true" stored="true" multiValued="true"/>

<fieldType class="solr.TextField" name="text_auto">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2" outputUnigrams="true" outputUnigramsIfNoShingles="false" tokenSeparator=" " fillerToken="_"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement="" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2" outputUnigrams="false" outputUnigramsIfNoShingles="false" tokenSeparator=" " fillerToken="_"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement="" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
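With outputUnigrams=true in the index-time analyzer, ShingleFilterFactory emits the single tokens alongside the bigrams, which is why "brad" alone shows up in the terms component. A rough sketch of the filter's token output (a model of the behavior, not the actual Lucene implementation):

```python
def shingles(tokens, size=2, output_unigrams=True):
    # Rough model of ShingleFilterFactory: emit n-grams of `size` adjacent
    # tokens, optionally keeping the original single tokens as well.
    out = list(tokens) if output_unigrams else []
    out += [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]
    return out

# Index-time analyzer above (outputUnigrams="true"): unigrams get indexed too.
print(shingles(["brad", "pitt"]))                         # ['brad', 'pitt', 'brad pitt']
# With outputUnigrams="false", only the bigram would be indexed.
print(shingles(["brad", "pitt"], output_unigrams=False))  # ['brad pitt']
```

Since the terms component returns whatever is in the index, dropping the unigrams at index time (or filtering out single-word terms on the client) would leave only the full phrases.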



Thanks & Regards,

Vamshi


Use cases - Production examples: datas, queries, cluster hardware and conf, and statistics

2015-02-12 Thread Victor Rondel
Hi everyone,

I am considering moving one or several Solr clusters to production.
Although Solr's documentation and community are *great*, I am quite
surprised not to find any *complete use-case story* stretching from
application(s) needs and data considerations to hardware ones.
Indeed, I understand why what/how-much hardware / configuration /
sharding questions are systematically answered with it depends
followed by test.
But then, what about a few complete descriptions, out of so many
elasticsearch users, from data use case to cluster internals, along with
a few performance and node stats?

So here are questions, before moving to production :

Are there any *complete* use cases around? Could you share some? By
complete I mean including *at least some* of the following :

   1. *Application needs and scope*
   2. *Indexing Data indications* : data volume, documents mapping,
   documents / indexes volume
   3. *Searching Data indications* : different applications, queries, use
   of facets - filters - pivot facets, concurrent indexing
   4. *Cluster Hardware* : machines' hardware (RAM, Disks/SSD -
   DAS-JBOD/SAN/NAS), JVM heap / OS Cache, nb of machines, back office network
   5. *Cluster Configuration* : one or several indexes, sharding,
   replication, master nodes, data nodes, use of over-sharding at start-up,
   use of re-indexing
   6. *Benchmaks *: queries response times, QPS, with or without concurrent
   indexing, memory heap sweet spot, nodes stats

For those interested, here is the (not *complete*) best-among-very-few
examples I've stumbled upon so far:

   - Perfs with hardware and query description :

   
http://fr.slideshare.net/charliejuggler/lucene-solrlondonug-meetup28nov2014-solr-es-performance


SASL with zkcli.sh

2015-02-12 Thread Simon Minery

Hello,

I'm trying to start a SolrCloud cluster with a kerberized Zookeeper, but I'm
not sure if it is possible. I have a Hadoop cluster with an already
running ZooKeeper, and I do not think running two ZooKeepers in parallel
would be the wise choice.


Is there a way to use SASL with SolrCloud ?
Thank you, Simon M.


Re: Analytics Component not working Solr-5.0

2015-02-12 Thread sumitj25
Can somebody help? Has anyone here used the analytics component?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Analytics-Component-not-working-Solr-5-0-tp4185666p4185977.html


Re: Stopwords in shingles suggester

2015-02-12 Thread O. Klein
With more and more people starting to use the Suggester, it seems that
enablePositionIncrements for StopFilterFactory is still needed.

Not sure why it is being removed in Solr 5, but is there a way to keep the
functionality beyond Lucene 4.3? Or can this feature be reinstated?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stopwords-in-shingles-suggester-tp4166057p4185994.html


Multy-tenancy and quarantee of service per application (tenant)

2015-02-12 Thread Victor Rondel
Hi everyone,

I am wondering about multi-tenancy and guarantees of service in SolrCloud:

*Multi-tenant cluster*: Is there a way to *guarantee a level of service* /
capacity planning for *each tenant* using the cluster (its *own collections*)
?


Thanks,


Solrcloud performance issues

2015-02-12 Thread Vijay Sekhri
Hi Erick,
We have following configuration of our solr cloud

   1. 10 Shards
   2. 15 replicas per shard
   3. 9 GB of index size per shard
   4. a total of around 90 mil documents
   5. 2 collections, viz. search1 serving live traffic and search2 for
   indexing. We swap collections when indexing finishes
   6. On 150 hosts we have 2 JVMs running one for search1 collection and
   other for search2 collection
   7. Each jvm has 12 GB of heap assigned to it while the host has 50GB in
   total
   8. Each host has 16 processors
   9. Linux XXX 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43
   UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
   10. We have two ways to index data:
       1. Bulk indexing: all 90 million docs pumped in from 14 parallel
          processes (on 14 different client hosts). This is done on the
          collection that is not serving live traffic.
       2. Incremental indexing: only delta changes (ranging from 100K to 5
          million docs) every two hours. This is done on the collection
          also serving live traffic.
   11. The requests-per-second count on the live collection is around 300 TPS
   12. Hard commit is every 30 seconds with openSearcher=false, and
   soft commit is every 15 minutes. We have tried a lot of different
   settings here, BTW.




Now we have two issues with indexing
1) Solr just could not keep up with the bulk indexing when replicas are
also active. We concluded this by changing the number of replicas to
just 2, then to 4, and then to 15. When the number of replicas increases,
the bulk indexing time increases almost exponentially.
We seem to have encountered the same issue reported here:
https://issues.apache.org/jira/browse/SOLR-6816
It gets to a point where even indexing 100 docs takes the solr cluster
300 seconds. It would start off indexing 100 docs in 55 milliseconds,
slowly degrade over time, and within an hour and a half just could not keep
up. We have a workaround for this: we stop all the replicas, do the
bulk indexing, and bring all the replicas back up one by one. This sort of
defeats the purpose of solr cloud, but we can still work with this
workaround. We can do this because bulk indexing happens on the collection
that is not serving live traffic. However, we would love to have a solution
from solr cloud itself, like asking it via an API to stop replication and
restart it at the end of indexing.

2) This issue is related to soft commit with incremental indexing. When
we do incremental indexing, it is done on the same collection serving live
traffic with 300 requests per second throughput. Everything is fine except
whenever the soft commit happens. Each time a soft commit (autoSoftCommit in
solrconfig.xml) happens, which BTW happens almost at the same time
throughout the cluster, there is a spike in the response times and
throughput decreases to almost 150 TPS. The spike continues for 2 minutes
and then happens again at the exact interval of the next soft commit.
We have monitored the logs and found a direct correlation between when the
soft commit happens and when the response time tanks.

Now the latter issue is quite disturbing, because the collection is serving
live traffic and we cannot sustain these periodic degradations. We have
played around with different soft commit settings: intervals ranging from 2
minutes to 30 minutes; auto-warming half the cache, the full cache, or only
10% of it; doing warm-up queries on every new searcher, and doing no warm-up
queries at all. All the different settings yield the same results: as and
when the soft commit happens, the response time tanks and throughput
decreases. The difference is almost 50% in response times and 50% in
throughput.


Our workaround for this issue is to also do incremental delta indexing
on the collection not serving live traffic and swap when it is done. As you
can see, this also defeats the purpose of solr cloud. We cannot do
bulk indexing because the replicas cannot keep up, and we cannot do
incremental indexing because of the soft commit performance.

Is there a way to make the cluster not do soft commits all at the same time,
or is there a way to make soft commits not cause this degradation?
We are open to any ideas at this point.






-- 
*
Vijay Sekhri
*


Re: Possible to dump clusterstate, system stats into solr log?

2015-02-12 Thread Erick Erickson
Jim:

Not that I know of. I'm guessing that accessing ZK directly and
dumping from there is also not possible?

Best,
Erick

On Wed, Feb 11, 2015 at 10:47 AM, Jim.Musil jim.mu...@target.com wrote:
 Hi,

 Is it possible to periodically dump the cluster state contents (or system 
 diagnostics) into the main solr log file?

 We have many security protocols in place that prevents us from running 
 diagnostic requests directly to the solr boxes, but we do have access to the 
 shipped logs.

 Thanks!
 Jim


Index directory containing only segments.gen

2015-02-12 Thread Zisis Tachtsidis
I'm using SolrCloud 4.10.3 and the current setup is simple: 3 nodes with
1 shard. After a rolling restart of the Solr cluster I've ended up with 2
failing nodes reporting the following:

org.apache.solr.servlet.SolrDispatchFilter
null:org.apache.solr.common.SolrException: SolrCore 'core' is not available
due to init failure: Error opening new searcher
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1574)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1686)
at org.apache.solr.core.SolrCore.init(SolrCore.java:853)
... 8 more
Caused by: java.nio.file.NoSuchFileException: /path/to/index/segments_1
at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:334)
at
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:196)
at
org.apache.lucene.store.Directory.openChecksumInput(Directory.java:113)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:341)
at
org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:450)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:792)
at
org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:77)
at
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
at
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1537)
... 10 more

Checking the index directory of each node I found out that only
*segments.gen* was inside. What I could not determine is how I ended up with
this single file. Looking at the logs I could not find anything related. The
3rd node had its index intact.
Has anyone else encountered something similar?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-directory-containing-only-segments-gen-tp4186045.html


Re: American /British Dictionary for solr-4.10.2

2015-02-12 Thread Walter Underwood
Are you looking for this sort of thing?

elevator, lift
blueberry, whortleberry, bilberry
rutabaga, swede
hood, bonnet
convertible top, hood
trunk, boot
daycare, preschool, nursery, playgroup
arugula, rocket
sidewalk, pavement
sweater, jumper
kerosene, paraffin
paraffin, wax
pants, trousers
underwear, pants

I did this once, as a demonstration to ship with Ultraseek. I dug through web 
resources manually and typed up a list. Some of the terms are domain-specific. 
There are big sets of automobile and railroad terms which are different.

http://en.wikipedia.org/wiki/Comparison_of_American_and_British_English
https://www.englishclub.com/vocabulary/british-american.htm
http://resources.woodlands-junior.kent.sch.uk/customs/questions/americanbritish.html
http://www.oxforddictionaries.com/us/words/british-and-american-terms
http://www.englisch-hilfen.de/en/words/be-ae.htm
http://www.englisch-hilfen.de/en/words/be-ae2.htm
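Word pairs like the ones above go into Solr's synonyms.txt as comma-separated equivalence groups for SynonymFilterFactory; a small sketch generating that format (the groups shown are a subset of the list above):

```python
# Emit Solr synonyms.txt lines: each line is a comma-separated group of
# equivalent terms (with expand=true they are all treated as interchangeable).
groups = [
    ["elevator", "lift"],
    ["trunk", "boot"],
    ["sidewalk", "pavement"],
    ["sweater", "jumper"],
    ["kerosene", "paraffin"],
]

lines = [", ".join(group) for group in groups]
print("\n".join(lines))
```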

This can cause confusing search results, because there are domain-specific 
terms that mean different things in the two dialects: hood, pants, paraffin, 
swede, rocket, etc. One of our customers made rocket propulsion systems. They 
were confused that “rocket fuel” suggested “arugula”.

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Feb 12, 2015, at 12:19 AM, Markus Jelsma markus.jel...@openindex.io wrote:

 There are no dictionaries that sum up all possible conjugations, using a 
 heuristics based normalizer would be more appropriate. There are nevertheless 
 some good sources to start:
 
 Contains lots of useful spelling issues, incl. 
 british/american/canadian/australian
 http://grammarist.com/spelling
 
 Very useful
 http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences#Acronyms_and_abbreviations
 
 A handy list
 http://www.avko.org/free/reference/british-vs-american-spelling.html
 
 There are some more lists but it seems the other one's tab is no longer open!
 Good luck
 
 -Original message-
 From:dinesh naik dineshkumarn...@gmail.com
 Sent: Thursday 12th February 2015 7:17
 To: solr-user@lucene.apache.org
 Subject: American /British Dictionary for solr-4.10.2
 
 Hi,
 
 What are the dictionaries available for Solr 4.10.2?
 We are looking for a dictionary to support American/British English synonym.
 
 
 -- 
 Best Regards,
 Dinesh Naik