Query field alias - issue with circular reference

2019-08-09 Thread Jaroslaw Rozanski
Hi Folks,



Question about query field aliases.



Assuming one has fields:

 * foo1
 * foo2
Sending "defType=edismax=foo:hello=foo1 foo2" will work.



But what about the case when one has fields:

 * foo
 * foo1
Say we want to add behaviour to queries that are already in use: we want to
search in the existing "foo" and "foo1" without making query changes.



Sending "defType=edismax=foo:hello=foo foo1" will *not* work. The 
error is "org.apache.solr.search.SyntaxError: Field aliases lead to a cycle".



So, is there any way to extend the search query for the existing field without
modifying the index?
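
(For illustration, a minimal SolrJ sketch of the two requests above; the client
set-up is assumed and not part of the original question:)

    import org.apache.solr.client.solrj.SolrQuery;

    // Works: the alias "foo" is not itself a real field.
    SolrQuery ok = new SolrQuery("foo:hello");
    ok.set("defType", "edismax");
    ok.set("f.foo.qf", "foo1 foo2");

    // Fails: "foo" is both the alias and one of the aliased fields,
    // so edismax reports "Field aliases lead to a cycle".
    SolrQuery broken = new SolrQuery("foo:hello");
    broken.set("defType", "edismax");
    broken.set("f.foo.qf", "foo foo1");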


--
Jaroslaw Rozanski | m...@jarekrozanski.eu 


Contact for Wiki / Support page maintainer

2019-07-25 Thread Jaroslaw Rozanski
Hi folks!

Who is the maintainer of the Solr Support page in the Apache Solr Wiki
(https://cwiki.apache.org/confluence/display/solr/Support)?

Thanks,
Jaroslaw

--
Jaroslaw Rozanski | m...@jarekrozanski.eu 


deleteById for collection with composite router and routing.field - is this sufficient solution?

2019-02-23 Thread Jaroslaw Rozanski

Hi all,

Facing issues with delete by ID in Solr 5.5.5 (but it looks like it
affects versions as high as 7.5.x and possibly newer). The collection uses a
composite router with the routing field set to a field *other than* the unique key.


In the above set-up the SolrJ .deleteByQuery works fine, albeit very
slow (high load, large index, etc.). However .deleteById does not work
correctly: the delete by ID, even when the "route" param is provided,
yields "missing _version_ on update from leader".


That does correlate with https://issues.apache.org/jira/browse/SOLR-7384.


In org.apache.solr.cloud.FullSolrCloudDistribCmdsTest (5.5.5 release tag),
these tests are commented out:

    // See SOLR-7384
    //    testDeleteByIdImplicitRouter();
    //    testDeleteByIdCompositeRouterWithRouterField();

I followed the suggestion from
https://issues.apache.org/jira/browse/SOLR-12694 and arrived at this in
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec (starting
from line #220):


    if (params != null) {
      Long version = getVersion(params);
      if (params.containsKey(UpdateRequest.ROUTE)) {
        // WAS: updateRequest.deleteById(entry.getKey(), (String) params.get(UpdateRequest.ROUTE));
        updateRequest.deleteById(entry.getKey(), (String) params.get(UpdateRequest.ROUTE), version);
      } else {
        updateRequest.deleteById(entry.getKey(), version);
      }
    } else {
      updateRequest.deleteById(entry.getKey());
    }


With the above, by setting the version on the deserialized UpdateRequest, the
previously commented out tests pass once again.


Now, the question is: *Is this the correct approach?*
If I am reading this correctly, this change means that the version
deleted on the leader is passed to the replicas. But should it be? Is the
version always consistent between leader and replica (except for an
out-of-sync state)?



Thanks,
Jaroslaw


--
Jaroslaw Rozanski | e: m...@jarekrozanski.eu



CloudSolrClient with implicit router and disabled distributed mode routes to unexpected cores

2018-12-27 Thread Jaroslaw Rozanski

Hi all,

Found an interesting problem in Solr 7.5.0 regarding the implicit router when
the _route_ param is provided in a non-distributed request.


Imagine the following set-up...

1. Collection: foo
2. Physical nodes: nodeA, nodeB
3. Shards: shard1, shard2
4. Replication factor: 2 (pure NRT)

- nodeA
-- foo_shard1_replica_n1
-- foo_shard2_replica_n1
- nodeB
-- foo_shard1_replica_n2
-- foo_shard2_replica_n2

TL;DR: two shards, two replicas each, co-sharing nodes.


Request: new SolrQuery("filter:value").setParam("_route_", 
"shard1").setParam("distrib", "false");


This request will return unpredictable results, depending on which core 
it hits.



The reason is that CloudSolrClient resolves node URLs to the
collection rather than to cores. This is the critical snippet in the code:


--- Start from line 1072 ---

  List<String> replicas = new ArrayList<>();
  String joinedInputCollections = StrUtils.join(inputCollections, ',');
  for (Slice slice : slices.values()) {
    for (ZkNodeProps nodeProps : slice.getReplicasMap().values()) {
      ZkCoreNodeProps coreNodeProps = new ZkCoreNodeProps(nodeProps);
      String node = coreNodeProps.getNodeName();
      if (!liveNodes.contains(node) // Must be a live node to continue
          || Replica.State.getState(coreNodeProps.getState()) != Replica.State.ACTIVE) // Must be an ACTIVE replica to continue
        continue;
      if (seenNodes.add(node)) { // if we haven't yet collected a URL to this node...
        String url = ZkCoreNodeProps.getCoreUrl(nodeProps.getStr(ZkStateReader.BASE_URL_PROP), joinedInputCollections); // BOOM!
        if (sendToLeaders && coreNodeProps.isLeader()) {
          theUrlList.add(url); // put leaders here eagerly (if sendToLeaders mode)
        } else {
          replicas.add(url); // replicas here
        }
      }
    }
  }



The URL of the replica is formed using the collection name, not the core name
(line 1082):
ZkCoreNodeProps.getCoreUrl(nodeProps.getStr(ZkStateReader.BASE_URL_PROP),
joinedInputCollections)



Instead of getting URLs like:
- http://nodeA/solr/foo_shard1_replica_n1
- http://nodeB/solr/foo_shard1_replica_n2

We end up with:
- http://nodeA/solr/foo
- http://nodeB/solr/foo

Because in this example the shards share physical nodes, sometimes the request
is routed to a core of the proper shard, sometimes not.


Should CloudSolrClient resolve exact core URLs when distrib=false? I
am guessing yes.
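
(A hypothetical one-line fix sketch, untested: build the URL from the replica's
core name instead of the joined collection names:)

    String url = ZkCoreNodeProps.getCoreUrl(
        nodeProps.getStr(ZkStateReader.BASE_URL_PROP),
        nodeProps.getStr(ZkStateReader.CORE_NAME_PROP)); // -> http://nodeA/solr/foo_shard1_replica_n1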



--
Jaroslaw Rozanski | e: m...@jarekrozanski.eu



Long value as unique key in long term

2018-11-22 Thread Jaroslaw Rozanski

Hi all,

This is interesting (to me):

1. The "TrieLongField" is deprecated (still there in 7.5, but marked
   deprecated)
2. The "LongPointField" is not allowed as uniqueKey

According to the Solr Ref Guide 7.5
(http://lucene.apache.org/solr/guide/7_5/field-types-included-with-solr.html)
these are the only options for long types.


Short of mapping it as a "string", is there a long-term solution for those
who want/need to store a unique ID as a long?
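
(The usual workaround, sketched; it assumes the schema declares the uniqueKey
"id" as "string" plus an illustrative extra LongPointField "id_l" for numeric
range queries and sorting:)

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    CloudSolrClient client = new CloudSolrClient("zk1:2181/solr");
    long id = 1234567890123L;
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", String.valueOf(id)); // uniqueKey stored as a string
    doc.addField("id_l", id);               // numeric copy for range queries/sorting
    client.add("collection1", doc);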



For reference: https://issues.apache.org/jira/browse/SOLR-10829


Thanks,
Jarek

--
Jaroslaw Rozanski | e: m...@jarekrozanski.eu



Solr 7.4/7.5 and ZooKeeper version > 3.4.11

2018-11-07 Thread Jaroslaw Rozanski
Hi all,

Solr 7.4.x and 7.5.x ship with ZooKeeper 3.4.11.

Meanwhile, the official stable ZooKeeper versions are 3.4.12 and 3.4.13. The
Solr-supported version 3.4.11 has been removed from the official distributions
(https://www-eu.apache.org/dist/zookeeper/), I am guessing due to
https://issues.apache.org/jira/browse/ZOOKEEPER-2960.


It is _trivial_ to use a newer version when using an external ZK ensemble,
however the Solr documentation
(https://lucene.apache.org/solr/guide/7_5/setting-up-an-external-zookeeper-ensemble.html)
says:


When using an external ZooKeeper ensemble, you will need to keep your
local installation up-to-date with the latest version distributed with Solr.
Since it is a stand-alone application in this scenario, it does not get
upgraded as part of a standard Solr upgrade.



Question: Anyone using ZK 3.4.12/13 and Solr 7.4/7.5 in production with 
success? Any noteworthy issues found?

The release notes from .11 to .12 look pretty harmless on the face of it.

Thanks,
Jarek



--
Jaroslaw Rozanski | m...@jarekrozanski.eu


Re: Solr cloud inquiry

2017-11-17 Thread Jaroslaw Rozanski
Hi James,

This might not be 100% what you are looking for, but here are some ideas to
explore:

1. Change the session timeout on the ZooKeeper client; this might help you
move the unresponsive node to the "down" state, and Solr Cloud will take the
affected node out of rotation on its own.
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkSessions

2. Create your own HttpClient with more aggressive connection/socket timeout
values and pass it to CloudSolrClient during construction; if the client times
out, retry (see the sketch after this list). You can also ask ZK which nodes
serve a given shard and send the request to the other node with the
distrib=false flag; that might be more intrusive depending on your shards/data
model/queries.
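
(A minimal sketch of suggestion 2 for 5.x-era SolrJ; the timeout values and ZK
address are assumptions to tune for your environment:)

    import org.apache.http.client.HttpClient;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.impl.HttpClientUtil;
    import org.apache.solr.common.params.ModifiableSolrParams;

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 2000); // ms
    params.set(HttpClientUtil.PROP_SO_TIMEOUT, 5000);         // ms
    HttpClient httpClient = HttpClientUtil.createClient(params);
    // a frozen node now fails fast instead of hanging; retry on timeout
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr", httpClient);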

And of all suggestions: fix the infrastructure :)

 Good luck!

--
Jaroslaw Rozanski

On Fri, 17 Nov 2017, at 00:42, kasinger, james wrote:
> Hi,
> 
> We aren’t seeing any exceptions happening for solr during that time. When
> the disk freezes up, solr waits (please refer to the attached gc image
> which shows a period of about a minute where no new objects are created
> in memory). The node is still accepting and stacking requests, and when
> the disk is accessible solr resumes with those threads in healthy state
> albeit with increased latency.
> 
> We’ve explored solutions for marking the node as unhealthy when an
> incident like this occurs, but have determined that the risk of taking it
> out of rotation and impacting the cluster, outweighs the momentary
> latency that we are experiencing.  
> 
> Attached a thread dump to show the jetty theads that pile up while
> solr/storage is in freeze, as well as a graph of total system threads
> increasing and CPU IO wait on the disk.
> 
> It’s a temporary storage outage, though could be viewed as a performance
> issue, and perhaps we need to become aware of more creative ways of
> handling degraded performance… Any ideas?
> 
> Thanks,
> James Kasinger
> 
> 
> On 11/15/17, 8:50 PM, "Jaroslaw Rozanski" <m...@jarekrozanski.eu> wrote:
> 
> Hi,
> 
> It is interesting that node reports healthy despite store access
> issue.
> That node should be marked down if it can't open the core backing up
> sharded collection.
> 
> Maybe if you could share exceptions/errors that you see in
> console/logs. 
> 
> I have experienced issues with replica node not responding in timely
> manner due to performance issues but that does not seem to match your
> case.
> 
> 
> --
> Jaroslaw Rozanski 
> 
> On Wed, 15 Nov 2017, at 22:49, kasinger, james wrote:
> > Hello folks,
> > 
> > 
> > 
> > To start, we have a sharded solr cloud configuration running solr 
> version
> > 5.1.0 . During shard to shard communication there is a problem state
> > where queries are sent to a replica, and on that replica the storage is
> > inaccessible. The node is healthy so it’s still taking requests which 
> get
> > piled up waiting to read from disk resulting in a latency increase. 
> We’ve
> > tried resolving this storage inaccessibility but it appears related to
> > AWS ebs issues.  Has anyone encountered the same issue?
> > 
> > thanks
> 
> 
> Email had 1 attachment:
> + 23c0_threads_bad.zip
>   24k (application/zip)


Re: Solr cloud inquiry

2017-11-15 Thread Jaroslaw Rozanski
Hi,

It is interesting that the node reports healthy despite the store access issue.
That node should be marked down if it can't open the core backing the
sharded collection.

Maybe you could share the exceptions/errors that you see in the console/logs.

I have experienced issues with a replica node not responding in a timely
manner due to performance issues, but that does not seem to match your
case.


--
Jaroslaw Rozanski 

On Wed, 15 Nov 2017, at 22:49, kasinger, james wrote:
> Hello folks,
> 
> 
> 
> To start, we have a sharded solr cloud configuration running solr version
> 5.1.0 . During shard to shard communication there is a problem state
> where queries are sent to a replica, and on that replica the storage is
> inaccessible. The node is healthy so it’s still taking requests which get
> piled up waiting to read from disk resulting in a latency increase. We’ve
> tried resolving this storage inaccessibility but it appears related to
> AWS ebs issues.  Has anyone encountered the same issue?
> 
> thanks


Re: Impact of changing precisionStep on already indexed content

2017-05-27 Thread Jaroslaw Rozanski
Apologies for dup. Please ignore.

On 26/05/17 18:00, Jaroslaw Rozanski wrote:
> Hi all,
> 
> 
> What is the impact of changing "precisionStep" without re-indexing
> document preceding the change?
> 
> Scenario:
> 
> Assume you have index with field:
> 
> <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
> 
> So `tdate` of precision 6. Now let's assume you were to change `tdate`
> to be of precision 8.
> 
> According to [1] the change will affect how many terms will be stored in
> index after type change is in place.
> 
> But will that affect queries and sorting if index contains documents
> with date precisionStep 6 and 8 at the same time?
> 
> Simple test indicates no difference (performance aside) but maybe you
> folks are aware of an edge case here.
> 
> 
> 
> [1]
> https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/search/NumericRangeQuery.html
> 
> 
> 
> 
> Cheers,
> Jarek
> 

-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.eu


Impact of changing precisionStep on already indexed content

2017-05-26 Thread Jaroslaw Rozanski
Hi all,


What is the impact of changing "precisionStep" without re-indexing
documents preceding the change?

Scenario:

Assume you have an index with the field:

<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

So `tdate` of precision 6. Now let's assume you were to change `tdate`
to be of precision 8.

According to [1] the change will affect how many terms are stored in the
index after the type change is in place.

But will that affect queries and sorting if the index contains documents
with date precisionStep 6 and 8 at the same time?

Simple test indicates no difference (performance aside) but maybe you
folks are aware of an edge case here.



[1]
https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/search/NumericRangeQuery.html




Cheers,
Jarek


Changing precisionStep without re-indexing

2017-05-26 Thread Jaroslaw Rozanski
Hi all

What would be the impact of having an index with the same field of type
TrieDateField stored with precisionStep 6 and 8?

Scenario:
1. Existing index where field `tdate` (TrieDateField) with precisionStep
set to 6.
2. Change in field type with precisionStep 8.
3. Index new content with precisionStep 8, while old content stays with
step 6.


From what I understand from [1], the change will affect the number of terms
in the index.

Now, apart from the impact on index size and potentially performance between
the two, would that impact queries/sorting in any way?
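
(For background, a rough Lucene-level sketch of the range query the [1] javadoc
describes; field name and bounds are illustrative. Note that the query itself
takes a precisionStep parameter, which is why mixing steps 6 and 8 in one index
is exactly the open question here:)

    import org.apache.lucene.search.NumericRangeQuery;

    // precisionStep (second argument) is expected to correspond to the
    // step the field was indexed with
    NumericRangeQuery<Long> q =
        NumericRangeQuery.newLongRange("tdate", 6, 0L, 1000L, true, true);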


[1]
https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/search/NumericRangeQuery.html



-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.com





Re: Separating Search and Indexing in SolrCloud

2016-12-18 Thread Jaroslaw Rozanski
Hi Erick,


Not talking about separation any more. I merely summarized the message from
Pushkar. As I said, it was clear that it was not possible.


About RAMBufferSizeMB, getting back to my original question: is this
buffer for storing update requests or ready-to-index, analyzed documents?

The documentation suggests the former; your first mention, however, suggests
the latter.


Thanks,
Jaroslaw


On 18/12/16 02:16, Erick Erickson wrote:
> Yes indexing is adding stress. No you can't separate
> the two in SolrCloud. End of story, why beat it to death?
> You'll have to figure out the sharding strategy that
> meets your indexing and querying needs and live
> within that framework. I'd advise setting up a small
> cluster and driving it to its tipping point and extrapolating
> from there. Here's the long version of "the sizing exercise".
> 
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> 
> My point that while indexing to Solr/Lucene there is
> additional pressure. That pressure has a fixed upper
> limit that doesn't grow with the number of docs. That's not
> true for searching, as you add more docs per node, the
> pressure (especially memory) increases. Concentrate
> your efforts there IMO.
> 
> Best
> Erick
> 
> 
> 
> On Sat, Dec 17, 2016 at 12:54 PM, Jaroslaw Rozanski
> <m...@jarekrozanski.com> wrote:
>> Hi Erick,
>>
>> So what does this buffer represent? What does it actually store? Raw
>> update request or analyzed document?
>>
>> The documentation suggest that it stores actual update requests.
>>
>> Obviously an analyzed document can and will occupy much more space than a raw
>> one. Also analysis will create a lot of new allocations and subsequent
>> GC work.
>>
>> Yes, you are probably right that search puts more stress and is main
>> memory user but combination of:
>> - non-trivial analysis,
>> - high volume of updates and
>> - search on the same node
>>
>> seems adding fuel to the fire.
>>
>> From previous response by Pushkar, it is clear that separation is not
>> achievable with existing SolrCloud mechanism.
>>
>> Thanks
>>
>>
>> On 17/12/16 20:24, Erick Erickson wrote:
>>> bq: I am more concerned with indexing memory requirements at volume
>>>
>>> By and large this isn't much of a problem. RAMBufferSizeMB in
>>> solrconfig.xml governs how much memory is consumed in Solr for
>>> indexing. When that limit is exceeded, the buffer is flushed to disk.
>>> I've rarely heard of indexing being a memory issue. Anecdotally I
>>> haven't seen throughput benefit with buffer sizes over 128M.
>>>
>>> You're correct in that master/slave style replication would use less
>>> memory on the slave, although there are other costs. I.e. rather than
>>> the data for document X being sent to the replicas once as in
>>> SolrCloud, that data is re-sent to the slave every time it's merged
>>> into a new segment.
>>>
>>> That said, memory issues are _far_ more prevalent on the search side
>>> of things so unless this is a proven issue in your environment I would
>>> fight other fires.
>>>
>>> Best,
>>> Erick
>>>
>>> On Fri, Dec 16, 2016 at 1:06 PM, Jaroslaw Rozanski <m...@jarekrozanski.com> 
>>> wrote:
>>>> Thanks, that issue looks interesting!
>>>>
>>>> On 16/12/16 16:38, Pushkar Raste wrote:
>>>>> This kind of separation is not supported yet.  There however some work
>>>>> going on,  you can read about it on
>>>>> https://issues.apache.org/jira/browse/SOLR-9835
>>>>>
>>>>> This unfortunately would not support soft commits and hence would not be a
>>>>> good solution for near real time indexing.
>>>>>
>>>>> On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski" <m...@jarekrozanski.com> 
>>>>> wrote:
>>>>>
>>>>>> Sorry, not what I meant.
>>>>>>
>>>>>> Leader is responsible for distributing update requests to replica. So
>>>>>> eventually all replicas have same state as leader. Not a problem.
>>>>>>
>>>>>> It is more about the performance of such. If I gather correctly normal
>>>>>> replication happens by standard update request. Not by, say, segment 
>>>>>> copy.
>>>>>>
>>>>>> Which means update on leader is as "expensive" as on replica.

Re: Separating Search and Indexing in SolrCloud

2016-12-17 Thread Jaroslaw Rozanski
Hi Erick,

So what does this buffer represent? What does it actually store? A raw
update request or an analyzed document?

The documentation suggests that it stores actual update requests.

Obviously an analyzed document can and will occupy much more space than a raw
one. Also analysis will create a lot of new allocations and subsequent
GC work.

Yes, you are probably right that search puts more stress and is the main
memory user, but the combination of:
- non-trivial analysis,
- high volume of updates, and
- search on the same node

seems to add fuel to the fire.

From the previous response by Pushkar, it is clear that separation is not
achievable with the existing SolrCloud mechanism.

Thanks


On 17/12/16 20:24, Erick Erickson wrote:
> bq: I am more concerned with indexing memory requirements at volume
> 
> By and large this isn't much of a problem. RAMBufferSizeMB in
> solrconfig.xml governs how much memory is consumed in Solr for
> indexing. When that limit is exceeded, the buffer is flushed to disk.
> I've rarely heard of indexing being a memory issue. Anecdotally I
> haven't seen throughput benefit with buffer sizes over 128M.
> 
> You're correct in that master/slave style replication would use less
> memory on the slave, although there are other costs. I.e. rather than
> the data for document X being sent to the replicas once as in
> SolrCloud, that data is re-sent to the slave every time it's merged
> into a new segment.
> 
> That said, memory issues are _far_ more prevalent on the search side
> of things so unless this is a proven issue in your environment I would
> fight other fires.
> 
> Best,
> Erick
> 
> On Fri, Dec 16, 2016 at 1:06 PM, Jaroslaw Rozanski <m...@jarekrozanski.com> 
> wrote:
>> Thanks, that issue looks interesting!
>>
>> On 16/12/16 16:38, Pushkar Raste wrote:
>>> This kind of separation is not supported yet.  There however some work
>>> going on,  you can read about it on
>>> https://issues.apache.org/jira/browse/SOLR-9835
>>>
>>> This unfortunately would not support soft commits and hence would not be a
>>> good solution for near real time indexing.
>>>
>>> On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski" <m...@jarekrozanski.com> wrote:
>>>
>>>> Sorry, not what I meant.
>>>>
>>>> Leader is responsible for distributing update requests to replica. So
>>>> eventually all replicas have same state as leader. Not a problem.
>>>>
>>>> It is more about the performance of such. If I gather correctly normal
>>>> replication happens by standard update request. Not by, say, segment copy.
>>>>
>>>> Which means update on leader is as "expensive" as on replica.
>>>>
>>>> Hence, if my understanding is correct, sending search request to replica
>>>> only, in index heavy environment, would bring no benefit.
>>>>
>>>> So the question is: is there a mechanism, in SolrCloud (not legacy
>>>> master/slave set-up) to make one node take a load of indexing which
>>>> other nodes focus on searching.
>>>>
>>>> This is not a question of SolrClient cause that is clear how to direct
>>>> search request to specific nodes. This is more about index optimization
>>>> so that certain nodes (ie. replicas) could suffer less due to high
>>>> volume indexing while serving search requests.
>>>>
>>>>
>>>>
>>>>
>>>> On 16/12/16 12:35, Dorian Hoxha wrote:
>>>>> The leader is the source of truth. You expect to make the replica the
>>>>> source of truth or something???Doesn't make sense?
>>>>> What people do, is send write to leader/master and reads to
>>>> replicas/slaves
>>>>> in other solr/other-dbs.
>>>>>
>>>>> On Fri, Dec 16, 2016 at 1:31 PM, Jaroslaw Rozanski <m...@jarekrozanski.com
>>>>>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> According to documentation, in normal operation (not recovery) in Solr
>>>>>> Cloud configuration the leader sends updates it receives to all the
>>>>>> replicas.
>>>>>>
>>>>>> This means and all nodes in the shard perform same effort to index
>>>>>> single document. Correct?
>>>>>>
>>>>>> Is there then a benefit to *not* to send search requests to leader, but
>>>>>> only to replicas?
>>>>>>
>>>>>> Given index & search heavy Solr Cloud system, is it possible to separate
>>>>>> search from indexing nodes?
>>>>>>
>>>>>>
>>>>>> RE: Solr 5.5.0
>>>>>>
>>>>>> --
>>>>>> Jaroslaw Rozanski | e: m...@jarekrozanski.com
>>>>>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Jaroslaw Rozanski | e: m...@jarekrozanski.com
>>>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>>>>
>>>>
>>>
>>
>> --
>> Jaroslaw Rozanski | e: m...@jarekrozanski.com
>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>>

-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.com
695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D





Re: Separating Search and Indexing in SolrCloud

2016-12-16 Thread Jaroslaw Rozanski
Thanks, that issue looks interesting!

On 16/12/16 16:38, Pushkar Raste wrote:
> This kind of separation is not supported yet.  There however some work
> going on,  you can read about it on
> https://issues.apache.org/jira/browse/SOLR-9835
> 
> This unfortunately would not support soft commits and hence would not be a
> good solution for near real time indexing.
> 
> On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski" <m...@jarekrozanski.com> wrote:
> 
>> Sorry, not what I meant.
>>
>> Leader is responsible for distributing update requests to replica. So
>> eventually all replicas have same state as leader. Not a problem.
>>
>> It is more about the performance of such. If I gather correctly normal
>> replication happens by standard update request. Not by, say, segment copy.
>>
>> Which means update on leader is as "expensive" as on replica.
>>
>> Hence, if my understanding is correct, sending search request to replica
>> only, in index heavy environment, would bring no benefit.
>>
>> So the question is: is there a mechanism, in SolrCloud (not legacy
>> master/slave set-up) to make one node take a load of indexing which
>> other nodes focus on searching.
>>
>> This is not a question of SolrClient cause that is clear how to direct
>> search request to specific nodes. This is more about index optimization
>> so that certain nodes (ie. replicas) could suffer less due to high
>> volume indexing while serving search requests.
>>
>>
>>
>>
>> On 16/12/16 12:35, Dorian Hoxha wrote:
>>> The leader is the source of truth. You expect to make the replica the
>>> source of truth or something???Doesn't make sense?
>>> What people do, is send write to leader/master and reads to
>> replicas/slaves
>>> in other solr/other-dbs.
>>>
>>> On Fri, Dec 16, 2016 at 1:31 PM, Jaroslaw Rozanski <m...@jarekrozanski.com
>>>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> According to documentation, in normal operation (not recovery) in Solr
>>>> Cloud configuration the leader sends updates it receives to all the
>>>> replicas.
>>>>
>>>> This means and all nodes in the shard perform same effort to index
>>>> single document. Correct?
>>>>
>>>> Is there then a benefit to *not* to send search requests to leader, but
>>>> only to replicas?
>>>>
>>>> Given index & search heavy Solr Cloud system, is it possible to separate
>>>> search from indexing nodes?
>>>>
>>>>
>>>> RE: Solr 5.5.0
>>>>
>>>> --
>>>> Jaroslaw Rozanski | e: m...@jarekrozanski.com
>>>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>>>>
>>>>
>>>
>>
>> --
>> Jaroslaw Rozanski | e: m...@jarekrozanski.com
>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>>
>>
> 

-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.com
695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D





Re: Separating Search and Indexing in SolrCloud

2016-12-16 Thread Jaroslaw Rozanski
Thanks,


On 16/12/16 20:56, Shawn Heisey wrote:
> On 12/16/2016 5:43 AM, Jaroslaw Rozanski wrote:
>> Leader is responsible for distributing update requests to replica. So
>> eventually all replicas have same state as leader. Not a problem. It
>> is more about the performance of such. If I gather correctly normal
>> replication happens by standard update request. Not by, say, segment
>> copy. 
> 
> For SolrCloud, yes.  The master/slave replication that existed before
> SolrCloud does work by copying segment files, but SolrCloud does not
> work that way.  The old master/slave replication feature IS used by
> SolrCloud, but ONLY for index recovery -- copying the entire index from
> the leader to another replica in the event that the replica gets so far
> behind that it cannot be brought current by regular updates and/or the
> transaction log.  This is also used to make new replicas.
> 
>> Hence, if my understanding is correct, sending search request to
>> replica only, in index heavy environment, would bring no benefit. 
> 
> Correct, it would have no benefit.  There's something else: when you
> send queries to SolrCloud, they do not necessarily stay on the node
> where you sent them.  By default, multiple query requests are load
> balanced across the cloud, so they'll hit the leader anyway, even if you
> never send them to the leader.

With a custom Solr client the above logic no longer applies in my case. I
can easily control which replica/core in a shard my query is directed to
(along with distrib=false).
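
(A minimal sketch of that, with assumed node and core names:)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    // talk to one specific core and keep the query local to it
    HttpSolrClient core = new HttpSolrClient("http://nodeA:8983/solr/col1_shard1_replica1");
    SolrQuery q = new SolrQuery("field:value");
    q.set("distrib", "false");
    core.query(q);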

>> So the question is: is there a mechanism, in SolrCloud (not legacy
>> master/slave set-up) to make one node take a load of indexing which
>> other nodes focus on searching. 
> 
> Indexing will always be done by all replicas, including the leader.
> 
> Something to mention, although it doesn't accomplish what you're after: 
> There is a preferLocalShards parameter that you can send with your query
> to keep SolrCloud from doing its load balancing *if* the query can be
> satisfied from local indexes.  This parameter should only be used in one
> of the following situations:
> 
> * Your query rate is very low.
> * You are already load balancing the requests yourself.
> 
> If the preferLocalShards parameter is used in other situations, it can
> end up concentrating a large number of requests onto some replicas and
> leaving the other replicas idle.
> 
> https://cwiki.apache.org/confluence/display/solr/Distributed+Requests#DistributedRequests-PreferLocalShards


Yeap, already solved. I am more concerned with indexing memory
requirements at volume affecting the performance of search requests and/or
cluster stability.

> Thanks,
> Shawn
> 



-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.com
695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D





Re: Stable releases of Solr

2016-12-16 Thread Jaroslaw Rozanski
Hi Deepak,

Lucene 6.3.0 is the latest official release:
https://lucene.apache.org/core/6_3_0/index.html

Same applies to Solr, if that is what you meant.

It is as stable as guaranteed by the release process.

On 16/12/16 07:10, Deepak Kumar Gupta wrote:
> Hi,
> 
> I am planning to upgrade lucene version in my codebase from 3.6.1
> What is the latest stable version to which I can upgrade it?
> Is 6.3.X stable?
> 
> Thanks,
> Deepak
> 

-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.com
695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D





Re: Separating Search and Indexing in SolrCloud

2016-12-16 Thread Jaroslaw Rozanski
Sorry, not what I meant.

The leader is responsible for distributing update requests to replicas. So
eventually all replicas have the same state as the leader. Not a problem.

It is more about the performance of such. If I gather correctly, normal
replication happens by standard update requests. Not by, say, segment copy.

Which means an update on the leader is as "expensive" as on a replica.

Hence, if my understanding is correct, sending search requests to replicas
only, in an index-heavy environment, would bring no benefit.

So the question is: is there a mechanism in SolrCloud (not the legacy
master/slave set-up) to make one node take the load of indexing while
other nodes focus on searching?

This is not a question about SolrClient, because it is clear how to direct
search requests to specific nodes. This is more about index optimization,
so that certain nodes (i.e. replicas) could suffer less due to high-volume
indexing while serving search requests.




On 16/12/16 12:35, Dorian Hoxha wrote:
> The leader is the source of truth. You expect to make the replica the
> source of truth or something???Doesn't make sense?
> What people do, is send write to leader/master and reads to replicas/slaves
> in other solr/other-dbs.
> 
> On Fri, Dec 16, 2016 at 1:31 PM, Jaroslaw Rozanski <m...@jarekrozanski.com>
> wrote:
> 
>> Hi all,
>>
>> According to documentation, in normal operation (not recovery) in Solr
>> Cloud configuration the leader sends updates it receives to all the
>> replicas.
>>
>> This means and all nodes in the shard perform same effort to index
>> single document. Correct?
>>
>> Is there then a benefit to *not* to send search requests to leader, but
>> only to replicas?
>>
>> Given index & search heavy Solr Cloud system, is it possible to separate
>> search from indexing nodes?
>>
>>
>> RE: Solr 5.5.0
>>
>> --
>> Jaroslaw Rozanski | e: m...@jarekrozanski.com
>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>>
>>
> 

-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.com
695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D





Separating Search and Indexing in SolrCloud

2016-12-16 Thread Jaroslaw Rozanski
Hi all,

According to documentation, in normal operation (not recovery) in Solr
Cloud configuration the leader sends updates it receives to all the
replicas.

This means that all nodes in the shard perform the same effort to index a
single document. Correct?

Is there then a benefit to *not* send search requests to the leader, but
only to replicas?

Given an index- & search-heavy Solr Cloud system, is it possible to separate
search from indexing nodes?


RE: Solr 5.5.0

-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.com
695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D





Re: The state of Solr 5. Is it in maintenance mode only?

2016-11-28 Thread Jaroslaw Rozanski
Hi,

Thanks for the elaborate response. I missed the link to the duplicate JIRA.
Makes sense.

On the 5.x front, I wasn't expecting a 5.6 release now that we have 6.x; I
was simply surprised to see a fix for 4.x and not for 5.x.

As for adoption levels, it was my subjective feeling from reading this list.
Do we have a community survey on that subject? That would be really
interesting to see.


Thanks,
Jaroslaw


On 28/11/16 12:59, Shawn Heisey wrote:
> On 11/28/2016 4:29 AM, Jaroslaw Rozanski wrote:
>> Recently I have noticed that couple of Solr issues have been
>> resolved/added only for Solr 4.x and Solr 6.x branch. For example
>> https://issues.apache.org/jira/browse/SOLR-2242. Has Solr 5.x branch
>> been moved to maintenance mode only? The 5 wasn't around for long
>> before 6 came about so I appreciate its adoption might not be vast.
> 
> The 5.0 version was announced in March 2015.  The 6.0 version was
> announced in April 2016.  Looks like 4.x was current for a little less
> than three years (July 2012 for 4.0).  5.x had one year, which I
> wouldn't call really call a short time.
> 
> Since the release of 6.0, 4.x is dead and 5.x is in maintenance mode. 
> Maintenance mode means that only particularly nasty bugs are fixed and
> only extremely trivial features are added.  The latter is usually only
> done if the lack of the feature can be considered a bug.  There is never
> any guarantee that a new 5.x release will be made, but if that happens,
> it will be a 5.5.x release.  The likelihood of seeing a 5.6 release is
> VERY low.
> 
> SOLR-2242 is a duplicate of SOLR-6348.  It probably had 4.9 in the fixed
> version field because that's what was already in it when it was resolved
> as a duplicate.  It's a very old issue that's been around since the 3.x
> days.  No changes were committed for SOLR-2242.  The changes for
> SOLR-6348 were committed to 5.2 and 6.0.  I have updated the fix
> versions in the older issue to match.  The versions should probably all
> be removed, but I am not sure what our general rule is for duplicates.
> 
> Thanks,
> Shawn
> 

-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.com





The state of Solr 5. Is it in maintenance mode only?

2016-11-28 Thread Jaroslaw Rozanski
Hi,

Recently I have noticed that a couple of Solr issues have been
resolved/added only for the Solr 4.x and Solr 6.x branches. For example
https://issues.apache.org/jira/browse/SOLR-2242.

Has the Solr 5.x branch been moved to maintenance mode only? Version 5 wasn't
around for long before 6 came about, so I appreciate its adoption might
not be vast.



-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.com






Re: Degraded performance between Solr 4 and Solr 5

2016-04-27 Thread Jaroslaw Rozanski
Ok, so here is an interesting find.

As my setup requires frequent (soft) commits, caching brings little value.
I tested the following on Solr 5.5.0:

q={!cache=false}*:*&
fq={!cache=false}query1 /* not expensive */&
fq={!cache=false cost=200}query2 /* expensive! */&

Only with the above set-up (and forcing Solr post-filtering for the expensive
query, hence cost=200) was I able to return to Solr 4.10.3 performance.
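
(A sketch of that request via SolrJ; "query1"/"query2" stand for the cheap and
expensive filters, and post-filtering applies where the filter query supports
it:)

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("{!cache=false}*:*");
    q.addFilterQuery("{!cache=false}query1");          // not expensive
    q.addFilterQuery("{!cache=false cost=200}query2"); // expensive; cost >= 100 requests post-filtering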

By Solr 4 performance I mean:
- not only Solr 4 response times (roughly) for queries returning values,
but also
- very fast response for queries that have 0 results 

I wonder what could be the underlying cause.

Thanks,
Jarek

On Wed, 27 Apr 2016, at 09:13, Jaroslaw Rozanski wrote:
> Hi Eric,
> 
> Measuring running queries via JMeter. Values provided are rounded median
> of multiple samples. Medians are just slightly better than 99th
> percentile for all samples. 
> 
> Filter cache is useless as you mentioned; they are effectively not used.
> There is auto-warming through cache autoWarm but no auto-warming
> queries. 
> 
> Small experiment with passing =... seems not to make any difference
> which would not be surprising given caches are barely involved.
> 
> Thanks for the suggestion on IO. After stopping indexing, the response
> time barely changed on Solr 5. On Solr 4, with indexing running it is
> still fast. So effectively, Solr 4 under indexing load is faster than
> idle Solr 5. Both set-ups have the same heap size and available RAM on
> machine (2x heap).
> 
> One other thing I am testing is issuing request to specific core, with
> distrib=false. No significant improvements there.
> 
> Now what is interesting is that the aforementioned query takes the same
> amount of time to execute regardless of the number of documents found.
> - Whether it is 0 or 10k, it takes couple seconds on Solr 5.5.0.
> - Meanwhile, on Solr 4.10.3, the response time is dependent on results
> size. For Solr 4 no results returns in few ms and few seconds for couple
> thousands of results. 
> (query used q={!cache=false}...)
>   
> 
> Thanks,
> Jarek
> 
> On Wed, 27 Apr 2016, at 04:39, Erick Erickson wrote:
> > Well, the first question is always "how are you measuring this"?
> > Measuring a few queries is almost completely uninformative,
> > especially if the two systems have differing warmups. The only
> > meaningful measurements are when throwing away the first bunch
> > of queries then measuring a meaningful sample.
> > 
> > The setup you describe will be very sensitive to disk access
> > with the autowarm of 1 second, so if there's much at all in
> > the way of differences in I/O that would be a red flag.
> > 
> > From here on down doesn't really respond to the question, but
> > I thought I'd mention it.
> > 
> > And you don't have to worry about disabling your fitlerCache since
> > any filter query of the form fq=field:[mention NOW in here without
> > rounding]
> > never be re-used. So you might as well use {!cache=false}. Here's the
> > background:
> > 
> > https://lucidworks.com/blog/2012/02/23/date-math-now-and-filter-queries/
> > 
> > And your soft commit is probably throwing out all the filter caches
> > anyway.
> > 
> > I doubt you're doing any autowarming at all given the autocommit interval
> > of 1 second and continuously updating documents and your reported
> > query times. So you can pretty much forget what I said about throwing
> > out your first N queries since you're (probably) not getting any benefit
> > out of caches anyway.
> > 
> > On Tue, Apr 26, 2016 at 10:34 AM, Jaroslaw Rozanski
> > <s...@jarekrozanski.com> wrote:
> > > Hi all,
> > >
> > > I am migrating a large Solr Cloud cluster from Solr 4.10 to Solr 5.5.0
> > > and I observed big difference in query execution time.
> > >
> > > First a setup summary:
> > > - multiple collections - 6
> > > - each has multiple shards - 6
> > > - same/similar hardware
> > > - indexing tens of messages per second
> > > - autoSoftCommit with 1s; hard commit few tens of seconds
> > > - Java 8
> > >
> > > The query has following form: field1:[* TO NOW-14DAYS] OR (-field1:[* TO
> > > *] AND field2:[* TO NOW-14DAYS])
> > >
> > > The fields field1 & field2 are of date type:
> > > <fieldType name="tdate" class="solr.TrieDateField" ... positionIncrementGap="0"/>
> > >
> > > As query (q={!cache=false}...)
> > > Solr 4.10 -> 5s
> > > Solr 5.5.0 -> 12s
> > >
> > > As filter query (q={!cache=false}*:*&fq=...)
> > > Solr 4.10 -> 9s
> > > Solr 5.5.0 -> 11s
> > >
> > > The query itself is bad and its optimization aside, I am wondering if
> > > there is anything in Lucene/Solr that would have such an impact on query
> > > execution time between versions.
> > >
> > > Originally I thought it might be related to
> > > https://issues.apache.org/jira/browse/SOLR-8251 and testing on small
> > > scale proved that there is a difference in performance. However upgraded
> > > version is already 5.5.0.
> > >
> > >
> > >
> > > Thanks,
> > > Jarek
> > >


Re: Degraded performance between Solr 4 and Solr 5

2016-04-27 Thread Jaroslaw Rozanski
Hi Eric,

Measuring running queries via JMeter. Values provided are rounded medians
of multiple samples. Medians are just slightly better than the 99th
percentile for all samples.

Filter cache is useless as you mentioned; they are effectively not used.
There is auto-warming through cache autoWarm but no auto-warming
queries. 

Small experiment with passing =... seems not to make any difference
which would not be surprising given caches are barely involved.

Thanks for the suggestion on IO. After stopping indexing, the response
time barely changed on Solr 5. On Solr 4, with indexing running it is
still fast. So effectively, Solr 4 under indexing load is faster than
idle Solr 5. Both set-ups have the same heap size and available RAM on
machine (2x heap).

One other thing I am testing is issuing request to specific core, with
distrib=false. No significant improvements there.

Now what is interesting is that the aforementioned query takes the same
amount of time to execute regardless of the number of documents found.
- Whether it is 0 or 10k, it takes couple seconds on Solr 5.5.0.
- Meanwhile, on Solr 4.10.3, the response time is dependent on results
size. For Solr 4 no results returns in few ms and few seconds for couple
thousands of results. 
(query used q={!cache=false}...)
  

Thanks,
Jarek

On Wed, 27 Apr 2016, at 04:39, Erick Erickson wrote:
> Well, the first question is always "how are you measuring this"?
> Measuring a few queries is almost completely uninformative,
> especially if the two systems have differing warmups. The only
> meaningful measurements are when throwing away the first bunch
> of queries then measuring a meaningful sample.
> 
> The setup you describe will be very sensitive to disk access
> with the autowarm of 1 second, so if there's much at all in
> the way of differences in I/O that would be a red flag.
> 
> From here on down doesn't really respond to the question, but
> I thought I'd mention it.
> 
> And you don't have to worry about disabling your fitlerCache since
> any filter query of the form fq=field:[mention NOW in here without
> rounding]
> never be re-used. So you might as well use {!cache=false}. Here's the
> background:
> 
> https://lucidworks.com/blog/2012/02/23/date-math-now-and-filter-queries/
> 
> And your soft commit is probably throwing out all the filter caches
> anyway.
> 
> I doubt you're doing any autowarming at all given the autocommit interval
> of 1 second and continuously updating documents and your reported
> query times. So you can pretty much forget what I said about throwing
> out your first N queries since you're (probably) not getting any benefit
> out of caches anyway.
> 
> On Tue, Apr 26, 2016 at 10:34 AM, Jaroslaw Rozanski
> <s...@jarekrozanski.com> wrote:
> > Hi all,
> >
> > I am migrating a large Solr Cloud cluster from Solr 4.10 to Solr 5.5.0
> > and I observed big difference in query execution time.
> >
> > First a setup summary:
> > - multiple collections - 6
> > - each has multiple shards - 6
> > - same/similar hardware
> > - indexing tens of messages per second
> > - autoSoftCommit with 1s; hard commit few tens of seconds
> > - Java 8
> >
> > The query has following form: field1:[* TO NOW-14DAYS] OR (-field1:[* TO
> > *] AND field2:[* TO NOW-14DAYS])
> >
> > The fields field1 & field2 are of date type:
> > <fieldType name="tdate" class="solr.TrieDateField" ... positionIncrementGap="0"/>
> >
> > As query (q={!cache=false}...)
> > Solr 4.10 -> 5s
> > Solr 5.5.0 -> 12s
> >
> > As filter query (q={!cache=false}*:*&fq=...)
> > Solr 4.10 -> 9s
> > Solr 5.5.0 -> 11s
> >
> > The query itself is bad and its optimization aside, I am wondering if
> > there is anything in Lucene/Solr that would have such an impact on query
> > execution time between versions.
> >
> > Originally I thought it might be related to
> > https://issues.apache.org/jira/browse/SOLR-8251 and testing on small
> > scale proved that there is a difference in performance. However upgraded
> > version is already 5.5.0.
> >
> >
> >
> > Thanks,
> > Jarek
> >


Degraded performance between Solr 4 and Solr 5

2016-04-26 Thread Jaroslaw Rozanski
Hi all,
 
I am migrating a large Solr Cloud cluster from Solr 4.10 to Solr 5.5.0
and I observed a big difference in query execution time.
 
First a setup summary:
- multiple collections - 6
- each has multiple shards - 6
- same/similar hardware
- indexing tens of messages per second
- autoSoftCommit with 1s; hard commit few tens of seconds
- Java 8
 
The query has following form: field1:[* TO NOW-14DAYS] OR (-field1:[* TO
*] AND field2:[* TO NOW-14DAYS])
 
The fields field1 & field2 are of date type:
<fieldType name="tdate" class="solr.TrieDateField" ... positionIncrementGap="0"/>
 
As query (q={!cache=false}...)
Solr 4.10 -> 5s
Solr 5.5.0 -> 12s
 
As filter query (q={!cache=false}*:*&fq=...)
Solr 4.10 -> 9s
Solr 5.5.0 -> 11s
 
The query itself is bad and, its optimization aside, I am wondering if
there is anything in Lucene/Solr that would have such an impact on query
execution time between versions.
 
Originally I thought it might be related to
https://issues.apache.org/jira/browse/SOLR-8251 and testing on a small
scale proved that there is a difference in performance. However the upgraded
version is already 5.5.0.
 
 
 
Thanks,
Jarek
 


Re: what is opening realtime Searcher

2016-04-19 Thread Jaroslaw Rozanski
Hi Erick,

Thanks for the info. I was under the impression that we have the extra setting
"openSearcher" to control when searchers are opened.

From what you are saying, a searcher can be opened not only as a result of
a hard or soft commit.

What I observe, to follow your example:
T0 - everything is committed 
T1 - index document
T2 - opens realtime searchers
(time passes)
T3 - soft commit (commitScheduler)
T4 - opens searcher
(time passes)
T5 - hard commit (commitScheduler, openSearcher=false)
(time passes)
T6 - soft commit (commitScheduler)
T7 - opens searcher  

The T2 in the above example is what is unexpected.

Having had a look at this thread
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3cd2f8751a-b16a-4736-9e03-50873711d...@gmail.com%3E
I was wondering if I had something misconfigured. 
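
(For context, the "realtime" searcher is what backs real-time get; a minimal
sketch of such a request, with assumed collection and id value:)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    CloudSolrClient client = new CloudSolrClient("zk1:2181/solr");
    SolrQuery rtg = new SolrQuery();
    rtg.setRequestHandler("/get"); // real-time get handler
    rtg.set("id", "doc1");
    client.query("col1", rtg);     // sees uncommitted updates via the realtime searcher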


Thanks,
Jarek

On Tue, 19 Apr 2016, at 01:02, Erick Erickson wrote:
> This is about real-time get. The idea is this. Suppose
> you have a doc doc1 already in your index at time T1
> and update it at time T2 and your soft commit happens
> at time T3.
> 
> If a search happens between time T1 and T2
> but the fetch happens between T2 and T3, you get
> back the updated document, not the doc that was in
> the index. So the realtime get is outside the
> soft and hard commit issues.
> 
> It's a pretty lightweight operation, no caches are invalidated
> or warmed etc.
> 
> Best,
> Erick
> 
> On Mon, Apr 18, 2016 at 9:59 AM, Jaroslaw Rozanski
> <s...@jarekrozanski.com> wrote:
> > Hi,
> >
> >  What exactly triggers opening new "realtime" searcher?
> >
> > 2016-04-18_16:28:02.33289 INFO  (qtp1038620625-13) [c:col1 s:shard1 
> > r:core_node3 x:col1_shard1_replica3] o.a.s.s.SolrIndexSearcher Opening 
> > Searcher@752e986f[col1_shard1_replica3] realtime
> >
> > I am seeing above being triggered when adding documents to index. The
> > frequency (from few milliseconds to few seconds) does not correlate with
> > maxTime of either autoCommit or autoSoftCommit (which are fixed to tens
> > of seconds).
> >
> > Client never sends commit message explicitly (and there is
> > IgnoreCommitOptimizeUpdateProcessorFactory in processor chain).
> >
> > Re: Solr 5.5.0
> >
> >
> >
> > Thanks,
> > Jarek
> >


what is opening realtime Searcher

2016-04-18 Thread Jaroslaw Rozanski
Hi,

 What exactly triggers opening a new "realtime" searcher?

2016-04-18_16:28:02.33289 INFO  (qtp1038620625-13) [c:col1 s:shard1 
r:core_node3 x:col1_shard1_replica3] o.a.s.s.SolrIndexSearcher Opening 
Searcher@752e986f[col1_shard1_replica3] realtime

I am seeing the above being triggered when adding documents to the index. The
frequency (from a few milliseconds to a few seconds) does not correlate with
the maxTime of either autoCommit or autoSoftCommit (which are fixed at tens
of seconds).
 
The client never sends a commit message explicitly (and there is
IgnoreCommitOptimizeUpdateProcessorFactory in the processor chain).
 
Re: Solr 5.5.0
 
 
 
Thanks,
Jarek
 


Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread Jaroslaw Rozanski
Hi,

How are you executing searches?

I am asking because if you search using a Solr client, for example SolrJ
(i.e. creating an instance of CloudSolrClient), and not directly via the HTTP
endpoint, it will provide load balancing (last time I checked it picks a
random non-stale node).
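
(A minimal sketch, with an assumed ZK ensemble address:)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr");
    client.setDefaultCollection("collection1");
    client.query(new SolrQuery("*:*")); // the client picks a live node itself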


Thanks,
Jarek

On Mon, 18 Apr 2016, at 05:58, John Bickerstaff wrote:
> Thanks, so on the matter of indexing -- while I could isolate a cloud
> replica from queries by not including it in the load balancer's list...
> 
> ... I cannot isolate any of the replicas from an indexing perspective by
> a
> similar strategy because the SOLR leader decides who does indexing?  Or
> do
> all "nodes" index the same incoming document independently?
> 
> Now that I know I still need a load balancer, I guess I'm trying to find
> a
> way to keep indexing load off servers that are busy serving search
> results...  Possibly by having one or two servers just handle indexing...
> 
> Perhaps I'm looking in the wrong direction though -- and should just spin
> up more replicas to handle more indexing load?
> On Apr 17, 2016 10:46 PM, "Walter Underwood" 
> wrote:
> 
> No, Zookeeper is used for managing the locations of replicas and the
> leader
> for indexing. Queries should still be distributed with a load balancer.
> 
> Queries do NOT go through Zookeeper.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
> > On Apr 17, 2016, at 9:35 PM, John Bickerstaff 
> wrote:
> >
> > My prior use of SOLR in production was pre SOLR cloud.  We put a
> > round-robin  load balancer in front of replicas for searching.
> >
> > Do I understand correctly that a load balancer is unnecessary with SOLR
> > Cloud?  I. E. -- SOLR and Zookeeper will balance the load, regardless of
> > which replica's URL is getting hit?
> >
> > Are there any caveats?
> >
> > Thanks,


Re: Adding replica on solr - 5.5.0

2016-04-15 Thread Jaroslaw Rozanski
Hi,

Does the `node=...` param actually work for you? When attempting something
similar with Solr 5.3.1, despite what the documentation said, I had to use
`node_name=...`.
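
For reference, the documented form of the call is roughly as follows (host and
collection values assumed):

http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1&node=x.x.x.x:9001_solr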


Thanks,
Jarek

On Fri, 15 Apr 2016, at 05:48, John Bickerstaff wrote:
> Another thought - again probably not it, but just in case...
> 
> Shouldn't this: node=x.x.x.x:9001_solr
> 
> 
> Actually be this?  node=x.x.x.x:9001/solr
> 
> 
> (Note the / instead of _ )
> 
> On Thu, Apr 14, 2016 at 10:45 PM, John Bickerstaff
>  > wrote:
> 
> > Jay - it's probably too simple, but the error says "not currently active"
> > which could, of course, mean that although it's up and running, it's not
> > listening on the port you have in the command line...  Or that the port is
> > blocked by a firewall or other network problem.
> >
> > I note that you're using ports different from the default 8983 for your
> > Solr instances...
> >
> > You probably checked already, but I thought I'd mention it.
> >
> >
> > On Thu, Apr 14, 2016 at 8:30 PM, John Bickerstaff <
> > j...@johnbickerstaff.com> wrote:
> >
> >> Thanks Eric!
> >>
> >> I'll look into that immediately - yes, I think that cURL would qualify as
> >> scriptable for my IT lead.
> >>
> >> In the end, I found I could do it two ways...
> >>
> >> Either copy the entire solr data directory over to /var/solr/data on the
> >> new machine, change the directory name and the entries in the
> >> core.properties file, then start the already-installed Solr in cloud mode -
> >> everything came up roses in the cloud section of the UI - the new replica
> >> was there as part of the collection, properly named and worked fine.
> >>
> >> Alternatively, I used the command I mentioned earlier and then waited as
> >> the data was replicated over to the newly-created replica -- again,
> >> everything was roses in the Cloud section of the Admin UI...
> >>
> >> What might I have messed up in this scenario?  I didn't love the hackish
> >> feeling either, but had been unable to find anything like the addreplica -
> >> although I did look for a fairly long time - I'm glad to know about it now.
> >>
> >>
> >>
> >> On Thu, Apr 14, 2016 at 7:36 PM, Erick Erickson 
> >> wrote:
> >>
> >>> bq:  the Solr site about how to add a
> >>> replica to a Solr cloud.  The Admin UI appears to require that the
> >>> directories be created anyway
> >>>
> >>> No, no, a thousand times NO! You're getting confused,
> >>> I think, with the difference between _cores_ and _collections_
> >>> (or replicas in a collection).
> >>>
> >>> Do not use the admin UI for _cores_ to create replicas. It's possible
> >>> if (and only if) you do it exactly correctly. Instead, use the
> >>> collections API
> >>> ADDREPLICA command here:
> >>>
> >>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
> >>>
> >>> Which you could cURL etc., does that qualify as "scripting" in your
> >>> situation?
> >>>
> >>> You're right, the Solr instance must be up and running for the replica to
> >>> be added, but that's not onerous
> >>>
> >>>
> >>> The bin/solr script is a "work in progress", and doesn't have direct
> >>> support
> >>> for "addreplica", but it could be added.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Thu, Apr 14, 2016 at 6:22 PM, John Bickerstaff
> >>>  wrote:
> >>> > Sure - couldn't agree more.
> >>> >
> >>> > I couldn't find any good documentation on the Solr site about how to
> >>> add a
> >>> > replica to a Solr cloud.  The Admin UI appears to require that the
> >>> > directories be created anyway.
> >>> >
> >>> > There is probably a way to do it through the UI, once Solr is
> >>> installed on
> >>> > a new machine - and IIRC, I did manage that, but my IT guy wanted
> >>> > scriptable command lines.
> >>> >
> >>> > Also, IIRC, the stuff I did on the command line actually showed the
> >>> API URL
> >>> > as part of the output so Jay could try that and see what the difference
> >>> > is...
> >>> >
> >>> > Jay - I'm going offline now, but if you're still stuck tomorrow, I'll
> >>> try
> >>> > to recreate... I have a VM snapshot just before I issued the command...
> >>> >
> >>> > Keep in mind everything I did was in a Solr Cloud...
> >>> >
> >>> > On Thu, Apr 14, 2016 at 6:21 PM, Jeff Wartes 
> >>> wrote:
> >>> >
> >>> >> I’m all for finding another way to make something work, but I feel
> >>> like
> >>> >> this is the wrong advice.
> >>> >>
> >>> >> There are two options:
> >>> >> 1) You are doing something wrong. In which case, you should probably
> >>> >> invest in figuring out what.
> >>> >> 2) Solr is doing something wrong. In which case, you should probably
> >>> >> invest in figuring out what, and then file a bug so it doesn’t happen
> >>> to
> >>> >> anyone