Is there a way to autowarm a new searcher using recently run queries

2021-01-27 Thread Pushkar Raste
Hi,

A rookie question. We have a Solr cluster that doesn't get much
traffic. We see that our queries take a long time unless we run a script to
send more traffic to Solr.

We are indexing data all the time and use autoCommit.

I am wondering if there is a way to warm up the new searcher on commit by
rerunning queries processed by the previous searcher. Maybe this happens by
default, but then I can't understand why we see high query times if the
searchers are being warmed.
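
(For reference, a minimal sketch of the standard solrconfig.xml warming knobs that address this; cache class, sizes and the sample query are illustrative, not from this thread. autowarmCount replays the most recent entries of the old searcher's caches against the new searcher, and a newSearcher listener fires explicit queries on every commit that opens a searcher.)

    <query>
      <!-- replay the most recently used entries from the previous searcher's caches
           (use solr.LRUCache / solr.FastLRUCache on versions before 8.3) -->
      <filterCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="128"/>
      <queryResultCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="128"/>

      <!-- explicit warm-up queries run every time a new searcher is opened -->
      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst><str name="q">some popular query</str><str name="rows">10</str></lst>
        </arr>
      </listener>

      <!-- block requests until the first searcher is warmed -->
      <useColdSearcher>false</useColdSearcher>
    </query>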


Add custom comparator for field(s) for Sorting.

2020-08-07 Thread Pushkar Raste
Hi,
Is it possible to add a custom comparator to a field for sorting? E.g.
let's say I have a field 'name' and the following documents:

{
  id : "doc1",
  name : "1"
}


{
  id : "doc2",
  name : "S1"
}


{
  id : "doc2",
  name : "S2"
}

If I sort using the field 'name', the order would be ["doc1", "doc2", "doc3"],
but I want pure numbers to sort last, i.e. the order ["doc2", "doc3",
"doc1"]. Is there a way I can provide my own comparator?


Is it possible to direct queries to replicas in SolrCloud

2020-05-21 Thread Pushkar Raste
Hi,
In master/slave we can send queries to the slaves only. Now that we have TLOG
and PULL replicas, can we send queries to those replicas to achieve scaling
similar to master/slave for large search volumes?


-- 
— Pushkar Raste
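
(For reference: recent SolrCloud versions, 7.4 and later if memory serves, expose this routing via the shards.preference request parameter; host and collection names below are illustrative.)

    # prefer PULL replicas, then TLOG, leaving the NRT leader free to index
    http://host:8983/solr/mycollection/select?q=*:*&shards.preference=replica.type:PULL,replica.type:TLOG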


Re: Does Solr master/slave support shard split

2020-05-21 Thread Pushkar Raste
Thanks Erick. Moving to SolrCloud for splitting is what I imagined too.

On Thu, May 21, 2020 at 1:28 PM Erick Erickson 
wrote:

> In a word, “no”. It’s a whole ’nother architecture to deal
> with shards, and stand-alone (i.e. master/slave) has no
> concept of that.
>
> You could make a single-shard collection in SolrCloud,
> copy the index to the right place (I’d shut down Solr while
> I copied it), and then use SPLITSHARD on it, but that implies
> you’d be going to SolrCloud.
>
> Best,
> Erick
>
> > On May 21, 2020, at 10:35 AM, Pushkar Raste 
> wrote:
> >
> > Hi,
> > Does Solr support shard split in the master/slave setup? I understand
> > that there is no shard concept in master/slave and we just have cores,
> > but can we split a core into two?
> >
> > If yes, is there a way to specify the new mapping based on the unique key?
> > --
> > — Pushkar Raste
>
> --
— Pushkar Raste
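
(The Collections API call Erick refers to looks like the following; it only exists in SolrCloud, and the collection, shard and async request id are illustrative.)

    http://host:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1&async=split1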


Does Solr master/slave support shard split

2020-05-21 Thread Pushkar Raste
Hi,
Does Solr support shard split in the master/slave setup? I understand that
there is no shard concept in master/slave and we just have cores, but can we
split a core into two?

If yes, is there a way to specify the new mapping based on the unique key?
-- 
— Pushkar Raste


Re: Does Solr replicate data securely

2019-11-13 Thread Pushkar Raste
Hi,
Can someone help me with my question?

On Tue, Nov 12, 2019 at 10:20 AM Pushkar Raste 
wrote:

> Hi,
> How about in the master/slave setup? If I enable SSL in a master/slave
> setup, would the segment and config files be copied using TLS?
>
> On Sat, Nov 9, 2019 at 3:31 PM Jan Høydahl  wrote:
>
>> You choose. If you use SolrCloud and have enabled SSL in your cluster,
>> then all requests, including replication, will be secure (HTTPS). It is
>> still TCP but using TLS :)
>>
>> Jan Høydahl
>>
>> > 6. nov. 2019 kl. 00:03 skrev Pushkar Raste :
>> >
>> > Hi,
>> > When slaves/pull replicas copy index files from the master, is it done
>> > using a secure protocol or just over plain TCP?
>> > --
>> > — Pushkar Raste
>>
> --
— Pushkar Raste
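
(A sketch of what enabling TLS looks like on the Solr side; since the /replication handler is just another HTTP endpoint, index and config files are then fetched over HTTPS. Variable names follow the stock solr.in.sh of that era; paths and passwords are placeholders, and the slave's masterUrl must use https:// as well.)

    SOLR_SSL_KEY_STORE=/etc/solr/solr-ssl.keystore.p12
    SOLR_SSL_KEY_STORE_PASSWORD=secret
    SOLR_SSL_TRUST_STORE=/etc/solr/solr-ssl.keystore.p12
    SOLR_SSL_TRUST_STORE_PASSWORD=secret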


Re: Does Solr replicate data securely

2019-11-12 Thread Pushkar Raste
Hi,
How about in the master/slave setup? If I enable SSL in a master/slave setup,
would the segment and config files be copied using TLS?

On Sat, Nov 9, 2019 at 3:31 PM Jan Høydahl  wrote:

> You choose. If you use SolrCloud and have enabled SSL in your cluster,
> then all requests, including replication, will be secure (HTTPS). It is
> still TCP but using TLS :)
>
> Jan Høydahl
>
> > 6. nov. 2019 kl. 00:03 skrev Pushkar Raste :
> >
> > Hi,
> > When slaves/pull replicas copy index files from the master, is it done
> > using a secure protocol or just over plain TCP?
> > --
> > — Pushkar Raste
>


Does Solr replicate data securely

2019-11-05 Thread Pushkar Raste
Hi,
When slaves/pull replicas copy index files from the master, is it done using
a secure protocol or just over plain TCP?
-- 
— Pushkar Raste


Re: What are the risks of running into "Unmap hack not supported on this platform"

2019-08-28 Thread Pushkar Raste
I understand that the problem will not be fixed. What I am trying to
understand is that even with the exception (the only exception I saw after
running my Solr 4 cluster on JDK11 for 4 weeks), I am able to index and query
documents just fine.

What does this exception really affect?

On Wed, Aug 28, 2019 at 3:08 PM Shawn Heisey  wrote:

> On 8/27/2019 8:22 AM, Pushkar Raste wrote:
> > I am trying to run Solr 4 on JDK11. Although this version is not
> > supported on JDK11, it seems to be working fine except for the
> > error/exception "Unmap hack not supported on this platform".
> > What are the risks/downsides of running into this?
>
> The first version of Solr that was qualified with Java 9 was Solr 7.0.0.
>   New Java versions did not work properly with older versions of Solr.
> Java 8 is as high as you can go with Solr 4.
>
> Solr versions up through 4.7.x have a minimum Java version requirement
> of Java 6.  From 4.8.0 through 5.x, Java 7 is required as a minimum.
> Starting with Solr 6.0.0, the minimum requirement moved to Java 8.  When
> Solr 9.0.0 is released, its minimum requirement will be Java 11.
>
> Right now, with Solr 8.x being the current version, Solr 7.x is only
> going to get major bugfixes, and there will be no updates at all to
> version 6.x and older.  The problem you're running into with Solr 4 on
> Java 11 will not be fixed.  If you want to run Java 11, you will need to
> upgrade to the latest Solr 7.x or 8.x.  Early 7.x versions would not
> work with Java 10 or later.
>
> Thanks,
> Shawn
>
-- 
— Pushkar Raste


Re: What are the risks of running into "Unmap hack not supported on this platform"

2019-08-28 Thread Pushkar Raste
Can someone help me with this?

On Tue, Aug 27, 2019 at 10:22 AM Pushkar Raste 
wrote:

> Hi,
> I am trying to run Solr 4 on JDK11. Although this version is not supported
> on JDK11, it seems to be working fine except for the error/exception "Unmap
> hack not supported on this platform".
> What are the risks/downsides of running into this?
>
-- 
— Pushkar Raste


What are the risks of running into "Unmap hack not supported on this platform"

2019-08-27 Thread Pushkar Raste
Hi,
I am trying to run Solr 4 on JDK11. Although this version is not supported
on JDK11, it seems to be working fine except for the error/exception "Unmap
hack not supported on this platform".
What are the risks/downsides of running into this?


Re: Is it possible to reconstruct non-stored fields and turn those into stored fields

2019-05-22 Thread Pushkar Raste
We have only a handful of fields that are stored and many (non-Text) fields
which are neither stored nor have docValues :-(

Looks like giving Luke a shot is the answer. Can you point me to an example
of extracting the fields from the inverted index using Luke?

On Wed, May 22, 2019 at 11:52 AM Erick Erickson 
wrote:

> Well, if they’re all docValues or stored=true, sure. It’d be kind of
> slow.. The short form is “if you can specify fl=f1,f2,f3…. for all your
> fields and see all your values, then it’s easy if slow”.
>
> If that works _and_ you are on Solr 4.7+ cursorMark will help the “deep
> paging” issue.
>
> If they’re all docValues, you could use the /export handler to dump them
> all to a file and re-index that.
>
> If none of those are possible, you can do this but it’d be quite painful.
> Luke can reassemble a document (lossily for text fields, but in this case
> it’d be OK since they’re simple types) by examining the inverted index and
> pulling out the values. Painfully slow and you’d have to write custom code
> probably at the Lucene level to make it all work.
>
> Best,
> Erick
>
> > On May 22, 2019, at 8:11 AM, Pushkar Raste 
> wrote:
> >
> > I know this is a long shot. I am trying to move from Solr 4 to Solr 7.
> > Reindexing all the data from the source is difficult to do in a
> reasonable
> > time. All the fields are of basic types like int, long, float, double,
> > Boolean, date,  string.
> >
> > Since these fields don’t have analyzers, I was wondering if these fields
> > can be retrieved while iterating over index while reading the documents.
> > --
> > — Pushkar Raste
>
> --
— Pushkar Raste
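
(A sketch of the "easy if slow" path Erick describes, for the fields that are stored or have docValues: page through the old index with cursorMark and re-feed the new cluster. Host, core and field names are illustrative.)

    # first request
    .../solr/oldcore/select?q=*:*&fl=id,price,created&sort=id+asc&rows=1000&cursorMark=*&wt=json
    # follow-up requests: pass back the nextCursorMark value from the previous response
    .../solr/oldcore/select?q=*:*&fl=id,price,created&sort=id+asc&rows=1000&cursorMark=AoE...&wt=json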


Is it possible to reconstruct non-stored fields and turn those into stored fields

2019-05-22 Thread Pushkar Raste
I know this is a long shot. I am trying to move from Solr 4 to Solr 7.
Reindexing all the data from the source is difficult to do in a reasonable
time. All the fields are of basic types like int, long, float, double,
boolean, date, and string.

Since these fields don’t have analyzers, I was wondering if these fields
can be retrieved by iterating over the index and reading the documents.
-- 
— Pushkar Raste


How to stop a new slave from serving requests until it has replicated the index for the first time.

2019-02-06 Thread Pushkar Raste
Hi,
In the master/slave setup, as soon as I start a new slave it starts to
serve requests. Often the searches return no documents because the index
has not been replicated yet. Is there a way to stop the replica from
serving requests (marking the node unhealthy) until the index has been
replicated for the first time?
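
(One common approach, sketched here: back the ping handler with a health-check file and only enable it once replication has caught up, so the load balancer keeps the node out of rotation until then. The handler name and file name are the stock ones, but treat the wiring as an assumption about your setup.)

    <!-- solrconfig.xml -->
    <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
      <str name="healthcheckFile">server-enabled.txt</str>
    </requestHandler>

    # once the slave reports it is in sync, flip the health check on:
    curl "http://slave:8983/solr/core1/admin/ping?action=enable"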


Re: API to convert a SolrInputDocument to JSON

2019-01-24 Thread Pushkar Raste
Maybe my question wasn’t clear. By issues I meant: will the SolrJ client for
7.x work to index documents into Solr 4.10, or vice versa?

I am OK with using HttpSolrClient.

On Wed, Jan 23, 2019 at 9:33 PM Erick Erickson 
wrote:

> Walter:
>
> Don't know if it helps, but have you looked at:
> https://issues.apache.org/jira/browse/SOLR-445
>
> I have _not_ worked with this personally in prod SolrCloud systems, so
> I can't say much more
> than it exists. It's only available in Solr 6.1+
>
> Best,
> Erick
>
> On Wed, Jan 23, 2019 at 5:55 PM Pushkar Raste 
> wrote:
> >
> > You mean I can use SolrJ 7.x for indexing documents to both Solr 4 and
> > Solr 7, as well as the SolrInputDocument class from SolrJ 7.x?
> >
> > Wouldn’t there be issues if there are any backwards-incompatible changes?
> >
> > On Wed, Jan 23, 2019 at 8:09 PM Shawn Heisey 
> wrote:
> >
> > > On 1/23/2019 5:49 PM, Pushkar Raste wrote:
> > > > Thanks for the quick response Shawn. It is a migration from Solr 4.10
> > > > master/slave to SolrCloud 7.x.
> > >
> > > In that case, use SolrJ 7.x, with CloudSolrClient to talk to the new
> > > version and HttpSolrClient to talk to the old version. Use the same
> > > SolrInputDocument objects for both.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
>
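
(A sketch of Shawn's suggestion in SolrJ 7.x terms: keep one SolrInputDocument and hand it to two clients rather than converting it to JSON yourself. Host names, the ZooKeeper address and the core/collection names are placeholders.)

    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class DualWriter {
      public static void main(String[] args) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc1");
        doc.addField("name", "example");

        try (HttpSolrClient oldCluster =
                 new HttpSolrClient.Builder("http://old-master:8983/solr/core1").build();
             CloudSolrClient newCluster =
                 new CloudSolrClient.Builder(Collections.singletonList("zk1:2181"),
                                             Optional.empty()).build()) {
          oldCluster.add(doc);                 // Solr 4.10 master, plain HTTP
          newCluster.add("collection1", doc);  // SolrCloud 7.x via ZooKeeper
          oldCluster.commit();
          newCluster.commit("collection1");
        }
      }
    }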


Re: API to convert a SolrInputDocument to JSON

2019-01-23 Thread Pushkar Raste
You mean I can use SolrJ 7.x for indexing documents to both Solr 4 and
Solr 7, as well as the SolrInputDocument class from SolrJ 7.x?

Wouldn’t there be issues if there are any backwards-incompatible changes?

On Wed, Jan 23, 2019 at 8:09 PM Shawn Heisey  wrote:

> On 1/23/2019 5:49 PM, Pushkar Raste wrote:
> > Thanks for the quick response Shawn. It is a migration from Solr 4.10
> > master/slave to SolrCloud 7.x.
>
> In that case, use SolrJ 7.x, with CloudSolrClient to talk to the new
> version and HttpSolrClient to talk to the old version. Use the same
> SolrInputDocument objects for both.
>
> Thanks,
> Shawn
>
>


Re: API to convert a SolrInputDocument to JSON

2019-01-23 Thread Pushkar Raste
Thanks for the quick response Shawn. It is a migration from Solr 4.10
master/slave to SolrCloud 7.x.

On Wed, Jan 23, 2019 at 7:41 PM Shawn Heisey  wrote:

> On 1/23/2019 5:05 PM, Pushkar Raste wrote:
> > We are setting up a cluster with the new Solr version and are going to
> > reindex data. However, until all the data is indexed, I need to keep
> > indexing data in the old cluster as well. We are currently using the
> > SolrJ client and constructing SolrInputDocument objects to index data.
> >
> > To avoid conflicts between the old and new jars, I am planning to use
> > HttpClient and a JSON payload. Is there a convenient API to convert a
> > SolrInputDocument to JSON?
>
> First question: Is this SolrCloud?  If so, are both versions running
> SolrCloud?
>
> Second question: What are the old and new Solr versions on the server side?
>
> The answers to those questions will determine where I go with further
> questions and assistance.
>
> Thanks,
> Shawn
>
>


API to convert a SolrInputDocument to JSON

2019-01-23 Thread Pushkar Raste
Hi,
We are setting up a cluster with the new Solr version and are going to reindex
data. However, until all the data is indexed, I need to keep indexing data in
the old cluster as well. We are currently using the SolrJ client and
constructing SolrInputDocument objects to index data.

To avoid conflicts between the old and new jars, I am planning to use
HttpClient and a JSON payload. Is there a convenient API to convert a
SolrInputDocument to JSON?


Re: Can Solr 4.10 work with JDK11

2019-01-15 Thread Pushkar Raste
Or let me rephrase the question. What is the minimum Solr version that is
JDK11 compatible?

On Tue, Jan 15, 2019 at 10:27 AM Pushkar Raste 
wrote:

> I probably already know the answer for this but was still wondering.
>


Can Solr 4.10 work with JDK11

2019-01-15 Thread Pushkar Raste
I probably already know the answer for this but was still wondering.


Questions about the IndexUpgrader tool.

2018-12-17 Thread Pushkar Raste
Hi,
I have questions about the IndexUpgrader tool.

- I want to upgrade from Solr 4 to Solr 7. Can I upgrade the index from
4 to 5, then 5 to 6, and finally 6 to 7 using the appropriate version of the
IndexUpgrader, but without loading the index into Solr at all between the
successive upgrades?

- The note in the tool says "This tool only keeps last commit in an index".
Does this mean I have to optimize the index before running the tool?

- There is another note about a partially upgraded index. How can an index
be partially upgraded? One scenario I can think of is: if I upgraded, let's
say, from Solr 5 to Solr 6 and then added some documents, the new documents
will already be in Lucene 6 format, while old documents will still be in
Lucene 5 format. Is my understanding correct?
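
(For reference, a typical invocation of one upgrade step; paths and jar versions are illustrative, and each step must use the jars of the Lucene version you are upgrading to. The tool rewrites the index in place and, per the note quoted above, keeps only the last commit.)

    java -cp lucene-core-5.5.5.jar:lucene-backward-codecs-5.5.5.jar \
         org.apache.lucene.index.IndexUpgrader -delete-prior-commits -verbose \
         /var/solr/data/mycore/data/index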


Re: NullPointerException in PeerSync.handleUpdates

2017-11-22 Thread Pushkar Raste
As mentioned in the JIRA, the exception seems to be coming from a log
statement. The issue was fixed in 6.3; here is the relevant line from 6.3:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.3.0/solr/core/src/java/org/apache/solr/update/PeerSync.java#L707



On Wed, Nov 22, 2017 at 1:18 AM, Erick Erickson 
wrote:

> Right, if there's no "fixed version" mentioned and if the resolution
> is "unresolved", it's not in the code base at all. But that JIRA is
> not apparently reproducible, especially on more recent versions that
> 6.2. Is it possible to test a more recent version (6.6.2 would be my
> recommendation).
>
> Erick
>
> On Tue, Nov 21, 2017 at 9:58 PM, S G  wrote:
> > My bad. I found it at https://issues.apache.org/jira/browse/SOLR-9453
> > But I could not find it in changes.txt perhaps because its yet not
> resolved.
> >
> > On Tue, Nov 21, 2017 at 9:15 AM, Erick Erickson  >
> > wrote:
> >
> >> Did you check the JIRA list? Or CHANGES.txt in more recent versions?
> >>
> >> On Tue, Nov 21, 2017 at 1:13 AM, S G  wrote:
> >> > Hi,
> >> >
> >> > We are running 6.2 version of Solr and hitting this error frequently.
> >> >
> >> > Error while trying to recover. core=my_core:java.lang.
> >> NullPointerException
> >> > at org.apache.solr.update.PeerSync.handleUpdates(
> >> PeerSync.java:605)
> >> > at org.apache.solr.update.PeerSync.handleResponse(
> >> PeerSync.java:344)
> >> > at org.apache.solr.update.PeerSync.sync(PeerSync.java:257)
> >> > at org.apache.solr.cloud.RecoveryStrategy.doRecovery(
> >> RecoveryStrategy.java:376)
> >> > at org.apache.solr.cloud.RecoveryStrategy.run(
> >> RecoveryStrategy.java:221)
> >> > at java.util.concurrent.Executors$RunnableAdapter.
> >> call(Executors.java:511)
> >> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >> > at org.apache.solr.common.util.ExecutorUtil$
> >> MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> >> > at java.util.concurrent.ThreadPoolExecutor.runWorker(
> >> ThreadPoolExecutor.java:1142)
> >> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> >> ThreadPoolExecutor.java:617)
> >> > at java.lang.Thread.run(Thread.java:745)
> >> >
> >> >
> >> >
> >> > Is this a known issue and fixed in some newer version?
> >> >
> >> >
> >> > Thanks
> >> > SG
> >>
>


Re: Spread SolrCloud across two locations

2017-05-26 Thread Pushkar Raste
In my setup, once DC1 comes back up, make sure you start only two nodes.

Now bring down the original observer and make it an observer again.

Then bring back the third node.

It seems like a lot of work compared to Jan's setup, but you get 5 voting
members instead of 3 in the normal situation.

On May 26, 2017 10:56 AM, "Susheel Kumar" <susheel2...@gmail.com> wrote:

> Thanks, Pushkar, Make sense.  Trying to understand the difference between
> your setup vs Jan's proposed setup.
>
> - Seems like when DC1 goes down, in your setup we have to bounce *one* from
> observer to non-observer while in Jan's setup *two* observers to
> non-observers.  Anything else I am missing
>
> - When DC1 comes back -  with your setup we need to bounce the one
> non-observer to observer to have 5 nodes quorum otherwise there are 3 + 3
> observers while with Jan's setup if we don't take any action when DC1 comes
> back, we are still operational with 5 nodes quorum.  Isn't it?  Or I am
> missing something.
>
>
>
> On Fri, May 26, 2017 at 10:07 AM, Pushkar Raste <pushkar.ra...@gmail.com>
> wrote:
>
> > Damn,
> > Math is hard
> >
> > DC1 : 3 non observers
> > DC2 : 2 non observers
> >
> > 3 + 2 = 5 non observers
> >
> > Observers don't participate in voting = non observers participate in
> voting
> >
> > 5 non observers = 5 votes
> >
> > In addition to the 2 non observer, DC2 also has an observer, which as you
> > pointed out does not participate in the voting.
> >
> > We still have 5 voting nodes.
> >
> >
> > Think of the observer as a standby name node in Hadoop 1.x, where some
> > intervention needed if the primary name node went down.
> >
> >
> > I hope my math makes sense
> >
> > On May 26, 2017 9:04 AM, "Susheel Kumar" <susheel2...@gmail.com> wrote:
> >
> > From ZK documentation, observers do not participate in vote,  so Pushkar,
> > when you said 5 nodes participate in voting, what exactly you mean?
> >
> > -- Observers are non-voting members of an ensemble which only hear the
> > results of votes, not the agreement protocol that leads up to them.
> >
> > Per ZK documentation, 3.4 includes observers,  does that mean Jan thought
> > experiment is practically possible, correct?
> >
> >
> > On Fri, May 26, 2017 at 3:53 AM, Rick Leir <rl...@leirtech.com> wrote:
> >
> > > Jan, Shawn, Susheel
> > >
> > > First steps first. First, let's do a fault-tolerant cluster, then
> maybe a
> > > _geographically_ fault-tolerant cluster.
> > >
> > > Add another server in either DC1 or DC2, in a separate rack, with
> > > independent power etc. As Shawn says below, install the third ZK there.
> > You
> > > would satisfy most of your requirements that way.
> > >
> > > cheers -- Rick
> > >
> > >
> > > On 2017-05-23 12:56 PM, Shawn Heisey wrote:
> > >
> > >> On 5/23/2017 10:12 AM, Susheel Kumar wrote:
> > >>
> > >>> Hi Jan, FYI - Since last year, I have been running a Solr 6.0 cluster
> > in
> > >>> one of lower env with 6 shards/replica in dc1 & 6 shard/replica in
> dc2
> > >>> (each shard replicated cross data center) with 3 ZK in dc1 and 2 ZK
> in
> > dc2.
> > >>> (I didn't have the availability of 3rd data center for ZK so went
> with
> > only
> > >>> 2 data center with above configuration) and so far no issues. Its
> been
> > >>> running fine, indexing, replicating data, serving queries etc. So in
> my
> > >>> test, setting up single cluster across two zones/data center works
> > without
> > >>> any issue when there is no or very minimal latency (in my case around
> > 30ms
> > >>> one way
> > >>>
> > >>
> > >> With that setup, if dc2 goes down, you're all good, but if dc1 goes
> > down,
> > >> you're not.
> > >>
> > >> There aren't enough ZK servers in dc2 to maintain quorum when dc1 is
> > >> unreachable, and SolrCloud is going to go read-only.  Queries would
> most
> > >> likely work, but you would not be able to change the indexes at all.
> > >>
> > >> ZooKeeper with N total servers requires int((N/2)+1) servers to be
> > >> operational to maintain quorum.  This means that with five total
> > servers,
> > >> three must be operational and able to talk to each other, or ZK cannot
> > >> guarantee that there is no split-brain, so quorum is lost.
> > >>
> > >> ZK in two data centers will never be fully fault-tolerant. There is no
> > >> combination of servers that will work properly.  You must have three
> > data
> > >> centers for a geographically fault-tolerant cluster.  Solr would be
> > >> optional in the third data center.  ZK must be installed in all three.
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
> > >>
> > >
> >
>


Re: Spread SolrCloud across two locations

2017-05-26 Thread Pushkar Raste
Damn,
Math is hard

DC1 : 3 non observers
DC2 : 2 non observers

3 + 2 = 5 non observers

Observers don't participate in voting = non observers participate in voting

5 non observers = 5 votes

In addition to the 2 non-observers, DC2 also has an observer, which as you
pointed out does not participate in the voting.

We still have 5 voting nodes.


Think of the observer as a standby name node in Hadoop 1.x, where some
intervention is needed if the primary name node goes down.


I hope my math makes sense

On May 26, 2017 9:04 AM, "Susheel Kumar"  wrote:

>From ZK documentation, observers do not participate in vote,  so Pushkar,
when you said 5 nodes participate in voting, what exactly you mean?

-- Observers are non-voting members of an ensemble which only hear the
results of votes, not the agreement protocol that leads up to them.

Per ZK documentation, 3.4 includes observers,  does that mean Jan thought
experiment is practically possible, correct?


On Fri, May 26, 2017 at 3:53 AM, Rick Leir  wrote:

> Jan, Shawn, Susheel
>
> First steps first. First, let's do a fault-tolerant cluster, then maybe a
> _geographically_ fault-tolerant cluster.
>
> Add another server in either DC1 or DC2, in a separate rack, with
> independent power etc. As Shawn says below, install the third ZK there.
You
> would satisfy most of your requirements that way.
>
> cheers -- Rick
>
>
> On 2017-05-23 12:56 PM, Shawn Heisey wrote:
>
>> On 5/23/2017 10:12 AM, Susheel Kumar wrote:
>>
>>> Hi Jan, FYI - Since last year, I have been running a Solr 6.0 cluster in
>>> one of lower env with 6 shards/replica in dc1 & 6 shard/replica in dc2
>>> (each shard replicated cross data center) with 3 ZK in dc1 and 2 ZK in
dc2.
>>> (I didn't have the availability of 3rd data center for ZK so went with
only
>>> 2 data center with above configuration) and so far no issues. Its been
>>> running fine, indexing, replicating data, serving queries etc. So in my
>>> test, setting up single cluster across two zones/data center works
without
>>> any issue when there is no or very minimal latency (in my case around
30ms
>>> one way
>>>
>>
>> With that setup, if dc2 goes down, you're all good, but if dc1 goes down,
>> you're not.
>>
>> There aren't enough ZK servers in dc2 to maintain quorum when dc1 is
>> unreachable, and SolrCloud is going to go read-only.  Queries would most
>> likely work, but you would not be able to change the indexes at all.
>>
>> ZooKeeper with N total servers requires int((N/2)+1) servers to be
>> operational to maintain quorum.  This means that with five total servers,
>> three must be operational and able to talk to each other, or ZK cannot
>> guarantee that there is no split-brain, so quorum is lost.
>>
>> ZK in two data centers will never be fully fault-tolerant. There is no
>> combination of servers that will work properly.  You must have three data
>> centers for a geographically fault-tolerant cluster.  Solr would be
>> optional in the third data center.  ZK must be installed in all three.
>>
>> Thanks,
>> Shawn
>>
>>
>


Re: Spread SolrCloud across two locations

2017-05-25 Thread Pushkar Raste
ZK 3.5 isn't officially released. It has been alpha/beta for years. I wouldn't
use it in production.

The setup I proposed:

DC1 : 3 nodes, all are non-observers.
DC2 : 3 nodes, 2 are non-observers and 1 is an observer.

This means only 5 nodes participate in voting and 3 nodes make a quorum. If
DC1 goes down, you will have to bounce the observer and bring it back as a
non-observer.
This will still cause some downtime, but it is easier to do than trying to
add another node when you are already in fire-fighting mode.
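
(For reference, a sketch of how this 3 + (2 + 1 observer) layout is expressed in ZooKeeper 3.4 configuration; host names are illustrative. The observer host sets peerType=observer, and every zoo.cfg flags it in the server list.)

    # zoo.cfg on zk6 (the DC2 observer) only:
    peerType=observer

    # server list, identical in every zoo.cfg:
    server.1=zk1-dc1:2888:3888
    server.2=zk2-dc1:2888:3888
    server.3=zk3-dc1:2888:3888
    server.4=zk4-dc2:2888:3888
    server.5=zk5-dc2:2888:3888
    server.6=zk6-dc2:2888:3888:observer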

On May 25, 2017 5:50 PM, "Jan Høydahl" <jan@cominvent.com> wrote:

Thanks for the tip Pushkar,

> A setup I have used in the past was to have an observer I  DC2. If DC1 one
I was not aware that ZK 3.4 supports observers, thought it was a 3.5
feature.
So do you setup followers only in DC1 (3x), and observers only in DC2 (3x)
and then point each Solr node to all 6 ZKs?
Then you could get away by flipping the DC2 observers to followers in a
rolling manner and avoid restarting Solr?

> When I tested this solr survived just fine, but it been a while.
Anirudha you say you tested this with 3.5. Does that mean that ZK3.5 works
with Solr?

> Whether ZK 3.5 is there or not, there is potential unknown behavior when
> dc1 comes back online, unless you can have dc1 personnel shut the
> servers down, or block communication between your servers in dc1 and dc2.
Shawn, you are right that in the sketched setup, if DC1 is allowed to come
back up
and join the cluster, then various Solr nodes will use a different ZK
connection string
and the DC1 ZKs won’t talk to all the DC2 ones.

But with the observer model, all nodes know about all ZKs all the time, and
the “odd number of ZKs”
requirement is only applying to the voting followers, see
https://zookeeper.apache.org/doc/trunk/zookeeperObservers.html <
https://zookeeper.apache.org/doc/trunk/zookeeperObservers.html>

Let’s do a thought experiment:
- Have 2 ZKs in DC1, both being followers, and 3 ZKs in DC2, one being
follower and the others observers
- All Solr nodes have a ZK_HOST=zk1dc1,zk2dc1,zk1dc2,zk2dc2,zk3dc2
connection string, i.e. five nodes where initially 3 form the quorum
- At loss of DC2 all is fine
- At loss of DC1, you reconfigure zk2dc2 and zk3dc2 from observers to
followers, meaning they now are in majority (3 of 5) and writes resume
- When DC1 comes back up, zk1dc1 and zk2dc1 are now not a majority and need
to sync up from the others
- After all is stable again, you flip two DC2 ZKs back to observers

I may have misunderstood the observer thing here, but if this is at all
doable, this should be scriptable with ssh or ansible quite easily?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 25. mai 2017 kl. 00.35 skrev Pushkar Raste <pushkar.ra...@gmail.com>:
>
> A setup I have used in the past was to have an observer in DC2. If DC1
> goes boom, you need manual intervention to change the observer's role to
> make it a follower.
>
> When DC1 comes back up, change one instance in DC2 to make it an observer
> again.
>
> On May 24, 2017 6:15 PM, "Jan Høydahl" <jan@cominvent.com> wrote:
>
>> Sure, ZK does by design not support a two-node/two-location setup. But
>> still, users may want/need to deploy that,
>> and my question was if there are smart ways to make such a setup as
little
>> painful as possible in case of failure.
>>
>> Take the example of DC1: 3xZK and DC2: 2xZK again. And then DC1 goes
BOOM.
>> Without an active action DC2 would be read-only
>> What if then the Ops personnel in DC2 could, with a single
script/command,
>> instruct DC2 to resume “master” role:
>> - Add a 3rd DC2 ZK to the two existing, reconfigure and let them sync up.
>> - Rolling restart of Solr nodes with new ZK_HOST string
>> Of course, they would also then need to make sure that DC1 does not boot
>> up again before compatible change has been done there too.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>>> 23. mai 2017 kl. 18.56 skrev Shawn Heisey <elyog...@elyograg.org>:
>>>
>>> On 5/23/2017 10:12 AM, Susheel Kumar wrote:
>>>> Hi Jan, FYI - Since last year, I have been running a Solr 6.0 cluster
>> in one of lower env with 6 shards/replica in dc1 & 6 shard/replica in dc2
>> (each shard replicated cross data center) with 3 ZK in dc1 and 2 ZK in
dc2.
>> (I didn't have the availability of 3rd data center for ZK so went with
only
>> 2 data center with above configuration) and so far no issues. Its been
>> running fine, indexing, replicating data, serving queries etc. So in my
>> test, setting up single cluster across two zones/data center works
without
>> any issue when there is no or very minimal latency (in my case around
30ms

Re: Spread SolrCloud across two locations

2017-05-24 Thread Pushkar Raste
A setup I have used in the past was to have an observer in DC2. If DC1
goes boom, you need manual intervention to change the observer's role to make
it a follower.

When DC1 comes back up, change one instance in DC2 to make it an observer
again.

On May 24, 2017 6:15 PM, "Jan Høydahl"  wrote:

> Sure, ZK does by design not support a two-node/two-location setup. But
> still, users may want/need to deploy that,
> and my question was if there are smart ways to make such a setup as little
> painful as possible in case of failure.
>
> Take the example of DC1: 3xZK and DC2: 2xZK again. And then DC1 goes BOOM.
> Without an active action DC2 would be read-only
> What if then the Ops personnel in DC2 could, with a single script/command,
> instruct DC2 to resume “master” role:
> - Add a 3rd DC2 ZK to the two existing, reconfigure and let them sync up.
> - Rolling restart of Solr nodes with new ZK_HOST string
> Of course, they would also then need to make sure that DC1 does not boot
> up again before compatible change has been done there too.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 23. mai 2017 kl. 18.56 skrev Shawn Heisey :
> >
> > On 5/23/2017 10:12 AM, Susheel Kumar wrote:
> >> Hi Jan, FYI - Since last year, I have been running a Solr 6.0 cluster
> in one of lower env with 6 shards/replica in dc1 & 6 shard/replica in dc2
> (each shard replicated cross data center) with 3 ZK in dc1 and 2 ZK in dc2.
> (I didn't have the availability of 3rd data center for ZK so went with only
> 2 data center with above configuration) and so far no issues. Its been
> running fine, indexing, replicating data, serving queries etc. So in my
> test, setting up single cluster across two zones/data center works without
> any issue when there is no or very minimal latency (in my case around 30ms
> one way
> >
> > With that setup, if dc2 goes down, you're all good, but if dc1 goes
> down, you're not.
> >
> > There aren't enough ZK servers in dc2 to maintain quorum when dc1 is
> unreachable, and SolrCloud is going to go read-only.  Queries would most
> likely work, but you would not be able to change the indexes at all.
> >
> > ZooKeeper with N total servers requires int((N/2)+1) servers to be
> operational to maintain quorum.  This means that with five total servers,
> three must be operational and able to talk to each other, or ZK cannot
> guarantee that there is no split-brain, so quorum is lost.
> >
> > ZK in two data centers will never be fully fault-tolerant. There is no
> combination of servers that will work properly.  You must have three data
> centers for a geographically fault-tolerant cluster.  Solr would be
> optional in the third data center.  ZK must be installed in all three.
> >
> > Thanks,
> > Shawn
> >
>
>


Re: Disable All kind of caching in Solr/Lucene

2017-05-23 Thread Pushkar Raste
What version are you on? There was a bug where if you use cache size 0, it
would still create a cache with size 2 (or maybe just 1). It was fixed
under https://issues.apache.org/jira/browse/SOLR-9886?filter=-2



On Apr 3, 2017 9:26 AM, "Nilesh Kamani"  wrote:

> @Yonik even though the code change is in SolrIndexer class, it has nothing
> do with index itself.
> After fetching docIds, I am filtering them on one more criteria. (Very
> weird code it is).
>
> I tried q={!cache=false}, but not working. Subsequent search is done under
> 2 milliseconds.
>
> Does anybdody have more insight  on this ?
>
> On Fri, Mar 31, 2017 at 2:17 PM, Yonik Seeley  wrote:
>
> > On Fri, Mar 31, 2017 at 1:53 PM, Nilesh Kamani 
> > wrote:
> > > @Alexandre - Could you please point me to reference doc to remove
> default
> > > cache settings ?
> > >
> > > @Yonik - The code change is in Solr Indexer to sort the results.
> >
> > OK, so to test indexing performance, there are no caches to worry
> > about (as long as you have autowarmCount=0 on all caches, as is the
> > case with the Solr example configs).
> >
> > To test sorted query performance (I assume you're sorting the index to
> > accelerate certain sorted queries), if you can't make the queries
> > unique, then add
> > {!cache=false} to the query
> > example: q={!cache=false}*:*
> > You could also add a random term on a non-existent field to change the
> > query and prevent unwanted caching...
> > example: q=*:* does_not_exist_s:149475394
> >
> > -Yonik
> >
>
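
(Sketch of the config-side equivalent, in line with the SOLR-9886 note above: declare the caches with size 0 and autowarmCount 0 in solrconfig.xml; on versions without that fix, a tiny cache may still be created. Cache classes are the stock ones of that era.)

    <filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>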


Re: Latest advice on G1 collector?

2017-01-23 Thread Pushkar Raste
Hi Walter,
We have been using G1GC for more than a year now and are very happy with
it.

The only flag we have enabled is 'ParallelRefProcEnabled'

On Jan 23, 2017 3:00 PM, "Walter Underwood"  wrote:

> We have a workload with very long queries, and that can drive the CMS
> collector into using about 20% of the CPU time. So I’m ready to try G1 on a
> couple of replicas and see what happens. I’ve already upgraded to Java 8
> update 121.
>
> I’ve read these pages:
>
> https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector
>  >
> https://gist.github.com/rockagen/e6d28244e1d540c05144370d6a64ba66 <
> https://gist.github.com/rockagen/e6d28244e1d540c05144370d6a64ba66>
>
> Any updates on recommended settings?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>
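
(In solr.in.sh terms, the advice above boils down to something like the following; treat it as a sketch rather than a tuned recommendation.)

    GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled"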


Re: retrieve ids of all indexed docs efficiently

2017-01-18 Thread Pushkar Raste
I think we should add the suggestion about docValues to the cursorMark wiki
(documentation); we too ran into the same problem.

On Jan 18, 2017 5:52 PM, "Erick Erickson"  wrote:

> Is your ID field docValues? Making it a docValues field should reduce
> the amount of JVM heap you need.
>
>
> But the export is _much_ preferred, it'll be lots faster as well. Of
> course to export you need the values you're returning to be
> docValues...
>
> Erick
>
> On Wed, Jan 18, 2017 at 1:12 PM, Slomin, David 
> wrote:
> > The export feature sounds promising, although I'll have to talk with our
> deployment folks here about enabling it.
> >
> > The query I'm issuing is:
> >
> > http://:8983/solr/_shard1_replica1/
> select?q=*:*=id+asc=1000=&
> fl=id=true=false=json
> >
> > Thanks,
> > Div.
> >
> >
> > On 1/18/17, 3:54 PM, "Jan Høydahl"  wrote:
> >
> > Don't know why you have mem problems. Can you paste in examples of
> full query strings during cursor mark querying? Sounds like you may be
> using it wrong.
> >
> > Or try exporting
> >
> > https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
> >
> > --
> > Jan Høydahl
> >
> > > Den 18. jan. 2017 kl. 21.44 skrev Slomin, David <
> david.slo...@here.com>:
> > >
> > > Hi --
> > >
> > > I'd like to retrieve the ids of all the docs in my Solr 5.3.1
> index.  In my query, I've set rows=1000, fl=id, and am using the cursorMark
> mechanism to split the overall traversal into multiple requests.  Not
> because I care about the order, but because the documentation implies that
> it's necessary to make cursorMark work reliably, I've also set sort=id
> asc.  While this does give me the data I need on a smaller index, it causes
> the heap memory utilization to go through the roof; for our large indices,
> the Solr JVM throws an out of memory exception, and we've already
> configured it as large as is practical given the physical memory of the
> machine.
> > >
> > > For what it's worth, we do use Solr cloud to split each of our
> indices into multiple shards.  However for this query, I'm addressing a
> single shard directly (connecting to the correct Solr server instance for
> one replica of that shard and setting distrib=false in my query) rather
> than relying on Solr to route and assemble the results.
> > > Thanks in advance,
> > > Div Slomin.
> > >
> >
> >
>
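
(A sketch of the /export route Erick mentions, which streams results without the deep-paging heap cost; it requires the id field to have docValues, and the host and core name are illustrative.)

    curl "http://localhost:8983/solr/mycore_shard1_replica1/export?q=*:*&fl=id&sort=id+asc"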


Re: SolrCloud: ClusterState says we are the leader but locally we don't think so

2017-01-17 Thread Pushkar Raste
Try bouncing the overseer for your cluster.

On Jan 17, 2017 12:01 PM, "Kelly, Frank"  wrote:

> Solr Version: 5.3.1
>
> Configuration: 3 shards, 3 replicas each
>
> After running out of heap memory recently (cause unknown) we’ve been
> successfully restarting nodes to recover.
>
> Finally we did one restart and one of the nodes now says the following
> 2017-01-17 16:57:16.835 ERROR (qtp1395089624-17)
> [c:prod_us-east-1_here_account s:shard3 r:core_node26
> x:prod_us-east-1_here_account_shard3_replica3] o.a.s.c.SolrCore
> org.apache.solr.common.SolrException: ClusterState says we are the leader
> (http://10.255.6.196:8983/solr/prod_us-east-1_here_account_shard3_replica3),
> but locally we don't think so. Request came from null
>
> How can we recover from this (for Solr 5.3.1)?
> Is there someway to force a new leader (I know the following feature
> exists but in 5.4.0 https://issues.apache.org/jira/browse/SOLR-7569)
>
> Thanks!
>
> -Frank
>
>
>
>
> *Frank Kelly*
>
> *Principal Software Engineer*
>
>
>
> HERE
>
> 5 Wayside Rd, Burlington, MA 01803, USA
>
> *42° 29' 7" N 71° 11' 32" W*
>
>
>


Re: Debug logging in Maven project

2017-01-10 Thread Pushkar Raste
Seems like you have enabled only the console appender. I remember there was a
change made to disable the console appender if Solr is started in background
mode.

On Jan 10, 2017 5:55 AM, "Markus Jelsma"  wrote:

> Hello,
>
> I used to enable debug logging in my Maven project's unit tests by just
> setting log4j's global level to DEBUG, very handy, especially in debugging
> some Solr Cloud start up issues. Since a while, not sure to long, i don't
> seem to be able to get any logging at all. This project depends on 6.3.
> Anyone here that can tell me how to get something so simple but so helpful
> back to work?
>
> Many thanks,
> Markus
>
> $ cat src/test/resources/log4j.properties
> log4j.rootLogger=debug,info,stdout
> log4j.appender.stdout=org.apache.log4j.ConsoleAppender
> log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
> log4j.appender.stdout.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n
>
>
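
(For comparison, a minimal log4j 1.x config that emits DEBUG to the console when the tests run in the foreground; note that in the original rootLogger line the extra "info" token is parsed as an appender name, not a level.)

    log4j.rootLogger=DEBUG, stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n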


Re: Very long young generation stop the world GC pause

2016-12-21 Thread Pushkar Raste
You should probably have as small a swap as possible. I still feel long GCs
are either due to swapping or thread contention.

Did you try removing all other G1GC tuning parameters except for
ParallelRefProcEnabled?

On Dec 19, 2016 1:39 AM, "forest_soup"  wrote:

> Sorry for my wrong memory. The swap is 16GB.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Very-long-young-generation-stop-the-world-GC-
> pause-tp4308911p4310301.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Separating Search and Indexing in SolrCloud

2016-12-16 Thread Pushkar Raste
This kind of separation is not supported yet. There is, however, some work
going on; you can read about it at
https://issues.apache.org/jira/browse/SOLR-9835

This unfortunately would not support soft commits and hence would not be a
good solution for near-real-time indexing.

On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski"  wrote:

> Sorry, not what I meant.
>
> Leader is responsible for distributing update requests to replica. So
> eventually all replicas have same state as leader. Not a problem.
>
> It is more about the performance of such. If I gather correctly normal
> replication happens by standard update request. Not by, say, segment copy.
>
> Which means update on leader is as "expensive" as on replica.
>
> Hence, if my understanding is correct, sending search request to replica
> only, in index heavy environment, would bring no benefit.
>
> So the question is: is there a mechanism, in SolrCloud (not legacy
> master/slave set-up) to make one node take a load of indexing which
> other nodes focus on searching.
>
> This is not a question of SolrClient cause that is clear how to direct
> search request to specific nodes. This is more about index optimization
> so that certain nodes (ie. replicas) could suffer less due to high
> volume indexing while serving search requests.
>
>
>
>
> On 16/12/16 12:35, Dorian Hoxha wrote:
> > The leader is the source of truth. You expect to make the replica the
> > source of truth or something???Doesn't make sense?
> > What people do, is send write to leader/master and reads to
> replicas/slaves
> > in other solr/other-dbs.
> >
> > On Fri, Dec 16, 2016 at 1:31 PM, Jaroslaw Rozanski  >
> > wrote:
> >
> >> Hi all,
> >>
> >> According to documentation, in normal operation (not recovery) in Solr
> >> Cloud configuration the leader sends updates it receives to all the
> >> replicas.
> >>
> >> This means and all nodes in the shard perform same effort to index
> >> single document. Correct?
> >>
> >> Is there then a benefit to *not* to send search requests to leader, but
> >> only to replicas?
> >>
> >> Given index & search heavy Solr Cloud system, is it possible to separate
> >> search from indexing nodes?
> >>
> >>
> >> RE: Solr 5.5.0
> >>
> >> --
> >> Jaroslaw Rozanski | e: m...@jarekrozanski.com
> >> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
> >>
> >>
> >
>
> --
> Jaroslaw Rozanski | e: m...@jarekrozanski.com
> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>
>


Re: Distribution Packages

2016-12-12 Thread Pushkar Raste
We use the jdeb Maven plugin to build Debian packages; we use it for Solr
as well.

On Dec 12, 2016 9:03 AM, "Adjamilton Junior"  wrote:

> Hi folks,
>
> I am new here and I wonder to know why there's no Solr 6.x packages for
> ubuntu/debian?
>
> Thank you.
>
> Adjamilton Junior
>
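
(A minimal sketch of the org.vafer:jdeb plugin mentioned above; the version, phase and control-files directory are illustrative.)

    <plugin>
      <groupId>org.vafer</groupId>
      <artifactId>jdeb</artifactId>
      <version>1.5</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>jdeb</goal></goals>
          <configuration>
            <controlDir>${project.basedir}/src/deb/control</controlDir>
          </configuration>
        </execution>
      </executions>
    </plugin>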


Re: Very long young generation stop the world GC pause

2016-12-09 Thread Pushkar Raste
My guess is system time is high either due to lock contention (too many
parallel threads) or page faults.

Heap size was less than 6GB when this long pause occurred, and the young
generation was less than 2GB. Though lowering the heap size would help, I don't
think that is the root cause here.

On Dec 9, 2016 3:02 AM, "Ere Maijala" <ere.maij...@helsinki.fi> wrote:

> Then again, if the load characteristics on the Solr instance differ e.g.
> by time of day, G1GC, in my experience, may have trouble adapting. For
> instance if your query load reduces drastically during the night, it may
> take a while for G1GC to catch up in the morning. What I've found useful
> from experience, and your mileage will probably vary, is to limit the young
> generation size with a large heap. With Xmx31G something like these could
> work:
>
> -XX:+UnlockExperimentalVMOptions \
> -XX:G1MaxNewSizePercent=5 \
>
> The aim here is to only limit the maximum and still allow some adaptation.
>
> --Ere
>
> 8.12.2016, 16.07, Pushkar Raste kirjoitti:
>
>> Disable all the G1GC tuning you are doing except for
>> ParallelRefProcEnabled
>>
>> G1GC is an adaptive algorithm and would keep tuning to reach the default
>> pause goal of 250ms which should be good for most of the applications.
>>
>> Can you also tell us how much RAM you have on your machine and if you have
>> swap enabled and being used?
>>
>> On Dec 8, 2016 8:53 AM, "forest_soup" <tanglin0...@gmail.com> wrote:
>>
>> Besides, will those JVM options make it better?
>>> -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=10
>>>
>>>
>>>
>>> --
>>> View this message in context: http://lucene.472066.n3.
>>> nabble.com/Very-long-young-generation-stop-the-world-GC-
>>> pause-tp4308911p4308937.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>


Re: Very long young generation stop the world GC pause

2016-12-08 Thread Pushkar Raste
Disable all the G1GC tuning you are doing except for ParallelRefProcEnabled.

G1GC is an adaptive algorithm and would keep tuning to reach the default
pause goal of 250ms which should be good for most of the applications.

Can you also tell us how much RAM you have on your machine and if you have
swap enabled and being used?

On Dec 8, 2016 8:53 AM, "forest_soup"  wrote:

> Besides, will those JVM options make it better?
> -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=10
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Very-long-young-generation-stop-the-world-GC-
> pause-tp4308911p4308937.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: AW: AW: Resync after restart

2016-11-25 Thread Pushkar Raste
Did you index any documents while the node was being restarted? There was an
issue introduced by the IndexFingerprint comparison. Check SOLR-9310. I am
not sure if the fix made it to Solr 6.2.

On Nov 25, 2016 3:51 AM, "Arkadi Colson"  wrote:

> I am using SolrCloud on version 6.2.1. I will upgrade to 6.3.0 next week.
>
> This is the current config for numVersionBuckets:
>
> <updateLog>
>   <str name="dir">${solr.ulog.dir:}</str>
>   <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
> </updateLog>
>
> Are you saying that I should not use the config below on SolrCloud?
>
>   
> 
>   18.75
>   05:00:00
>   15
>   30
> 
>   
>
> Br,
> Arkadi
>
>
> On 24-11-16 17:46, Erick Erickson wrote:
>
>> Hold on. Are you using SolrCloud or not? There is a lot of talk here
>> about masters and slaves, then you say "I always add slaves with the
>> collection API", collections are a SolrCloud construct.
>>
>> It sounds like you're mixing the two. You should _not_ configure
>> master/slave replication parameters with SolrCloud. Take a look at the
>> sample configs
>>
>> And you haven't told us what version of Solr you're using, we can
>> infer a relatively recent one because of the high number you have for
>> numVersionBuckets, but that's guessing.
>>
>> If you are _not_ in SolrCloud, then maybe:
>> https://issues.apache.org/jira/browse/SOLR-9036 is relevant.
>>
>> Best,
>> Erick
>>
>> On Thu, Nov 24, 2016 at 3:10 AM, Arkadi Colson 
>> wrote:
>>
>>> This is the code from the master node. Al configs are the same on all
>>> nodes.
>>> I always add slaves with the collection API. Is there an other place to
>>> look
>>> for this part of the config?
>>>
>>>
>>>
>>> On 24-11-16 12:02, Michael Aleythe, Sternwald wrote:
>>>
 You need to change this on the master node. The part of the config you
 pasted here, looks like it is from the slave node.

 -Ursprüngliche Nachricht-
 Von: Arkadi Colson [mailto:ark...@smartbit.be]
 Gesendet: Donnerstag, 24. November 2016 11:56
 An: solr-user@lucene.apache.org
 Betreff: Re: AW: Resync after restart

 Hi Michael

 Thanks for the quick response! The line does not exist in my config. So
 can I assume that the default configuration is to not replicate at
 startup?

  

  18.75
  05:00:00
  15
  30

  

 Any other idea's?


 On 24-11-16 11:49, Michael Aleythe, Sternwald wrote:

> Hi Arkadi,
>
> you need to remove the line "startup"
> from your ReplicationHandler-config in solrconfig.xml ->
> https://wiki.apache.org/solr/SolrReplication.
>
> Greetings
> Michael
>
> -Ursprüngliche Nachricht-
> Von: Arkadi Colson [mailto:ark...@smartbit.be]
> Gesendet: Donnerstag, 24. November 2016 09:26
> An: solr-user 
> Betreff: Resync after restart
>
> Hi
>
> Almost every time when restarting a solr instance the index is
> replicated
> completely. Is there a way to avoid this somehow? The index currently
> has a
> size of about 17GB.
> Some advice here would be great.
>
> 99% of the config is default:
>
> <updateLog>
>   <str name="dir">${solr.ulog.dir:}</str>
>   <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
> </updateLog>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> If you need more info, just let me know...
>
> Thx!
> Arkadi
>
>
>


Re: Error "unexpected docvalues type NUMERIC for field" using rord() function query on single valued int field

2016-11-21 Thread Pushkar Raste
Did you turn on/off docValues on an already existing field?

On Nov 16, 2016 11:51 AM, "Jaco de Vroed"  wrote:

> Hi,
>
> I made a typo. The Solr version number in which this error occurs is 5.5.3.
> I also checked 6.3.0, same problem.
>
> Thanks, bye,
>
> Jaco.
>
> On 16 November 2016 at 17:39, Jaco de Vroed  wrote:
>
> > Hello Solr users,
> >
> > I’m running into an error situation using Solr 5.3.3. The case is as
> > follows. In my schema, I have a field with a definition like this:
> >
> >  > positionIncrementGap="0”/>
> > ….
> >  > docValues="true" />
> >
> > That field is used in function queries for boosting purposes, using the
> > rord() function. We’re coming from Solr 4, not using docValues for that
> > field, and now moving to Solr 5, using docValues. Now, this is causing a
> > problem. When doing this:
> >
> > http://localhost:8983/solr/core1/select?q=*:*=ID,
> > recip(rord(PublicationDate),0.15,300,10)
> >
> > The following error is given: "*unexpected docvalues type NUMERIC for
> > field 'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use
> > UninvertingReader or index with docvalues*” (full stack trace below).
> >
> > This does not happen when the field is changed to be multiValued, but I
> > don’t want to change that at this point (and I noticed that changing from
> > single valued to multivalued, then attempting to post the document again
> > also results in an error related to docvalues type, but that could be the
> > topic of another mail I guess). This is now blocking our long desired
> > upgrade to Solr 5. We initially tried upgrading without docValues, but
> > performance was completely killed because of our function query based
> > ranking stuff, so we decide to use docValues.
> >
> > To me, this seems a bug. I’ve tried finding something in Solr’s JIRA, the
> > exact same error is in https://issues.apache.org/jira/browse/SOLR-7495,
> > but that is a different case.
> >
> > I can create a JIRA issue for this of course, but first wanted to throw
> > this at the mailing list to see if there’s any insights that can be
> shared.
> >
> > Thanks a lot in advance, bye,
> >
> > Jaco..
> >
> > unexpected docvalues type NUMERIC for field 'PublicationDate' (expected
> > one of [SORTED, SORTED_SET]). Use UninvertingReader or index with
> docvalues.
> > java.lang.IllegalStateException: unexpected docvalues type NUMERIC for
> > field 'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use
> > UninvertingReader or index with docvalues.
> > at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
> > at org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:306)
> > at org.apache.solr.search.function.ReverseOrdFieldSource.getValues(
> > ReverseOrdFieldSource.java:98)
> > at org.apache.lucene.queries.function.valuesource.
> ReciprocalFloatFunction.
> > getValues(ReciprocalFloatFunction.java:64)
> > at org.apache.solr.response.transform.ValueSourceAugmenter.transform(
> > ValueSourceAugmenter.java:95)
> > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:160)
> > at org.apache.solr.response.TextResponseWriter.writeDocuments(
> > TextResponseWriter.java:246)
> > at org.apache.solr.response.TextResponseWriter.writeVal(
> > TextResponseWriter.java:151)
> > at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:113)
> > at org.apache.solr.response.XMLResponseWriter.write(
> > XMLResponseWriter.java:39)
> > at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> > QueryResponseWriterUtil.java:52)
> > at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> > HttpSolrCall.java:728)
> > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:257)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:208)
> > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> > doFilter(ServletHandler.java:1652)
> > at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> > ServletHandler.java:585)
> > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:143)
> > at org.eclipse.jetty.security.SecurityHandler.handle(
> > SecurityHandler.java:577)
> > at org.eclipse.jetty.server.session.SessionHandler.
> > doHandle(SessionHandler.java:223)
> > at org.eclipse.jetty.server.handler.ContextHandler.
> > doHandle(ContextHandler.java:1127)
> > at org.eclipse.jetty.servlet.ServletHandler.doScope(
> > ServletHandler.java:515)
> > at org.eclipse.jetty.server.session.SessionHandler.
> > doScope(SessionHandler.java:185)
> > at org.eclipse.jetty.server.handler.ContextHandler.
> > doScope(ContextHandler.java:1061)
> > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:141)
> > at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> > ContextHandlerCollection.java:215)
> > at 

Re: Merge policy

2016-10-27 Thread Pushkar Raste
Try commit with expungeDeletes="true"

I am not sure if it will merge old segments that have deleted documents.

In the worst case you can 'optimize' your index, which should take care of
removing deleted documents.

On Oct 27, 2016 4:20 AM, "Arkadi Colson"  wrote:

> Hi
>
> As you can see in the screenshot above in the oldest segments there are a
> lot of deletions. In total the shard has about 26% deletions. How can I get
> rid of them so the index will be smaller again?
> Can this only be done with an optimize or does it also depend on the merge
> policy? If it also depends also on the merge policy which one should I
> choose then?
>
> Thanks!
>
> BR,
> Arkadi
>
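
(The two options above as plain update requests; host and collection name are illustrative.)

    # merge away deleted docs where the merge policy allows it
    curl "http://localhost:8983/solr/mycollection/update?commit=true&expungeDeletes=true"
    # heavier hammer: rewrite the whole index
    curl "http://localhost:8983/solr/mycollection/update?optimize=true"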


Re: Solr Cloud A/B Deployment Issue

2016-10-26 Thread Pushkar Raste
Nodes will still go into recovery but only for a short duration.

On Oct 26, 2016 1:26 PM, "jimtronic"  wrote:

It appears this has all been resolved by the following ticket:

https://issues.apache.org/jira/browse/SOLR-9446

My scenario fails in 6.2.1, but works in 6.3 and Master where this bug has
been fixed.

In the meantime, we can use our workaround to issue a simple delete command
that deletes a non-existent document.

Jim



--
View this message in context: http://lucene.472066.n3.
nabble.com/Solr-Cloud-A-B-Deployment-Issue-tp4302810p4303210.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud A/B Deployment Issue

2016-10-26 Thread Pushkar Raste
This is due to leader-initiated recovery. Take a look at

https://issues.apache.org/jira/browse/SOLR-9446

On Oct 24, 2016 1:23 PM, "jimtronic"  wrote:

> We are running into a timing issue when trying to do a scripted deployment
> of
> our Solr Cloud cluster.
>
> Scenario to reproduce (sometimes):
>
> 1. launch 3 clean solr nodes connected to zookeeper.
> 2. create a 1 shard collection with replicas on each node.
> 3. load data (more will make the problem worse)
> 4. launch 3 more nodes
> 5. add replicas to each new node
> 6. once entire cluster is healthy, start killing first three nodes.
>
> Depending on the timing, the second three nodes end up all in RECOVERING
> state without a leader.
>
> This appears to be happening because when the first leader dies, all the
> new
> nodes go into full replication recovery and if all the old boxes happen to
> die during that state, the boxes are stuck. The boxes cannot serve requests
> and they eventually (1-8 hours) go into RECOVERY_FAILED state.
>
> This state is easy to fix with a FORCELEADER call to the collections API,
> but that's only remediation, not prevention.
>
> My question is this: Why do the new nodes have to go into full replication
> recovery when they are already up to date? I just added the replica, so it
> shouldn't have to a new full replication again.
>
> Jim
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Solr-Cloud-A-B-Deployment-Issue-tp4302810.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: OOM Error

2016-10-25 Thread Pushkar Raste
You should look into using docValues.  docValues are stored off heap and
hence you would be better off than just bumping up the heap.

Don't enable docValues on existing fields unless you plan to reindex data
from scratch.
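
A minimal schema sketch (the field name and type are made up); docValues is
just an attribute on the field definition, but the data has to be reindexed
from scratch after adding it:

<field name="category" type="string" indexed="true" stored="true" docValues="true"/>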

On Oct 25, 2016 3:04 PM, "Susheel Kumar"  wrote:

> Thanks, Toke.  Analyzing GC logs helped to determine that it was a sudden
> death.  The peaks in last 20 mins... See   http://tinypic.com/r/n2zonb/9
>
> Will look into the queries more closer and also adjusting the cache sizing.
>
>
> Thanks,
> Susheel
>
> On Tue, Oct 25, 2016 at 3:37 AM, Toke Eskildsen 
> wrote:
>
> > On Mon, 2016-10-24 at 18:27 -0400, Susheel Kumar wrote:
> > > I am seeing OOM script killed solr (solr 6.0.0) on couple of our VM's
> > > today. So far our solr cluster has been running fine but suddenly
> > > today many of the VM's Solr instance got killed.
> >
> > As you have the GC-logs, you should be able to determine if it was a
> > slow death (e.g. caches gradually being filled) or a sudden one (e.g.
> > grouping or faceting on a large new non-DocValued field).
> >
> > Try plotting the GC logs with time on the x-axis and free memory after
> > GC on the y-axis. It it happens to be a sudden death, the last lines in
> > solr.log might hold a clue after all.
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
>


Re: OOM Error

2016-10-24 Thread Pushkar Raste
Did you look into the heap dump ?

On Mon, Oct 24, 2016 at 6:27 PM, Susheel Kumar 
wrote:

> Hello,
>
> I am seeing OOM script killed solr (solr 6.0.0) on couple of our VM's
> today. So far our solr cluster has been running fine but suddenly today
> many of the VM's Solr instance got killed. I had 8G of heap allocated on 64
> GB machines with 20+ GB of index size on each shards.
>
> What could be looked to find the exact root cause. I am suspecting of any
> query (wildcard prefix query etc.) might have caused this issue.  The
> ingestion and query load looks normal as other days.  I have the solr GC
> logs as well.
>
> Thanks,
> Susheel
>


Re: disable updates during startup

2016-10-23 Thread Pushkar Raste
The reason the node is in recovery for a long time could be related to
https://issues.apache.org/jira/browse/SOLR-9310

On Tue, Oct 4, 2016 at 9:14 PM, Rallavagu  wrote:

> Solr Cloud 5.4.1 with embedded Jetty - jdk 8
>
> Is there a way to disable incoming updates (from leader) during startup
> until "firstSearcher" queries finished? I am noticing that firstSearcher
> queries keep on running at the time of startup and node shows up as
> "Recovering".
>
> Thanks
>


Re: group.facet fails when facet on double field

2016-10-23 Thread Pushkar Raste
This error is thrown when you add (or remove) docValues on an existing field
but do not reindex your data from scratch. It is a result of the field cache
being removed from Lucene. Although you were not getting an error with Solr
4.8, I am pretty sure that you were getting incorrect results.

Stand up a small test cluster with Solr 6.2.x, index a few documents in it
and try your group.facet query - it should definitely work.

On Thu, Oct 20, 2016 at 9:18 AM, karel braeckman 
wrote:

> Hi,
>
> We are trying to upgrade from Solr 4.8 to Solr 6.2.
>
> This query:
>
> ?q=*%3A*=0=2=json=true=true&
> group.field=mediaObjectId=true=rating=true
>
> is returning the following error:
>
> null:org.apache.solr.common.SolrException: Exception during facet.field:
> rating
> at org.apache.solr.request.SimpleFacets.lambda$
> getFacetFieldCounts$0(SimpleFacets.java:739)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.solr.request.SimpleFacets$2.execute(
> SimpleFacets.java:672)
> ...
> Caused by: java.lang.IllegalStateException: unexpected docvalues type
> NUMERIC for field 'mediaObjectId' (expected=SORTED). Re-index with
> correct docvalues type.
> at org.apache.lucene.index.DocValues.checkField(
> DocValues.java:212)
> at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
> at org.apache.lucene.search.grouping.term.
> TermGroupFacetCollector$SV.doSetNextReader(TermGroupFacetCollector.java:
> 128)
> ...
>
>
> The same query without the group.facet=true option does not give an
> error. On Solr 4.8 the query did not give problems.
>
>
> The relevant fields are configured as follows:
>
>
>  precisionStep="0" positionIncrementGap="0"/> type="double" indexed="true" stored="true" multiValued="false"
> /> multiValued="false" />
>
> Am I doing anything wrong, or do you have any suggestions on what to try
> next?
>
>
> Best regards
>
> Karel Braeckman
>


Re: Zookeeper connection issues

2016-10-10 Thread Pushkar Raste
If Solr has GC pauses greater than 15 seconds, zookeeper is going to assume
the node is down and hence will send it into recovery when the node comes out
of a GC pause and reconnects to zookeeper.

You should look into keeping GC pauses as short as possible.

Using G1GC with ParallelRefProcEnabled has helped me a lot, but you may want
to arrive at the settings that work best for you by trial and error.
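
A sketch of how those flags can be set via the GC_TUNE variable in
solr.in.sh; the values are only a starting point and need tuning for your
heap:

GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled"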

If you are not using MMapDirectory, try switching to it. It helps by keeping
the index off the heap.

Also check if issue SOLR-9310 affects you; since your replicas are going into
recovery, I am afraid they are doing a full index copy rather than just
picking up the delta.

On Oct 10, 2016 11:43 AM, "philippa griggs" 
wrote:

> Hello,
>
>
> Solr Set up
>
>
> Solr 5.4.1, Zookeeper 3.4.6   (5 zookeeper ensemble)
>
>
> We have one collection which has multiple shards (two shards for each
> week). Each shard has a leader and a replica. We only write to the latest
> week- two shards (four cores) which we refer to a ‘hot cores’.   The rest,
> ‘cold cores’ are for queries. We have multiple solr processes running on an
> instance- currently 5 each with a 15Gb Heap (there is 122G available
> memory). As the index grows as the week goes on the heap size starts low
> and increase to around 9/10Gb. The index size on each core ends up around 8
> million docs, 6.5Gb which are stored on 40Gb drives. The zookeeper timeout
> is 60Secs.
>
>
> The issue:
>
>
> We are experiencing issues with connectivity  and have started seeing
> errors messages about being unable to connect to zookeeper. Most of the
> time solr recovers itself after a while but we are seeing these ‘blips’
> more and more often with the last ‘blip’ ending up with manually restarting
> the hot cores. So far this has only been seen on one shard at a time. All
> other shards in the cluster don’t have an issue.
>
>
> There is nothing in the zookeeper log. Below are the solr logs for the
> last ‘blip’.
>
>
> I’ve looked at the heap size and its not hitting 15Gb (max around 11Gb).
> At around the time of the blip the GC is 40sec, which is not over the
> timeout but is however much larger than we normally see.
>
>
> These blips are happening towards the end of the week when the index size
> gets larger.
>
>
> I’m not sure what is going on, is this a zookeeper issue or solr? What
> would be causing solr to lose connection with zookeeper if it’s not the
> timeout? We have checked the network and it doesn’t indicate a network
> issue.
>
>
> Any suggests would be useful.
>
>
>
> Error Logs for core A
>
>
> 2016-10-08 18:45:36.617 WARN  (qtp697960108-32664) [c:xxx s:20161003_A
> r:20161003_A54130 x:xxx] o.e.j.h.HttpParser badMessage: 
> java.lang.IllegalStateException:
> too much data after closed for HttpChannelOverHttp@1b87c2cf{
> r=1370,c=false,a=IDLE,uri=-}
>
> 2016-10-08 18:45:36.717 WARN  
> (updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr
> x:xxx s:20161003_A c:xxx r:20161003_A54130) [c:xxx s:20161003_A
> r:20161003_A54130 x:xxx] o.a.s.c.ZkController Unable to read
> /collections/xxx/leader_initiated_recovery/20161003_A/20161003_A54130 due
> to: org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /collections/xxx/leader_
> initiated_recovery/20161003_A/20161003_A54130
>
> 2016-10-08 18:45:44.907 ERROR 
> (updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr
> x:xxx s:20161003_A c:xxx r:20161003_A54130) [c:xxx s:20161003_A
> r:20161003_A54130 x:xxx] o.a.s.u.PeerSync PeerSync: core=xxx url=
> http://x.x.x.x:8987/solr ERROR, update log not in ACTIVE or REPLAY state.
> FSUpdateLog{state=BUFFERING, 
> tlog=tlog{file=/solrLog_8987/tlog/tlog.0011469
> refcount=1}}
>
> 2016-10-08 18:45:44.908 WARN  
> (updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr
> x:xxx s:20161003_A c:xxx r:20161003_A54130) [c:xxx s:20161003_A
> r:20161003_A54130 x:xxx] o.a.s.u.PeerSync PeerSync: core=xxx url=
> http://x.x.x.x:8987/solr too many updates received since start -
> startingUpdates no longer overlaps with our currentUpdates
>
> 2016-10-08 18:47:25.772 WARN  
> (updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr
> x:xxx s:20161003_A c:xxx r:20161003_A54130) [c:xxx s:20161003_A
> r:20161003_A54130 x:xxx] o.a.s.h.IndexFetcher File _1ftq.si did not
> match. expected checksum is 4254234714 and actual is checksum 2090625558.
> expected length is 422 and actual length is 422
>
> 2016-10-08 18:47:26.286 WARN  (zkCallback-3-thread-76-
> processing-n:x.x.x.x:8987_solr-EventThread) [   ]
> o.a.s.c.RecoveryStrategy Stopping recovery for core=xxx
> coreNodeName=20161003_A54130
>
> 2016-10-08 18:47:54.935 WARN  
> (updateExecutor-2-thread-8523-processing-n:x.x.x.x:8987_solr
> x:xxx s:20161003_A c:xxx r:20161003_A54130) [c:xxx s:20161003_A
> r:20161003_A54130 x:xxx] o.a.s.h.IndexFetcher File _1ftq.si did not
> match. expected checksum is 4254234714 and actual is checksum 2090625558.
> expected length is 

Re: solr perf metric for the last 1 hour (or last 10 mn)

2016-10-10 Thread Pushkar Raste
Hi Dominique,
Unfortunately Solr doesn't support the metrics you are interested in. You
can, however, have another process that makes JMX queries against the Solr
process, does the required transformation and stores the data in some kind of
data store.

Just make sure you are not DDoSing your Solr instances :-)

On Oct 10, 2016 11:58 AM, "Dominique De Vito"  wrote:

Hi,

It looks like the Solr metric "avgTimePerRequest" is computed with requests
from t0 (startup time).

If so, it's quite useless, for example, for detecting a surge in latency
within the last 10 mn for example.

Is my understanding correct ?

If so, is there a way
(1) to configure Solr to compute all its metrics per period of time (let's
say every 10 mn)
or
(2) to reset metrics through some (?) call
?

Thanks.

Dominique


Re: [Solr-5-4-1] Why SolrCloud leader is putting all replicas in recovery at the same time ?

2016-10-06 Thread Pushkar Raste
A couple of questions/suggestions:

- This normally happens after a leader election: when a new leader gets
elected, it will force all the nodes to sync with itself.
Check the logs to see when this happens and whether the leader changed. If it
did, you will have to investigate why the leader change takes place.
I suspect the leader goes into a GC pause long enough that zookeeper thinks
the leader is no longer available and initiates a leader election.

- Which version of Solr are you using? SOLR-8586 introduced an
IndexFingerprint check; unfortunately it was broken and hence replicas would
always do a full index replication. The issue is now fixed in SOLR-9310,
which should help replicas recover faster.

- You should also increase the ulog size (the default threshold is 100 docs
or 10 tlogs, whichever is hit first). This will again help replicas recover
faster from tlogs (of course, there is a threshold after which recovering
from the tlog would in fact take longer than copying over all the index files
from the leader).
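
A sketch of the relevant solrconfig.xml section (the values are illustrative,
not recommendations):

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1000</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>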


On Thu, Oct 6, 2016 at 5:23 AM, Gerald Reinhart 
wrote:

>
> Hello everyone,
>
> Our Solr Cloud  works very well for several months without any
> significant changes: the traffic to serve is stable, no major release
> deployed...
>
> But randomly, the Solr Cloud leader puts all the replicas in recovery
> at the same time for no obvious reason.
>
> Hence, we can not serve the queries any more and the leader is
> overloaded while replicating all the indexes on the replicas at the same
> time which eventually implies a downtime of approximately 30 minutes.
>
> Is there a way to prevent it ? Ideally, a configuration saying a
> percentage of replicas to be put in recovery at the same time?
>
> Thanks,
>
> Gérald, Elodie and Ludovic
>
>
> --
> [image: Kelkoo]
>
> *Gérald Reinhart *Software Engineer
>
> *E*  
> gerald.reinh...@kelkoo.com*Y!Messenger* gerald.reinhart
> *T* +33 (0)4 56 09 07 41
> *A* Parc Sud Galaxie - Le Calypso, 6 rue des Méridiens, 38130 Echirolles
>
>
>
> --
> Kelkoo SAS
> Société par Actions Simplifiée
> Au capital de € 4.168.964,30
> Siège social : 158 Ter Rue du Temple 75003 Paris
> 425 093 069 RCS Paris
>
> Ce message et les pièces jointes sont confidentiels et établis à
> l'attention exclusive de leurs destinataires. Si vous n'êtes pas le
> destinataire de ce message, merci de le détruire et d'en avertir
> l'expéditeur.
>


Re: Queries to help warm up (mmap)

2016-10-06 Thread Pushkar Raste
One of the tricks I had read somewhere was to cat all the files in the index
directory, so the OS ends up with them in the disk cache.
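
Something along these lines, run once after startup (the path is
hypothetical):

cat /var/solr/data/mycore/data/index/* > /dev/null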

On Thu, Oct 6, 2016 at 11:55 AM, Rallavagu  wrote:

> Looking for clues/recommendations to help warm up during startup. Not
> necessarily Solr caches but mmap as well. I have used some like "q=<field name>:[* TO *]" for various fields and it seems to help with mmap
> population around 40-50%. Is there anything else that could help achieve
> 90% or more? Thanks.
>


RE: how to sampling search result

2016-09-28 Thread Pushkar Raste
Purely from an algorithmic point of view - look into reservoir sampling for
unbiased sampling.
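
A minimal sketch of the idea in Java (Algorithm R); nothing here is
Solr-specific, it just shows the technique:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReservoirSampler {
    // Algorithm R: keeps k items chosen uniformly at random from a stream
    // of unknown length, using O(k) memory.
    public static <T> List<T> sample(Iterable<T> stream, int k, Random rnd) {
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T item : stream) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item);         // fill the reservoir first
            } else {
                int j = rnd.nextInt(seen);   // uniform in [0, seen)
                if (j < k) {
                    reservoir.set(j, item);  // keep with probability k/seen
                }
            }
        }
        return reservoir;
    }
}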

On Sep 28, 2016 11:00 AM, "Yongtao Liu"  wrote:

Alexandre,

Thanks for reply.
The use case is customer want to review document based on search result.
But they do not want to review all, since it is costly.
So, they want to pick partial (from 1% to 100%) document to review.
For statistics, user also ask this function.
It is kind of common requirement
Do you know any plan to implement this feature in future?

Post filter should work. Like collapsing query parser.

Thanks,
Yongtao
-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Tuesday, September 27, 2016 9:25 PM
To: solr-user
Subject: Re: how to sampling search result

I am not sure I understand what the business case is. However, you might be
able to do something with a custom post-filter.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 27 September 2016 at 22:29, Yongtao Liu  wrote:
> Mikhail,
>
> Thanks for your reply.
>
> Random field is based on index time.
> We want to do sampling based on search result.
>
> Like if the random field has value 1 - 100.
> And the query touched documents may all in range 90 - 100.
> So random field will not help.
>
> Is it possible we can sampling based on search result?
>
> Thanks,
> Yongtao
> -Original Message-
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: Tuesday, September 27, 2016 11:16 AM
> To: solr-user
> Subject: Re: how to sampling search result
>
> Perhaps, you can apply a filter on random field.
>
> On Tue, Sep 27, 2016 at 5:57 PM, googoo  wrote:
>
>> Hi,
>>
>> Is it possible I can sampling based on  "search result"?
>> Like run query first, and search result return 1 million documents.
>> With random sampling, 50% (500K) documents return for facet, and stats.
>>
>> The sampling need based on "search result".
>>
>> Thanks,
>> Yongtao
>>
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.
>> nabble.com/how-to-sampling-search-result-tp4298269.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev


Re: Whether SolrCloud can support 2 TB data?

2016-09-23 Thread Pushkar Raste
Solr is RAM hungry. Make sure that you have enough RAM to keep most of the
index of a core in RAM.

You should also consider using really good SSDs.

That would be a good start. Like others said, test and verify your setup.

--Pushkar Raste

On Sep 23, 2016 4:58 PM, "Jeffery Yuan" <yuanyun...@gmail.com> wrote:

Thanks so much for your prompt reply.

We are definitely going to use SolrCloud.

I am just wondering whether SolrCloud can scale even at TB data level and
what kind of hardware configuration it should be.

Thanks.



--
View this message in context: http://lucene.472066.n3.
nabble.com/Whether-solr-can-support-2-TB-data-tp4297790p4297800.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Create collection PeerSync "no frame of reference" warnings

2016-09-20 Thread Pushkar Raste
If you are creating a collection these warnings are harmless. There is a
patch being worked on under SOLR-9446 (although for a different scenario)
that would help suppress this error.


RE: Facetting on a field doesn't work, until i optimized the index

2016-09-15 Thread Pushkar Raste
Markus,
Can you pick one of the values in the facets and try running a query with it?
Ideally numFound should match the facet count. If those don't match, I guess
your index is still somewhat damaged but you aren't really noticing it.
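
Something like this (field name, value, core and port are placeholders);
numFound in the response should equal the facet count for that value:

curl 'http://localhost:8983/solr/mycore/select?q=somefield:somevalue&rows=0&wt=json'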

On Sep 15, 2016 4:44 AM, "Markus Jelsma"  wrote:

> Mikhail - yes, there are results for the query. The set is empty as if
> there are no values for the field, {}.
>
> Everything checked out correctly but no facet results, until immediately
> after the optimize.
>
> Thanks,
> Markus
>
> -Original message-
> > From:Mikhail Khludnev 
> > Sent: Thursday 15th September 2016 10:12
> > To: solr-user 
> > Subject: Re: Facetting on a field doesn't work, until i optimized the
> index
> >
> > Markus,
> >
> > Could you spot more details about this particular field type? How does
> > 'empty set of facets' look like? Is there results in that query?
> >
> > On Wed, Sep 14, 2016 at 4:15 PM, Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello - we've just spotted the weirdest issue on Solr 6.1.
> > >
> > > We have a Solr index full of logs, new items are added every few
> minutes.
> > > We also have an application that shows charts based on what's in the
> index,
> > > Banana style.
> > >
> > > Yesterday we saw facets for a specific field were missing. Today we
> > > checked it out until we reduced the facet query just to
> > > facet=true=FIELD, but it returned nothing of use, just an
> empty
> > > set of facets.
> > >
> > > My colleague suggested the crazy idea to optimize the index, i
> protested
> > > because it is no use, numDoc always equals maxDoc and the optimize
> button
> > > was missing anyway. So i forced an optimize via the URL, and it
> worked, the
> > > facets for that field are now back!
> > >
> > > Any ideas? Is there a related ticket?
> > >
> > > Thanks,
> > > Markus
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


Re: Facetting on a field doesn't work, until i optimized the index

2016-09-14 Thread Pushkar Raste
Damn, I didn't put comments in the ticket but replied to the question "Is it
safe to upgrade an existing field to docvalues?" on the mailing list.

Check that out.

On Sep 14, 2016 5:59 PM, "Pushkar Raste" <pushkar.ra...@gmail.com> wrote:

> We experienced exact opposite issue on Solr 4.10
>
> Check my comments in https://issues.apache.org/jira/browse/SOLR-9437
>
> I am not sure if issue was fixed in Solr 6
>
> I do be interested in tracking down patch for this.
>
> On Sep 14, 2016 3:04 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
>> Weird indeed. Optimize _shouldn't_ be necessary if the index was
>> rebuilt from scratch after changing something like DV, but in a mixed
>> set of segments I'm not sure what would happen. Perhaps one of the
>> Lucene folks can chime in?
>>
>> Best,
>> Erick
>>
>> On Wed, Sep 14, 2016 at 9:22 AM, Markus Jelsma
>> <markus.jel...@openindex.io> wrote:
>> > Well, it could be that indeed. I know i enabled docValues on that field
>> three and a half months ago. But usually when i do that, i force an
>> optimize.
>> >
>> > On the other hand, i'd reckon that in the past few months, all segments
>> should have been merged with another one at least once because data keeps
>> streaming in. But i'm not sure it would anyway.
>> >
>> > Thanks,
>> > Markus
>> >
>> > -Original message-
>> >> From:Erick Erickson <erickerick...@gmail.com>
>> >> Sent: Wednesday 14th September 2016 17:22
>> >> To: solr-user <solr-user@lucene.apache.org>
>> >> Subject: Re: Facetting on a field doesn't work, until i optimized the
>> index
>> >>
>> >> That's strange
>> >>
>> >> Is there any chance that the schema changed? This is _really_ a shot
>> >> in the dark, but perhaps the optimize "normalized" the field
>> >> definitions stored with each segment.
>> >>
>> >> Imagine segments 1-5 have one definition, and segments 6-10 have a
>> >> different definition for your field. Optimize would have to resolve
>> >> this somehow, perhaps that process made the magic happen?
>> >>
>> >> NOTE: I'm not conversant with the internals of merge, so this may be
>> >> totally bogus..
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Wed, Sep 14, 2016 at 6:15 AM, Markus Jelsma
>> >> <markus.jel...@openindex.io> wrote:
>> >> > Hello - we've just spotted the weirdest issue on Solr 6.1.
>> >> >
>> >> > We have a Solr index full of logs, new items are added every few
>> minutes. We also have an application that shows charts based on what's in
>> the index, Banana style.
>> >> >
>> >> > Yesterday we saw facets for a specific field were missing. Today we
>> checked it out until we reduced the facet query just to
>> facet=true=FIELD, but it returned nothing of use, just an empty
>> set of facets.
>> >> >
>> >> > My colleague suggested the crazy idea to optimize the index, i
>> protested because it is no use, numDoc always equals maxDoc and the
>> optimize button was missing anyway. So i forced an optimize via the URL,
>> and it worked, the facets for that field are now back!
>> >> >
>> >> > Any ideas? Is there a related ticket?
>> >> >
>> >> > Thanks,
>> >> > Markus
>> >>
>>
>


Re: Facetting on a field doesn't work, until i optimized the index

2016-09-14 Thread Pushkar Raste
We experienced the exact opposite issue on Solr 4.10.

Check my comments in https://issues.apache.org/jira/browse/SOLR-9437

I am not sure if the issue was fixed in Solr 6.

I would be interested in tracking down a patch for this.

On Sep 14, 2016 3:04 PM, "Erick Erickson"  wrote:

> Weird indeed. Optimize _shouldn't_ be necessary if the index was
> rebuilt from scratch after changing something like DV, but in a mixed
> set of segments I'm not sure what would happen. Perhaps one of the
> Lucene folks can chime in?
>
> Best,
> Erick
>
> On Wed, Sep 14, 2016 at 9:22 AM, Markus Jelsma
>  wrote:
> > Well, it could be that indeed. I know i enabled docValues on that field
> three and a half months ago. But usually when i do that, i force an
> optimize.
> >
> > On the other hand, i'd reckon that in the past few months, all segments
> should have been merged with another one at least once because data keeps
> streaming in. But i'm not sure it would anyway.
> >
> > Thanks,
> > Markus
> >
> > -Original message-
> >> From:Erick Erickson 
> >> Sent: Wednesday 14th September 2016 17:22
> >> To: solr-user 
> >> Subject: Re: Facetting on a field doesn't work, until i optimized the
> index
> >>
> >> That's strange
> >>
> >> Is there any chance that the schema changed? This is _really_ a shot
> >> in the dark, but perhaps the optimize "normalized" the field
> >> definitions stored with each segment.
> >>
> >> Imagine segments 1-5 have one definition, and segments 6-10 have a
> >> different definition for your field. Optimize would have to resolve
> >> this somehow, perhaps that process made the magic happen?
> >>
> >> NOTE: I'm not conversant with the internals of merge, so this may be
> >> totally bogus..
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Sep 14, 2016 at 6:15 AM, Markus Jelsma
> >>  wrote:
> >> > Hello - we've just spotted the weirdest issue on Solr 6.1.
> >> >
> >> > We have a Solr index full of logs, new items are added every few
> minutes. We also have an application that shows charts based on what's in
> the index, Banana style.
> >> >
> >> > Yesterday we saw facets for a specific field were missing. Today we
> checked it out until we reduced the facet query just to
> facet=true=FIELD, but it returned nothing of use, just an empty
> set of facets.
> >> >
> >> > My colleague suggested the crazy idea to optimize the index, i
> protested because it is no use, numDoc always equals maxDoc and the
> optimize button was missing anyway. So i forced an optimize via the URL,
> and it worked, the facets for that field are now back!
> >> >
> >> > Any ideas? Is there a related ticket?
> >> >
> >> > Thanks,
> >> > Markus
> >>
>


Re: Is it safe to upgrade an existing field to docvalues?

2016-09-02 Thread Pushkar Raste
Hi Ronald,
Turning on docValues for an existing field works in Solr 4. As you mentioned,
it will use the un-inverting method if docValues are not found for an
existing document. This all works fine until segments that have documents
without docValues merge with segments that have docValues for the field. In
the merged segment, documents from the old segments will be stored without
docValues; however, the segment's metadata will indicate docValues are turned
ON for the field in question.

Now if you are sorting on the field, those poor documents would appear out of
order, and facet counts would be wrong as well.

Solr 5 doesn't throw an exception if you have a mixed case of docValues for a
field.

I think it is better to create a copy field, reindex all of the data and then
switch over to use the copy field.
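
A sketch of that schema change (field names and types here are made up):

<field name="price"    type="double" indexed="true" stored="true"/>
<field name="price_dv" type="double" indexed="false" stored="false" docValues="true"/>
<copyField source="price" dest="price_dv"/>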

On Aug 25, 2016 9:21 AM, "Ronald Wood"  wrote:

> Alessandro, yes I can see how this could be conceived of as a more general
> problem; and yes useDocValues also strikes me as being unlike the other
> properties since it would only be used temporarily.
>
> We’ve actually had to migrate fields from one to another when changing
> types, along with awkward naming like ‘fieldName’ (int) to ‘fieldNameLong’.
> But I’m not sure how a change like that could actually be done in place.
>
> The point is stronger when it comes to term vectors etc. where data exists
> in separate files and switches in code control whether they are used or not.
>
> I guess where I would argue that docValues might be different is that so
> much new functionality depends on this that it might be worth treating it
> differently. Given that docValues now is on by default, I wonder if it will
> at some point be mandatory, in which case everyone would have to migrate to
> keep up with Solr version. (Of course, I don’t know what the general
> thinking is on this amongst the implementers.)
>
> Regardless, this change may be so important to us that we’d choose to
> branch the code on GitHub and apply the patch ourselves, use it while we
> transition, and then deploy an official build once we’re done. The
> difference in the level of effort between this approach and the
> alternatives would be too great. The risks of using a custom build for
> production would have to be weighed carefully, naturally.
>
> - Ronald S. Wood
>
>
> On 8/25/16, 06:49, "Alessandro Benedetti"  wrote:
>
> > switching is done in Solr on field.hasDocValues. The code would be
> amended
> > to (field.hasDocValues && field.useDocValues) throughout.
> >
>
> This is correct. Currently we use DocValues if they are available, and
> to
> check the availabilty we check the schema attribute.
> This can be problematic in the scenarios you described ( for example
> half
> the index has docValues for a field and the other half not yet ).
>
> Your proposal is interesting.
> Technically it should work and should allow transparent migration from
> not
> docValues to docValues.
> But it is a risky one, because we are decreasing the readability a bit
> (
> althought a user will specify the attribute only in special cases like
> yours) .
>
> The only problem I see is that the same discussion we had for docValues
> actually applies to all other invasive schema changes :
> 1) you change the field type
> 2) you enable or disable term vectors
> 3) you enable/disable term positions,offsets ect ect
>
> So basically this is actually a general problem, that probably would
> require a general re-think .
> So although  can be a quick fix that will work, I fear can open the
> road to
> messy configuration attributes.
>
> Cheers
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
>
>
>


Re: commit it taking 1300 ms

2016-09-02 Thread Pushkar Raste
It would be worth looking into iostats of your disks.

On Aug 22, 2016 10:11 AM, "Alessandro Benedetti" 
wrote:

> I agree with the suggestions so far.
> The cache auto-warming doesn't seem the problem as the index is not massive
> and the auto-warm is for only 10 docs.
> Are you using any warming query for the new searcher ?
>
> Are you using soft or hard commit ?
> This can make the difference ( soft are much cheaper, not free but cheaper)
> .
> You said :
> " Actually earlier it was taking less but suddenly it has increased "
>
> What happened ?
> Anyway, there are a lot of questions to answer before we can help you...
>
> Cheers
>
> On Fri, Aug 12, 2016 at 4:58 AM, Esther-Melaine Quansah <
> esther.quan...@lucidworks.com> wrote:
>
> > Midas,
> >
> > I’d like further clarification as well. Are you sending commits along
> with
> > each document that you’re POSTing to Solr? If so, you’re essentially
> either
> > opening a new searcher or flushing to disk with each POST which could
> > explain latency between each request.
> >
> > Thanks,
> >
> > Esther
> > > On Aug 11, 2016, at 12:19 PM, Erick Erickson 
> > wrote:
> > >
> > > bq:  we post json documents through the curl it takes the time (same
> > time i
> > > would like to say that we are not hard committing ). that curl takes
> time
> > > i.e. 1.3 sec.
> > >
> > > OK, I'm really confused. _what_ is taking 1.3 seconds? When you said
> > > commit, I was thinking of Solr's commit operation, which is totally
> > distinct
> > > from just adding a doc to the index. But I read the above statement
> > > as you're saying it takes 1.3 seconds just to send a doc to Solr.
> > >
> > > Let's see the exact curl command you're using please?
> > >
> > > Best,
> > > Erick
> > >
> > >
> > > On Thu, Aug 11, 2016 at 5:32 AM, Emir Arnautovic
> > >  wrote:
> > >> Hi Midas,
> > >>
> > >> 1. How many indexing threads?
> > >> 2. Do you batch documents and what is your batch size?
> > >> 3. How frequently do you commit?
> > >>
> > >> I would recommend:
> > >> 1. Move commits to Solr (set auto soft commit to max allowed time)
> > >> 2. Use batches (bulks)
> > >> 3. tune bulk size and number of threads to achieve max performance.
> > >>
> > >> Thanks,
> > >> Emir
> > >>
> > >>
> > >>
> > >> On 11.08.2016 08:21, Midas A wrote:
> > >>>
> > >>> Emir,
> > >>>
> > >>> other queries:
> > >>>
> > >>> a) Solr cloud : NO
> > >>> b)  > >>> size="5000" initialSize="5000" autowarmCount="10"/>
> > >>> c)   > >>> size="1000" initialSize="1000" autowarmCount="10"/>
> > >>> d)  > >>> size="1000" initialSize="1000" autowarmCount="10"/>
> > >>> e) we are using multi threaded system.
> > >>>
> > >>> On Thu, Aug 11, 2016 at 11:48 AM, Midas A 
> > wrote:
> > >>>
> >  Emir,
> > 
> >  we post json documents through the curl it takes the time (same
> time i
> >  would like to say that we are not hard committing ). that curl takes
> > time
> >  i.e. 1.3 sec.
> > 
> >  On Wed, Aug 10, 2016 at 2:29 PM, Emir Arnautovic <
> >  emir.arnauto...@sematext.com> wrote:
> > 
> > > Hi Midas,
> > >
> > > According to your autocommit configuration and your worry about
> > commit
> > > time I assume that you are doing explicit commits from client code
> > and
> > > that
> > > 1.3s is client observed commit time. If that is the case, than it
> > might
> > > be
> > > opening searcher that is taking time.
> > >
> > > How do you index data - single threaded or multithreaded? How
> > frequently
> > > do you commit from client? Can you let Solr do soft commits instead
> > of
> > > explicitly committing? Do you have warmup queries? Is this
> SolrCloud?
> > > What
> > > is number of servers (what spec), shards, docs?
> > >
> > > In any case monitoring can give you more info about server/Solr
> > behavior
> > > and help you diagnose issues more easily/precisely. One such
> > monitoring
> > > tool is our SPM .
> > >
> > > Regards,
> > > Emir
> > >
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log
> > Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > > On 10.08.2016 05:20, Midas A wrote:
> > >
> > >> Thanks for replying
> > >>
> > >> index size:9GB
> > >> 2000 docs/sec.
> > >>
> > >> Actually earlier it was taking less but suddenly it has increased
> .
> > >>
> > >> Currently we do not have any monitoring  tool.
> > >>
> > >> On Tue, Aug 9, 2016 at 7:00 PM, Emir Arnautovic <
> > >> emir.arnauto...@sematext.com> wrote:
> > >>
> > >> Hi Midas,
> > >>>
> > >>> Can you give us more details on your index: size, number of new
> > docs
> > >>> between commits. Why do you think 1.3s for commit is to much and
> > why
> > >>> do
> > >>> you
> > 

Fwd: About SOLR-9310

2016-07-20 Thread Pushkar Raste
-- Forwarded message --
From: "Pushkar Raste" <pushkar.ra...@gmail.com>
Date: Jul 20, 2016 11:08 AM
Subject: About SOLR-9310
To: <d...@lucene.apache.org>
Cc:

Hi,
https://issues.apache.org/jira/browse/SOLR-9310

PeerSync replication in Solr seems to be completely broken since the
fingerprint check was introduced (or maybe some other change made later; I
have not gone through the git bisect process to pin down the commit that
caused the issue).

I have documented my observations and have attached a patch with a fix and
tests to the ticket.


Can someone take a look at the issue? Let me know if I am missing something
here. Since this is a pretty big issue, I am surprised that no one has
noticed it yet and that there have not been many comments on the JIRA either.


Re: SolrCloud: Frequent "No registered leader was found" errors

2015-12-22 Thread Pushkar Raste
If you have GC logs, check if you have long GC pauses that make zookeeper
think that node(s) are going down. If this is the case then your nodes are
going into recovery, and based on your settings in solr.xml you may end up in
a situation where no node gets promoted to be a leader.



On 22 December 2015 at 08:46, Bram Van Dam  wrote:

> Hi folks,
>
> Been doing some SolrCloud testing and I've been experiencing some
> problems. I'll try to be relatively brief, but feel free to ask for
> additional information.
>
> I've added about 200 million documents to a SolrCloud. The cloud
> contains 3 collections, and all documents were added to all three
> collections.
>
> While indexing these documents, we noticed 486k (!!) "No registered
> leader was found"-errors. 482k (!!) of which referred to the same shard.
> The other shards are or more or less evenly distributed in the log.
>
> This indexing job has been running for about 5 days now, and is pretty
> much IO-bound. CPU usage is ~50%. The load average, on the other hand,
> has been 128 for 5 days straight. Which is high, but fine: the machine
> is responsive.
>
> Memory usage is fine. Most of it is going towards file system caches and
> the like. Each Solr instance has 8GB Xmx, and is currently using about
> 7GB. I haven't noticed any OutOfMemoryErrors in the log files.
>
> Monitoring shows that both Solr instances have been up throughout these
> procedings.
>
> Now, I'm willing to accept that these Solr instances don't have enough
> memory, or anything else, but I'm not seeing any of this reflected in
> the log files, which I'm finding troubling.
>
> What I do notice in the log file, is the very vague "SolrException:
> Service Unavailable". See below.
>
> Could anyone shed some light on what could be causing these errors?
>
> Thanks a bunch,
>
>  - Bram
>
>
> SolrCloud Setup:
> 
>
> - Version: 5.4.0
> - 3 Collections
> -- firstCollection : 18 shards
> -- secondCollection: 36 shards
> -- thirdCollection : 79 shards
> - Routing: implicit
> - 2 Solr Instances
> -- 8GB Xmx.
>
> Machine:
> 
> - Hexacore Xeon E5-1650
> - 64GB RAM
> - 50TB Disk (RAID6, 10 disks)
>
> Leader Stack Trace:
> ---
>
> Caused by:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: No
> registered leader was found after waiting for 4000ms , collection:
> biweekly slice: thirdCollectionShard39
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
> ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
> ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
> at
>
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:118)
> ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
> at
> org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
> ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
> at
> org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
> ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
>
>
> Service Unavailable Log:
> 
>
>
> 527280878 ERROR (qtp59559151-194160) [c:collectionTwo
> s:collectionTwoShard12 r:core_node12
> x:collectionTwo_collectionTwoShard12_replica1]
> o.a.s.u.SolrCmdDistributor forwarding update to
> http://[CENSORED]:8983/solr/collectionTwo_collectionTwoShard1_replica1/
> failed - retrying ... retries: 15 add{,id=000195641101}
> params:update.distrib=TOLEADER=http://
> [CENSORED]:/solr/collectionTwo_collectionTwoShard12_replica1/
> rsp:503:org.apache.solr.common.SolrException: Service Unavailable
>
>
>
>


Re: A field _indexed_at_tdt added when I index documents.

2015-12-17 Thread Pushkar Raste
You must have this field in your schema with some default value assigned to
it (most probably the default value is NOW). This field is usually used to
determine the timestamp at which a document was last indexed.
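
A sketch of what such a declaration typically looks like (the type name may
differ in your schema):

<field name="_indexed_at_tdt" type="tdate" indexed="true" stored="true" default="NOW"/>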

On 17 December 2015 at 04:51, Guillermo Ortiz  wrote:

> I'm indexing documents in Solr with Spark and it's adding a field
> _indexed_at_tdt which doesn't exist in my documents.
>
> I have added this field in my schema, why is this field being added? any
> solution?
>


Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-14 Thread Pushkar Raste
Hi Philippa,
Try taking a heap dump (when heap usage is high) and then, using a profiler,
look at which objects are taking up most of the memory. I have seen that if
you are faceting/sorting on a large number of documents then the fieldCache
grows very big and dominates most of the heap. Enabling docValues on the
fields you are sorting/faceting on helps.
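
For the heap dump itself, something like this against the Solr process works
(the pid and file path are placeholders):

jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>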

On 8 December 2015 at 07:17, philippa griggs 
wrote:

> Hello Emir,
>
> The query load is around 35 requests per min on each shard, we don't
> document route so we query the entire index.
>
> We do have some heavy queries like faceting and its possible that a heavy
> queries is causing the nodes to go down- we are looking into this.  I'm new
> to solr so this could be a slightly stupid question but would a heavy query
> cause most of the nodes to go down? This didn't happen with the previous
> solr version we were using Solr 4.10.0, we did have nodes/shards which went
> down but there wasn't wipe out effect where most of the nodes go.
>
> Many thanks
>
> Philippa
>
> 
> From: Emir Arnautovic 
> Sent: 08 December 2015 10:38
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.
>
> Hi Phillippa,
> My guess would be that you are running some heavy queries (faceting/deep
> paging/large pages) or have high query load (can you give bit details
> about load) or have misconfigured caches. Do you query entire index or
> you have query routing?
>
> You have big machine and might consider running two Solr on each node
> (with smaller heap) and split shards so queries can be more
> parallelized, resources better utilized, and smaller heap to GC.
>
> Regards,
> Emir
>
> On 08.12.2015 10:49, philippa griggs wrote:
> > Hello Erick,
> >
> > Thanks for your reply.
> >
> > We have one collection and are writing documents to that collection all
> the time- it peaks at around 2,500 per minute and dips to 250 per minute,
> the size of the document varies. On each node we have around 55,000,000
> documents with a data size of 43G located on a drive of 200G.
> >
> > Each node has 122G memory, the heap size is currently set at 45G
> although we have plans to increase this to 50G.
> >
> > The heap settings we are using are:
> >
> >   -XX: +UseG1GC,
> > -XX:+ParallelRefProcEnabled.
> >
> > Please let me know if you need any more information.
> >
> > Philippa
> > 
> > From: Erick Erickson 
> > Sent: 07 December 2015 16:53
> > To: solr-user
> > Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.
> >
> > Tell us a bit more.
> >
> > Are you adding documents to your collections or adding more
> > collections? Solr is a balancing act between the number of docs you
> > have on each node and the memory you have allocated. If you're
> > continually adding docs to Solr, you'll eventually run out of memory
> > and/or hit big GC pauses.
> >
> > How much memory are you allocating to Solr? How much physical memory
> > to you have? etc.
> >
> > Best,
> > Erick
> >
> >
> > On Mon, Dec 7, 2015 at 8:37 AM, philippa griggs
> >  wrote:
> >> Hello,
> >>
> >>
> >> I'm using:
> >>
> >>
> >> Solr 5.2.1 10 shards each with a replica. (20 nodes in total)
> >>
> >>
> >> Zookeeper 3.4.6.
> >>
> >>
> >> About half a year ago we upgraded to Solr 5.2.1 and since then have
> been experiencing a 'wipe out' effect where all of a sudden most if not all
> nodes will go down. Sometimes they will recover by themselves but more
> often than not we have to step in to restart nodes.
> >>
> >>
> >> Nothing in the logs jumps out as being the problem. With the latest
> wipe out we noticed that 10 out of the 20 nodes had garbage collections
> over 1min all at the same time, with the heap usage spiking up in some
> cases to 80%. We also noticed the amount of selects run on the solr cluster
> increased just before the wipe out.
> >>
> >>
> >> Increasing the heap size seems to help for a while but then it starts
> happening again- so its more like a delay than a fix. Our GC settings are
> set to -XX: +UseG1GC, -XX:+ParallelRefProcEnabled.
> >>
> >>
> >> With our previous version of solr (4.10.0) this didn't happen. We had
> nodes/shards go down but it was contained, with the new version they all
> seem to go at around the same time. We can't really continue just
> increasing the heap size and would like to solve this issue rather than
> delay it.
> >>
> >>
> >> Has anyone experienced something simular?
> >>
> >> Is there a difference between the two versions around the recovery
> process?
> >>
> >> Does anyone have any suggestions on a fix.
> >>
> >>
> >> Many thanks
> >>
> >>
> >> Philippa
> > >
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: Using properties placeholder ${someProperty} for xml node attribute in solrconfig

2015-12-04 Thread Pushkar Raste
Thanks Erick, I verified that we can use properties placeholders for
attributes on an xml node. One last question: I was reading through
CommitTracker and it looks like setting maxTime for 'autoCommit' or
'autoSoftCommit' will disable commits. Is my understanding right?
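
For reference, the stock solrconfig.xml uses the same placeholder pattern for
the commit settings, e.g.:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>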

On 3 December 2015 at 15:40, Erick Erickson <erickerick...@gmail.com> wrote:

> Hmmm, never tried it. You can check by looking at the admin
> UI>>plugins/stats>>cahces>>filterCache with a property defined like
> you want.
>
> And assuming that works, yes. the filterCache is turned off if its size is
> zero.
>
> Another option might be to add {!cache=false} to your fq clauses on
> the client in this case if that is possible/convenient.
>
> Best,
> Erick
>
> On Thu, Dec 3, 2015 at 11:19 AM, Pushkar Raste <pushkar.ra...@gmail.com>
> wrote:
> > Hi,
> > I want to make turning filter cache on/off configurable (I really have a
> > use case to turn off filter cache), can I use properties placeholders
> like
> > ${someProperty} in the filter cache config. i.e.
> >
> >  >  size="${solr.filterCacheSize:4096}"
> >  initialSize=""${solr.filterCacheInitialSize:2048}"
> >  autowarmCount="0"/>
> >
> > In short, can I use properties placeholders for attributes for xml node
> in
> > solrconfig. Follow up question is, provided I can do that, to turn off
> > filterCache can I simply set values 0 (zero) for 'solr.filterCacheSize'
> and
> > 'solr.filterCacheInitialSize'
>


Re: How to list all collections in solr-4.7.2

2015-12-03 Thread Pushkar Raste
Will 'wget http://host;port//solr/admin/collections?action=LIST' help?

On 3 December 2015 at 12:12, rashi gandhi  wrote:

> Hi all,
>
> I have setup two solr-4.7.2 server instances on two diff machines with 3
> zookeeper severs in solrcloud mode.
>
> Now, I want to retrieve list of all the collections that I have created in
> solrcloud mode.
>
> I tried LIST command of collections api, but its not working with
> solr-4.7.2.
> Error: unknown command LIST
>
> Please suggest me the command, that I can use.
>
> Thanks.
>


Using properties placeholder ${someProperty} for xml node attribute in solrconfig

2015-12-03 Thread Pushkar Raste
Hi,
I want to make turning the filter cache on/off configurable (I really have a
use case to turn off the filter cache). Can I use properties placeholders
like ${someProperty} in the filter cache config? For example:
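
(A sketch of the kind of definition I mean; the cache class shown here is
just an example:)

<filterCache class="solr.FastLRUCache"
             size="${solr.filterCacheSize:4096}"
             initialSize="${solr.filterCacheInitialSize:2048}"
             autowarmCount="0"/>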



In short, can I use properties placeholders for attributes of an xml node in
solrconfig? A follow-up question: provided I can do that, to turn off the
filterCache can I simply set the values of 'solr.filterCacheSize' and
'solr.filterCacheInitialSize' to 0 (zero)?


Re: SolrCloud breaks and does not recover

2015-11-07 Thread Pushkar Raste
Hi,
To minimize GC pauses, try using G1GC and turn on the 'ParallelRefProcEnabled'
JVM flag. G1GC works much better for heaps > 4 GB. Lowering
'InitiatingHeapOccupancyPercent' will also help to avoid long GC pauses at
the cost of more short pauses.

On 3 November 2015 at 12:12, Björn Häuser  wrote:

> Hi,
>
> thank you for your answer.
>
> 1> No OOM hit, the log does not contain any hind of that. Also solr
> wasn't restarted automatically. But the gc log has some pauses which
> are longer than 15 seconds.
>
> 2> So, if we need to recover a system we need to stop ingesting data into
> it?
>
> 3> The JVMs currently use a little bit more then 1GB of Heap, with a
> now changed max-heap of 3GB. Currently thinking of lowering the heap
> to 1.5 / 2 GB (following Uwe's post).
>
> Also the RES is 4.1gb and VIRT is 12.5gb. Swap is more or less not
> used (40mb of 1GB assigned swap). According to our server monitoring
> sometimes an io spike happens, but again not that much.
>
> What I am going todo:
>
> 1.) make sure that in case of failure we stop ingesting data into solrcloud
> 2.) lower the heap to 2GB
> 3.) Make sure that zookeeper can fsync its write-ahead log fast enough (<1
> sec)
>
> Thanks
> Björn
>
> 2015-11-03 16:27 GMT+01:00 Erick Erickson :
> > The GC logs don't really show anything interesting, there would
> > be 15+ second GC pauses. The Zookeeper log isn't actually very
> > interesting. As far as OOM errors, I was thinking of _solr_ logs.
> >
> > As to why the cluster doesn't self-heal, a couple of things:
> >
> > 1> Once you hit an OOM, all bets are off. The JVM needs to be
> > bounced. Many installations have kill scripts that bounce the
> > JVM. So it's explainable if you have OOM errors.
> >
> > 2> The system may be _trying_ to recover, but if you're
> > still ingesting data it may get into a resource-starved
> > situation where it makes progress but never catches up.
> >
> > Again, though, this seems like very little memory for the
> > situation you describe, I suspect you're memory-starved to
> > a point where you can't really run. But that's a guess.
> >
> > When you run, how much JVM memory are you using? The admin
> > UI should show that.
> >
> > But the pattern of 8G physical memory and 6G for Java is a red
> > flag as per Uwe's blog post, you may be swapping a lot (OS
> > memory) and that may be slowing things down enough to have
> > sessions drop. Grasping at straws here, but "top" or similar
> > should tell you what the system is doing.
> >
> > Best,
> > Erick
> >
> > On Tue, Nov 3, 2015 at 12:04 AM, Björn Häuser 
> wrote:
> >> Hi!
> >>
> >> Thank you for your super fast answer.
> >>
> >> I can provide more data, the question is which data :-)
> >>
> >> These are the config parameters solr runs with:
> >> https://gist.github.com/bjoernhaeuser/24e7080b9ff2a8785740 (taken from
> >> the admin ui)
> >>
> >> These are the log files:
> >>
> >> https://gist.github.com/bjoernhaeuser/a60c2319d71eb35e9f1b
> >>
> >> I think your first obversation is correct: SolrCloud looses the
> >> connection to zookeeper, because the connection times out.
> >>
> >> But why isn't solrcloud able to recover it self?
> >>
> >> Thanks
> >> Björn
> >>
> >>
> >> 2015-11-02 22:32 GMT+01:00 Erick Erickson :
> >>> Without more data, I'd guess one of two things:
> >>>
> >>> 1> you're seeing stop-the-world GC pauses that cause Zookeeper to
> >>> think the node is unresponsive, which puts a node into recovery and
> >>> things go bad from there.
> >>>
> >>> 2> Somewhere in your solr logs you'll see OutOfMemory errors which can
> >>> also cascade a bunch of problems.
> >>>
> >>> In general it's an anti-pattern to allocate such a large portion of
> >>> our physical memory to the JVM, see:
> >>>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >>>
> >>>
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>>
> >>>
> >>> On Mon, Nov 2, 2015 at 1:21 PM, Björn Häuser 
> wrote:
>  Hey there,
> 
>  we are running a SolrCloud, which has 4 nodes, same config. Each node
>  has 8gb memory, 6GB assigned to the JVM. This is maybe too much, but
>  worked for a long time.
> 
>  We currently run with 2 shards, 2 replicas and 11 collections. The
>  complete data-dir is about 5.3 GB.
>  I think we should move some JVM heap back to the OS.
> 
>  We are running Solr 5.2.1., as I could not see any related bugs to
>  SolrCloud in the release notes for 5.3.0 and 5.3.1, we did not bother
>  to upgrade first.
> 
>  One of our nodes (node A) reports these errors:
> 
>  org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>  Error from server at http://10.41.199.201:9004/solr/catalogue:
> Invalid
>  version (expected 2, but 101) or the data in not in 'javabin' format
> 
>  Stacktrace:
> 

Re: Is it impossible to update an index that is undergoing an optimize?

2015-11-06 Thread Pushkar Raste
I may be wrong but I think 'delete' and 'optimize' can not be executed
concurrently on a Lucene index

On 4 November 2015 at 15:36, Shawn Heisey  wrote:

> On 11/4/2015 1:17 PM, Yonik Seeley wrote:
> > On Wed, Nov 4, 2015 at 3:06 PM, Shawn Heisey 
> wrote:
> >> I had understood that since 4.0, Solr (Lucene) can continue to update an
> >> index even while that index is optimizing.
> > Yes, that should be the case.
> >
> >> I have discovered in the logs of my SolrJ index maintenance program that
> >> this does not appear to actually be true.
> > Hmmm, perhaps some other resource is getting exhausted, like number of
> > background merges hit the limit?
>
> I hope it's a misconfiguration, not a bug.
>
> Below is my indexConfig.  I have already increased maxMergeCount because
> without that, full-import from MySQL will stop processing updates during
> a large merge, and the pause is long enough that the JDBC connection
> times out and closes.
>
> 
>   
> 35
> 35
> 105
>   
>   
> 1
> 6
>   
>   48
>   false
> 
>
> The specific index update that fails during the optimize is the SolrJ
> deleteByQuery call.
>
> Thanks,
> Shawn
>
>


How to turn on logging for segment merging

2015-11-01 Thread Pushkar Raste
Is segment merging information logged at a level finer than INFO? I have an
application set up with INFO level logging and I am indexing documents at a
rate of a few hundred a minute. I am using the default merge policy
parameters. However, I never see logs that give me any information about
segment merging.

Is there a special option I have to set to turn on segment merging
information?
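
The only related switch I am aware of is Lucene's IndexWriter infoStream,
which can be turned on in the indexConfig section of solrconfig.xml and logs
merge activity (among a lot of other, very verbose, detail) - a sketch:

<indexConfig>
  <infoStream>true</infoStream>
</indexConfig>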

-- Pushkar Raste


Re: restore quorum after majority of zk nodes down

2015-10-30 Thread Pushkar Raste
We need to bounce it, but the outage will be very short and you don't have to
take down the rest of the zookeeper instances.

On 30 October 2015 at 11:00, Daniel Collins <danwcoll...@gmail.com> wrote:

> Aren't you asking for dynamic ZK configuration which isn't supported yet
> (ZOOKEEPER-107, only in in 3.5.0-alpha)?  How do you swap a zookeeper
> instance from being an observer to a voting member?
>
> On 30 October 2015 at 09:34, Matteo Grolla <matteo.gro...@gmail.com>
> wrote:
>
> > Pushkar... I love this solution
> >   thanks
> > I'd just go with 3 zk nodes on each side
> >
> > 2015-10-29 23:46 GMT+01:00 Pushkar Raste <pushkar.ra...@gmail.com>:
> >
> > > How about having let's say 4 nodes on each side and make one node in
> one
> > of
> > > data centers a observer. When data center with majority of the nodes go
> > > down, bounce the observer by reconfiguring it as a voting member.
> > >
> > > You will have to revert back the observer back to being one.
> > >
> > > There will be a short outage as far as indexing is concerned but
> queries
> > > should continue to work and you don't have to take all the zookeeper
> > nodes
> > > down.
> > >
> > > -- Pushkar Raste
> > > On Oct 29, 2015 4:33 PM, "Matteo Grolla" <matteo.gro...@gmail.com>
> > wrote:
> > >
> > > > Hi Walter,
> > > >   it's not a problem to take down zk for a short (1h) time and
> > > > reconfigure it. Meanwhile solr would go in readonly mode.
> > > > I'd like feedback on the fastest way to do this. Would it work to
> just
> > > > reconfigure the cluster with other 2 empty zk nodes? Would they
> > correctly
> > > > sync from the nonempty one? Should first copy data from zk3 to the
> two
> > > > empty zk?
> > > > Matteo
> > > >
> > > >
> > > > 2015-10-29 18:34 GMT+01:00 Walter Underwood <wun...@wunderwood.org>:
> > > >
> > > > > You can't. Zookeeper needs a majority. One node is not a majority
> of
> > a
> > > > > three node ensemble.
> > > > >
> > > > > There is no way to split a Solr Cloud cluster across two
> datacenters
> > > and
> > > > > have high availability. You can do that with three datacenters.
> > > > >
> > > > > You can probably bring up a new Zookeeper ensemble and configure
> the
> > > Solr
> > > > > cluster to talk to it.
> > > > >
> > > > > wunder
> > > > > Walter Underwood
> > > > > wun...@wunderwood.org
> > > > > http://observer.wunderwood.org/  (my blog)
> > > > >
> > > > >
> > > > > > On Oct 29, 2015, at 10:08 AM, Matteo Grolla <
> > matteo.gro...@gmail.com
> > > >
> > > > > wrote:
> > > > > >
> > > > > > I'm designing a solr cloud installation where nodes from a single
> > > > cluster
> > > > > > are distributed on 2 datacenters which are close and very well
> > > > connected.
> > > > > > let's say that zk nodes zk1, zk2 are on DC1 and zk2 is on DC2 and
> > > let's
> > > > > say
> > > > > > that DC1 goes down and the cluster is left with zk3.
> > > > > > how can I restore a zk quorum from this situation?
> > > > > >
> > > > > > thanks
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: restore quorum after majority of zk nodes down

2015-10-29 Thread Pushkar Raste
How about having, let's say, 4 nodes on each side and making one node in one of
the data centers an observer. When the data center with the majority of the
nodes goes down, bounce the observer by reconfiguring it as a voting member.

You will have to revert the observer back to being one once the other data
center is back.

There will be a short outage as far as indexing is concerned, but queries
should continue to work and you don't have to take all the zookeeper nodes
down.
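
To make that concrete, the relevant zoo.cfg bits on the node you keep as an
observer would look roughly like this (host names and ports are placeholders;
the server.4 line, with its :observer suffix, has to appear in every node's
config):

  peerType=observer
  server.1=zk1:2888:3888
  server.2=zk2:2888:3888
  server.3=zk3:2888:3888
  server.4=zk4:2888:3888:observer

To promote it, drop peerType=observer and the :observer suffix and bounce that
one node.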

-- Pushkar Raste
On Oct 29, 2015 4:33 PM, "Matteo Grolla" <matteo.gro...@gmail.com> wrote:

> Hi Walter,
>   it's not a problem to take down zk for a short (1h) time and
> reconfigure it. Meanwhile solr would go in readonly mode.
> I'd like feedback on the fastest way to do this. Would it work to just
> reconfigure the cluster with other 2 empty zk nodes? Would they correctly
> sync from the nonempty one? Should first copy data from zk3 to the two
> empty zk?
> Matteo
>
>
> 2015-10-29 18:34 GMT+01:00 Walter Underwood <wun...@wunderwood.org>:
>
> > You can't. Zookeeper needs a majority. One node is not a majority of a
> > three node ensemble.
> >
> > There is no way to split a Solr Cloud cluster across two datacenters and
> > have high availability. You can do that with three datacenters.
> >
> > You can probably bring up a new Zookeeper ensemble and configure the Solr
> > cluster to talk to it.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > > On Oct 29, 2015, at 10:08 AM, Matteo Grolla <matteo.gro...@gmail.com>
> > wrote:
> > >
> > > I'm designing a solr cloud installation where nodes from a single
> cluster
> > > are distributed on 2 datacenters which are close and very well
> connected.
> > > let's say that zk nodes zk1, zk2 are on DC1 and zk2 is on DC2 and let's
> > say
> > > that DC1 goes down and the cluster is left with zk3.
> > > how can I restore a zk quorum from this situation?
> > >
> > > thanks
> >
> >
>


Re: Two separate instances of Solr on the same machine

2015-10-27 Thread Pushkar Raste
add "-Dsolr.log=" to your command line

On 27 October 2015 at 08:13, Steven White <swhite4...@gmail.com> wrote:

> How do I specify a different log directory by editing "log4j.properties"?
>
> Steve
>
> On Mon, Oct 26, 2015 at 9:08 PM, Pushkar Raste <pushkar.ra...@gmail.com>
> wrote:
>
> > It depends on your case. If you don't mind logs from 3 different
> instances
> > inter-mingled with each other you should be fine.
> > You add "-Dsolr.log=<log directory>" to make logs go to different
> > directories. If you want logs to go to the same directory but different
> > files, try updating log4j.properties.
> >
> > On 26 October 2015 at 13:33, Steven White <swhite4...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > For reasons I have no control over, I'm required to run 2 (maybe more)
> > > instances of Solr on the same server (Windows and Linux).  To be more
> > > specific, I will need to start each instance like so:
> > >
> > >   > solr\bin start -p 8983 -s ..\instance_one
> > >   > solr\bin start -p 8984 -s ..\instance_two
> > >   > solr\bin start -p 8985 -s ..\instance_three
> > >
> > > Each of those instances is a stand alone Solr (no ZK here at all).
> > >
> > > I have tested this over and over and did not see any issue.  However, I
> > did
> > > notice that each instance is writing to the same solr\server\logs\
> files
> > > (will this be an issue?!!)
> > >
> > > Is the above something I should avoid?  If so, why?
> > >
> > > Thanks in advanced !!
> > >
> > > Steve
> > >
> >
>


Re: Two separate instances of Solr on the same machine

2015-10-26 Thread Pushkar Raste
It depends on your case. If you don't mind logs from 3 different instances
inter-mingled with each other you should be fine.
You add "-Dsolr.log=<log directory>" to make logs go to different
directories. If you want logs to go to the same directory but different files,
try updating log4j.properties.

On 26 October 2015 at 13:33, Steven White  wrote:

> Hi,
>
> For reasons I have no control over, I'm required to run 2 (maybe more)
> instances of Solr on the same server (Windows and Linux).  To be more
> specific, I will need to start each instance like so:
>
>   > solr\bin start -p 8983 -s ..\instance_one
>   > solr\bin start -p 8984 -s ..\instance_two
>   > solr\bin start -p 8985 -s ..\instance_three
>
> Each of those instances is a stand alone Solr (no ZK here at all).
>
> I have tested this over and over and did not see any issue.  However, I did
> notice that each instance is writing to the same solr\server\logs\ files
> (will this be an issue?!!)
>
> Is the above something I should avoid?  If so, why?
>
> Thanks in advanced !!
>
> Steve
>


Re: Anyone using IBM J9 JVM with 32G max heap? Tuning recommendations?

2015-10-19 Thread Pushkar Raste
Do you have GC logging turned on? If yes, can you provide an excerpt from the
GC log for a pause that took > 30 sec?
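
If it is not on yet, IBM J9 can write verbose GC output to its own file with
something along these lines (the path is just a placeholder):

  -Xverbosegclog:/var/log/solr/verbosegc.log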

On 19 October 2015 at 04:16, Jeff Wu  wrote:

> Hi all,
>
> we are using solr4.7 on top of IBM JVM J9 Java7, max heap to 32G, system
> RAM 64G.
>
> JVM parameters: -Xgcpolicy:balanced -verbose:gc -Xms12228m -Xmx32768m
> -XX:PermSize=128m -XX:MaxPermSize=512m
>
> We faced one issue here: we set zkClient timeout value to 30 seconds. By
> using the balanced GC policy, we sometimes occurred a global GC pause
> >30seconds, therefore the solr server disconnected with ZK, and /update
> requests on this solr was disabled after zk disconnect. We have to restart
> this solr server to recover.
>
> By staying with IBM JVM, anyone has recommendations on this ? The general
> average heap usage in our solr server is around 26G so we'd like to stay
> with 32G max heap, but want to better tune the JVM to have less global gc
> pause.
>


Re: slow queries

2015-10-14 Thread Pushkar Raste
Consider:
1. Turning on docValues for the fields you are sorting and faceting on (see the
sketch after this list). This will require you to reindex your data.
2. Using a TrieInt type for the field you are doing range searches on (you may
have to fiddle with precisionStep to balance index size vs. performance).
3. If the slowness is intermittent, turning on GC logging to see if there are
any long pauses, and tuning your GC strategy accordingly.
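
For (1), a rough sketch of what I mean in schema.xml (I picked the sort field
from your query; the type name and attributes are only an example, so adjust to
your schema and reindex afterwards):

  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
  <field name="view_counter_i" type="tint" indexed="true" stored="true" docValues="true"/>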

-- Pushkar Raste

On Wed, Oct 14, 2015 at 5:03 AM, Lorenzo Fundaró <
lorenzo.fund...@dawandamail.com> wrote:

> Hello,
>
> I have following conf for filters and commits :
>
> Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57,
> acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8,
> regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd)
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
> </autoSoftCommit>
>
> and the following stats for filters:
>
> lookups = 3602
> hits  =  3148
> hit ratio = 0.87
> inserts = 455
> evictions = 400
> size = 63
> warmupTime = 770
>
> *Problem: *a lot of slow queries, for example:
>
> {q=*:*=1.0=edismax=standard
> =map==pk_i,​score=0=view_counter_i
> desc={!cost=1 cache=true}type_s:Product AND is_valid_b:true={!cost=50
> cache=true}in_languages_t:de={!cost=99
> cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND
> (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378
>
> I could increase the size of the filter so I would decrease the amount of
> evictions, but it seems to me this would not be solving the root problem.
>
> Some ideas on where/how to start for optimisation ? Is it actually normal
> that this query takes this time ?
>
> We have an index of ~14 million docs. 4 replicas with two cores and 1 shard
> each.
>
> thank you.
>
>
> --
>
> --
> Lorenzo Fundaro
> Backend Engineer
> E-Mail: lorenzo.fund...@dawandamail.com
>
> Fax   + 49 - (0)30 - 25 76 08 52
> Tel+ 49 - (0)179 - 51 10 982
>
> DaWanda GmbH
> Windscheidstraße 18
> 10627 Berlin
>
> Geschäftsführer: Claudia Helming, Michael Pütz
> Amtsgericht Charlottenburg HRB 104695 B
>


Re: slow queries

2015-10-14 Thread Pushkar Raste
You may want to start Solr with the following settings to enable logging of GC
details. Here are some flags you might want to enable:

-Xloggc:<gc_log_dir>/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintHeapAtGC

Once you have the GC logs, look for the string "Total time for which
application threads were stopped" to check whether you have long pauses (you
can get long pauses even with young generation GC).

-- Pushkar Raste

On Wed, Oct 14, 2015 at 11:47 AM, Lorenzo Fundaró <
lorenzo.fund...@dawandamail.com> wrote:

> <<Can you add debug=true to the query?>>
>
> "debug": { "rawquerystring": "*:*", "querystring": "*:*", "parsedquery":
> "(+MatchAllDocsQuery(*:*))/no_coord", "parsedquery_toString": "+*:*", "
> explain": { "Product:47047358": "\n1.0 = (MATCH) MatchAllDocsQuery, product
> of:\n 1.0 = queryNorm\n", "Product:3223": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:30852121":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:35018929": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:31682082": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:31077677": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:22298365":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:41094514": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:13106166": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:19142249": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38243373":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:20434065": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:25194801": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:885482": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:45356790":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:67719831": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:12843394": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:38126213": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38798130":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:30292169": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:11535854": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:8443674": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:51012182":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:75780871": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:20227881": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:38093629": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:3142218":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:15295602": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:3375982": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:38276777": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:10726118":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:50827742": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:5771722": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:3245678": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:13702130":
> "\n1.0
> = (MATCH) MatchAllDo

Issue while adding Long.MAX_VALUE to a TrieLong field

2015-09-10 Thread Pushkar Raste
Hi,
I am trying to add the following document (the value for price.long is
Long.MAX_VALUE):

  
<add>
  <doc>
    <field name="id">411</field>
    <field name="name">one</field>
    <field name="price.long">9223372036854775807</field>
  </doc>
</add>

However, upon querying my collection, the value I get back for "price.long" is
9223372036854776000.

The definitions for the 'price.long' field and the 'long' type look like the following:




My test case shows that the max value Solr can store without losing precision
is 18014398509481982. This is equivalent to '(Long.MAX_VALUE >> 9) - 1'
(not really sure if this computation really means something).


Can someone help me understand why a TrieLong field can't accept values >
18014398509481982?


Re: Issue while adding Long.MAX_VALUE to a TrieLong field

2015-09-10 Thread Pushkar Raste
Thank you Yonik, looks like I missed your previous reply. This seems logical,
as the max safe integer in JavaScript is (2^53 - 1), which is the max value I
can insert and validate through the Admin UI. Never thought the Admin UI itself
would trick me though.
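
For the record, querying outside the Admin UI (e.g. with curl and wt=xml, so no
JavaScript is involved) is the way to double check the stored value. A quick
sketch (the core name, port, and id value are placeholders, adjust to your
setup):

  curl 'http://localhost:8983/solr/collection1/select?q=id:411&fl=price.long&wt=xml'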


On Thu, Sep 10, 2015 at 6:01 PM, Yonik Seeley <ysee...@gmail.com> wrote:

> On Thu, Sep 10, 2015 at 5:43 PM, Pushkar Raste <pushkar.ra...@gmail.com>
> wrote:
>
> Did you see my previous response to you today?
> http://markmail.org/message/wt6db4ocqmty5a42
>
> Try querying a different way, like from the command line using curl,
> or from your browser, but not through the solr admin.
>
> [...]
> > My test case shows that MAX Value Solr can store without losing precision
> > is  18014398509481982. This is equivalent to '2 ^53 - 1'  (Not really
> sure
> > if this computation really means something).
>
> 53 happens to be the effective number of mantissa bits in a 64 bit
> double precision floating point ;-)
>
> -Yonik
>


Issue while adding Long.MAX_VALUE to a TrieLong field

2015-09-10 Thread Pushkar Raste
I am trying the following add document (the value for price.long is Long.MAX_VALUE):

  
<add>
  <doc>
    <field name="id">411</field>
    <field name="name">one</field>
    <field name="price.long">9223372036854775807</field>
  </doc>
</add>

However, upon querying my collection, the value I get back for "price.long"
is 9223372036854776000.
(I got the same behavior when I used a JSON file.)

The definitions for the 'price.long' field and the 'long' type look like the following:




My test case shows that the max value Solr can store without losing precision
is 18014398509481982. This is equivalent to '2^53 - 1' (not really sure
if this computation really means something).

I wrote a test using SolrTestHarness and it successfully saved the value
9223372036854775807 to Solr.

Can someone help me understand why a TrieLong field can't accept values >
18014398509481982 when I try to use an XML/JSON file to add the document?