Re: 7.3 pull replica with 7.2 tlog leader

2018-05-06 Thread Will Currie
Thanks. Done. https://issues.apache.org/jira/browse/SOLR-12321

On Mon, May 7, 2018 at 12:56 PM, Mark Miller  wrote:

> Yeah, the project should never use built in serialization. I'd file a JIRA
> issue. We should remove this when we can.
>
> - Mark
>
> On Sun, May 6, 2018 at 9:39 PM Will Currie  wrote:
>
> > Premise: During an upgrade I should be able to run a 7.3 pull replica
> > against a 7.2 tlog leader. Or vice versa.
> >
> > Maybe I'm totally wrong in assuming that!
> >
> > Assuming that's correct it looks like adding a new method[1] to
> > SolrResponse has broken binary compatibility. When I try to register a
> > new pull replica using the admin api[2] I get an HTTP 500 response. I see
> > this error logged: java.io.InvalidClassException:
> > org.apache.solr.client.solrj.SolrResponse; local class incompatible:
> > stream classdesc serialVersionUID = 3945300637328478755, local class
> > serialVersionUID = -793110010336024264
> >
> > The replica actually seems to register ok; it just can't read the response
> > because the bytes from the 7.2 leader include a different
> > serialVersionUID.
> >
> > Should SolrResponse include a serialVersionUID? All subclasses too.
> >
> > It looks like stock java serialization is only used for these admin
> > responses. Query responses use JavaBinCodec instead.
> >
> > Full(ish) stack trace:
> >
> > ERROR HttpSolrCall null:org.apache.solr.common.SolrException:
> > java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse;
> > local class incompatible: stream classdesc serialVersionUID =
> > 3945300637328478755, local class serialVersionUID = -7931100103360242645
> >     at org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:73)
> >     at org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue(CollectionsHandler.java:348)
> >     at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:256)
> >     at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:230)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
> >     at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
> >     at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
> >     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
> >
> > [1]
> > https://github.com/apache/lucene-solr/commit/5ce83237e804ac1130eaf5cf793955667793fee0#diff-b809fa594f93aa6805381029a188e4e2R46
> > [2]
> > http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=blah&shard=shard1&node=blah&type=pull
> >
> > Thanks,
> > Will
> >
> --
> - Mark
> about.me/markrmiller
>


Re: Determine Solr Core Creation Timestamp

2018-05-06 Thread Shawn Heisey

On 5/6/2018 3:09 PM, Atita Arora wrote:

> I am working on developing a utility which lets one monitor the
> indexPipeline status.
> The indexing job runs in one of two forms. It either:
> 1. Creates a new core, OR
> 2. Runs the delta on an existing core.
> Put simply, I look at the DB timestamp recorded when the indexing job was
> triggered, and I would like to read some stat / metric from Solr
> (preferably via an API) which reports a timestamp for when the CORE was
> created / modified.
> My utility relies entirely on the difference between the DB and Solr
> timestamps, which it uses to determine the health of the pipeline.
>
> I see the Master Version Timestamp under each shard, which details the
> version / Gen / Size.
> Is that what I should be using? How can I grab these from the API?
> I tried using the metrics API:
> http://localhost:8983/solr/admin/metrics?group=core&prefix=CORE
> which reports CORE.startTime, but this timestamp changes whenever data is
> being added to any core on this node.
> Is there any other way to determine the core creation timestamp?


The startTime value is the time at which Solr started the core.  If that 
is getting updated frequently, then a reload operation is probably 
happening on the core.  Or, less likely, the Solr instance has been 
restarted.  I have checked a 6.6 system and on a core that is getting 
updates as frequently as once a minute, startTime is a couple of days 
ago, which was the last time that core was reloaded.
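
One way to act on this is to sample startTime periodically and treat a change as a sign of a reload or restart rather than of new data. The sketch below is only an illustration: the class `StartTimeWatcher` and its method are hypothetical names, and it assumes the caller has already fetched CORE.startTime from the metrics API and parsed it into an Instant.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
class StartTimeWatcher {
    // Last startTime sample seen for each core name.
    private final Map<String, Instant> lastSeen = new HashMap<>();

    // Records the sample and returns true when it differs from the previous
    // sample for the same core. The first sample for a core returns false.
    boolean changed(String coreName, Instant startTime) {
        Instant previous = lastSeen.put(coreName, startTime);
        return previous != null && !previous.equals(startTime);
    }
}
```

If `changed` returns true for a core that only received updates, a reload is probably being triggered somewhere in the indexing job.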


I've been trying to figure out whether a Lucene index keeps track of the 
time it was created, but I haven't found anything yet.  If it doesn't, I 
do wonder whether there might be some kind of metadata that Solr could 
write to the index to record information like this.  Solr would always 
have the option of writing such metadata to an entirely different 
location within the instanceDir.  The index creation time is probably 
not the only information that would be useful to have available.


Thanks,
Shawn



Re: 7.3 pull replica with 7.2 tlog leader

2018-05-06 Thread Mark Miller
Yeah, the project should never use built in serialization. I'd file a JIRA
issue. We should remove this when we can.

- Mark

On Sun, May 6, 2018 at 9:39 PM Will Currie  wrote:

> Premise: During an upgrade I should be able to run a 7.3 pull replica
> against a 7.2 tlog leader. Or vice versa.
>
> Maybe I'm totally wrong in assuming that!
>
> Assuming that's correct it looks like adding a new method[1] to
> SolrResponse has broken binary compatibility. When I try to register a new
> pull replica using the admin api[2] I get an HTTP 500 response. I see this
> error logged: java.io.InvalidClassException:
> org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream
> classdesc serialVersionUID = 3945300637328478755, local class
> serialVersionUID = -793110010336024264
>
> The replica actually seems to register ok; it just can't read the response
> because the bytes from the 7.2 leader include a different serialVersionUID.
>
> Should SolrResponse include a serialVersionUID? All subclasses too.
>
> It looks like stock java serialization is only used for these admin
> responses. Query responses use JavaBinCodec instead.
>
> Full(ish) stack trace:
>
> ERROR HttpSolrCall null:org.apache.solr.common.SolrException:
> java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse;
> local class incompatible: stream classdesc serialVersionUID =
> 3945300637328478755, local class serialVersionUID = -7931100103360242645
>     at org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:73)
>     at org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue(CollectionsHandler.java:348)
>     at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:256)
>     at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:230)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
>     at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
>     at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
>
> [1]
>
> https://github.com/apache/lucene-solr/commit/5ce83237e804ac1130eaf5cf793955667793fee0#diff-b809fa594f93aa6805381029a188e4e2R46
> [2]
>
> http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=blah&shard=shard1&node=blah&type=pull
>
> Thanks,
> Will
>
-- 
- Mark
about.me/markrmiller


7.3 pull replica with 7.2 tlog leader

2018-05-06 Thread Will Currie
Premise: During an upgrade I should be able to run a 7.3 pull replica
against a 7.2 tlog leader. Or vice versa.

Maybe I'm totally wrong in assuming that!

Assuming that's correct it looks like adding a new method[1] to
SolrResponse has broken binary compatibility. When I try to register a new
pull replica using the admin api[2] I get an HTTP 500 response. I see this
error logged: java.io.InvalidClassException:
org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream
classdesc serialVersionUID = 3945300637328478755, local class
serialVersionUID = -793110010336024264

The replica actually seems to register ok; it just can't read the response
because the bytes from the 7.2 leader include a different serialVersionUID.

Should SolrResponse include a serialVersionUID? All subclasses too.

It looks like stock java serialization is only used for these admin
responses. Query responses use JavaBinCodec instead.
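
To make the question concrete, here is a minimal sketch of what pinning the UID looks like. `PinnedResponse` and `SerialVersionDemo` are hypothetical stand-ins, not Solr's SolrResponse, and the value 1L is arbitrary. Without an explicit field the JVM computes a default UID from the class's structure, including its non-private methods, which is why adding a method can change it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical stand-in for a serializable admin response class.
class PinnedResponse implements Serializable {
    // Explicit UID (arbitrary value): adding methods later does not change it.
    private static final long serialVersionUID = 1L;

    final String status;

    PinnedResponse(String status) {
        this.status = status;
    }
}

class SerialVersionDemo {
    // The UID that stock Java serialization writes to, and expects from, a stream.
    static long declaredUid() {
        return ObjectStreamClass.lookup(PinnedResponse.class).getSerialVersionUID();
    }

    // Serializes and deserializes with stock Java serialization.
    static PinnedResponse roundTrip(PinnedResponse in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PinnedResponse) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declaredUid());
        System.out.println(roundTrip(new PinnedResponse("ok")).status);
    }
}
```

A pinned value only helps between versions that both declare it; it would not by itself make a class readable by nodes that already expect a different computed UID.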

Full(ish) stack trace:

ERROR HttpSolrCall null:org.apache.solr.common.SolrException:
java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse;
local class incompatible: stream classdesc serialVersionUID =
3945300637328478755, local class serialVersionUID = -7931100103360242645
    at org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:73)
    at org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue(CollectionsHandler.java:348)
    at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:256)
    at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:230)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
    at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
    at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)

[1]
https://github.com/apache/lucene-solr/commit/5ce83237e804ac1130eaf5cf793955667793fee0#diff-b809fa594f93aa6805381029a188e4e2R46
[2]
http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=blah&shard=shard1&node=blah&type=pull

Thanks,
Will


RE: Re:the number of docs in each group depends on rows

2018-05-06 Thread Ian Caldwell
When I looked at this in Solr 5.5.3, the second phase of the query was only sent
to the shards that returned documents in the first phase. The problem is that
one shard may contain matching documents in a group that are ranked outside the
top N results.

Fatduo, this solution won't help you unless you are looking at changing some
Solr code, but it supports Diego's point that maybe this could be fixed (as a
starting point to look at, since the code may have changed in 7.0).

We changed the grouping code to search all shards in the second phase. (I think
that this was all that was needed, but we also changed grouping to be two-level,
so there are lots of changes in the grouping code.)
In the 5.5.3 code base we changed the method constructRequest(ResponseBuilder
rb) in TopGroupsShardRequestFactory to always call createRequestForAllShards(rb).
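
The effect described above can be shown with a toy model. This is a sketch under simplifying assumptions, not Solr's grouping code: each shard is just an ordered map from group value to matching doc count, phase one reports the shard's first `rows` groups, and the coordinator's merge step is skipped. All names here (`TwoPhaseGroupingModel`, `countedDocs`, `askAllShards`) are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;

// Toy model only; none of these names exist in Solr.
class TwoPhaseGroupingModel {
    // Builds one shard as group -> matching doc count, in that shard's rank order.
    static LinkedHashMap<String, Integer> shard(Object... groupThenCount) {
        LinkedHashMap<String, Integer> groups = new LinkedHashMap<>();
        for (int i = 0; i < groupThenCount.length; i += 2) {
            groups.put((String) groupThenCount[i], (Integer) groupThenCount[i + 1]);
        }
        return groups;
    }

    // Two shards: group A ranks first on shard 1 but only third on shard 2.
    static List<LinkedHashMap<String, Integer>> sampleShards() {
        return Arrays.asList(
            shard("A", 3, "B", 2),
            shard("B", 4, "C", 1, "A", 5));
    }

    // Docs counted for a group when phase two only asks shards that reported
    // the group among their top `rows` groups, unless askAllShards is set.
    static int countedDocs(List<LinkedHashMap<String, Integer>> shards, String group,
                           int rows, boolean askAllShards) {
        int total = 0;
        for (LinkedHashMap<String, Integer> s : shards) {
            boolean reportedInPhaseOne =
                s.keySet().stream().limit(rows).anyMatch(group::equals);
            if (askAllShards || reportedInPhaseOne) {
                total += s.getOrDefault(group, 0);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<LinkedHashMap<String, Integer>> shards = sampleShards();
        System.out.println(countedDocs(shards, "A", 1, false)); // 3: shard 2 is never asked
        System.out.println(countedDocs(shards, "A", 3, false)); // 8: both shards report A
        System.out.println(countedDocs(shards, "A", 1, true));  // 8: all shards are asked
    }
}
```

With rows=1 the five docs on the second shard are missed. Raising rows or asking every shard in the second phase recovers the full count of 8.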


Ian
NLA

-Original Message-
From: Diego Ceccarelli (BLOOMBERG/ LONDON)  
Sent: Friday, 4 May 2018 9:37 PM
To: solr-user@lucene.apache.org
Subject: Re:the number of docs in each group depends on rows

Hello, 

I'm not 100% sure, but I think that if you have multiple shards the number of
docs matched in each group is *not* guaranteed to be exact. Increasing the rows
will increase the amount of partial information that each shard sends to the
federator and make the number more precise.

For exact counts you might need one shard, OR to make sure that all the
documents in the same group are in the same shard by using document routing via
composite keys [1].
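
As a rough illustration of the composite-key idea: with the compositeId router, a document id of the form prefix!id is routed by the prefix, so documents sharing a prefix land on the same shard. The sketch below assumes the collection uses that router and that the grouping field's value is suitable as a prefix; the class and method names are hypothetical.

```java
// Hypothetical helper for building ids in the "prefix!id" form.
class CompositeIds {
    // Prefixes the document id with the group value so routing hashes on it.
    static String routedId(String groupValue, String docId) {
        if (groupValue.isEmpty() || groupValue.contains("!")) {
            throw new IllegalArgumentException("group value must be non-empty and contain no '!'");
        }
        return groupValue + "!" + docId;
    }

    // Returns the routing prefix of an id, or null when the id has none.
    static String routeKey(String id) {
        int separator = id.indexOf('!');
        return separator < 0 ? null : id.substring(0, separator);
    }

    public static void main(String[] args) {
        System.out.println(routedId("groupA", "doc42")); // groupA!doc42
        System.out.println(routeKey("groupA!doc42"));    // groupA
    }
}
```

The trade-off is that large groups concentrate on one shard, so shard sizes can become uneven.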

Thinking about that, it should be possible to fix grouping to compute the exact 
numbers on request...

cheers,
Diego


[1] 
https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html#shards-and-indexing-data-in-solrcloud


From: solr-user@lucene.apache.org At: 05/04/18 07:53:41
To: solr-user@lucene.apache.org
Subject: the number of docs in each group depends on rows

Hi,
We used Solr Cloud 7.1.0 (3 nodes, 3 shards with 2 replicas). When we ran a group
query, we found that the number of docs in each group depends on the rows
number (the number of groups).

difference:
 

When rows is bigger than 5, the returned docs are correct and stable; otherwise,
the number of docs is smaller than the actual result.

Could you please explain why, and suggest how to choose the rows number?


--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Presentation/Demo using latest Solr feature (SortableTextField)

2018-05-06 Thread Alexandre Rafalovitch
To those who noticed SortableTextField in Solr 7.3 and are curious
whether it is useful, it seems to be.

I've just finished a presentation at a Montreal Solr meetup on rapid
schema evolution. I based it on the simplest schema/solrconfig that used
SortableTextField to make the initial data ingestion super-simple.

The presentation is at:
https://www.slideshare.net/arafalov/rapid-solr-schema-development-phone-directory
The git repo to match it, including all the schema changes (as
separate commits) and the actual dataset is at:
https://github.com/arafalov/solr-presentation-2018-may

Any comments/suggestions would be appreciated.

Hope this is useful,
Alex.


Determine Solr Core Creation Timestamp

2018-05-06 Thread Atita Arora
Hi,

I am working on developing a utility which lets one monitor the
indexPipeline status.
The indexing job runs in one of two forms. It either:
1. Creates a new core, OR
2. Runs the delta on an existing core.
Put simply, I look at the DB timestamp recorded when the indexing job was
triggered, and I would like to read some stat / metric from Solr
(preferably via an API) which reports a timestamp for when the CORE was
created / modified.
My utility relies entirely on the difference between the DB and Solr
timestamps, which it uses to determine the health of the pipeline.
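
A minimal sketch of the comparison, to show the idea: it assumes both timestamps are available as ISO-8601 UTC strings and that "healthy" means the Solr timestamp is not earlier than the DB trigger time and is within a chosen window. The class `PipelineLag`, its methods, and the health rule are illustrative assumptions, not an existing API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; the timestamp format and health rule are assumptions.
class PipelineLag {
    // Positive when the Solr-side timestamp is later than the DB trigger time.
    static Duration lag(String dbTriggeredAt, String solrTimestamp) {
        return Duration.between(Instant.parse(dbTriggeredAt), Instant.parse(solrTimestamp));
    }

    // Healthy when Solr changed at or after the trigger and within maxLag of it.
    static boolean healthy(String dbTriggeredAt, String solrTimestamp, Duration maxLag) {
        Duration d = lag(dbTriggeredAt, solrTimestamp);
        return !d.isNegative() && d.compareTo(maxLag) <= 0;
    }

    public static void main(String[] args) {
        String db = "2018-05-06T10:00:00Z";
        String solr = "2018-05-06T10:05:00Z";
        System.out.println(lag(db, solr).toMinutes());                  // 5
        System.out.println(healthy(db, solr, Duration.ofMinutes(10)));  // true
    }
}
```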

I see the Master Version Timestamp under each shard, which details the
version / Gen / Size.
Is that what I should be using? How can I grab these from the API?
I tried using the metrics API:
http://localhost:8983/solr/admin/metrics?group=core&prefix=CORE
which reports CORE.startTime, but this timestamp changes whenever data is
being added to any core on this node.
Is there any other way to determine the core creation timestamp?

Please help !

Thanks,
Atita