Re: SolrCloud query results order master vs replica

2014-02-05 Thread M. Flatterie
Thank you, sir, for that confirmation!
Nic


On Wed, 2/5/14, Chris Hostetter  wrote:

 Subject: Re: SolrCloud query results order master vs replica
 To: solr-user@lucene.apache.org
 Received: Wednesday, February 5, 2014, 11:33 AM
 
 
 : Just to make sure I interpret the results correctly:
 : - they all have a score of 1.7046129
 : - the order they are presented in is therefore not related to the score;
 : it is just the order in which the data is internally stored (like an SQL
 : SELECT statement without ORDER BY clause)
 
 The order they are presented *is* related to the score -- but since the
 scores are all identical, and no secondary sort is specified, the behavior
 is undefined -- and can vary depending on the replica used.
 
 :   - If I want to force a sort operation, I should add a sort parameter
 : in the query.  The first sort will be done by score and then documents
 : with the same score will be sorted by my sort=?? parameter?
 :   - or will my sort parameter overwrite the score sorting?
 
 if you specify a sort param, it should be the full sort you want -- it
 won't be "appended" to the default score sort ... so if, for example, you
 wanted to sort by score, with a secondary fallback sort by your "id"
 field, use something like...
 
     sort=score desc, id asc
 
 
 
 -Hoss
 http://www.lucidworks.com/
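Hoss's point that the sort param is the complete sort spec (it replaces, rather than extends, the default score sort) can be illustrated with a small Java sketch that builds a select URL. The base URL (localhost:8983, collection1) is a placeholder assumption, the query prod_doc:tylenol is taken from the explain output in this thread, and "id" is assumed to be the unique key field.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SortParamDemo {
    // Hypothetical base URL: substitute your own host, port, context and collection.
    static final String SELECT_URL = "http://localhost:8983/solr/collection1/select";

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // The sort value must list every criterion: score first, then the
    // unique key as a deterministic tie-breaker. Passing only "id asc"
    // would drop relevance ordering entirely.
    static String selectUrl(String q, String sort) {
        return SELECT_URL + "?q=" + enc(q) + "&sort=" + enc(sort);
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/collection1/select?q=prod_doc%3Atylenol&sort=score+desc%2C+id+asc
        System.out.println(selectUrl("prod_doc:tylenol", "score desc, id asc"));
    }
}
```

With a full sort spec like this, two replicas holding the same documents should return tied hits in the same order, whatever their internal storage order.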



Re: SolrCloud query results order master vs replica

2014-02-05 Thread M. Flatterie
eq=1.0), with freq of:\n  1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = 
fieldNorm(doc=15029)\n",
  "563023": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 15698) 
[DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 15698, product 
of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = 
fieldNorm(doc=15698)\n",
  "894824": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 19256) 
[DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 19256, product 
of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = 
fieldNorm(doc=19256)\n",
  "540476": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20843) 
[DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 20843, product 
of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = 
fieldNorm(doc=20843)\n",
  "671271": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 23778) 
[DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 23778, product 
of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = 
fieldNorm(doc=23778)\n",
  "527929": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 25053) 
[DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 25053, product 
of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n
6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = fieldNorm(doc=25053)\n"
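The explain entries above can be checked by hand. Under DefaultSimilarity (the similarity named in the output), each of these single-term matches scores fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The Java sketch below plugs in the values printed in the output (freq=1, docFreq=374, maxDocs=126169, fieldNorm=0.25). fieldNorm is copied from the output rather than recomputed, because Lucene stores it in a lossy one-byte encoding. Since all three factors are the same for every listed document, the scores tie exactly.

```java
public class ExplainScore {
    // DefaultSimilarity tf: square root of the term frequency in the field.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    // fieldWeight = tf * idf * fieldNorm (fieldNorm taken from the explain output).
    static double fieldWeight(double freq, long docFreq, long maxDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // Values from: tf(freq=1.0), idf(docFreq=374, maxDocs=126169), fieldNorm 0.25
        System.out.printf("idf = %.6f%n", idf(374, 126169));
        System.out.printf("fieldWeight = %.6f%n", fieldWeight(1.0, 374, 126169, 0.25));
    }
}
```

The computed values should match the explain output's 6.8184514 and 1.7046129 up to float rounding.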



Just to make sure I interpret the results correctly:
- they all have a score of 1.7046129
- the order they are presented in is therefore not related to the score; it is
just the order in which the data is internally stored (like an SQL SELECT
statement without ORDER BY clause)

Follow up question:
  - If I want to force a sort operation, I should add a sort parameter in the 
query.  The first sort will be done by score and then documents with the same 
score will be sorted by my sort=?? parameter?
  - or will my sort parameter overwrite the score sorting?

Thank you again for your help,

Nic.




On Mon, 2/3/14, Erick Erickson  wrote:

 Subject: Re: SolrCloud query results order master vs replica
 To: solr-user@lucene.apache.org
 Received: Monday, February 3, 2014, 2:19 PM
 
 This should only be happening if the scores are _exactly_ the same,
 which is actually quite rare. In that case, the tied scores are broken
 by the internal Lucene document ID, and the relative order of the docs
 on the two machines isn't guaranteed to be the same: the internal ID
 can change during segment merging, which is NOT the same on both
 machines.
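To make the tie-breaking concrete, here is a small Java simulation (plain Java, not Solr or Lucene code). Two replicas hold the same three documents with the same score. Replica A's internal doc IDs are taken from the explain output in this thread; replica B's are invented for illustration, standing in for IDs that differ after independent segment merges. Sorting by score alone, with ties left to the internal ID, gives different orders; adding the unique key as a secondary sort, as in "sort=score desc, id asc", gives the same order on both.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakDemo {
    // One hit as a given replica sees it: unique key, score, and the
    // replica-local Lucene doc ID (which segment merges can change).
    static final class Hit {
        final String id;
        final float score;
        final int luceneDocId;

        Hit(String id, float score, int luceneDocId) {
            this.id = id;
            this.score = score;
            this.luceneDocId = luceneDocId;
        }
    }

    // Doc IDs taken from the explain output in this thread.
    static final List<Hit> REPLICA_A = List.of(
            new Hit("563023", 1.7046129f, 15698),
            new Hit("894824", 1.7046129f, 19256),
            new Hit("540476", 1.7046129f, 20843));

    // Same documents and scores; these doc IDs are hypothetical.
    static final List<Hit> REPLICA_B = List.of(
            new Hit("563023", 1.7046129f, 9000),
            new Hit("894824", 1.7046129f, 4000),
            new Hit("540476", 1.7046129f, 12000));

    // No sort param: score descending, ties left to the internal doc ID.
    static final Comparator<Hit> DEFAULT_ORDER =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparingInt((Hit h) -> h.luceneDocId);

    // sort=score desc, id asc: ties broken by the unique key instead.
    static final Comparator<Hit> SCORE_THEN_ID =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing((Hit h) -> h.id);

    static List<String> order(List<Hit> hits, Comparator<Hit> cmp) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(cmp);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(order(REPLICA_A, DEFAULT_ORDER)); // [563023, 894824, 540476]
        System.out.println(order(REPLICA_B, DEFAULT_ORDER)); // [894824, 563023, 540476]
        System.out.println(order(REPLICA_A, SCORE_THEN_ID)); // [540476, 563023, 894824]
        System.out.println(order(REPLICA_B, SCORE_THEN_ID)); // [540476, 563023, 894824]
    }
}
```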
 
 But this should be relatively rare. If you're doing *:* queries or
 other such, then they aren't scored (see ConstantScoreQuery). So in
 practical terms, I suspect you're seeing some kind of test artifact.
 Try adding &debug=all to the query and you'll see how documents are
 scored.
 
 Best,
 Erick
 
 On Mon, Feb 3, 2014 at 6:57 AM, M. Flatterie  wrote:
 > Greetings,
 >
 > My setup is:
 > - SolrCloud V4.3
 > - One collection
 > - one shard
 > - 1 master, 1 replica
 >
 > so each instance contains the entire index.  The index is rather small
 > and the replica is used for robustness.  There is no need (IMHO) to
 > shard the index (yet, until the index gets bigger).
 >
 > My question:
 > - if I do a query on a product name (that is what the index is about)
 > on the master I get a certain number of results and the documents.
 > - if I do the same query on the replica, I get the same number of
 > results but the docs are in a different order.
 > - I do not specify a sort parameter in my query, simply a q=.
 > - obviously if I force a sort order, everything is ok, same results,
 > same order from both instances.
 > - am I wrong in expecting the same results, in the SAME order?
 >
 > Follow up question if the order is not guaranteed:
 > - should I force the developers to use an explicit sort order?
 > - if we force the sort, we then bypass the ranking / score order, do
 > we not?
 > - should I force all queries to go to the master and fall back on the
 > replica only in the context of a total loss of the master?
 >
 > Other useful information:
 >   - the admin page shows the same number of documents in both instances.
 >   - logs are clean, load and replication and queries worked ok.
 >   - the web application that queries SOLR round robins between the two
 > instances, so getting results in a different order is bad for
 > consistency.
 >
 > Thank you for your help!
 >
 > Nic
 >



SolrCloud query results order master vs replica

2014-02-03 Thread M. Flatterie
Greetings,

My setup is:
- SolrCloud V4.3
- One collection
- one shard
- 1 master, 1 replica

so each instance contains the entire index.  The index is rather small and the
replica is used for robustness.  There is no need (IMHO) to shard the index
(yet, until the index gets bigger).

My question:
- if I do a query on a product name (that is what the index is about) on the 
master I get a certain number of results and the documents.
- if I do the same query on the replica, I get the same number of results but 
the docs are in a different order.
- I do not specify a sort parameter in my query, simply a q=.
- obviously if I force a sort order, everything is ok, same results, same order 
from both instances.
- am I wrong in expecting the same results, in the SAME order?

Follow up question if the order is not guaranteed:
- should I force the developers to use an explicit sort order?
- if we force the sort, we then bypass the ranking / score order, do we not?
- should I force all queries to go to the master and fall back on the replica 
only in the context of a total loss of the master?

Other useful information:
  - the admin page shows the same number of documents in both instances.
  - logs are clean, load and replication and queries worked ok.
  - the web application that queries SOLR round robins between the two 
instances, so getting results in a different order is bad for consistency.

Thank you for your help!

Nic



Re: Migrating from 4.2.1 to 4.3.0

2013-05-16 Thread M. Flatterie
Great, it works; I am back on track!  Thank you!!!
Nic




 From: Shawn Heisey 
To: solr-user@lucene.apache.org 
Sent: Thursday, May 16, 2013 4:25:09 PM
Subject: Re: Migrating from 4.2.1 to 4.3.0
 

On 5/16/2013 1:40 PM, M. Flatterie wrote:
> Oops, sorry about that; since it was referring to the context, I thought it
> was the Tomcat one.
>
> Here is the /home/solradm1/solr.xml file (comments removed!)
>
> 
> 
>      host="${host:}" hostPort="8180" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
>          
>              value="/home/solradm1/WebOrder_Collection/data" />
>              value="/home/solradm1/WebOrder_Collection/ulog" />
>          
>      
> 

The hostContext attribute needs changing.  It should be this instead:

hostContext="${hostContext:/solr}"

Looks like the previous version wasn't taking this attribute from your 
config, but the new version is.  This is probably a bug that was fixed 
in 4.3.
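To see why the empty default produces exactly the ZooKeeper values reported in this thread (node_name 10.0.2.15:8180_ and base_url http://10.0.2.15:8180), here is a hypothetical Java sketch. It resolves a ${name:default} placeholder, then assembles node_name and base_url from host, port and context. The helper names (resolve, nodeName, baseUrl) are invented for illustration, and the assembly rule is inferred from the values quoted in this thread rather than copied from Solr's source.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostContextDemo {
    // Matches ${name} or ${name:default}; group 2 is null when there is no default.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    // Substitute each placeholder with the property value, else its default.
    static String resolve(String template, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = props.containsKey(name) ? props.get(name) : m.group(2);
            if (value == null) {
                throw new IllegalArgumentException("no value or default for " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String trimSlashes(String s) {
        return s.replaceAll("^/+|/+$", "");
    }

    // Assembly rule inferred from the ZooKeeper values quoted in this thread.
    static String nodeName(String host, String port, String context) {
        return host + ":" + port + "_" + trimSlashes(context);
    }

    static String baseUrl(String host, String port, String context) {
        String ctx = trimSlashes(context);
        return "http://" + host + ":" + port + (ctx.isEmpty() ? "" : "/" + ctx);
    }

    public static void main(String[] args) {
        Map<String, String> noSystemProps = Map.of(); // no -DhostContext given
        for (String attr : new String[] {"${hostContext:}", "${hostContext:/solr}"}) {
            String ctx = resolve(attr, noSystemProps);
            System.out.println(attr + " -> node_name=" + nodeName("10.0.2.15", "8180", ctx)
                    + ", base_url=" + baseUrl("10.0.2.15", "8180", ctx));
        }
    }
}
```

With the empty default, base_url loses the /solr path. That is consistent with the recovery log's 404 from http://10.0.2.15:8180, since Solr runs under the /solr context in Tomcat.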

Thanks,
Shawn

Re: Migrating from 4.2.1 to 4.3.0

2013-05-16 Thread M. Flatterie
Oops, sorry about that; since it was referring to the context, I thought it was
the Tomcat one.

Here is the /home/solradm1/solr.xml file (comments removed!)

Note: I configure solr.data.dir and solr.ulog.dir so I can run two instances on 
the same system and separate the data and ulog directories between the 
instances.

Nic.





 From: Shawn Heisey 
To: solr-user@lucene.apache.org 
Sent: Thursday, May 16, 2013 3:29:41 PM
Subject: Re: Migrating from 4.2.1 to 4.3.0
 

On 5/16/2013 12:37 PM, M. Flatterie wrote:
> Afternoon, my solr.xml file is the following (and has not changed from 4.2.1 
> to 4.3.0):
>
>          docBase="/home/tcatadm1/apache-tomcat-7.0.39/webapps/solr.war" debug="0" 
>crossContext="true">
>              value="/home/solradm1" override="true"/>
>          

That is not the solr.xml Mark is referring to.  This solr.xml configures 
Tomcat to load Solr.  You will have /home/solradm1/solr.xml as well; 
that is the one we are concerned with.

Thanks,
Shawn

Re: Migrating from 4.2.1 to 4.3.0

2013-05-16 Thread M. Flatterie
Afternoon, my solr.xml file is the following (and has not changed from 4.2.1 to 
4.3.0):





 From: Mark Miller 
To: solr-user@lucene.apache.org 
Sent: Thursday, May 16, 2013 2:28:52 PM
Subject: Re: Migrating from 4.2.1 to 4.3.0
 

Your solr webapp context appears to be "" rather than "solr". There was a JIRA 
issue in 4.3 that may have affected this, but I only saw it from a distance, so 
this is just a guess.

What does it say in solr.xml for the context (an attribute on )

- Mark

On May 16, 2013, at 2:02 PM, "M. Flatterie"  wrote:

> Greetings, I just started with Solr a couple of weeks ago, with version 4.2.1.
> 
> I installed the following setup:
> - ZooKeeper: 3 instances ensemble
> - Solr: on Tomcat, 4 instances
>     - WebOrder_Collection: instances 1 and 2, 1 shard, 1 master, 1 replica
> 
>     - other_collectionA: instances 3 and 4, 1 shard, 1 master, 1 replica
> 
>     - other_CollectionB: instances 3 and 4, 1 shard, 1 master, 1 replica
> 
> 
> With version 4.2.1 everything works fine.  But I do have a problem if I query 
> instance 3 for something in the WebOrder_Collection.  I found that this is a 
> bug in 4.2.1: I must query instances 1 or 2 to get results from 
> WebOrder_Collection.
> 
> 
> Now that I have upgraded to 4.3.0 I have the following problem.  My replicas 
> will not recover.  The recovery will retry, and retry, ... forever.
> 
> Details.  If I look at the Zoo, I see that:
>      - node_name
>             10.0.2.15:8180_solr        in solr 4.2.1
>             10.0.2.15:8180_             in solr 4.3.0
>      - base_url
>            http://10.0.2.15:8180/solr      in solr 4.2.1
> 
>            http://10.0.2.15:8180            in solr 4.3.0
> 
> My solr logs show this:
> 
> 8869687 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy  – 
> Error while trying to recover. 
> core=WebOrder_Collection:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>  Server at http://10.0.2.15:8180 returned non ok status:404, message:Not Found
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
>     at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
>     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
>     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
> 
> 
> I have not been able to find more info than that.  The Solr cloud diagram 
> shows instance1 as active and leader, instance 2 as recovering.  My 
> solrconfig.xml files are identical, except for the LUCENE_42 or LUCENE_43 tag.
> 
> 
> Any idea?  I hope that it is a configuration issue on my part...
> 
> Thank you for any help, Nic.

Migrating from 4.2.1 to 4.3.0

2013-05-16 Thread M. Flatterie
Greetings, I just started with Solr a couple of weeks ago, with version 4.2.1.

I installed the following setup:
- ZooKeeper: 3 instances ensemble
- Solr: on Tomcat, 4 instances
    - WebOrder_Collection: instances 1 and 2, 1 shard, 1 master, 1 replica

    - other_collectionA: instances 3 and 4, 1 shard, 1 master, 1 replica

    - other_CollectionB: instances 3 and 4, 1 shard, 1 master, 1 replica


With version 4.2.1 everything works fine.  But I do have a problem if I query 
instance 3 for something in the WebOrder_Collection.  I found that this is a 
bug in 4.2.1: I must query instances 1 or 2 to get results from 
WebOrder_Collection.


Now that I have upgraded to 4.3.0 I have the following problem.  My replicas 
will not recover.  The recovery will retry, and retry, ... forever.

Details.  If I look at the Zoo, I see that:
 - node_name
            10.0.2.15:8180_solr          in solr 4.2.1
            10.0.2.15:8180_              in solr 4.3.0
 - base_url
            http://10.0.2.15:8180/solr   in solr 4.2.1
            http://10.0.2.15:8180        in solr 4.3.0

My solr logs show this:

8869687 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy  – Error 
while trying to recover. 
core=WebOrder_Collection:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 Server at http://10.0.2.15:8180 returned non ok status:404, message:Not Found
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)


I have not been able to find more info than that.  The Solr cloud diagram shows 
instance1 as active and leader, instance 2 as recovering.  My solrconfig.xml 
files are identical, except for the LUCENE_42 or LUCENE_43 tag.


Any idea?  I hope that it is a configuration issue on my part...

Thank you for any help, Nic.