RE: COMMERCIAL: RE: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

2014-06-09 Thread Ronald Matamoros
Hi Chris,

Created ticket https://issues.apache.org/jira/browse/SOLR-6154
I attached data.xml and a PDF with instructions on how to replicate the 
issue to the ticket.

Sending different updates to different ports was just how the Confluence 
tutorial laid out the steps; it does not affect the result of the test.

As soon as I have more information I will post it to the ticket.
I appreciate the interest; let me know if you have any suggestions or feedback.

Thank you
Ronald Matamoros


-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: 06 June 2014 22:00
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: RE: SolrCloud: facet range option 
f.field.facet.mincount=1 omits buckets on response



Ronald: I'm having a little trouble understanding the steps to reproduce that 
you are describing -- in particular Step 1 f ii because i'm not really sure i 
understand what exactly you are putting in mem2.xml

Also: Since you don't appear to be using implicit routing, i'm not clear on why 
you are explicitly sending different updates to different ports in Step 1 f i 
-- does that affect the results of your test?


If you can reliably reproduce using modified data from the example, could you 
please open a Jira issue outlining these steps and attach the modified data to 
index directly to that issue?  (FWIW: If it doesn't matter what port you use to 
send which documents, then you should be able to create a single unified 
data.xml file containing all the docs to index in a single command)
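
A minimal sketch of building such a unified file, assuming each input is a Solr XML update message with an <add> root element containing <doc> children (as in the example files mem.xml and monitor2.xml named in this thread). The function name merge_add_documents is hypothetical and not part of Solr.

```python
import xml.etree.ElementTree as ET


def merge_add_documents(xml_sources):
    """Combine several Solr <add> update messages into one <add> message.

    xml_sources is an iterable of XML strings, each with an <add> root.
    Returns the merged message as a string.
    """
    merged = ET.Element("add")
    for source in xml_sources:
        root = ET.fromstring(source)
        if root.tag != "add":
            raise ValueError("expected an <add> root element, got <%s>" % root.tag)
        # Move every <doc> of this message under the shared <add> root.
        for doc in root.findall("doc"):
            merged.append(doc)
    return ET.tostring(merged, encoding="unicode")
```

The result could be written to data.xml and posted to a single node in one update request, which removes the question of which port received which documents.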



: Date: Thu, 29 May 2014 18:06:38 +
: From: Ronald Matamoros rmatamo...@searchtechnologies.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org solr-user@lucene.apache.org
: Subject: RE: SolrCloud: facet range option f.field.facet.mincount=1 omits
: buckets on response
: 
: Hi all,
: 
: At the moment I am reviewing the code to determine if this is a legitimate 
: bug that needs to be set as a JIRA ticket.
: Any insight or recommendation is appreciated.
: 
: Including the replication steps as text:
: 
: -
: Solr versions where issue was replicated.
:   * 4.5.1 (Linux)
:   * 4.8.1 (Windows + Cygwin)
: 
: Replicating
: 
:   1. Created two-shard environment - no replication 
:      https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
: 
:      a. Downloaded Solr distribution from http://lucene.apache.org/solr/downloads.html 
:      b. Unzipped solr-4.8.1.zip to a temporary location: SOLR_DIST_HOME 
:      c. Ran once so the SolrCloud jars get unpacked: java -jar start.jar
:      d. Created nodes
:         i. cd SOLR_DIST_HOME
:         ii. Via Windows Explorer copied example to node1
:         iii. Via Windows Explorer copied example to node2
: 
:      e. Started nodes 
:         i. Start node 1
: 
:            cd node1
:            java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
: 
:         ii. Start node 2
: 
:            cd node2
:            java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
: 
:      f. Fed sample documents
:         i. Out of the box
: 
:            curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" -d @mem.xml
:            curl "http://localhost:7574/solr/update?commit=true" -H "Content-Type: text/xml" -d @monitor2.xml
: 
:         ii. Created a copy of mem.xml as mem2.xml; modified identifiers, 
:             names, prices and fed it
: 
:            curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" -d @mem2.xml
: 
:            <add>
:              <doc>
:                <field name="id">COMPANY1</field>
:                <field name="name">COMPANY1 Device</field>
:                <field name="manu">COMPANY1 Device Mfg</field>
:                .
:                <field name="price">190</field>
:                .
:              </doc>
:              <doc>
:                <field name="id">COMPANY2</field>
:                <field name="name">COMPANY2 flatscreen</field>
:                <field name="manu">COMPANY2 Device Mfg.</field>
:                .
:                <field name="price">200.00</field>
:                .
:              </doc>
:              <doc>
:                <field name="id">COMPANY3</field>
:                <field name="name">COMPANY3 Laptop</field>
:                <field name="manu">COMPANY3 Device Mfg.</field>
:                .
:                <field name="price">800.00</field>
:                .
:              </doc>
: 
:            </add>
: 
:   2. Query **without** f.price.facet.mincount=1, counts and buckets are OK
: 
:      http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include
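
A minimal sketch of assembling the two query variants (with and without the per-field mincount) so they differ in exactly one parameter. The parameter names are the ones visible in the URL above; the f.price.facet.range.include value is cut off in the archived message, so it is left out here. The helper name facet_range_url is hypothetical.

```python
from urllib.parse import urlencode


def facet_range_url(base, field, start, end, gap, mincount=None):
    """Build a Solr select URL with a range facet on `field`.

    When mincount is given, the per-field option
    f.<field>.facet.mincount is appended; otherwise it is omitted.
    """
    params = [
        ("q", "*:*"),
        ("fl", "id," + field),
        ("sort", "id asc"),
        ("facet", "true"),
        ("facet.range", field),
        ("f.%s.facet.range.start" % field, start),
        ("f.%s.facet.range.end" % field, end),
        ("f.%s.facet.range.gap" % field, gap),
        ("f.%s.facet.range.other" % field, "all"),
    ]
    if mincount is not None:
        params.append(("f.%s.facet.mincount" % field, mincount))
    # urlencode percent-escapes reserved characters and joins with '&'.
    return base + "?" + urlencode(params)


base = "http://localhost:8983/solr/collection1/select"
without_mincount = facet_range_url(base, "price", 0, 1000, 50)
with_mincount = facet_range_url(base, "price", 0, 1000, 50, mincount=1)
```

Generating both URLs from one function makes it easier to confirm that any difference in the returned buckets comes from the mincount option alone.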

Re: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

2014-05-30 Thread Ronald Matamoros
Hi Shawn,

Thanks very much for the feedback.

Have tested using the routing mechanism/composite-id on a larger scale.
Unfortunately the same behaviour.

Regards
Ronald


-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: 29 May 2014 20:16
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: Re: SolrCloud: facet range option 
f.field.facet.mincount=1 omits buckets on response

On 5/29/2014 12:06 PM, Ronald Matamoros wrote:
> Hi all,
>
> At the moment I am reviewing the code to determine if this is a legitimate 
> bug that needs to be set as a JIRA ticket.
> Any insight or recommendation is appreciated.

snip

>   Note: the value in <int name="between"> changes with every other 
> refresh of the query. 

Whenever distributed search results change from one query to the next, it's 
almost always caused by having documents with the same uniqueKey in more than 
one shard.  Solr is able to remove these duplicates from the results, but there 
are other aspects of distributed searching that cannot be dealt with when there 
are duplicate documents.  This leads to problems like numFound changing from 
one request to the next.

To avoid these problems with SolrCloud, you'll likely want to create a new 
collection and set its router to compositeId.  This ensures that indexed 
documents are distributed to shards according to the hash of their uniqueKey, 
not imported directly into the node where you made the update request.

It's possible that my guess here is completely wrong, but this is usually the 
problem.

Thanks,
Shawn
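
A minimal sketch of checking for the situation described above, assuming the uniqueKey values held by each shard have already been collected (for example by querying each core separately with distrib=false). The shard names and the helper name find_duplicate_ids are hypothetical.

```python
from collections import defaultdict


def find_duplicate_ids(ids_by_shard):
    """Return {doc_id: [shards...]} for ids present in more than one shard.

    ids_by_shard maps a shard name to an iterable of uniqueKey values
    found in that shard.
    """
    shards_for_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_for_id[doc_id].add(shard)
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_for_id.items()
        if len(shards) > 1
    }
```

An empty result would suggest that duplicate uniqueKeys across shards are not the explanation for counts that change between requests.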



RE: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

2014-05-29 Thread Ronald Matamoros
      <int name="before">0</int>
      <int name="after">0</int>
      <int name="between">4</int>
    </lst>
  </lst>

 Refresh of the Query (may need to do this multiple times with F5 key on 
browser)

  <lst name="facet_ranges">
    <lst name="price">
      <lst name="counts">
        <int name="150.0">3</int>
        <int name="250.0">1</int>
      </lst>
      <float name="gap">50.0</float>
      <float name="start">0.0</float>
      <float name="end">1000.0</float>
      <int name="before">0</int>
      <int name="after">0</int>
      <int name="between">2</int>
    </lst>
  </lst>

Thank you,
Ronald Matamoros



SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

2014-05-27 Thread Ronald Matamoros
Good afternoon,

Is the f.field.facet.mincount option supported on a distributed search?
Under SolrCloud I am finding that some buckets are omitted when using the 
option f.field.facet.mincount=1.

The Solr logs do not indicate any error or warning during execution.
The debug=true option and increasing the log level for the FacetComponent do 
not provide any hints about the behaviour.

Replicated the issue on both Solr 4.5.1 & 4.8.1.
Attached a PDF that provides additional details and steps to replicate the 
behaviour using the out of the box Solr distribution.

Any insight or recommendation to tackle this situation is much appreciated.

Example, 

  Removing the f.field.facet.mincount=1 option gives the expected list of 
buckets for the 6 documents matched.

  <lst name="facet_ranges">
    <lst name="price">
      <lst name="counts">
        <int name="0.0">0</int>
        <int name="50.0">1</int>
        <int name="100.0">0</int>
        <int name="150.0">3</int>
        <int name="200.0">0</int>
        <int name="250.0">1</int>
        <int name="300.0">0</int>
        <int name="350.0">0</int>
        <int name="400.0">0</int>
        <int name="450.0">0</int>
        <int name="500.0">0</int>
        <int name="550.0">0</int>
        <int name="600.0">0</int>
        <int name="650.0">0</int>
        <int name="700.0">0</int>
        <int name="750.0">1</int>
        <int name="800.0">0</int>
        <int name="850.0">0</int>
        <int name="900.0">0</int>
        <int name="950.0">0</int>
      </lst>
      <float name="gap">50.0</float>
      <float name="start">0.0</float>
      <float name="end">1000.0</float>
      <int name="before">0</int>
      <int name="after">0</int>
      <int name="between">2</int>
    </lst>
  </lst>

  Using the f.field.facet.mincount=1 option removes the 0 count buckets 
but will also omit bucket <int name="250.0">

  <lst name="facet_ranges">
    <lst name="price">
      <lst name="counts">
        <int name="50.0">1</int>
        <int name="150.0">3</int>
        <int name="750.0">1</int>
      </lst>
      <float name="gap">50.0</float>
      <float name="start">0.0</float>
      <float name="end">1000.0</float>
      <int name="before">0</int>
      <int name="after">0</int>
      <int name="between">4</int>
    </lst>
  </lst>

 Refreshing the query using the browser's F5 option renders a different 
bucket list 
 (you may need to refresh multiple times)

  <lst name="facet_ranges">
    <lst name="price">
      <lst name="counts">
        <int name="150.0">3</int>
        <int name="250.0">1</int>
      </lst>
      <float name="gap">50.0</float>
      <float name="start">0.0</float>
      <float name="end">1000.0</float>
      <int name="before">0</int>
      <int name="after">0</int>
      <int name="between">2</int>
    </lst>
  </lst>

Regards 
Ronald Matamoros
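
A minimal sketch of checking a response for omitted buckets automatically rather than by eye, assuming the facet_ranges element has been saved as XML text from a response without mincount and from one with it. The function names range_counts and omitted_buckets are hypothetical; the element layout follows the responses shown above.

```python
import xml.etree.ElementTree as ET


def range_counts(facet_ranges_xml, field):
    """Parse <lst name="facet_ranges"> XML into {bucket_name: count}."""
    root = ET.fromstring(facet_ranges_xml)
    counts = root.find("lst[@name='%s']/lst[@name='counts']" % field)
    if counts is None:
        raise ValueError("no range facet counts found for field %r" % field)
    return {node.get("name"): int(node.text) for node in counts.findall("int")}


def omitted_buckets(full_counts, filtered_counts, mincount=1):
    """List buckets that meet mincount in the full response
    but are absent from the mincount-filtered response."""
    expected = [name for name, count in full_counts.items() if count >= mincount]
    return [name for name in expected if name not in filtered_counts]
```

Applied to the first two responses in this message, the check would report bucket 250.0 as omitted, which matches the behaviour described.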