Facet Results Strange - Help

2007-04-27 Thread realw5

Hello,
I'm running into some strange results for some facets of mine. Below you'll
see the XML returned from Solr. I did a query using the standard request
handler. Notice the duplicated values returned (american standard, delta,
etc). There are actually quite a few of them. At first I thought it might be
because of case sensitivity, but I lowercase everything going to Solr.

Hopefully someone can chime in with some tips, thanks!

Dan

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">4</int>
</lst>
<result name="response" numFound="2328" start="0"/>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="manufacturer_facet">
      <int name="kohler">1560</int>
      <int name="american standard">197</int>
      <int name="toto">181</int>
      <int name="bemis">83</int>
      <int name="porcher">56</int>
      <int name="ginger">45</int>
      <int name="elements of design">40</int>
      <int name="brasstech">18</int>
      <int name="st thomas">18</int>
      <int name="hansgrohe">15</int>
      <int name="sterling">14</int>
      <int name="whitehaus">13</int>
      <int name="delta">12</int>
      <int name="jacuzzi">10</int>
      <int name="cifial">8</int>
      <int name="kwc">8</int>
      <int name="herbeau">7</int>
      <int name="jado">7</int>
      <int name="elizabethan classics">6</int>
      <int name="showhouse by moen">5</int>
      <int name="grohe">4</int>
      <int name="creative specialties">3</int>
      <int name="latoscana">3</int>
      <int name="american standard">2</int>
      <int name="danze">2</int>
      <int name="ronbow">2</int>
      <int name="belle foret">1</int>
      <int name="dornbracht">1</int>
      <int name="kohler">1</int>
      <int name="myson">1</int>
      <int name="newport brass">1</int>
      <int name="price pfister">1</int>
      <int name="quayside publishing">1</int>
      <int name="st. thomas">1</int>
      <int name="adagio">0</int>
      <int name="alno">0</int>
      <int name="alsons">0</int>
      <int name="bates and bates">0</int>
      <int name="blanco">0</int>
      <int name="cec">0</int>
      <int name="cole and co">0</int>
      <int name="competitive">0</int>
      <int name="corstone">0</int>
      <int name="creative specialties">0</int>
      <int name="danze">0</int>
      <int name="decolav">0</int>
      <int name="dolan designs">0</int>
      <int name="doralfe">0</int>
      <int name="dornbracht">0</int>
      <int name="dreamline">0</int>
      <int name="elkay">0</int>
      <int name="fontaine">0</int>
      <int name="franke">0</int>
      <int name="grohe">0</int>
      <int name="hamat">0</int>
      <int name="hydrosystems">0</int>
      <int name="improvement direct">0</int>
      <int name="insinkerator">0</int>
      <int name="kenroy international">0</int>
      <int name="kichler">0</int>
      <int name="kindred">0</int>
      <int name="maxim">0</int>
      <int name="mico">0</int>
      <int name="moen">0</int>
      <int name="moen">0</int>
      <int name="mr sauna">0</int>
      <int name="mr steam">0</int>
      <int name="neo elements">0</int>
      <int name="newport brass">0</int>
      <int name="ondine">0</int>
      <int name="pegasus">0</int>
      <int name="price pfister">0</int>
      <int name="progress lighting">0</int>
      <int name="pulse">0</int>
      <int name="quoizel">0</int>
      <int name="robern">0</int>
      <int name="rohl">0</int>
      <int name="sagehill designs">0</int>
      <int name="sea gull lighting">0</int>
      <int name="show house">0</int>
      <int name="sloan">0</int>
      <int name="st%2e thomas">0</int>
      <int name="st%2e thomas creations">0</int>
      <int name="steamist">0</int>
      <int name="swanstone">0</int>
      <int name="thomas lighting">0</int>
      <int name="warmatowel">0</int>
      <int name="waste king">0</int>
      <int name="waterstone">0</int>
    </lst>
  </lst>
</lst>
</response>
-- 
View this message in context: 
http://www.nabble.com/Facet-Results-Strange---Help-tf3658597.html#a1084
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Facet Results Strange - Help

2007-04-27 Thread realw5

I have a dynamic field setup for facets. It looks like this:

<dynamicField name="*_facet" type="string" indexed="true" stored="false"
              multiValued="true" />

I do this, because we add facets quite often, so having to modify the schema
every time would be unfeasible.

I'm currently reindexing from scratch, so I cannot try wt=python for a little
bit longer. Once it's done indexing I'll give that a go and see if I notice
anything.

Dan


Yonik Seeley wrote:
> 
> On 4/27/07, realw5 [EMAIL PROTECTED] wrote:
>> Hello,
>> I'm running into some strange results for some facets of mine. Below
>> you'll see the XML returned from Solr. I did a query using the standard
>> request handler. Notice the duplicated values returned (american standard,
>> delta, etc). There are actually quite a few of them. At first I thought it
>> might be because of case sensitivity, but I lowercase everything going to
>> Solr.
>>
>> Hopefully someone can chime in with some tips, thanks!
> 
> What's the field definition for manufacturer_facet in your schema?  Is
> it multi-valued or not?
> 
> Also, can you try the python response format (wt=python) as it outputs
> only ASCII and escapes everything else... there is an off chance the
> strings look the same but aren't.
> 
> -Yonik
> 
 

-- 
View this message in context: 
http://www.nabble.com/Facet-Results-Strange---Help-tf3658597.html#a10226359



Re: Facet Results Strange - Help

2007-04-27 Thread Yonik Seeley

On 4/27/07, realw5 [EMAIL PROTECTED] wrote:

> I have a dynamic field setup for facets. It looks like this:
>
> <dynamicField name="*_facet" type="string" indexed="true" stored="false"
>               multiValued="true" />
>
> I do this, because we add facets quite often, so having to modify the schema
> every time would be unfeasible.
>
> I'm currently reindexing from scratch, so I cannot try wt=python for a little
> bit longer. Once it's done indexing I'll give that a go and see if I notice
> anything.


If it's really the same field value repeated, you've hit a bug.
If so, it would be helpful if you could open a JIRA bug, and anything
you can do to help us reproduce the problem would be appreciated.

-Yonik


Re: Facet Results Strange - Help

2007-04-27 Thread realw5

OK, I just finished indexing about 20k documents. I took a look, and so far
the problem has not appeared again. What I think caused it was that I was
not adding overwritePending and overwriteCommitted in the add process.
Therefore, over time as data was being cleaned up, it was just appending to
the existing data.

I did have one case of repeated values, but after looking at the python
writer output, I noticed a space at the end. I can fix this issue by trimming
all my values before sending them to Solr :-)
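
Client-side trimming of this kind could look roughly like the sketch below. It is only an illustration: the helper names, the "id" field, and the sample document id are hypothetical, and it assumes documents are posted to Solr as XML <add> messages. Values are trimmed and lowercased, and repeats within one document are dropped before the fields are written.

```python
import xml.etree.ElementTree as ET


def clean_facet_values(values):
    """Trim and lowercase values, dropping blanks and repeats while keeping order."""
    seen = set()
    cleaned = []
    for value in values:
        normalized = value.strip().lower()
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


def build_add_xml(doc_id, manufacturers):
    """Build an <add> message with one manufacturer_facet field per cleaned value."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # "id" is an assumed unique key field name, not taken from the thread.
    ET.SubElement(doc, "field", name="id").text = doc_id
    for value in clean_facet_values(manufacturers):
        ET.SubElement(doc, "field", name="manufacturer_facet").text = value
    return ET.tostring(add, encoding="unicode")


# "American Standard " (trailing space) and "american standard" collapse to one field.
print(build_add_xml("sku-1", ["American Standard ", "american standard"]))
```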

I'm going to continue indexing, and if the problem pops up once fully
indexed I'll post back again. Otherwise thanks for the quick replies!

Dan


-- 
View this message in context: 
http://www.nabble.com/Facet-Results-Strange---Help-tf3658597.html#a10226731



Re: Facet Results Strange - Help

2007-04-27 Thread Yonik Seeley

On 4/27/07, realw5 [EMAIL PROTECTED] wrote:

> OK, I just finished indexing about 20k documents. I took a look, and so far
> the problem has not appeared again. What I think caused it was that I was
> not adding overwritePending and overwriteCommitted in the add process.
> Therefore, over time as data was being cleaned up, it was just appending to
> the existing data.


That is the default anyway.  Even if duplicate documents were somehow
added, that should not cause duplicates in facet results.  It should
be impossible to get duplicate values from facet.field, regardless of
what the index looks like.


> I did have one case of repeated values, but after looking at the python
> writer output, I noticed a space at the end. I can fix this issue by trimming
> all my values before sending them to Solr :-)


Hopefully you should have also seen the space in the XML response...
if it's not there, that would be a bug.
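
One way to check whether labels that look identical really are identical is to group them by a normalized form and print each variant in escaped form, much as wt=python does. The following is a rough sketch with made-up sample labels and a hypothetical helper name, not part of Solr itself:

```python
from collections import defaultdict


def find_lookalikes(labels):
    """Group labels by their trimmed, lowercased form and keep groups with several distinct variants."""
    groups = defaultdict(set)
    for label in labels:
        groups[label.strip().lower()].add(label)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Sample labels are invented for illustration; the second one has a trailing space.
labels = ["american standard", "american standard ", "kohler", "kohler", "toto"]
for key, variants in find_lookalikes(labels).items():
    # ascii() escapes non-ASCII characters and makes trailing whitespace visible.
    print(key, "->", [ascii(v) for v in variants])
```

Labels that are byte-for-byte identical (like the two "kohler" entries here) are not reported by this check; if such a value still shows up twice in a facet.field result, that would point to the bug case described above rather than to whitespace.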

-Yonik


Re: Facet Results Strange - Help

2007-04-27 Thread Chris Hostetter

: It's likely you have the facet category added more than once for one
: or more docs. Like this;
:
: <field name="manufacturer_facet">american standard</field>
: <field name="manufacturer_facet">american standard</field>
:
: Are you adding the facet values on-the-fly? This happened to me and I
: solved it by removing the duplicate facet fields.

that's really odd ... i can't think of any way that exactly duplicate
field values would be counted twice in the current facet.field code.

I just tested this using the exampledocs by adding electronics to the
cat field of some docs multiple times, and i couldn't reproduce this
behavior.

can you elaborate more on how to trigger it?


-Hoss



Re: Facet Results Strange - Help

2007-04-27 Thread Chris Hostetter

: writer output, I noticed a space at the end. I can fix this issue by trimming
: all my values before sending them to Solr :-)

The built-in field faceting works on the indexed values, so Solr can solve
this for you if you use something like this for your facet field type...

   <fieldType name="facetString" class="solr.TextField" omitNorms="true">
     <analyzer>
       <!-- KeywordTokenizer does no actual tokenizing, so the entire
            input string is preserved as a single token
         -->
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <!-- The LowerCase TokenFilter does what you expect, which can be
            useful when you want your sorting to be case insensitive
         -->
       <filter class="solr.LowerCaseFilterFactory" />
       <!-- The TrimFilter removes any leading or trailing whitespace -->
       <filter class="solr.TrimFilterFactory" />
     </analyzer>
   </fieldType>
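
To see why indexing through such a field type merges the look-alike labels, the effect of the three analysis steps can be imitated in a few lines of Python. This is only a stand-in for illustration, with invented sample values; it is not how Solr implements its analyzers.

```python
from collections import Counter


def analyze(value):
    """Imitate the chain: keep the whole string as one token, lowercase it, then trim it."""
    token = value          # keyword tokenizer: the entire input is a single token
    token = token.lower()  # lowercase filter
    return token.strip()   # trim filter: drop leading and trailing whitespace


# Raw values as they might arrive from the client (invented for illustration).
raw_values = ["American Standard", "american standard ", "Kohler", " kohler"]

# Faceting counts indexed terms, so variants that analyze to the same term share one count.
counts = Counter(analyze(v) for v in raw_values)
for term, count in counts.items():
    print(repr(term), count)
```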



-Hoss