Thanks Erick.

What about Solr defect SOLR-7495 that Nick mentioned?  It sounds like
because of this defect, I should NOT set docValues="true" on a field when:
a) type="int" and b) multiValued="true".  Can you confirm that I got this
right?  I'm on Solr 5.2.1
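
For concreteness, the definition I'm asking about would look roughly like
this: option #2 from my original message below, with docValues="true"
added. It is only a sketch of the schema.xml entry in question, not
something I've confirmed is safe on 5.2.1:

```xml
<!-- Option #2 from the original message, plus docValues="true" (the attribute in question) -->
<field name="FACET_ID" type="int" multiValued="true" indexed="true"
       required="true" stored="false" docValues="true"/>
```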

Steve


On Fri, May 27, 2016 at 1:30 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> bq: my index size grew by 20%.  Is this expected
>
> Yes. But don't worry about it ;). Basically, you've serialized
> to disk the "uninverted" form of the field. But, that is
> accessed through Lucene by MMapDirectory, see:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> If you don't use DocValues, the uninverted version
> is built in Java's memory, which is much more expensive
> for a variety of reasons. What you lose in disk size you gain
> in a lower JVM footprint, fewer GC problems etc.
>
> But the implication is, indeed, that you should use DocValues
> for fields you intend to facet and/or sort on, etc. If you only search,
> it's just wasted space.
>
> Best,
> Erick
>
> On Fri, May 27, 2016 at 6:25 AM, Steven White <swhite4...@gmail.com>
> wrote:
> > Thank you Erick for pointing out about DocValues.  I re-indexed my data
> > with it set to true and my index size grew by 20%.  Is this expected?
> >
> > Hi Nick, I'm not clear about SOLR-7495.  Are you saying I should not use
> > docValues=true if: type="int" and multiValued="true"?  I'm on Solr 5.2.1.
> > Thanks.
> >
> > Steve
> >
> > On Thu, May 26, 2016 at 9:29 PM, Nick D <ndrake0...@gmail.com> wrote:
> >
> >> Although you did mention that you won't need to sort and you are using
> >> multiValued=true. On the off chance you do change something like
> >> multiValued=false docValues=false then this will come into play:
> >>
> >> https://issues.apache.org/jira/browse/SOLR-7495
> >>
> >> This has been a rather large pain to deal with in terms of faceting. (The
> >> Lucene change that caused a number of issues is also referenced in this
> >> Jira.)
> >>
> >> Nick
> >>
> >>
> >> On Thu, May 26, 2016 at 11:45 AM, Erick Erickson <erickerick...@gmail.com>
> >> wrote:
> >>
> >> > I always prefer ints to strings: they can't help but take
> >> > up less memory, comparing two ints is much faster than
> >> > comparing two strings, etc. Although Lucene can play some
> >> > tricks to make that less noticeable.
> >> >
> >> > Although if these are just a few values, it'll be hard to
> >> > actually measure the perf difference.
> >> >
> >> > And if it's a _lot_ of unique values, you have other problems
> >> > than the int/string distinction. Faceting on very high
> >> > cardinality fields is something that can have performance
> >> > implications.
> >> >
> >> > But I'd certainly add docValues="true" to the definition no matter
> >> > which you decide on.
> >> >
> >> > Best,
> >> > Erick
> >> >
> >> > On Wed, May 25, 2016 at 9:29 AM, Steven White <swhite4...@gmail.com>
> >> > wrote:
> >> > > Hi everyone,
> >> > >
> >> > > I will be faceting on data of type integer and I'm wondering if there
> >> > > is any difference in how I design my schema.  I have no need to sort
> >> > > or use range facets.  Given this, in terms of Lucene performance and
> >> > > index size, does it make any difference if I use:
> >> > >
> >> > > #1: <field name="FACET_ID" type="string" multiValued="true" indexed="true"
> >> > > required="true" stored="false"/>
> >> > >
> >> > > Or
> >> > >
> >> > > #2: <field name="FACET_ID" type="int" multiValued="true" indexed="true"
> >> > > required="true" stored="false"/>
> >> > >
> >> > > (notice how I changed the "type" from "string" to "int" in #2)
> >> > >
> >> > > Thanks in advance.
> >> > >
> >> > > Steve
> >> >
> >>
>
