Thanks Erick. What about the Solr defect SOLR-7495 that Nick mentioned? It sounds like, because of this defect, I should NOT set docValues="true" on a field when a) type="int" and b) multiValued="true". Can you confirm that I got this right? I'm on Solr 5.2.1.
Steve

On Fri, May 27, 2016 at 1:30 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> bq: my index size grew by 20%. Is this expected
>
> Yes. But don't worry about it ;). Basically, you've serialized
> to disk the "uninverted" form of the field. But that is
> accessed through Lucene by MMapDirectory, see:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> If you don't use DocValues, the uninverted version
> is built in Java's memory, which is much more expensive
> for a variety of reasons. What you lose in disk size you gain
> in a lower JVM footprint, fewer GC problems etc.
>
> But the implication is, indeed, that you should use DocValues
> for fields you intend to facet and/or sort etc. on. If you only search
> on a field, it's just wasted space.
>
> Best,
> Erick
>
> On Fri, May 27, 2016 at 6:25 AM, Steven White <swhite4...@gmail.com> wrote:
>
> > Thank you Erick for pointing out DocValues. I re-indexed my data
> > with it set to true and my index size grew by 20%. Is this expected?
> >
> > Hi Nick, I'm not clear about SOLR-7495. Are you saying I should not use
> > docValues="true" if type="int" and multiValued="true"? I'm on Solr 5.2.1.
> > Thanks.
> >
> > Steve
> >
> > On Thu, May 26, 2016 at 9:29 PM, Nick D <ndrake0...@gmail.com> wrote:
> >
> >> Although you did mention that you won't need to sort and you are using
> >> multiValued=true. On the off chance you do change something like
> >> multiValued=false docValues=false, then this will come into play:
> >>
> >> https://issues.apache.org/jira/browse/SOLR-7495
> >>
> >> This has been a rather large pain to deal with in terms of faceting (the
> >> Lucene change that caused a number of issues is also referenced in this
> >> JIRA).
> >>
> >> Nick
> >>
> >> On Thu, May 26, 2016 at 11:45 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >> > I always prefer ints to strings. They can't help but take
> >> > up less memory, comparing two ints is much faster than
> >> > comparing two strings, etc., although Lucene can play some tricks
> >> > to make that less noticeable.
> >> >
> >> > That said, if these are just a few values, it'll be hard to
> >> > actually measure the perf difference.
> >> >
> >> > And if it's a _lot_ of unique values, you have other problems
> >> > than the int/string distinction. Faceting on very high
> >> > cardinality fields is something that can have performance
> >> > implications.
> >> >
> >> > But I'd certainly add docValues="true" to the definition no matter
> >> > which you decide on.
> >> >
> >> > Best,
> >> > Erick
> >> >
> >> > On Wed, May 25, 2016 at 9:29 AM, Steven White <swhite4...@gmail.com> wrote:
> >> >
> >> > > Hi everyone,
> >> > >
> >> > > I will be faceting on data of type integer and I'm wondering whether there
> >> > > is any difference in how I design my schema. I have no need to sort or use
> >> > > range facets. Given this, in terms of Lucene performance and index size,
> >> > > does it make any difference if I use:
> >> > >
> >> > > #1: <field name="FACET_ID" type="string" multiValued="true" indexed="true"
> >> > > required="true" stored="false"/>
> >> > >
> >> > > Or
> >> > >
> >> > > #2: <field name="FACET_ID" type="int" multiValued="true" indexed="true"
> >> > > required="true" stored="false"/>
> >> > >
> >> > > (notice how I changed the "type" from "string" to "int" in #2)
> >> > >
> >> > > Thanks in advance.
> >> > >
> >> > > Steve