Re: Consequences for using multivalued on all fields

Erick Erickson Wed, 22 Dec 2010 10:58:05 -0800

positionIncrementGap for multiValued fields is, perhaps, the most interesting
difference. One of the drivers here is indexing text that contains some
boundary you don't want phrase or proximity ("near") clauses to match across.
For instance, say you have text made up of sentences, and your requirement is
that phrases don't match across sentence boundaries. One way to handle that is
to add successive sentences to a multivalued field and define that field with
a large increment gap.
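
To make the idea concrete, here is a rough Python sketch of how a large gap
keeps a phrase from matching across values. It is a simplified model, not
Lucene's actual code: index_positions and phrase_matches are made-up helper
names, tokenizing is just lowercasing and splitting on whitespace, and slop is
treated as the total number of extra positions allowed between the terms.

```python
def index_positions(values, gap):
    """Assign a position to every token of a (possibly multi-valued) field.

    Simplified model: positions increase by 1 per token, and `gap` extra
    positions are skipped between one value and the next.
    """
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # the increment gap between successive values
        for token in value.lower().split():
            pos += 1
            positions.append((token, pos))
    return positions


def phrase_matches(positions, phrase, slop=0):
    """True if the phrase terms occur in order with at most `slop`
    extra positions in total between neighbouring terms."""
    terms = phrase.lower().split()
    by_term = {}
    for token, pos in positions:
        by_term.setdefault(token, []).append(pos)

    def extend(idx, prev, budget):
        if idx == len(terms):
            return True
        for pos in by_term.get(terms[idx], []):
            extra = pos - prev - 1
            if 0 <= extra <= budget and extend(idx + 1, pos, budget - extra):
                return True
        return False

    return any(extend(1, start, slop) for start in by_term.get(terms[0], []))


# One long string versus one value per sentence with a large gap.
one_string = index_positions(["the quick fox jumps over dogs"], gap=100)
per_sentence = index_positions(["the quick fox", "jumps over dogs"], gap=100)

print(phrase_matches(one_string, "fox jumps"))             # True
print(phrase_matches(per_sentence, "fox jumps"))           # False
print(phrase_matches(per_sentence, "fox jumps", slop=10))  # False
print(phrase_matches(per_sentence, "over dogs"))           # True
```

In this model the gap only has to be larger than any slop you expect to use in
phrase or proximity queries; phrases inside a single sentence still match as
usual.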


But otherwise, as far as I know, there's no difference worth mentioning
between indexing a bunch of stuff as one long string and breaking it up into
multiple segments in a multivalued field with the increment gap set to 1,
except for edge cases like the sorting thing Geert-Jan mentions.

Best
Erick

On Tue, Dec 21, 2010 at 12:49 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:

> Thank you for the input. You might have seen my posts about doing a flexible
> schema for derived objects. Sounds like dynamic fields might be the ticket.
>
> We'll be ready to test the idea in about a month, maybe 3 weeks. I'll post a
> comment about it when it gets there.
>
> I don't know if I would gain anything, but I think that ALL booleans that
> are NOT in the base object but are in the derived objects could be put into
> one field as textually positioned key:value pairs, at least for search
> purposes.
>
>
> Since the derived object would have its own, additional methods, one of
> those methods could be to 'unserialize' the 'boolean column'. In fact, that
> could be a base object function. Empty boolean column values just end up not
> populating any extra base object attributes.
>
>  Dennis Gearon
>
>
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>
>
> ----- Original Message ----
> From: kenf_nc <ken.fos...@realestate.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, December 21, 2010 6:07:51 AM
> Subject: Re: Consequences for using multivalued on all fields
>
>
> I have about 30 million documents and with the exception of the Unique ID,
> Type and a couple of date fields, every document is made of dynamic fields.
> Now, I only have maybe 1 in 5 being multi-value, but search and facet
> performance doesn't look appreciably different from a fixed schema solution.
> I don't do some of the fancier things: highlighting, spell check, etc. And I
> use a lot more string or lowercase field types than I do Text (so not as
> many fully tokenized fields); that probably helps with performance.
>
> The only disadvantage I know of is dealing with field names at runtime.
> Depending on your architecture, you don't really know what your document
> looks like until you have it in a result set. For what I'm doing, that isn't
> a problem.
> --
> View this message in context:
>
> http://lucene.472066.n3.nabble.com/Consequences-for-using-multivalued-on-all-fields-tp2125867p2126120.html
>
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
