On 10/23/07, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
>
> > ???  maxFieldLength only applies to the number of tokens indexed.  You
> > will always get the complete field back if it's stored, regardless of
> > what maxFieldLength is.
>
> What I meant was that there is a difference between sending a field that
> already contains all the tokens and using copyField to copy all the
> content into a field. CopyField doesn't just copy the contents to the
> field but seems to somehow link them there.

copyField simply creates an additional value for the target field; the
result is the same as if you had sent that value in yourself.
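A rough way to picture this: copyField appends each source value to the destination field as one more value, so the document handed to the indexer is the same one you would get by copying the values on the client. This is a minimal Python model, not Solr code. It assumes a document is a dict of field name to a list of values, and `apply_copy_fields` plus the field names are hypothetical.

```python
from typing import Dict, List, Tuple

# A document is modeled as field name -> list of values (multivalued).
Doc = Dict[str, List[str]]


def apply_copy_fields(doc: Doc, rules: List[Tuple[str, str]]) -> Doc:
    """Hypothetical model of copyField: for each (source, dest) rule,
    append every value of `source` to `dest` as an additional value.
    Sources are read from the incoming document, so copies do not chain."""
    out = {name: list(values) for name, values in doc.items()}
    for source, dest in rules:
        for value in doc.get(source, []):
            out.setdefault(dest, []).append(value)
    return out


doc = {"title": ["Solr copyField"], "body": ["this is a test"]}
rules = [("title", "fulltext"), ("body", "fulltext")]

# Copying done "server side" by the rules...
server_side = apply_copy_fields(doc, rules)

# ...versus the client building the fulltext field itself.
client_side = {
    "title": ["Solr copyField"],
    "body": ["this is a test"],
    "fulltext": ["Solr copyField", "this is a test"],
}

print(server_side == client_side)  # True
```

Either way the indexer sees identical field values.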

> So if my maxFieldLength is, for example, set to 100 and I use copyField
> for 101 other fields, will the 101st get truncated?

copyField and maxFieldLength have nothing to do with each other.

maxFieldLength limits the total number of *tokens* indexed across all
values of a given field name in a single document.

So if you had

  field1: this is a test

and a maxFieldLength of 3, then the "test" token would be dropped.

If you had

  field1: this is
  field1: a test

and a maxFieldLength of 3, then the "test" token would still be dropped.
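To make the counting concrete: the limit works as a single per-document token budget for a field, shared in order by all of that field's values. This is a small Python simulation, not Lucene code. It assumes plain whitespace tokenization (real analyzers differ), and `index_tokens` is a hypothetical helper.

```python
from typing import List


def index_tokens(values: List[str], max_field_length: int) -> List[str]:
    """Simulate maxFieldLength: tokens from all values of one field in one
    document count against a single budget; tokens past it are dropped."""
    kept: List[str] = []
    for value in values:
        for token in value.split():  # assumed whitespace tokenization
            if len(kept) >= max_field_length:
                return kept
            kept.append(token)
    return kept


print(index_tokens(["this is a test"], 3))    # ['this', 'is', 'a']
print(index_tokens(["this is", "a test"], 3))  # ['this', 'is', 'a']
```

Splitting the text into two values does not reset the count, which is why "test" is dropped in both cases.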


> >> Is there a performance penalty for using copyFields when indexing?
> >
> > copyFields are done as a discrete step before indexing... almost no
> > cost to do that.
> > Indexing itself will have a performance impact if there are more
> > fields to index + store as a result of the copyField commands.
>
> The documents in my application have something like 400+ fields (many
> multivalued). For easy searching, the application copies all the contents
> of the 400+ fields to one field (a fulltext field), which is used as the
> default field. This field is quite large for many documents (up to
> 550,000 tokens). I was thinking about using copyField to copy the
> fields into that field instead of having the application do it
> before sending the documents to Solr.

The "indexing" cost will be identical in either case.  Since copyField
is a little more elegant (why force the user to send the data more
than once?), I'd use that.

If you don't need to search on all 400+ fields individually, don't
index them (just index your defaultfield).
And I wouldn't store your defaultfield since it's redundant info.

-Yonik
