On 10/23/07, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
>
> > ??? maxFieldLength only applies to the number of tokens indexed. You
> > will always get the complete field back if it's stored, regardless of
> > what maxFieldLength is.
>
> What I meant was, that it is different from just having a field with all
> the tokens compared to using copyField to copy all the content to a
> field. CopyField doesn't just copy the contents to the field but seems
> to somehow link them there.

copyField simply creates an additional value for the target field... it
would end up the same as if you had sent it in yourself.

> So if my maxFieldLength is for example set to 100 and I use copyField
> for 101 other fields, will the 101th get truncated?

copyField and maxFieldLength have nothing to do with each other.
maxFieldLength limits the total number of *tokens* indexed across all
values of a field with a given name in a given document.

So if you had
  field1: this is a test
and a maxFieldLength of 3, then the "test" token would be dropped.

If you had
  field1: this is
  field1: a test
and a maxFieldLength of 3, then the "test" token would still be dropped.

> >> Is there a performance penalty for using copyFields when indexing?
> >
> > copyFields are done as a discrete step before indexing... almost no
> > cost to do that.
> > Indexing itself will have a performance impact if there are more
> > fields to index + store as a result of the copyField commands.
>
> The documents in my application have something like 400+ fields (many
> multivalued). For easy searching the application copies all the contents
> of the 400+ fields to one field (fulltext field) which is used as
> defaultfield. This field is quite large for many documents (it gets as
> long as 550000 tokens). I was thinking about using copyField for copying
> the fields onto that field instead of having the application do it
> before sending it to Solr.

The "indexing" cost will be identical in either case.
Since copyField is a little more elegant (why force the user to send
the data more than once?), I'd use that.

If you don't need to search on all 400+ fields individually, don't
index them (just index your defaultfield). And I wouldn't store your
defaultfield, since it's redundant info.

-Yonik
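
To make the two rules above concrete, here is a rough Python model of them. It is only a sketch, not Solr or Lucene code: the function names `apply_copy_fields` and `kept_tokens` are made up for illustration, a document is assumed to be a plain list of (name, value) pairs, and whitespace splitting stands in for whatever analyzer the field really uses.

```python
def apply_copy_fields(doc, copy_rules):
    """Model of copyField: append one extra (dest, value) pair for every
    value of a matching source field. Nothing is linked; the target just
    gains additional values, as if the client had sent them itself."""
    copied = [(dest, value)
              for name, value in doc
              for source, dest in copy_rules
              if name == source]
    return doc + copied


def kept_tokens(doc, max_field_length):
    """Model of maxFieldLength: keep at most max_field_length tokens per
    field name, counted across all values of that name in the document."""
    kept = {}
    for name, value in doc:
        tokens = kept.setdefault(name, [])
        for token in value.split():  # assumption: whitespace tokenizer
            if len(tokens) < max_field_length:
                tokens.append(token)
    return kept


if __name__ == "__main__":
    single = [("field1", "this is a test")]
    split = [("field1", "this is"), ("field1", "a test")]
    print(kept_tokens(single, 3))
    print(kept_tokens(split, 3))

    doc = [("title", "this is"), ("body", "a test")]
    rules = [("title", "text"), ("body", "text")]  # hypothetical field names
    print(kept_tokens(apply_copy_fields(doc, rules), 3))
```

In this model the single-valued and the two-valued `field1` give the same result, with "test" dropped in both, and the copy target `text` is capped exactly like a field the client filled in by hand.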