It's an interesting question. To start with: copyField copies the raw source content, so none of the source field's analysis applies — there is only the target field's. So that approach is not suitable here.
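To make that concrete, here is a minimal schema.xml sketch (the field and type names are hypothetical) showing that the analyzer lives on the copyField *target* type; the source fields' analyzers never run on the copied content:

```
<!-- Hypothetical sketch: analysis is defined by the target field's type only. -->
<field name="title"   type="text_general" indexed="true" stored="true"/>
<field name="author"  type="text_general" indexed="true" stored="true"/>
<field name="suggest" type="text_suggest" indexed="true" stored="false" multiValued="true"/>

<!-- copyField copies the raw stored value; the source's analysis is irrelevant here -->
<copyField source="title"  dest="suggest"/>
<copyField source="author" dest="suggest"/>

<fieldType name="text_suggest" class="solr.TextField">
  <analyzer>
    <!-- this single analyzer applies to everything copied into "suggest" -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```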
Regarding the lookups/auto-complete: a bunch of different implementations have been added recently, but they are not really documented. Things like BlendedInfixSuggester are a bit hard to discover at the moment, so there might be something there if one digs a lot.

The other option is to do the tokenization in the UpdateRequestProcessor chain. You could clone a field and do some processing so that, by the time the content hits Solr, it is already pre-tokenized into a multi-valued field. Then you could have a KeywordTokenizer on your collector field and a separate URP sub-chain for each original field that feeds into it.

One related hack would be to create a subclass of FieldMutatingUpdateProcessorFactory that wraps an arbitrary tokenizer and spits out the tokens as multi-valued output.

This is a bit hazy, even in my own mind, but hopefully it gives you something new to think about.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Fri, Apr 11, 2014 at 8:05 AM, Michael Sokolov
<msoko...@safaribooksonline.com> wrote:
> The lack of response to this question makes me think that either there
> is no good answer, or maybe the question was too obtuse. So I'll give
> it one more go with some more detail ...
>
> My main goal is to implement autocompletion with a mix of words and
> short phrases, where the words are drawn from the text of largish
> documents, and the phrases are author names and document titles.
>
> I think the best way to accomplish this is to concoct a single field
> that contains data from these other "source" fields (as usual with
> copyField), but with some of the fields treated as keywords (i.e. with
> their values inserted as single tokens), and others tokenized. I
> believe this would be possible at the Lucene level by calling
> Document.add() with multiple fields having the same name: some marked
> as TOKENIZED and others not.
> I think the tokenized fields would have to share the same analyzer,
> but that's OK for my case.
>
> I can't see how this could be made to happen in Solr without a lot of
> custom coding, though. It seems as if the conversion from Solr fields
> to Lucene fields is not an easy thing to influence. If anyone has an
> idea how to achieve the subgoal, or perhaps a different way of getting
> at the main goal, I'd love to hear about it.
>
> So far my only other idea is to write some kind of custom analyzer
> that treats short texts as keywords and tokenizes longer ones, which
> is probably what I'll look at if nothing else comes up.
>
> Thanks
>
> Mike
>
>
> On 4/9/2014 4:16 PM, Michael Sokolov wrote:
>>
>> I think I would like to do something like copyField from a bunch of
>> fields into a single field, but with different analysis for each
>> source, and I'm pretty sure that's not a thing. Is there some
>> alternate way to accomplish my goal?
>>
>> Which is to have a suggester that suggests words from my full-text
>> field and complete phrases drawn from my author and title fields, all
>> at the same time. So if I could index author and title using
>> KeywordAnalyzer and the full text tokenized, that would be the bee's
>> knees.
>>
>> -Mike
>
>
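P.S. A rough solrconfig.xml sketch of the URP-chain idea from my reply above. The chain name, field names, and the tokenizing factory class are all hypothetical; CloneFieldUpdateProcessorFactory ships with Solr, but the factory that runs a tokenizer and emits multiple values is the custom piece you would have to write:

```
<updateRequestProcessorChain name="pre-tokenize-suggest">
  <!-- clone author/title into the collector field as-is; they stay single
       tokens because the schema gives "suggest_all" a KeywordTokenizer -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">author</str>
    <str name="source">title</str>
    <str name="dest">suggest_all</str>
  </processor>
  <!-- hypothetical custom factory: runs a real tokenizer over the full
       text and adds each token as a separate value of "suggest_all" -->
  <processor class="com.example.TokenizeFieldUpdateProcessorFactory">
    <str name="source">text</str>
    <str name="dest">suggest_all</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```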