Thanks for the responses,

> By saying "dirty data" you imply that only one of the values is "good" or
> "clean" and that the others can be safely discarded/ignored, as opposed to
> true multi-valued data where each value is there for good reason and needs
> to be preserved. In any case, how do you know/decide which value should be
> used for sorting - and did you just get lucky that Solr happened to use the
> right one?

I haven't gone back and checked the old version's docs where this was "working". However, I suspect that either the field never ended up appearing in a doc more than once or, if it did, it had the same value repeated...

The real issue here is that the docs are created externally, and the producer won't (yet) guarantee that fields that should appear once will actually appear once. Because of this, I don't want to declare the field as multiValued="false", as I don't want to cause indexing errors. It would be great for me (and apparently for many others, judging by what I found when searching) if there were an option as simple as forceSingleValued="true", where some deterministic behavior such as "use first field encountered, ignore all others" would occur.

> The preferred technique would be to preprocess and "clean" the data before
> it is handed to Solr or SolrJ, even if the source must remain "dirty".
> Barring that, a preprocessor or a custom update processor certainly.

I could write preprocessors (this is really what will eventually happen when the producer cleans their data), custom processors, etc. However, for something this simple it would be great not to produce more code that has to be maintained.

> Please clarify exactly how the data is being fed into Solr.

I am using "generic" code to read from a key/value store and compose documents. This is another reason fixing the data at this point would not be desirable: the currently generic code would need to be made specific to look for these particular fields and then coerce them to single values...

Thanks again,
Aaron
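
P.S. To make the "use first field encountered, ignore all others" behavior concrete, here is a rough sketch of the coercion step. It is illustration only, not the actual feed code: the function name keep_first_value, the field names, and the assumption that each document is a plain dict whose repeated fields arrive as lists are all hypothetical.

```python
from typing import Any, Dict, Iterable


def keep_first_value(doc: Dict[str, Any], single_valued: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of doc in which each named field keeps only its first value.

    Hypothetical helper: assumes repeated fields arrive as lists or tuples.
    """
    wanted = set(single_valued)
    cleaned: Dict[str, Any] = {}
    for name, value in doc.items():
        if name in wanted and isinstance(value, (list, tuple)):
            if not value:
                # An empty list carries no value at all, so omit the field.
                continue
            # Deterministic rule: first value encountered wins, the rest are ignored.
            cleaned[name] = value[0]
        else:
            # Fields not listed (or already single-valued) pass through untouched.
            cleaned[name] = value
    return cleaned


# Hypothetical document in which "title" was wrongly emitted twice.
doc = {"id": "42", "title": ["First title", "Second title"], "tags": ["a", "b"]}
print(keep_first_value(doc, ["title"]))
# {'id': '42', 'title': 'First title', 'tags': ['a', 'b']}
```

Even something this small forces the generic code to know which field names must be single-valued, which is exactly the coupling I would like to avoid.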