Hello,

Is it possible to manipulate the value of a field before it is stored?

I'm indexing a database where some fields contain raw HTML, including named character entities. Using solr.HTMLStripCharFilterFactory in the index analyzer results in this HTML being correctly stripped, and named character entities being replaced by the corresponding characters, in the index (as verified by searching, and with Luke).

However, the stored values of the documents are kept unmodified, so the result sets, including highlights, contain HTML tags (which are escaped) and "entities" (where the leading '&' is also escaped), which makes handling the results quite difficult.

So, is it possible to apply some filters to the data before it is stored in the non-indexed fields?

I couldn't find a part of the documentation that says whether this is possible or not. I did find this message in the archives of this list:

> From: Noble Paul
> Sent: Tuesday, March 31, 2009 5:41 PM
> Subject: Re: indexed fields vs stored fields
>
> indexed = can be searched (mean you can use this to query). This undergoes tokenization filter etc
> stored = can be retrieved. No modification to the data. This is stored verbatim

which seems to say that it is not possible, but maybe things have changed since then?

Any other ideas? Given that:

- I have zero control over what is stored in the database
- using the Solr XML update protocol I could probably transform the data before sending it
- ... but I'd much rather continue using DataImportHandler to access the database

Thanks,
Regards,
EB
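
P.S. To make the fallback concrete, here is a rough sketch of the client-side transformation I would like to avoid: strip the tags and decode the entities myself, then build the XML update message. It uses only the Python standard library. The field names "id" and "body" and the helper names are hypothetical, and posting the message to Solr is left out.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class _TagStripper(HTMLParser):
    """Collects text content and drops the tags."""

    def __init__(self):
        # convert_charrefs=True decodes named and numeric character
        # references in text before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Return `raw` with tags removed and character entities decoded."""
    parser = _TagStripper()
    parser.feed(raw)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


def to_add_xml(doc_id, body_html):
    """Build a Solr <add> message; 'id' and 'body' are hypothetical field names."""
    return (
        "<add><doc>"
        f'<field name="id">{escape(str(doc_id))}</field>'
        f'<field name="body">{escape(strip_html(body_html))}</field>'
        "</doc></add>"
    )
```

For example, strip_html("<p>Caf&eacute; &amp; <b>bar</b></p>") gives "Café & bar". Note that this simple version keeps the text inside script and style elements, so it is only an approximation of what the char filter does at index time.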