The other alternative is to use SolrJ to parse the documents and do your processing there. Here's an article on the pros/cons and an example program.
https://lucidworks.com/blog/indexing-with-solrj/

Best,
Erick

On Wed, May 27, 2015 at 1:57 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> Edwin -
>
> There’s a bunch of built-in update processors you can use, including a script
> one that allows you to code it dynamically in JavaScript (or other JVM
> scripting language).
>
> See
> https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
> for an exhaustive list. The RegexReplaceProcessorFactory probably will do
> what you need.
>
> --
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com
>
>> On May 27, 2015, at 3:36 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>> wrote:
>>
>> Hi Shawn,
>>
>> Thanks for your reply.
>>
>> So that means the only way for me is to write my own custom class in order
>> for the removing characters like '\n' to work?
>>
>> Regards,
>> Edwin
>>
>> On 27 May 2015 at 14:46, Shawn Heisey <apa...@elyograg.org> wrote:
>>
>>> On 5/26/2015 10:16 PM, Zheng Lin Edwin Yeo wrote:
>>>> I tried to follow the example here
>>>> https://wiki.apache.org/solr/UpdateRequestProcessor, by putting
>>>> the updateRequestProcessorChain in my solrconfig.xml
>>>>
>>>> But I'm getting the following error when I tried to reload the core.
>>>>
>>>> Caused by: org.apache.solr.common.SolrException: Error loading class
>>>> 'solr.CustomUpdateRequestProcessorFactory'
>>>>
>>>> Is there anything I might have missed out? I'm using Solr 5.1.
>>>
>>> CustomUpdateRequestProcessorFactory is not the name of an actual usable
>>> update processor. On that wiki page, it is a placeholder for a custom
>>> class name.
>>>
>>> This class actually does exist within the Solr source code, but it is
>>> defined in the *TEST* code, not the main source code that actually
>>> creates the information that's included in the Solr download.
>>>
>>> I've updated the wiki page to try making this more clear, by using an
>>> entirely fictional class name.
>>>
>>> Thanks,
>>> Shawn
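
For the SolrJ route mentioned at the top of this message, here is a minimal sketch of the client-side cleanup step: it collapses '\n' and '\r' runs in the extracted text into single spaces before the value is handed to Solr. NewlineCleaner and clean() are hypothetical names, not part of Solr or SolrJ, and the SolrJ call is shown only in a comment (with an assumed field name "content"), so treat this as an illustration rather than a finished indexing program.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not a Solr or SolrJ class.
final class NewlineCleaner {

    // Matches one or more consecutive carriage-return / line-feed characters.
    private static final Pattern LINE_BREAKS = Pattern.compile("[\\r\\n]+");

    // Replaces each run of line breaks with a single space and trims the ends.
    static String clean(String raw) {
        if (raw == null) {
            return null;
        }
        return LINE_BREAKS.matcher(raw).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String extracted = "First line\r\nSecond line\n\nThird line\n";
        String cleaned = clean(extracted);
        System.out.println(cleaned); // prints "First line Second line Third line"

        // Assumption: in a SolrJ indexing program the cleaned value would then
        // be added to the document before sending it, along the lines of
        //   doc.addField("content", cleaned);   // doc is a SolrInputDocument
    }
}
```

Doing the cleanup on the client keeps solrconfig.xml unchanged; the RegexReplaceProcessorFactory Erik mentions does the equivalent inside Solr if you would rather configure it there.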