Re: Removing characters like '\n \n' from indexing

Erick Erickson Wed, 27 May 2015 10:21:54 -0700

The other alternative is to use SolrJ to parse the documents and do
your processing there. Here's an article on the pros/cons and an
example program.


https://lucidworks.com/blog/indexing-with-solrj/
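
For illustration only, here is a minimal sketch of the kind of client-side cleanup a SolrJ indexing program could apply to a field value before sending the document to Solr. The class name FieldCleaner and the method clean are hypothetical and do not come from the linked article. The sketch assumes the unwanted text is either real newline/tab characters or the literal two-character sequences "\n", "\r" and "\t" left behind by text extraction. It uses only the Java standard library; in a real SolrJ program the cleaned string would then be passed to SolrInputDocument.addField.

```java
import java.util.regex.Pattern;

public class FieldCleaner {
    // Literal two-character escapes: a backslash followed by n, r, or t.
    // Assumption: these are extraction debris, not meaningful content.
    private static final Pattern LITERAL_ESCAPES = Pattern.compile("\\\\[nrt]");

    // One or more real whitespace characters (space, newline, tab, ...).
    private static final Pattern WHITESPACE_RUNS = Pattern.compile("\\s+");

    /** Replaces escapes and whitespace runs with a single space, then trims. */
    public static String clean(String raw) {
        if (raw == null) {
            return null;
        }
        String noEscapes = LITERAL_ESCAPES.matcher(raw).replaceAll(" ");
        return WHITESPACE_RUNS.matcher(noEscapes).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Prints: first line second line
        System.out.println(clean("first line\n \nsecond line"));
    }
}
```

Note that the literal-escape pattern would also alter legitimate text containing a backslash followed by n, r, or t (a Windows path such as C:\new, for example), so it should be dropped if the field can contain such values.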

Best,
Erick

On Wed, May 27, 2015 at 1:57 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> Edwin -
>
> There are a bunch of built-in update processors you can use, including a
> script one that lets you code the logic dynamically in JavaScript (or
> another JVM scripting language).
>
> See 
> https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors 
> for an exhaustive list.  The RegexReplaceProcessorFactory probably will do 
> what you need.
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com
>
>
>
>
>> On May 27, 2015, at 3:36 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> 
>> wrote:
>>
>> Hi Shawn,
>>
>> Thanks for your reply.
>>
>> So does that mean the only way for me to remove characters like '\n' is
>> to write my own custom class?
>>
>>
>> Regards,
>> Edwin
>>
>>
>>
>> On 27 May 2015 at 14:46, Shawn Heisey <apa...@elyograg.org> wrote:
>>
>>> On 5/26/2015 10:16 PM, Zheng Lin Edwin Yeo wrote:
>>>> I tried to follow the example here
>>>> https://wiki.apache.org/solr/UpdateRequestProcessor, by putting
>>>> the updateRequestProcessorChain in my solrconfig.xml
>>>>
>>>> But I'm getting the following error when I tried to reload the core.
>>>>
>>>> Caused by: org.apache.solr.common.SolrException: Error loading class
>>>> 'solr.CustomUpdateRequestProcessorFactory'
>>>>
>>>> Is there anything I might have missed out? I'm using Solr 5.1.
>>>
>>> CustomUpdateRequestProcessorFactory is not the name of an actual usable
>>> update processor.  On that wiki page, it is a placeholder for a custom
>>> class name.
>>>
>>> This class actually does exist within the Solr source code, but it is
>>> defined in the *TEST* code, not in the main source code that is built
>>> into the Solr download.
>>>
>>> I've updated the wiki page to try to make this clearer, by using an
>>> entirely fictional class name.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>
