Hi Edwin,
With \W you will also replace non-word characters such as punctuation. If
that's OK, fine. Otherwise you need to identify the whitespace characters that
are causing the problem.
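The sketch below is one way to identify those characters. It assumes you can pull the field value out as a Java string before it reaches the update chain. The class and method names are hypothetical. The code reports every character that java.lang.Character treats as whitespace or as a space separator but that Java's default \s does not match. U+00A0 NO-BREAK SPACE is a typical example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class WhitespaceInspector {
    // Java's default \s only covers [ \t\n\x0B\f\r]
    private static final Pattern DEFAULT_SPACE = Pattern.compile("\\s");

    /**
     * Returns "U+XXXX NAME" for every character that java.lang.Character
     * considers whitespace or a space separator but that \s does not match.
     */
    static List<String> unmatchedByBackslashS(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            boolean spaceLike = Character.isWhitespace(cp) || Character.isSpaceChar(cp);
            String ch = new String(Character.toChars(cp));
            if (spaceLike && !DEFAULT_SPACE.matcher(ch).matches()) {
                found.add(String.format("U+%04X %s", cp, Character.getName(cp)));
            }
        });
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample: a no-break space on an otherwise blank line
        String sample = "Psalm 89:17\n\u00A0\n3 Choa Chu Kang Avenue 4";
        unmatchedByBackslashS(sample).forEach(System.out::println);
    }
}
```

If it reports only U+00A0, a class such as [\s\u00A0] targets the problem without touching punctuation. Any other code points it reports could be added to that class in the same way.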
________________________________
From: Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Sent: Wednesday, 13 March 2019 03:25:39
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n

Hi,

We have managed to resolve the issue by changing the \s to \W. The reason
could be that some of the whitespace consists of characters other than an
ordinary space. In Java regex, \s by default only matches the ASCII whitespace
characters (space, tab, \n, \x0B, \f and \r), so it does not remove those other
characters, whereas \W matches them as well.

We have used this config, and it works.

<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">(\n\W*){2,}</str>
   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
   <bool name="literalReplacement">true</bool>
</processor>
<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">(\n\W*){1,}</str>
   <str name="replacement">&lt;br&gt;</str>
   <bool name="literalReplacement">true</bool>
</processor>
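The sketch below shows what the \W version does besides fixing the problem. It replays the two processors with java.util.regex, which Jörn notes further down is what Solr uses. It mirrors literalReplacement=true by quoting the replacement with Matcher.quoteReplacement; that this is all the flag amounts to is an assumption. The sample text is made up, including the no-break space on the blank line. The whitespace-only alternative [\s\u00A0] is also an assumption. Because \W matches any non-word character, the sample shows the leading "- " and "(" after a line break being swallowed as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BrPipeline {
    /**
     * Replays the two processors: 2+ line breaks become <br><br>, then any
     * remaining line break becomes <br>. "gap" is what may follow each \n.
     */
    static String apply(String text, String gap) {
        Pattern twoOrMore = Pattern.compile("(\\n" + gap + "*){2,}");
        Pattern oneOrMore = Pattern.compile("(\\n" + gap + "*){1,}");
        // literalReplacement=true: the replacement text is used verbatim
        String out = twoOrMore.matcher(text).replaceAll(Matcher.quoteReplacement("<br><br>"));
        return oneOrMore.matcher(out).replaceAll(Matcher.quoteReplacement("<br>"));
    }

    public static void main(String[] args) {
        // Made-up sample: a no-break space on the blank line, then a bullet
        String text = "Dear Sir,\n\u00A0\n- item one\n(note)";
        System.out.println(apply(text, "\\W"));          // Dear Sir,<br><br>item one<br>note)
        System.out.println(apply(text, "[\\s\\u00A0]")); // Dear Sir,<br><br>- item one<br>(note)
    }
}
```

If the whitespace-only class is preferred, the corresponding solrconfig.xml pattern would presumably be (\n[\s\u00A0]*){2,}. This has not been tested against the real data.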

Regards,
Edwin

On Tue, 12 Mar 2019 at 10:49, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Hi,
>
> Has anyone else faced the same issue before?
> So far all the regex patterns that we tried in this thread are not able to
> resolve the issue.
>
> Regards,
> Edwin
>
> On Fri, 8 Mar 2019 at 12:17, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
>
>> Hi Paul,
>>
>> Sorry, I realized there is an extra ']' in the pattern provided, which is
>> why there are so many <br> in the output.
>>
>> If we remove the extra ']', as in the configuration below, the output is
>> exactly the same as before (the previous index result).
>>
>>  <processor class="solr.RegexReplaceProcessorFactory">
>>    <str name="fieldName">content</str>
>>    <str name="pattern">[ \t\x0b\f]*\r?\n</str>
>>    <str name="replacement">&lt;br&gt;</str>
>>    <bool name="literalReplacement">true</bool>
>>  </processor>
>>  <processor class="solr.RegexReplaceProcessorFactory">
>>    <str name="fieldName">content</str>
>>    <str name="pattern">(&lt;br&gt;[ \t\x0b\f]*){3,}</str>
>>    <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>    <bool name="literalReplacement">true</bool>
>>  </processor>
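One hypothesis would explain the remaining «<br><br>  <br><br>», though nothing in this thread confirms it. The characters between the two pairs may not be in [ \t\x0b\f] at all. They could, for example, be U+00A0 no-break spaces, which is what an HTML &nbsp; decodes to. The sketch replays the two processors above in plain Java, with &lt;br&gt; already decoded to <br> as the XML parser would. The sample strings are assumptions modelled on Example 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CollapseBr {
    // Step 1: optional space/tab/VT/FF before a line ending -> <br>
    private static final Pattern LINE_END = Pattern.compile("[ \\t\\x0b\\f]*\\r?\\n");
    // Step 2: three or more <br>, each optionally followed by the same characters -> <br><br>
    private static final Pattern RUN_OF_BR = Pattern.compile("(<br>[ \\t\\x0b\\f]*){3,}");

    static String apply(String text) {
        String out = LINE_END.matcher(text).replaceAll(Matcher.quoteReplacement("<br>"));
        return RUN_OF_BR.matcher(out).replaceAll(Matcher.quoteReplacement("<br><br>"));
    }

    public static void main(String[] args) {
        String ascii = "Psalm 89:17   \n\n   \n\n  3";             // ordinary spaces
        String nbsp = "Psalm 89:17   \n\n\u00A0\u00A0\u00A0\n\n  3"; // assumed no-break spaces
        System.out.println(apply(ascii)); // Psalm 89:17<br><br>3
        System.out.println(apply(nbsp));  // four <br> survive, split by the no-break spaces
    }
}
```

With ordinary spaces the chain collapses the run to <br><br>. With no-break spaces, the second step never sees three adjacent <br> and leaves all four in place.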
>>
>> Regards,
>> Edwin
>>
>>
>>
>> On Thu, 7 Mar 2019 at 22:51, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>> wrote:
>>
>>> Hi Paul,
>>>
>>> Thanks for the reply.
>>>
>>> For the 2nd pattern, if we use the pattern
>>> <str name="pattern">(&lt;br&gt;[ \t\x0b\f]]*){3,}</str>, as in the
>>> configuration below:
>>>
>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>    <str name="fieldName">content</str>
>>>    <str name="pattern">[ \t\x0b\f]*\r?\n</str>
>>>    <str name="replacement">&lt;br&gt;</str>
>>>    <bool name="literalReplacement">true</bool>
>>> </processor>
>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>    <str name="fieldName">content</str>
>>>    <str name="pattern">(&lt;br&gt;[ \t\x0b\f]]*){3,}</str>
>>>    <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>    <bool name="literalReplacement">true</bool>
>>> </processor>
>>>
>>> It is not able to change the runs of 3 or more <br> into 2 <br>.
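This short check shows why the stray ']' breaks the second pattern. In java.util.regex, a ']' outside a character class is taken as a literal. So [ \t\x0b\f]]* means exactly one space, tab, vertical tab or form feed, followed by zero or more ']' characters. Each <br> must then be followed by exactly one such character, and a plain run like <br><br><br><br> is never matched. The class and field names below are illustrative.

```java
import java.util.regex.Pattern;

class StrayBracket {
    // As posted: the second ']' is a literal ']' with '*' applied to it
    static final Pattern AS_POSTED = Pattern.compile("(<br>[ \\t\\x0b\\f]]*){3,}");
    // Intended: optional whitespace after each <br>
    static final Pattern INTENDED = Pattern.compile("(<br>[ \\t\\x0b\\f]*){3,}");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String run = "<br><br><br><br>";
        System.out.println(found(AS_POSTED, run));               // false
        System.out.println(found(INTENDED, run));                // true
        System.out.println(found(AS_POSTED, "<br> <br> <br> ")); // true: one space after each
    }
}
```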
>>>
>>> We will end up with many <br> in the output, like the example below:
>>>
>>>  http://www.concorded.com/<br><br>  
>>> <br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
>>>  On Tue, Dec 18, 2018
>>>
>>>
>>> Regards,
>>> Edwin
>>>
>>>
>>>
>>>
>>> On Thu, 7 Mar 2019 at 20:44, <paul.d...@ub.unibe.ch> wrote:
>>>
>>>> Hi Edwin
>>>>
>>>>
>>>>
>>>> I can’t understand why the pattern is not working and where the spaces
>>>> between the <br> are coming from. It should, however, be possible to allow
>>>> for spaces between the <br> in the second match pattern, i.e. the 2nd pattern:
>>>>
>>>>
>>>>
>>>> <str name="pattern">(&lt;br&gt;[ \t\x0b\f]]*){3,}</str>
>>>>
>>>>
>>>>
>>>> /Paul
>>>>
>>>>
>>>>
>>>> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for
>>>> Windows 10
>>>>
>>>>
>>>>
>>>> From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>> Sent: Wednesday, 6 March 2019 16:28
>>>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>>
>>>>
>>>>
>>>> Hi Paul,
>>>>
>>>> I have tried setting the first match pattern to
>>>> <str name="pattern">[ \t\x0b\f]*\r?\n</str>, as in the configuration below:
>>>>
>>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>>    <str name="fieldName">content</str>
>>>>    <str name="pattern">[ \t\x0b\f]*\r?\n</str>
>>>>    <str name="replacement">&lt;br&gt;</str>
>>>>    <bool name="literalReplacement">true</bool>
>>>> </processor>
>>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>>    <str name="fieldName">content</str>
>>>>    <str name="pattern">(&lt;br&gt;){3,}</str>
>>>>    <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>>    <bool name="literalReplacement">true</bool>
>>>> </processor>
>>>>
>>>> However, the result is still the same as before (the previous index
>>>> results), with the 4 <br>.
>>>>
>>>> Regards,
>>>> Edwin
>>>>
>>>>
>>>> On Wed, 6 Mar 2019 at 18:23, <paul.d...@ub.unibe.ch> wrote:
>>>>
>>>> > Hi Edwin
>>>> >
>>>> >
>>>> >
>>>> > You are correct re the 2nd pattern; my bad. Looking at the 4 <br>, it’s
>>>> > actually the sequence «<br><br>  <br><br>»? So perhaps the first match
>>>> > pattern could be <str name="pattern">[ \t\x0b\f]*\r?\n</str>
>>>> >
>>>> >
>>>> >
>>>> > i.e. [space tab vertical-tab formfeed]
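For reference, Paul's class [ \t\x0b\f] is the same set as Java's default \s with \r and \n removed. Java documents \s as [ \t\n\x0B\f\r]. Neither class contains U+00A0 no-break space. The small comparison below makes this visible; its class name is hypothetical.

```java
import java.util.regex.Pattern;

class ClassCompare {
    static final Pattern PAUL = Pattern.compile("[ \\t\\x0b\\f]");
    static final Pattern DEFAULT_S = Pattern.compile("\\s");

    static boolean matches(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // space, tab, vertical tab, form feed, CR, LF, no-break space
        char[] samples = {' ', '\t', '\u000B', '\f', '\r', '\n', '\u00A0'};
        for (char c : samples) {
            System.out.printf("U+%04X  class=%-5b  \\s=%b%n",
                    (int) c, matches(PAUL, c), matches(DEFAULT_S, c));
        }
    }
}
```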
>>>> >
>>>> >
>>>> >
>>>> > Regards,
>>>> >
>>>> > Paul
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>> > Sent: Wednesday, 6 March 2019 07:44
>>>> > To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>> > Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>> >
>>>> >
>>>> >
>>>> > Hi Paul,
>>>> >
>>>> > I have modified the second pattern to be (&lt;br&gt;){3,} instead of
>>>> > (&lt;br&gt;&lt;br&gt;){3,}. The pattern (&lt;br&gt;&lt;br&gt;){3,}
>>>> > actually looks for 6 or more <br> instead of 3, as we have put the <br>
>>>> > two times in the pattern. That is why there were more <br> in the result:
>>>> > runs of fewer than 6 <br> were not being replaced, so we ended up having
>>>> > up to 5 <br> in the index.
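Edwin's count can be confirmed directly. (<br><br>){3,} needs three repetitions of a pair, i.e. at least 6 consecutive <br>, whereas (<br>){3,} already fires at 3. Below is a minimal Java check with illustrative names. It uses a StringBuilder instead of String.repeat so that it also runs on Java 8.

```java
import java.util.regex.Pattern;

class BrCount {
    static final Pattern PAIRS = Pattern.compile("(<br><br>){3,}"); // 3 pairs = 6 <br>
    static final Pattern SINGLES = Pattern.compile("(<br>){3,}");   // 3 <br>

    /** n consecutive <br> tags (StringBuilder instead of String.repeat, for Java 8). */
    static String brs(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("<br>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 7; n++) {
            System.out.printf("%d <br>: pairs=%b singles=%b%n",
                    n, PAIRS.matcher(brs(n)).find(), SINGLES.matcher(brs(n)).find());
        }
    }
}
```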
>>>> >
>>>> > Modified configuration:
>>>> >  <processor class="solr.RegexReplaceProcessorFactory">
>>>> >    <str name="fieldName">content</str>
>>>> >    <str name="pattern">(&lt;br&gt;){3,}</str>
>>>> >    <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>> >    <bool name="literalReplacement">true</bool>
>>>> >  </processor>
>>>> >
>>>> > This will bring us back to the result of the previous index content,
>>>> > meaning the issue of having the 4 <br> is still there.
>>>> >
>>>> > Regards,
>>>> > Edwin
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Wed, 6 Mar 2019 at 11:37, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>> > wrote:
>>>> >
>>>> > > Hi Paul,
>>>> > >
>>>> > > Further to my previous email, in which there was an extra "}" in the
>>>> > > configuration, I have changed to the configuration below, based on your
>>>> > > suggestion.
>>>> > >
>>>> > > <processor class="solr.RegexReplaceProcessorFactory">
>>>> > >    <str name="fieldName">content</str>
>>>> > >    <str name="pattern">[ \t]*\r?\n</str>
>>>> > >    <str name="replacement">&lt;br&gt;</str>
>>>> > >    <bool name="literalReplacement">true</bool>
>>>> > > </processor>
>>>> > > <processor class="solr.RegexReplaceProcessorFactory">
>>>> > >    <str name="fieldName">content</str>
>>>> > >    <str name="pattern">(&lt;br&gt;&lt;br&gt;){3,}</str>
>>>> > >    <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>> > >    <bool name="literalReplacement">true</bool>
>>>> > > </processor>
>>>> > >
>>>> > > However, the result that I get still has more than 2 <br>. In fact, the
>>>> > > result became worse, as you can see from the comparison below.
>>>> > >
>>>> > > Example 1: The sentence for which the regex pattern used to work
>>>> > > correctly. With the latest pattern, it has now changed from 2 <br> to
>>>> > > 5 <br>, which is wrong.
>>>> > > *Original content in EML file:*
>>>> > > Dear Sir,
>>>> > >
>>>> > >
>>>> > > I am terminating
>>>> > > *Original content:*    Dear Sir,  \n\n \n \n\n I am terminating
>>>> > > *Previous Index content: *    Dear Sir,  <br><br>I am terminating
>>>> > > *Current Index content*:   Dear Sir, <br><br><br><br><br> I am terminating
>>>> > >
>>>> > > Example 2: The sentence for which the above regex pattern is only partially
>>>> > > working (as you can see, there are 4 <br> instead of 2 <br>)
>>>> > > *Original content in EML file:*
>>>> > >
>>>> > > *exalted*
>>>> > >
>>>> > > *Psalm 89:17*
>>>> > >
>>>> > >
>>>> > > 3 Choa Chu Kang Avenue 4
>>>> > > *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
>>>> > > Chu Kang Avenue 4, Singapore
>>>> > > *Previous Index content: *exalted  <br><br>Psalm 89:17   <br><br>
>>>> > > <br><br>3 Choa Chu Kang Avenue 4, Singapore
>>>> > > *Current Index content*: <br><br><br>   Psalm 89:17<br><br> <br><br>  3
>>>> > > Choa Chu Kang Avenue 4, Singapore
>>>> > >
>>>> > > Example 3: The sentence for which the above regex pattern is only partially
>>>> > > working (as you can see, there are 4 <br> instead of 2 <br>). With the
>>>> > > latest code, there are now 5 <br>
>>>> > > *Original content in EML file:*
>>>> > >
>>>> > > http://www.concorded.com/
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Tue, Dec 18, 2018 at 10:07 AM
>>>> > > *Original content:* http://www.concorded.com/   \n\n   \n\n \n \n\n \n\n
>>>> > > \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018 at
>>>> > > 10:07 AM
>>>> > > *Previous Index content: *http://www.concorded.com/   <br><br>
>>>> > > <br><br>On Tue, Dec 18, 2018 at 10:07 AM
>>>> > > *Current Index content:* http://www.concorded.com/<br><br> <br><br><br>
>>>> > > On Tue, Dec 18, 2018 at 10:07 AM
>>>> > >
>>>> > >
>>>> > > Regards,
>>>> > > Edwin
>>>> > >
>>>> > > On Wed, 6 Mar 2019 at 00:29, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>> > > wrote:
>>>> > >
>>>> > >> Hi Paul,
>>>> > >>
>>>> > >> Thank you for the reply.
>>>> > >>
>>>> > >> I have tried to add the following configuration according to your
>>>> > >> suggestion:
>>>> > >>
>>>> > >> <processor class="solr.RegexReplaceProcessorFactory">
>>>> > >>    <str name="fieldName">content</str>
>>>> > >>    <str name="pattern">[ \t]*\r?\n}</str>
>>>> > >>    <str name="replacement">&lt;br&gt;</str>
>>>> > >>    <bool name="literalReplacement">true</bool>
>>>> > >> </processor>
>>>> > >>
>>>> > >> <processor class="solr.RegexReplaceProcessorFactory">
>>>> > >>    <str name="fieldName">content</str>
>>>> > >>    <str name="pattern">(&lt;br&gt;&lt;br&gt;){3,}</str>
>>>> > >>    <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>> > >>    <bool name="literalReplacement">true</bool>
>>>> > >> </processor>
>>>> > >>
>>>> > >> However, none of the \n is being removed this time round.
>>>> > >> Is the order and/or the pattern correct?
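The stray "}" at the end of the first pattern explains why nothing was replaced. java.util.regex treats a '}' that does not close a quantifier as a literal, so [ \t]*\r?\n} only matches a line break immediately followed by a '}' character. The minimal sketch below compares it with the intended pattern on Example 1's original content. The class name is illustrative.

```java
import java.util.regex.Pattern;

class StrayBrace {
    // As posted: the trailing '}' is a literal, so "\n}" must occur in the text
    static final Pattern AS_POSTED = Pattern.compile("[ \\t]*\\r?\\n}");
    static final Pattern INTENDED = Pattern.compile("[ \\t]*\\r?\\n");

    public static void main(String[] args) {
        String text = "Dear Sir,  \n\n \n \n\n I am terminating";
        System.out.println(AS_POSTED.matcher(text).find());            // false
        System.out.println(INTENDED.matcher(text).replaceAll("<br>")); // Dear Sir,<br><br><br><br><br> I am terminating
    }
}
```

With the brace removed, this first step alone already yields the five <br> reported for Example 1. The follow-up (&lt;br&gt;&lt;br&gt;){3,} needs at least six to fire, so it leaves them unchanged.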
>>>> > >>
>>>> > >> Regards,
>>>> > >> Edwin
>>>> > >>
>>>> > >> On Tue, 5 Mar 2019 at 19:54, <paul.d...@ub.unibe.ch> wrote:
>>>> > >>
>>>> > >>> Hi Edwin
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> Try for the first pattern/replacement
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> <str name="pattern">[ \t]*\r?\n</str>
>>>> > >>>
>>>> > >>> <str name="replacement">&lt;br&gt;</str>
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> Now all line endings and preceding whitespace characters should be
>>>> > >>> changed to ‘<br>’.
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> The second pattern replacement should replace 3 or more ‘<br>’ sequences
>>>> > >>> with 2 ‘<br>’ sequences:
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> <str name="pattern">(&lt;br&gt;&lt;br&gt;){3,}</str>
>>>> > >>>
>>>> > >>> <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> Hope this approach works. Sorry for not replying earlier and best
>>>> > >>> regards,
>>>> > >>>
>>>> > >>> Paul
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>> > >>> Sent: Tuesday, 5 March 2019 03:35
>>>> > >>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>> > >>> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> Hi,
>>>> > >>>
>>>> > >>> For your info, this issue is occurring in the new Solr 7.7.1 as well.
>>>> > >>>
>>>> > >>> Regards,
>>>> > >>> Edwin
>>>> > >>>
>>>> > >>> On Mon, 25 Feb 2019 at 10:28, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>> > >>> wrote:
>>>> > >>>
>>>> > >>> > Hi,
>>>> > >>> >
>>>> > >>> > Does anyone else have other suggestions, or has anyone faced the same
>>>> > >>> > problem?
>>>> > >>> >
>>>> > >>> > Regards,
>>>> > >>> > Edwin
>>>> > >>> >
>>>> > >>> > On Wed, 20 Feb 2019 at 16:58, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>> > >>> > wrote:
>>>> > >>> >
>>>> > >>> >> Hi Paul,
>>>> > >>> >>
>>>> > >>> >> If I execute the second step first, I only get a single <br> for
>>>> > >>> >> those with 2 <br>.
>>>> > >>> >> For those where we originally got 4 <br>, there will be 2 <br> with a
>>>> > >>> >> space in between.
>>>> > >>> >>
>>>> > >>> >> This just changes the 2 <br> into a single <br>, since the second
>>>> > >>> >> step replaces them with a single <br>.
>>>> > >>> >> But it has not solved the underlying problem yet.
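The effect of the processor order can be reproduced with the two patterns from the configuration quoted further down in this thread: ([ \t]*\r?\n){2,} and ([ \t]*\r?\n){1,}. Running the single-break rule first turns every run of line breaks into one <br>, so the multi-break rule has nothing left to match. The sketch below uses a made-up sample string and illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StepOrder {
    static final Pattern MULTI = Pattern.compile("([ \\t]*\\r?\\n){2,}");  // 2+ breaks
    static final Pattern SINGLE = Pattern.compile("([ \\t]*\\r?\\n){1,}"); // 1+ breaks

    static String multiThenSingle(String s) {
        String out = MULTI.matcher(s).replaceAll(Matcher.quoteReplacement("<br><br>"));
        return SINGLE.matcher(out).replaceAll(Matcher.quoteReplacement("<br>"));
    }

    static String singleThenMulti(String s) {
        // SINGLE consumes whole runs first, so MULTI never finds two breaks in a row
        String out = SINGLE.matcher(s).replaceAll(Matcher.quoteReplacement("<br>"));
        return MULTI.matcher(out).replaceAll(Matcher.quoteReplacement("<br><br>"));
    }

    public static void main(String[] args) {
        String text = "Dear Sir,\n\nI am terminating\nthe service"; // made-up sample
        System.out.println(multiThenSingle(text)); // Dear Sir,<br><br>I am terminating<br>the service
        System.out.println(singleThenMulti(text)); // Dear Sir,<br>I am terminating<br>the service
    }
}
```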
>>>> > >>> >>
>>>> > >>> >> Regards,
>>>> > >>> >> Edwin
>>>> > >>> >>
>>>> > >>> >>
>>>> > >>> >> On Wed, 20 Feb 2019 at 16:41, <paul.d...@ub.unibe.ch> wrote:
>>>> > >>> >>
>>>> > >>> >>> If the second step is executed first, then you will get the unwanted
>>>> > >>> >>> 4 <br>
>>>> > >>> >>>
>>>> > >>> >>>
>>>> > >>> >>>
>>>> > >>> >>>
>>>> > >>> >>>
>>>> > >>> >>>
>>>> > >>> >>> From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>> > >>> >>> Sent: Wednesday, 20 February 2019 09:29
>>>> > >>> >>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>> > >>> >>> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>> > >>> >>>
>>>> > >>> >>>
>>>> > >>> >>>
>>>> > >>> >>> Hi Jörn,
>>>> > >>> >>>
>>>> > >>> >>> Do you mean the regex is not correct?
>>>> > >>> >>>
>>>> > >>> >>> We are already using two RegexReplaceProcessorFactory steps, like the
>>>> > >>> >>> ones shown below. The output that we get is still the same.
>>>> > >>> >>>
>>>> > >>> >>> <processor class="solr.RegexReplaceProcessorFactory">
>>>> > >>> >>>      <str name="fieldName">content</str>
>>>> > >>> >>>      <str name="pattern">([ \t]*\r?\n){2,}</str>
>>>> > >>> >>>      <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>> > >>> >>>      <bool name="literalReplacement">true</bool>
>>>> > >>> >>> </processor>
>>>> > >>> >>>
>>>> > >>> >>> <processor class="solr.RegexReplaceProcessorFactory">
>>>> > >>> >>>      <str name="fieldName">content</str>
>>>> > >>> >>>      <str name="pattern">([ \t]*\r?\n){1,}</str>
>>>> > >>> >>>      <str name="replacement">&lt;br&gt;</str>
>>>> > >>> >>>      <bool name="literalReplacement">true</bool>
>>>> > >>> >>> </processor>
>>>> > >>> >>>
>>>> > >>> >>> Regards,
>>>> > >>> >>> Edwin
>>>> > >>> >>>
>>>> > >>> >>> On Wed, 20 Feb 2019 at 16:03, Jörn Franke <jornfra...@gmail.com>
>>>> > >>> >>> wrote:
>>>> > >>> >>>
>>>> > >>> >>> > Then you need two RegexReplaceProcessorFactory steps.
>>>> > >>> >>> >
>>>> > >>> >>> > > On 20.02.2019 at 08:12, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>>>> > >>> >>> > >
>>>> > >>> >>> > > Hi,
>>>> > >>> >>> > >
>>>> > >>> >>> > > Thanks for the reply.
>>>> > >>> >>> > >
>>>> > >>> >>> > > Do you know of any online regex tool that works correctly for Java
>>>> > >>> >>> > > regex? I tried to find some, but they are not working properly.
>>>> > >>> >>> > >
>>>> > >>> >>> > > Yes, our plan is to replace more than one \n with <br><br>, and a
>>>> > >>> >>> > > single \n with a single <br>.
>>>> > >>> >>> > >
>>>> > >>> >>> > > Regards,
>>>> > >>> >>> > > Edwin
>>>> > >>> >>> > >
>>>> > >>> >>> > >> On Wed, 20 Feb 2019 at 14:59, Jörn Franke <jornfra...@gmail.com>
>>>> > >>> >>> > >> wrote:
>>>> > >>> >>> > >>
>>>> > >>> >>> > >> Solr uses Java regex matching, so I doubt there is a bug; if there
>>>> > >>> >>> > >> were, it would be in the JDK. Try out your solution in an online
>>>> > >>> >>> > >> regex tool that supports Java regex.
>>>> > >>> >>> > >>
>>>> > >>> >>> > >> I believe you want to have 2 regex processor factories: one that
>>>> > >>> >>> > >> deals with a single \n and one that deals with more than one \n.
>>>> > >>> >>> > >>
>>>> > >>> >>> > >>> On 20.02.2019 at 06:17, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>>>> > >>> >>> > >>>
>>>> > >>> >>> > >>> Hi,
>>>> > >>> >>> > >>>
>>>> > >>> >>> > >>> We have tried the following pattern ([ \t]*\r?\n){2,} and
>>>> > >>> >>> > >>> configuration:
>>>> > >>> >>> > >>>
>>>> > >>> >>> > >>> <processor class="solr.RegexReplaceProcessorFactory">
>>>> > >>> >>> > >>>  <str name="fieldName">content</str>
>>>> > >>> >>> > >>>  <str name="pattern">([ \t]*\r?\n){2,}</str>
>>>> > >>> >>> > >>>  <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>> > >>> >>> > >>>  <bool name="literalReplacement">true</bool>
>>>> > >>> >>> > >>> </processor>
>>>> > >>> >>> > >>>
>>>> > >>> >>> > >>> However, the issue is still occurring.
>>>> > >>> >>> > >>>
>>>> > >>> >>> > >>> Is anyone else able to help?
>>>> > >>> >>> > >>>
>>>> > >>> >>> > >>> Regards,
>>>> > >>> >>> > >>> Edwin
>>>> > >>> >>> > >>>
>>>> > >>> >>> > >>> On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>> > >>> >>> > >>> wrote:
>>>> > >>> >>> > >>>
>>>> > >>> >>> > >>>> Hi,
>>>> > >>> >>> > >>>>
>>>> > >>> >>> > >>>> For your info, this issue is occurring in Solr 7.7.0 as well.
>>>> > >>> >>> > >>>>
>>>> > >>> >>> > >>>> Regards,
>>>> > >>> >>> > >>>> Edwin
>>>> > >>> >>> > >>>>
>>>> > >>> >>> > >>>> On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>> > >>> >>> > >>>> wrote:
>>>> > >>> >>> > >>>>
>>>> > >>> >>> > >>>>> Hi,
>>>> > >>> >>> > >>>>>
>>>> > >>> >>> > >>>>> Should we report this as a bug in Solr?
>>>> > >>> >>> > >>>>>
>>>> > >>> >>> > >>>>> Regards,
>>>> > >>> >>> > >>>>> Edwin
>>>> > >>> >>> > >>>>>
>>>> > >>> >>> > >>>>> On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>> > >>> >>> > >>>>> wrote:
>>>> > >>> >>> > >>>>>
>>>> > >>> >>> > >>>>>> Hi Paul,
>>>> > >>> >>> > >>>>>>
>>>> > >>> >>> > >>>>>> Regarding the regex (\n\s*){2,} that we are using, when we try it on
>>>> > >>> >>> > >>>>>> https://regex101.com/, it gives us the correct result for all the
>>>> > >>> >>> > >>>>>> examples (i.e. all of them only have <br><br>, and not more than that
>>>> > >>> >>> > >>>>>> like what we are getting in Solr in our earlier examples).
>>>> > >>> >>> > >>>>>>
>>>> > >>> >>> > >>>>>> Could there be a possibility of a bug in Solr?
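Here is one possible reason why regex101 and Solr disagree. It is only an assumption, since the thread does not say which flavor was used on regex101. Java's \s is ASCII-only by default, while in JavaScript \s also matches U+00A0 and the other Unicode space separators. In Java, the UNICODE_CHARACTER_CLASS flag (inline form (?U)) makes \s match Unicode whitespace as well. The sketch replays (\n\s*){2,} both ways on a hypothetical version of Example 2 in which the "blank" line holds no-break spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlavorCheck {
    /** Applies (\n\s*){2,} -> <br><br> with the given Pattern flags. */
    static String collapse(String text, int flags) {
        return Pattern.compile("(\\n\\s*){2,}", flags).matcher(text)
                .replaceAll(Matcher.quoteReplacement("<br><br>"));
    }

    public static void main(String[] args) {
        // Hypothetical Example 2 variant: the "blank" line holds three no-break spaces
        String text = "Psalm 89:17   \n\n\u00A0\u00A0\u00A0\n\n  3";
        // Default (ASCII) \s: the no-break spaces split the run, giving two <br><br>
        System.out.println(collapse(text, 0));
        // Unicode \s, same as writing (?U) at the start of the pattern: one <br><br>
        System.out.println(collapse(text, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```

If that is the cause, writing the pattern as (?U)(\n\s*){2,} in solrconfig.xml should give the regex101 result, because the embedded flag is part of the pattern string. This has not been tried against the actual data.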
>>>> > >>> >>> > >>>>>>
>>>> > >>> >>> > >>>>>> Regards,
>>>> > >>> >>> > >>>>>> Edwin
>>>> > >>> >>> > >>>>>>
>>>> > >>> >>> > >>>>>> On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>> > >>> >>> > >>>>>> wrote:
>>>> > >>> >>> > >>>>>>
>>>> > >>> >>> > >>>>>>> Hi Paul,
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> We have tried it with the space preceding the \n, i.e.
>>>> > >>> >>> > >>>>>>> <str name="pattern">(\s*\n){2,}</str>, with the following configuration:
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>> > >>> >>> > >>>>>>>  <str name="fieldName">content</str>
>>>> > >>> >>> > >>>>>>>  <str name="pattern">(\s*\n){2,}</str>
>>>> > >>> >>> > >>>>>>>  <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>> > >>> >>> > >>>>>>> </processor>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> However, we are also getting exactly the same results as in the
>>>> > >>> >>> > >>>>>>> earlier Examples 1, 2 and 3.
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> As for your point 2, that perhaps the data has other (non-printing)
>>>> > >>> >>> > >>>>>>> characters than \n: we have found that there are no non-printing
>>>> > >>> >>> > >>>>>>> characters. It is just a new line with a space. You can refer to the
>>>> > >>> >>> > >>>>>>> original content in the same examples below.
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> Example 1: The sentence for which the above regex pattern works
>>>> > >>> >>> > >>>>>>> correctly
>>>> > >>> >>> > >>>>>>> *Original content in EML file:*
>>>> > >>> >>> > >>>>>>> Dear Sir,
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> I am terminating
>>>> > >>> >>> > >>>>>>> *Original content:*    Dear Sir,  \n\n \n \n\n I am terminating
>>>> > >>> >>> > >>>>>>> *Index content: *    Dear Sir,  <br><br>I am terminating
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> Example 2: The sentence for which the above regex pattern is only
>>>> > >>> >>> > >>>>>>> partially working (as you can see, there are 4 <br> instead of 2 <br>)
>>>> > >>> >>> > >>>>>>> *Original content in EML file:*
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> *exalted*
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> *Psalm 89:17*
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> 3 Choa Chu Kang Avenue 4
>>>> > >>> >>> > >>>>>>> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3
>>>> > >>> >>> > >>>>>>> Choa Chu Kang Avenue 4, Singapore
>>>> > >>> >>> > >>>>>>> *Index content: *exalted  <br><br>Psalm 89:17   <br><br> <br><br>3
>>>> > >>> >>> > >>>>>>> Choa Chu Kang Avenue 4, Singapore
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> Example 3: The sentence for which the above regex pattern is only
>>>> > >>> >>> > >>>>>>> partially working (as you can see, there are 4 <br> instead of 2 <br>)
>>>> > >>> >>> > >>>>>>> *Original content in EML file:*
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> http://www.concordpri.moe.edu.sg/
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> On Tue, Dec 18, 2018 at 10:07 AM
>>>> > >>> >>> > >>>>>>> *Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n
>>>> > >>> >>> > >>>>>>> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18,
>>>> > >>> >>> > >>>>>>> 2018 at 10:07 AM
>>>> > >>> >>> > >>>>>>> *Index content: *http://www.concordpri.moe.edu.sg/   <br><br>
>>>> > >>> >>> > >>>>>>> <br><br>On Tue, Dec 18, 2018 at 10:07 AM
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> We would appreciate any other ideas or suggestions that you may have.
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> Thank you.
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>> Regards,
>>>> > >>> >>> > >>>>>>> Edwin
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>>>>>>> On Thu, 7 Feb 2019 at 22:49, <paul.d...@ub.unibe.ch> wrote:
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> Hi Edwin
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> 1.  Sorry, the pattern was wrong; the space should precede the \n,
>>>> > >>> >>> > >>>>>>>> i.e. <str name="pattern">(\s*\n){2,}</str>
>>>> > >>> >>> > >>>>>>>> 2.  Perhaps the data contains other (non-printing) characters than \n?
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>> > >>> >>> > >>>>>>>> Sent: Thursday, 7 February 2019 15:23
>>>> > >>> >>> > >>>>>>>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>> > >>> >>> > >>>>>>>> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> Hi Paul,
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> We have tried this suggested regex pattern as follows:
>>>> > >>> >>> > >>>>>>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>> > >>> >>> > >>>>>>>>  <str name="fieldName">content</str>
>>>> > >>> >>> > >>>>>>>>  <str name="pattern">(\n\s*){2,}</str>
>>>> > >>> >>> > >>>>>>>>  <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>> > >>> >>> > >>>>>>>> </processor>
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> But we still have exactly the same problem as in Examples 1, 2 and 3
>>>> > >>> >>> > >>>>>>>> below.
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> Example 1: The sentence for which the above regex pattern works
>>>> > >>> >>> > >>>>>>>> correctly
>>>> > >>> >>> > >>>>>>>> *Original content:*    Dear Sir,  \n\n \n \n\n I am terminating
>>>> > >>> >>> > >>>>>>>> *Index content: *    Dear Sir,  <br><br>I am terminating
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> Example 2: The sentence for which the above regex pattern is only
>>>> > >>> >>> > >>>>>>>> partially working (as you can see, there are 4 <br> instead of 2 <br>)
>>>> > >>> >>> > >>>>>>>> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3
>>>> > >>> >>> > >>>>>>>> Choa Chu Kang Avenue 4, Singapore
>>>> > >>> >>> > >>>>>>>> *Index content: *exalted  <br><br>Psalm 89:17   <br><br> <br><br>3
>>>> > >>> >>> > >>>>>>>> Choa Chu Kang Avenue 4, Singapore
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> Example 3: The sentence for which the above regex pattern is only
>>>> > >>> >>> > >>>>>>>> partially working (as you can see, there are 4 <br> instead of 2 <br>)
>>>> > >>> >>> > >>>>>>>> *Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n
>>>> > >>> >>> > >>>>>>>> \n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On
>>>> > >>> >>> > >>>>>>>> Tue, Dec 18, 2018 at 10:07 AM
>>>> > >>> >>> > >>>>>>>> *Index content: *http://www.concordpri.moe.edu.sg/   <br><br>
>>>> > >>> >>> > >>>>>>>> <br><br>On Tue, Dec 18, 2018 at 10:07 AM
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> Any further suggestions?
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> Thank you.
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>> Regards,
>>>> > >>> >>> > >>>>>>>> Edwin
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>>> On Thu, 7 Feb 2019 at 22:20, <paul.d...@ub.unibe.ch> wrote:
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> To avoid «\n+\s*» matching too many \n and then failing on the {2,}
>>>> > >>> >>> > >>>>>>>>> part, you could try
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> <str name="pattern">(\n\s*){2,}</str>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> If you also want to match CRLF then
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> <str name="pattern">(\r?\n\s*){2,}</str>
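This shows why the \r? matters, assuming the extracted text uses CRLF line endings (this thread does not establish that). Without it, (\n\s*){2,} cannot start at the first \r, so a stray carriage return is left in front of the <br><br>. Below is a minimal comparison; the class name is illustrative, and \r is printed as a visible \r.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrlfCheck {
    static String collapse(String regex, String text) {
        return Pattern.compile(regex).matcher(text)
                .replaceAll(Matcher.quoteReplacement("<br><br>"));
    }

    public static void main(String[] args) {
        String crlf = "Dear Sir,\r\n\r\nI am terminating"; // assumed Windows line endings
        // Without \r? the match starts at the first \n, leaving the first \r behind
        System.out.println(collapse("(\\n\\s*){2,}", crlf).replace("\r", "\\r"));
        // With \r? the whole CRLF run is replaced
        System.out.println(collapse("(\\r?\\n\\s*){2,}", crlf).replace("\r", "\\r"));
    }
}
```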
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>> > >>> >>> > >>>>>>>>> Sent: Thursday, 7 February 2019 15:10
>>>> > >>> >>> > >>>>>>>>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>> > >>> >>> > >>>>>>>>> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> Hi Paul,
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> Thanks for your reply.
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> When I use this pattern:
>>>> > >>> >>> > >>>>>>>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>> > >>> >>> > >>>>>>>>>  <str name="fieldName">content</str>
>>>> > >>> >>> > >>>>>>>>>  <str name="pattern">(\n+\s*){2,}</str>
>>>> > >>> >>> > >>>>>>>>>  <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>> > >>> >>> > >>>>>>>>> </processor>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> It works for some sentences within the same content and not for
>>>> > >>> >>> > >>>>>>>>> others. Please see below for one that is working and others that are
>>>> > >>> >>> > >>>>>>>>> not (only partially working):
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> Example 1: The sentence for which the above regex pattern works
>>>> > >>> >>> > >>>>>>>>> correctly
>>>> > >>> >>> > >>>>>>>>> *Original content:*    Dear Sir,  \n\n \n \n\n I am terminating
>>>> > >>> >>> > >>>>>>>>> *Index content: *    Dear Sir,  <br><br>I am terminating
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> Example 2: A sentence for which the above regex pattern only partially works
>>>> > >>> >>> > >>>>>>>>> (as you can see, there are 4 <br> instead of 2)
>>>> > >>> >>> > >>>>>>>>> *Original content:* exalted  \n \n\n   Psalm 89:17  \n\n  \n\n  3 Choa Chu Kang Avenue 4, Singapore
>>>> > >>> >>> > >>>>>>>>> *Index content:* exalted  <br><br>Psalm 89:17  <br><br> <br><br>3 Choa Chu Kang Avenue 4, Singapore
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> Example 3: Another sentence for which the above regex pattern only partially works
>>>> > >>> >>> > >>>>>>>>> (as you can see, there are 4 <br> instead of 2)
>>>> > >>> >>> > >>>>>>>>> *Original content:* http://www.concordpri.moe.edu.sg/  \n\n \n\n \n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n On Tue, Dec 18, 2018 at 10:07 AM
>>>> > >>> >>> > >>>>>>>>> *Index content:* http://www.concordpri.moe.edu.sg/  <br><br> <br><br>On Tue, Dec 18, 2018 at 10:07 AM
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> We would appreciate your help in finding out what is wrong.
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> Thank you.
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>> Regards,
>>>> > >>> >>> > >>>>>>>>> Edwin
>>>> > >>> >>> > >>>>>>>>>
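A possible explanation for the doubled <br> in Examples 2 and 3, offered only as an assumption: the gaps between the newline runs may contain a character outside Java's default `\s` class `[ \t\n\x0B\f\r]`, for example a no-break space (U+00A0) left over from HTML. The minimal Java sketch below assumes RegexReplaceProcessorFactory compiles the configured text with `java.util.regex.Pattern`; the class `BrCollapseDemo`, the helper `collapse` and the sample string are hypothetical. It shows `(\n+\s*){2,}` splitting on such a character, while the `\W` variant this thread later settled on, or `\s` in Unicode mode via the embedded `(?U)` flag, consumes it.

```java
import java.util.regex.Pattern;

public class BrCollapseDemo {
    // Replace every match of regex in content with two <br> tags
    // (written as &lt;br&gt;&lt;br&gt; inside solrconfig.xml).
    static String collapse(String regex, String content) {
        return Pattern.compile(regex).matcher(content).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        // Hypothetical sample: two newline runs separated by a no-break space (U+00A0).
        String content = "Psalm 89:17 \n\n\u00A0\n\n3 Choa Chu Kang Avenue 4";

        // Default \s does not match U+00A0, so the run is split into two matches.
        System.out.println(collapse("(\\n+\\s*){2,}", content));
        // \W matches U+00A0 (but also punctuation), so the whole run collapses.
        System.out.println(collapse("(\\n\\W*){2,}", content));
        // With (?U), \s covers Unicode white space such as U+00A0 but not punctuation.
        System.out.println(collapse("(?U)(\\n+\\s*){2,}", content));
    }
}
```

With this sample, the first call leaves the no-break space between two `<br><br>` pairs, mirroring the four <br> above, and the other two calls print `Psalm 89:17 <br><br>3 Choa Chu Kang Avenue 4`. The trade-off mentioned in the reply at the top of this thread still applies: `\W` also swallows punctuation between newlines, whereas `(?U)\s` only widens the set of whitespace characters.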
>>>> > >>> >>> > >>>>>>>>>> On Thu, 7 Feb 2019 at 21:24, <paul.d...@ub.unibe.ch> wrote:
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>> You don’t say what happens, just that it is not working. I assume
>>>> > >>> >>> > >>>>>>>>>> nothing is replaced? Perhaps the pattern should be
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>  <str name="pattern">"(\n\s*){2,}"</str>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>> ??
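A note on wrapping the pattern in double quotes: assuming Solr hands the text of `<str name="pattern">` to `java.util.regex.Pattern.compile` unchanged (no quote stripping), the quotes become literal characters that the field value must contain for anything to match. A minimal Java check, where the class `QuotedPatternDemo`, the helper `matches` and the sample text are hypothetical:

```java
import java.util.regex.Pattern;

public class QuotedPatternDemo {
    // Returns true if the regex finds at least one match anywhere in the input.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Hypothetical field value with real newline characters and no quotes.
        String content = "Dear Sir,\n\n \nI am terminating";

        // Regex text "(\n\s*){2,}" including the quotes, as in the suggestion above.
        String quoted = "\"(\\n\\s*){2,}\"";
        // The same regex without the surrounding quotes.
        String unquoted = "(\\n\\s*){2,}";

        System.out.println(matches(quoted, content));   // false: no literal quotes in content
        System.out.println(matches(unquoted, content)); // true
    }
}
```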
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>> From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>> > >>> >>> > >>>>>>>>>> Sent: Thursday, 7 February 2019 14:08
>>>> > >>> >>> > >>>>>>>>>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>> > >>> >>> > >>>>>>>>>> Subject: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>> Hi,
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>> I am trying to use the RegexReplaceProcessorFactory to replace two or
>>>> > >>> >>> > >>>>>>>>>> more \n with any number of spaces between them (e.g. \n\n, \n \n,
>>>> > >>> >>> > >>>>>>>>>> \n \n \n \n) with two <br>.
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>> I am using the following regex pattern, and it works when I test it on
>>>> > >>> >>> > >>>>>>>>>> regex101.com, but not when I put it inside the
>>>> > >>> >>> > >>>>>>>>>> RegexReplaceProcessorFactory as below:
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>> <updateRequestProcessorChain name="removeCode">
>>>> > >>> >>> > >>>>>>>>>>  <processor class="solr.RegexReplaceProcessorFactory">
>>>> > >>> >>> > >>>>>>>>>>   <str name="fieldName">content</str>
>>>> > >>> >>> > >>>>>>>>>>   <str name="pattern">"(\\n\s*){2,}"</str>
>>>> > >>> >>> > >>>>>>>>>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>> > >>> >>> > >>>>>>>>>>  </processor>
>>>> > >>> >>> > >>>>>>>>>> </updateRequestProcessorChain>
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>> To explain my regex pattern further: \s* tells the regex to match any \n
>>>> > >>> >>> > >>>>>>>>>> followed by zero or more whitespace characters, and {2,} tells it to
>>>> > >>> >>> > >>>>>>>>>> match two or more occurrences of that \n group.
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>> Please let me know what is wrong and how I should do it.
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>> I am using Solr 7.6.0.
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>> Regards,
>>>> > >>> >>> > >>>>>>>>>> Edwin
>>>> > >>> >>> > >>>>>>>>>>
>>>> > >>> >>> > >>>>>>>>>
>>>> > >>> >>> > >>>>>>>>
>>>> > >>> >>> > >>>>>>>
>>>> > >>> >>> > >>
>>>> > >>> >>> >
>>>> > >>> >>>
>>>> > >>> >>
>>>> > >>>
>>>> > >>
>>>> >
>>>>
>>>
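The original config `"(\\n\s*){2,}"` may have a second problem besides the quotes. Inside solrconfig.xml the pattern is plain XML text rather than a Java string literal, so, assuming the value reaches `java.util.regex.Pattern` unchanged, `\\n` is read as an escaped backslash followed by `n`: it matches the two characters `\` and `n`, not a newline. The minimal Java sketch below compares three variants of the pattern text; the class `EscapingDemo`, the helper `replace` and the sample strings are hypothetical. Note that the Java literals need doubled backslashes to produce the regex text shown in the comments.

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    // Replace every match of regex in input with two <br> tags.
    static String replace(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        // Hypothetical field value containing real newline characters.
        String content = "Dear Sir,\n\n \nI am terminating";

        // Regex text exactly as in the original config: "(\\n\s*){2,}"
        String original = "\"(\\\\n\\s*){2,}\"";
        // Quotes removed, but \\n still means a literal backslash followed by n.
        String noQuotes = "(\\\\n\\s*){2,}";
        // Single backslash: \n is a newline and \s* allows whitespace after it.
        String fixed = "(\\n\\s*){2,}";

        System.out.println(replace(original, content).equals(content)); // true: nothing replaced
        System.out.println(replace(noQuotes, content).equals(content)); // true: nothing replaced
        System.out.println(replace(fixed, content)); // Dear Sir,<br><br>I am terminating

        // \\n does match text that literally contains backslash-n sequences.
        System.out.println(replace(noQuotes, "a\\n\\nb")); // a<br><br>b
    }
}
```

In config terms, the `fixed` variant corresponds to `<str name="pattern">(\n\s*){2,}</str>`, which is essentially the pattern tried later in this thread; the remaining differences between the examples are then down to which characters sit between the newlines.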
