AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-13 Thread paul.dodd
Hi Edwin, With \W you will also replace other non-word characters such as punctuation. If that's OK, fine. Otherwise you need to identify the whitespace characters that are causing the problem. From: Zheng Lin Edwin Yeo Sent: Wednesday, 13 March 2019 03:25:39 To:
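To illustrate the difference, here is a minimal sketch using Python's `re` module as a stand-in for the Java regex engine that Solr uses. The sample text and the `\W+` and `\s+` patterns are assumptions made for illustration, not the patterns from Edwin's configuration; for plain ASCII input like this, `\W` and `\s` should behave the same way in both engines.

```python
import re

# Hypothetical sample text; not taken from the thread.
text = "Hello, world!\n\n\nNext line."

# \W matches every non-word character, so punctuation is replaced too.
with_nonword = re.sub(r"\W+", " ", text)

# \s matches only whitespace, so punctuation survives.
with_space = re.sub(r"\s+", " ", text)

print(repr(with_nonword))  # 'Hello world Next line '
print(repr(with_space))    # 'Hello, world! Next line.'
```

If losing the comma, exclamation mark and full stop is not acceptable, a whitespace-only character class is the safer choice.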

AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-07 Thread paul.dodd
Hi Edwin I can’t understand why the pattern is not working and where the spaces between the <br> are coming from. It should be possible to allow for spaces between the <br> in the second match pattern, however, i.e. 2nd pattern (<br>[ \t\x0b\f]*){3,} /Paul
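A small sketch of how such a whitespace-tolerant second pattern might behave, written with Python's `re` module rather than Solr itself. It assumes the token being matched is the HTML line break `<br>` and that the replacement is `<br><br>`; the sample strings are made up.

```python
import re

# Three or more <br>, each optionally followed by space, tab,
# vertical tab or form feed.
pattern = re.compile(r"(<br>[ \t\x0b\f]*){3,}")

# Hypothetical input with spaces between the <br> tokens.
four_breaks = "para one<br> <br> <br> <br>para two"
two_breaks = "para one<br> <br>para two"

collapsed = pattern.sub("<br><br>", four_breaks)
untouched = pattern.sub("<br><br>", two_breaks)

print(collapsed)  # para one<br><br>para two
print(untouched)  # para one<br> <br>para two
```

Runs of fewer than three `<br>` are left alone, including the space between them.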

AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-06 Thread paul.dodd
Hi Edwin You are correct re the 2nd pattern – my bad. Looking at the 4 <br>, it’s actually the sequence « »? So perhaps the first match pattern could be [ \t\x0b\f]*\r?\n i.e. [space tab vertical-tab formfeed] Regards, Paul
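The effect of widening the character class can be sketched as follows, again with Python's `re` module standing in for Java regex. The sample text and the `<br>` replacement are assumptions; the narrower `[ \t]*\r?\n` pattern is included only for comparison.

```python
import re

narrow = re.compile(r"[ \t]*\r?\n")        # space and tab only
wide = re.compile(r"[ \t\x0b\f]*\r?\n")    # plus vertical tab and form feed

# Hypothetical input: a CRLF line with trailing space and tab,
# then a line ending in a vertical tab.
text = "line one \t\r\nline two\x0b\nline three\n"

narrow_result = narrow.sub("<br>", text)
wide_result = wide.sub("<br>", text)

print(repr(narrow_result))  # 'line one<br>line two\x0b<br>line three<br>'
print(repr(wide_result))    # 'line one<br>line two<br>line three<br>'
```

With the narrow class the vertical tab stays in the output in front of the `<br>`, which is one way stray whitespace could end up between consecutive breaks.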

AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-05 Thread paul.dodd
Hi Edwin Try for the first pattern/replacement [ \t]*\r?\n <br> Now all line endings and preceding whitespace characters should be changed to ‘<br>’. The second pattern/replacement should replace 3 or more ‘<br>’ sequences with 2 ‘<br>’ sequences: (<br><br>){3,} <br><br> Hope this approach works.
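The two-pass idea can be sketched like this in Python (a stand-in for the chained Solr processors, not a Solr configuration). The function name `normalize_breaks` and the sample input are hypothetical. The second pattern is written here as `(<br>){3,}`, which is an assumption based on the stated intent of collapsing three or more breaks into two, not a quote from the message.

```python
import re

def normalize_breaks(text: str) -> str:
    # Pass 1: turn every line ending, plus any spaces or tabs
    # directly before it, into a single <br>.
    text = re.sub(r"[ \t]*\r?\n", "<br>", text)
    # Pass 2 (assumed form): collapse runs of three or more <br>
    # down to exactly two.
    return re.sub(r"(<br>){3,}", "<br><br>", text)

# Hypothetical input mixing trailing spaces, blank lines and CRLF.
sample = "first  \n\n\n\nsecond\r\nthird"
result = normalize_breaks(sample)
print(result)  # first<br><br>second<br>third
```

The order matters: the second pass can only count breaks once the first pass has normalised every line ending to the same token.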

AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread paul.dodd
If the second step is executed first, then you will get the unwanted 4 <br>. From: Zheng Lin Edwin Yeo Sent: Wednesday, 20 February 2019 09:29 To:

AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread paul.dodd
BTW, which Java version are you using? From: Zheng Lin Edwin Yeo Sent: Wednesday, 20 February 2019 08:13 To: solr-user@lucene.apache.org

AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread paul.dodd
Hi Edwin 1. Sorry, the pattern was wrong; the space should precede the \n, i.e. (\s*\n){2,} 2. Perhaps the data contains other (non-printing) characters besides \n? From: Zheng Lin Edwin
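A quick way to see what `(\s*\n){2,}` matches is to try it on a string with whitespace-only lines. The sketch below uses Python's `re` module instead of Java regex; the sample text and the `"\n\n"` replacement are assumptions chosen for illustration.

```python
import re

# Whitespace first, then the newline, repeated at least twice.
pattern = re.compile(r"(\s*\n){2,}")

# Hypothetical input: two whitespace-only lines, then an indented line.
text = "first\n  \n\t\n    indented second\nthird"

result = pattern.sub("\n\n", text)
print(repr(result))  # 'first\n\n    indented second\nthird'
```

Because each repetition has to end in `\n`, the match stops at the last newline of the run, so the indentation of the following line is kept and the single newline before `third` is not touched.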

AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread paul.dodd
To avoid the «\n+\s*» matching too many \n and then failing on the {2,} part, you could try (\n\s*){2,} If you also want to match CRLF, then (\r?\n\s*){2,} From: Zheng Lin Edwin
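The difference between the two suggested patterns shows up on Windows-style line endings. This is a sketch in Python's `re` module, not Java; the input string and the `"\n\n"` replacement are assumptions.

```python
import re

# Hypothetical text with two CRLF line endings in a row.
crlf_text = "a\r\n\r\nb"

# The LF-only pattern starts matching at the first \n,
# so the \r in front of it is left behind.
lf_only = re.sub(r"(\n\s*){2,}", "\n\n", crlf_text)

# The CRLF-aware pattern consumes the leading \r as well.
crlf_aware = re.sub(r"(\r?\n\s*){2,}", "\n\n", crlf_text)

print(repr(lf_only))     # 'a\r\n\nb'
print(repr(crlf_aware))  # 'a\n\nb'
```

Note that with both patterns the trailing `\s*` also consumes any indentation at the start of the line that follows the run of newlines.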

AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread paul.dodd
You don’t say what happens, just that it is not working. I assume nothing is replaced? Perhaps the pattern should be "(\n\s*){2,}"? From: Zheng Lin Edwin Yeo Sent: