Re: Is anyone using proxy caching in front of solr?

2019-02-20 Thread Furkan KAMACI
Hi Joakim,

I suggest you read these resources:

http://lucene.472066.n3.nabble.com/Varnish-td4072057.html
http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html
https://wiki.apache.org/solr/SolrAndHTTPCaches

which give information about HTTP caching, including Varnish Cache and the
Last-Modified, ETag, Expires, and Cache-Control headers.
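
To make the ETag part concrete, here is a minimal Java sketch of how a caching
layer can revalidate a cached response with a conditional request. The class
SearchCacheValidator, its method names, and the idea of deriving the tag from
an index version number are assumptions made for illustration; this is not how
Solr or Varnish implement it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Solr or of any proxy.
final class SearchCacheValidator {

    /** Derives a quoted ETag from an index version and a normalized query string. */
    static String etag(long indexVersion, String normalizedQuery) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(
                (indexVersion + "|" + normalizedQuery).getBytes(StandardCharsets.UTF_8));
            StringBuilder tag = new StringBuilder("\"");
            for (int i = 0; i < 8; i++) {          // first 8 bytes are enough for a tag
                tag.append(String.format("%02x", digest[i]));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", e);
        }
    }

    /** Returns 304 when the client's If-None-Match value equals the current ETag, else 200. */
    static int statusFor(String ifNoneMatch, String currentEtag) {
        return currentEtag.equals(ifNoneMatch) ? 304 : 200;
    }

    /** Builds a Cache-Control value that lets shared caches keep the response. */
    static String cacheControl(int maxAgeSeconds) {
        return "public, max-age=" + maxAgeSeconds;
    }
}
```

With an index that changes only a few times a day, a tag built this way stays
the same between commits, so repeated short queries can be answered with a 304
or straight from the proxy until the next index update.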

Kind Regards,
Furkan KAMACI

On Wed, Feb 20, 2019 at 11:18 PM Joakim Hansson 
wrote:

> Hello dear user list!
> I work at a company in retail where we use solr to perform searches as you
> type.
> As soon as you type more than 1 character in the search field solr starts
> serving hits.
> Of course this generates a lot of "unnecessary" queries (in the sense that
> they are never shown to the user) which is why I started thinking about
> using something like squid or varnish to cache a bunch of these 2-4
> character queries.
>
> It seems most stuff I find about it is from pretty old sources, but as far
> as I know solrcloud doesn't have distributed cache support.
>
> Our indexes aren't updated that frequently, about 4 - 6 times a day. We
> don't use a lot of shards and replicas (biggest index is split to 3 shards
> with 2 replicas). All shards/replicas are not on the same solr host.
> Our solr setup handles around 80-200 queries per second during the day with
> peaks at >1500 before holiday season and sales.
>
> I haven't really read up on the details yet but it seems like I could use
> etags and Expires headers to work around having to do some of that
> "unnecessary" work.
>
> Is anyone doing this? Why? Why not?
>
> - peace!
>


Re: Re-read from CloudSolrStream

2019-02-20 Thread Joel Bernstein
It sounds like you just need to catch the exception?
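
A minimal sketch of what catching the exception and resuming could look like.
It is written against a hypothetical SortedSource interface, not the real
CloudSolrStream API, and it assumes the stream is sorted on a unique key and
can be reopened so that it starts strictly after a given key (for example by
adding a range filter on the sort field). With a non-unique sort field this
approach could skip tuples that share the last key.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a sorted stream; not a SolrJ type.
interface SortedSource {
    /** Returns the next sort key, or null at end of stream. */
    String read() throws IOException;
}

final class ResumingReader {

    /**
     * Reads all keys, reopening the source after an IOException.
     * opener.apply(lastKey) must return a source positioned strictly after
     * lastKey, or at the beginning when lastKey is null.
     */
    static List<String> readAll(Function<String, SortedSource> opener, int maxRetries)
            throws IOException {
        List<String> seen = new ArrayList<>();
        String lastKey = null;
        int failures = 0;
        SortedSource source = opener.apply(null);
        while (true) {
            try {
                String key = source.read();
                if (key == null) {
                    return seen;                 // end of stream
                }
                seen.add(key);
                lastKey = key;                   // remember the resume point
            } catch (IOException e) {
                if (++failures > maxRetries) {
                    throw e;                     // give up after too many failures
                }
                source = opener.apply(lastKey);  // reopen after the last good key
            }
        }
    }
}
```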


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Feb 18, 2019 at 3:14 AM SOLR4189  wrote:

> Hi all,
>
> Let's say I have the following code:
>
> http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html
>
>
> public class StreamingClient {
>
>   public static void main(String args[]) throws IOException {
>     String zkHost = args[0];
>     String collection = args[1];
>
>     // Query parameters for the /export handler
>     Map props = new HashMap();
>     props.put("q", "*:*");
>     props.put("qt", "/export");
>     props.put("sort", "fieldA asc");
>     props.put("fl", "fieldA,fieldB,fieldC");
>
>     CloudSolrStream cstream = new CloudSolrStream(zkHost, collection, props);
>     try {
>       cstream.open();
>       while (true) {
>         Tuple tuple = cstream.read();
>         if (tuple.EOF) {
>           break;
>         }
>
>         String fieldA = tuple.getString("fieldA");
>         String fieldB = tuple.getString("fieldB");
>         String fieldC = tuple.getString("fieldC");
>         System.out.println(fieldA + ", " + fieldB + ", " + fieldC);
>       }
>     } finally {
>       cstream.close();
>     }
>   }
> }
>
> What can I do if I get an exception on the line *Tuple tuple =
> cstream.read();*? How can I re-read the same tuple, i.e. continue from
> the point where the exception occurred?
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Newbie question - Error loading an existing config file

2019-02-20 Thread Shawn Heisey

On 2/20/2019 11:07 AM, Greg Robinson wrote:

> Lets try this: https://imgur.com/a/z5OzbLW
>
> What I'm trying to do seems pretty straightforward:
>
> 1. Install Solr Server 7.4 on Linux (Completed)
> 2. Connect my Drupal 7 site to the Solr Server and use it for indexing
> content
>
> My understanding is that I must first create a core in order to connect my
> drupal site to Solr Server. This is where I'm currently stuck.


The assertion in your screenshot that the dataDir must exist is 
incorrect.  If current versions of Solr say this also, that is something 
we will need to change.  This is what actually happens:  If all the 
other requirements are met and the dataDir does not exist, it will be 
created automatically when the core starts, if the process has 
sufficient permissions.


See the large "warning" box on the CREATE action documentation for 
details on what you need:


https://lucene.apache.org/solr/guide/7_4/coreadmin-api.html#coreadmin-create

The warning box is the one that has a red triangle to the left of it. 
The red triangle contains an exclamation point.


The essence of what it says there is that the core's instance directory 
must exist, that directory must contain a "conf" directory, and all 
required config files must be in the conf directory.
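
As a rough illustration of those requirements, here is a small Java sketch
that checks a directory before the CREATE call is attempted. The class name
CoreDirCheck is made up for this example, and it only looks for
solrconfig.xml; a real core may need further files (schema, stopwords, and so
on), and the check has to run as the same user as Solr for the readability
test to mean anything.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check; not part of Solr.
final class CoreDirCheck {

    /** Returns a list of problems found; an empty list means the basic layout is in place. */
    static List<String> problems(Path instanceDir) {
        List<String> problems = new ArrayList<>();
        Path conf = instanceDir.resolve("conf");
        Path solrconfig = conf.resolve("solrconfig.xml");
        if (!Files.isDirectory(instanceDir)) {
            problems.add("instance directory does not exist: " + instanceDir);
        } else if (!Files.isDirectory(conf)) {
            problems.add("no conf directory in: " + instanceDir);
        } else if (!Files.isRegularFile(solrconfig)) {
            problems.add("solrconfig.xml not found in: " + conf);
        } else if (!Files.isReadable(solrconfig)) {
            problems.add("solrconfig.xml is not readable by this user: " + solrconfig);
        }
        return problems;
    }
}
```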


If you're running in SolrCloud mode, then you're using the wrong API.

Thanks,
Shawn


Re: Upload Synonym to Solr Cloud

2019-02-20 Thread Erick Erickson
bin/solr zk -help
particularly
bin/solr zk cp

> On Feb 20, 2019, at 4:00 PM, Rathor, Piyush (US - Philadelphia) 
>  wrote:
> 
> I am new to solr.
> Need command to upload synonym.txt to solr cloud.
> 
> Thanks & Regards
> 
> 
> This message (including any attachments) contains confidential information 
> intended for a specific individual and purpose, and is protected by law. If 
> you are not the intended recipient, you should delete this message and any 
> disclosure, copying, or distribution of this message, or the taking of any 
> action based on it, by you is strictly prohibited.
> 
> v.E.1



Upload Synonym to Solr Cloud

2019-02-20 Thread Rathor, Piyush (US - Philadelphia)
I am new to solr.
Need command to upload synonym.txt to solr cloud.

Thanks & Regards




Re: Querying on sum of child documents

2019-02-20 Thread Mikhail Khludnev
q={!frange l=8}{!parent which=isParent:1 score=total
v=$chq}&chq=+isParent:2^=0 AND description:payroll^=0 AND {!func}exp


On Wed, Feb 20, 2019 at 7:39 PM flatmind  wrote:

> Hi
> I tried with the below query
>
> q={!frange l=8}{!parent which=isParent:1 score=total v=$chq} AND chq=+
> isParent:2 AND description:payroll AND {!func}exp
>
> I applied a lower limit of 8, but the record still appears in the results,
> even though the sum of the child documents matching the "payroll"
> description is 7. So what's wrong with my function query? Please help me.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev


Is anyone using proxy caching in front of solr?

2019-02-20 Thread Joakim Hansson
Hello dear user list!
I work at a company in retail where we use solr to perform searches as you
type.
As soon as you type more than 1 character in the search field solr starts
serving hits.
Of course this generates a lot of "unnecessary" queries (in the sense that
they are never shown to the user) which is why I started thinking about
using something like squid or varnish to cache a bunch of these 2-4
character queries.

It seems most stuff I find about it is from pretty old sources, but as far
as I know solrcloud doesn't have distributed cache support.

Our indexes aren't updated that frequently, about 4 - 6 times a day. We
don't use a lot of shards and replicas (biggest index is split to 3 shards
with 2 replicas). All shards/replicas are not on the same solr host.
Our solr setup handles around 80-200 queries per second during the day with
peaks at >1500 before holiday season and sales.

I haven't really read up on the details yet but it seems like I could use
etags and Expires headers to work around having to do some of that
"unnecessary" work.

Is anyone doing this? Why? Why not?

- peace!


Re: Reporting security vulnerability in Solr

2019-02-20 Thread Tomás Fernández Löbbe
Hi Krzysztof,
There is some information on the past CVEs and dependency issues in
https://wiki.apache.org/solr/SolrSecurity. For reporting, creating a
private Jira is good, or following the guidelines here:
https://www.apache.org/security/ (email secur...@apache.org or
secur...@lucene.apache.org)

On Wed, Feb 20, 2019 at 9:16 AM Erick Erickson 
wrote:

> You did the right thing, but there will be no new versions of the 6x code
> line released. Meanwhile, the versions of jar files in the two JIRAs you
> created have been replaced with newer versions.
>
> You could get the source code and upgrade the jar files (see
> lucene/ivy-versions.properties) if you can’t upgrade to a newer Solr
> release.
>
> Best,
> Erick
>
> > On Feb 20, 2019, at 5:48 AM, Krzysztof Dębski 
> wrote:
> >
> > Hi,
> >
> > What is the right way to report a security vulnerability in Solr?
> >
> > A few days ago I created two issues:
> > https://issues.apache.org/jira/browse/SOLR-13250
> > https://issues.apache.org/jira/browse/SOLR-13251
> >
> > I chose Security Level: Private (Security Issue) and added "security"
> label.
> >
> > Do I need to do anything else to report a security issue?
> >
> > Regards,
> > Krzysztof
>
>


Re: Newbie question - Error loading an existing config file

2019-02-20 Thread Greg Robinson
Gotcha.

Lets try this: https://imgur.com/a/z5OzbLW

What I'm trying to do seems pretty straightforward:

1. Install Solr Server 7.4 on Linux (Completed)
2. Connect my Drupal 7 site to the Solr Server and use it for indexing
content

My understanding is that I must first create a core in order to connect my
drupal site to Solr Server. This is where I'm currently stuck.

Thanks for your help!

On Wed, Feb 20, 2019 at 10:43 AM Erick Erickson 
wrote:

> Attachments generally are stripped by the mail server.
>
> Are you trying to create a core as part of a SolrCloud _collection_? If
> so, this
> is an anti-pattern, use the collection API commands. Shot in the dark.
>
> Best,
> Erick
>
> > On Feb 19, 2019, at 3:05 PM, Greg Robinson 
> wrote:
> >
> > I used the front end admin (see attached)
> >
> > thanks
> >
> > On Tue, Feb 19, 2019 at 3:54 PM Erick Erickson 
> wrote:
> > Hmmm, that’s not very helpful…..
> >
> > Don’t quite know what to say. There should be something more helpful
> > in the logs.
> >
> > Hmmm, How did you create the core?
> >
> > Best,
> > Erick
> >
> >
> > > On Feb 19, 2019, at 1:29 PM, Greg Robinson 
> wrote:
> > >
> > > Thanks for your direction regarding the log.
> > >
> > > I was able to locate it and these two lines stood out:
> > >
> > > Caused by: org.apache.solr.common.SolrException: Could not load conf
> for
> > > core new_solr_core: Error loading solr config from
> > > /home/solr/server/solr/new_solr_core/conf/solrconfig.xml
> > >
> > > Caused by: org.apache.solr.common.SolrException: Error loading solr
> config
> > > from /home/solr/server/solr/new_solr_core/conf/solrconfig.xml
> > >
> > > which seems to point to the same issue.
> > >
> > > I also went ahead and updated permissions/owner to "solr" on all
> > > directories and files within "/home/solr/server/solr/new_solr_core".
> > >
> > > Still no luck. This is currently the same message that I'm getting on
> the
> > > admin front end:
> > >
> > > new_solr_core:
> > >
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > > Could not load conf for core new_solr_core: Error loading solr config
> from
> > > /home/solr/server/solr/new_solr_core/conf/solrconfig.xml.
> > >
> > > thanks!
> > >
> > >
> > >
> > > On Tue, Feb 19, 2019 at 1:55 PM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > >> do a recursive search for "solr.log" under SOLR_HOME.
> > >>
> > >> Best,
> > >> ERick
> > >>
> > >>> On Feb 19, 2019, at 8:08 AM, Greg Robinson 
> > >> wrote:
> > >>>
> > >>> Hi Erick,
> > >>>
> > >>> Thanks for the quick response.
> > >>>
> > >>> Here is what is currently contained within  the conf dir:
> > >>>
> > >>> drwxr-xr-x 2 root root  4096 Feb 18 17:51 lang
> > >>> -rw-r--r-- 1 root root 54513 Feb 18 17:51 managed-schema
> > >>> -rw-r--r-- 1 root root   329 Feb 18 17:51 params.json
> > >>> -rw-r--r-- 1 root root   894 Feb 18 17:51 protwords.txt
> > >>> -rwxrwxrwx 1 root root 55323 Feb 18 17:51 solrconfig.xml
> > >>> -rw-r--r-- 1 root root   795 Feb 18 17:51 stopwords.txt
> > >>> -rw-r--r-- 1 root root  1153 Feb 18 17:51 synonyms.txt
> > >>>
> > >>> As far as the log, where exactly might I find the specific log that
> would
> > >>> give more info in regards to this error?
> > >>>
> > >>> thanks again!
> > >>>
> > >>> On Tue, Feb 19, 2019 at 9:06 AM Erick Erickson <
> erickerick...@gmail.com>
> > >>> wrote:
> > >>>
> >  Are all the other files there in your conf dir? Solrconfig.xml
> > >> references
> >  things like managed-schema etc.
> > 
> >  Also, your log file might contain more clues...
> > 
> >  On Tue, Feb 19, 2019, 08:03 Greg Robinson  > >> wrote:
> > 
> > > Hello,
> > >
> > > We have Solr 7.4 up and running on a Linux machine.
> > >
> > > I'm just trying to add a new core so that I can eventually point a
> > >> Drupal
> > > site to the Solr Server for indexing.
> > >
> > > When attempting to add a core, I'm getting the following error:
> > >
> > > new_solr_core:
> > >
> > 
> > >>
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > > Could not load conf for core new_solr_core: Error loading solr
> config
> >  from
> > > /home/solr/server/solr/new_solr_core/conf/solrconfig.xml
> > >
> > > I've confirmed that
> > > /home/solr/server/solr/new_solr_core/conf/solrconfig.xml exists
> but I'm
> > > still getting the error.
> > >
> > > Any direction is appreciated.
> > >
> > > Thanks!
> > >
> > 
> > >>>
> > >>>
> > >>> --
> > >>> Greg Robinson
> > >>> CEO - Mobile*Enhanced*
> > >>> www.mobileenhanced.com
> > >>> g...@mobileenhanced.com
> > >>> 303-598-1865
> > >>
> > >>
> > >
> > > --
> > > Greg Robinson
> > > CEO - Mobile*Enhanced*
> > > www.mobileenhanced.com
> > > g...@mobileenhanced.com
> > > 303-598-1865
> >
> >
> >
> > --
> > Greg Robinson
> > CEO - MobileEnhanced
> > www.mobileenhanced.com
> > g...@mobileenhanced.com
> > 

Re: Newbie question - Error loading an existing config file

2019-02-20 Thread Erick Erickson
Attachments generally are stripped by the mail server.

Are you trying to create a core as part of a SolrCloud _collection_? If so, this
is an anti-pattern, use the collection API commands. Shot in the dark.

Best,
Erick

> On Feb 19, 2019, at 3:05 PM, Greg Robinson  wrote:
> 
> I used the front end admin (see attached)
> 
> thanks
> 
> On Tue, Feb 19, 2019 at 3:54 PM Erick Erickson  
> wrote:
> Hmmm, that’s not very helpful…..
> 
> Don’t quite know what to say. There should be something more helpful
> in the logs.
> 
> Hmmm, How did you create the core?
> 
> Best,
> Erick
> 
> 
> > On Feb 19, 2019, at 1:29 PM, Greg Robinson  wrote:
> > 
> > Thanks for your direction regarding the log.
> > 
> > I was able to locate it and these two lines stood out:
> > 
> > Caused by: org.apache.solr.common.SolrException: Could not load conf for
> > core new_solr_core: Error loading solr config from
> > /home/solr/server/solr/new_solr_core/conf/solrconfig.xml
> > 
> > Caused by: org.apache.solr.common.SolrException: Error loading solr config
> > from /home/solr/server/solr/new_solr_core/conf/solrconfig.xml
> > 
> > which seems to point to the same issue.
> > 
> > I also went ahead and updated permissions/owner to "solr" on all
> > directories and files within "/home/solr/server/solr/new_solr_core".
> > 
> > Still no luck. This is currently the same message that I'm getting on the
> > admin front end:
> > 
> > new_solr_core:
> > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > Could not load conf for core new_solr_core: Error loading solr config from
> > /home/solr/server/solr/new_solr_core/conf/solrconfig.xml.
> > 
> > thanks!
> > 
> > 
> > 
> > On Tue, Feb 19, 2019 at 1:55 PM Erick Erickson 
> > wrote:
> > 
> >> do a recursive search for "solr.log" under SOLR_HOME.
> >> 
> >> Best,
> >> ERick
> >> 
> >>> On Feb 19, 2019, at 8:08 AM, Greg Robinson 
> >> wrote:
> >>> 
> >>> Hi Erick,
> >>> 
> >>> Thanks for the quick response.
> >>> 
> >>> Here is what is currently contained within  the conf dir:
> >>> 
> >>> drwxr-xr-x 2 root root  4096 Feb 18 17:51 lang
> >>> -rw-r--r-- 1 root root 54513 Feb 18 17:51 managed-schema
> >>> -rw-r--r-- 1 root root   329 Feb 18 17:51 params.json
> >>> -rw-r--r-- 1 root root   894 Feb 18 17:51 protwords.txt
> >>> -rwxrwxrwx 1 root root 55323 Feb 18 17:51 solrconfig.xml
> >>> -rw-r--r-- 1 root root   795 Feb 18 17:51 stopwords.txt
> >>> -rw-r--r-- 1 root root  1153 Feb 18 17:51 synonyms.txt
> >>> 
> >>> As far as the log, where exactly might I find the specific log that would
> >>> give more info in regards to this error?
> >>> 
> >>> thanks again!
> >>> 
> >>> On Tue, Feb 19, 2019 at 9:06 AM Erick Erickson 
> >>> wrote:
> >>> 
>  Are all the other files there in your conf dir? Solrconfig.xml
> >> references
>  things like managed-schema etc.
>  
>  Also, your log file might contain more clues...
>  
>  On Tue, Feb 19, 2019, 08:03 Greg Robinson  >> wrote:
>  
> > Hello,
> > 
> > We have Solr 7.4 up and running on a Linux machine.
> > 
> > I'm just trying to add a new core so that I can eventually point a
> >> Drupal
> > site to the Solr Server for indexing.
> > 
> > When attempting to add a core, I'm getting the following error:
> > 
> > new_solr_core:
> > 
>  
> >> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > Could not load conf for core new_solr_core: Error loading solr config
>  from
> > /home/solr/server/solr/new_solr_core/conf/solrconfig.xml
> > 
> > I've confirmed that
> > /home/solr/server/solr/new_solr_core/conf/solrconfig.xml exists but I'm
> > still getting the error.
> > 
> > Any direction is appreciated.
> > 
> > Thanks!
> > 
>  
> >>> 
> >>> 
> >>> --
> >>> Greg Robinson
> >>> CEO - Mobile*Enhanced*
> >>> www.mobileenhanced.com
> >>> g...@mobileenhanced.com
> >>> 303-598-1865
> >> 
> >> 
> > 
> > -- 
> > Greg Robinson
> > CEO - Mobile*Enhanced*
> > www.mobileenhanced.com
> > g...@mobileenhanced.com
> > 303-598-1865
> 
> 
> 
> -- 
> Greg Robinson
> CEO - MobileEnhanced
> www.mobileenhanced.com
> g...@mobileenhanced.com
> 303-598-1865



Re: Reporting security vulnerability in Solr

2019-02-20 Thread Erick Erickson
You did the right thing, but there will be no new versions of the 6x code line 
released. Meanwhile, the versions of jar files in the two JIRAs you created 
have been replaced with newer versions.

You could get the source code and upgrade the jar files (see 
lucene/ivy-versions.properties) if you can’t upgrade to a newer Solr release.

Best,
Erick

> On Feb 20, 2019, at 5:48 AM, Krzysztof Dębski  wrote:
> 
> Hi,
> 
> What is the right way to report a security vulnerability in Solr?
> 
> A few days ago I created two issues:
> https://issues.apache.org/jira/browse/SOLR-13250
> https://issues.apache.org/jira/browse/SOLR-13251
> 
> I chose Security Level: Private (Security Issue) and added "security" label.
> 
> Do I need to do anything else to report a security issue?
> 
> Regards,
> Krzysztof



Re: Querying on sum of child documents

2019-02-20 Thread flatmind
Hi
I tried with the below query 

q={!frange l=8}{!parent which=isParent:1 score=total v=$chq} AND chq=+ 
isParent:2 AND description:payroll AND {!func}exp

I applied a lower limit of 8, but the record still appears in the results,
even though the sum of the child documents matching the "payroll"
description is 7. So what's wrong with my function query? Please help me.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr - Join two cores and apply OR condition on each core's field

2019-02-20 Thread ViLi
I have two cores, and each core has a common field on which the query can be
joined. I am able to write the Solr query for this join.

Each core also has a date field. The second part of my query is to apply a
date condition across the cores, with an OR between each core's date
field.

Below is the SQL query that I am using to get the results.

select * from companies core1
inner join company_types core2
on core1.company_type_id = core2.id and
(core1.created_at >= '2012-08-01' OR core2.created_at >= '2012-08-01')

What would be the Solr equivalent of the above SQL query?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Reporting security vulnerability in Solr

2019-02-20 Thread Krzysztof Dębski
Hi,

What is the right way to report a security vulnerability in Solr?

A few days ago I created two issues:
https://issues.apache.org/jira/browse/SOLR-13250
https://issues.apache.org/jira/browse/SOLR-13251

I chose Security Level: Private (Security Issue) and added "security" label.

Do I need to do anything else to report a security issue?

Regards,
Krzysztof


Solr CDRC updating data in target and not in source

2019-02-20 Thread ypriverol
Hi: 

I'm using the CDCR feature in Solr 7.1. My source SolrCloud cluster has 3
shards, and the target similarly has 3 shards. When we create both clusters,
push data to the source, and then enable CDCR, the data is transferred nicely
to the target. If we then start adding records, everything is fine.

However, we deleted ALL records in the source and started adding them all
again with our pipelines (Spring Solr). Interestingly, all records appear in
the target but not in the source. We have even stopped CDCR, and the data
continues to be transferred to the target and does not appear in the source,
even though we are 100% sure we are inserting into the source.

Any ideas? 

Regards 
Yasset 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Terms Component with filtering

2019-02-20 Thread csathishkumar
Hi,

I am new to this.
I am doing rich document indexing in Solr. I have 3 fields:
data, author, title. The full content of each document is indexed in the data
field. I am indexing 1000s of documents.


Example:
data:"Note that it’s necessary to wrap the query in double-quotes as a
phrase.Otherwise unpredictable and unwanted matches can occur."
title:"xyz"
author:"xyz"

data:"i need is filtering with terms component like 
terms.fl=data&terms.regex=wr.*&fq=title:"xyz". iam not getting expected result
when i exec."
title:"xyz"
author:"xyz"

Now I'm using the terms component to search for a word in the data field. If I
query terms.fl=data&terms.regex=wr.* the result is wrap, wright, etc., which
works fine.

What I need is filtering with the terms component, like
terms.fl=data&terms.regex=wr.*&fq=title:"xyz". I am not getting the expected
result when I execute this query. I know we can't filter with the terms
component.

Please suggest a way to solve this. Thank you.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread Zheng Lin Edwin Yeo
Hi Paul,

If I execute the second step first, then I will only get a single
<br> for those with 2 <br>.
For those where we originally get 4 <br>, there will be 2 <br> with a space
in between.

This just changes the 2 <br> into a single <br>, since the second step
replaces with a single <br>.
But it has not solved the underlying problem yet.

Regards,
Edwin


On Wed, 20 Feb 2019 at 16:41,  wrote:

> If the second step is executed first, then you will get the unwanted 4 <br>
>
>
>
> Sent from Mail for Windows 10
>
>
>
> From: Zheng Lin Edwin Yeo
> Sent: Wednesday, 20 February 2019 09:29
> To: solr-user@lucene.apache.org
> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>
>
>
> Hi Jörn ,
>
> Do you mean the regex is not correct?
>
> We are already using two RegexReplaceProcessorFactory steps, like the one
> shown below. The output that we get is still the same.
>
> 
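> <processor class="solr.RegexReplaceProcessorFactory">
>   <str name="fieldName">content</str>
>   <str name="pattern">([ \t]*\r?\n){2,}</str>
>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>   <bool name="literalReplacement">true</bool>
> </processor>
>
> <processor class="solr.RegexReplaceProcessorFactory">
>   <str name="fieldName">content</str>
>   <str name="pattern">([ \t]*\r?\n){1,}</str>
>   <str name="replacement">&lt;br&gt;</str>
>   <bool name="literalReplacement">true</bool>
> </processor>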
> 
>
> Regards,
> Edwin
>
> On Wed, 20 Feb 2019 at 16:03, Jörn Franke  wrote:
>
> > Then you need two regexprocessfactory steps
> >
> > > On 20.02.2019 at 08:12, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Thanks for the reply.
> > >
> > > Do you know of any regex online tool that works correctly for Java
> regex?
> > > I tried to find some, but they are not working properly.
> > >
> > > > Yes, our plan is to replace more than one \n with <br><br>, and single
> > > > \n with single <br>.
> > >
> > > Regards,
> > > Edwin
> > >
> > >> On Wed, 20 Feb 2019 at 14:59, Jörn Franke 
> wrote:
> > >>
> > >> Solr uses Java regex matching, so i doubt there is a bug - it would
> then
> > >> be in the JDK. Try out in a regex online Tool that supports Java regex
> > for
> > >> your solution.
> > >>
> > >> I believe you want to have 2 regex process factories:
> > >> One that deals with single \n and one that deals with more than one \n
> > >>
> > > >>> On 20.02.2019 at 06:17, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> We have tried with the following pattern ([ \t]*\r?\n){2,} and
> > >>> configuration:
> > >>>
> > >>> 
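> > > >>> <processor class="solr.RegexReplaceProcessorFactory">
> > > >>>   <str name="fieldName">content</str>
> > > >>>   <str name="pattern">([ \t]*\r?\n){2,}</str>
> > > >>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> > > >>>   <bool name="literalReplacement">true</bool>
> > > >>> </processor>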
> > >>> 
> > >>>
> > >>> However, the issue is still occurring.
> > >>>
> > >>> Anyone else is able to help?
> > >>>
> > >>> Regards,
> > >>> Edwin
> > >>>
> > >>> On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>
> > >>> wrote:
> > >>>
> >  Hi,
> > 
> >  For your info, this issue is occurring in Solr 7.7.0 as well.
> > 
> >  Regards,
> >  Edwin
> > 
> >  On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com
> > >>>
> >  wrote:
> > 
> > > Hi,
> > >
> > > Should we report this as a bug in Solr?
> > >
> > > Regards,
> > > Edwin
> > >
> > > On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com
> > >>>
> > > wrote:
> > >
> > >> Hi Paul,
> > >>
> > >> Regarding the regex (\n\s*){2,} that we are using, when we try in
> on
> > >> https://regex101.com/, it is able to give us the correct result
> for
> > >> all
> > >> the examples (ie: All of them will only have , and not
> more
> > >> than
> > >> that like what we are getting in Solr in our earlier examples).
> > >>
> > >> Could there be a possibility of a bug in Solr?
> > >>
> > >> Regards,
> > >> Edwin
> > >>
> > >> On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo <
> > >> edwinye...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi Paul,
> > >>>
> > >>> We have tried it with the space preceeding the \n i.e.  > >>> name="pattern">(\s*\n){2,}, with the following regex
> pattern:
> > >>>
> > >>> 
> > >>>  content
> > >>>  (\s*\n){2,}
> > >>>  brbr
> > >>> 
> > >>>
> > >>> However, we are also getting the exact same results as the
> earlier
> > >>> Example 1, 2 and 3.
> > >>>
> > >>> As for your point 2 on perhaps in the data you have other (non
> > >>> printing) characters than \n, we have find that there are no non
> > >> printing
> > >>> characters. It is just next line with a space. You can refer to
> the
> > >>> original content in the same examples below.
> > >>>
> > >>>
> > >>> Example 1: The sentence that the above regex pattern is working
> > >>> correctly
> > >>> *Original content in EML file:*
> > >>> Dear Sir,
> > >>>
> > >>>
> > >>> I am terminating
> > >>> *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
> > >>> *Index content: *Dear Sir,  I am terminating
> > >>>
> > >>> Example 2: The sentence that the above regex pattern is partially
> 

AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread paul.dodd
If the second step is executed first, then you will get the unwanted 4 <br>



Sent from Mail for Windows 10



From: Zheng Lin Edwin Yeo
Sent: Wednesday, 20 February 2019 09:29
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi Jörn ,

Do you mean the regex is not correct?

We are already using two RegexReplaceProcessorFactory steps, like the one
shown below. The output that we get is still the same.


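<processor class="solr.RegexReplaceProcessorFactory">
  <str name="fieldName">content</str>
  <str name="pattern">([ \t]*\r?\n){2,}</str>
  <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
  <bool name="literalReplacement">true</bool>
</processor>

<processor class="solr.RegexReplaceProcessorFactory">
  <str name="fieldName">content</str>
  <str name="pattern">([ \t]*\r?\n){1,}</str>
  <str name="replacement">&lt;br&gt;</str>
  <bool name="literalReplacement">true</bool>
</processor>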


Regards,
Edwin

On Wed, 20 Feb 2019 at 16:03, Jörn Franke  wrote:

> Then you need two regexprocessfactory steps
>
> > On 20.02.2019 at 08:12, Zheng Lin Edwin Yeo wrote:
> >
> > Hi,
> >
> > Thanks for the reply.
> >
> > Do you know of any regex online tool that works correctly for Java regex?
> > I tried to find some, but they are not working properly.
> >
> > Yes, our plan is to replace more than one \n with <br><br>, and single \n
> > with single <br>.
> >
> > Regards,
> > Edwin
> >
> >> On Wed, 20 Feb 2019 at 14:59, Jörn Franke  wrote:
> >>
> >> Solr uses Java regex matching, so i doubt there is a bug - it would then
> >> be in the JDK. Try out in a regex online Tool that supports Java regex
> for
> >> your solution.
> >>
> >> I believe you want to have 2 regex process factories:
> >> One that deals with single \n and one that deals with more than one \n
> >>
> >>> On 20.02.2019 at 06:17, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We have tried with the following pattern ([ \t]*\r?\n){2,} and
> >>> configuration:
> >>>
> >>> 
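> >>> <processor class="solr.RegexReplaceProcessorFactory">
> >>>   <str name="fieldName">content</str>
> >>>   <str name="pattern">([ \t]*\r?\n){2,}</str>
> >>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> >>>   <bool name="literalReplacement">true</bool>
> >>> </processor>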
> >>> 
> >>>
> >>> However, the issue is still occurring.
> >>>
> >>> Anyone else is able to help?
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>> On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> >>> wrote:
> >>>
>  Hi,
> 
>  For your info, this issue is occurring in Solr 7.7.0 as well.
> 
>  Regards,
>  Edwin
> 
>  On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> >>>
>  wrote:
> 
> > Hi,
> >
> > Should we report this as a bug in Solr?
> >
> > Regards,
> > Edwin
> >
> > On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> >>>
> > wrote:
> >
> >> Hi Paul,
> >>
> >> Regarding the regex (\n\s*){2,} that we are using, when we try in on
> >> https://regex101.com/, it is able to give us the correct result for
> >> all
> >> the examples (ie: All of them will only have , and not more
> >> than
> >> that like what we are getting in Solr in our earlier examples).
> >>
> >> Could there be a possibility of a bug in Solr?
> >>
> >> Regards,
> >> Edwin
> >>
> >> On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo <
> >> edwinye...@gmail.com>
> >> wrote:
> >>
> >>> Hi Paul,
> >>>
> >>> We have tried it with the space preceeding the \n i.e.  >>> name="pattern">(\s*\n){2,}, with the following regex pattern:
> >>>
> >>> 
> >>>  content
> >>>  (\s*\n){2,}
> >>>  brbr
> >>> 
> >>>
> >>> However, we are also getting the exact same results as the earlier
> >>> Example 1, 2 and 3.
> >>>
> >>> As for your point 2 on perhaps in the data you have other (non
> >>> printing) characters than \n, we have find that there are no non
> >> printing
> >>> characters. It is just next line with a space. You can refer to the
> >>> original content in the same examples below.
> >>>
> >>>
> >>> Example 1: The sentence that the above regex pattern is working
> >>> correctly
> >>> *Original content in EML file:*
> >>> Dear Sir,
> >>>
> >>>
> >>> I am terminating
> >>> *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
> >>> *Index content: *Dear Sir,  I am terminating
> >>>
> >>> Example 2: The sentence that the above regex pattern is partially
> >>> working (as you can see, instead of 2 , there are 4 )
> >>> *Original content in EML file:*
> >>>
> >>> *exalted*
> >>>
> >>> *Psalm 89:17*
> >>>
> >>>
> >>> 3 Choa Chu Kang Avenue 4
> >>> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3
> >>> Choa Chu Kang Avenue 4, Singapore
> >>> *Index content: *exalted  Psalm 89:17 3
> >>> Choa Chu Kang Avenue 4, Singapore
> >>>
> >>> Example 3: The sentence that the above regex pattern is partially
> >>> working (as you can see, instead of 2 , there are 4 )
> >>> *Original content in EML file:*
> >>>
> >>> http://www.concordpri.moe.edu.sg/
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> 

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread Zheng Lin Edwin Yeo
Hi Jörn ,

Do you mean the regex is not correct?

We are already using two RegexReplaceProcessorFactory steps, like the one
shown below. The output that we get is still the same.


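<processor class="solr.RegexReplaceProcessorFactory">
  <str name="fieldName">content</str>
  <str name="pattern">([ \t]*\r?\n){2,}</str>
  <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
  <bool name="literalReplacement">true</bool>
</processor>

<processor class="solr.RegexReplaceProcessorFactory">
  <str name="fieldName">content</str>
  <str name="pattern">([ \t]*\r?\n){1,}</str>
  <str name="replacement">&lt;br&gt;</str>
  <bool name="literalReplacement">true</bool>
</processor>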


Regards,
Edwin

On Wed, 20 Feb 2019 at 16:03, Jörn Franke  wrote:

> Then you need two RegexReplaceProcessorFactory steps
>
> > On 20.02.2019 at 08:12, Zheng Lin Edwin Yeo wrote:
> >
> > Hi,
> >
> > Thanks for the reply.
> >
> > Do you know of any regex online tool that works correctly for Java regex?
> > I tried to find some, but they are not working properly.
> >
> > Yes, our plan is to replace more than one \n with <br><br>, and a single \n
> > with a single <br>.
> >
> > Regards,
> > Edwin
> >
> >> On Wed, 20 Feb 2019 at 14:59, Jörn Franke  wrote:
> >>
> >> Solr uses Java regex matching, so i doubt there is a bug - it would then
> >> be in the JDK. Try out in a regex online Tool that supports Java regex
> for
> >> your solution.
> >>
> >> I believe you want to have 2 regex process factories:
> >> One that deals with single \n and one that deals with more than one \n
> >>
> >>> On 20.02.2019 at 06:17, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We have tried with the following pattern ([ \t]*\r?\n){2,} and
> >>> configuration:
> >>>
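> >>> <processor class="solr.RegexReplaceProcessorFactory">
> >>>   <str name="fieldName">content</str>
> >>>   <str name="pattern">([ \t]*\r?\n){2,}</str>
> >>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> >>>   <bool name="literalReplacement">true</bool>
> >>> </processor>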
> >>>
> >>> However, the issue is still occurring.
> >>>
> >>> Anyone else is able to help?
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>> On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> >>> wrote:
> >>>
>  Hi,
> 
>  For your info, this issue is occurring in Solr 7.7.0 as well.
> 
>  Regards,
>  Edwin
> 
>  On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> >>>
>  wrote:
> 
> > Hi,
> >
> > Should we report this as a bug in Solr?
> >
> > Regards,
> > Edwin
> >
> > On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> >>>
> > wrote:
> >
> >> Hi Paul,
> >>
> >> Regarding the regex (\n\s*){2,} that we are using, when we try in on
> >> https://regex101.com/, it is able to give us the correct result for
> >> all
> >> the examples (i.e. all of them will only have <br><br>, and not more
> >> than that, which is what we are getting in Solr in our earlier examples).
> >>
> >> Could there be a possibility of a bug in Solr?
> >>
> >> Regards,
> >> Edwin
> >>
> >> On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo <
> >> edwinye...@gmail.com>
> >> wrote:
> >>
> >>> Hi Paul,
> >>>
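> >>> We have tried it with the space preceding the \n, i.e.
> >>> <str name="pattern">(\s*\n){2,}</str>, with the following regex pattern:
> >>>
> >>> <processor class="solr.RegexReplaceProcessorFactory">
> >>>   <str name="fieldName">content</str>
> >>>   <str name="pattern">(\s*\n){2,}</str>
> >>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> >>> </processor>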
> >>>
> >>> However, we are also getting the exact same results as the earlier
> >>> Example 1, 2 and 3.
> >>>
> >>> As for your point 2 on perhaps in the data you have other (non
> >>> printing) characters than \n, we have find that there are no non
> >> printing
> >>> characters. It is just next line with a space. You can refer to the
> >>> original content in the same examples below.
> >>>
> >>>
> >>> Example 1: The sentence that the above regex pattern is working
> >>> correctly
> >>> *Original content in EML file:*
> >>> Dear Sir,
> >>>
> >>>
> >>> I am terminating
> >>> *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
> >>> *Index content: *Dear Sir,  I am terminating
> >>>
> >>> Example 2: The sentence that the above regex pattern is partially
> >>> working (as you can see, instead of 2 , there are 4 )
> >>> *Original content in EML file:*
> >>>
> >>> *exalted*
> >>>
> >>> *Psalm 89:17*
> >>>
> >>>
> >>> 3 Choa Chu Kang Avenue 4
> >>> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3
> >>> Choa Chu Kang Avenue 4, Singapore
> >>> *Index content: *exalted  Psalm 89:17 3
> >>> Choa Chu Kang Avenue 4, Singapore
> >>>
> >>> Example 3: The sentence that the above regex pattern is partially
> >>> working (as you can see, instead of 2 , there are 4 )
> >>> *Original content in EML file:*
> >>>
> >>> http://www.concordpri.moe.edu.sg/
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Dec 18, 2018 at 10:07 AM
> >>> *Original content:* http://www.concordpri.moe.edu.sg/   \n\n
>  \n\n
> >> \n
> >>> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue,
> >> Dec 18,
> >>> 2018 at 10:07 AM
> >>> *Index content: *http://www.concordpri.moe.edu.sg/   
> >>> On Tue, Dec 18, 2018 at 10:07 AM
> >>>
> >>>
> >>> 

Re: Solr Auto correct user query

2019-02-20 Thread Rohan Kasat
Can you share your config file and use case?
It's difficult to guess how you have configured the component.

Regards,
Rohan Kasat

On Wed, Feb 20, 2019 at 12:21 AM Prasad_sarada 
wrote:

> Hi,
> I want to implement a Solr auto-correct feature. I have tried the spell
> check component, but the results are not satisfying: it shows the top
> suggestion but does not return the results for the corrected word.
> For example, if I search for "procesor", I should get the results for
> "processor", because the latter is the correct word.
>
> Please help me with this.
>
> Thanks,
> Sarada Prasad
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 

*Regards,Rohan Kasat*


Solr Auto correct user query

2019-02-20 Thread Prasad_sarada
Hi,
I want to implement a Solr auto-correct feature. I have tried the spell
check component, but the results are not satisfying: it shows the top
suggestion but does not return the results for the corrected word.
For example, if I search for "procesor", I should get the results for
"processor", because the latter is the correct word.

Please help me with this.

Thanks,
Sarada Prasad



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread Zheng Lin Edwin Yeo
Hi Paul,

I am using Java 1.8.0_201.

Regards,
Edwin

On Wed, 20 Feb 2019 at 16:01,  wrote:

> BTW, which Java Version are you using?
>
>
>
> Sent from Mail for
> Windows 10
>
>
>
> From: Zheng Lin Edwin Yeo
> Sent: Wednesday, 20 February 2019 08:13
> To: solr-user@lucene.apache.org
> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>
>
>
> Hi,
>
> Thanks for the reply.
>
> Do you know of any regex online tool that works correctly for Java regex?
> I tried to find some, but they are not working properly.
>
> Yes, our plan is to replace more than one \n with <br><br>, and a single \n
> with a single <br>.
>
> Regards,
> Edwin
>
> On Wed, 20 Feb 2019 at 14:59, Jörn Franke  wrote:
>
> > Solr uses Java regex matching, so i doubt there is a bug - it would then
> > be in the JDK. Try out in a regex online Tool that supports Java regex
> for
> > your solution.
> >
> > I believe you want to have 2 regex process factories:
> > One that deals with single \n and one that deals with more than one \n
> >
> > > On 20.02.2019 at 06:17, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > We have tried with the following pattern ([ \t]*\r?\n){2,} and
> > > configuration:
> > >
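> > > <processor class="solr.RegexReplaceProcessorFactory">
> > >   <str name="fieldName">content</str>
> > >   <str name="pattern">([ \t]*\r?\n){2,}</str>
> > >   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> > >   <bool name="literalReplacement">true</bool>
> > > </processor>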
> > >
> > > However, the issue is still occurring.
> > >
> > > Anyone else is able to help?
> > >
> > > Regards,
> > > Edwin
> > >
> > > On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> For your info, this issue is occurring in Solr 7.7.0 as well.
> > >>
> > >> Regards,
> > >> Edwin
> > >>
> > >> On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> > >
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Should we report this as a bug in Solr?
> > >>>
> > >>> Regards,
> > >>> Edwin
> > >>>
> > >>> On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> > >
> > >>> wrote:
> > >>>
> >  Hi Paul,
> > 
> >  Regarding the regex (\n\s*){2,} that we are using, when we try in on
> >  https://regex101.com/, it is able to give us the correct result for
> > all
> >  the examples (i.e. all of them will only have <br><br>, and not more
> >  than that, which is what we are getting in Solr in our earlier examples).
> > 
> >  Could there be a possibility of a bug in Solr?
> > 
> >  Regards,
> >  Edwin
> > 
> >  On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>
> >  wrote:
> > 
> > > Hi Paul,
> > >
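> > > We have tried it with the space preceding the \n, i.e.
> > > <str name="pattern">(\s*\n){2,}</str>, with the following regex pattern:
> > >
> > > <processor class="solr.RegexReplaceProcessorFactory">
> > >   <str name="fieldName">content</str>
> > >   <str name="pattern">(\s*\n){2,}</str>
> > >   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> > > </processor>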
> > >
> > > However, we are also getting the exact same results as the earlier
> > > Example 1, 2 and 3.
> > >
> > > As for your point 2 on perhaps in the data you have other (non
> > > printing) characters than \n, we have find that there are no non
> > printing
> > > characters. It is just next line with a space. You can refer to the
> > > original content in the same examples below.
> > >
> > >
> > > Example 1: The sentence that the above regex pattern is working
> > > correctly
> > > *Original content in EML file:*
> > > Dear Sir,
> > >
> > >
> > > I am terminating
> > > *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
> > > *Index content: *Dear Sir,  I am terminating
> > >
> > > Example 2: The sentence that the above regex pattern is partially
> > > working (as you can see, instead of 2 , there are 4 )
> > > *Original content in EML file:*
> > >
> > > *exalted*
> > >
> > > *Psalm 89:17*
> > >
> > >
> > > 3 Choa Chu Kang Avenue 4
> > > *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3
> > > Choa Chu Kang Avenue 4, Singapore
> > > *Index content: *exalted  Psalm 89:17 3
> > > Choa Chu Kang Avenue 4, Singapore
> > >
> > > Example 3: The sentence that the above regex pattern is partially
> > > working (as you can see, instead of 2 , there are 4 )
> > > *Original content in EML file:*
> > >
> > > http://www.concordpri.moe.edu.sg/
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Dec 18, 2018 at 10:07 AM
> > > *Original content:* http://www.concordpri.moe.edu.sg/   \n\n
>  \n\n
> > \n
> > > \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue,
> > Dec 18,
> > > 2018 at 10:07 AM
> > > *Index content: *http://www.concordpri.moe.edu.sg/   
> > > On Tue, Dec 18, 2018 at 10:07 AM
> > >
> > >
> > > Appreciate any other ideas or 

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread Jörn Franke
Then you need two RegexReplaceProcessorFactory steps
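
The effect of chaining two steps can be tried outside Solr in plain Java. This is only a sketch under stated assumptions: the class and method names are made up for illustration, `Matcher.quoteReplacement` is used here to approximate `literalReplacement=true`, and nothing in it goes through Solr's update chain.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepNewlines {
    // Step 1: a run of two or more line breaks, each optionally preceded by spaces or tabs.
    private static final Pattern MULTI = Pattern.compile("([ \\t]*\\r?\\n){2,}");
    // Step 2: any line break that is still present after step 1.
    private static final Pattern SINGLE = Pattern.compile("([ \\t]*\\r?\\n){1,}");

    public static String convert(String content) {
        // quoteReplacement keeps the replacement text literal (no $1-style group references).
        String step1 = MULTI.matcher(content)
                .replaceAll(Matcher.quoteReplacement("<br><br>"));
        return SINGLE.matcher(step1)
                .replaceAll(Matcher.quoteReplacement("<br>"));
    }

    public static void main(String[] args) {
        // Hand-typed sample: one run of blank lines, then one single line break.
        String input = "Dear Sir,\n \n\nI am terminating\nthe contract";
        System.out.println(convert(input));
    }
}
```

The order matters in this sketch: if the {1,} pattern ran first, it would consume a whole run of line breaks as one match, so the {2,} pattern would never find anything to replace.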

> On 20.02.2019 at 08:12, Zheng Lin Edwin Yeo wrote:
> 
> Hi,
> 
> Thanks for the reply.
> 
> Do you know of any regex online tool that works correctly for Java regex?
> I tried to find some, but they are not working properly.
> 
> Yes, our plan is to replace more than one \n with <br><br>, and a single \n
> with a single <br>.
> 
> Regards,
> Edwin
> 
>> On Wed, 20 Feb 2019 at 14:59, Jörn Franke  wrote:
>> 
>> Solr uses Java regex matching, so i doubt there is a bug - it would then
>> be in the JDK. Try out in a regex online Tool that supports Java regex for
>> your solution.
>> 
>> I believe you want to have 2 regex process factories:
>> One that deals with single \n and one that deals with more than one \n
>> 
>>> On 20.02.2019 at 06:17, Zheng Lin Edwin Yeo wrote:
>>> 
>>> Hi,
>>> 
>>> We have tried with the following pattern ([ \t]*\r?\n){2,} and
>>> configuration:
>>> 
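>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>   <str name="fieldName">content</str>
>>>   <str name="pattern">([ \t]*\r?\n){2,}</str>
>>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>   <bool name="literalReplacement">true</bool>
>>> </processor>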
>>> 
>>> However, the issue is still occurring.
>>> 
>>> Anyone else is able to help?
>>> 
>>> Regards,
>>> Edwin
>>> 
>>> On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo 
>>> wrote:
>>> 
 Hi,
 
 For your info, this issue is occurring in Solr 7.7.0 as well.
 
 Regards,
 Edwin
 
 On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo >> 
 wrote:
 
> Hi,
> 
> Should we report this as a bug in Solr?
> 
> Regards,
> Edwin
> 
> On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo >> 
> wrote:
> 
>> Hi Paul,
>> 
>> Regarding the regex (\n\s*){2,} that we are using, when we try in on
>> https://regex101.com/, it is able to give us the correct result for
>> all
>> the examples (i.e. all of them will only have <br><br>, and not more
>> than that, which is what we are getting in Solr in our earlier examples).
>> 
>> Could there be a possibility of a bug in Solr?
>> 
>> Regards,
>> Edwin
>> 
>> On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com>
>> wrote:
>> 
>>> Hi Paul,
>>> 
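>>> We have tried it with the space preceding the \n, i.e.
>>> <str name="pattern">(\s*\n){2,}</str>, with the following regex pattern:
>>>
>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>   <str name="fieldName">content</str>
>>>   <str name="pattern">(\s*\n){2,}</str>
>>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>> </processor>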
>>> 
>>> However, we are also getting the exact same results as the earlier
>>> Example 1, 2 and 3.
>>> 
>>> As for your point 2 on perhaps in the data you have other (non
>>> printing) characters than \n, we have find that there are no non
>> printing
>>> characters. It is just next line with a space. You can refer to the
>>> original content in the same examples below.
>>> 
>>> 
>>> Example 1: The sentence that the above regex pattern is working
>>> correctly
>>> *Original content in EML file:*
>>> Dear Sir,
>>> 
>>> 
>>> I am terminating
>>> *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
>>> *Index content: *Dear Sir,  I am terminating
>>> 
>>> Example 2: The sentence that the above regex pattern is partially
>>> working (as you can see, instead of 2 , there are 4 )
>>> *Original content in EML file:*
>>> 
>>> *exalted*
>>> 
>>> *Psalm 89:17*
>>> 
>>> 
>>> 3 Choa Chu Kang Avenue 4
>>> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3
>>> Choa Chu Kang Avenue 4, Singapore
>>> *Index content: *exalted  Psalm 89:17 3
>>> Choa Chu Kang Avenue 4, Singapore
>>> 
>>> Example 3: The sentence that the above regex pattern is partially
>>> working (as you can see, instead of 2 , there are 4 )
>>> *Original content in EML file:*
>>> 
>>> http://www.concordpri.moe.edu.sg/
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Dec 18, 2018 at 10:07 AM
>>> *Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n
>> \n
>>> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue,
>> Dec 18,
>>> 2018 at 10:07 AM
>>> *Index content: *http://www.concordpri.moe.edu.sg/   
>>> On Tue, Dec 18, 2018 at 10:07 AM
>>> 
>>> 
>>> Appreciate any other ideas or suggestions that you may have.
>>> 
>>> Thank you.
>>> 
>>> Regards,
>>> Edwin
>>> 
 On Thu, 7 Feb 2019 at 22:49,  wrote:
 
 Hi Edwin
 
 
 
 1.  Sorry, the pattern was wrong, the space should preceed the \n
 i.e. (\s*\n){2,}
 2.  Perhaps in the data you have other (non printing) characters
 than \n?
 
 
 
 Gesendet von Mail
>> für
 Windows 10
 
 
 
 Von: Zheng Lin Edwin Yeo
 Gesendet: Donnerstag, 7. Februar 2019 15:23
 An: 

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread Jörn Franke
Maybe they work properly and the regex is not as expected? 
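
Since Solr hands the pattern to java.util.regex, one way to take online tools out of the picture is to run the pattern in a small Java program. The sketch below is an illustration only: the class name and the `visible` helper are hypothetical, `Matcher.quoteReplacement` stands in for a literal replacement, and the input is typed by hand from Example 2 in this thread, so it cannot contain any unusual characters that the real extracted content might.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternCheck {
    // Apply a pattern with a literal replacement (no $1-style group references).
    public static String apply(String regex, String replacement, String input) {
        return Pattern.compile(regex).matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    // Render control and non-ASCII characters as escapes so stray characters stand out.
    public static String visible(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\n') sb.append("\\n");
            else if (c == '\r') sb.append("\\r");
            else if (c == '\t') sb.append("\\t");
            else if (c < 0x20 || c > 0x7e) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-typed copy of Example 2; the real field value may differ.
        String input = "exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa Chu Kang Avenue 4";
        System.out.println(visible(input));
        System.out.println(visible(apply("([ \\t]*\\r?\\n){2,}", "<br><br>", input)));
    }
}
```

With this hand-typed input, each run of blank lines collapses to a single `<br><br>`. If the indexed field still shows more, comparing the `visible` dump of the real field value against this one may show where the two differ.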

> On 20.02.2019 at 08:12, Zheng Lin Edwin Yeo wrote:
> 
> Hi,
> 
> Thanks for the reply.
> 
> Do you know of any regex online tool that works correctly for Java regex?
> I tried to find some, but they are not working properly.
> 
> Yes, our plan is to replace more than one \n with <br><br>, and a single \n
> with a single <br>.
> 
> Regards,
> Edwin
> 
>> On Wed, 20 Feb 2019 at 14:59, Jörn Franke  wrote:
>> 
>> Solr uses Java regex matching, so i doubt there is a bug - it would then
>> be in the JDK. Try out in a regex online Tool that supports Java regex for
>> your solution.
>> 
>> I believe you want to have 2 regex process factories:
>> One that deals with single \n and one that deals with more than one \n
>> 
>>> On 20.02.2019 at 06:17, Zheng Lin Edwin Yeo wrote:
>>> 
>>> Hi,
>>> 
>>> We have tried with the following pattern ([ \t]*\r?\n){2,} and
>>> configuration:
>>> 
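>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>   <str name="fieldName">content</str>
>>>   <str name="pattern">([ \t]*\r?\n){2,}</str>
>>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>   <bool name="literalReplacement">true</bool>
>>> </processor>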
>>> 
>>> However, the issue is still occurring.
>>> 
>>> Anyone else is able to help?
>>> 
>>> Regards,
>>> Edwin
>>> 
>>> On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo 
>>> wrote:
>>> 
 Hi,
 
 For your info, this issue is occurring in Solr 7.7.0 as well.
 
 Regards,
 Edwin
 
 On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo >> 
 wrote:
 
> Hi,
> 
> Should we report this as a bug in Solr?
> 
> Regards,
> Edwin
> 
> On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo >> 
> wrote:
> 
>> Hi Paul,
>> 
>> Regarding the regex (\n\s*){2,} that we are using, when we try in on
>> https://regex101.com/, it is able to give us the correct result for
>> all
>> the examples (i.e. all of them will only have <br><br>, and not more
>> than that, which is what we are getting in Solr in our earlier examples).
>> 
>> Could there be a possibility of a bug in Solr?
>> 
>> Regards,
>> Edwin
>> 
>> On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com>
>> wrote:
>> 
>>> Hi Paul,
>>> 
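>>> We have tried it with the space preceding the \n, i.e.
>>> <str name="pattern">(\s*\n){2,}</str>, with the following regex pattern:
>>>
>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>   <str name="fieldName">content</str>
>>>   <str name="pattern">(\s*\n){2,}</str>
>>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>> </processor>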
>>> 
>>> However, we are also getting the exact same results as the earlier
>>> Example 1, 2 and 3.
>>> 
>>> As for your point 2 on perhaps in the data you have other (non
>>> printing) characters than \n, we have find that there are no non
>> printing
>>> characters. It is just next line with a space. You can refer to the
>>> original content in the same examples below.
>>> 
>>> 
>>> Example 1: The sentence that the above regex pattern is working
>>> correctly
>>> *Original content in EML file:*
>>> Dear Sir,
>>> 
>>> 
>>> I am terminating
>>> *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
>>> *Index content: *Dear Sir,  I am terminating
>>> 
>>> Example 2: The sentence that the above regex pattern is partially
>>> working (as you can see, instead of 2 , there are 4 )
>>> *Original content in EML file:*
>>> 
>>> *exalted*
>>> 
>>> *Psalm 89:17*
>>> 
>>> 
>>> 3 Choa Chu Kang Avenue 4
>>> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3
>>> Choa Chu Kang Avenue 4, Singapore
>>> *Index content: *exalted  Psalm 89:17 3
>>> Choa Chu Kang Avenue 4, Singapore
>>> 
>>> Example 3: The sentence that the above regex pattern is partially
>>> working (as you can see, instead of 2 , there are 4 )
>>> *Original content in EML file:*
>>> 
>>> http://www.concordpri.moe.edu.sg/
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Dec 18, 2018 at 10:07 AM
>>> *Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n
>> \n
>>> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue,
>> Dec 18,
>>> 2018 at 10:07 AM
>>> *Index content: *http://www.concordpri.moe.edu.sg/   
>>> On Tue, Dec 18, 2018 at 10:07 AM
>>> 
>>> 
>>> Appreciate any other ideas or suggestions that you may have.
>>> 
>>> Thank you.
>>> 
>>> Regards,
>>> Edwin
>>> 
 On Thu, 7 Feb 2019 at 22:49,  wrote:
 
 Hi Edwin
 
 
 
 1.  Sorry, the pattern was wrong, the space should preceed the \n
 i.e. (\s*\n){2,}
 2.  Perhaps in the data you have other (non printing) characters
 than \n?
 
 
 
 Gesendet von Mail
>> für
 Windows 10
 
 
 
 Von: Zheng Lin Edwin Yeo
 Gesendet: Donnerstag, 7. Februar 2019 15:23
 An: 

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread paul.dodd
BTW, which Java Version are you using?



Sent from Mail for Windows 10



From: Zheng Lin Edwin Yeo
Sent: Wednesday, 20 February 2019 08:13
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi,

Thanks for the reply.

Do you know of any regex online tool that works correctly for Java regex?
I tried to find some, but they are not working properly.

Yes, our plan is to replace more than one \n with <br><br>, and a single \n
with a single <br>.

Regards,
Edwin

On Wed, 20 Feb 2019 at 14:59, Jörn Franke  wrote:

> Solr uses Java regex matching, so i doubt there is a bug - it would then
> be in the JDK. Try out in a regex online Tool that supports Java regex for
> your solution.
>
> I believe you want to have 2 regex process factories:
> One that deals with single \n and one that deals with more than one \n
>
> > On 20.02.2019 at 06:17, Zheng Lin Edwin Yeo wrote:
> >
> > Hi,
> >
> > We have tried with the following pattern ([ \t]*\r?\n){2,} and
> > configuration:
> >
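> > <processor class="solr.RegexReplaceProcessorFactory">
> >   <str name="fieldName">content</str>
> >   <str name="pattern">([ \t]*\r?\n){2,}</str>
> >   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> >   <bool name="literalReplacement">true</bool>
> > </processor>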
> >
> > However, the issue is still occurring.
> >
> > Anyone else is able to help?
> >
> > Regards,
> > Edwin
> >
> > On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo 
> > wrote:
> >
> >> Hi,
> >>
> >> For your info, this issue is occurring in Solr 7.7.0 as well.
> >>
> >> Regards,
> >> Edwin
> >>
> >> On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo  >
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Should we report this as a bug in Solr?
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>> On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo  >
> >>> wrote:
> >>>
>  Hi Paul,
> 
>  Regarding the regex (\n\s*){2,} that we are using, when we try in on
>  https://regex101.com/, it is able to give us the correct result for
> all
>  the examples (i.e. all of them will only have <br><br>, and not more
>  than that, which is what we are getting in Solr in our earlier examples).
> 
>  Could there be a possibility of a bug in Solr?
> 
>  Regards,
>  Edwin
> 
>  On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
>  wrote:
> 
> > Hi Paul,
> >
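> > We have tried it with the space preceding the \n, i.e.
> > <str name="pattern">(\s*\n){2,}</str>, with the following regex pattern:
> >
> > <processor class="solr.RegexReplaceProcessorFactory">
> >   <str name="fieldName">content</str>
> >   <str name="pattern">(\s*\n){2,}</str>
> >   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> > </processor>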
> >
> > However, we are also getting the exact same results as the earlier
> > Example 1, 2 and 3.
> >
> > As for your point 2 on perhaps in the data you have other (non
> > printing) characters than \n, we have find that there are no non
> printing
> > characters. It is just next line with a space. You can refer to the
> > original content in the same examples below.
> >
> >
> > Example 1: The sentence that the above regex pattern is working
> > correctly
> > *Original content in EML file:*
> > Dear Sir,
> >
> >
> > I am terminating
> > *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
> > *Index content: *Dear Sir,  I am terminating
> >
> > Example 2: The sentence that the above regex pattern is partially
> > working (as you can see, instead of 2 , there are 4 )
> > *Original content in EML file:*
> >
> > *exalted*
> >
> > *Psalm 89:17*
> >
> >
> > 3 Choa Chu Kang Avenue 4
> > *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3
> > Choa Chu Kang Avenue 4, Singapore
> > *Index content: *exalted  Psalm 89:17 3
> > Choa Chu Kang Avenue 4, Singapore
> >
> > Example 3: The sentence that the above regex pattern is partially
> > working (as you can see, instead of 2 , there are 4 )
> > *Original content in EML file:*
> >
> > http://www.concordpri.moe.edu.sg/
> >
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Dec 18, 2018 at 10:07 AM
> > *Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n
> \n
> > \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue,
> Dec 18,
> > 2018 at 10:07 AM
> > *Index content: *http://www.concordpri.moe.edu.sg/   
> > On Tue, Dec 18, 2018 at 10:07 AM
> >
> >
> > Appreciate any other ideas or suggestions that you may have.
> >
> > Thank you.
> >
> > Regards,
> > Edwin
> >
> >> On Thu, 7 Feb 2019 at 22:49,  wrote:
> >>
> >> Hi Edwin
> >>
> >>
> >>
> >>  1.  Sorry, the pattern was wrong, the space should preceed the \n
> >> i.e. (\s*\n){2,}
> >>  2.  Perhaps in the data you have other (non printing) characters
> >> than \n?
> >>
> >>
> >>
> >> Gesendet von Mail
> für
> >> Windows 10
> >>