StandardTokenizerFactory doesn't split on underscore

2021-01-07 Thread Rahul Goswami
Hello,
I was recently debugging a problem on Solr 7.7.2 where a query wasn't
returning the desired results. It turned out that the indexed terms were
underscore-separated, but the query terms weren't. I was under the
impression that StandardTokenizerFactory also splits terms on underscores,
but it turns out that's not the case. E.g.,
'hello-world' is tokenized into 'hello' and 'world', but
'hello_world' is treated as a single token.
Is this a bug or the designed behavior?

If this is by design, it would be helpful to include this behavior in
the documentation, since it is similar to the behavior with periods.

https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-StandardTokenizer
"Periods (dots) that are not followed by whitespace are kept as part of the
token, including Internet domain names. "

Thanks,
Rahul
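
The behavior described above matches the Unicode word-break rules (UAX #29)
that StandardTokenizer follows, where an underscore joins letters rather than
separating them. The sketch below is only a rough analogy in Python, not
Lucene's implementation: the regex class \w happens to treat '_' as a word
character and '-' as a break. The function names are made up for illustration.

```python
import re

def rough_tokens(text):
    # Analogy only: \w covers letters, digits and '_', so '_' stays inside a
    # token while '-' ends it. This is NOT how Lucene implements tokenization.
    return re.findall(r"\w+", text)

def split_underscores(tokens):
    # Hypothetical extra step: break each token on '_' afterwards, the way an
    # additional filter in the analysis chain might.
    return [part for tok in tokens for part in tok.split("_") if part]

print(rough_tokens("hello-world"))
print(rough_tokens("hello_world"))
print(split_underscores(rough_tokens("hello_world")))
```

In a real schema, one option (an assumption, not tested here) would be to
apply the same analysis change at both index and query time, for example by
adding a filter such as WordDelimiterGraphFilterFactory after the tokenizer.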


Solr query with space (only) gives error

2021-01-07 Thread vstuart
I have a frontend that uses Ajax to query Solr.

It's working well, but if I enter a single space (nothing else) in the
input/search box, the URL in the browser shows

... index.html#q=%20

and I get a 400 error (as there are no parameters in the
request). That is fine, but my web page stalls, waiting for a response.

If, however, I enter a semicolon ( ; ) rather than a space, then the page
immediately refreshes, albeit with no results ("displaying 0 to 0 of 0").
Also fine / expected.

My question is what is triggering the " " (%20) query fault in Solr, and how
do I address (ideally, ignore) it?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
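
One way to sidestep the blank-query case is to never send it: trim the input
and skip the request when nothing is left. The sketch below shows that guard
in Python for brevity (the frontend in question would do the equivalent in its
Ajax code). The function name and the rows default are assumptions, and the
sketch does not explain why the page stalls on the 400 response.

```python
from urllib.parse import urlencode

def build_query_params(raw_input, rows=10):
    """Return URL-encoded Solr params, or None when there is nothing to search for."""
    q = raw_input.strip()
    if not q:
        # Whitespace-only input: the caller should skip the request entirely
        # and render an empty result instead of waiting on a 400.
        return None
    return urlencode({"q": q, "rows": rows})

print(build_query_params(" "))
print(build_query_params(" solr "))
print(build_query_params(";"))
```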


Re: "Failed to reserve shared memory."

2021-01-07 Thread TK Solr

I added these lines to solr.in.sh and restarted Solr:

 GC_TUNE=('-XX:+UseG1GC' \
   '-XX:+PerfDisableSharedMem' \
   '-XX:+ParallelRefProcEnabled' \
   '-XX:MaxGCPauseMillis=250' \
   '-XX:+AlwaysPreTouch' \
   '-XX:+ExplicitGCInvokesConcurrent')

According to the Admin UI, -XX:+UseLargePages is gone, which is good, but all 
other -XX:* options except -XX:+UseG1GC are also gone.


What is the correct way to remove just -XX:+UseLargePages?

TK
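
A possible explanation, based only on the bin/solr snippet quoted below: the
else branch runs GC_TUNE=($GC_TUNE), and in bash an unindexed $GC_TUNE on an
array expands to its first element only. The Python sketch below merely models
those two bash rules (it does not run bash); the function names are invented
for illustration.

```python
def bash_unindexed_expansion(value):
    """Model bash's `$VAR`: for an array it yields only element 0."""
    if isinstance(value, list):
        return value[0] if value else ""
    return value

def rewrap(value):
    """Model `GC_TUNE=($GC_TUNE)`: expand, then word-split on whitespace."""
    return bash_unindexed_expansion(value).split()

flags = [
    "-XX:+UseG1GC",
    "-XX:+PerfDisableSharedMem",
    "-XX:+ParallelRefProcEnabled",
    "-XX:MaxGCPauseMillis=250",
    "-XX:+AlwaysPreTouch",
    "-XX:+ExplicitGCInvokesConcurrent",
]

# GC_TUNE defined as an array in solr.in.sh: only the first flag survives.
print(rewrap(flags))

# GC_TUNE defined as one space-separated string: every flag survives.
print(rewrap(" ".join(flags)))
```

If this model is right, defining GC_TUNE in solr.in.sh as a single quoted,
space-separated string rather than as an array should keep all the options.
That is an assumption to verify against the Admin UI after a restart.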

On 1/6/21 3:42 PM, TK Solr wrote:
My client is experiencing a sudden-death syndrome with Solr 8.3.1: Solr stops 
responding suddenly and they have to restart it.
(It is not clear whether the Solr/Jetty process was dead, or alive but not 
responding. No OOM log was found.)


In the Solr start up log, these three error messages were found:

OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 1)
OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)
OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)

I am wondering if anyone has seen these errors.


I found this article

https://stackoverflow.com/questions/45968433/java-hotspottm-64-bit-server-vm-warning-failed-to-reserve-shared-memory-er 



which suggests removing the JVM option -XX:+UseLargePages, which is added by the 
bin/solr script if GC_TUNE is not defined. Would that be a good idea? I'm not 
quite sure what kind of variable GC_TUNE is. It is used as in:


  if [ -z ${GC_TUNE+x} ]; then
...

    '-XX:+AlwaysPreTouch')
  else
    GC_TUNE=($GC_TUNE)
  fi

I'm not familiar with the ${GC_TUNE+x} and ($GC_TUNE) syntax. Is this a 
special kind of environment variable?



TK






Re: The x: prefix for the core name and 'custom.vm' errors in Admin UI's Logging tab

2021-01-07 Thread TK Solr
Please disregard my previous post. I now understand that these are genuine error 
messages, not errors in the Admin UI's own handling.


I think this server is being attacked using the vulnerability described here:

https://www.tenable.com/blog/cve-2019-17558-apache-solr-vulnerable-to-remote-code-execution-zero-day-vulnerability

Fortunately the attack isn't succeeding because of the SOLR-13971 fix, and instead 
it is causing these errors. I'll harden access to Solr.
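
To confirm that suspicion, the request log could be scanned for the request
shape publicly described for CVE-2019-17558: wt=velocity combined with an
inline template passed as a v.template.<name> parameter. The sketch below is a
minimal check under that assumption. The function name is made up, and the
sample URL is illustrative rather than taken from this server's logs.

```python
from urllib.parse import urlsplit, parse_qs

def looks_like_velocity_probe(request_url):
    """Flag requests that select the Velocity writer and carry an inline template."""
    params = parse_qs(urlsplit(request_url).query, keep_blank_values=True)
    uses_velocity = "velocity" in params.get("wt", [])
    # An inline template arrives as v.template.<name>=<template source>.
    inline_template = any(key.startswith("v.template.") for key in params)
    return uses_velocity and inline_template

sample = ("/solr/mycore/select?q=1&wt=velocity"
          "&v.template=custom&v.template.custom=%23set(%24x%3D1)")
print(looks_like_velocity_probe(sample))
print(looks_like_velocity_probe("/solr/mycore/select?q=*:*&wt=json"))
```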


On 1/7/21 11:02 AM, TK Solr wrote:
On the Admin UI's login screen, when the Logging tab is clicked, I see lines 
like:


Time(Local)  Level  Core      Logger        Message
1/7/2021     ERROR  x:mycore  loader        ResourceManager: unable to find resource 'custom.vm' in any resource loader.
8:41:46 AM   false

1/7/2021     ERROR  x:mycore  HttpSolrCall  null:java.io.IOException: Unable to find resource 'custom.vm'
8:41:46 AM   false



If I click on the info icon (circled "i"), this is displayed.

null:java.io.IOException: Unable to find resource 'custom.vm'
    at org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
    at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
    at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
    at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
    ...

Are these errors from the Admin UI code itself? Does the Admin UI use 
Velocity? (I thought it might be a library path issue but I don't see 
'custom.vm' anywhere in the Solr source code.)



What does "x:" prefix to the core name mean?
What does "false" under the log level mean?

The Solr I'm using is 8.3.1 using openJDK 11 on Ubuntu 18.04.3.

TK





Re: Converting a collection name to an alias

2021-01-07 Thread Mike Drob
I believe you may be able to use that command (or some combination of
create alias commands) to create an alias from A to A, and then in
the future when you want to change it you can have Alias A to collection B
(assuming this is the point of the alias in the first place).
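
The later step described above (repointing alias A at collection B) would go
through the Collections API's CREATEALIAS action. The sketch below only builds
the request URL. The base URL is an assumption, and whether Solr accepts an
alias with the same name as an existing collection is not verified here.

```python
from urllib.parse import urlencode

def create_alias_url(base_url, alias, collection):
    """Build a Collections API CREATEALIAS request URL (nothing is sent)."""
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": collection,
    })
    return f"{base_url}/admin/collections?{params}"

# Hypothetical host and port; alias "A" is pointed at collection "B".
print(create_alias_url("http://localhost:8983/solr", "A", "B"))
```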

On Thu, Jan 7, 2021 at 1:53 PM ufuk yılmaz 
wrote:

> Hi,
> I’m aware of that API but it doesn’t do what I actually want.
>
> regards
>
> Sent from Mail for Windows 10
>
> From: matthew sporleder
> Sent: 07 January 2021 22:46
> To: solr-user@lucene.apache.org
> Subject: Re: Converting a collection name to an alias
>
> https://lucene.apache.org/solr/guide/8_1/collections-api.html#rename
>
> On Thu, Jan 7, 2021 at 2:07 PM ufuk yılmaz 
> wrote:
> >
> > Hi again,
> >
> > Lets say I have a collection named A.
> > I’m trying to rename it to A_1, then create an alias named A, which
> points to the A_1 collection.
> > Is this possible without deleting and reindexing the collection from
> scratch?
> >
> > Regards,
> > uyilmaz
> >
>
>


RE: Converting a collection name to an alias

2021-01-07 Thread ufuk yılmaz
Hi,
I’m aware of that API but it doesn’t do what I actually want.

regards

Sent from Mail for Windows 10

From: matthew sporleder
Sent: 07 January 2021 22:46
To: solr-user@lucene.apache.org
Subject: Re: Converting a collection name to an alias

https://lucene.apache.org/solr/guide/8_1/collections-api.html#rename

On Thu, Jan 7, 2021 at 2:07 PM ufuk yılmaz  wrote:
>
> Hi again,
>
> Lets say I have a collection named A.
> I’m trying to rename it to A_1, then create an alias named A, which points to 
> the A_1 collection.
> Is this possible without deleting and reindexing the collection from scratch?
>
> Regards,
> uyilmaz
>



Re: Converting a collection name to an alias

2021-01-07 Thread matthew sporleder
https://lucene.apache.org/solr/guide/8_1/collections-api.html#rename

On Thu, Jan 7, 2021 at 2:07 PM ufuk yılmaz  wrote:
>
> Hi again,
>
> Lets say I have a collection named A.
> I’m trying to rename it to A_1, then create an alias named A, which points to 
> the A_1 collection.
> Is this possible without deleting and reindexing the collection from scratch?
>
> Regards,
> uyilmaz
>


Converting a collection name to an alias

2021-01-07 Thread ufuk yılmaz
Hi again,

Let's say I have a collection named A.
I’m trying to rename it to A_1, then create an alias named A, which points to 
the A_1 collection.
Is this possible without deleting and reindexing the collection from scratch?

Regards,
uyilmaz



The x: prefix for the core name and 'custom.vm' errors in Admin UI's Logging tab

2021-01-07 Thread TK Solr

On the Admin UI's login screen, when the Logging tab is clicked, I see lines 
like:

Time(Local)  Level  Core      Logger        Message
1/7/2021     ERROR  x:mycore  loader        ResourceManager: unable to find resource 'custom.vm' in any resource loader.
8:41:46 AM   false

1/7/2021     ERROR  x:mycore  HttpSolrCall  null:java.io.IOException: Unable to find resource 'custom.vm'
8:41:46 AM   false



If I click on the info icon (circled "i"), this is displayed.

null:java.io.IOException: Unable to find resource 'custom.vm'
    at org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
    at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
    at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
    at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
    ...

Are these errors from the Admin UI code itself? Does the Admin UI use Velocity? 
(I thought it might be a library path issue but I don't see 'custom.vm' 
anywhere in the Solr source code.)


What does "x:" prefix to the core name mean?
What does "false" under the log level mean?

The Solr I'm using is 8.3.1 using openJDK 11 on Ubuntu 18.04.3.

TK




Re: Sending compressed (gzip) UpdateRequest with SolrJ

2021-01-07 Thread matthew sporleder
Jetty supports HTTP gzip and I've added it to Solr before in my own
installs (and submitted patches to Solr to do so by default), but I
don't know how SolrJ handles it.

In my experience, compression helps a little, sometimes a lot, and never hurts.
Even the admin interface benefits a lot from regular old HTTP gzip.
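
To get a feel for how much an XML update payload might shrink, the sketch
below gzips a synthetic <add> request and reports the size ratio. The field
names and document contents are invented, so real payloads will compress
differently. It says nothing about whether Solr accepts a compressed request
body, which the quoted message below reports is not the case for now.

```python
import gzip

def gzip_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size."""
    return len(gzip.compress(payload)) / len(payload)

# Synthetic update request: 1000 small documents with hypothetical fields.
docs = "".join(
    f'<doc><field name="id">{i}</field>'
    f'<field name="title_t">example title {i}</field></doc>'
    for i in range(1000)
)
payload = f"<add>{docs}</add>".encode("utf-8")

print(f"{len(payload)} bytes raw, compressed/raw ratio {gzip_ratio(payload):.2f}")
```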

On Thu, Jan 7, 2021 at 8:03 AM Gael Jourdan-Weil
 wrote:
>
> Answering to myself on this one.
>
> Solr uses Jetty 9.x which does not support compressed requests by itself 
> meaning, the application behind Jetty (that is Solr) has to decompress by 
> itself which is not the case for now.
> Thus even without using SolrJ, sending XML compressed in GZIP to Solr (with 
> cURL for instance) is not possible for now.
>
> Seems quite surprising to me though.
>
> -
>
> Hello,
>
> I was wondering if someone ever had the need to send compressed (gzip) update 
> requests (adding/deleting documents), especially using SolrJ.
>
> Somehow I expected it to be done by default, but didn't find any 
> documentation about it and when looking at the code it seems there is no 
> option to do it. Or is javabin compressed by default?
> - 
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/BinaryRequestWriter.java#L49
> - 
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/request/RequestWriter.java#L55
>  (if not using Javabin)
> - 
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L587
>
> By the way, is there any documentation about javabin? I could only find one 
> on the "old wiki".
>
> Thanks,
> Gaël


Interpreting Solr indexing times

2021-01-07 Thread ufuk yılmaz
Hello all,

I have been looking at our SolrCloud indexing performance statistics and trying 
to make sense of the numbers. We are using a custom Flume sink and sending 
updates to Solr (8.4) using SolrJ.

I know this depends on a lot of things, but can you tell me whether these 
statistics are horribly bad (meaning something is obviously going wrong), 
or something to be expected from a Solr cluster under the right circumstances?

We are sending documents in batches of 1000.

{
  "UPDATE./update.distrib.requestTimes": {
    "count": 7579,
    "meanRate": 0.044953336300254124,
    "1minRate": 0.2855655259375961,
    "5minRate": 0.29214637836736357,
    "15minRate": 0.29510868125823914,
    "min_ms": 5.854106,
    "max_ms": 56854.784017,
    "mean_ms": 3100.877968690649,
    "median_ms": 1084.258683,
    "stddev_ms": 4643.097311691323,
    "p75_ms": 2407.196867,
    "p95_ms": 15509.748909,
    "p99_ms": 16206.134345,
    "p999_ms": 16206.134345
  },
  "UPDATE./update.local.totalTime": 0,
  "UPDATE./update.requestTimes": {
    "count": 7579,
    "meanRate": 0.044953336230621366,
    "1minRate": 0.2855655259375961,
    "5minRate": 0.29214637836736357,
    "15minRate": 0.29510868125823914,
    "min_ms": 5.857796,
    "max_ms": 56854.792298,
    "mean_ms": 3100.885675292589,
    "median_ms": 1084.264825,
    "stddev_ms": 4643.097457508117,
    "p75_ms": 2407.201642,
    "p95_ms": 15509.755934,
    "p99_ms": 16206.141754,
    "p999_ms": 16206.141754
  },
  "UPDATE./update.requests": 7580,
  "UPDATE./update.totalTime": 33520426747162,
  "UPDATE.update.totalTime": 0,
  "UPDATE.updateHandler.adds": 854,
  "UPDATE.updateHandler.autoCommitMaxTime": "15000ms",
  "UPDATE.updateHandler.autoCommits": 2428,
  "UPDATE.updateHandler.softAutoCommitMaxTime": "1ms",
  "UPDATE.updateHandler.softAutoCommits": 3380,
  "UPDATE.updateHandler.commits": {
    "count": 5777,
    "meanRate": 0.034265134931240636,
    "1minRate": 0.13653886429826526,
    "5minRate": 0.12997330621941325,
    "15minRate": 0.12634106125326003
  },
  "UPDATE.updateHandler.cumulativeAdds": {
    "count": 2578492,
    "meanRate": 15.293816240408821,
    "1minRate": 90.7054223213904,
    "5minRate": 99.48315440730897,
    "15minRate": 101.77967003607128
  }
}
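
A few derived figures can make these raw metrics easier to judge. The sketch
below computes documents per update request and mean time per document from
the values above. It assumes the request count and cumulativeAdds describe the
same core over the same period, which the metrics themselves do not confirm.

```python
# Values copied from the metrics above.
update_requests = 7579
cumulative_adds = 2578492
mean_request_ms = 3100.885675292589
adds_15min_rate = 101.77967003607128      # documents per second
requests_15min_rate = 0.29510868125823914  # requests per second

# Average documents per /update request over the whole uptime.
docs_per_request = cumulative_adds / update_requests

# The same ratio using only the 15-minute rates.
docs_per_request_recent = adds_15min_rate / requests_15min_rate

# Mean request time spread over the documents in an average request.
ms_per_doc = mean_request_ms / docs_per_request

print(f"docs per request (lifetime): {docs_per_request:.1f}")
print(f"docs per request (15 min):   {docs_per_request_recent:.1f}")
print(f"mean ms per document:        {ms_per_doc:.2f}")
```

Both ratios come out well below the client batch size of 1000. One possible
reading (speculation, not confirmed by the data) is that each node sees only
the part of every batch that belongs to its shards.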


Sent from Mail for Windows 10



RE: Sending compressed (gzip) UpdateRequest with SolrJ

2021-01-07 Thread Gael Jourdan-Weil
Answering my own question on this one.

Solr uses Jetty 9.x, which does not support compressed requests by itself. 
This means the application behind Jetty (that is, Solr) has to decompress them 
itself, which it does not do for now.
Thus, even without using SolrJ, sending gzip-compressed XML to Solr (with 
cURL, for instance) is not possible for now.

Seems quite surprising to me though.

-
 
Hello,

I was wondering if someone ever had the need to send compressed (gzip) update 
requests (adding/deleting documents), especially using SolrJ.

Somehow I expected it to be done by default, but didn't find any documentation 
about it and when looking at the code it seems there is no option to do it. Or 
is javabin compressed by default?
- 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/BinaryRequestWriter.java#L49
- 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/request/RequestWriter.java#L55
 (if not using Javabin)
- 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L587

By the way, is there any documentation about javabin? I could only find one on 
the "old wiki".

Thanks,
Gaël

Query over migrating a solr database from 7.7.1 to 8.7.0

2021-01-07 Thread Flowerday, Matthew J
Hi there,

I have recently upgraded a Solr database from 7.7.1 to 8.7.0 without wiping
the database and re-indexing (as this would take too long to run on site).

On my local Windows machine I have a single-server Solr 7.7.1 installation.

I upgraded in the following manner:

*   Installed Windows Solr 8.7.0 on my machine in a different folder
*   Copied the core-related folder (holding conf, data, lib,
    core.properties) from 7.7.1 to the new 8.7.0 folder
*   Started Solr
*   Checked that queries work through the Solr Admin Tool and our
    application

This all worked fine until I tried to update a record which had been created
under 7.7.1. Instead of marking the old record as deleted, it effectively
created a new copy of the record with the change in it and left the old image
still visible. When I updated the record again, it correctly updated
the new 8.7.0 version without leaving the old image behind. If I created a
new record and then updated it, the Solr record would be updated correctly.
The issue only seemed to affect the old records created under 7.7.1.

An example of the duplication follows (the first record is the 7.7.1-created
version and the second record is the 8.7.0 version after carrying out an
update):

{
  "responseHeader":{
    "status":0,
    "QTime":4,
    "params":{
      "q":"id:9901020319M01-N26",
      "_":"1610016003669"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"9901020319M01-N26",
        "groupId":"9901020319M01",
        "urn":"N26",
        "specification":"nominal",
        "owningGroupId":"9901020319M01",
        "description":"N26, Yates, Mike, Alan, Richard, MALE",
        "group_t":"9901020319M01",
        "nominalUrn_t":"N26",
        "dateTimeCreated_dtr":"2020-12-30T12:00:53Z",
        "dateTimeCreated_dt":"2020-12-30T12:00:53Z",
        "title_t":"Captain",
        "surname_t":"Yates",
        "qualifier_t":"Voyager",
        "forename1_t":"Mike",
        "forename2_t":"Alan",
        "forename3_t":"Richard",
        "sex_t":"MALE",
        "orderedType_t":"Nominal",
        "_version_":1687507566832123904},
      {
        "id":"9901020319M01-N26",
        "groupId":"9901020319M01",
        "urn":"N26",
        "specification":"nominal",
        "owningGroupId":"9901020319M01",
        "description":"N26, Yates, Mike, Alan, Richard, MALE",
        "group_t":"9901020319M01",
        "nominalUrn_t":"N26",
        "dateTimeCreated_dtr":"2020-12-30T12:00:53Z",
        "dateTimeCreated_dt":"2020-12-30T12:00:53Z",
        "title_t":"Captain",
        "surname_t":"Yates",
        "qualifier_t":"Voyager enterprise defiant yorktown xx yy",
        "forename1_t":"Mike",
        "forename2_t":"Alan",
        "forename3_t":"Richard",
        "sex_t":"MALE",
        "orderedType_t":"Nominal",
        "_version_":1688224966566215680}]
  }}

I checked the solrconfig.xml file and it does have a uniqueKey set up:

id

I was wondering if this behaviour is expected and if there is a way to make
sure that records created under a previous version are updated correctly (so
that the old data is deleted when updated).

Also, am I upgrading Solr correctly? It could be that the way I have
upgraded it is causing this issue (I tried hunting through the Solr
documentation online but struggled to find Windows upgrade notes, and I
worked out the above steps by trial and error).
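
To see how many records are affected, the docs array of a query response can
be grouped by id and any id that appears more than once reported together with
its _version_ values. The sketch below does that on an in-memory list. The
function name is made up, and the sample data is trimmed from the response
above to the two fields that matter.

```python
from collections import defaultdict

def find_duplicate_ids(docs):
    """Map each id that occurs more than once to its sorted _version_ values."""
    versions_by_id = defaultdict(list)
    for doc in docs:
        versions_by_id[doc["id"]].append(doc["_version_"])
    return {
        doc_id: sorted(versions)
        for doc_id, versions in versions_by_id.items()
        if len(versions) > 1
    }

docs = [
    {"id": "9901020319M01-N26", "_version_": 1687507566832123904},
    {"id": "9901020319M01-N26", "_version_": 1688224966566215680},
]
print(find_duplicate_ids(docs))
```

This only detects the duplicates. It does not remove them or explain why the
update left the older copy in place.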

 

Many thanks

 

Matthew

 

Matthew Flowerday | Consultant | ULEAF

Unisys | 01908 774830|  
matthew.flower...@unisys.com 

Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX

 

  

 


Re: Solr background merge in case of pull replicas

2021-01-07 Thread Abhishek Mishra
Hi Kshitij

Here is my guess. Pull replicas replicate segments from the tlog replica, so
whenever a merge happens on the tlog replica it reduces the number of segments,
which is a bigger change to replicate than the ideal case (i.e. adding a new
segment). AFAIK, adding/deleting a segment is a kind of stop-the-world moment.
This could be the reason for the increase in response time.

Regards,
Abhishek

On Thu, Jan 7, 2021 at 12:43 PM kshitij tyagi 
wrote:

> Hi,
>
> I am not querying on tlog replicas, solr version is 8.6 and 2 tlogs and 4
> pull replica setup.
>
> why should pull replicas be affected during background segment merges?
>
> Regards,
> kshitij
>
> On Wed, Jan 6, 2021 at 9:48 PM Ritvik Sharma 
> wrote:
>
> > Hi
> > It may be the cause of rebalancing and querying is not available not on
> > tlog at that moment.
> > You can check tlog logs and pull log when u are facing this issue.
> >
> > May i know which version of solr you are using? and what is the ration of
> > tlog and pull nodes.
> >
> > On Wed, 6 Jan 2021 at 2:46 PM, kshitij tyagi 
> > wrote:
> >
> > > Hi,
> > >
> > > I am having a  tlog + pull replica solr cloud setup.
> > >
> > > 1. I am observing that whenever background segment merge is triggered
> > > automatically, i see high response time on all of my solr nodes.
> > >
> > > As far as I know merges must be happening on tlog and hence the
> increase
> > > response time, i am not able to understand that why my pull replicas
> are
> > > affected during background index merges.
> > >
> > > Can someone give some insights on this? What is affecting my pull
> > replicas
> > > during index merges?
> > >
> > > Regards,
> > > kshitij
> > >
> >
>


Re: How pull replica works

2021-01-07 Thread Abhishek Mishra
Thanks, Tomas. It was really helpful.
Regards,
Abhishek

On Thu, Jan 7, 2021 at 7:03 AM Tomás Fernández Löbbe 
wrote:

> Hi Abhishek,
> The pull replicas uses the "/replication" endpoint to copy full segment
> files (sections of the index) from the leader. It works in a similar way to
> the legacy leader/follower replication. This[1] talk tries to explain the
> different replica types and how they work.
>
> HTH,
>
> Tomás
>
> [1] https://www.youtube.com/watch?v=C8C9GRTCSzY
>
> On Tue, Jan 5, 2021 at 10:29 PM Abhishek Mishra 
> wrote:
>
> > I want to know how pull replica replicate from leader in real? Does
> > internally admin API get data from the leader in form of batches?
> >
> > Regards,
> > Abhishek
> >
>
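
Following Tomás's description of the "/replication" endpoint, the replication
handler on a core can be queried directly to see which index version it
reports. The sketch below only builds such URLs. The base URL and core name
are hypothetical, and the commands shown (indexversion, details) are the
read-only ones.

```python
from urllib.parse import urlencode

def replication_url(base_url, core, command):
    """Build a replication handler URL for a read-only command (nothing is sent)."""
    query = urlencode({"command": command, "wt": "json"})
    return f"{base_url}/{core}/replication?{query}"

# Hypothetical host and core name for a pull replica.
base = "http://localhost:8983/solr"
core = "mycollection_shard1_replica_p1"
print(replication_url(base, core, "indexversion"))
print(replication_url(base, core, "details"))
```

Comparing the reported index version on the leader and on a pull replica is
one way to tell whether the replica has caught up.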