Re: Sample JWT Solr configuration

2019-09-19 Thread Tyrone
Of course, the secret key is just for my local development Solr instance.



> On Sep 19, 2019, at 10:35 PM, Dave  wrote:
> 
> I know this has nothing to do with the issue at hand but if you have a public 
> facing solr instance you have much bigger issues.  

Re: Sample JWT Solr configuration

2019-09-19 Thread Dave
I know this has nothing to do with the issue at hand, but if you have a
public-facing Solr instance you have much bigger issues.

> On Sep 19, 2019, at 10:16 PM, Tyrone Tse  wrote:

How do I index PDF and Word Doc

2019-09-19 Thread PasLe Choix
I am on Solr 7.7, according to the official document:
https://lucene.apache.org/solr/guide/7_7/solr-tutorial.html
Although it is mentioned that the Post Tool can index a directory of files and
can handle HTML, PDF, and Office formats like Word, no working example command
is given.

./bin/post -c localDocs ~/Documents

Error: Problem accessing /solr/books/update. Reason:
Not Found

Or if I directly upload a PDF as a document through the Admin GUI, I get
Unsupported ContentType: application/pdf Not in: [application/xml,
application/csv, application/json, text/json, text/csv, text/xml,
application/javabin]

Can anyone please share the correct way to index PDF/DOC/DOCX files, etc.,
through both the Admin GUI and the command line?

Thank you very much.


Pasle Choix


Re: Sample JWT Solr configuration

2019-09-19 Thread Tyrone Tse
I finally got JWT Authentication working on Solr 8.1.1.
This is my security.json file contents
{
  "authentication":{
    "class":"solr.JWTAuthPlugin",
    "jwk":{
      "kty":"oct",
      "use":"sig",
      "kid":"k1",
      "k":"xbQNocUhLJKSmGi0Qp_4hAVfls9CWH5WoTrw543WTXi5H6G-AXFlHRaTKWoGZtLKAD9jn6-MFC49jvR3bJI2L_H9a3yeRgd3tMkhxcR7ABsnhFz2WutN7NSZHiAxCJzTxR8YsgzMM9SXjvp6H1xpNWALdi67YIogKFTLiUIRDtdp3xBJxMP9IQlSYxK4ov81lt4hpAhSdkfpeczgRGd2xxrMbN38uDqtoIXSPRX-7d3pf1YvlyzWKHudTz30sjM6R2h-RRDBOp-SK_tDq4vjG72DyqFYt7BRyzSzrxGl-Ku5yURr21u6vep6suWeJ2_fmA8hgd304e60DBKZoFebxQ",
      "alg":"HS256"
    },
    "aud":"Solr"
  },
  "authorization":{
    "class":"solr.RuleBasedAuthorizationPlugin",
    "permissions":[
      {
        "name":"open_select",
        "path":"/select/*",
        "role":null
      },
      {
        "name":"all-admin",
        "collection":null,
        "path":"/*",
        "role":"admin"
      },
      {
        "name":"update",
        "role":"solr-update"
      }
    ],
    "user-role":{
      "admin":"solr-update"
    }
  }
}

I used the web site to generate the JWK key.

So I am using the "k" value from the JWK to sign the JWT token.

Initially, I used the website
https://jwt.io/#debugger-io?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhZG1pbiIsImF1ZCI6InNvbHIiLCJleHAiOjk5MTYyMzkwMjJ9.rqMpVpTSbNUHDA7VLSYUpv4ebeMjvwQMD6hwMDpvcBQ

to generate the JWT and sign it with the value
xbQNocUhLJKSmGi0Qp_4hAVfls9CWH5WoTrw543WTXi5H6G-AXFlHRaTKWoGZtLKAD9jn6-MFC49jvR3bJI2L_H9a3yeRgd3tMkhxcR7ABsnhFz2WutN7NSZHiAxCJzTxR8YsgzMM9SXjvp6H1xpNWALdi67YIogKFTLiUIRDtdp3xBJxMP9IQlSYxK4ov81lt4hpAhSdkfpeczgRGd2xxrMbN38uDqtoIXSPRX-7d3pf1YvlyzWKHudTz30sjM6R2h-RRDBOp-SK_tDq4vjG72DyqFYt7BRyzSzrxGl-Ku5yURr21u6vep6suWeJ2_fmA8hgd304e60DBKZoFebxQ

The header is
{
  "alg": "HS256",
  "typ": "JWT"
}

and the payload is

{
  "sub": "admin",
  "aud": "Solr",
  "exp": 9916239022
}

This generates the JWT token
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhZG1pbiIsImF1ZCI6IlNvbHIiLCJleHAiOjk5MTYyMzkwMjJ9._H1qeNvlpIOn3X9IpDG0QiRWnEDXITMhZm1NMfuocSc

So when I use this JWT token generated at https://jwt.io/, JWT authentication
works, and I can authenticate as the user admin and POST data to the
Solr collections/cores.

Now we have decided to generate the JWT token using Java before
authenticating as the user admin to POST data to Solr, and to have a
calculated expiration date.

Here is the Java Snippet for generating the JWT token

import io.jsonwebtoken.Jwts;
import io.jsonwebtoken.SignatureAlgorithm;
...
String key = "xbQNocUhLJKSmGi0Qp_4hAVfls9CWH5WoTrw543WTXi5H6G-AXFlHRaTKWoGZtLKAD9jn6-MFC49jvR3bJI2L_H9a3yeRgd3tMkhxcR7ABsnhFz2WutN7NSZHiAxCJzTxR8YsgzMM9SXjvp6H1xpNWALdi67YIogKFTLiUIRDtdp3xBJxMP9IQlSYxK4ov81lt4hpAhSdkfpeczgRGd2xxrMbN38uDqtoIXSPRX-7d3pf1YvlyzWKHudTz30sjM6R2h-RRDBOp-SK_tDq4vjG72DyqFYt7BRyzSzrxGl-Ku5yURr21u6vep6suWeJ2_fmA8hgd304e60DBKZoFebxQ";
Calendar cal = Calendar.getInstance();
Date issueAt = cal.getTime();
cal.add(Calendar.MINUTE, 60);
Date expDate = cal.getTime();
String jws = Jwts.builder()
        .setSubject("admin")
        .setAudience("Solr")
        .setExpiration(expDate)
        .signWith(SignatureAlgorithm.HS256, key)
        .compact();
System.out.println(jws);

This does not generate a valid JWT token; when I use it, I get the following
error message:




Error 401 Signature invalid

HTTP ERROR 401
Problem accessing /solr/stores/update. Reason:
    Signature invalid





I tried generating the JWT token using JavaScript from this codepen
https://codepen.io/tyrone-tse/pen/MWgzExB

and it too generates an invalid JWT token.

Why does it work when the JWT is generated from this URL?
https://jwt.io/#debugger-io?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhZG1pbiIsImF1ZCI6InNvbHIiLCJleHAiOjk5MTYyMzkwMjJ9.rqMpVpTSbNUHDA7VLSYUpv4ebeMjvwQMD6hwMDpvcBQ
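A plausible cause, offered as an assumption to verify rather than a confirmed diagnosis: per RFC 7518, the JWK "k" value is the base64url encoding of the raw key bytes, so a verifier that decodes "k" expects signatures made with the decoded bytes, while a signer that uses the literal characters of the string produces a different signature. The JDK-only sketch below (class name hypothetical, no jjwt) builds an HS256 token from the base64url-decoded key bytes, which makes it easy to compare against what jwt.io and jjwt each emit:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class JwtHs256Sketch {

    // base64url without padding, as used for all three JWT segments
    private static String b64url(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Builds a compact HS256 JWT, treating jwkK as base64url-encoded key bytes (RFC 7518 "oct" key). */
    public static String buildToken(String jwkK, String payloadJson) {
        try {
            byte[] keyBytes = Base64.getUrlDecoder().decode(jwkK);
            String header = b64url("{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
            String payload = b64url(payloadJson.getBytes(StandardCharsets.UTF_8));
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(keyBytes, "HmacSHA256"));
            byte[] sig = mac.doFinal((header + "." + payload).getBytes(StandardCharsets.UTF_8));
            return header + "." + payload + "." + b64url(sig);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // the "k" value from the security.json above
        String k = "xbQNocUhLJKSmGi0Qp_4hAVfls9CWH5WoTrw543WTXi5H6G-AXFlHRaTKWoGZtLKAD9jn6-MFC49jvR3bJI2L_H9a3yeRgd3tMkhxcR7ABsnhFz2WutN7NSZHiAxCJzTxR8YsgzMM9SXjvp6H1xpNWALdi67YIogKFTLiUIRDtdp3xBJxMP9IQlSYxK4ov81lt4hpAhSdkfpeczgRGd2xxrMbN38uDqtoIXSPRX-7d3pf1YvlyzWKHudTz30sjM6R2h-RRDBOp-SK_tDq4vjG72DyqFYt7BRyzSzrxGl-Ku5yURr21u6vep6suWeJ2_fmA8hgd304e60DBKZoFebxQ";
        System.out.println(buildToken(k, "{\"sub\":\"admin\",\"aud\":\"Solr\",\"exp\":9916239022}"));
    }
}
```

If Solr accepts the token built this way, the jjwt-side fix would be to sign with the decoded bytes, e.g. signWith(SignatureAlgorithm.HS256, Base64.getUrlDecoder().decode(key)), since (in the jjwt versions I have seen) the String overload treats its argument as a base64-encoded secret rather than raw key characters.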







On Sat, Sep 14, 2019 at 9:06 AM Jan Høydahl  wrote:

> See answer in other thread. JWT works for 8.1 or later, don’t attempt it
> in 7.x.
>
> You could try to turn on debug logging for org.apache.solr.security to get
> more logging.
>
> Jan Høydahl
>
> > 13. sep. 2019 kl. 00:24 skrev Tyrone Tse :
> >
> > Jan
> >
> > I tried using the JWT Plugin https://github.com/cominvent/solr-auth-jwt
> >
> > If my security.json file is
> >
> > {
> >  "authentication": {
> >"class":"com.cominvent.solr.JWTAuthPlugin",
> >"jwk" : {
> >"kty": "oct",
> >"use": "sig",
> >"kid": "solr",
> >"k":
> >
> 

Re: fq * vs [* TO *]

2019-09-19 Thread Vincenzo D'Amore
Hi Shawn, Mikhail,

thanks for the feedback. Really appreciate it.

Best regards,
Vincenzo

On Thu, Sep 19, 2019 at 3:55 PM Shawn Heisey  wrote:

> On 9/19/2019 1:23 AM, Vincenzo D'Amore wrote:
> > talking about how to write solr queries I was investigating if there is a
> > difference of performance in these two filter queries: field:[* TO *]  or
> > field:*
> >
> > In other words:
> >
> > q=*:*&fq=field:[* TO *]&rows=0
> >
> > q=*:*&fq=field:*&rows=0
>
> The first one is a range query, the second is a wildcard query.
>
> Ordinarily we strongly recommend against wildcard queries for selecting
> all documents where a field exists (has a value).  For the general case,
> a range query will be faster.
>
> If the field's cardinality is VERY low, a wildcard query can be fast,
> and might even be faster than the range query ... but if the field has
> ten million possible values (terms) in the index, the query that Solr
> constructs from a wildcard will quite literally contain all ten million
> of those values, and it will be VERY slow.
>
> Thanks,
> Shawn
>


-- 
Vincenzo D'Amore


Re: DIH: Create Child Documents in ScriptTransformer

2019-09-19 Thread Jörn Franke
Hi,

thanks for all the feedback.
The context parameter in the ScriptTransformer is new to me - thanks for
this insight. I could not find it in any docs. So just for people that also
did not know it:
you can have the ScriptTransformer with 2 parameters, e.g.
function mytransformer(row,context){

}

The following Javadoc gives some hints on what you can do with the context:
https://lucene.apache.org/solr/8_2_0/solr-dataimporthandler/org/apache/solr/handler/dataimport/Context.html

Despite all this, I came to the conclusion that adding child docs in a
ScriptTransformer in DIH is not supported.

One can, though, use a StatelessScriptUpdateProcessorFactory; see
https://lucene.apache.org/solr/8_2_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html

and

https://cwiki.apache.org/confluence/display/solr/ScriptUpdateProcessor#ScriptUpdateProcessor-JavaScript

Hint on how to add child documents to a SolrInputDocument:
http://lucene.apache.org/solr/8_2_0/solr-solrj/index.html?org/apache/solr/common/SolrInputDocument.html
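Putting those two links together, a sketch of such a script is below. This is a Solr configuration fragment (it runs inside Solr's script engine, not standalone); the file name, field names, and the splitIntoChapters helper are all hypothetical:

```javascript
// conf/add-chapters.js, referenced from a solr.StatelessScriptUpdateProcessorFactory
// via a <str name="script">add-chapters.js</str> entry in an updateRequestProcessorChain
function processAdd(cmd) {
  var doc = cmd.solrDoc; // an org.apache.solr.common.SolrInputDocument
  var chapters = splitIntoChapters(doc.getFieldValue("content")); // your own splitting logic
  for (var i = 0; i < chapters.length; i++) {
    var child = new org.apache.solr.common.SolrInputDocument();
    child.setField("id", doc.getFieldValue("id") + "_chapter_" + i);
    child.setField("chapter_text", chapters[i]);
    doc.addChildDocument(child); // see the SolrInputDocument javadoc linked above
  }
}
```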


Nevertheless, I agree that one should use an external tool, which, depending
on the needs, can also mean some complexity (e.g., supporting individual
transformations per collection without code, but with configuration/plugins
etc.). While this is not a problem, it might be good to start an open-source
loader that goes beyond the Post Tool
(https://lucene.apache.org/solr/guide/8_1/post-tool.html).

best regards

On Thu, Sep 19, 2019 at 8:54 AM Mikhail Khludnev  wrote:

> Hello, Jörn.
> Have you tried to find a parent doc in the context which is passed as a
> second argument into ScriptTransformer?
>
> On Wed, Sep 18, 2019 at 9:56 PM Jörn Franke  wrote:
> >
> > Hi,
> >
> > I load a set of documents. Based on these documents some logic needs to
> be
> > applied to split them into chapters (this is done). One whole document is
> > loaded as a parent. Chapters of the whole document + metadata should be
> > loaded as child documents of this parent.
> > I want to now collect information on how this can be done:
> > * Use a custom loader - this is possible and works
> > * Use DIH and extract the chapters in a ScriptTransformer and add them as
> > child documents there. However, the scripttransformer receives as input
> > only a HashMap and while it works to transform field values etc. It does
> > not seem possible to add childdocuments within the DIH scripttransformer.
> I
> > tried adding a JavaArray with SolrInputDocuments, but this does not seem
> to
> > work. I see in debug/verbose mode that indeed the transformer adds them
> to
> > the HashMap correctly, but they don't end up in the document. Maybe here
> it
> > could be possible somehow via nested entities?
> > * Use DIH+ an UpdateProcessor (Script): there i get the SolrInputDocument
> > as a parameter and it seems feasible to extract chapters and add them as
> > child documents.
> >
> > thank you.
> >
> > best regards
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Reloading after creating a collection

2019-09-19 Thread Shawn Heisey

On 9/19/2019 12:09 PM, Arnold Bronley wrote:

I am not changing the config to enable CDCR. I am just using the CDCR
API to start it. Does that count as changing configuration?


I would guess that there are no changes to the config from using the 
API, but I also admit that I have never tried CDCR.  It would strike me 
as very unusual to have that change the config, but I didn't write CDCR.


That might be something we should treat as a bug.  Before opening an 
issue in Jira, would you be able to come up with precise steps and a 
minimal config to reproduce?


Thanks,
Shawn


Re: Reloading after creating a collection

2019-09-19 Thread Arnold Bronley
Hi,
I am not changing the config to enable CDCR. I am just using the CDCR
API to start it. Does that count as changing configuration?

On Thu, Sep 19, 2019 at 12:20 PM Shawn Heisey  wrote:

> On 9/19/2019 9:36 AM, Arnold Bronley wrote:
> > Why is it that I need to reload collection after I created it? CDCR runs
> > into issues if I do not do this.
>
> If the config doesn't change after creation, I would not expect that to
> be required.
>
> If you do change the config to enable CDCR after the collection is
> created, then you have to reload so that Solr sees the new config.
>
> Thanks,
> Shawn
>


Re: Custom update processor not kicking in

2019-09-19 Thread Rahul Goswami
Eric,
The 200 million docs are all large, as they are content-indexed. It would
also be hard to convince the customer to rebuild their index. But more than
that, I want to clarify my understanding of this topic: is it expected
behaviour for the distributed update processor not to call any further custom
processors (other than the run update processor) in standalone mode?
Alternatively, is there a way I can get a handle on a complete document once
it's reconstructed from an atomic update?

Thanks,
Rahul
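For reference, the chain layout being discussed can be written out in solrconfig.xml roughly as below; the custom factory class name is hypothetical. As the ref guide passage cited later in this thread notes, if DistributedUpdateProcessorFactory is omitted it is inserted automatically just before RunUpdateProcessorFactory:

```xml
<updateRequestProcessorChain name="custom-chain">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- hypothetical custom processor, intended to run between DUP and RUP -->
  <processor class="com.example.AtomicUpdateCleanupProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```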

On Thu, Sep 19, 2019 at 7:06 AM Erick Erickson 
wrote:

> _Why_ is reindexing not an option? 200M doc isn't that many.
> Since you have Atomic updates working, you could easily
> write a little program that pulled the docs from you existing
> collection and pushed them to a new one with the new schema.
>
> Do use CursorMark if you try that. You have to be ready to
> reindex as time passes, either to upgrade to a major version
> 2 greater than what you're using now or because the requirements
> change yet again.
>
> Best,
> Erick
>
> On Thu, Sep 19, 2019 at 12:36 AM Rahul Goswami 
> wrote:
> >
> > Eric, Markus,
> > Thank you for your inputs. I made sure that the jar file is found
> correctly
> > since the core reloads fine and also prints the log lines from my
> processor
> > during update request (getInstane() method of the update factory). The
> > reason why I want to insert the processor between distributed update
> > processor (DUP) and run update processor (RUP) is because there are
> certain
> > fields which were indexed against a dynamic field “*” and later the
> schema
> > was patched to remove the * field, causing atomic updates to fail for
> such
> > documents. Reindexing is not option since the index has nearly 200
> million
> > docs. My understanding is that the atomic updates are stitched back to a
> > complete document in the DUP before being reindexed by RUP. Hence if I am
> > able to access the document before being indexed and check for fields
> which
> > are not defined in the schema, I can remove them from the stitched back
> > document so that the atomic update can happen successfully for such docs.
> > The documentation below mentions that even if I don’t include the DUP in
> my
> > chain it is automatically inserted just before RUP.
> >
> >
> https://lucene.apache.org/solr/guide/7_2/update-request-processors.html#custom-update-request-processor-chain
> >
> >
> > I tried both approaches viz. explicitly specifying my processor after DUP
> > in the chain and also tried using the “post-processor” option in the
> chain,
> > to have the custom processor execute after DUP. Still looks like the
> > processor is just short circuited. I have defined my logic in the
> > processAdd() of the  processor. Is this an expected behavior?
> >
> > Regards,
> > Rahul
> >
> >
> > On Wed, Sep 18, 2019 at 5:28 PM Erick Erickson 
> > wrote:
> >
> > > It Depends (tm). This is a little confused. Why do you have
> > > distributed processor in stand-alone Solr? Stand-alone doesn't, well,
> > > distribute updates so that seems odd. Do try switching it around and
> > > putting it on top, this should be OK since distributed is irrelevant.
> > >
> > > You can also just set a breakpoint and see for instance, the
> > > instructions in the "IntelliJ" section here:
> > > https://cwiki.apache.org/confluence/display/solr/HowToContribute
> > >
> > > One thing I'd do is make very, very sure that my jar file was being
> > > found. IIRC, the -v startup option will log exactly where solr looks
> > > for jar files. Be sure your custom jar is in one of them and is picked
> > > up. I've set a lib directive to one place only to discover that
> > > there's an old copy lying around someplace else
> > >
> > > Best,
> > > Erick
> > >
> > > On Wed, Sep 18, 2019 at 5:08 PM Markus Jelsma
> > >  wrote:
> > > >
> > > > Hello Rahul,
> > > >
> > > > I don't know why you don't see your logs lines, but if i remember
> > > correctly, you must put all custom processors above Log, Distributed
> and
> > > Run, at least i remember i read it somewhere a long time ago.
> > > >
> > > > We put all our custom processors on top of the three default
> processors
> > > and they run just fine.
> > > >
> > > > Try it.
> > > >
> > > > Regards,
> > > > Markus
> > > >
> > > > -Original message-
> > > > > From:Rahul Goswami 
> > > > > Sent: Wednesday 18th September 2019 22:20
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Custom update processor not kicking in
> > > > >
> > > > > Hello,
> > > > >
> > > > > I am using solr 7.2.1 in a standalone mode. I created a custom
> update
> > > > > request processor and placed it between the distributed processor
> and
> > > run
> > > > > update processor in my chain. I made sure the chain is invoked
> since I
> > > see
> > > > > log lines from the getInstance() method of my processor factory.
> But I
> > > > > don’t see any log lines from the processAdd() method.
> > > > >
> > > > > 

Re: Solr behaves wonky when the ZooKeeper quorum is messed up.

2019-09-19 Thread harjagsbby
If ZK loses quorum, Solr goes read-only.  That's how it's designed to
work.  I don't know of any workaround for that.

That makes sense. So is the CPU spiking because Solr's indexing calls are
holding threads while ZooKeeper is down, as far as Solr is concerned?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr behaves wonky when the ZooKeeper quorum is messed up.

2019-09-19 Thread Shawn Heisey

On 9/19/2019 9:22 AM, harjagsbby wrote:

In our PROD Solr cluster (7.6 and ZK 3.4.9), when the ZooKeeper leader fails,
ZooKeeper enters an infinite leader-election loop, which makes Solr unstable.
Solr fails to index (as expected, with the error "Remote error message: Cannot
talk to ZooKeeper - Updates are disabled") and CPU spikes up.

This is a known ZK bug: https://issues.apache.org/jira/browse/ZOOKEEPER-2164.
Below are my questions:

1. Solr behaving wonky when the ZK quorum is affected is expected, but is
there a workaround?


If ZK loses quorum, Solr goes read-only.  That's how it's designed to 
work.  I don't know of any workaround for that.



2. The ZK bug is fixed in 3.5.6. Is Solr 7.6 compatible with ZK 3.5.6?


Generally speaking, yes, Solr will work with ZK 3.5.x.  But the ZK 
status parts of the admin UI will not work right.  That problem is fixed 
in Solr 8.3, which hasn't been released yet.


https://issues.apache.org/jira/browse/SOLR-13672


3. Does anyone have an opinion on ZK's "fast leader election" vs the
"original UDP-based" algorithm? Will changing the election algorithm version
from 3 to 0 solve the issue?


You would have to ask the ZooKeeper mailing list that question.

Thanks,
Shawn


Re: Reloading after creating a collection

2019-09-19 Thread Shawn Heisey

On 9/19/2019 9:36 AM, Arnold Bronley wrote:

Why is it that I need to reload collection after I created it? CDCR runs
into issues if I do not do this.


If the config doesn't change after creation, I would not expect that to 
be required.


If you do change the config to enable CDCR after the collection is 
created, then you have to reload so that Solr sees the new config.


Thanks,
Shawn


Solr behaves wonky when the ZooKeeper quorum is messed up.

2019-09-19 Thread harjagsbby
In our PROD Solr cluster (7.6 and ZK 3.4.9), when the ZooKeeper leader fails,
ZooKeeper enters an infinite leader-election loop, which makes Solr unstable.
Solr fails to index (as expected, with the error "Remote error message: Cannot
talk to ZooKeeper - Updates are disabled") and CPU spikes up.

This is a known ZK bug: https://issues.apache.org/jira/browse/ZOOKEEPER-2164.
Below are my questions:

1. Solr behaving wonky when the ZK quorum is affected is expected, but is
there a workaround?
2. The ZK bug is fixed in 3.5.6. Is Solr 7.6 compatible with ZK 3.5.6?
3. Does anyone have an opinion on ZK's "fast leader election" vs the
"original UDP-based" algorithm? Will changing the election algorithm version
from 3 to 0 solve the issue?






--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Reloading after creating a collection

2019-09-19 Thread Arnold Bronley
Hi,

Why is it that I need to reload a collection after I create it? CDCR runs
into issues if I do not do this.


Re: Custom auth plugin for SolrCloud

2019-09-19 Thread Shawn Heisey

On 9/19/2019 6:18 AM, Zubovich Yauheni wrote:

This class is wrapped into jar. Jar added to server lib directory and
defined at solrconfig.xml:




Where exactly is this "server lib" directory that you describe?  If it's 
one of the locations that gets loaded automatically, you should NOT be 
loading anything in that location with the  directive in 
solrconfig.xml.  That will result in the jar being loaded more than once.



null:org.apache.solr.common.SolrException: Error loading class
'com.custom.solr.core.RestrictDirectAccessPlugin'


This problem with a Java program can be caused by having the same jar 
loaded more than once.  The reason it happens is complex and has to do 
with interactions between multiple Java classloaders.


The solution to these problems is to make sure that every required jar 
is loaded, and that each of them is only loaded once.


Thanks,
Shawn
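A side note, offered as an assumption to verify: the JWT threads earlier in this digest configure their authentication plugin via security.json rather than solrconfig.xml, and in SolrCloud mode security.json is read from ZooKeeper. A minimal sketch for this custom class (assuming its jar is on a path Solr already loads) would be:

```json
{
  "authentication": {
    "class": "com.custom.solr.core.RestrictDirectAccessPlugin"
  }
}
```

This file can be uploaded to ZooKeeper with, for example, server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd putfile /security.json security.json.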


Re: fq * vs [* TO *]

2019-09-19 Thread Shawn Heisey

On 9/19/2019 1:23 AM, Vincenzo D'Amore wrote:

talking about how to write solr queries I was investigating if there is a
difference of performance in these two filter queries: field:[* TO *]  or
field:*

In other words:

q=*:*&fq=field:[* TO *]&rows=0

q=*:*&fq=field:*&rows=0


The first one is a range query, the second is a wildcard query.

Ordinarily we strongly recommend against wildcard queries for selecting 
all documents where a field exists (has a value).  For the general case, 
a range query will be faster.


If the field's cardinality is VERY low, a wildcard query can be fast, 
and might even be faster than the range query ... but if the field has 
ten million possible values (terms) in the index, the query that Solr 
constructs from a wildcard will quite literally contain all ten million 
of those values, and it will be VERY slow.


Thanks,
Shawn
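To make the point above concrete, here is a toy sketch in plain Java (invented data; not Solr or Lucene internals) contrasting the two strategies: the wildcard form has to enumerate every distinct term in the field to build its disjunction, while an existence-style check can be answered from a per-field docs-with-field structure without touching the term dictionary at all:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FieldExistsSketch {

    /** Returns {termsVisitedByWildcard, wildcardMatchCount, existenceMatchCount}. */
    public static int[] run() {
        // Toy inverted index: term -> postings (doc ids). 10,000 distinct terms over 1,000 docs.
        Map<String, List<Integer>> postings = new HashMap<>();
        Set<Integer> docsWithField = new HashSet<>();
        for (int t = 0; t < 10_000; t++) {
            int doc = t % 1_000;
            postings.computeIfAbsent("term" + t, x -> new ArrayList<>()).add(doc);
            docsWithField.add(doc);
        }

        // field:* — the wildcard is rewritten into a disjunction over EVERY term,
        // so the work grows with the size of the term dictionary (cardinality).
        int termsVisited = 0;
        Set<Integer> wildcardHits = new HashSet<>();
        for (List<Integer> p : postings.values()) {
            termsVisited++;
            wildcardHits.addAll(p);
        }

        // field:[* TO *] — an existence check can be answered from a
        // docs-with-field structure, without enumerating terms.
        Set<Integer> rangeHits = new HashSet<>(docsWithField);

        return new int[] { termsVisited, wildcardHits.size(), rangeHits.size() };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("terms visited by wildcard: " + r[0]);
        System.out.println("matching docs (wildcard / existence): " + r[1] + " / " + r[2]);
    }
}
```

Both forms match the same 1,000 documents, but with ten million distinct terms instead of ten thousand, the term enumeration on the wildcard side is what makes the query slow.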


Re: fq * vs [* TO *]

2019-09-19 Thread Mikhail Khludnev
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L1234

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L1184


On Thu, Sep 19, 2019 at 1:10 PM Vincenzo D'Amore  wrote:

> Thanks Mikhail, could you please share the code paths you found?
>
> On Thu, Sep 19, 2019 at 10:57 AM Mikhail Khludnev  wrote:
>
> > Hello, Vincenzo.
> > I traced both code paths; they are different. It's hard to predict the
> > difference between them. Probably some thorough microbenchmark could show
> > a two-fold difference or so, but I don't think it's significant for
> > practical usage.
> >
> > On Thu, Sep 19, 2019 at 10:23 AM Vincenzo D'Amore 
> > wrote:
> >
> > > Hi all,
> > >
> > > talking about how to write solr queries I was investigating if there
> is a
> > > difference of performance in these two filter queries: field:[* TO *]
> or
> > > field:*
> > >
> > > In other words:
> > >
> > > q=*:*&fq=field:[* TO *]&rows=0
> > >
> > > q=*:*&fq=field:*&rows=0
> > >
> > > Could someone enlighten me?
> > >
> > > --
> > > Vincenzo D'Amore
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>
>
> --
> Vincenzo D'Amore
>


-- 
Sincerely yours
Mikhail Khludnev


Custom auth plugin for SolrCloud

2019-09-19 Thread Zubovich Yauheni
Hi,

I have a very simple task: we need to protect access to Solr if a request
doesn't have a specific header. Solr 7.3 is running in cloud mode. A custom
auth plugin was implemented:

package com.custom.solr.core;

import java.io.IOException;
import java.lang.invoke.MethodHandles;
import java.util.Map;

import javax.servlet.FilterChain;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

import org.apache.solr.security.AuthenticationPlugin;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RestrictDirectAccessPlugin extends AuthenticationPlugin {

    private static final Logger LOGGER =
            LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());

    private static final String X_HEADER = "X-HEADER";

    @Override
    public void init(Map<String, Object> pluginConfig) {
    }

    @Override
    public boolean doAuthenticate(ServletRequest request,
            ServletResponse response, FilterChain filterChain) throws Exception {
        HttpServletRequest wrappedRequest = (HttpServletRequest) request;
        return "true".equalsIgnoreCase(wrappedRequest.getHeader(X_HEADER));
    }

    @Override
    public void close() throws IOException {
    }
}

This class is packaged into a jar. The jar is added to the server lib directory
and the plugin is declared in solrconfig.xml:



But when I try to start Solr, I get the following error:

2019-09-19 08:51:31.667 INFO  (main) [   ] o.a.s.c.CoreContainer
Initializing authentication plugin:
com.custom.solr.core.RestrictDirectAccessPlugin
2019-09-19 08:51:31.676 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could
not start Solr. Check solr/home property and the logs
2019-09-19 08:51:31.695 ERROR (main) [   ] o.a.s.c.SolrCore
null:org.apache.solr.common.SolrException: Error loading class
'com.custom.solr.core.RestrictDirectAccessPlugin'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:557)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:626)
at
org.apache.solr.core.CoreContainer.initializeAuthenticationPlugin(CoreContainer.java:355)
at
org.apache.solr.core.CoreContainer.reloadSecurityProperties(CoreContainer.java:708)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:525)
at
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:263)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:183)
at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:139)
at
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:741)
at
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:348)
at
org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1515)
at
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1477)
at
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:785)
at
org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:261)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:545)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at
org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)
at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
at
org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:502)
at
org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:150)
at
org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)
at
org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:453)
at
org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:64)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)
at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:313)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at
org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:150)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at
org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:564)
at
org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:239)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at
org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:133)
at org.eclipse.jetty.server.Server.start(Server.java:418)
at
org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:115)
at
org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
at org.eclipse.jetty.server.Server.doStart(Server.java:385)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1584)
at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1508)
at 
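One likely cause of the "Error loading class" above: in SolrCloud mode the authentication plugin is normally declared in security.json (uploaded to ZooKeeper), not in solrconfig.xml, and the jar must be visible to the node-level classloader (for example via a sharedLib in solr.xml, or under server/solr-webapp/webapp/WEB-INF/lib), because authentication plugins are loaded before any core's `<lib>` directives are read. A minimal security.json sketch, using only the class name from the post:

```json
{
  "authentication": {
    "class": "com.custom.solr.core.RestrictDirectAccessPlugin"
  }
}
```

If the class still fails to load after a node restart, the node-level classpath is the first thing to re-check.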

Re: Solr query fetching different results

2019-09-19 Thread Ramsey Haddad (BLOOMBERG/ LONDON)
Your query seems simple enough that this may not be your issue, but just 
mentioning it:

Your collection has 1 shard. Depending on how the query is sent, queries to 1 
shard collections can sometimes get interpreted as a "distributed query" and 
sometimes as a "non-distributed query". These have different code paths that 
should *in theory* give identical results. When we made some code extensions to 
Solr in our private plugins, we decided not to support both code paths, so 
instead we use shortCircuit=false (we set this in our request handler's 
configuration) to force use of the distributed query code path. (We want our 
change to work for both our 60 shard collection and our 1 shard collection.) 
This gives us more consistent results from different ways of invoking the 
search.

But, again, your query seems too simple for this to be the cause -- why would 
the distributed vs non-distributed return different results for this??
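The approach described above — pinning shortCircuit=false in the request handler defaults so every search takes the distributed path — would look roughly like this in solrconfig.xml (handler name and placement are assumptions, not from the post):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- force the distributed query code path even for one-shard collections -->
    <bool name="shortCircuit">false</bool>
  </lst>
</requestHandler>
```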

From: solr-user@lucene.apache.org At: 09/19/19 06:20:30 To:
solr-user@lucene.apache.org
Subject: Solr query fetching different results

Hi all,

There is something "strange" happening in our Solr cluster. If I execute a
query from the server, via solarium client, I get one result. If I execute
the same or similar query from admin Panel, I get another result. If I go
to Admin Panel  - Collections - Select Collection and click "Reload", and
then repeat the query, the result I get is consistent with  the one I get
from the server via solarium client. So I picked the query that is getting
executed, from Solr logs. Evidently, the query was going to different nodes.

Query that went from Admin Panel, went to node 4 and fetched 0 documents
2019-09-19 05:02:04.549 INFO  (qtp434091818-205178)
[c:paymetryproducts s:shard1 r:*core_node4*
x:paymetryproducts_shard1_replica_n2] o.a.s.c.S.Request
[paymetryproducts_shard1_replica_n2]  webapp=/solr path=/select
params={q=category_id:5a0aeaeea6bc7239cc21ee39&_=1568868718031} *hits=0*
status=0 QTime=0


Query that went from solarium client running on a server, went to node 3
and fetched 4 documents

2019-09-19 05:06:41.511 INFO  (qtp434091818-17)
[c:paymetryproducts s:shard1 r:*core_node3*
x:paymetryproducts_shard1_replica_n1] o.a.s.c.S.Request
[paymetryproducts_shard1_replica_n1]  webapp=/solr path=/select
params={q=category_id:5a0aeaeea6bc7239cc21ee39=flat=true=I
D=0=90=json}
*hits=4* status=0 QTime=104

What could be causing this strange behaviour? How can I fix this?
SOlr Version - 7.3
Shard count: 1
replicationFactor: 2
maxShardsPerNode: 1

Regards,
Jayadevan




Re: Custom update processor not kicking in

2019-09-19 Thread Erick Erickson
_Why_ is reindexing not an option? 200M docs isn't that many.
Since you have Atomic updates working, you could easily
write a little program that pulled the docs from you existing
collection and pushed them to a new one with the new schema.

Do use CursorMark if you try that. You have to be ready to
reindex as time passes, either to upgrade to a major version
two greater than what you're using now or because the requirements
change yet again.

Best,
Erick

On Thu, Sep 19, 2019 at 12:36 AM Rahul Goswami  wrote:
>
> Eric, Markus,
> Thank you for your inputs. I made sure that the jar file is found correctly
> since the core reloads fine and also prints the log lines from my processor
> during update request (getInstane() method of the update factory). The
> reason why I want to insert the processor between distributed update
> processor (DUP) and run update processor (RUP) is because there are certain
> fields which were indexed against a dynamic field “*” and later the schema
> was patched to remove the * field, causing atomic updates to fail for such
> documents. Reindexing is not option since the index has nearly 200 million
> docs. My understanding is that the atomic updates are stitched back to a
> complete document in the DUP before being reindexed by RUP. Hence if I am
> able to access the document before being indexed and check for fields which
> are not defined in the schema, I can remove them from the stitched back
> document so that the atomic update can happen successfully for such docs.
> The documentation below mentions that even if I don’t include the DUP in my
> chain it is automatically inserted just before RUP.
>
> https://lucene.apache.org/solr/guide/7_2/update-request-processors.html#custom-update-request-processor-chain
>
>
> I tried both approaches viz. explicitly specifying my processor after DUP
> in the chain and also tried using the “post-processor” option in the chain,
> to have the custom processor execute after DUP. Still looks like the
> processor is just short circuited. I have defined my logic in the
> processAdd() of the processor. Is this expected behavior?
>
> Regards,
> Rahul
>
>
> On Wed, Sep 18, 2019 at 5:28 PM Erick Erickson 
> wrote:
>
> > It Depends (tm). This is a little confused. Why do you have
> > distributed processor in stand-alone Solr? Stand-alone doesn't, well,
> > distribute updates so that seems odd. Do try switching it around and
> > putting it on top, this should be OK since distributed is irrelevant.
> >
> > You can also just set a breakpoint and see for instance, the
> > instructions in the "IntelliJ" section here:
> > https://cwiki.apache.org/confluence/display/solr/HowToContribute
> >
> > One thing I'd do is make very, very sure that my jar file was being
> > found. IIRC, the -v startup option will log exactly where solr looks
> > for jar files. Be sure your custom jar is in one of them and is picked
> > up. I've set a lib directive to one place only to discover that
> > there's an old copy lying around someplace else
> >
> > Best,
> > Erick
> >
> > On Wed, Sep 18, 2019 at 5:08 PM Markus Jelsma
> >  wrote:
> > >
> > > Hello Rahul,
> > >
> > > I don't know why you don't see your logs lines, but if i remember
> > correctly, you must put all custom processors above Log, Distributed and
> > Run, at least i remember i read it somewhere a long time ago.
> > >
> > > We put all our custom processors on top of the three default processors
> > and they run just fine.
> > >
> > > Try it.
> > >
> > > Regards,
> > > Markus
> > >
> > > -Original message-
> > > > From:Rahul Goswami 
> > > > Sent: Wednesday 18th September 2019 22:20
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Custom update processor not kicking in
> > > >
> > > > Hello,
> > > >
> > > > I am using solr 7.2.1 in a standalone mode. I created a custom update
> > > > request processor and placed it between the distributed processor and
> > run
> > > > update processor in my chain. I made sure the chain is invoked since I
> > see
> > > > log lines from the getInstance() method of my processor factory. But I
> > > > don’t see any log lines from the processAdd() method.
> > > >
> > > > Any inputs on why the processor is getting skipped if placed after
> > > > distributed processor?
> > > >
> > > > Thanks,
> > > > Rahul
> > > >
> >
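Markus's advice in the quoted thread — registering custom processors ahead of the default trio — would look roughly like this in solrconfig.xml (chain name and custom factory class are illustrative assumptions):

```xml
<updateRequestProcessorChain name="custom-chain" default="true">
  <!-- custom processor first, before the default processors -->
  <processor class="com.example.MyUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```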


Re: Solr query fetching different results

2019-09-19 Thread Erick Erickson
Multiple replicas of the same shard will execute their autocommits at
different wall clock times.
Thus there may be a _temporary_ time when newly-indexed document is
found by a query that
happens to get served by replica1 but not by replica2. If you have a
timestamp in the doc, and
a soft commit interval of, say, 1 minute, you can test whether this is
the case by adding
fq=timestamp:[* TO NOW-2MINUTE]. In that case you should see identical returns.

Best,
Erick

On Thu, Sep 19, 2019 at 1:20 AM Jayadevan Maymala
 wrote:
>
> Hi all,
>
> There is something "strange" happening in our Solr cluster. If I execute a
> query from the server, via solarium client, I get one result. If I execute
> the same or similar query from admin Panel, I get another result. If I go
> to Admin Panel  - Collections - Select Collection and click "Reload", and
> then repeat the query, the result I get is consistent with  the one I get
> from the server via solarium client. So I picked the query that is getting
> executed, from Solr logs. Evidently, the query was going to different nodes.
>
> Query that went from Admin Panel, went to node 4 and fetched 0 documents
> 2019-09-19 05:02:04.549 INFO  (qtp434091818-205178)
> [c:paymetryproducts s:shard1 r:*core_node4*
> x:paymetryproducts_shard1_replica_n2] o.a.s.c.S.Request
> [paymetryproducts_shard1_replica_n2]  webapp=/solr path=/select
> params={q=category_id:5a0aeaeea6bc7239cc21ee39&_=1568868718031} *hits=0*
> status=0 QTime=0
>
>
> Query that went from solarium client running on a server, went to node 3
> and fetched 4 documents
>
> 2019-09-19 05:06:41.511 INFO  (qtp434091818-17)
> [c:paymetryproducts s:shard1 r:*core_node3*
> x:paymetryproducts_shard1_replica_n1] o.a.s.c.S.Request
> [paymetryproducts_shard1_replica_n1]  webapp=/solr path=/select
> params={q=category_id:5a0aeaeea6bc7239cc21ee39=flat=true=ID=0=90=json}
> *hits=4* status=0 QTime=104
>
> What could be causing this strange behaviour? How can I fix this?
> SOlr Version - 7.3
> Shard count: 1
> replicationFactor: 2
> maxShardsPerNode: 1
>
> Regards,
> Jayadevan


Re: fq * vs [* TO *]

2019-09-19 Thread Vincenzo D'Amore
Thanks Mikhail, could you please share the code paths you found?

On Thu, Sep 19, 2019 at 10:57 AM Mikhail Khludnev  wrote:

> Hello, Vincenzo.
> I traced both code paths, they are different. It's hard to predict the
> difference between them. Probably some thorough microbenchmark can show a
> twofold difference or so, but I don't think it's significant for practical usage.
>
> On Thu, Sep 19, 2019 at 10:23 AM Vincenzo D'Amore 
> wrote:
>
> > Hi all,
> >
> > talking about how to write solr queries I was investigating if there is a
> > difference of performance in these two filter queries: field:[* TO *]  or
> > field:*
> >
> > In other words:
> >
> > q=*:*&fq=field:[* TO *]&rows=0
> >
> > q=*:*&fq=field:*&rows=0
> >
> > Could someone enlighten me?
> >
> > --
> > Vincenzo D'Amore
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Vincenzo D'Amore


Re: fq * vs [* TO *]

2019-09-19 Thread Mikhail Khludnev
Hello, Vincenzo.
I traced both code paths, they are different. It's hard to predict the
difference between them. Probably some thorough microbenchmark can show a
twofold difference or so, but I don't think it's significant for practical usage.

On Thu, Sep 19, 2019 at 10:23 AM Vincenzo D'Amore 
wrote:

> Hi all,
>
> talking about how to write solr queries I was investigating if there is a
> difference of performance in these two filter queries: field:[* TO *]  or
> field:*
>
> In other words:
>
> q=*:*&fq=field:[* TO *]&rows=0
>
> q=*:*&fq=field:*&rows=0
>
> Could someone enlighten me?
>
> --
> Vincenzo D'Amore
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Unsubscribe please

2019-09-19 Thread Gora Mohanty
Hi,

Please see https://lucene.apache.org/solr/community.html#mailing-lists-irc
. In order to unsubscribe, please send mail to solr-user-unsubscribe


Regards,
Gora



fq * vs [* TO *]

2019-09-19 Thread Vincenzo D'Amore
Hi all,

talking about how to write solr queries I was investigating if there is a
difference of performance in these two filter queries: field:[* TO *]  or
field:*

In other words:

q=*:*&fq=field:[* TO *]&rows=0

q=*:*&fq=field:*&rows=0

Could someone enlighten me?

-- 
Vincenzo D'Amore


Unsubscribe please

2019-09-19 Thread Smida Mahdi
Unsubscribe please : m.sm...@brgm.fr

Mahdi SMIDA
DISN / ISE
Tél : 02.38.64.35.38







Re: DIH: Create Child Documents in ScriptTransformer

2019-09-19 Thread Mikhail Khludnev
Hello, Jörn.
Have you tried to find a parent doc in the context which is passed as a
second argument into ScriptTransformer?

On Wed, Sep 18, 2019 at 9:56 PM Jörn Franke  wrote:
>
> Hi,
>
> I load a set of documents. Based on these documents some logic needs to be
> applied to split them into chapters (this is done). One whole document is
> loaded as a parent. Chapters of the whole document + metadata should be
> loaded as child documents of this parent.
> I want to now collect information on how this can be done:
> * Use a custom loader - this is possible and works
> * Use DIH and extract the chapters in a ScriptTransformer and add them as
> child documents there. However, the scripttransformer receives as input
> only a HashMap and while it works to transform field values etc. It does
> not seem possible to add child documents within the DIH scripttransformer. I
> tried adding a JavaArray with SolrInputDocuments, but this does not seem to
> work. I see in debug/verbose mode that indeed the transformer adds them to
> the HashMap correctly, but they don't end up in the document. Maybe here it
> could be possible somehow via nested entities?
> * Use DIH+ an UpdateProcessor (Script): there i get the SolrInputDocument
> as a parameter and it seems feasible to extract chapters and add them as
> child documents.
>
> thank you.
>
> best regards



--
Sincerely yours
Mikhail Khludnev
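Independent of which route is chosen (custom loader, ScriptTransformer, or update processor), the chapter-splitting step itself can be prototyped in plain Java before wiring it to Solr. A sketch with illustrative field names and an assumed "Chapter " heading convention — not Solr API; in a real loader each child map would become a SolrInputDocument attached to the parent:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Splits a document's text into "chapter" child records, one per line
// starting with "Chapter ". Field names ("id", "chapter_text") are assumptions.
public class ChapterSplitter {
    public static List<Map<String, String>> split(String parentId, String text) {
        List<Map<String, String>> children = new ArrayList<>();
        // Zero-width split: keep each "Chapter ..." heading with its body.
        String[] parts = text.split("(?m)^(?=Chapter )");
        int n = 0;
        for (String part : parts) {
            if (part.trim().isEmpty()) {
                continue;
            }
            Map<String, String> child = new LinkedHashMap<>();
            child.put("id", parentId + "_ch" + (++n));
            child.put("chapter_text", part.trim());
            children.add(child);
        }
        return children;
    }

    public static void main(String[] args) {
        List<Map<String, String>> kids =
            split("doc1", "Chapter 1\nfoo\nChapter 2\nbar");
        System.out.println(kids.size()); // prints 2
    }
}
```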


Re: Question about "No registered leader" error

2019-09-19 Thread Hongxu Ma
@Shawn @Erick Thanks for your kind help!

No OOM log and I confirm there was no OOM happened.

My ZK tickTime is set to 5000, so the max session timeout is 5000*20 = 100s > 60s. I 
also checked the Solr code: the leader waiting time (4000 ms) is a constant, not 
configurable. (Why isn't it a configurable param?)

My solr version is 7.3.1, xmx = 3MB (via solr UI, peak memory is 22GB)
I have already applied the CMS GC tuning (my params differ a little from your wiki 
page).

I will try the following advice:

  *   lower heap size
  *   turn to G1 (the same param as wiki)
  *   try to restart one SOLR node when this error happens.

Thanks again.


From: Shawn Heisey 
Sent: Wednesday, September 18, 2019 20:21
To: solr-user@lucene.apache.org 
Subject: Re: Question about "No registered leader" error

On 9/18/2019 6:11 AM, Shawn Heisey wrote:
> On 9/17/2019 9:35 PM, Hongxu Ma wrote:
>> My questions:
>>
>>*   Is this error possible caused by "long gc pause"? my solr
>> zkClientTimeout=6
>
> It's possible.  I can't say for sure that this is the issue, but it
> might be.

A followup.  I was thinking about the interactions here.  It looks like
Solr only waits four seconds for the leader election, and both of the
pauses you mentioned are longer than that.

Four seconds is probably too short a time to wait, and I do not think
that timeout is configurable anywhere.

> What version of Solr do you have, and what is your max heap?  The CMS
> garbage collection that Solr 5.0 and later incorporate by default is
> pretty good.  My G1 settings might do slightly better, but the
> improvement won't be dramatic unless your existing commandline has
> absolutely no gc tuning at all.

That question will be important.  If you already have our CMS GC tuning,
switching to G1 probably is not going to solve this.  Lowering the max
heap might be the only viable solution in that case, and depending on
what you're dealing with, it will either be impossible or it will
require more servers.

Thanks,
Shawn
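For reference, GC settings of the kind discussed in this thread go into the GC_TUNE hook in solr.in.sh. An illustrative G1 sketch only — the exact flags should be taken from the wiki page under discussion and sized to the actual heap:

```shell
# solr.in.sh -- illustrative G1 settings, not a recommendation for any
# particular heap size or workload
GC_TUNE="-XX:+UseG1GC \
  -XX:+ParallelRefProcEnabled \
  -XX:G1HeapRegionSize=8m \
  -XX:MaxGCPauseMillis=250"
```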


Re: DIH: Create Child Documents in ScriptTransformer

2019-09-19 Thread Jörn Franke
I fully agree. However, I am just curious to see the limits.

> Am 18.09.2019 um 23:33 schrieb Erick Erickson :
> 
> When it starts getting complex, I usually move to SolrJ. You say
> you're loading documents, so I assume Tika is in the mix too.
> 
> Here's a blog on the topic so you an see how to get started...
> 
> https://lucidworks.com/post/indexing-with-solrj/
> 
> Best,
> Erick
> 
>> On Wed, Sep 18, 2019 at 2:56 PM Jörn Franke  wrote:
>> 
>> Hi,
>> 
>> I load a set of documents. Based on these documents some logic needs to be
>> applied to split them into chapters (this is done). One whole document is
>> loaded as a parent. Chapters of the whole document + metadata should be
>> loaded as child documents of this parent.
>> I want to now collect information on how this can be done:
>> * Use a custom loader - this is possible and works
>> * Use DIH and extract the chapters in a ScriptTransformer and add them as
>> child documents there. However, the scripttransformer receives as input
>> only a HashMap and while it works to transform field values etc. It does
>> not seem possible to add childdocuments within the DIH scripttransformer. I
>> tried adding a JavaArray with SolrInputDocuments, but this does not seem to
>> work. I see in debug/verbose mode that indeed the transformer adds them to
>> the HashMap correctly, but they don't end up in the document. Maybe here it
>> could be possible somehow via nested entities?
>> * Use DIH+ an UpdateProcessor (Script): there i get the SolrInputDocument
>> as a parameter and it seems feasible to extract chapters and add them as
>> child documents.
>> 
>> thank you.
>> 
>> best regards