Using solr 7.7.2, Is it safe to manually delete tlog after doing commit?

2019-12-20 Thread alwaysbluesky
Using Solr 7.7.2.

Our CDCR is broken for some reason, as I described in my other question
(https://lucene.472066.n3.nabble.com/Three-questions-about-huge-tlog-problem-and-CDCR-td4453788.html).

As a result, the tlog is now huge. I don't care about CDCR for now; I just
want to clean up all these tlog files first. Otherwise, the disk will fill up.

Is it safe to delete them manually with "rm -rf ./tlog" after a commit via
/solr/collectionname/update?commit=true? (A plain commit did not clean up the
tlog because of the CDCR malfunction.)





--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
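
Before removing anything, it can help to see how much space each tlog directory actually takes. The following is a minimal sketch under stated assumptions: the helper names and the /var/solr/data path are hypothetical, and the script only measures sizes and builds the commit URL quoted in the message. It does not delete files and says nothing about whether deletion is safe.

```python
import os
from urllib.parse import urlencode


def commit_url(base_url: str, collection: str) -> str:
    """Build the explicit-commit URL quoted in the message."""
    query = urlencode({"commit": "true"})
    return f"{base_url.rstrip('/')}/solr/{collection}/update?{query}"


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def tlog_sizes(solr_data_home: str) -> dict:
    """Map every directory named 'tlog' under `solr_data_home` to its size in bytes."""
    sizes = {}
    for root, dirs, _files in os.walk(solr_data_home):
        if "tlog" in dirs:
            tlog_dir = os.path.join(root, "tlog")
            sizes[tlog_dir] = dir_size_bytes(tlog_dir)
    return sizes


if __name__ == "__main__":
    # Hypothetical host and data directory; adjust to the real installation.
    print(commit_url("http://localhost:8983", "collectionname"))
    for path, size in sorted(tlog_sizes("/var/solr/data").items()):
        print(f"{size / 1024 / 1024:10.1f} MB  {path}")
```

Running it before and after a commit shows whether the commit released any tlog files.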


Re: Three questions about huge tlog problem and CDCR

2019-12-20 Thread alwaysbluesky
Sure.

I disabled the buffer and started CDCR by calling the API on both sides.

When I index, the tlog folder stays under 1 MB while the index folder keeps
growing.

So I assumed the tlog was being consumed by the target node and cleared, and
that the data was being forwarded to the target node. But when I checked, the
index on the target nodes is still empty; the data was loaded only on the
source node.
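
The API calls mentioned above (disable the buffer, start CDCR) plus the calls that report queue size and errors can be sketched as follows. This is an illustration only: the host names are hypothetical, and the action names are the ones I understand the Solr 7.x CDCR request handler to accept.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Actions of the /cdcr request handler (assumed from the Solr 7.x documentation).
CDCR_ACTIONS = {
    "START", "STOP", "STATUS",
    "ENABLEBUFFER", "DISABLEBUFFER",
    "QUEUES", "OPS", "ERRORS",
}


def cdcr_url(base_url: str, collection: str, action: str) -> str:
    """Build a CDCR API URL for one collection."""
    action = action.upper()
    if action not in CDCR_ACTIONS:
        raise ValueError(f"unknown CDCR action: {action}")
    query = urlencode({"action": action, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{collection}/cdcr?{query}"


def call_cdcr(base_url: str, collection: str, action: str) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(cdcr_url(base_url, collection, action)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical hosts; only the URLs are printed, nothing is sent.
    for host in ("http://source-host:8983", "http://target-host:8983"):
        for action in ("DISABLEBUFFER", "START", "STATUS", "QUEUES", "ERRORS"):
            print(cdcr_url(host, "collectionname", action))
```

On the source side, the QUEUES and ERRORS responses are where one would look to see whether updates are actually being forwarded.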





Re: solr index data from hdfs with error

2019-12-20 Thread Erick Erickson
Morphlines support was removed from Solr in Solr 6.6, see: 
https://issues.apache.org/jira/browse/SOLR-9221

So I don’t think anyone here will be very conversant in the details. I vaguely 
recall that this process added an ID field by default, but it’s been a very 
long time since I looked. Do check if you have UUIDUpdateProcessorFactory in 
your solrconfig.xml file, that automatically adds a field to a document if it 
doesn’t have one and it usually defaults to “id”.

Sorry I can’t be more help,
Erick
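
A quick way to run the check suggested above is to scan solrconfig.xml for the processor. This is a minimal sketch: the function name and the sample chain are made up for illustration, and the XML layout assumed is the usual updateRequestProcessorChain/processor structure.

```python
import xml.etree.ElementTree as ET


def find_uuid_processors(solrconfig_xml: str) -> list:
    """Return (chain name, fieldName or None) for each UUIDUpdateProcessorFactory found."""
    root = ET.fromstring(solrconfig_xml)
    found = []
    for chain in root.iter("updateRequestProcessorChain"):
        for proc in chain.iter("processor"):
            if proc.get("class", "").endswith("UUIDUpdateProcessorFactory"):
                field = None
                for child in proc.findall("str"):
                    if child.get("name") == "fieldName":
                        field = (child.text or "").strip()
                found.append((chain.get("name"), field))
    return found


# Illustrative fragment, not taken from the poster's configuration.
SAMPLE = """
<config>
  <updateRequestProcessorChain name="uuid">
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>
</config>
"""

if __name__ == "__main__":
    print(find_uuid_processors(SAMPLE))
```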

> On Dec 20, 2019, at 10:17 AM, bennis  wrote:
> 
> Hello
> I am new in using Solr and I need your help.
> I have data on HDFS that I  need to index with Solr.
> 
> [...]

solr index data from hdfs with error

2019-12-20 Thread bennis
Hello,
I am new to Solr and I need your help.
I have data on HDFS that I need to index with Solr.

I) My data looks like this; it is saved on HDFS:
ID_METIER_PCS_ESE,CD_PCS_ESE_1,LB_PCS_ESE_1,CD_PCS_ESE_2,LB_PCS_ESE_2,CD_PCS_ESE_3,LB_PCS_ESE_3,DT_DEB,DT_FIN,TS_TEC_INSERT,TS_TEC_UPDATE
37,3,Cadres et professions intellectuelles supérieures,35,Professions de
l'information, des arts et des spectacles,353a,Directeurs de journaux,
administrateurs de presse, directeurs d'éditions (littéraire, musicale,
audiovisuelle et multimédia),01/01/70,31/12/99,08/01/19 18:13:42,274272000,

It is located here:
${GEOBI_NAMENODE}/user/bdatadev2/work/tmp/tmp_TD_METIER_PCS_ESE
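
One thing that may be worth checking in data like this is whether every row splits into as many fields as the header. The sketch below uses the header and sample row exactly as they appear in this message (the real file may quote its text fields, which would change the result) and Python's csv module as a stand-in; it makes no claim about how readCSV itself parses the file.

```python
import csv
import io

HEADER = (
    "ID_METIER_PCS_ESE,CD_PCS_ESE_1,LB_PCS_ESE_1,CD_PCS_ESE_2,LB_PCS_ESE_2,"
    "CD_PCS_ESE_3,LB_PCS_ESE_3,DT_DEB,DT_FIN,TS_TEC_INSERT,TS_TEC_UPDATE"
)

# Sample row from the message, with the email line wraps joined by spaces.
ROW = (
    "37,3,Cadres et professions intellectuelles supérieures,35,Professions de "
    "l'information, des arts et des spectacles,353a,Directeurs de journaux, "
    "administrateurs de presse, directeurs d'éditions (littéraire, musicale, "
    "audiovisuelle et multimédia),01/01/70,31/12/99,08/01/19 18:13:42,274272000,"
)


def field_count(line: str, separator: str = ",") -> int:
    """Count the fields a plain CSV parser sees in one line."""
    return len(next(csv.reader(io.StringIO(line), delimiter=separator)))


if __name__ == "__main__":
    print("header fields:", field_count(HEADER))
    print("row fields:   ", field_count(ROW))
```

If the two counts differ on the real file, the label columns contain unquoted separators and the values will not line up with the configured column names.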

II) I wrote this solr-morphline.conf:

SOLR_LOCATOR : {
  # Name of the Solr collection
  collection : oracle_table_test_DEV2

  # ZooKeeper ensemble
  zkHost : "eufrtopbdt003.randstaddta.gis:2182/solr"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**"]

    commands : [
      {
        readCSV {
          separator : ","
          # These columns should match the fields configured in Solr and are
          # expected in this order in the CSV
          columns : [ID_METIER_PCS_ESE,CD_PCS_ESE_1,LB_PCS_ESE_1,CD_PCS_ESE_2,LB_PCS_ESE_2,CD_PCS_ESE_3,LB_PCS_ESE_3,DT_DEB,DT_FIN,TS_TEC_INSERT,TS_TEC_UPDATE]
          ignoreFirstLine : true
          commentPrefix : ""
          trim : true
          charset : UTF-8
        }
      }

      {
        sanitizeUnknownSolrFields {
          # Location from which to fetch Solr schema
          solrLocator : ${SOLR_LOCATOR}
        }
      }

      # log the record at DEBUG level to SLF4J
      { logDebug { format : "output record: {}", args : ["@{}"] } }

      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]


III) Finally, here is my schema.xml; I modified only the part that defines
the fields:
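[schema.xml listing: the XML markup was stripped in the archive. The only
text that survives is the value ID_METIER_PCS_ESE.]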

PreAnalyzedFieldUpdateProcessor issues in Solrcloud

2019-12-20 Thread Markus Jelsma
Hello,

We are moving our text analysis outside of Solr and using PreAnalyzedField to 
speed up indexing. We also use MLT, but the two don't work together: there is 
no way for MLT to properly analyze a document using the PreAnalyzedField's 
analyzer, and it fails the check for FieldType.isExplicitAnalyzer() in the MLT 
qparser.

So instead of changing the schema, I tried using 
PreAnalyzedFieldUpdateProcessor. This would be ideal because MLT still works 
and I can still manually index non-preanalyzed documents when developing, just 
by switching the URP chain.

I cannot get it to work. When I place the URP on top of all others, I get:

TransactionLog doesn't know how to serialize class 
org.apache.lucene.document.Field; try implementing ObjectResolver?
at 
org.apache.solr.update.TransactionLog$1.resolve(TransactionLog.java:100)
at 
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:264)

If I put the URP directly above Run, I get:

Remote error message: TransactionLog doesn't know how to serialize class 
org.apache.lucene.document.Field; try implementing ObjectResolver?
at 
org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096)

If I remove the DistributedURP, indexing a preanalyzed document works, but my 
stored field is suddenly prefixed with:

org.apache.lucene.document.Field:stored,indexed,tokenized,termVector,termVectorOffsets,termVectorPosition,omitNorms
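
For readers unfamiliar with analysis done outside Solr, the sketch below shows what a pre-analyzed value might look like when built on the client. It assumes the JSON layout I understand Solr's JsonPreAnalyzedParser to accept ("v" for the version, "str" for the stored value, "tokens" with "t", "s", "e", "i" for term, start offset, end offset and position increment). The whitespace-and-lowercase tokenizer is only a stand-in for a real analysis chain and is not the setup described in this message.

```python
import json
import re


def pre_analyze(text: str) -> dict:
    """Build a pre-analyzed field value from a trivial whitespace tokenizer."""
    tokens = []
    for match in re.finditer(r"\S+", text):
        tokens.append({
            "t": match.group(0).lower(),  # term text
            "s": match.start(),           # start offset in the original text
            "e": match.end(),             # end offset in the original text
            "i": 1,                       # position increment
        })
    return {"v": "1", "str": text, "tokens": tokens}


if __name__ == "__main__":
    # The resulting JSON string would be sent as the field's value.
    print(json.dumps(pre_analyze("Hello World"), ensure_ascii=False))
```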

Re: Facing jwt authentication problem using solr 8.1.1

2019-12-20 Thread Jason Gerlowski
Oh, ok.

From the user's error message it looked to me like bin/solr was making
an admin/info/system call from bash, but it must be something else.

On Fri, Dec 20, 2019 at 6:28 AM Jan Høydahl  wrote:
>
> No, I doubt that bin/solr support would do more than just wire in a simple 
> initial JWT config, with some default Rule-based config.
>
> Jan
>
> > 17. des. 2019 kl. 16:42 skrev Jason Gerlowski :
> >
> > Hey Jan,
> >
> > Is this a case of something that'd be fixed by
> > https://issues.apache.org/jira/browse/SOLR-13071 ?
> >
> > Just wondering
> >
> > Best,
> > Jason
> >
> > On Thu, Dec 12, 2019 at 5:43 PM Jan Høydahl  wrote:
> >>
> >> Try something like this 
> >> https://gist.github.com/b330e1bea7842bcdc1e5fa3940b4a4f7 
> >> 
> >>
> >> The trick is to «whitelist» certain paths that will not require auth, but 
> >> then further down add rules to block all other paths either as admin role 
> >> or with special role «*» which means «any authenticated user».
> >>
> >> Jan
> >>
> >>> 12. des. 2019 kl. 07:47 skrev Lakhan Gupta 
> >>> :
> >>>
> >>> Hi,
> >>>
> >>> I am using Solr 8.1.1 and have a problem with JWT authentication. JWT 
> >>> authentication works fine once security.json is configured. Below is the 
> >>> configuration I am using to enable it.
> >>>
> >>> Security.json
> >>>
> >>> {
> >>>   "authentication": {
> >>>     "blockUnknown": false,
> >>>     "class": "solr.JWTAuthPlugin",
> >>>     "jwk": {
> >>>       "kty": "oct",
> >>>       "use": "sig",
> >>>       "kid": "k1",
> >>>       "k": "7A02618BE6943C22FD81CAB9F6FCF063B6E1732C3614BC3ACA6032B6B3215CAF0D28A34FD423423CA3AC34BEA27D3F79",
> >>>       "alg": "HS256"
> >>>     },
> >>>     "aud": "solr"
> >>>   },
> >>>   "authorization": {
> >>>     "class": "solr.RuleBasedAuthorizationPlugin",
> >>>     "permissions": [
> >>>       {
> >>>         "name": "all",
> >>>         "path": "/*",
> >>>         "role": "admin"
> >>>       }
> >>>     ],
> >>>     "user-role": {
> >>>       "solr": "admin"
> >>>     }
> >>>   }
> >>> }
> >>>
> >>> Using secret key
> >>> 7A02618BE6943C22FD81CAB9F6FCF063B6E1732C3614BC3ACA6032B6B3215CAF0D28A34FD423423CA3AC34BEA27D3F79
> >>>
> >>> Jwt token is generated:
> >>> eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhZCIsImF1ZCI6InNvbHIiLCJleHAiOjk5MTYyMzkwMjJ9.M4PksJTJ9gFjOlvvFmG1eDSyXDtKIRSGIYicIW9hwT4
> >>>
> >>> Below are the header and payload I use to generate the JWT token:
> >>>
> >>> The header is
> >>> {
> >>> "alg": "HS256",
> >>> "typ": "JWT"
> >>> }
> >>>
> >>> and the payload is
> >>>
> >>> {
> >>> "sub": "admin",
> >>> "aud": "Solr",
> >>> "exp": 9916239022
> >>> }
> >>>
> >>> With the above configuration, JWT authentication works fine. But there 
> >>> is a problem: when a request is sent without an authentication header, 
> >>> the API still returns data. I want to block requests that come without 
> >>> an authentication header.
> >>>
> >>> For that, I enabled the blockUnknown parameter in security.json. That 
> >>> works, and requests now require authentication. But after enabling 
> >>> blockUnknown, I get the error below when starting Solr with the 
> >>> solr start command.
> >>>
> >>>
> >>> ERROR: Solr requires authentication for 
> >>> http://localhost:8983/solr/admin/info/system. Please supply valid 
> >>> credentials. HTTP code=401
> >>>
> >>> I've googled a lot and found that the solr/admin/info/system endpoint 
> >>> requires authentication.
> >>>
> >>> How can I authenticate against the solr/admin/info/system endpoint when 
> >>> starting Solr?
> >>>
> >>> I need urgent help and would appreciate it if someone could help me.
> >>>
> >>> Thanks
> >>> Lakhan Gupta
> >>>
> >>>
> >>>
> >>
>
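
When debugging a setup like the one quoted above, it can be useful to look at what a token actually contains before comparing it with the configured "aud" value. The sketch below decodes the header and claims of the token quoted in the thread without verifying the signature. The function names are illustrative, and this is an inspection aid only, not a way to validate a token.

```python
import base64
import json

# Token as quoted in the thread.
TOKEN = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJzdWIiOiJhZCIsImF1ZCI6InNvbHIiLCJleHAiOjk5MTYyMzkwMjJ9."
    "M4PksJTJ9gFjOlvvFmG1eDSyXDtKIRSGIYicIW9hwT4"
)


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs omit."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_unverified(token: str):
    """Return (header, claims) without checking the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    claims = json.loads(b64url_decode(payload_b64))
    return header, claims


if __name__ == "__main__":
    header, claims = decode_unverified(TOKEN)
    print("header:", header)
    print("claims:", claims)
```

Comparing the printed claims with the payload listed in the message is a quick way to spot a mismatch between what was intended and what was signed.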


Re: Facing jwt authentication problem using solr 8.1.1

2019-12-20 Thread Jan Høydahl
No, I doubt that bin/solr support would do more than just wire in a simple 
initial JWT config, with some default Rule-based config.

Jan
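
The whitelisting idea from earlier in this thread (open a few paths, then require a role for everything else) can be modelled roughly as follows. This is a toy model and not Solr's actual rule matching: the rule list is hypothetical and is not the content of the linked gist, the first-match evaluation is a simplification, and treating a null role as "no authentication required" is my reading of the rule-based authorization documentation.

```python
from fnmatch import fnmatch

# Hypothetical rules: open one path, then require the admin role for the rest.
PERMISSIONS = [
    {"name": "open-system-info", "path": "/admin/info/system", "role": None},
    {"name": "all", "path": "/*", "role": "admin"},
]

USER_ROLES = {"solr": ["admin"]}


def is_allowed(path, user, permissions=PERMISSIONS, user_roles=USER_ROLES) -> bool:
    """Apply the first rule whose path pattern matches (simplified model)."""
    for perm in permissions:
        if not fnmatch(path, perm["path"]):
            continue
        role = perm["role"]
        if role is None:   # assumed: open to everyone, even unauthenticated
            return True
        if user is None:   # every other rule needs an authenticated user
            return False
        if role == "*":    # any authenticated user
            return True
        return role in user_roles.get(user, [])
    return True            # assumed: a request matching no rule is allowed


if __name__ == "__main__":
    for path, user in [
        ("/admin/info/system", None),
        ("/select", None),
        ("/select", "solr"),
    ]:
        print(path, user, is_allowed(path, user))
```

The order of the rules matters in this model: the open path has to come before the catch-all rule.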

> 17. des. 2019 kl. 16:42 skrev Jason Gerlowski :
> 
> Hey Jan,
> 
> Is this a case of something that'd be fixed by
> https://issues.apache.org/jira/browse/SOLR-13071 ?
> 
> Just wondering
> 
> Best,
> Jason
> 
> [...]