Re: Indexing information on number of attachments and their names in EML file

2019-08-02 Thread Zheng Lin Edwin Yeo
Thanks for the reply, will find out more about it.

Currently I am able to retrieve the normal Metadata of the email, but not
the Metadata of the attachments which are part of the contents in the EML
file, which looks something like this.

--d8b77b057d59ca19--

--d8b77e057d59ca1b
Content-Type: application/pdf; name="file1.pdf"
Content-Disposition: attachment; filename="file1.pdf"
Content-Transfer-Encoding: base64
Content-ID: 
X-Attachment-Id: f_jpurtpnk0

Regards,
Edwin

On Sat, 3 Aug 2019 at 05:38, Tim Allison  wrote:

> I'd strongly recommend rolling your own ingest code.  See Erick's
> superb: https://lucidworks.com/post/indexing-with-solrj/
>
> You can easily get attachments via the RecursiveParserWrapper, e.g.
>
> https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java#L351
>
> This will return a list of Metadata objects; the first one will be the
> main/container, each other entry will be an attachment.  Let us know
> if you have any questions/surprises.  There are a couple of todos for
> .eml...
>
> On Fri, Aug 2, 2019 at 3:43 AM Jan Høydahl  wrote:
> >
> > Try the Apache Tika mailing list.
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> > > 2. aug. 2019 kl. 05:01 skrev Zheng Lin Edwin Yeo  >:
> > >
> > > Hi,
> > >
> > > Does anyone knows if this can be done on the Solr side?
> > > Or it has to be done on the Tika side?
> > >
> > > Regards,
> > > Edwin
> > >
> > > On Thu, 1 Aug 2019 at 09:38, Zheng Lin Edwin Yeo  >
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> Would like to check, Is there anyway which we can detect the number of
> > >> attachments and their names during indexing of EML files in Solr, and
> index
> > >> those information into Solr?
> > >>
> > >> Currently, Solr is able to use Tika and Tesseract OCR to extract the
> > >> contents of the attachments. However, I could not find the information
> > >> about the number of attachments in the EML file and what are their
> filename.
> > >>
> > >> I am using Solr 7.6.0 in production, and also trying out on the new
> Solr
> > >> 8.2.0.
> > >>
> > >> Regards,
> > >> Edwin
> > >>
> >
>


Re: Indexing information on number of attachments and their names in EML file

2019-08-02 Thread Tim Allison
I'd strongly recommend rolling your own ingest code.  See Erick's
superb: https://lucidworks.com/post/indexing-with-solrj/

You can easily get attachments via the RecursiveParserWrapper, e.g.
https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java#L351

This will return a list of Metadata objects; the first one will be the
main/container, each other entry will be an attachment.  Let us know
if you have any questions/surprises.  There are a couple of todos for
.eml...
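A minimal sketch of that approach (a rough example assuming a recent Tika 1.x on the classpath; the class name and the "message.eml" path are placeholders):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.apache.tika.sax.RecursiveParserWrapperHandler;

public class EmlAttachmentLister {
  public static void main(String[] args) throws Exception {
    RecursiveParserWrapper wrapper = new RecursiveParserWrapper(new AutoDetectParser());
    // extract plain text for every part, with no write limit
    RecursiveParserWrapperHandler handler = new RecursiveParserWrapperHandler(
        new BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1));
    try (InputStream is = Files.newInputStream(Paths.get("message.eml"))) {
      wrapper.parse(is, handler, new Metadata(), new ParseContext());
    }
    // first entry = the container email, remaining entries = embedded attachments
    List<Metadata> metadataList = handler.getMetadataList();
    System.out.println("attachment count: " + (metadataList.size() - 1));
    for (int i = 1; i < metadataList.size(); i++) {
      // the attachment's file name; the embedded path is also available
      // under the "X-TIKA:embedded_resource_path" key
      System.out.println("attachment name: "
          + metadataList.get(i).get(Metadata.RESOURCE_NAME_KEY));
    }
  }
}

The count and the names could then be put onto the SolrInputDocument built in the SolrJ ingest code.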

On Fri, Aug 2, 2019 at 3:43 AM Jan Høydahl  wrote:
>
> Try the Apache Tika mailing list.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 2. aug. 2019 kl. 05:01 skrev Zheng Lin Edwin Yeo :
> >
> > Hi,
> >
> > Does anyone knows if this can be done on the Solr side?
> > Or it has to be done on the Tika side?
> >
> > Regards,
> > Edwin
> >
> > On Thu, 1 Aug 2019 at 09:38, Zheng Lin Edwin Yeo 
> > wrote:
> >
> >> Hi,
> >>
> >> Would like to check, Is there anyway which we can detect the number of
> >> attachments and their names during indexing of EML files in Solr, and index
> >> those information into Solr?
> >>
> >> Currently, Solr is able to use Tika and Tesseract OCR to extract the
> >> contents of the attachments. However, I could not find the information
> >> about the number of attachments in the EML file and what are their 
> >> filename.
> >>
> >> I am using Solr 7.6.0 in production, and also trying out on the new Solr
> >> 8.2.0.
> >>
> >> Regards,
> >> Edwin
> >>
>


Re: Solr on HDFS

2019-08-02 Thread Kevin Risden
>
> If you think about it, having a shard with 3 replicas on top of a file
> system that does 3x replication seems a little excessive!


https://issues.apache.org/jira/browse/SOLR-6305 should help here. I can
take a look at merging the patch since it looks like it has been helpful to
others.


Kevin Risden


On Fri, Aug 2, 2019 at 10:09 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Hi Kyle - Thank you.
>
> Our current index is split across 3 solr collections; our largest
> collection is 26.8TBytes (80.5TBytes when 3x replicated in HDFS) across
> 100 shards.  There are 40 machines hosting this cluster. We've found
> that when dealing with large collections having no replicas (but lots of
> shards) ends up being more reliable since there is a much smaller
> recovery time.  We keep another 30 day index (1.4TBytes) that does have
> replicas (40 shards, 3 replicas each), and if a node goes down, we
> manually delete lock files and then bring it back up and yes - lots of
> network IO, but it usually recovers OK.
>
> Having a large collection like this with no replicas seems like a recipe
> for disaster.  So, we've been experimenting with the latest version
> (8.2) and our index process to split up the data into many solr
> collections that do have replicas, and then build the list of
> collections to search at query time.  Our searches are date based, so we
> can define what collections we want to query at query time. As a test,
> we ran just two machines, HDFS, and 500 collections. One server ran out
> of memory and crashed.  We had over 1,600 lock files to delete.
>
> If you think about it, having a shard with 3 replicas on top of a file
> system that does 3x replication seems a little excessive! I'd love to
> see Solr take more advantage of a shared FS.  Perhaps an idea is to use
> HDFS but with an NFS gateway.  Seems like that may be slow.
> Architecturally, I love only having one large file system to manage
> instead of lots of individual file systems across many machines.  HDFS
> makes this easy.
>
> -Joe
>
> On 8/2/2019 9:10 AM, lstusr 5u93n4 wrote:
> > Hi Joe,
> >
> > We fought with Solr on HDFS for quite some time, and faced similar issues
> > as you're seeing. (See this thread, for example:"
> >
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201812.mbox/%3cCABd9LjTeacXpy3FFjFBkzMq6vhgu7Ptyh96+w-KC2p=-rqk...@mail.gmail.com%3e
> >   )
> >
> > The Solr lock files on HDFS get deleted if the Solr server gets shut down
> > gracefully, but we couldn't always guarantee that in our environment so
> we
> > ended up writing a custom startup script to search for lock files on HDFS
> > and delete them before solr startup.
> >
> > However, the issue that you mention of the Solr server rebuilding its
> whole
> > index from replicas on startup was enough of a show-stopper for us that
> we
> > switched away from HDFS to local disk. It literally made the difference
> > between 24+ hours of recovery time after an unexpected outage to less
> than
> > a minute...
> >
> > If you do end up finding a solution to this issue, please post it to this
> > mailing list, because there are others out there (like us!) who would
> most
> > definitely make use it.
> >
> > Thanks
> >
> > Kyle
> >
> > On Fri, 2 Aug 2019 at 08:58, Joe Obernberger <
> joseph.obernber...@gmail.com>
> > wrote:
> >
> >> Thank you.  No, while the cluster is using Cloudera for HDFS, we do not
> >> use Cloudera to manager the solr cluster.  If it is a
> >> configuration/architecture issue, what can I do to fix it?  I'd like a
> >> system where servers can come and go, but the indexes stay available and
> >> recover automatically.  Is that possible with HDFS?
> >> While adding an alias to other collections would be an option, if that
> >> collection is the only collection, or one that is currently needed, in a
> >> live system, we can't bring it down, re-create it, and re-index when
> >> that process may take weeks to do.
> >>
> >> Any ideas?
> >>
> >> -Joe
> >>
> >> On 8/1/2019 6:15 PM, Angie Rabelero wrote:
> >>> I don’t think you’re using claudera or ambari, but ambari has an option
> >> to delete the locks. This seems more a configuration/architecture isssue
> >> than a realibility issue. You may want to spin up an alias while you
> bring
> >> down, clear locks and directories, recreate and index the affected
> >> collection, while you work your other isues.
> >>> On Aug 1, 2019, at 16:40, Joe Obernberger <
> joseph.obernber...@gmail.com>
> >> wrote:
> >>> Been using Solr on HDFS for a while now, and I'm seeing an issue with
> >> redundancy/reliability.  If a server goes down, when it comes back up,
> it
> >> will never recover because of the lock files in HDFS. That solr node
> needs
> >> to be brought down manually, the lock files deleted, and then brought
> back
> >> up.  At that point, it appears to copy all the data for its replicas.
> If
> >> the index is large, and new data is being indexed, in some cases it will
> >> never 

Re: Solr on HDFS

2019-08-02 Thread Joe Obernberger

Hi Kyle - Thank you.

Our current index is split across 3 solr collections; our largest 
collection is 26.8TBytes (80.5TBytes when 3x replicated in HDFS) across 
100 shards.  There are 40 machines hosting this cluster. We've found 
that when dealing with large collections having no replicas (but lots of 
shards) ends up being more reliable since there is a much smaller 
recovery time.  We keep another 30 day index (1.4TBytes) that does have 
replicas (40 shards, 3 replicas each), and if a node goes down, we 
manually delete lock files and then bring it back up and yes - lots of 
network IO, but it usually recovers OK.


Having a large collection like this with no replicas seems like a recipe 
for disaster.  So, we've been experimenting with the latest version 
(8.2) and our index process to split up the data into many solr 
collections that do have replicas, and then build the list of 
collections to search at query time.  Our searches are date based, so we 
can define what collections we want to query at query time. As a test, 
we ran just two machines, HDFS, and 500 collections. One server ran out 
of memory and crashed.  We had over 1,600 lock files to delete.


If you think about it, having a shard with 3 replicas on top of a file 
system that does 3x replication seems a little excessive! I'd love to 
see Solr take more advantage of a shared FS.  Perhaps an idea is to use 
HDFS but with an NFS gateway.  Seems like that may be slow.  
Architecturally, I love only having one large file system to manage 
instead of lots of individual file systems across many machines.  HDFS 
makes this easy.


-Joe

On 8/2/2019 9:10 AM, lstusr 5u93n4 wrote:

Hi Joe,

We fought with Solr on HDFS for quite some time, and faced similar issues
as you're seeing. (See this thread, for example:"
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201812.mbox/%3cCABd9LjTeacXpy3FFjFBkzMq6vhgu7Ptyh96+w-KC2p=-rqk...@mail.gmail.com%3e
  )

The Solr lock files on HDFS get deleted if the Solr server gets shut down
gracefully, but we couldn't always guarantee that in our environment so we
ended up writing a custom startup script to search for lock files on HDFS
and delete them before solr startup.

However, the issue that you mention of the Solr server rebuilding its whole
index from replicas on startup was enough of a show-stopper for us that we
switched away from HDFS to local disk. It literally made the difference
between 24+ hours of recovery time after an unexpected outage to less than
a minute...

If you do end up finding a solution to this issue, please post it to this
mailing list, because there are others out there (like us!) who would most
definitely make use it.

Thanks

Kyle

On Fri, 2 Aug 2019 at 08:58, Joe Obernberger 
wrote:


Thank you.  No, while the cluster is using Cloudera for HDFS, we do not
use Cloudera to manager the solr cluster.  If it is a
configuration/architecture issue, what can I do to fix it?  I'd like a
system where servers can come and go, but the indexes stay available and
recover automatically.  Is that possible with HDFS?
While adding an alias to other collections would be an option, if that
collection is the only collection, or one that is currently needed, in a
live system, we can't bring it down, re-create it, and re-index when
that process may take weeks to do.

Any ideas?

-Joe

On 8/1/2019 6:15 PM, Angie Rabelero wrote:

I don’t think you’re using claudera or ambari, but ambari has an option

to delete the locks. This seems more a configuration/architecture isssue
than a realibility issue. You may want to spin up an alias while you bring
down, clear locks and directories, recreate and index the affected
collection, while you work your other isues.

On Aug 1, 2019, at 16:40, Joe Obernberger 

wrote:

Been using Solr on HDFS for a while now, and I'm seeing an issue with

redundancy/reliability.  If a server goes down, when it comes back up, it
will never recover because of the lock files in HDFS. That solr node needs
to be brought down manually, the lock files deleted, and then brought back
up.  At that point, it appears to copy all the data for its replicas.  If
the index is large, and new data is being indexed, in some cases it will
never recover. The replication retries over and over.

How can we make a reliable Solr Cloud cluster when using HDFS that can

handle servers coming and going?

Thank you!

-Joe



---
This email has been checked for viruses by AVG.
https://www.avg.com



Re: Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)

2019-08-02 Thread Jörn Franke
Not sure if this is possible, but why not create a query handler in Solr with
any custom query and use that as a ping replacement?
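A rough solrconfig.xml sketch of such a dedicated handler (the handler name "/healthcheck" and the query are only placeholders; the idea is simply to pin a cheap query via invariants):

<requestHandler name="/healthcheck" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="q">id:ping_doc</str>
    <str name="rows">0</str>
  </lst>
</requestHandler>

A load balancer check or the ping handler's qt invariant could then be pointed at it instead of a *:* query.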

> Am 02.08.2019 um 15:48 schrieb dinesh naik :
> 
> Hi all,
> I have few clusters with huge data set and whenever a node goes down its
> not able to recover due to below reasons:
> 
>  1. ping request handler is taking more than 10-15 seconds to respond. The
> ping requesthandler however, expects it will return in less than 1 second
> and fails a requestrecovery if it is not responded to in this time.
> Therefore recoveries never would start.
> 
>  2. soft commit is very low ie. 5 sec. This is a business requirement so
> not much can be done here.
> 
> As the standard/default admin/ping request handler is using *:* queries ,
> the response time is much higher, and i am looking for an option to change
> the same so that the ping handler returns the results within few
> miliseconds.
> 
> here is an example for standard query time:
> 
> snip---
> curl "
> http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false&debug=timing
> "
> {
>  "responseHeader":{
>"zkConnected":true,
>"status":0,
>"QTime":16620,
>"params":{
>  "q":"*:*",
>  "distrib":"false",
>  "debug":"timing",
>  "indent":"on",
>  "rows":"0",
>  "wt":"json"}},
>  "response":{"numFound":1329638799,"start":0,"docs":[]
>  },
>  "debug":{
>"timing":{
>  "time":16620.0,
>  "prepare":{
>"time":0.0,
>"query":{
>  "time":0.0},
>"facet":{
>  "time":0.0},
>"facet_module":{
>  "time":0.0},
>"mlt":{
>  "time":0.0},
>"highlight":{
>  "time":0.0},
>"stats":{
>  "time":0.0},
>"expand":{
>  "time":0.0},
>"terms":{
>  "time":0.0},
>"block-expensive-queries":{
>  "time":0.0},
>"slow-query-logger":{
>  "time":0.0},
>"debug":{
>  "time":0.0}},
>  "process":{
>"time":16619.0,
>"query":{
>  "time":16619.0},
>"facet":{
>  "time":0.0},
>"facet_module":{
>  "time":0.0},
>"mlt":{
>  "time":0.0},
>"highlight":{
>  "time":0.0},
>"stats":{
>  "time":0.0},
>"expand":{
>  "time":0.0},
>"terms":{
>  "time":0.0},
>"block-expensive-queries":{
>  "time":0.0},
>"slow-query-logger":{
>  "time":0.0},
>"debug":{
>  "time":0.0}
> 
> 
> snap
> 
> can we use query: _root_:abc in the ping request handler ? Tried this query
> and its returning the results within few miliseconds and also the nodes are
> able to recover without any issue.
> 
> we want to use _root_ field for querying as this field is available in all
> our clusters with below definition:
> <field name="_root_" type="string" indexed="true" termOffsets="false" stored="false" termPayloads="false" termPositions="false" docValues="false" termVectors="false"/>
> Could you please let me know if using _root_ for querying in
> pingRequestHandler will cause any problem?
> 
> <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
>   <lst name="invariants">
>     <str name="qt">/select</str>
>     <str name="q">_root_:abc</str>
>   </lst>
> </requestHandler>
> 
> 
> -- 
> Best Regards,
> Dinesh Naik


Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)

2019-08-02 Thread dinesh naik
Hi all,
I have a few clusters with a huge data set, and whenever a node goes down it is
not able to recover due to the reasons below:

  1. The ping request handler is taking more than 10-15 seconds to respond. The
ping request handler, however, expects a response in less than 1 second and
fails the recovery request if it is not answered within that time. Therefore
recoveries never start.

  2. The soft commit interval is very low, i.e. 5 seconds. This is a business
requirement, so not much can be done here.

As the standard/default admin/ping request handler uses a *:* query, the
response time is much higher, and I am looking for an option to change it so
that the ping handler returns results within a few milliseconds.

here is an example for standard query time:

snip---
curl "
http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false&debug=timing
"
{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":16620,
"params":{
  "q":"*:*",
  "distrib":"false",
  "debug":"timing",
  "indent":"on",
  "rows":"0",
  "wt":"json"}},
  "response":{"numFound":1329638799,"start":0,"docs":[]
  },
  "debug":{
"timing":{
  "time":16620.0,
  "prepare":{
"time":0.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"block-expensive-queries":{
  "time":0.0},
"slow-query-logger":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":16619.0,
"query":{
  "time":16619.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"block-expensive-queries":{
  "time":0.0},
"slow-query-logger":{
  "time":0.0},
"debug":{
  "time":0.0}


snap

Can we use the query _root_:abc in the ping request handler? I tried this query
and it returns results within a few milliseconds, and the nodes are also able
to recover without any issue.

we want to use the _root_ field for querying as this field is available in all
our clusters with the definition below:

<field name="_root_" type="string" indexed="true" termOffsets="false" stored="false" termPayloads="false" termPositions="false" docValues="false" termVectors="false"/>

Could you please let me know if using _root_ for querying in
pingRequestHandler will cause any problem?

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="qt">/select</str>
    <str name="q">_root_:abc</str>
  </lst>
</requestHandler>


-- 
Best Regards,
Dinesh Naik


RE: Basic Authentication problem

2019-08-02 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Was I correct in my description yesterday (which I am pasting in below)? That 
you are using a hash based on the "solr" account name and expecting that to 
work if you change the account name but not the hash?

Am I correct in assuming that everything other than security-edit functions 
currently works for you with any account and any password, including without 
any login-and-password at all?


-Original Message-
From: Oakley, Craig (NIH/NLM/NCBI) [C] 
Sent: Thursday, August 01, 2019 10:58 AM
To: solr-user@lucene.apache.org
Subject: RE: Basic Authentication problem

The hash value 
"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
is based on both the plain text password AND the plain text login. Since 
"solr" is not the same string as "solr-admin", the password will not work. If 
the only authorization in security.json is restricting security-edit, then you 
can do anything else with any password, or with no password.

What you can do is setup the security.json file as specified in the Reference 
Guide (whence you got the hash of the login and password), then use the default 
solr login to run your set-user (to add the solr-admin user alongside the 
existing solr login), then use the default solr login to run 
{"set-user-role":{"solr-admin":["security-edit"]}}, and then (when you are sure 
things are correctly setup for solr-admin) drop the default solr login
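If it helps, a rough sketch of how a salted SHA-256 credential in that two-token format can be generated (this mirrors my reading of Solr's Sha256AuthenticationProvider; the password value is just the example from this thread, and the output should be verified against the Solr version in use):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

public class SolrCredentialHash {
  public static void main(String[] args) throws Exception {
    String password = "s2019";          // example plain-text password from this thread
    byte[] salt = new byte[32];
    new SecureRandom().nextBytes(salt); // random salt, stored next to the hash

    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    digest.update(salt);
    byte[] hash = digest.digest(password.getBytes(StandardCharsets.UTF_8));
    digest.reset();
    hash = digest.digest(hash);         // double SHA-256 over (salt + password)

    // security.json stores the credential as "base64(hash) base64(salt)"
    System.out.println(Base64.getEncoder().encodeToString(hash) + " "
        + Base64.getEncoder().encodeToString(salt));
  }
}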


-Original Message-
From: Zheng Lin Edwin Yeo  
Sent: Friday, August 02, 2019 2:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Basic Authentication problem

From what I see, you are trying to change your own user's password. If I
remember correctly this might not be allowed, which is why you are
getting the "Unauthorized request" error.

You can try to create another user with admin role as well, and to change
your existing user's password from the new user.

Regards,
Edwin

On Fri, 2 Aug 2019 at 13:32, Salmaan Rashid Syed 
wrote:

> My curl command works fine for querying, updating etc.
>
> I don't think it is the fault of curl command.
>
> I get the following error message when I tried to change the password of
> solr-admin,
>
>
> 
>
> 
>
> 
>
> Error 403 Unauthorized request, Response code: 403
>
> 
>
> HTTP ERROR 403
>
> Problem accessing /solr/admin/authentication. Reason:
>
> Unauthorized request, Response code: 403
>
> 
>
> 
>
>
> And if I give incorrect username and password, it states bad credentials
> entered. So, I think the curl command is fine. There is some issue with
> basic authentication.
>
>
> Okay, One way around is to figure out how to convert my password into a
> SHA256 (password + salt) and enter it in security.json file. But, I have no
> idea how to generate the SHA256 equivalent of my password.
>
>
> Any suggestions?
>
>
>
> On Fri, Aug 2, 2019 at 10:55 AM Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi Salmaan,
> >
> > Does your curl command works for other curl commands like normal
> querying?
> > Or is it just not working when updating password and adding new users?
> >
> > Regards,
> > Edwin
> >
> >
> >
> > On Fri, 2 Aug 2019 at 13:03, Salmaan Rashid Syed <
> > salmaan.ras...@mroads.com>
> > wrote:
> >
> > > Hi Zheng,
> > >
> > > I tried and it works. But, when I use the curl command to update
> password
> > > or add new users it doesn't work.
> > >
> > > I don't know what is going wrong with curl command!
> > >
> > > Regards,
> > > Salmaan
> > >
> > >
> > > On Fri, Aug 2, 2019 at 8:26 AM Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> > >
> > > wrote:
> > >
> > > > Have you tried to access the Solr Admin UI with your created user
> name
> > > and
> > > > password to see if it works?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > > On Thu, 1 Aug 2019 at 19:51, Salmaan Rashid Syed <
> > > > salmaan.ras...@mroads.com>
> > > > wrote:
> > > >
> > > > > Hi Solr User,
> > > > >
> > > > > Please help me with my issue.
> > > > >
> > > > > I have enabled Solr basic authentication as shown in Solr
> > > documentations.
> > > > >
> > > > > I have changed username from solr to solr-admin as follow
> > > > >
> > > > > {
> > > > > "authentication":{
> > > > >"blockUnknown": true,
> > > > >"class":"solr.BasicAuthPlugin",
> > > > >
> > > > >
> > > >
> > "credentials":{"solr-admin":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
> > > > > Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
> > > > > },
> > > > > "authorization":{
> > > > >"class":"solr.RuleBasedAuthorizationPlugin",
> > > > >"permissions":[{"name":"security-edit",
> > > > >   "role":"admin"}],
> > > > >"user-role":{"solr-admin":"admin"}
> > > > > }}
> > > > >
> > > > > I am able to login to the page using the credentials
> > > > solr-admin:SolrRocks.
> > > > >
> > > > > But, when I try to change the default password using the curl
> command
> > > as
> > > > > follows,
> > > > >
> > > > > curl --user solr-admin:SolrRocks
> > > > > http://localhost:8983/solr/admin/authentication -H
> > > > > 

Re: Solr on HDFS

2019-08-02 Thread lstusr 5u93n4
Hi Joe,

We fought with Solr on HDFS for quite some time, and faced similar issues
as you're seeing. (See this thread, for example:"
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201812.mbox/%3cCABd9LjTeacXpy3FFjFBkzMq6vhgu7Ptyh96+w-KC2p=-rqk...@mail.gmail.com%3e
 )

The Solr lock files on HDFS get deleted if the Solr server gets shut down
gracefully, but we couldn't always guarantee that in our environment so we
ended up writing a custom startup script to search for lock files on HDFS
and delete them before solr startup.
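For reference, a rough sketch of that kind of cleanup, here with the Hadoop FileSystem API rather than a shell script (the namenode URI and the /solr root path are placeholders, and it assumes the default HdfsLockFactory lock file name "write.lock"; only run it when no Solr node still holds the index open):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class HdfsLockCleaner {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
    // walk the Solr data directory recursively and drop stale index lock files
    RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path("/solr"), true);
    while (files.hasNext()) {
      Path p = files.next().getPath();
      if ("write.lock".equals(p.getName())) {
        System.out.println("removing stale lock: " + p);
        fs.delete(p, false);
      }
    }
    fs.close();
  }
}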

However, the issue that you mention of the Solr server rebuilding its whole
index from replicas on startup was enough of a show-stopper for us that we
switched away from HDFS to local disk. It literally made the difference
between 24+ hours of recovery time after an unexpected outage to less than
a minute...

If you do end up finding a solution to this issue, please post it to this
mailing list, because there are others out there (like us!) who would most
definitely make use of it.

Thanks

Kyle

On Fri, 2 Aug 2019 at 08:58, Joe Obernberger 
wrote:

> Thank you.  No, while the cluster is using Cloudera for HDFS, we do not
> use Cloudera to manager the solr cluster.  If it is a
> configuration/architecture issue, what can I do to fix it?  I'd like a
> system where servers can come and go, but the indexes stay available and
> recover automatically.  Is that possible with HDFS?
> While adding an alias to other collections would be an option, if that
> collection is the only collection, or one that is currently needed, in a
> live system, we can't bring it down, re-create it, and re-index when
> that process may take weeks to do.
>
> Any ideas?
>
> -Joe
>
> On 8/1/2019 6:15 PM, Angie Rabelero wrote:
> > I don’t think you’re using claudera or ambari, but ambari has an option
> to delete the locks. This seems more a configuration/architecture isssue
> than a realibility issue. You may want to spin up an alias while you bring
> down, clear locks and directories, recreate and index the affected
> collection, while you work your other isues.
> >
> > On Aug 1, 2019, at 16:40, Joe Obernberger 
> wrote:
> >
> > Been using Solr on HDFS for a while now, and I'm seeing an issue with
> redundancy/reliability.  If a server goes down, when it comes back up, it
> will never recover because of the lock files in HDFS. That solr node needs
> to be brought down manually, the lock files deleted, and then brought back
> up.  At that point, it appears to copy all the data for its replicas.  If
> the index is large, and new data is being indexed, in some cases it will
> never recover. The replication retries over and over.
> >
> > How can we make a reliable Solr Cloud cluster when using HDFS that can
> handle servers coming and going?
> >
> > Thank you!
> >
> > -Joe
> >
> >
> >
> > ---
> > This email has been checked for viruses by AVG.
> > https://www.avg.com
> >
>


Re: Solr on HDFS

2019-08-02 Thread Joe Obernberger
Thank you.  No, while the cluster is using Cloudera for HDFS, we do not 
use Cloudera to manage the Solr cluster.  If it is a 
configuration/architecture issue, what can I do to fix it?  I'd like a 
system where servers can come and go, but the indexes stay available and 
recover automatically.  Is that possible with HDFS?
While adding an alias to other collections would be an option, if that 
collection is the only collection, or one that is currently needed, in a 
live system, we can't bring it down, re-create it, and re-index when 
that process may take weeks to do.


Any ideas?

-Joe

On 8/1/2019 6:15 PM, Angie Rabelero wrote:

I don’t think you’re using Cloudera or Ambari, but Ambari has an option to 
delete the locks. This seems more a configuration/architecture issue than a 
reliability issue. You may want to spin up an alias while you bring down, clear 
locks and directories, and recreate and index the affected collection, while you 
work your other issues.

On Aug 1, 2019, at 16:40, Joe Obernberger  wrote:

Been using Solr on HDFS for a while now, and I'm seeing an issue with 
redundancy/reliability.  If a server goes down, when it comes back up, it will 
never recover because of the lock files in HDFS. That solr node needs to be 
brought down manually, the lock files deleted, and then brought back up.  At 
that point, it appears to copy all the data for its replicas.  If the index is 
large, and new data is being indexed, in some cases it will never recover. The 
replication retries over and over.

How can we make a reliable Solr Cloud cluster when using HDFS that can handle 
servers coming and going?

Thank you!

-Joe



---
This email has been checked for viruses by AVG.
https://www.avg.com



Re: Solr 8.2.0 having issue with ZooKeeper 3.5.5

2019-08-02 Thread Jörn Franke
I also just checked the output of the telnet commands - for conf it is 
different for standalone compared to an ensemble; I will put it in the Jira later.
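For anyone who wants to reproduce the check, the four-letter-word output can also be fetched with something like (host and port are placeholders):

echo conf | nc localhost 2181
echo ruok | nc localhost 2181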

> Am 02.08.2019 um 03:46 schrieb Zheng Lin Edwin Yeo :
> 
> Yes, I tried with space and the same error occurs.
> 
> I have also tried to put * , but I am getting the same error as well.
> 4lw.commands.whitelist=*
> 
> Regards,
> Edwin
> 
>> On Thu, 1 Aug 2019 at 21:34, Jörn Franke  wrote:
>> 
>> Spaces did not change the situation. In standalone it works without spaces
>> and the issue is only if there is an ensemble. I checked all ZK nodes have
>> the statement and have been restarted .
>> 
>> So the issue appears for a ZK Ensemble only in the admin UI. I will create
>> a JIRA tonight if no one creates it before
>> 
>>> Am 01.08.2019 um 12:58 schrieb Erick Erickson :
>>> 
>>> The issue was only cosmetic in the sense that the admin UI was the only
>> thing that was affected,
>>> not other Solr functions when I was working on this.
>>> 
>>> Please check a few things:
>>> 
>>> 1> be absolutely sure that you’ve added this in all your zoo.cfg files
>>> 
>>> 2> the example on the ZooKeeper website has spaces, could you try that
>> just to cover all bases? Stranger things have happened…
>>> 
>>> 4lw.commands.whitelist=mntr, ruok, conf
>>> not
>>> 4lw.commands.whitelist=mntr,ruok,conf
>>> 
>>> 3> if <1> and <2> don’t work, what happens if you start your ZooKeepers
>> with
>>> -Dzookeeper.4lw.commands.whitelist=….
>>> 
>>> If it’s not <1> or <2>, please raise a JIRA.
>>> 
>>> Best,
>>> Erick
>>> 
>>> Also, see: SOLR-13502 (no work has been done on this yet)
>>> 
 On Aug 1, 2019, at 6:05 AM, Jörn Franke  wrote:
 
 For me:
 * ZK stand-alone mode - no issues
 * ZK Ensemble - it seems to be only a cosmetic issue in the Admin UI (I
>> see the same error message), but aside this Solr is working fine
 
 
 
> Am 01.08.2019 um 12:02 schrieb Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com>:
> 
> Hi Jörn,
> Thank you for your reply.
> 
> I have encountered problem when I tried to create a collection with
>> this
> new version of ZooKeeper. You can find my Solr log file here:
> https://drive.google.com/open?id=1qkMLTRJ4eDSFwbqr15wSqjbg4dJV-bGN
> 
> Does it work perfectly at your side for creating collections and
>> indexing
> even when running ZooKeeper ensemble?
> 
> Regards,
> Edwin
> 
> 
>> On Thu, 1 Aug 2019 at 17:39, Jörn Franke 
>> wrote:
>> 
>> I confirm the issue.
>> 
>> Interestingly it does not happen with ZK standalone, but only in a ZK
>> Ensemble.
>> 
>> It seems to be mainly cosmetic in the admin UI because Solr appears to
>> function normally.
>> 
>>> Am 01.08.2019 um 03:31 schrieb Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com
>>> :
>>> 
>>> Yes. You can get my full solr.log from the link below. The error is
>> there
>>> when I tried to create collection1 (around line 170 to 300) .
>>> 
>>> https://drive.google.com/open?id=1qkMLTRJ4eDSFwbqr15wSqjbg4dJV-bGN
>>> 
>>> Regards,
>>> Edwin
>>> 
>>> 
 On Wed, 31 Jul 2019 at 18:39, Jan Høydahl 
>> wrote:
 
 Please look for the full log file solr.log in your Solr server, and
>> share
 it via some file sharing service or gist or similar for us to be
>> able to
 decipher the collection create error.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 
> 31. jul. 2019 kl. 08:33 skrev Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com
> :
> 
> Hi,
> 
> Regarding the issue, I have tried to put the following in zoo.cfg
>> under
> ZooKeeper:
> 4lw.commands.whitelist=mntr,conf,ruok
> 
> But it is still showing this error.
> *"Errors: - membership: Check 4lq.commands.whitelist setting in
>> zookeeper
> configuration file."*
> 
> As I am using SolrCloud, the collection config can still be loaded
>> to
> ZooKeeper as per normal. But if I tried to create a collection, I
>> will
 get
> the following error:
> 
> {
> "responseHeader":{
> "status":400,
> "QTime":686},
> "failure":{
> "192.168.1.2:8983
 _solr":"org.apache.solr.client.solrj.SolrServerException:IOException
> occurred when talking to server at:http://192.168.1.2:8983/solr;,
> "192.168.1.2:8984
 _solr":"org.apache.solr.client.solrj.SolrServerException:IOException
> occurred when talking to server at:http://192.168.1.2:8984/solr"},
> "Operation create caused
> 
 
>> 
>> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Underlying core creation failed while creating collection:
>> 

Re: Solr 8.2.0 having issue with ZooKeeper 3.5.5

2019-08-02 Thread Jörn Franke
Telnet is working correctly. The status endpoint seems to report the error that is 
displayed in the UI.

I don’t see anything obvious in the code; it might not be working for more than 
one node, but I am not sure exactly why.

I could not find the log line with „membership: check 4lw“ in the source code.

> Am 02.08.2019 um 03:46 schrieb Zheng Lin Edwin Yeo :
> 
> Yes, I tried with space and the same error occurs.
> 
> I have also tried to put * , but I am getting the same error as well.
> 4lw.commands.whitelist=*
> 
> Regards,
> Edwin
> 
>> On Thu, 1 Aug 2019 at 21:34, Jörn Franke  wrote:
>> 
>> Spaces did not change the situation. In standalone it works without spaces
>> and the issue is only if there is an ensemble. I checked all ZK nodes have
>> the statement and have been restarted .
>> 
>> So the issue appears for a ZK Ensemble only in the admin UI. I will create
>> a JIRA tonight if no one creates it before
>> 
>>> Am 01.08.2019 um 12:58 schrieb Erick Erickson :
>>> 
>>> The issue was only cosmetic in the sense that the admin UI was the only
>> thing that was affected,
>>> not other Solr functions when I was working on this.
>>> 
>>> Please check a few things:
>>> 
>>> 1> be absolutely sure that you’ve added this in all your zoo.cfg files
>>> 
>>> 2> the example on the ZooKeeper website has spaces, could you try that
>> just to cover all bases? Stranger things have happened…
>>> 
>>> 4lw.commands.whitelist=mntr, ruok, conf
>>> not
>>> 4lw.commands.whitelist=mntr,ruok,conf
>>> 
>>> 3> if <1> and <2> don’t work, what happens if you start your ZooKeepers
>> with
>>> -Dzookeeper.4lw.commands.whitelist=….
>>> 
>>> If it’s not <1> or <2>, please raise a JIRA.
>>> 
>>> Best,
>>> Erick
>>> 
>>> Also, see: SOLR-13502 (no work has been done on this yet)
>>> 
 On Aug 1, 2019, at 6:05 AM, Jörn Franke  wrote:
 
 For me:
 * ZK stand-alone mode - no issues
 * ZK Ensemble - it seems to be only a cosmetic issue in the Admin UI (I
>> see the same error message), but aside this Solr is working fine
 
 
 
> Am 01.08.2019 um 12:02 schrieb Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com>:
> 
> Hi Jörn,
> Thank you for your reply.
> 
> I have encountered problem when I tried to create a collection with
>> this
> new version of ZooKeeper. You can find my Solr log file here:
> https://drive.google.com/open?id=1qkMLTRJ4eDSFwbqr15wSqjbg4dJV-bGN
> 
> Does it work perfectly at your side for creating collections and
>> indexing
> even when running ZooKeeper ensemble?
> 
> Regards,
> Edwin
> 
> 
>> On Thu, 1 Aug 2019 at 17:39, Jörn Franke 
>> wrote:
>> 
>> I confirm the issue.
>> 
>> Interestingly it does not happen with ZK standalone, but only in a ZK
>> Ensemble.
>> 
>> It seems to be mainly cosmetic in the admin UI because Solr appears to
>> function normally.
>> 
>>> Am 01.08.2019 um 03:31 schrieb Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com
>>> :
>>> 
>>> Yes. You can get my full solr.log from the link below. The error is
>> there
>>> when I tried to create collection1 (around line 170 to 300) .
>>> 
>>> https://drive.google.com/open?id=1qkMLTRJ4eDSFwbqr15wSqjbg4dJV-bGN
>>> 
>>> Regards,
>>> Edwin
>>> 
>>> 
 On Wed, 31 Jul 2019 at 18:39, Jan Høydahl 
>> wrote:
 
 Please look for the full log file solr.log in your Solr server, and
>> share
 it via some file sharing service or gist or similar for us to be
>> able to
 decipher the collection create error.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 
> 31. jul. 2019 kl. 08:33 skrev Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com
> :
> 
> Hi,
> 
> Regarding the issue, I have tried to put the following in zoo.cfg
>> under
> ZooKeeper:
> 4lw.commands.whitelist=mntr,conf,ruok
> 
> But it is still showing this error.
> *"Errors: - membership: Check 4lq.commands.whitelist setting in
>> zookeeper
> configuration file."*
> 
> As I am using SolrCloud, the collection config can still be loaded
>> to
> ZooKeeper as per normal. But if I tried to create a collection, I
>> will
 get
> the following error:
> 
> {
> "responseHeader":{
> "status":400,
> "QTime":686},
> "failure":{
> "192.168.1.2:8983
 _solr":"org.apache.solr.client.solrj.SolrServerException:IOException
> occurred when talking to server at:http://192.168.1.2:8983/solr;,
> "192.168.1.2:8984
 _solr":"org.apache.solr.client.solrj.SolrServerException:IOException
> occurred when talking to server at:http://192.168.1.2:8984/solr"},
> "Operation create caused
> 
 
>> 
>> 

Re: Indexing information on number of attachments and their names in EML file

2019-08-02 Thread Jan Høydahl
Try the Apache Tika mailing list.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 2. aug. 2019 kl. 05:01 skrev Zheng Lin Edwin Yeo :
> 
> Hi,
> 
> Does anyone knows if this can be done on the Solr side?
> Or it has to be done on the Tika side?
> 
> Regards,
> Edwin
> 
> On Thu, 1 Aug 2019 at 09:38, Zheng Lin Edwin Yeo 
> wrote:
> 
>> Hi,
>> 
>> Would like to check, Is there anyway which we can detect the number of
>> attachments and their names during indexing of EML files in Solr, and index
>> those information into Solr?
>> 
>> Currently, Solr is able to use Tika and Tesseract OCR to extract the
>> contents of the attachments. However, I could not find the information
>> about the number of attachments in the EML file and what are their filename.
>> 
>> I am using Solr 7.6.0 in production, and also trying out on the new Solr
>> 8.2.0.
>> 
>> Regards,
>> Edwin
>> 



Re: Problem with uploading Large synonym files in cloud mode

2019-08-02 Thread Jörn Franke
You can use the configset API:
https://lucene.apache.org/solr/guide/7_7/configsets-api.html
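For example, uploading a zipped config directory with the configset API looks roughly like this (config name and paths are placeholders):

(cd /path/to/myconfig/conf && zip -r - *) > myconfig.zip

curl -X POST --header "Content-Type: application/octet-stream" \
  --data-binary @myconfig.zip \
  "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=myconfig"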

I don’t recommend using schema.xml directly; use managed schemas instead:

https://lucene.apache.org/solr/guide/6_6/schema-api.html

For people new to Solr I generally recommend reading a recent book about Solr 
from beginning to end - that will bring you up to speed much faster than trying 
to find all the information on the Internet and will prepare you to deliver 
results faster and with better quality.
Then it is also much easier to understand and use the reference guide

> Am 02.08.2019 um 08:30 schrieb Salmaan Rashid Syed 
> :
> 
> Hi Bernd,
> 
> Yet, another noob question.
> 
> Consider that my conf directory for creating a collection is _default. Suppose
> now I made changes to managed-schema and conf.xml, How do I upload it to
> external zookeeper at 2181 port?
> 
> Can you please give me the command that uploads altered config.xml and
> managed-schema to zookeeper?
> 
> Thanks.
> 
> 
> On Fri, Aug 2, 2019 at 11:53 AM Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de> wrote:
> 
>> 
>> to 1) yes, because -Djute.maxbuffer is going to JAVA as a start parameter.
>> 
>> to 2) I don't know because i never use internal zookeeper
>> 
>> to 3) the configs are located at solr/server/solr/configsets/
>>   - choose one configset, make your changes and upload it to zookeeper
>>   - when creating a new collection choose your uploaded config
>>   - whenever you change something at your config you have to upload
>> it to zookeeper
>> 
>> I don't know which Solr version you are using, but a good starting point
>> with solr cloud is
>> http://lucene.apache.org/solr/guide/6_6/solrcloud.html
>> 
>> Regards
>> Bernd
>> 
>> 
>> 
>>> Am 02.08.19 um 07:59 schrieb Salmaan Rashid Syed:
>>> Hi Bernd,
>>> 
>>> Sorry for noob questions.
>>> 
>>> 1) What do you mean by restart? Do you mean that I shoud issue ./bin/solr
>>> stop -all?
>>> 
>>> And then issue these commands,
>>> 
>>> bin/solr restart -cloud -s example/cloud/node1/solr -p 8983
>>> 
>>> bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr
>>> 
>>> 
>>> 2) Where can I find solr internal Zookeeper folder for issuing this
>> command
>>> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"?
>>> 
>>> 
>>> 3) Where can I find schema.xml and config.xmo files for Solr Cloud Cores
>> to
>>> make changes in schema and configuration? Or do I have to make chages in
>>> the directory that contains managed-schema and config.xml files with
>> which
>>> I initialized and created collections? And then the solr will pick them
>> up
>>> from there when it restarts?
>>> 
>>> 
>>> Regards,
>>> 
>>> Salmaan
>>> 
>>> 
>>> 
>>> On Thu, Aug 1, 2019 at 5:40 PM Bernd Fehling <
>> bernd.fehl...@uni-bielefeld.de>
>>> wrote:
>>> 
 
 
> Am 01.08.19 um 13:57 schrieb Salmaan Rashid Syed:
> After I make the -Djute.maxbuffer changes to Solr, deployed in
 production,
> Do I need to restart the solr to be able to add synonyms >1MB?
 
 Yes, you have to restart Solr.
 
 
> 
> Or, Was this supposed to be done before putting Solr to production
>> ever?
> Can we make chages when the Solr is running in production?
 
 It depends on your system. In my cloud with 5 shards and 3 replicas I
>> can
 take one by one offline, stop, modify and start again without problems.
 
 
> 
> Thanks.
> 
> Regards,
> Salmaan
> 
> 
> 
> On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de> wrote:
> 
>> You have to increase the -Djute.maxbuffer for large configs.
>> 
>> In Solr bin/solr/solr.in.sh use e.g.
>> SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000"
>> This will increase maxbuffer for zookeeper on solr side to 10MB.
>> 
>> In Zookeeper zookeeper/conf/zookeeper-env.sh
>> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"
>> 
>> I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works
>> perfect.
>> 
>> Regards
>> 
>> 
>>> Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed:
>>> Hi Solr Users,
>>> 
>>> I have a very big synonym file (>5MB). I am unable to start Solr in
 cloud
>>> mode as it throws an error message stating that the synonmys file is
>>> too large. I figured out that the zookeeper doesn't take a file
>> greater
>>> than 1MB size.
>>> 
>>> I tried to break down my synonyms file to smaller chunks less than
>> 1MB
>>> each. But, I am not sure about how to include all the filenames into
 the
>>> Solr schema.
>>> 
>>> Should it be seperated by commas like synonyms = "__1_synonyms.txt,
>>> __2_synonyms.txt, __3synonyms.txt"
>>> 
>>> Or is there a better way of doing that? Will the bigger file when
 broken
>>> down to smaller chunks will be uploaded to zookeeper as well.
>>> 
>>> Please help or please guide 

Re: Basic Authentication problem

2019-08-02 Thread Zheng Lin Edwin Yeo
From what I see, you are trying to change your own user's password. If I
remember correctly this might not be allowed, which is why you are
getting the "Unauthorized request" error.

You can try to create another user with admin role as well, and to change
your existing user's password from the new user.
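Roughly, that could look like the following (user names and passwords here are only examples):

curl --user solr-admin:SolrRocks http://localhost:8983/solr/admin/authentication \
  -H 'Content-type:application/json' \
  -d '{"set-user":{"second-admin":"AnotherPassword1"}}'

curl --user solr-admin:SolrRocks http://localhost:8983/solr/admin/authorization \
  -H 'Content-type:application/json' \
  -d '{"set-user-role":{"second-admin":["admin"]}}'

curl --user second-admin:AnotherPassword1 http://localhost:8983/solr/admin/authentication \
  -H 'Content-type:application/json' \
  -d '{"set-user":{"solr-admin":"NewPassword2019"}}'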

Regards,
Edwin

On Fri, 2 Aug 2019 at 13:32, Salmaan Rashid Syed 
wrote:

> My curl command works fine for querying, updating etc.
>
> I don't think it is the fault of curl command.
>
> I get the following error message when I tried to change the password of
> solr-admin,
>
>
> 
>
> 
>
> 
>
> Error 403 Unauthorized request, Response code: 403
>
> 
>
> HTTP ERROR 403
>
> Problem accessing /solr/admin/authentication. Reason:
>
> Unauthorized request, Response code: 403
>
> 
>
> 
>
>
> And if I give incorrect username and password, it states bad credentials
> entered. So, I think the curl command is fine. There is some issue with
> basic authentication.
>
>
> Okay, One way around is to figure out how to convert my password into a
> SHA256 (password + salt) and enter it in security.json file. But, I have no
> idea how to generate the SHA256 equivalent of my password.
>
>
> Any suggestions?
>
>
>
> On Fri, Aug 2, 2019 at 10:55 AM Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi Salmaan,
> >
> > Does your curl command works for other curl commands like normal
> querying?
> > Or is it just not working when updating password and adding new users?
> >
> > Regards,
> > Edwin
> >
> >
> >
> > On Fri, 2 Aug 2019 at 13:03, Salmaan Rashid Syed <
> > salmaan.ras...@mroads.com>
> > wrote:
> >
> > > Hi Zheng,
> > >
> > > I tried and it works. But, when I use the curl command to update
> password
> > > or add new users it doesn't work.
> > >
> > > I don't know what is going wrong with curl command!
> > >
> > > Regards,
> > > Salmaan
> > >
> > >
> > > On Fri, Aug 2, 2019 at 8:26 AM Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> > >
> > > wrote:
> > >
> > > > Have you tried to access the Solr Admin UI with your created user
> name
> > > and
> > > > password to see if it works?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > > On Thu, 1 Aug 2019 at 19:51, Salmaan Rashid Syed <
> > > > salmaan.ras...@mroads.com>
> > > > wrote:
> > > >
> > > > > Hi Solr User,
> > > > >
> > > > > Please help me with my issue.
> > > > >
> > > > > I have enabled Solr basic authentication as shown in Solr
> > > documentations.
> > > > >
> > > > > I have changed username from solr to solr-admin as follow
> > > > >
> > > > > {
> > > > > "authentication":{
> > > > >"blockUnknown": true,
> > > > >"class":"solr.BasicAuthPlugin",
> > > > >
> > > > >
> > > >
> > "credentials":{"solr-admin":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
> > > > > Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
> > > > > },
> > > > > "authorization":{
> > > > >"class":"solr.RuleBasedAuthorizationPlugin",
> > > > >"permissions":[{"name":"security-edit",
> > > > >   "role":"admin"}],
> > > > >"user-role":{"solr-admin":"admin"}
> > > > > }}
> > > > >
> > > > > I am able to login to the page using the credentials
> > > > solr-admin:SolrRocks.
> > > > >
> > > > > But, when I try to change the default password using the curl
> command
> > > as
> > > > > follows,
> > > > >
> > > > > curl --user solr-admin:SolrRocks
> > > > > http://localhost:8983/solr/admin/authentication -H
> > > > > 'Content-type:application/json' -d
> > > '{"set-user":{"solr-admin":"s2019"}}'
> > > > >
> > > > >
> > > > > I get the following error message,
> > > > >
> > > > >
> > > > > 
> > > > >
> > > > > 
> > > > >
> > > > > 
> > > > >
> > > > > Error 403 Unauthorized request, Response code: 403
> > > > >
> > > > > 
> > > > >
> > > > > HTTP ERROR 403
> > > > >
> > > > > Problem accessing /solr/admin/authentication. Reason:
> > > > >
> > > > > Unauthorized request, Response code: 403
> > > > >
> > > > > 
> > > > >
> > > > > 
> > > > >
> > > > >
> > > > > Please help.
> > > > >
> > > > > Regards,
> > > > > Salmaan
> > > > >
> > > > >
> > > > > On Thu, Aug 1, 2019 at 1:51 PM Salmaan Rashid Syed <
> > > > > salmaan.ras...@mroads.com> wrote:
> > > > >
> > > > > > Small correction in the user-name. It is solr-admin everywhere.
> > > > > >
> > > > > > Hi Solr Users,
> > > > > >
> > > > > > I have enabled Solr basic authentication as shown in Solr
> > > > documentations.
> > > > > >
> > > > > > I have changed username from solr to solr-admin as follow
> > > > > >
> > > > > > {
> > > > > > "authentication":{
> > > > > >"blockUnknown": true,
> > > > > >"class":"solr.BasicAuthPlugin",
> > > > > >
> > > > > >
> > > > >
> > >
> "credentials":{"solr-admin":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
> > > > > > Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
> > > > > > },
> > > > > > "authorization":{
> > > > > >"class":"solr.RuleBasedAuthorizationPlugin",
> > > > > >"permissions":[{"name":"security-edit",
> > > > > >   "role":"admin"}],
> > > > > >

Re: Problem with uploading Large synonym files in cloud mode

2019-08-02 Thread Bernd Fehling

http://lucene.apache.org/solr/guide/6_6/command-line-utilities.html
"Upload a configuration directory"

Take my advice and read the SolrCloud section of the Solr Ref Guide.
It will answer most of your questions and is a good start.
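For example (ZK address, config name and paths are placeholders), uploading a changed config and then reloading the collection looks roughly like:

bin/solr zk upconfig -z localhost:2181 -n myconfig -d /path/to/myconfig/conf

curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"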



Am 02.08.19 um 08:30 schrieb Salmaan Rashid Syed:

Hi Bernd,

Yet, another noob question.

Consider that my conf directory for creating a collection is _default. Suppose
now I made changes to managed-schema and conf.xml, How do I upload it to
external zookeeper at 2181 port?

Can you please give me the command that uploads altered config.xml and
managed-schema to zookeeper?

Thanks.


On Fri, Aug 2, 2019 at 11:53 AM Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:



to 1) yes, because -Djute.maxbuffer is going to JAVA as a start parameter.

to 2) I don't know because i never use internal zookeeper

to 3) the configs are located at solr/server/solr/configsets/
- choose one configset, make your changes and upload it to zookeeper
- when creating a new collection choose your uploaded config
- whenever you change something at your config you have to upload
it to zookeeper

I don't know which Solr version you are using, but a good starting point
with solr cloud is
http://lucene.apache.org/solr/guide/6_6/solrcloud.html

Regards
Bernd



Am 02.08.19 um 07:59 schrieb Salmaan Rashid Syed:

Hi Bernd,

Sorry for noob questions.

1) What do you mean by restart? Do you mean that I shoud issue ./bin/solr
stop -all?

And then issue these commands,

bin/solr restart -cloud -s example/cloud/node1/solr -p 8983

bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr


2) Where can I find solr internal Zookeeper folder for issuing this

command

SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"?


3) Where can I find schema.xml and config.xmo files for Solr Cloud Cores

to

make changes in schema and configuration? Or do I have to make chages in
the directory that contains managed-schema and config.xml files with

which

I initialized and created collections? And then the solr will pick them

up

from there when it restarts?


Regards,

Salmaan



On Thu, Aug 1, 2019 at 5:40 PM Bernd Fehling <

bernd.fehl...@uni-bielefeld.de>

wrote:




Am 01.08.19 um 13:57 schrieb Salmaan Rashid Syed:

After I make the -Djute.maxbuffer changes to Solr, deployed in

production,

Do I need to restart the solr to be able to add synonyms >1MB?


Yes, you have to restart Solr.




Or, Was this supposed to be done before putting Solr to production

ever?

Can we make chages when the Solr is running in production?


It depends on your system. In my cloud with 5 shards and 3 replicas I

can

take one by one offline, stop, modify and start again without problems.




Thanks.

Regards,
Salmaan



On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:


You have to increase the -Djute.maxbuffer for large configs.

In Solr bin/solr/solr.in.sh use e.g.
SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000"
This will increase maxbuffer for zookeeper on solr side to 10MB.

In Zookeeper zookeeper/conf/zookeeper-env.sh
SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"

I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works

perfect.


Regards


Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed:

Hi Solr Users,

I have a very big synonym file (>5MB). I am unable to start Solr in

cloud

mode as it throws an error message stating that the synonmys file is
too large. I figured out that the zookeeper doesn't take a file

greater

than 1MB size.

I tried to break down my synonyms file to smaller chunks less than

1MB

each. But, I am not sure about how to include all the filenames into

the

Solr schema.

Should it be seperated by commas like synonyms = "__1_synonyms.txt,
__2_synonyms.txt, __3synonyms.txt"

Or is there a better way of doing that? Will the bigger file when

broken

down to smaller chunks will be uploaded to zookeeper as well.

Please help or please guide me to relevant documentation regarding

this.


Thank you.

Regards.
Salmaan.















Re: Problem with uploading Large synonym files in cloud mode

2019-08-02 Thread Salmaan Rashid Syed
Hi Bernd,

Yet, another noob question.

Consider that my conf directory for creating a collection is _default. Suppose
I now make changes to managed-schema and conf.xml; how do I upload them to the
external zookeeper at port 2181?

Can you please give me the command that uploads altered config.xml and
managed-schema to zookeeper?

Thanks.


On Fri, Aug 2, 2019 at 11:53 AM Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

>
> to 1) yes, because -Djute.maxbuffer is going to JAVA as a start parameter.
>
> to 2) I don't know because i never use internal zookeeper
>
> to 3) the configs are located at solr/server/solr/configsets/
>- choose one configset, make your changes and upload it to zookeeper
>- when creating a new collection choose your uploaded config
>- whenever you change something at your config you have to upload
> it to zookeeper
>
> I don't know which Solr version you are using, but a good starting point
> with solr cloud is
> http://lucene.apache.org/solr/guide/6_6/solrcloud.html
>
> Regards
> Bernd
>
>
>
> Am 02.08.19 um 07:59 schrieb Salmaan Rashid Syed:
> > Hi Bernd,
> >
> > Sorry for noob questions.
> >
> > 1) What do you mean by restart? Do you mean that I shoud issue ./bin/solr
> > stop -all?
> >
> > And then issue these commands,
> >
> > bin/solr restart -cloud -s example/cloud/node1/solr -p 8983
> >
> > bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr
> >
> >
> > 2) Where can I find solr internal Zookeeper folder for issuing this
> command
> > SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"?
> >
> >
> > 3) Where can I find schema.xml and config.xmo files for Solr Cloud Cores
> to
> > make changes in schema and configuration? Or do I have to make chages in
> > the directory that contains managed-schema and config.xml files with
> which
> > I initialized and created collections? And then the solr will pick them
> up
> > from there when it restarts?
> >
> >
> > Regards,
> >
> > Salmaan
> >
> >
> >
> > On Thu, Aug 1, 2019 at 5:40 PM Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de>
> > wrote:
> >
> >>
> >>
> >> Am 01.08.19 um 13:57 schrieb Salmaan Rashid Syed:
> >>> After I make the -Djute.maxbuffer changes to Solr, deployed in
> >> production,
> >>> Do I need to restart the solr to be able to add synonyms >1MB?
> >>
> >> Yes, you have to restart Solr.
> >>
> >>
> >>>
> >>> Or, Was this supposed to be done before putting Solr to production
> ever?
> >>> Can we make chages when the Solr is running in production?
> >>
> >> It depends on your system. In my cloud with 5 shards and 3 replicas I
> can
> >> take one by one offline, stop, modify and start again without problems.
> >>
> >>
> >>>
> >>> Thanks.
> >>>
> >>> Regards,
> >>> Salmaan
> >>>
> >>>
> >>>
> >>> On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling <
> >>> bernd.fehl...@uni-bielefeld.de> wrote:
> >>>
>  You have to increase the -Djute.maxbuffer for large configs.
> 
>  In Solr bin/solr/solr.in.sh use e.g.
>  SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000"
>  This will increase maxbuffer for zookeeper on solr side to 10MB.
> 
>  In Zookeeper zookeeper/conf/zookeeper-env.sh
>  SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"
> 
>  I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works
> perfect.
> 
>  Regards
> 
> 
>  Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed:
> > Hi Solr Users,
> >
> > I have a very big synonym file (>5MB). I am unable to start Solr in
> >> cloud
> > mode as it throws an error message stating that the synonmys file is
> > too large. I figured out that the zookeeper doesn't take a file
> greater
> > than 1MB size.
> >
> > I tried to break down my synonyms file to smaller chunks less than
> 1MB
> > each. But, I am not sure about how to include all the filenames into
> >> the
> > Solr schema.
> >
> > Should it be seperated by commas like synonyms = "__1_synonyms.txt,
> > __2_synonyms.txt, __3synonyms.txt"
> >
> > Or is there a better way of doing that? Will the bigger file when
> >> broken
> > down to smaller chunks will be uploaded to zookeeper as well.
> >
> > Please help or please guide me to relevant documentation regarding
> >> this.
> >
> > Thank you.
> >
> > Regards.
> > Salmaan.
> >
> 
> >>>
> >>
> >
>


Re: Problem with uploading Large synonym files in cloud mode

2019-08-02 Thread Bernd Fehling



to 1) yes, because -Djute.maxbuffer is going to JAVA as a start parameter.

to 2) I don't know because i never use internal zookeeper

to 3) the configs are located at solr/server/solr/configsets/
  - choose one configset, make your changes and upload it to zookeeper
  - when creating a new collection choose your uploaded config
  - whenever you change something at your config you have to upload it to 
zookeeper

I don't know which Solr version you are using, but a good starting point with 
solr cloud is
http://lucene.apache.org/solr/guide/6_6/solrcloud.html

Regards
Bernd



Am 02.08.19 um 07:59 schrieb Salmaan Rashid Syed:

Hi Bernd,

Sorry for noob questions.

1) What do you mean by restart? Do you mean that I shoud issue ./bin/solr
stop -all?

And then issue these commands,

bin/solr restart -cloud -s example/cloud/node1/solr -p 8983

bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr


2) Where can I find solr internal Zookeeper folder for issuing this command
SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"?


3) Where can I find schema.xml and config.xmo files for Solr Cloud Cores to
make changes in schema and configuration? Or do I have to make chages in
the directory that contains managed-schema and config.xml files with which
I initialized and created collections? And then the solr will pick them up
from there when it restarts?


Regards,

Salmaan



On Thu, Aug 1, 2019 at 5:40 PM Bernd Fehling 
wrote:




Am 01.08.19 um 13:57 schrieb Salmaan Rashid Syed:

After I make the -Djute.maxbuffer changes to Solr, deployed in

production,

Do I need to restart the solr to be able to add synonyms >1MB?


Yes, you have to restart Solr.




Or, Was this supposed to be done before putting Solr to production ever?
Can we make chages when the Solr is running in production?


It depends on your system. In my cloud with 5 shards and 3 replicas I can
take one by one offline, stop, modify and start again without problems.




Thanks.

Regards,
Salmaan



On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:


You have to increase the -Djute.maxbuffer for large configs.

In Solr bin/solr/solr.in.sh use e.g.
SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000"
This will increase maxbuffer for zookeeper on solr side to 10MB.

In Zookeeper zookeeper/conf/zookeeper-env.sh
SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"

I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works perfect.

Regards


Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed:

Hi Solr Users,

I have a very big synonym file (>5MB). I am unable to start Solr in

cloud

mode as it throws an error message stating that the synonmys file is
too large. I figured out that the zookeeper doesn't take a file greater
than 1MB size.

I tried to break down my synonyms file to smaller chunks less than 1MB
each. But, I am not sure about how to include all the filenames into

the

Solr schema.

Should it be seperated by commas like synonyms = "__1_synonyms.txt,
__2_synonyms.txt, __3synonyms.txt"

Or is there a better way of doing that? Will the bigger file when

broken

down to smaller chunks will be uploaded to zookeeper as well.

Please help or please guide me to relevant documentation regarding

this.


Thank you.

Regards.
Salmaan.











Re: Problem with uploading Large synonym files in cloud mode

2019-08-02 Thread Salmaan Rashid Syed
Hi Bernd,

Sorry for noob questions.

1) What do you mean by restart? Do you mean that I should issue ./bin/solr
stop -all?

And then issue these commands,

bin/solr restart -cloud -s example/cloud/node1/solr -p 8983

bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr


2) Where can I find solr internal Zookeeper folder for issuing this command
SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"?


3) Where can I find the schema.xml and config.xml files for Solr Cloud cores to
make changes in schema and configuration? Or do I have to make changes in
the directory that contains the managed-schema and config.xml files with which
I initialized and created the collections? And will Solr then pick them up
from there when it restarts?


Regards,

Salmaan



On Thu, Aug 1, 2019 at 5:40 PM Bernd Fehling 
wrote:

>
>
> Am 01.08.19 um 13:57 schrieb Salmaan Rashid Syed:
> > After I make the -Djute.maxbuffer changes to Solr, deployed in
> production,
> > Do I need to restart the solr to be able to add synonyms >1MB?
>
> Yes, you have to restart Solr.
>
>
> >
> > Or, Was this supposed to be done before putting Solr to production ever?
> > Can we make chages when the Solr is running in production?
>
> It depends on your system. In my cloud with 5 shards and 3 replicas I can
> take one by one offline, stop, modify and start again without problems.
>
>
> >
> > Thanks.
> >
> > Regards,
> > Salmaan
> >
> >
> >
> > On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling <
> > bernd.fehl...@uni-bielefeld.de> wrote:
> >
> >> You have to increase the -Djute.maxbuffer for large configs.
> >>
> >> In Solr bin/solr/solr.in.sh use e.g.
> >> SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000"
> >> This will increase maxbuffer for zookeeper on solr side to 10MB.
> >>
> >> In Zookeeper zookeeper/conf/zookeeper-env.sh
> >> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"
> >>
> >> I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works perfect.
> >>
> >> Regards
> >>
> >>
> >> Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed:
> >>> Hi Solr Users,
> >>>
> >>> I have a very big synonym file (>5MB). I am unable to start Solr in
> cloud
> >>> mode as it throws an error message stating that the synonmys file is
> >>> too large. I figured out that the zookeeper doesn't take a file greater
> >>> than 1MB size.
> >>>
> >>> I tried to break down my synonyms file to smaller chunks less than 1MB
> >>> each. But, I am not sure about how to include all the filenames into
> the
> >>> Solr schema.
> >>>
> >>> Should it be seperated by commas like synonyms = "__1_synonyms.txt,
> >>> __2_synonyms.txt, __3synonyms.txt"
> >>>
> >>> Or is there a better way of doing that? Will the bigger file when
> broken
> >>> down to smaller chunks will be uploaded to zookeeper as well.
> >>>
> >>> Please help or please guide me to relevant documentation regarding
> this.
> >>>
> >>> Thank you.
> >>>
> >>> Regards.
> >>> Salmaan.
> >>>
> >>
> >
>