Re: Indexing information on number of attachments and their names in EML file
Thanks for the reply, will find out more about it. Currently I am able to retrieve the normal metadata of the email, but not the metadata of the attachments that are part of the EML file's contents, which looks something like this:

--d8b77b057d59ca19--
--d8b77e057d59ca1b
Content-Type: application/pdf; name="file1.pdf"
Content-Disposition: attachment; filename="file1.pdf"
Content-Transfer-Encoding: base64
Content-ID:
X-Attachment-Id: f_jpurtpnk0

Regards,
Edwin

On Sat, 3 Aug 2019 at 05:38, Tim Allison wrote:
> I'd strongly recommend rolling your own ingest code. See Erick's
> superb: https://lucidworks.com/post/indexing-with-solrj/
>
> You can easily get attachments via the RecursiveParserWrapper, e.g.
> https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java#L351
>
> This will return a list of Metadata objects; the first one will be the
> main/container, and each other entry will be an attachment. Let us know
> if you have any questions/surprises. There are a couple of todos for .eml...
>
> On Fri, Aug 2, 2019 at 3:43 AM Jan Høydahl wrote:
> >
> > Try the Apache Tika mailing list.
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> > > 2. aug. 2019 kl. 05:01 skrev Zheng Lin Edwin Yeo:
> > >
> > > Hi,
> > >
> > > Does anyone know if this can be done on the Solr side,
> > > or whether it has to be done on the Tika side?
> > >
> > > Regards,
> > > Edwin
> > >
> > > On Thu, 1 Aug 2019 at 09:38, Zheng Lin Edwin Yeo wrote:
> > >
> > >> Hi,
> > >>
> > >> Would like to check: is there any way we can detect the number of
> > >> attachments and their names during indexing of EML files in Solr, and
> > >> index that information into Solr?
> > >>
> > >> Currently, Solr is able to use Tika and Tesseract OCR to extract the
> > >> contents of the attachments. However, I could not find information
> > >> about the number of attachments in the EML file and what their filenames are.
> > >>
> > >> I am using Solr 7.6.0 in production, and am also trying out the new Solr 8.2.0.
> > >>
> > >> Regards,
> > >> Edwin
Re: Indexing information on number of attachments and their names in EML file
I'd strongly recommend rolling your own ingest code. See Erick's superb: https://lucidworks.com/post/indexing-with-solrj/

You can easily get attachments via the RecursiveParserWrapper, e.g.
https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java#L351

This will return a list of Metadata objects; the first one will be the main/container, and each other entry will be an attachment. Let us know if you have any questions/surprises. There are a couple of todos for .eml...

On Fri, Aug 2, 2019 at 3:43 AM Jan Høydahl wrote:
>
> Try the Apache Tika mailing list.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 2. aug. 2019 kl. 05:01 skrev Zheng Lin Edwin Yeo:
> >
> > Hi,
> >
> > Does anyone know if this can be done on the Solr side,
> > or whether it has to be done on the Tika side?
> >
> > Regards,
> > Edwin
> >
> > On Thu, 1 Aug 2019 at 09:38, Zheng Lin Edwin Yeo wrote:
> >
> >> Hi,
> >>
> >> Would like to check: is there any way we can detect the number of
> >> attachments and their names during indexing of EML files in Solr, and
> >> index that information into Solr?
> >>
> >> Currently, Solr is able to use Tika and Tesseract OCR to extract the
> >> contents of the attachments. However, I could not find information
> >> about the number of attachments in the EML file and what their filenames are.
> >>
> >> I am using Solr 7.6.0 in production, and am also trying out the new Solr 8.2.0.
> >>
> >> Regards,
> >> Edwin
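The attachment count and filenames Edwin is after are plain MIME metadata. Independent of the Tika/RecursiveParserWrapper route Tim describes, a minimal stand-alone sketch using Python's stdlib `email` module shows what needs extracting from an EML file before indexing (the helper name `attachment_info` is illustrative, not part of Solr or Tika):

```python
import email
from email import policy

def attachment_info(eml_bytes):
    """Return (count, filenames) for the attachment parts of an EML message.

    policy.default yields an EmailMessage, whose iter_attachments() walks
    the parts that are not body candidates -- e.g. the application/pdf
    part with filename="file1.pdf" from the sample headers above.
    """
    msg = email.message_from_bytes(eml_bytes, policy=policy.default)
    names = [part.get_filename() for part in msg.iter_attachments()]
    return len(names), names
```

The two returned values could then be sent to Solr alongside the Tika-extracted content as ordinary fields, e.g. an integer field for the count and a multivalued string field for the names.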
Re: Solr on HDFS
> > If you think about it, having a shard with 3 replicas on top of a file
> > system that does 3x replication seems a little excessive!

https://issues.apache.org/jira/browse/SOLR-6305 should help here. I can take a look at merging the patch, since it looks like it has been helpful to others.

Kevin Risden

On Fri, Aug 2, 2019 at 10:09 AM Joe Obernberger <joseph.obernber...@gmail.com> wrote:
> Hi Kyle - Thank you.
>
> Our current index is split across 3 Solr collections; our largest collection is 26.8 TB (80.5 TB when 3x replicated in HDFS) across 100 shards. There are 40 machines hosting this cluster. We've found that when dealing with large collections, having no replicas (but lots of shards) ends up being more reliable, since there is a much smaller recovery time. We keep another 30-day index (1.4 TB) that does have replicas (40 shards, 3 replicas each), and if a node goes down, we manually delete lock files and then bring it back up, and yes - lots of network IO, but it usually recovers OK.
>
> Having a large collection like this with no replicas seems like a recipe for disaster. So we've been experimenting with the latest version (8.2) and our index process to split up the data into many Solr collections that do have replicas, and then build the list of collections to search at query time. Our searches are date based, so we can define what collections we want to query at query time. As a test, we ran just two machines, HDFS, and 500 collections. One server ran out of memory and crashed. We had over 1,600 lock files to delete.
>
> If you think about it, having a shard with 3 replicas on top of a file system that does 3x replication seems a little excessive! I'd love to see Solr take more advantage of a shared FS. Perhaps an idea is to use HDFS but with an NFS gateway. Seems like that may be slow.
>
> Architecturally, I love only having one large file system to manage instead of lots of individual file systems across many machines. HDFS makes this easy.
>
> -Joe
>
> On 8/2/2019 9:10 AM, lstusr 5u93n4 wrote:
> > Hi Joe,
> >
> > We fought with Solr on HDFS for quite some time, and faced similar issues as you're seeing. (See this thread, for example: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201812.mbox/%3cCABd9LjTeacXpy3FFjFBkzMq6vhgu7Ptyh96+w-KC2p=-rqk...@mail.gmail.com%3e )
> >
> > The Solr lock files on HDFS get deleted if the Solr server gets shut down gracefully, but we couldn't always guarantee that in our environment, so we ended up writing a custom startup script to search for lock files on HDFS and delete them before Solr startup.
> >
> > However, the issue that you mention of the Solr server rebuilding its whole index from replicas on startup was enough of a show-stopper for us that we switched away from HDFS to local disk. It literally made the difference between 24+ hours of recovery time after an unexpected outage and less than a minute...
> >
> > If you do end up finding a solution to this issue, please post it to this mailing list, because there are others out there (like us!) who would most definitely make use of it.
> >
> > Thanks,
> > Kyle
> >
> > On Fri, 2 Aug 2019 at 08:58, Joe Obernberger <joseph.obernber...@gmail.com> wrote:
> >
> >> Thank you. No, while the cluster is using Cloudera for HDFS, we do not use Cloudera to manage the Solr cluster. If it is a configuration/architecture issue, what can I do to fix it? I'd like a system where servers can come and go, but the indexes stay available and recover automatically. Is that possible with HDFS?
> >>
> >> While adding an alias to other collections would be an option, if that collection is the only collection, or one that is currently needed, in a live system, we can't bring it down, re-create it, and re-index when that process may take weeks to do.
> >>
> >> Any ideas?
> >>
> >> -Joe
> >>
> >> On 8/1/2019 6:15 PM, Angie Rabelero wrote:
> >>> I don't think you're using Cloudera or Ambari, but Ambari has an option to delete the locks. This seems more a configuration/architecture issue than a reliability issue. You may want to spin up an alias while you bring down, clear locks and directories, recreate and index the affected collection, while you work your other issues.
> >>>
> >>> On Aug 1, 2019, at 16:40, Joe Obernberger <joseph.obernber...@gmail.com> wrote:
> >>> Been using Solr on HDFS for a while now, and I'm seeing an issue with redundancy/reliability. If a server goes down, when it comes back up, it will never recover because of the lock files in HDFS. That Solr node needs to be brought down manually, the lock files deleted, and then brought back up. At that point, it appears to copy all the data for its replicas. If the index is large, and new data is being indexed, in some cases it will never
Re: Solr on HDFS
Hi Kyle - Thank you.

Our current index is split across 3 Solr collections; our largest collection is 26.8 TB (80.5 TB when 3x replicated in HDFS) across 100 shards. There are 40 machines hosting this cluster. We've found that when dealing with large collections, having no replicas (but lots of shards) ends up being more reliable, since there is a much smaller recovery time. We keep another 30-day index (1.4 TB) that does have replicas (40 shards, 3 replicas each), and if a node goes down, we manually delete lock files and then bring it back up, and yes - lots of network IO, but it usually recovers OK.

Having a large collection like this with no replicas seems like a recipe for disaster. So we've been experimenting with the latest version (8.2) and our index process to split up the data into many Solr collections that do have replicas, and then build the list of collections to search at query time. Our searches are date based, so we can define what collections we want to query at query time. As a test, we ran just two machines, HDFS, and 500 collections. One server ran out of memory and crashed. We had over 1,600 lock files to delete.

If you think about it, having a shard with 3 replicas on top of a file system that does 3x replication seems a little excessive! I'd love to see Solr take more advantage of a shared FS. Perhaps an idea is to use HDFS but with an NFS gateway. Seems like that may be slow.

Architecturally, I love only having one large file system to manage instead of lots of individual file systems across many machines. HDFS makes this easy.

-Joe

On 8/2/2019 9:10 AM, lstusr 5u93n4 wrote:
> Hi Joe,
>
> We fought with Solr on HDFS for quite some time, and faced similar issues as you're seeing. (See this thread, for example: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201812.mbox/%3cCABd9LjTeacXpy3FFjFBkzMq6vhgu7Ptyh96+w-KC2p=-rqk...@mail.gmail.com%3e )
>
> The Solr lock files on HDFS get deleted if the Solr server gets shut down gracefully, but we couldn't always guarantee that in our environment, so we ended up writing a custom startup script to search for lock files on HDFS and delete them before Solr startup.
>
> However, the issue that you mention of the Solr server rebuilding its whole index from replicas on startup was enough of a show-stopper for us that we switched away from HDFS to local disk. It literally made the difference between 24+ hours of recovery time after an unexpected outage and less than a minute...
>
> If you do end up finding a solution to this issue, please post it to this mailing list, because there are others out there (like us!) who would most definitely make use of it.
>
> Thanks,
> Kyle
>
> On Fri, 2 Aug 2019 at 08:58, Joe Obernberger wrote:
>
>> Thank you. No, while the cluster is using Cloudera for HDFS, we do not use Cloudera to manage the Solr cluster. If it is a configuration/architecture issue, what can I do to fix it? I'd like a system where servers can come and go, but the indexes stay available and recover automatically. Is that possible with HDFS?
>>
>> While adding an alias to other collections would be an option, if that collection is the only collection, or one that is currently needed, in a live system, we can't bring it down, re-create it, and re-index when that process may take weeks to do.
>>
>> Any ideas?
>>
>> -Joe
>>
>> On 8/1/2019 6:15 PM, Angie Rabelero wrote:
>>> I don't think you're using Cloudera or Ambari, but Ambari has an option to delete the locks. This seems more a configuration/architecture issue than a reliability issue. You may want to spin up an alias while you bring down, clear locks and directories, recreate and index the affected collection, while you work your other issues.
>>>
>>> On Aug 1, 2019, at 16:40, Joe Obernberger wrote:
>>> Been using Solr on HDFS for a while now, and I'm seeing an issue with redundancy/reliability. If a server goes down, when it comes back up, it will never recover because of the lock files in HDFS. That Solr node needs to be brought down manually, the lock files deleted, and then brought back up. At that point, it appears to copy all the data for its replicas. If the index is large, and new data is being indexed, in some cases it will never recover. The replication retries over and over.
>>>
>>> How can we make a reliable Solr Cloud cluster when using HDFS that can handle servers coming and going?
>>>
>>> Thank you!
>>>
>>> -Joe
>>>
>>> ---
>>> This email has been checked for viruses by AVG.
>>> https://www.avg.com
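The "custom startup script to search for lock files and delete them" that Kyle describes can be sketched roughly as below. This is an illustrative local-filesystem version (the function name and the `write.lock` filename are assumptions about the deployment; against HDFS the same walk would use `hdfs dfs -ls -R` / `hdfs dfs -rm` or a WebHDFS client), and it must only run while the Solr node is down:

```python
import os

def remove_stale_locks(index_root, lock_name="write.lock"):
    """Delete leftover Lucene lock files found anywhere under index_root.

    Only safe while the Solr node is stopped; removing the lock of a
    live core can corrupt its index.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(index_root):
        if lock_name in filenames:
            path = os.path.join(dirpath, lock_name)
            os.remove(path)
            removed.append(path)
    return removed
```

Wired into the node's start script, this avoids the manual "bring it down, delete locks, bring it back up" cycle after an unclean shutdown.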
Re: Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)
Not sure if this is possible, but why not create a query handler in Solr with any custom query, and use that as the ping replacement?

> Am 02.08.2019 um 15:48 schrieb dinesh naik:
>
> Hi all,
> I have a few clusters with a huge data set, and whenever a node goes down it is not able to recover due to the reasons below:
>
> 1. The ping request handler is taking more than 10-15 seconds to respond. The ping request handler, however, expects that it will return in less than 1 second, and fails a recovery request if it is not responded to in this time. Therefore recoveries never start.
>
> 2. The soft commit interval is very low, i.e. 5 sec. This is a business requirement, so not much can be done here.
>
> As the standard/default admin/ping request handler uses *:* queries, the response time is much higher, and I am looking for an option to change it so that the ping handler returns results within a few milliseconds.
>
> Here is an example of the standard query time:
>
> snip---
> curl "http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false&debug=timing"
> {
>   "responseHeader":{
>     "zkConnected":true,
>     "status":0,
>     "QTime":16620,
>     "params":{
>       "q":"*:*",
>       "distrib":"false",
>       "debug":"timing",
>       "indent":"on",
>       "rows":"0",
>       "wt":"json"}},
>   "response":{"numFound":1329638799,"start":0,"docs":[]},
>   "debug":{
>     "timing":{
>       "time":16620.0,
>       "prepare":{
>         "time":0.0,
>         "query":{"time":0.0},
>         "facet":{"time":0.0},
>         "facet_module":{"time":0.0},
>         "mlt":{"time":0.0},
>         "highlight":{"time":0.0},
>         "stats":{"time":0.0},
>         "expand":{"time":0.0},
>         "terms":{"time":0.0},
>         "block-expensive-queries":{"time":0.0},
>         "slow-query-logger":{"time":0.0},
>         "debug":{"time":0.0}},
>       "process":{
>         "time":16619.0,
>         "query":{"time":16619.0},
>         "facet":{"time":0.0},
>         "facet_module":{"time":0.0},
>         "mlt":{"time":0.0},
>         "highlight":{"time":0.0},
>         "stats":{"time":0.0},
>         "expand":{"time":0.0},
>         "terms":{"time":0.0},
>         "block-expensive-queries":{"time":0.0},
>         "slow-query-logger":{"time":0.0},
>         "debug":{"time":0.0}}}}}
> snap
>
> Can we use the query _root_:abc in the ping request handler? I tried this query and it returns results within a few milliseconds, and the nodes are also able to recover without any issue.
>
> We want to use the _root_ field for querying, as this field is available in all our clusters with the definition below:
>
> <field name="_root_" termOffsets="false" stored="false" termPayloads="false" termPositions="false" docValues="false" termVectors="false"/>
>
> Could you please let me know if using _root_ for querying in the pingRequestHandler will cause any problem?
>
> <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
>   <lst name="invariants">
>     <str name="qt">/select</str>
>     <str name="q">_root_:abc</str>
>   </lst>
> </requestHandler>
>
> --
> Best Regards,
> Dinesh Naik
Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)
Hi all,
I have a few clusters with a huge data set, and whenever a node goes down it is not able to recover due to the reasons below:

1. The ping request handler is taking more than 10-15 seconds to respond. The ping request handler, however, expects that it will return in less than 1 second, and fails a recovery request if it is not responded to in this time. Therefore recoveries never start.

2. The soft commit interval is very low, i.e. 5 sec. This is a business requirement, so not much can be done here.

As the standard/default admin/ping request handler uses *:* queries, the response time is much higher, and I am looking for an option to change it so that the ping handler returns results within a few milliseconds.

Here is an example of the standard query time:

snip---
curl "http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false&debug=timing"
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":16620,
    "params":{
      "q":"*:*",
      "distrib":"false",
      "debug":"timing",
      "indent":"on",
      "rows":"0",
      "wt":"json"}},
  "response":{"numFound":1329638799,"start":0,"docs":[]},
  "debug":{
    "timing":{
      "time":16620.0,
      "prepare":{
        "time":0.0,
        "query":{"time":0.0},
        "facet":{"time":0.0},
        "facet_module":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "expand":{"time":0.0},
        "terms":{"time":0.0},
        "block-expensive-queries":{"time":0.0},
        "slow-query-logger":{"time":0.0},
        "debug":{"time":0.0}},
      "process":{
        "time":16619.0,
        "query":{"time":16619.0},
        "facet":{"time":0.0},
        "facet_module":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "expand":{"time":0.0},
        "terms":{"time":0.0},
        "block-expensive-queries":{"time":0.0},
        "slow-query-logger":{"time":0.0},
        "debug":{"time":0.0}}}}}
snap

Can we use the query _root_:abc in the ping request handler? I tried this query and it returns results within a few milliseconds, and the nodes are also able to recover without any issue.

We want to use the _root_ field for querying, as this field is available in all our clusters with the definition below:

<field name="_root_" termOffsets="false" stored="false" termPayloads="false" termPositions="false" docValues="false" termVectors="false"/>

Could you please let me know if using _root_ for querying in the pingRequestHandler will cause any problem?

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="qt">/select</str>
    <str name="q">_root_:abc</str>
  </lst>
</requestHandler>

--
Best Regards,
Dinesh Naik
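One way to see what the recovery-side ping check actually experiences is to time the handler from a client. A stdlib-only sketch (the base URL and handler path are placeholders; point it at whatever core and ping handler your cluster uses):

```python
import json
import time
import urllib.request

def timed_ping(base_url, handler="/admin/ping", timeout=10.0):
    """Call a Solr ping-style handler; return (reported status, round trip in ms)."""
    start = time.monotonic()
    with urllib.request.urlopen(base_url + handler + "?wt=json",
                                timeout=timeout) as resp:
        body = json.load(resp)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return body.get("status"), elapsed_ms
```

If the round trip stays in the tens of milliseconds with the cheaper _root_ query while *:* takes ~16 seconds, the handler change is doing its job for the 1-second recovery deadline.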
RE: Basic Authentication problem
Was I correct in my description yesterday (which I am pasting in below)? That you are using a hash based on the "solr" account name, and expecting that to work if you change the account name but not the hash?

Am I correct in assuming that everything other than security-edit functions currently works for you with any account and any password, including without any login-and-password at all?

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C]
Sent: Thursday, August 01, 2019 10:58 AM
To: solr-user@lucene.apache.org
Subject: RE: Basic Authentication problem

The hash value "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=" is based on both the plain-text password AND the plain-text login. Since "solr" is not the same string as "solr-admin", the password will not work. If the only authorization in security.json is restricting security-edit, then you can do anything else with any password, or with no password.

What you can do is set up the security.json file as specified in the Reference Guide (whence you got the hash of the login and password), then use the default solr login to run your set-user (to add the solr-admin user alongside the existing solr login), then use the default solr login to run {"set-user-role":{"solr-admin":["security-edit"]}}, and then (when you are sure things are correctly set up for solr-admin) drop the default solr login.

-----Original Message-----
From: Zheng Lin Edwin Yeo
Sent: Friday, August 02, 2019 2:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Basic Authentication problem

From what I see, you are trying to change your own user's password. If I remember correctly, this might not be allowed, which is why you are getting the "Unauthorized request" error. You can try to create another user with the admin role as well, and change your existing user's password from the new user.

Regards,
Edwin

On Fri, 2 Aug 2019 at 13:32, Salmaan Rashid Syed wrote:

> My curl command works fine for querying, updating, etc.
>
> I don't think it is the fault of the curl command.
>
> I get the following error message when I try to change the password of solr-admin:
>
>     Error 403 Unauthorized request, Response code: 403
>     HTTP ERROR 403
>     Problem accessing /solr/admin/authentication. Reason:
>         Unauthorized request, Response code: 403
>
> And if I give an incorrect username and password, it states that bad credentials were entered. So I think the curl command is fine; there is some issue with basic authentication.
>
> Okay, one way around this is to figure out how to convert my password into SHA256 (password + salt) and enter it in the security.json file. But I have no idea how to generate the SHA256 equivalent of my password.
>
> Any suggestions?
>
> On Fri, Aug 2, 2019 at 10:55 AM Zheng Lin Edwin Yeo wrote:
>
> > Hi Salmaan,
> >
> > Does your curl command work for other operations, like normal querying? Or is it just not working when updating passwords and adding new users?
> >
> > Regards,
> > Edwin
> >
> > On Fri, 2 Aug 2019 at 13:03, Salmaan Rashid Syed wrote:
> >
> > > Hi Zheng,
> > >
> > > I tried and it works. But when I use the curl command to update the password or add new users, it doesn't work.
> > >
> > > I don't know what is going wrong with the curl command!
> > >
> > > Regards,
> > > Salmaan
> > >
> > > On Fri, Aug 2, 2019 at 8:26 AM Zheng Lin Edwin Yeo wrote:
> > >
> > > > Have you tried to access the Solr Admin UI with your created user name and password to see if it works?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > > On Thu, 1 Aug 2019 at 19:51, Salmaan Rashid Syed wrote:
> > > >
> > > > > Hi Solr users,
> > > > >
> > > > > Please help me with my issue.
> > > > >
> > > > > I have enabled Solr basic authentication as shown in the Solr documentation.
> > > > >
> > > > > I have changed the username from solr to solr-admin as follows:
> > > > >
> > > > > {
> > > > >   "authentication":{
> > > > >     "blockUnknown": true,
> > > > >     "class":"solr.BasicAuthPlugin",
> > > > >     "credentials":{"solr-admin":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
> > > > >   },
> > > > >   "authorization":{
> > > > >     "class":"solr.RuleBasedAuthorizationPlugin",
> > > > >     "permissions":[{"name":"security-edit", "role":"admin"}],
> > > > >     "user-role":{"solr-admin":"admin"}
> > > > >   }}
> > > > >
> > > > > I am able to log in to the page using the credentials solr-admin:SolrRocks.
> > > > >
> > > > > But when I try to change the default password using the curl command as follows,
> > > > >
> > > > > curl --user solr-admin:SolrRocks
> > > > > http://localhost:8983/solr/admin/authentication -H
> > > > >
Re: Solr on HDFS
Hi Joe,

We fought with Solr on HDFS for quite some time, and faced similar issues as you're seeing. (See this thread, for example: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201812.mbox/%3cCABd9LjTeacXpy3FFjFBkzMq6vhgu7Ptyh96+w-KC2p=-rqk...@mail.gmail.com%3e )

The Solr lock files on HDFS get deleted if the Solr server gets shut down gracefully, but we couldn't always guarantee that in our environment, so we ended up writing a custom startup script to search for lock files on HDFS and delete them before Solr startup.

However, the issue that you mention of the Solr server rebuilding its whole index from replicas on startup was enough of a show-stopper for us that we switched away from HDFS to local disk. It literally made the difference between 24+ hours of recovery time after an unexpected outage and less than a minute...

If you do end up finding a solution to this issue, please post it to this mailing list, because there are others out there (like us!) who would most definitely make use of it.

Thanks,
Kyle

On Fri, 2 Aug 2019 at 08:58, Joe Obernberger wrote:

> Thank you. No, while the cluster is using Cloudera for HDFS, we do not use Cloudera to manage the Solr cluster. If it is a configuration/architecture issue, what can I do to fix it? I'd like a system where servers can come and go, but the indexes stay available and recover automatically. Is that possible with HDFS?
>
> While adding an alias to other collections would be an option, if that collection is the only collection, or one that is currently needed, in a live system, we can't bring it down, re-create it, and re-index when that process may take weeks to do.
>
> Any ideas?
>
> -Joe
>
> On 8/1/2019 6:15 PM, Angie Rabelero wrote:
> > I don't think you're using Cloudera or Ambari, but Ambari has an option to delete the locks. This seems more a configuration/architecture issue than a reliability issue. You may want to spin up an alias while you bring down, clear locks and directories, recreate and index the affected collection, while you work your other issues.
> >
> > On Aug 1, 2019, at 16:40, Joe Obernberger wrote:
> > Been using Solr on HDFS for a while now, and I'm seeing an issue with redundancy/reliability. If a server goes down, when it comes back up, it will never recover because of the lock files in HDFS. That Solr node needs to be brought down manually, the lock files deleted, and then brought back up. At that point, it appears to copy all the data for its replicas. If the index is large, and new data is being indexed, in some cases it will never recover. The replication retries over and over.
> >
> > How can we make a reliable Solr Cloud cluster when using HDFS that can handle servers coming and going?
> >
> > Thank you!
> >
> > -Joe
> >
> > ---
> > This email has been checked for viruses by AVG.
> > https://www.avg.com
Re: Solr on HDFS
Thank you. No, while the cluster is using Cloudera for HDFS, we do not use Cloudera to manage the Solr cluster. If it is a configuration/architecture issue, what can I do to fix it? I'd like a system where servers can come and go, but the indexes stay available and recover automatically. Is that possible with HDFS?

While adding an alias to other collections would be an option, if that collection is the only collection, or one that is currently needed, in a live system, we can't bring it down, re-create it, and re-index when that process may take weeks to do.

Any ideas?

-Joe

On 8/1/2019 6:15 PM, Angie Rabelero wrote:
> I don't think you're using Cloudera or Ambari, but Ambari has an option to delete the locks. This seems more a configuration/architecture issue than a reliability issue. You may want to spin up an alias while you bring down, clear locks and directories, recreate and index the affected collection, while you work your other issues.
>
> On Aug 1, 2019, at 16:40, Joe Obernberger wrote:
>
>> Been using Solr on HDFS for a while now, and I'm seeing an issue with redundancy/reliability. If a server goes down, when it comes back up, it will never recover because of the lock files in HDFS. That Solr node needs to be brought down manually, the lock files deleted, and then brought back up. At that point, it appears to copy all the data for its replicas. If the index is large, and new data is being indexed, in some cases it will never recover. The replication retries over and over.
>>
>> How can we make a reliable Solr Cloud cluster when using HDFS that can handle servers coming and going?
>>
>> Thank you!
>>
>> -Joe
>>
>> ---
>> This email has been checked for viruses by AVG.
>> https://www.avg.com
Re: Solr 8.2.0 having issue with ZooKeeper 3.5.5
I just checked also the output of the telnet commands - for conf it is different for standalone compared to ensemble, will put it later in the Jira > Am 02.08.2019 um 03:46 schrieb Zheng Lin Edwin Yeo : > > Yes, I tried with space and the same error occurs. > > I have also tried to put * , but I am getting the same error as well. > 4lw.commands.whitelist=* > > Regards, > Edwin > >> On Thu, 1 Aug 2019 at 21:34, Jörn Franke wrote: >> >> Spaces did not change the situation. In standalone it works without spaces >> and the issue is only if there is an ensemble. I checked all ZK nodes have >> the statement and have been restarted . >> >> So issue pics there for a ZK Ensemble only in admin UI. I will create >> tonight a JIRA if no one creates it before >> >>> Am 01.08.2019 um 12:58 schrieb Erick Erickson : >>> >>> The issue was only cosmetic in the sense that the admin UI was the only >> thing that was affected, >>> not other Solr functions when I was working on this. >>> >>> Please check a few things: >>> >>> 1> be absolutely sure that you’ve added this in all your zoo.cfg files >>> >>> 2> the example on the ZooKeeper website has spaces, could you try that >> just to cover all bases? Stranger things have happened… >>> >>> 4lw.commands.whitelist=mntr, ruok, conf >>> not >>> 4lw.commands.whitelist=mntr,ruok,conf >>> >>> 3> if <1> and <2> don’t work, what happens if you start your ZooKeepers >> with >>> -Dzookeeper.4lw.commands.whitelist=…. >>> >>> If it’s not <1> or <2>, please raise a JIRA. >>> >>> Best, >>> Erick >>> >>> Also, see: SOLR-13502 (no work has been done on this yet) >>> On Aug 1, 2019, at 6:05 AM, Jörn Franke wrote: For me: * ZK stand-alone mode - no issues * ZK Ensemble - it seems to be only a cosmetic issue in the Admin UI (I >> see the same error message), but aside this Solr is working fine > Am 01.08.2019 um 12:02 schrieb Zheng Lin Edwin Yeo < >> edwinye...@gmail.com>: > > Hi Jörn, > Thank you for your reply. 
> > I have encountered problem when I tried to create a collection with >> this > new version of ZooKeeper. You can find my Solr log file here: > https://drive.google.com/open?id=1qkMLTRJ4eDSFwbqr15wSqjbg4dJV-bGN > > Does it work perfectly at your side for creating collections and >> indexing > even when running ZooKeeper ensemble? > > Regards, > Edwin > > >> On Thu, 1 Aug 2019 at 17:39, Jörn Franke >> wrote: >> >> I confirm the issue. >> >> Interestingly it does not happen with ZK standalone, but only in a ZK >> Ensemble. >> >> It seems to be mainly cosmetic in the admin UI because Solr appears to >> function normally. >> >>> Am 01.08.2019 um 03:31 schrieb Zheng Lin Edwin Yeo < >> edwinye...@gmail.com >>> : >>> >>> Yes. You can get my full solr.log from the link below. The error is >> there >>> when I tried to create collection1 (around line 170 to 300) . >>> >>> https://drive.google.com/open?id=1qkMLTRJ4eDSFwbqr15wSqjbg4dJV-bGN >>> >>> Regards, >>> Edwin >>> >>> On Wed, 31 Jul 2019 at 18:39, Jan Høydahl >> wrote: Please look for the full log file solr.log in your Solr server, and >> share it via some file sharing service or gist or similar for us to be >> able to decipher the collection create error. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 31. jul. 2019 kl. 08:33 skrev Zheng Lin Edwin Yeo < >> edwinye...@gmail.com > : > > Hi, > > Regarding the issue, I have tried to put the following in zoo.cfg >> under > ZooKeeper: > 4lw.commands.whitelist=mntr,conf,ruok > > But it is still showing this error. > *"Errors: - membership: Check 4lq.commands.whitelist setting in >> zookeeper > configuration file."* > > As I am using SolrCloud, the collection config can still be loaded >> to > ZooKeeper as per normal. 
But if I tried to create a collection, I >> will get > the following error: > > { > "responseHeader":{ > "status":400, > "QTime":686}, > "failure":{ > "192.168.1.2:8983 _solr":"org.apache.solr.client.solrj.SolrServerException:IOException > occurred when talking to server at:http://192.168.1.2:8983/solr;, > "192.168.1.2:8984 _solr":"org.apache.solr.client.solrj.SolrServerException:IOException > occurred when talking to server at:http://192.168.1.2:8984/solr"}, > "Operation create caused > >> >> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > Underlying core creation failed while creating collection: >>
Re: Solr 8.2.0 having issue with ZooKeeper 3.5.5
Telnet is working correctly. The status endpoint seems to report the error that is displayed in the UI. I don't see anything obvious in the code; it might not be working for more than one node, but I am not sure exactly why. I could not find the log line with „membership: check 4lw“ in the source code > Am 02.08.2019 um 03:46 schrieb Zheng Lin Edwin Yeo : > > Yes, I tried with a space and the same error occurs. > > I have also tried to put * , but I am getting the same error as well. > 4lw.commands.whitelist=* > > Regards, > Edwin > >> On Thu, 1 Aug 2019 at 21:34, Jörn Franke wrote: >> >> Spaces did not change the situation. In standalone it works without spaces >> and the issue is only if there is an ensemble. I checked that all ZK nodes have >> the statement and have been restarted. >> >> So the issue persists for a ZK ensemble, only in the admin UI. I will create >> a JIRA tonight if no one creates it before >> >>> Am 01.08.2019 um 12:58 schrieb Erick Erickson : >>> >>> The issue was only cosmetic in the sense that the admin UI was the only >> thing that was affected, >>> not other Solr functions, when I was working on this. >>> >>> Please check a few things: >>> >>> 1> be absolutely sure that you’ve added this in all your zoo.cfg files >>> >>> 2> the example on the ZooKeeper website has spaces, could you try that >> just to cover all bases? Stranger things have happened… >>> >>> 4lw.commands.whitelist=mntr, ruok, conf >>> not >>> 4lw.commands.whitelist=mntr,ruok,conf >>> >>> 3> if <1> and <2> don’t work, what happens if you start your ZooKeepers >> with >>> -Dzookeeper.4lw.commands.whitelist=…. >>> >>> If it’s not <1> or <2>, please raise a JIRA.
>>> >>> Best, >>> Erick >>> >>> Also, see: SOLR-13502 (no work has been done on this yet) >>> On Aug 1, 2019, at 6:05 AM, Jörn Franke wrote: For me: * ZK stand-alone mode - no issues * ZK Ensemble - it seems to be only a cosmetic issue in the Admin UI (I >> see the same error message), but aside this Solr is working fine > Am 01.08.2019 um 12:02 schrieb Zheng Lin Edwin Yeo < >> edwinye...@gmail.com>: > > Hi Jörn, > Thank you for your reply. > > I have encountered problem when I tried to create a collection with >> this > new version of ZooKeeper. You can find my Solr log file here: > https://drive.google.com/open?id=1qkMLTRJ4eDSFwbqr15wSqjbg4dJV-bGN > > Does it work perfectly at your side for creating collections and >> indexing > even when running ZooKeeper ensemble? > > Regards, > Edwin > > >> On Thu, 1 Aug 2019 at 17:39, Jörn Franke >> wrote: >> >> I confirm the issue. >> >> Interestingly it does not happen with ZK standalone, but only in a ZK >> Ensemble. >> >> It seems to be mainly cosmetic in the admin UI because Solr appears to >> function normally. >> >>> Am 01.08.2019 um 03:31 schrieb Zheng Lin Edwin Yeo < >> edwinye...@gmail.com >>> : >>> >>> Yes. You can get my full solr.log from the link below. The error is >> there >>> when I tried to create collection1 (around line 170 to 300) . >>> >>> https://drive.google.com/open?id=1qkMLTRJ4eDSFwbqr15wSqjbg4dJV-bGN >>> >>> Regards, >>> Edwin >>> >>> On Wed, 31 Jul 2019 at 18:39, Jan Høydahl >> wrote: Please look for the full log file solr.log in your Solr server, and >> share it via some file sharing service or gist or similar for us to be >> able to decipher the collection create error. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 31. jul. 2019 kl. 
08:33 skrev Zheng Lin Edwin Yeo < >> edwinye...@gmail.com > : > > Hi, > > Regarding the issue, I have tried to put the following in zoo.cfg >> under > ZooKeeper: > 4lw.commands.whitelist=mntr,conf,ruok > > But it is still showing this error. > *"Errors: - membership: Check 4lq.commands.whitelist setting in >> zookeeper > configuration file."* > > As I am using SolrCloud, the collection config can still be loaded >> to > ZooKeeper as per normal. But if I tried to create a collection, I >> will get > the following error: > > { > "responseHeader":{ > "status":400, > "QTime":686}, > "failure":{ > "192.168.1.2:8983 _solr":"org.apache.solr.client.solrj.SolrServerException:IOException > occurred when talking to server at:http://192.168.1.2:8983/solr;, > "192.168.1.2:8984 _solr":"org.apache.solr.client.solrj.SolrServerException:IOException > occurred when talking to server at:http://192.168.1.2:8984/solr"}, > "Operation create caused > >> >>
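[Editor's note: for reference, the whitelist discussed in this thread is set in each ensemble member's zoo.cfg. A minimal sketch; the host names and the choice of whitelisted commands are illustrative, not from the thread:]

```properties
# zoo.cfg on EVERY ZooKeeper node in the ensemble
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888

# Whitelist the four-letter-word commands the Solr admin UI probes.
# '*' enables all of them, which is the quickest way to rule this setting out:
4lw.commands.whitelist=*
```

After restarting each node, `echo ruok | nc zk1 2181` should answer `imok` if the whitelist took effect. Note that per the thread, even `*` did not clear the UI error, so this is for verification rather than a guaranteed fix.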
Re: Indexing information on number of attachments and their names in EML file
Try the Apache Tika mailing list. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 2. aug. 2019 kl. 05:01 skrev Zheng Lin Edwin Yeo : > > Hi, > > Does anyone know if this can be done on the Solr side? > Or does it have to be done on the Tika side? > > Regards, > Edwin > > On Thu, 1 Aug 2019 at 09:38, Zheng Lin Edwin Yeo > wrote: > >> Hi, >> >> Would like to check: is there any way we can detect the number of >> attachments and their names during indexing of EML files in Solr, and index >> that information into Solr? >> >> Currently, Solr is able to use Tika and Tesseract OCR to extract the >> contents of the attachments. However, I could not find the information >> about the number of attachments in the EML file and what their filenames are. >> >> I am using Solr 7.6.0 in production, and also trying out the new Solr >> 8.2.0. >> >> Regards, >> Edwin >>
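[Editor's note: one way to get the attachment count and filenames before a document reaches Solr is to parse the EML outside Tika, e.g. with Python's standard email module, and index them as extra fields via your own ingest code. An illustrative sketch only, not the Tika route suggested above; field names and the sample message are made up:]

```python
import email
from email import policy

def attachment_names(eml_bytes):
    """Return the filenames of the attachments in a raw EML message."""
    msg = email.message_from_bytes(eml_bytes, policy=policy.default)
    # iter_attachments() yields every part that is not a body candidate
    return [part.get_filename() or "(unnamed)" for part in msg.iter_attachments()]

# A tiny inline EML with one PDF attachment, mirroring the headers quoted in the thread.
sample = b"""\
From: a@example.com
To: b@example.com
Subject: test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUND"

--BOUND
Content-Type: text/plain

body text
--BOUND
Content-Type: application/pdf; name="file1.pdf"
Content-Disposition: attachment; filename="file1.pdf"

%PDF-1.4 fake
--BOUND--
"""

names = attachment_names(sample)
print(len(names), names)  # → 1 ['file1.pdf']
```

The count (`len(names)`) and the list itself could then be sent to Solr as, say, `attachment_count_i` and `attachment_names_ss` fields.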
Re: Problem with uploading Large synonym files in cloud mode
You can use the configset API: https://lucene.apache.org/solr/guide/7_7/configsets-api.html I don't recommend using schema.xml; use managed schemas instead: https://lucene.apache.org/solr/guide/6_6/schema-api.html For people new to Solr I generally recommend reading a recent book about Solr from beginning to end - that will bring you up to speed much faster than trying to find all the information via the Internet and will prepare you to deliver results much faster and in better quality. Then it is also much easier to understand and use the reference guide > Am 02.08.2019 um 08:30 schrieb Salmaan Rashid Syed > : > > Hi Bernd, > > Yet another noob question. > > Consider that my conf directory for creating a collection is _default. Suppose > now I made changes to managed-schema and conf.xml, How do I upload it to > external zookeeper at 2181 port? > > Can you please give me the command that uploads the altered config.xml and > managed-schema to zookeeper? > > Thanks. > > > On Fri, Aug 2, 2019 at 11:53 AM Bernd Fehling < > bernd.fehl...@uni-bielefeld.de> wrote: > >> >> to 1) yes, because -Djute.maxbuffer is going to JAVA as a start parameter. >> >> to 2) I don't know because I never use internal zookeeper >> >> to 3) the configs are located at solr/server/solr/configsets/ >> - choose one configset, make your changes and upload it to zookeeper >> - when creating a new collection choose your uploaded config >> - whenever you change something at your config you have to upload >> it to zookeeper >> >> I don't know which Solr version you are using, but a good starting point >> with solr cloud is >> http://lucene.apache.org/solr/guide/6_6/solrcloud.html >> >> Regards >> Bernd >> >> >> >>> Am 02.08.19 um 07:59 schrieb Salmaan Rashid Syed: >>> Hi Bernd, >>> >>> Sorry for noob questions. >>> >>> 1) What do you mean by restart? Do you mean that I should issue ./bin/solr >>> stop -all?
>>> >>> And then issue these commands, >>> >>> bin/solr restart -cloud -s example/cloud/node1/solr -p 8983 >>> >>> bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr >>> >>> >>> 2) Where can I find solr internal Zookeeper folder for issuing this >> command >>> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"? >>> >>> >>> 3) Where can I find schema.xml and config.xmo files for Solr Cloud Cores >> to >>> make changes in schema and configuration? Or do I have to make chages in >>> the directory that contains managed-schema and config.xml files with >> which >>> I initialized and created collections? And then the solr will pick them >> up >>> from there when it restarts? >>> >>> >>> Regards, >>> >>> Salmaan >>> >>> >>> >>> On Thu, Aug 1, 2019 at 5:40 PM Bernd Fehling < >> bernd.fehl...@uni-bielefeld.de> >>> wrote: >>> > Am 01.08.19 um 13:57 schrieb Salmaan Rashid Syed: > After I make the -Djute.maxbuffer changes to Solr, deployed in production, > Do I need to restart the solr to be able to add synonyms >1MB? Yes, you have to restart Solr. > > Or, Was this supposed to be done before putting Solr to production >> ever? > Can we make chages when the Solr is running in production? It depends on your system. In my cloud with 5 shards and 3 replicas I >> can take one by one offline, stop, modify and start again without problems. > > Thanks. > > Regards, > Salmaan > > > > On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling < > bernd.fehl...@uni-bielefeld.de> wrote: > >> You have to increase the -Djute.maxbuffer for large configs. >> >> In Solr bin/solr/solr.in.sh use e.g. >> SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000" >> This will increase maxbuffer for zookeeper on solr side to 10MB. >> >> In Zookeeper zookeeper/conf/zookeeper-env.sh >> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000" >> >> I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works >> perfect. 
>> >> Regards >> >>> Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed: >>> Hi Solr Users, >>> >>> I have a very big synonym file (>5MB). I am unable to start Solr in cloud >>> mode as it throws an error message stating that the synonyms file is >>> too large. I figured out that zookeeper doesn't take a file >> greater >>> than 1MB in size. >>> >>> I tried to break down my synonyms file into smaller chunks of less than >> 1MB >>> each. But I am not sure how to include all the filenames in the >>> Solr schema. >>> >>> Should it be separated by commas like synonyms = "__1_synonyms.txt, >>> __2_synonyms.txt, __3synonyms.txt" >>> >>> Or is there a better way of doing that? Will the bigger file, when broken >>> down into smaller chunks, be uploaded to zookeeper as well? >>> >>> Please help or please guide me to the relevant documentation regarding this.
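[Editor's note: on the comma-separated question quoted above, the `synonyms` attribute of Solr's synonym filters does accept a comma-separated list of files. A sketch of a managed-schema fragment; the fieldType name and analyzer setup are illustrative, only the file names come from the mail (and each file must still fit under the ZooKeeper znode limit unless jute.maxbuffer is raised as discussed later in the thread):]

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- multiple synonym files, comma-separated; graph synonyms are
         typically applied at query time -->
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="__1_synonyms.txt,__2_synonyms.txt,__3_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```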
Re: Basic Authentication problem
From what I see, you are trying to change your own user's password. If I remember correctly this might not be allowed, which is why you are getting the "Unauthorized request" error. You can try to create another user with the admin role as well, and change your existing user's password from the new user. Regards, Edwin On Fri, 2 Aug 2019 at 13:32, Salmaan Rashid Syed wrote: > My curl command works fine for querying, updating etc. > > I don't think it is the fault of the curl command. > > I get the following error message when I try to change the password of > solr-admin, > > > > > > > > > Error 403 Unauthorized request, Response code: 403 > > > > HTTP ERROR 403 > > Problem accessing /solr/admin/authentication. Reason: > > Unauthorized request, Response code: 403 > > > > > > > And if I give an incorrect username and password, it states bad credentials > entered. So, I think the curl command is fine. There is some issue with > basic authentication. > > > Okay, one way around this is to figure out how to convert my password into a > SHA256 (password + salt) and enter it in the security.json file. But I have no > idea how to generate the SHA256 equivalent of my password. > > > Any suggestions? > > > > On Fri, Aug 2, 2019 at 10:55 AM Zheng Lin Edwin Yeo > wrote: > > > Hi Salmaan, > > > > Does your curl command work for other curl commands like normal > querying? > > Or is it just not working when updating the password and adding new users? > > > > Regards, > > Edwin > > > > > > > > On Fri, 2 Aug 2019 at 13:03, Salmaan Rashid Syed < > > salmaan.ras...@mroads.com> > > wrote: > > > > > Hi Zheng, > > > > > > I tried and it works. But when I use the curl command to update the > password > > > or add new users it doesn't work. > > > > > > I don't know what is going wrong with the curl command!
> > > > > > Regards, > > > Salmaan > > > > > > > > > On Fri, Aug 2, 2019 at 8:26 AM Zheng Lin Edwin Yeo < > edwinye...@gmail.com > > > > > > wrote: > > > > > > > Have you tried to access the Solr Admin UI with your created user > name > > > and > > > > password to see if it works? > > > > > > > > Regards, > > > > Edwin > > > > > > > > On Thu, 1 Aug 2019 at 19:51, Salmaan Rashid Syed < > > > > salmaan.ras...@mroads.com> > > > > wrote: > > > > > > > > > Hi Solr User, > > > > > > > > > > Please help me with my issue. > > > > > > > > > > I have enabled Solr basic authentication as shown in Solr > > > documentations. > > > > > > > > > > I have changed username from solr to solr-admin as follow > > > > > > > > > > { > > > > > "authentication":{ > > > > >"blockUnknown": true, > > > > >"class":"solr.BasicAuthPlugin", > > > > > > > > > > > > > > > > "credentials":{"solr-admin":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= > > > > > Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="} > > > > > }, > > > > > "authorization":{ > > > > >"class":"solr.RuleBasedAuthorizationPlugin", > > > > >"permissions":[{"name":"security-edit", > > > > > "role":"admin"}], > > > > >"user-role":{"solr-admin":"admin"} > > > > > }} > > > > > > > > > > I am able to login to the page using the credentials > > > > solr-admin:SolrRocks. > > > > > > > > > > But, when I try to change the default password using the curl > command > > > as > > > > > follows, > > > > > > > > > > curl --user solr-admin:SolrRocks > > > > > http://localhost:8983/solr/admin/authentication -H > > > > > 'Content-type:application/json' -d > > > '{"set-user":{"solr-admin":"s2019"}}' > > > > > > > > > > > > > > > I get the following error message, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Error 403 Unauthorized request, Response code: 403 > > > > > > > > > > > > > > > > > > > > HTTP ERROR 403 > > > > > > > > > > Problem accessing /solr/admin/authentication. 
Reason: > > > > > > > > > > Unauthorized request, Response code: 403 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please help. > > > > > > > > > > Regards, > > > > > Salmaan > > > > > > > > > > > > > > > On Thu, Aug 1, 2019 at 1:51 PM Salmaan Rashid Syed < > > > > > salmaan.ras...@mroads.com> wrote: > > > > > > > > > > > Small correction in the user-name. It is solr-admin everywhere. > > > > > > > > > > > > Hi Solr Users, > > > > > > > > > > > > I have enabled Solr basic authentication as shown in Solr > > > > documentations. > > > > > > > > > > > > I have changed username from solr to solr-admin as follow > > > > > > > > > > > > { > > > > > > "authentication":{ > > > > > >"blockUnknown": true, > > > > > >"class":"solr.BasicAuthPlugin", > > > > > > > > > > > > > > > > > > > > > "credentials":{"solr-admin":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= > > > > > > Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="} > > > > > > }, > > > > > > "authorization":{ > > > > > >"class":"solr.RuleBasedAuthorizationPlugin", > > > > > >"permissions":[{"name":"security-edit", > > > > > > "role":"admin"}], > > > > > >
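[Editor's note: on the "how do I generate the SHA256 equivalent of my password" question in this thread, my understanding is that Solr's BasicAuthPlugin stores `base64(sha256(sha256(salt + password)))`, a space, then `base64(salt)`. A sketch only; verify against Solr's Sha256AuthenticationProvider source before hand-editing security.json:]

```python
import base64
import hashlib
import os

def solr_credentials(password, salt=None):
    """Produce the value Solr stores per user in security.json:
    base64(sha256(sha256(salt + password))) + ' ' + base64(salt)."""
    if salt is None:
        salt = os.urandom(32)  # Solr uses a random salt per user
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    digest = hashlib.sha256(digest).digest()  # double-hashed
    return base64.b64encode(digest).decode() + " " + base64.b64encode(salt).decode()

def check(password, credentials):
    """Re-hash the password with the stored salt and compare."""
    hashed, salt_b64 = credentials.split(" ")
    salt = base64.b64decode(salt_b64)
    return solr_credentials(password, salt).split(" ")[0] == hashed

creds = solr_credentials("s2019")
print(check("s2019", creds))  # True
print(check("wrong", creds))  # False
```

The resulting string can be pasted as the value of `"credentials":{"solr-admin": "..."}`, after which Solr must be restarted (or the updated security.json re-uploaded to ZooKeeper in cloud mode).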
Re: Problem with uploading Large synonym files in cloud mode
http://lucene.apache.org/solr/guide/6_6/command-line-utilities.html "Upload a configuration directory" Take my advice and read the SolrCloud section of the Solr Ref Guide. It will answer most of your questions and is a good start. Am 02.08.19 um 08:30 schrieb Salmaan Rashid Syed: Hi Bernd, Yet another noob question. Consider that my conf directory for creating a collection is _default. Suppose now I made changes to managed-schema and conf.xml, How do I upload it to external zookeeper at 2181 port? Can you please give me the command that uploads the altered config.xml and managed-schema to zookeeper? Thanks. On Fri, Aug 2, 2019 at 11:53 AM Bernd Fehling < bernd.fehl...@uni-bielefeld.de> wrote: to 1) yes, because -Djute.maxbuffer is going to JAVA as a start parameter. to 2) I don't know because I never use internal zookeeper to 3) the configs are located at solr/server/solr/configsets/ - choose one configset, make your changes and upload it to zookeeper - when creating a new collection choose your uploaded config - whenever you change something at your config you have to upload it to zookeeper I don't know which Solr version you are using, but a good starting point with solr cloud is http://lucene.apache.org/solr/guide/6_6/solrcloud.html Regards Bernd Am 02.08.19 um 07:59 schrieb Salmaan Rashid Syed: Hi Bernd, Sorry for noob questions. 1) What do you mean by restart? Do you mean that I should issue ./bin/solr stop -all? And then issue these commands, bin/solr restart -cloud -s example/cloud/node1/solr -p 8983 bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr 2) Where can I find the solr internal Zookeeper folder for issuing this command SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"? 3) Where can I find the schema.xml and config.xml files for Solr Cloud cores to make changes in schema and configuration? Or do I have to make changes in the directory that contains the managed-schema and config.xml files with which I initialized and created collections?
And then will solr pick them up from there when it restarts? Regards, Salmaan On Thu, Aug 1, 2019 at 5:40 PM Bernd Fehling < bernd.fehl...@uni-bielefeld.de> wrote: Am 01.08.19 um 13:57 schrieb Salmaan Rashid Syed: After I make the -Djute.maxbuffer changes to Solr, deployed in production, do I need to restart solr to be able to add synonyms >1MB? Yes, you have to restart Solr. Or was this supposed to be done before ever putting Solr into production? Can we make changes while Solr is running in production? It depends on your system. In my cloud with 5 shards and 3 replicas I can take them one by one offline, stop, modify and start again without problems. Thanks. Regards, Salmaan On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling < bernd.fehl...@uni-bielefeld.de> wrote: You have to increase -Djute.maxbuffer for large configs. In Solr bin/solr/solr.in.sh use e.g. SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000" This will increase maxbuffer for zookeeper on the solr side to 10MB. In Zookeeper zookeeper/conf/zookeeper-env.sh SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000" I have a >10MB thesaurus and use 30MB for jute.maxbuffer, works perfectly. Regards Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed: Hi Solr Users, I have a very big synonym file (>5MB). I am unable to start Solr in cloud mode as it throws an error message stating that the synonyms file is too large. I figured out that zookeeper doesn't take a file greater than 1MB in size. I tried to break down my synonyms file into smaller chunks of less than 1MB each. But I am not sure how to include all the filenames in the Solr schema. Should it be separated by commas like synonyms = "__1_synonyms.txt, __2_synonyms.txt, __3synonyms.txt" Or is there a better way of doing that? Will the bigger file, when broken down into smaller chunks, be uploaded to zookeeper as well? Please help or please guide me to the relevant documentation regarding this. Thank you. Regards. Salmaan.
Re: Problem with uploading Large synonym files in cloud mode
Hi Bernd, Yet, another noob question. Consider that my conf directory for creating a collection is _default. Suppose now I made changes to managed-schema and conf.xml, How do I upload it to external zookeeper at 2181 port? Can you please give me the command that uploads altered config.xml and managed-schema to zookeeper? Thanks. On Fri, Aug 2, 2019 at 11:53 AM Bernd Fehling < bernd.fehl...@uni-bielefeld.de> wrote: > > to 1) yes, because -Djute.maxbuffer is going to JAVA as a start parameter. > > to 2) I don't know because i never use internal zookeeper > > to 3) the configs are located at solr/server/solr/configsets/ >- choose one configset, make your changes and upload it to zookeeper >- when creating a new collection choose your uploaded config >- whenever you change something at your config you have to upload > it to zookeeper > > I don't know which Solr version you are using, but a good starting point > with solr cloud is > http://lucene.apache.org/solr/guide/6_6/solrcloud.html > > Regards > Bernd > > > > Am 02.08.19 um 07:59 schrieb Salmaan Rashid Syed: > > Hi Bernd, > > > > Sorry for noob questions. > > > > 1) What do you mean by restart? Do you mean that I shoud issue ./bin/solr > > stop -all? > > > > And then issue these commands, > > > > bin/solr restart -cloud -s example/cloud/node1/solr -p 8983 > > > > bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr > > > > > > 2) Where can I find solr internal Zookeeper folder for issuing this > command > > SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"? > > > > > > 3) Where can I find schema.xml and config.xmo files for Solr Cloud Cores > to > > make changes in schema and configuration? Or do I have to make chages in > > the directory that contains managed-schema and config.xml files with > which > > I initialized and created collections? And then the solr will pick them > up > > from there when it restarts? 
> > > > > > Regards, > > > > Salmaan > > > > > > > > On Thu, Aug 1, 2019 at 5:40 PM Bernd Fehling < > bernd.fehl...@uni-bielefeld.de> > > wrote: > > > >> > >> > >> Am 01.08.19 um 13:57 schrieb Salmaan Rashid Syed: > >>> After I make the -Djute.maxbuffer changes to Solr, deployed in > >> production, > >>> Do I need to restart the solr to be able to add synonyms >1MB? > >> > >> Yes, you have to restart Solr. > >> > >> > >>> > >>> Or, Was this supposed to be done before putting Solr to production > ever? > >>> Can we make chages when the Solr is running in production? > >> > >> It depends on your system. In my cloud with 5 shards and 3 replicas I > can > >> take one by one offline, stop, modify and start again without problems. > >> > >> > >>> > >>> Thanks. > >>> > >>> Regards, > >>> Salmaan > >>> > >>> > >>> > >>> On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling < > >>> bernd.fehl...@uni-bielefeld.de> wrote: > >>> > You have to increase the -Djute.maxbuffer for large configs. > > In Solr bin/solr/solr.in.sh use e.g. > SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000" > This will increase maxbuffer for zookeeper on solr side to 10MB. > > In Zookeeper zookeeper/conf/zookeeper-env.sh > SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000" > > I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works > perfect. > > Regards > > > Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed: > > Hi Solr Users, > > > > I have a very big synonym file (>5MB). I am unable to start Solr in > >> cloud > > mode as it throws an error message stating that the synonmys file is > > too large. I figured out that the zookeeper doesn't take a file > greater > > than 1MB size. > > > > I tried to break down my synonyms file to smaller chunks less than > 1MB > > each. But, I am not sure about how to include all the filenames into > >> the > > Solr schema. 
> > > > Should it be seperated by commas like synonyms = "__1_synonyms.txt, > > __2_synonyms.txt, __3synonyms.txt" > > > > Or is there a better way of doing that? Will the bigger file when > >> broken > > down to smaller chunks will be uploaded to zookeeper as well. > > > > Please help or please guide me to relevant documentation regarding > >> this. > > > > Thank you. > > > > Regards. > > Salmaan. > > > > >>> > >> > > >
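[Editor's note: the upload command asked for above, as a sketch. The config name, local path, and ZooKeeper address are illustrative assumptions, not from the thread:]

```shell
# Upload an edited configset directory to an external ZooKeeper on port 2181
bin/solr zk upconfig -z localhost:2181 -n mycollection_conf -d /path/to/my_conf

# Then either create a new collection against the uploaded config ...
bin/solr create_collection -c mycollection -n mycollection_conf

# ... or reload an existing collection so it picks up the changed config
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"
```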
Re: Problem with uploading Large synonym files in cloud mode
to 1) yes, because -Djute.maxbuffer is going to JAVA as a start parameter. to 2) I don't know because i never use internal zookeeper to 3) the configs are located at solr/server/solr/configsets/ - choose one configset, make your changes and upload it to zookeeper - when creating a new collection choose your uploaded config - whenever you change something at your config you have to upload it to zookeeper I don't know which Solr version you are using, but a good starting point with solr cloud is http://lucene.apache.org/solr/guide/6_6/solrcloud.html Regards Bernd Am 02.08.19 um 07:59 schrieb Salmaan Rashid Syed: Hi Bernd, Sorry for noob questions. 1) What do you mean by restart? Do you mean that I shoud issue ./bin/solr stop -all? And then issue these commands, bin/solr restart -cloud -s example/cloud/node1/solr -p 8983 bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr 2) Where can I find solr internal Zookeeper folder for issuing this command SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"? 3) Where can I find schema.xml and config.xmo files for Solr Cloud Cores to make changes in schema and configuration? Or do I have to make chages in the directory that contains managed-schema and config.xml files with which I initialized and created collections? And then the solr will pick them up from there when it restarts? Regards, Salmaan On Thu, Aug 1, 2019 at 5:40 PM Bernd Fehling wrote: Am 01.08.19 um 13:57 schrieb Salmaan Rashid Syed: After I make the -Djute.maxbuffer changes to Solr, deployed in production, Do I need to restart the solr to be able to add synonyms >1MB? Yes, you have to restart Solr. Or, Was this supposed to be done before putting Solr to production ever? Can we make chages when the Solr is running in production? It depends on your system. In my cloud with 5 shards and 3 replicas I can take one by one offline, stop, modify and start again without problems. Thanks. 
Regards, Salmaan On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling < bernd.fehl...@uni-bielefeld.de> wrote: You have to increase the -Djute.maxbuffer for large configs. In Solr bin/solr/solr.in.sh use e.g. SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000" This will increase maxbuffer for zookeeper on solr side to 10MB. In Zookeeper zookeeper/conf/zookeeper-env.sh SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000" I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works perfect. Regards Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed: Hi Solr Users, I have a very big synonym file (>5MB). I am unable to start Solr in cloud mode as it throws an error message stating that the synonmys file is too large. I figured out that the zookeeper doesn't take a file greater than 1MB size. I tried to break down my synonyms file to smaller chunks less than 1MB each. But, I am not sure about how to include all the filenames into the Solr schema. Should it be seperated by commas like synonyms = "__1_synonyms.txt, __2_synonyms.txt, __3synonyms.txt" Or is there a better way of doing that? Will the bigger file when broken down to smaller chunks will be uploaded to zookeeper as well. Please help or please guide me to relevant documentation regarding this. Thank you. Regards. Salmaan.
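[Editor's note: pulling the jute.maxbuffer advice in this thread together. The value is in bytes and must be raised on both the Solr and the ZooKeeper side; the `1000` in the quoted mails appears truncated by the archive, since 10 MB would be 10000000 bytes. A sketch with illustrative values:]

```shell
# bin/solr.in.sh — raise the buffer Solr's ZooKeeper client allows (~10 MB)
SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=10000000"

# zookeeper/conf/zookeeper-env.sh — raise it on every ZooKeeper server as well
SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=10000000"
```

The flag is read at JVM start, so both ZooKeeper and Solr must be restarted after the change, as confirmed in the thread.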
Re: Problem with uploading Large synonym files in cloud mode
Hi Bernd, Sorry for noob questions. 1) What do you mean by restart? Do you mean that I shoud issue ./bin/solr stop -all? And then issue these commands, bin/solr restart -cloud -s example/cloud/node1/solr -p 8983 bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr 2) Where can I find solr internal Zookeeper folder for issuing this command SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000"? 3) Where can I find schema.xml and config.xmo files for Solr Cloud Cores to make changes in schema and configuration? Or do I have to make chages in the directory that contains managed-schema and config.xml files with which I initialized and created collections? And then the solr will pick them up from there when it restarts? Regards, Salmaan On Thu, Aug 1, 2019 at 5:40 PM Bernd Fehling wrote: > > > Am 01.08.19 um 13:57 schrieb Salmaan Rashid Syed: > > After I make the -Djute.maxbuffer changes to Solr, deployed in > production, > > Do I need to restart the solr to be able to add synonyms >1MB? > > Yes, you have to restart Solr. > > > > > > Or, Was this supposed to be done before putting Solr to production ever? > > Can we make chages when the Solr is running in production? > > It depends on your system. In my cloud with 5 shards and 3 replicas I can > take one by one offline, stop, modify and start again without problems. > > > > > > Thanks. > > > > Regards, > > Salmaan > > > > > > > > On Tue, Jul 30, 2019 at 4:53 PM Bernd Fehling < > > bernd.fehl...@uni-bielefeld.de> wrote: > > > >> You have to increase the -Djute.maxbuffer for large configs. > >> > >> In Solr bin/solr/solr.in.sh use e.g. > >> SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=1000" > >> This will increase maxbuffer for zookeeper on solr side to 10MB. > >> > >> In Zookeeper zookeeper/conf/zookeeper-env.sh > >> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=1000" > >> > >> I have a >10MB Thesaurus and use 30MB for jute.maxbuffer, works perfect. 
> >> > >> Regards > >> > >> > >> Am 30.07.19 um 13:09 schrieb Salmaan Rashid Syed: > >>> Hi Solr Users, > >>> > >>> I have a very big synonym file (>5MB). I am unable to start Solr in > cloud > >>> mode as it throws an error message stating that the synonmys file is > >>> too large. I figured out that the zookeeper doesn't take a file greater > >>> than 1MB size. > >>> > >>> I tried to break down my synonyms file to smaller chunks less than 1MB > >>> each. But, I am not sure about how to include all the filenames into > the > >>> Solr schema. > >>> > >>> Should it be seperated by commas like synonyms = "__1_synonyms.txt, > >>> __2_synonyms.txt, __3synonyms.txt" > >>> > >>> Or is there a better way of doing that? Will the bigger file when > broken > >>> down to smaller chunks will be uploaded to zookeeper as well. > >>> > >>> Please help or please guide me to relevant documentation regarding > this. > >>> > >>> Thank you. > >>> > >>> Regards. > >>> Salmaan. > >>> > >> > > >