Thanks Ed,

That saved the day. The confusing part of setting up that property is that the 
documentation is unclear on whether it needs hex or bytes, etc. Even the example they provided here

https://solr.apache.org/guide/7_4/setting-up-an-external-zookeeper-ensemble.html#increasing-the-file-size-limit

states that they are setting the value to 2MB, but the value really looks like 
200k (with five zeros)

------------------------------------

Add the following line to increase the file size limit to 2MB:

SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=0x200000"

-------------------------------------
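
As a quick check of the arithmetic (a minimal Python sketch, not part of the Solr guide): read as hexadecimal, the value in the quoted example works out to 2MB, whereas the decimal number 200000 would be roughly 195KB.

```python
# The value from the quoted Solr example, read as a hexadecimal literal.
documented_value = int("0x200000", 16)

print(documented_value)                      # 2097152
print(documented_value == 2 * 1024 * 1024)   # True

# For comparison, decimal 200000 expressed in KiB.
print(200000 / 1024)                         # 195.3125
```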

Anyway, a master has been up and running for an hour now, so I am just trying to 
understand what was changed and to revert it after it stabilizes.

Thanks a bunch.

-S

________________________________
From: dev1 <d...@etcoleman.com>
Sent: Wednesday, February 9, 2022 5:54 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: [External] Re: accumulo 1.10.0 masters won't start

You might want to set it on the accumulo (zookeeper client) side - by setting 
ACCUMULO_JAVA_OPTS, which is processed in accumulo-env.sh (or just edit that 
file?)

Looking at the ZooKeeper documentation, it describes what it looks like you are 
seeing:

When jute.maxbuffer in the client side is less than the server side, the client 
wants to read the data exceeds jute.maxbuffer in the client side, the client 
side will get java.io.IOException: Unreasonable length or Packet len is out of 
range!

Also, a search showed jira tickets that had a server side limit of 4MB, but 
client limits of 1MB - you may want to see if 4194304 (or larger) works as a 
value.
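
The numbers reported in this thread can be compared directly. The sketch below is a minimal Python illustration, assuming the client rejects any packet whose length is not strictly below its jute.maxbuffer value; the function name packet_fits is made up for this example.

```python
def packet_fits(packet_len: int, max_buffer: int) -> bool:
    """Return True if a packet of packet_len bytes is within max_buffer.

    Assumption: a length equal to or above the limit is rejected.
    """
    return 0 <= packet_len < max_buffer


PACKET_LEN = 2791093               # from "Packet len 2791093 is out of range!"
CLIENT_LIMIT = 1048575             # from "jute.maxbuffer value is 1048575 Bytes"
SUGGESTED_LIMIT = 4 * 1024 * 1024  # 4194304, the 4MB value suggested above

print(packet_fits(PACKET_LEN, CLIENT_LIMIT))     # False
print(packet_fits(PACKET_LEN, SUGGESTED_LIMIT))  # True
```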

________________________________
From: dev1 <d...@etcoleman.com>
Sent: Wednesday, February 9, 2022 5:25 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: Re: accumulo 1.10.0 masters won't start

jute.maxbuffer is a ZooKeeper property - it needs to be set on the zookeeper 
configuration.  If this is still correct, then it looks like there are a few 
options 
https://solr.apache.org/guide/7_4/setting-up-an-external-zookeeper-ensemble.html#increasing-the-file-size-limit

But maybe the ZooKeeper documentation for your version can provide additional 
guidance?


________________________________
From: Shailesh Ligade <slig...@fbi.gov>
Sent: Wednesday, February 9, 2022 5:02 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: RE: accumulo 1.10.0 masters won't start


Thanks



Even if I set jute.maxbuffer on zookeeper in conf/java.env file to



-Djute.maxbuffer=300000



I see this in the accumulo master log:



INFO: jute.maxbuffer value is 1048575 Bytes



I am not sure where to set that on the accumulo side.



I set instance.zookeeper.timeout value to 90s in accumulo-site.xml



But I still get those zookeeper KeeperErrorCode errors.



-S



From: dev1 <d...@etcoleman.com>
Sent: Wednesday, February 9, 2022 4:27 PM
To: user@accumulo.apache.org
Subject: [EXTERNAL EMAIL] - Re: accumulo 1.10.0 masters won't start



I would not recommend setting the goal state directly until there are no other 
alternatives.



It is hard to recommend what to do, because it is unclear what put you into the 
current situation and what action / impact you might have had trying to fix 
things -



why did the goal state become unset in the first place?

what did you stuff into the fates that increased the need for larger jute 
buffers?



It could be that the number of tables and servers pushed you over the limit - 
or it could be something else.



What I would do:



Shut down accumulo and make sure all services / tservers are stopped.

Shut down any other services that might be using ZooKeeper.

Shut down ZooKeeper.



Set the larger jute.maxbuffer and increase the timeout values across the board 
and in any dependent services.



Start hdfs - if you needed to shut it down.

Start just zookeeper - and use zkCli.sh to examine the state of things.  If 
that looks okay:

Start just the master - how far does it come up?  It will not be able to load 
the root / metadata tables, but it may give some indication of state.



I'd then cycle between stopping the master and trying to clean things up with 
zkCli.sh, guided by any errors the master is generating. If that looks 
promising, then:



With the master stopped - start the tservers and check a few logs. If there are 
exceptions, determine whether they point to an issue or are just something 
transient that is handled.



Once the tservers are up and looking okay - start the master.



One of the things to grab as soon as you can get the shell to run - get a 
listing of the tables and the ids.  If the worst happens, you can use that to 
map the existing data into a "new" instance. Hopefully it will not come to that 
and you will not need it - but if you don't have it and you need it, well... 
The table names and ids are all in ZooKeeper.



Ed Coleman



________________________________

From: Shailesh Ligade <slig...@fbi.gov>
Sent: Wednesday, February 9, 2022 3:47 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: RE: accumulo 1.10.0 masters won't start



Thanks, I can try that.



At this point, my goal is to get accumulo up. I was just wondering: if I set a 
different goal state like SAFE_MODE, will it come up by ignoring fate and other 
issues? If it comes up, can I then switch back to NORMAL? Will that work? I 
understand there may be some data loss.



-S



From: dev1 <d...@etcoleman.com>
Sent: Wednesday, February 9, 2022 3:36 PM
To: user@accumulo.apache.org
Subject: [EXTERNAL EMAIL] - Re: accumulo 1.10.0 masters won't start



For values in zoo.cfg see: 
https://zookeeper.apache.org/doc/r3.5.9/zookeeperAdmin.html#sc_advancedConfiguration



maxSessionTimeout



In the accumulo config  - #instance.zookeepers.timeout=30s



The zookeeper setting controls the max time that the ZK servers will grant - 
the accumulo setting is how much time accumulo will ask for.
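
A minimal sketch of that relationship in Python, assuming the server clamps the requested timeout to the range between minSessionTimeout and maxSessionTimeout, and that those default to 2 and 20 times tickTime. The tickTime of 2000 ms and the function name are assumptions for illustration, not values taken from this cluster.

```python
def granted_session_timeout_ms(requested_ms: int,
                               tick_time_ms: int = 2000,  # assumed tickTime
                               min_ticks: int = 2,
                               max_ticks: int = 20) -> int:
    """Clamp the timeout a client asks for to what the server will grant."""
    min_ms = tick_time_ms * min_ticks
    max_ms = tick_time_ms * max_ticks
    return max(min_ms, min(requested_ms, max_ms))


# A client asking for 30s gets 30s, because it is inside the allowed range.
print(granted_session_timeout_ms(30_000))  # 30000
# A client asking for 90s is capped at 20 * tickTime.
print(granted_session_timeout_ms(90_000))  # 40000
```

Under these assumptions, raising only the accumulo-side timeout would have no effect beyond the server-side maximum.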








________________________________

From: Ligade, Shailesh [USA] <ligade_shail...@bah.com>
Sent: Wednesday, February 9, 2022 3:03 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: Re: accumulo 1.10.0 masters won't start



Thanks for the response.



No, I have not updated any timeout.



Does that go in zoo.cfg? I can see there is min/maxSessionTimeout 2/20; is 
that what you are referring to?



-S

________________________________

From: dev1 <d...@etcoleman.com>
Sent: Wednesday, February 9, 2022 2:51 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: [External] Re: accumulo 1.10.0 masters won't start



Have you tried to increase the zoo session timeout value? I think it's 
zookeeper.session.timeout.ms



________________________________

From: Ligade, Shailesh [USA] <ligade_shail...@bah.com>
Sent: Wednesday, February 9, 2022 2:47 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: Re: accumulo 1.10.0 masters won't start



Thanks



That fixed the goal state issue, but now I am still getting



Errors with zookeeper

e.g.



KeeperErrorCode = ConnectionLoss for



/accumulo/<instance-id>/config/tserver.hold.time.max

/accumulo/<instance-id>/tables

/accumulo/<instance-id>/tables/1/name

/accumulo/<instance-id>/fate

/accumulo/<instance-id>/masters/goal_state



So it is all over the place. For some of these I see good values in zookeeper, so I am not sure. 🙁



-S



________________________________

From: dev1 <d...@etcoleman.com>
Sent: Wednesday, February 9, 2022 2:22 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: [External] Re: accumulo 1.10.0 masters won't start



There is a utility - SetGoalState - that can be run from the command line



accumulo SetGoalState NORMAL



(or SAFE_MODE, CLEAN_STOP)



It sets a value in ZK at /accumulo/<instance-id>/masters/goal_state



Ed Coleman



________________________________

From: Ligade, Shailesh [USA] <ligade_shail...@bah.com>
Sent: Wednesday, February 9, 2022 1:54 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: Re: accumulo 1.10.0 masters won't start



Well,



I just went ahead and deleted fate in zookeeper and restarted the master. It 
was doing better, but now I am getting a different error



ERROR: Problem getting real goal state from zookeeper: 
java.lang.IllegalArgumentException: No enum constant 
org.apache.accumulo.core.master.thrift.MasterGoalState



I hope I didn't delete goal_state accidentally ...;-( Currently ls on goal_state 
is []. Is there a way to add some value there?



-S

________________________________

From: dev1 <d...@etcoleman.com>
Sent: Wednesday, February 9, 2022 1:32 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: [External] Re: accumulo 1.10.0 masters won't start



Did you try setting the increased size in the zkCli.sh command (or wherever it 
gets its environment from)?



The ZK docs indicate that it needs to be set to the same size on all servers 
and clients.



You should be able to use zkCli.sh to at least see what's going on - if that 
does not work, then it seems unlikely that the master would either.



Can you:

  *   list the nodes under /accumulo/[instance id]/fate?
  *   use the stat command on each of the nodes - the size is one of the fields.
  *   list nodes under any of the /accumulo/[instance id]/fate/tx-#####
  *   there should be a node named debug - doing a get on that should show the 
op name.



Ed Coleman

________________________________

From: Ligade, Shailesh [USA] <ligade_shail...@bah.com>
Sent: Wednesday, February 9, 2022 12:54 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: Re: accumulo 1.10.0 masters won't start



Thanks



I added



-Djute.maxbuffer=30000000



in conf/java.env and restarted all zookeepers, but I am still getting the same 
error. The documentation is kind of fuzzy on setting this property, as it states 
the value in hex (default 0xfffff), so I am not 100% sure if 30000000 is ok, but 
at least I could see that zookeeper was up.
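
On the hex versus decimal question, a sketch that may help: assuming the property is read with Java's Integer.getInteger (which follows the Integer.decode rules: a 0x or # prefix means hexadecimal, a leading 0 means octal, anything else is decimal), both notations are accepted. The Python function below only mimics those rules for illustration; that ZooKeeper parses jute.maxbuffer this way is an assumption, not something confirmed in this thread.

```python
def decode_java_int(text: str) -> int:
    """Mimic the parsing rules of Java's Integer.decode (illustration only)."""
    sign = 1
    digits = text
    if digits.startswith(("-", "+")):
        sign = -1 if digits[0] == "-" else 1
        digits = digits[1:]
    if digits.lower().startswith("0x"):
        base, digits = 16, digits[2:]   # hexadecimal with 0x / 0X prefix
    elif digits.startswith("#"):
        base, digits = 16, digits[1:]   # hexadecimal with # prefix
    elif digits.startswith("0") and len(digits) > 1:
        base, digits = 8, digits[1:]    # octal with leading zero
    else:
        base = 10                       # plain decimal
    return sign * int(digits, base)


print(decode_java_int("30000000"))  # 30000000
print(decode_java_int("0xfffff"))   # 1048575
```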



-S



________________________________

From: dev1 <d...@etcoleman.com>
Sent: Wednesday, February 9, 2022 12:26 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: [External] Re: accumulo 1.10.0 masters won't start



Does the monitor or any of the logs show errors that relate to exceeding the 
ZooKeeper jute buffer size?



If so, have you tried increasing the ZooKeeper jute.maxbuffer limit 
(https://zookeeper.apache.org/doc/r3.5.9/zookeeperAdmin.html#Unsafe+Options)?



Ed Coleman





________________________________

From: Ligade, Shailesh [USA] <ligade_shail...@bah.com>
Sent: Wednesday, February 9, 2022 11:49 AM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: accumulo 1.10.0 masters won't start



Hello,



Both of my masters are stuck with errors on zookeeper:



IOException: Packet len 2791093 is out of range!

KeeperErrorCode = ConnectionLoss for /accumulo/<instance_id>/fate





If I use zkCli to see what is under fate, I get



IOException Packet len 2791161 is out of range

Unable to read additional data from server sessionid xxxx, likely server has 
closed socket



hdfs fsck is all good



How can I clear this fate?



The master process is up and I can get into the accumulo shell, but there are no 
fate operations (fate print returns empty).



Any idea how to bring the master up?



Thanks



S
