Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-17 Thread Zachary Radtka
I have had this exact error before:

ERROR: read a frame size of 1195725856, which is bigger than the maximum
allowable buffer size for ALL connections.

My cluster was on a client's AWS account that regularly ran security
scans on weekends. When we logged in on Monday, the master would always
be down. We never learned what the security scans consisted of, but we
solved our issue by placing our servers in a security group that only
allowed the Accumulo servers to talk to each other. We also limited
inbound traffic to that coming from the security groups of our other
systems that were accessing Accumulo directly.

-Zach
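
One detail that fits the security-scan explanation: the reported frame size
does not look arbitrary. Read as four big-endian bytes, 1195725856 spells the
ASCII text "GET " (with a trailing space), which is how an HTTP request
begins. That suggests, but does not prove, that something sent an HTTP
request to a Thrift port. A minimal Python sketch of the arithmetic follows;
the helper name frame_size_as_text is made up for illustration.

```python
def frame_size_as_text(frame_size: int) -> str:
    """Render a 32-bit frame size as the four ASCII bytes it was read from."""
    raw = frame_size.to_bytes(4, byteorder="big")
    # Replace anything non-printable with "." so binary data stays readable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    print(hex(1195725856))                       # 0x47455420
    print(repr(frame_size_as_text(1195725856)))  # 'GET '
```

The same helper can be applied to any other "frame size" value that shows up
in the logs to see whether it is really the start of a text protocol.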



On Thu, Mar 17, 2022 at 11:45 AM Mike Miller wrote:

> Are you still running Replication? I would turn it off if you can.

Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-17 Thread Mike Miller
Are you still running Replication? I would turn it off if you can.

On Thu, Mar 17, 2022 at 7:44 AM dev1 wrote:

> When an Accumulo process terminates abnormally, there may be a file
> created with the exception that caused the problem. The files may be
> named *.out (or *.err); I can't recall which. Normally the files have
> 0 size, but on termination they will have some text.
>
> Are you seeing those files, and do they point to the issue?
>
> Do you have the JVM configured to terminate on out of memory, and to
> print that error condition? Maybe the manager is running out of memory.
>
> Ed Coleman

RE: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-17 Thread dev1
When an Accumulo process terminates abnormally, there may be a file created
with the exception that caused the problem. The files may be named *.out (or
*.err); I can't recall which. Normally the files have 0 size, but on
termination they will have some text.

Are you seeing those files, and do they point to the issue?

Do you have the JVM configured to terminate on out of memory, and to print
that error condition? Maybe the manager is running out of memory.

Ed Coleman
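
A quick way to do the check described above is to list any *.out or *.err
files that are no longer empty. The Python sketch below assumes the files sit
directly in a single log directory; the path /var/log/accumulo and the
function name are placeholders, not something taken from this thread.

```python
from pathlib import Path
from typing import List


def non_empty_crash_files(log_dir: str) -> List[Path]:
    """Return the *.out and *.err files in log_dir that contain any data."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    found = []
    for pattern in ("*.out", "*.err"):
        for path in root.glob(pattern):
            # Zero-size files are the normal case; only report ones with text.
            if path.is_file() and path.stat().st_size > 0:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical location; substitute the directory your install logs to.
    for path in non_empty_crash_files("/var/log/accumulo"):
        print(f"{path} ({path.stat().st_size} bytes)")
```

Any file this reports is worth reading in full, since it may hold the
exception or out-of-memory message from the process that died.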

From: Ligade, Shailesh [USA] 
Sent: Wednesday, March 16, 2022 3:31 PM
To: user@accumulo.apache.org
Subject: RE: [External] Re: odd issue with accumulo 1.10.0 starting up

Thanks,

I think we are having the same or a similar issue with a virus scan/security
scan. However, that should not bring down the master, should it?

I am still digging thru the logs.

-S


RE: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-16 Thread Ligade, Shailesh [USA]
Thanks,

I think we are having the same or a similar issue with a virus scan/security
scan. However, that should not bring down the master, should it?

I am still digging thru the logs.

-S

From: Adam J. Shook 
Sent: Wednesday, March 16, 2022 2:46 PM
To: user@accumulo.apache.org
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

This is certainly anecdotal, but we've seen this "ERROR: Read a frame size of 
(large number)" before on our Accumulo cluster that would show up at a regular 
and predictable frequency. The root cause was due to a routine scan done by the 
security team looking for vulnerabilities across the entire enterprise (nothing 
Accumulo-specific). I don't have any additional information about the specifics 
of the scan. From all that we can tell, it has no impact on our Accumulo 
cluster outside of these error messages.

--Adam


Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-16 Thread Adam J. Shook
This is certainly anecdotal, but we've seen this "ERROR: Read a frame size
of (large number)" before on our Accumulo cluster that would show up at a
regular and predictable frequency. The root cause was due to a routine scan
done by the security team looking for vulnerabilities across the entire
enterprise (nothing Accumulo-specific). I don't have any additional
information about the specifics of the scan. From all that we can tell, it
has no impact on our Accumulo cluster outside of these error messages.

--Adam

On Wed, Mar 16, 2022 at 8:35 AM Christopher wrote:

> Since that error message is coming from the libthrift library, and not
> Accumulo code, we would need a lot more context to even begin helping you
> troubleshoot it. For example, the complete stack trace that shows the
> Accumulo code that called into the Thrift library, would be extremely
> helpful.
>
> It's a bit concerning that you're trying to send a single buffer over
> thrift that's over a gigabyte in size, according to that number. You've
> said before that you use live ingest. Are you trying to send a 1GB mutation
> to a tablet server? Or are you using replication and the stack trace looks
> like it's sending 1GB of replication data?

Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-16 Thread Christopher
Since that error message is coming from the libthrift library, and not
Accumulo code, we would need a lot more context to even begin helping you
troubleshoot it. For example, the complete stack trace that shows the
Accumulo code that called into the Thrift library would be extremely
helpful.

It's a bit concerning that you're trying to send a single buffer over
thrift that's over a gigabyte in size, according to that number. You've
said before that you use live ingest. Are you trying to send a 1GB mutation
to a tablet server? Or are you using replication and the stack trace looks
like it's sending 1GB of replication data?
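
A very large frame size does not necessarily mean that a gigabyte of data was
sent. With a framed transport, the first four bytes of a message are read as
a big-endian length, so a client that speaks some other protocol can produce
an arbitrary "size". The Python sketch below illustrates the idea only: it is
not libthrift's actual code, and the 16 MB limit and function names are
assumptions chosen for the example.

```python
import struct
from typing import Tuple

# Assumed limit for illustration; a real server uses its configured maximum.
MAX_FRAME_SIZE = 16 * 1024 * 1024


def read_frame_size(data: bytes) -> int:
    """Interpret the first four bytes as a signed big-endian frame length."""
    (size,) = struct.unpack(">i", data[:4])
    return size


def classify_first_bytes(data: bytes,
                         max_frame_size: int = MAX_FRAME_SIZE) -> Tuple[bool, int]:
    """Return (accepted, size) for the frame length at the start of data."""
    size = read_frame_size(data)
    return (0 <= size <= max_frame_size, size)


if __name__ == "__main__":
    # A well-formed frame: a 4-byte length followed by that many payload bytes.
    payload = b"\x80\x01\x00\x01"
    print(classify_first_bytes(struct.pack(">i", len(payload)) + payload))
    # What a scanner speaking HTTP might send to the same port.
    print(classify_first_bytes(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n"))
```

In the second case the "frame size" is just the first four characters of the
request, so the stack trace and the identity of the connecting client are
more informative than the number itself.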


On Wed, Mar 16, 2022 at 7:14 AM Ligade, Shailesh [USA] <
ligade_shail...@bah.com> wrote:

> Well, I re-initialized accumulo but I still see
>
> ERROR: Read a frame size of 1195725856, which is bigger than the maximum
> allowable buffer size for ALL connections.
>
> Is there a setting that I can increase to get past it?
>
> -S
>
>
> --
> *From:* Ligade, Shailesh [USA] 
> *Sent:* Tuesday, March 15, 2022 12:47 PM
> *To:* user@accumulo.apache.org 
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Not daily, but over the weekend.
> --
> *From:* Mike Miller 
> *Sent:* Tuesday, March 15, 2022 10:39 AM
> *To:* user@accumulo.apache.org 
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Why are you bringing the cluster down every night? That is not ideal.
>
> On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Thanks Mike,
>
> We bring the servers down nightly. These are on AWS. This worked yesterday
> (Monday), but today (Tuesday) I went to check on it and it was down. I
> guess I didn't check yesterday; I assume it was up as no one complained.
> It was up and kicking last week for sure.
>
> So I'm not exactly sure when or what caused it. All services are up
> (tserver, master), so the services are not crashing themselves.
>
> I guess worst case, I can re-initialize and recreate the tables from HDFS. :-(
>
> -S
> ------------------
> *From:* Mike Miller 
> *Sent:* Tuesday, March 15, 2022 9:16 AM
> *To:* user@accumulo.apache.org 
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> What was going on in the tserver before you saw that error? Did it finish
> recovering after the restart? If it is still recovering, I don't think you
> will be able to do any scans.
>
> On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Thanks Mike,
>
> That was my first reaction, but the instance is backed up by Puppet and no
> configuration was updated (I double-checked and ran Puppet manually as well
> as automatically after restart). Since the system was operational
> yesterday, I think I can rule that out.
>
> As for the other error, I did see the exact same error in
> https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j ,
> https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14 and
> https://markmail.org/message/bc7ijdsgqmod5p2h
> but those are for a much older Accumulo, and the server didn't run out of
> memory, so I think that must have been fixed.
>
>
> -S
>
> --
> *From:* Mike Miller 
> *Sent:* Tuesday, March 15, 2022 8:47 AM
> *To:* user@accumulo.apache.org 
> *Subject:* [External] Re: odd issue

Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-16 Thread Ligade, Shailesh [USA]
Thanks Mike,

After stopping all the services, I just moved /accumulo to /old-accumulo and
then ran

accumulo init --clear-instance-name --instance-name  --password 


With that, a plain vanilla Accumulo came up after restarting the services.

The plan is to re-create all the tables from /old-accumulo using importdirectory.

Although Accumulo is up, and I can see it from the monitor and scan Accumulo
tables, I still see that error in the master log.

-S

From: Mike Miller 
Sent: Wednesday, March 16, 2022 8:24 AM
To: user@accumulo.apache.org 
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

It is hard to help you without a full explanation of what exactly you are 
doing. Was that error in the Master log? What commands did you run exactly to 
"re-initialize"? Did you wipe all the data or just run "--reset-security"?


Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-16 Thread Mike Miller
It is hard to help you without a full explanation of what exactly you are
doing. Was that error in the Master log? What commands did you run exactly
to "re-initialize"? Did you wipe all the data or just run
"--reset-security"?

On Wed, Mar 16, 2022 at 7:14 AM Ligade, Shailesh [USA] <
ligade_shail...@bah.com> wrote:

> Well, I re-initialized accumulo but I still see
>
> ERROR: Read a frame size of 1195725856, which is bigger than the maximum
> allowable buffer size for ALL connections.
>
> Is there a setting that I can increase to get past it?
>
> -S
>
>
> --
> *From:* Ligade, Shailesh [USA] 
> *Sent:* Tuesday, March 15, 2022 12:47 PM
> *To:* user@accumulo.apache.org 
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Not daily, but over the weekend.
> --
> *From:* Mike Miller 
> *Sent:* Tuesday, March 15, 2022 10:39 AM
> *To:* user@accumulo.apache.org 
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Why are you bringing the cluster down every night? That is not ideal.
>
> On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Thanks Mike,
>
> We bring the servers down nightly; these are on AWS. This worked yesterday
> (Monday), but today (Tuesday) I went to check on it and it was down. I
> guess I didn't check yesterday; I assume it was up since no one complained.
> It was definitely up and running last week.
>
> I'm not exactly sure when it broke or what caused it. All services are up
> (tserver, master), so the services themselves are not crashing.
>
> I guess, worst case, I can re-initialize and recreate the tables from HDFS. :-(
>
> -S
> --------------
> *From:* Mike Miller 
> *Sent:* Tuesday, March 15, 2022 9:16 AM
> *To:* user@accumulo.apache.org 
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> What was going on in the tserver before you saw that error? Did it finish
> recovering after the restart? If it is still recovering, I don't think you
> will be able to do any scans.
>
> On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Thanks Mike,
>
> That was my first reaction, but the instance is backed up by Puppet and no
> configuration was updated (I double-checked, and Puppet ran both manually
> and automatically after the restart). Since the system was operational
> yesterday, I think I can rule that out.
>
> For the other error, I did find the exact same error reported at
> https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j,
> https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14, and
> https://markmail.org/message/bc7ijdsgqmod5p2h, but those are for much older
> Accumulo versions. Also, the server didn't run out of memory, so I think
> that must have been fixed.
>
>
> -S
>
> --
> *From:* Mike Miller 
> *Sent:* Tuesday, March 15, 2022 8:47 AM
> *To:* user@accumulo.apache.org 
> *Subject:* [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Check your configuration. The log message indicates that there is a
> problem with the internal system user performing operations. The internal
> system user uses credentials derived from the configuration (such as the
> instance.secret field). Make sure your configuration is identical across
> all nodes in your cluster.
>

Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-16 Thread Ligade, Shailesh [USA]
Well, I re-initialized accumulo but I still see

ERROR: Read a frame size of 1195725856, which is bigger than the maximum 
allowable buffer size for ALL connections.

Is there a setting that I can increase to get past it?
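
Before raising any limit, it may be worth checking what that number actually is. Thrift's framed transport reads the first four bytes of a connection as a big-endian 32-bit frame length, so a client that is not speaking Thrift at all can produce a huge "frame size". The sketch below (the helper name `decode_frame_size` is made up for illustration) turns the reported value back into its four raw bytes:

```python
import struct


def decode_frame_size(frame_size: int) -> str:
    """Interpret a 32-bit frame length as the 4 ASCII bytes that were on the wire."""
    # Thrift framed transport sends the length as a big-endian signed 32-bit int.
    raw = struct.pack(">i", frame_size)
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    # The value from the master log above.
    print(repr(decode_frame_size(1195725856)))
```

If the four bytes spell the start of an HTTP request line, that suggests something other than an Accumulo process connected to the Thrift port, in which case increasing a buffer setting would probably not help.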

-S



From: Ligade, Shailesh [USA] 
Sent: Tuesday, March 15, 2022 12:47 PM
To: user@accumulo.apache.org 
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

Not daily, but over the weekend.

From: Mike Miller 
Sent: Tuesday, March 15, 2022 10:39 AM
To: user@accumulo.apache.org 
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

Why are you bringing the cluster down every night? That is not ideal.

On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Thanks Mike,

We bring the servers down nightly; these are on AWS. This worked yesterday
(Monday), but today (Tuesday) I went to check on it and it was down. I guess I
didn't check yesterday; I assume it was up since no one complained. It was
definitely up and running last week.

I'm not exactly sure when it broke or what caused it. All services are up
(tserver, master), so the services themselves are not crashing.

I guess, worst case, I can re-initialize and recreate the tables from HDFS. :-(

-S

From: Mike Miller <mmil...@apache.org>
Sent: Tuesday, March 15, 2022 9:16 AM
To: user@accumulo.apache.org
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

What was going on in the tserver before you saw that error? Did it finish 
recovering after the restart? If it is still recovering, I don't think you will 
be able to do any scans.

On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Thanks Mike,

That was my first reaction, but the instance is backed up by Puppet and no
configuration was updated (I double-checked, and Puppet ran both manually and
automatically after the restart). Since the system was operational yesterday,
I think I can rule that out.

For the other error, I did find the exact same error reported at
https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j,
https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14, and
https://markmail.org/message/bc7ijdsgqmod5p2h, but those are for much older
Accumulo versions. Also, the server didn't run out of memory, so I think that
must have been fixed.


-S


From: Mike Miller <mmil...@apache.org>
Sent: Tuesday, March 15, 2022 8:47 AM
To: user@accumulo.apache.org
Subject: [External] Re: odd issue with accumulo 1.10.0 starting up

Check your configuration. The log message indicates that there is a problem 
with the internal system user performing operations. The internal system user 
uses credentials derived from the configuration (such as the instance.secret 
field). Make sure your configuration is identical across all nodes in your 
cluster.
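
One way to check that a setting really is identical everywhere is to compare a digest of it per node rather than eyeballing files. The sketch below is only an illustration under stated assumptions: it assumes each node's `accumulo-site.xml` text has already been collected into a dict keyed by hostname (how it is collected is left out), and the function names `property_digest` and `group_hosts_by_value` are made up for this example.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def property_digest(site_xml: str, name: str = "instance.secret") -> Optional[str]:
    """Return a SHA-256 digest of one property's value, or None if it is absent."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value") or ""
            # Hash the value so the secret itself is never printed or logged.
            return hashlib.sha256(value.encode("utf-8")).hexdigest()
    return None


def group_hosts_by_value(configs: Dict[str, str],
                         name: str = "instance.secret") -> Dict[str, List[str]]:
    """Group hostnames by digest; more than one group means the nodes disagree."""
    groups: Dict[str, List[str]] = {}
    for host, xml_text in configs.items():
        digest = property_digest(xml_text, name) or "<missing>"
        groups.setdefault(digest, []).append(host)
    return groups
```

If `group_hosts_by_value` returns a single group, every node has the same value for that property; several groups show which hosts differ.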

On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Hello,

I am seeing a slightly odd issue with Accumulo starting up.

On the tserver I am seeing:

[tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)

In the master log I am seeing:

ERROR: read a frame size of 1195725856, which is bigger than the maximum
allowable

Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Ligade, Shailesh [USA]
Not daily, but over the weekend.

From: Mike Miller 
Sent: Tuesday, March 15, 2022 10:39 AM
To: user@accumulo.apache.org 
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

Why are you bringing the cluster down every night? That is not ideal.

On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Thanks Mike,

We bring the servers down nightly; these are on AWS. This worked yesterday
(Monday), but today (Tuesday) I went to check on it and it was down. I guess I
didn't check yesterday; I assume it was up since no one complained. It was
definitely up and running last week.

I'm not exactly sure when it broke or what caused it. All services are up
(tserver, master), so the services themselves are not crashing.

I guess, worst case, I can re-initialize and recreate the tables from HDFS. :-(

-S

From: Mike Miller <mmil...@apache.org>
Sent: Tuesday, March 15, 2022 9:16 AM
To: user@accumulo.apache.org
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

What was going on in the tserver before you saw that error? Did it finish 
recovering after the restart? If it is still recovering, I don't think you will 
be able to do any scans.

On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Thanks Mike,

That was my first reaction, but the instance is backed up by Puppet and no
configuration was updated (I double-checked, and Puppet ran both manually and
automatically after the restart). Since the system was operational yesterday,
I think I can rule that out.

For the other error, I did find the exact same error reported at
https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j,
https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14, and
https://markmail.org/message/bc7ijdsgqmod5p2h, but those are for much older
Accumulo versions. Also, the server didn't run out of memory, so I think that
must have been fixed.


-S


From: Mike Miller <mmil...@apache.org>
Sent: Tuesday, March 15, 2022 8:47 AM
To: user@accumulo.apache.org
Subject: [External] Re: odd issue with accumulo 1.10.0 starting up

Check your configuration. The log message indicates that there is a problem 
with the internal system user performing operations. The internal system user 
uses credentials derived from the configuration (such as the instance.secret 
field). Make sure your configuration is identical across all nodes in your 
cluster.

On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Hello,

I am seeing a slightly odd issue with Accumulo starting up.

On the tserver I am seeing:

[tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)

In the master log I am seeing:

ERROR: read a frame size of 1195725856, which is bigger than the maximum
allowable buffer size for ALL connections.

From the shell I can list all the tables but cannot scan any. The monitor is
showing a tablet count of 0 and 1 unassigned tablet.

HDFS fsck reports everything as healthy.

Any suggestions?

Thanks

-S



Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Mike Miller
Why are you bringing the cluster down every night? That is not ideal.

On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <
ligade_shail...@bah.com> wrote:

> Thanks Mike,
>
> We bring the servers down nightly; these are on AWS. This worked yesterday
> (Monday), but today (Tuesday) I went to check on it and it was down. I
> guess I didn't check yesterday; I assume it was up since no one complained.
> It was definitely up and running last week.
>
> I'm not exactly sure when it broke or what caused it. All services are up
> (tserver, master), so the services themselves are not crashing.
>
> I guess, worst case, I can re-initialize and recreate the tables from HDFS. :-(
>
> -S
> --
> *From:* Mike Miller 
> *Sent:* Tuesday, March 15, 2022 9:16 AM
> *To:* user@accumulo.apache.org 
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> What was going on in the tserver before you saw that error? Did it finish
> recovering after the restart? If it is still recovering, I don't think you
> will be able to do any scans.
>
> On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Thanks Mike,
>
> That was my first reaction, but the instance is backed up by Puppet and no
> configuration was updated (I double-checked, and Puppet ran both manually
> and automatically after the restart). Since the system was operational
> yesterday, I think I can rule that out.
>
> For the other error, I did find the exact same error reported at
> https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j,
> https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14, and
> https://markmail.org/message/bc7ijdsgqmod5p2h, but those are for much older
> Accumulo versions. Also, the server didn't run out of memory, so I think
> that must have been fixed.
>
>
> -S
>
> --
> *From:* Mike Miller 
> *Sent:* Tuesday, March 15, 2022 8:47 AM
> *To:* user@accumulo.apache.org 
> *Subject:* [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Check your configuration. The log message indicates that there is a
> problem with the internal system user performing operations. The internal
> system user uses credentials derived from the configuration (such as the
> instance.secret field). Make sure your configuration is identical across
> all nodes in your cluster.
>
> On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Hello,
>
> I am seeing a slightly odd issue with Accumulo starting up.
>
> On the tserver I am seeing:
>
> [tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
> ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)
>
> In the master log I am seeing:
>
> ERROR: read a frame size of 1195725856, which is bigger than the maximum
> allowable buffer size for ALL connections.
>
> From the shell I can list all the tables but cannot scan any. The monitor is
> showing a tablet count of 0 and 1 unassigned tablet.
>
> HDFS fsck reports everything as healthy.
>
> Any suggestions?
>
> Thanks
>
> -S
>
>


Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Ligade, Shailesh [USA]
Thanks Mike,

We bring the servers down nightly; these are on AWS. This worked yesterday
(Monday), but today (Tuesday) I went to check on it and it was down. I guess I
didn't check yesterday; I assume it was up since no one complained. It was
definitely up and running last week.

I'm not exactly sure when it broke or what caused it. All services are up
(tserver, master), so the services themselves are not crashing.

I guess, worst case, I can re-initialize and recreate the tables from HDFS. :-(

-S

From: Mike Miller 
Sent: Tuesday, March 15, 2022 9:16 AM
To: user@accumulo.apache.org 
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

What was going on in the tserver before you saw that error? Did it finish 
recovering after the restart? If it is still recovering, I don't think you will 
be able to do any scans.

On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Thanks Mike,

That was my first reaction, but the instance is backed up by Puppet and no
configuration was updated (I double-checked, and Puppet ran both manually and
automatically after the restart). Since the system was operational yesterday,
I think I can rule that out.

For the other error, I did find the exact same error reported at
https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j,
https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14, and
https://markmail.org/message/bc7ijdsgqmod5p2h, but those are for much older
Accumulo versions. Also, the server didn't run out of memory, so I think that
must have been fixed.


-S


From: Mike Miller <mmil...@apache.org>
Sent: Tuesday, March 15, 2022 8:47 AM
To: user@accumulo.apache.org
Subject: [External] Re: odd issue with accumulo 1.10.0 starting up

Check your configuration. The log message indicates that there is a problem 
with the internal system user performing operations. The internal system user 
uses credentials derived from the configuration (such as the instance.secret 
field). Make sure your configuration is identical across all nodes in your 
cluster.

On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Hello,

I am seeing a slightly odd issue with Accumulo starting up.

On the tserver I am seeing:

[tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)

In the master log I am seeing:

ERROR: read a frame size of 1195725856, which is bigger than the maximum
allowable buffer size for ALL connections.

From the shell I can list all the tables but cannot scan any. The monitor is
showing a tablet count of 0 and 1 unassigned tablet.

HDFS fsck reports everything as healthy.

Any suggestions?

Thanks

-S



Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Mike Miller
What was going on in the tserver before you saw that error? Did it finish
recovering after the restart? If it is still recovering, I don't think you
will be able to do any scans.

On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <
ligade_shail...@bah.com> wrote:

> Thanks Mike,
>
> That was my first reaction, but the instance is backed up by Puppet and no
> configuration was updated (I double-checked, and Puppet ran both manually
> and automatically after the restart). Since the system was operational
> yesterday, I think I can rule that out.
>
> For the other error, I did find the exact same error reported at
> https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j,
> https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14, and
> https://markmail.org/message/bc7ijdsgqmod5p2h, but those are for much older
> Accumulo versions. Also, the server didn't run out of memory, so I think
> that must have been fixed.
>
>
> -S
>
> --
> *From:* Mike Miller 
> *Sent:* Tuesday, March 15, 2022 8:47 AM
> *To:* user@accumulo.apache.org 
> *Subject:* [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Check your configuration. The log message indicates that there is a
> problem with the internal system user performing operations. The internal
> system user uses credentials derived from the configuration (such as the
> instance.secret field). Make sure your configuration is identical across
> all nodes in your cluster.
>
> On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Hello,
>
> I am seeing a slightly odd issue with Accumulo starting up.
>
> On the tserver I am seeing:
>
> [tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
> ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)
>
> In the master log I am seeing:
>
> ERROR: read a frame size of 1195725856, which is bigger than the maximum
> allowable buffer size for ALL connections.
>
> From the shell I can list all the tables but cannot scan any. The monitor is
> showing a tablet count of 0 and 1 unassigned tablet.
>
> HDFS fsck reports everything as healthy.
>
> Any suggestions?
>
> Thanks
>
> -S
>
>


Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Ligade, Shailesh [USA]
Thanks Mike,

That was my first reaction, but the instance is backed up by Puppet and no
configuration was updated (I double-checked, and Puppet ran both manually and
automatically after the restart). Since the system was operational yesterday,
I think I can rule that out.

For the other error, I did find the exact same error reported at
https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j,
https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14, and
https://markmail.org/message/bc7ijdsgqmod5p2h, but those are for much older
Accumulo versions. Also, the server didn't run out of memory, so I think that
must have been fixed.


-S


From: Mike Miller 
Sent: Tuesday, March 15, 2022 8:47 AM
To: user@accumulo.apache.org 
Subject: [External] Re: odd issue with accumulo 1.10.0 starting up

Check your configuration. The log message indicates that there is a problem 
with the internal system user performing operations. The internal system user 
uses credentials derived from the configuration (such as the instance.secret 
field). Make sure your configuration is identical across all nodes in your 
cluster.

On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Hello,

I am seeing a slightly odd issue with Accumulo starting up.

On the tserver I am seeing:

[tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)

In the master log I am seeing:

ERROR: read a frame size of 1195725856, which is bigger than the maximum
allowable buffer size for ALL connections.

From the shell I can list all the tables but cannot scan any. The monitor is
showing a tablet count of 0 and 1 unassigned tablet.

HDFS fsck reports everything as healthy.

Any suggestions?

Thanks

-S