Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Ligade, Shailesh [USA]
Not daily, but over the weekend.

From: Mike Miller 
Sent: Tuesday, March 15, 2022 10:39 AM
To: user@accumulo.apache.org 
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

Why are you bringing the cluster down every night? That is not ideal.

On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Thanks Mike,

We bring the servers down nightly; these are on AWS. This seemed to work yesterday
(Monday), but today (Tuesday) when I went to check on it, it was down. I guess I
didn't actually check yesterday; I assumed it was up because no one complained,
but it was definitely up and running last week.

So I'm not sure exactly when it broke or what caused it. All services (tserver,
master) are up, so the processes themselves are not crashing.

Worst case, I guess I can re-initialize and recreate the tables from HDFS. :-(

-S

From: Mike Miller <mmil...@apache.org>
Sent: Tuesday, March 15, 2022 9:16 AM
To: user@accumulo.apache.org
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

What was going on in the tserver before you saw that error? Did it finish 
recovering after the restart? If it is still recovering, I don't think you will 
be able to do any scans.

On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Thanks Mike,

That was my first reaction, but the instance is managed by Puppet and no
configuration was changed (I double-checked, and Puppet ran both manually and
automatically after the restart). Since the system was operational yesterday, I
think I can rule that out.

For the other error, I did find reports of the exact same error:
https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j
https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14
https://markmail.org/message/bc7ijdsgqmod5p2h
but those are for much older versions of Accumulo, and my server didn't run out
of memory, so I think that must have been fixed.


-S


From: Mike Miller <mmil...@apache.org>
Sent: Tuesday, March 15, 2022 8:47 AM
To: user@accumulo.apache.org
Subject: [External] Re: odd issue with accumulo 1.10.0 starting up

Check your configuration. The log message indicates that there is a problem 
with the internal system user performing operations. The internal system user 
uses credentials derived from the configuration (such as the instance.secret 
field). Make sure your configuration is identical across all nodes in your 
cluster.

On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
Hello,

I am seeing a slightly odd issue with Accumulo starting up.

On the tserver I am seeing:

[tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)

In the master log I am seeing:

ERROR: read a frame size of 1195725856, which is bigger than the maximum 
allowable buffer size for ALL connections.

From the shell I can list all the tables but cannot scan any of them. The Monitor
is showing a tablet count of 0 and 1 unassigned tablet.

HDFS fsck is all healthy.

Any suggestions?

Thanks

-S



Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Mike Miller
Why are you bringing the cluster down every night? That is not ideal.



Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Ligade, Shailesh [USA]
Thanks Mike,

We bring the servers down nightly; these are on AWS. This seemed to work yesterday
(Monday), but today (Tuesday) when I went to check on it, it was down. I guess I
didn't actually check yesterday; I assumed it was up because no one complained,
but it was definitely up and running last week.

So I'm not sure exactly when it broke or what caused it. All services (tserver,
master) are up, so the processes themselves are not crashing.

Worst case, I guess I can re-initialize and recreate the tables from HDFS. :-(

-S




Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Mike Miller
What was going on in the tserver before you saw that error? Did it finish
recovering after the restart? If it is still recovering, I don't think you
will be able to do any scans.



Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Ligade, Shailesh [USA]
Thanks Mike,

That was my first reaction, but the instance is managed by Puppet and no
configuration was changed (I double-checked, and Puppet ran both manually and
automatically after the restart). Since the system was operational yesterday, I
think I can rule that out.

For the other error, I did find reports of the exact same error:
https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j
https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14
https://markmail.org/message/bc7ijdsgqmod5p2h
but those are for much older versions of Accumulo, and my server didn't run out
of memory, so I think that must have been fixed.


-S





Re: odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Mike Miller
Check your configuration. The log message indicates that there is a problem
with the internal system user performing operations. The internal system
user uses credentials derived from the configuration (such as the
instance.secret field). Make sure your configuration is identical across
all nodes in your cluster.
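Mike's suggestion can be checked mechanically. Below is a minimal sketch, assuming each node's conf/accumulo-site.xml (the Hadoop-style XML site file Accumulo 1.x reads) has first been copied to one machine, e.g. as node1/accumulo-site.xml and node2/accumulo-site.xml; the script name compare_site.py and those paths are hypothetical. It parses each file and reports every property, instance.secret included, whose value differs or is missing on some node, without echoing secret values.

```python
import sys
import xml.etree.ElementTree as ET

# Property names containing these markers are never printed in clear text.
SENSITIVE_MARKERS = ("secret", "password")


def load_site_properties(path):
    """Parse a Hadoop-style accumulo-site.xml into a {name: value} dict."""
    root = ET.parse(path).getroot()
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_configs(paths):
    """Return {property: {path: value}} for each property that is not identical
    across all files. A property missing from a file shows up as None."""
    configs = {str(p): load_site_properties(p) for p in paths}
    all_names = set()
    for props in configs.values():
        all_names.update(props)
    differences = {}
    for name in sorted(all_names):
        values = {path: props.get(name) for path, props in configs.items()}
        if len(set(values.values())) > 1:
            differences[name] = values
    return differences


def main(paths):
    diffs = diff_configs(paths)
    for name, values in diffs.items():
        sensitive = any(marker in name for marker in SENSITIVE_MARKERS)
        for path, value in values.items():
            shown = "<redacted>" if sensitive and value is not None else value
            print(f"{name}: {path} = {shown}")
    # Non-zero exit status when the nodes disagree, handy in scripts.
    return 1 if diffs else 0


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python compare_site.py node1/accumulo-site.xml node2/accumulo-site.xml ...
    sys.exit(main(sys.argv[1:]))
```

An empty report means the copied site files agree. It only compares the site files, so properties set from the shell with `config` (which Accumulo 1.x keeps in ZooKeeper) are not covered.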



odd issue with accumulo 1.10.0 starting up

2022-03-15 Thread Ligade, Shailesh [USA]
Hello,

I am seeing a slightly odd issue with Accumulo starting up.

On the tserver I am seeing:

[tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)

In the master log I am seeing:

ERROR: read a frame size of 1195725856, which is bigger than the maximum 
allowable buffer size for ALL connections.

From the shell I can list all the tables but cannot scan any of them. The Monitor
is showing a tablet count of 0 and 1 unassigned tablet.

HDFS fsck is all healthy.

Any suggestions?

Thanks

-S
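A note on the master log line in the original message. Accumulo's Thrift servers use a framed transport, in which each message starts with a 4-byte big-endian length. Re-encoding 1195725856 as those four bytes gives the ASCII text "GET ", the start of an HTTP request line. That suggests, though it does not prove, that some HTTP client (a browser, monitoring probe or health check; these are only guesses) connected to the master's Thrift port, which may be a separate issue from the BAD_CREDENTIALS error. A minimal Python sketch of the check; the helper names frame_size_as_bytes and looks_like_http are hypothetical:

```python
import struct


def frame_size_as_bytes(frame_size):
    """Re-encode a Thrift frame length as the 4 big-endian bytes it was read from."""
    return struct.pack(">I", frame_size)


def looks_like_http(frame_size):
    """Guess whether a bogus frame size is really the start of an HTTP request line."""
    return frame_size_as_bytes(frame_size) in (b"GET ", b"POST", b"HEAD", b"PUT ")


size = 1195725856  # value from the master log
print(frame_size_as_bytes(size))  # b'GET '
print(looks_like_http(size))      # True
```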