Re: [External] Re: odd issue with accumulo 1.10.0 starting up

Adam J. Shook Wed, 16 Mar 2022 11:46:53 -0700

This is certainly anecdotal, but we've seen this "ERROR: Read a frame size
of (large number)" message before on our Accumulo cluster, showing up at a
regular and predictable frequency. The root cause was a routine scan run by
the security team looking for vulnerabilities across the entire enterprise
(nothing Accumulo-specific). I don't have any additional information about
the specifics of the scan. As far as we can tell, it has no impact on our
Accumulo cluster beyond these error messages.
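
For what it's worth, the number in the reported message looks consistent
with that kind of scan. Thrift's framed transport reads the first four bytes
of a frame as a big-endian length, so a client that sends something other
than Thrift gets its first four bytes interpreted as a size. Below is a
minimal Python sketch that turns the reported number back into those bytes
(frame_size_as_text is a helper name made up for this illustration, not
anything from Accumulo or Thrift):

```python
import struct


def frame_size_as_text(frame_size: int) -> str:
    """Return the four ASCII characters that encode a 32-bit big-endian length.

    Hypothetical helper for illustration only: it reverses the step where a
    framed transport reads four bytes off the wire as an integer.
    """
    raw = struct.pack(">I", frame_size)  # 4 bytes, big-endian, unsigned
    return raw.decode("ascii", errors="replace")


# The frame size reported in the master log in this thread.
print(repr(frame_size_as_text(1195725856)))
```

For 1195725856 this gives 'GET ', which is how an HTTP request line starts.
That would fit something probing the port over HTTP (a scanner or a health
check, for example) rather than anything really trying to send a 1 GB
buffer.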


--Adam

On Wed, Mar 16, 2022 at 8:35 AM Christopher <[email protected]> wrote:

> Since that error message is coming from the libthrift library, and not
> Accumulo code, we would need a lot more context to even begin helping you
> troubleshoot it. For example, the complete stack trace showing the
> Accumulo code that called into the Thrift library would be extremely
> helpful.
>
> It's a bit concerning that you're trying to send a single buffer over
> thrift that's over a gigabyte in size, according to that number. You've
> said before that you use live ingest. Are you trying to send a 1GB mutation
> to a tablet server? Or are you using replication and the stack trace looks
> like it's sending 1GB of replication data?
>
>
> On Wed, Mar 16, 2022 at 7:14 AM Ligade, Shailesh [USA] <
> [email protected]> wrote:
>
>> Well, I re-initialized accumulo but I still see
>>
>> ERROR: Read a frame size of 1195725856, which is bigger than the maximum
>> allowable buffer size for ALL connections.
>>
>> Is there a setting that I can increase to get past it?
>>
>> -S
>>
>>
>> ------------------------------
>> *From:* Ligade, Shailesh [USA] <[email protected]>
>> *Sent:* Tuesday, March 15, 2022 12:47 PM
>> *To:* [email protected] <[email protected]>
>> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>>
>> Not daily, but over the weekend.
>> ------------------------------
>> *From:* Mike Miller <[email protected]>
>> *Sent:* Tuesday, March 15, 2022 10:39 AM
>> *To:* [email protected] <[email protected]>
>> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>>
>> Why are you bringing the cluster down every night? That is not ideal.
>>
>> On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <
>> [email protected]> wrote:
>>
>> Thanks Mike,
>>
>> We bring the servers down nightly. These are on AWS. This worked
>> yesterday (Monday), but today (Tuesday) I went to check on it and it was
>> down. I guess I didn't check yesterday; I assumed it was up as no one
>> complained. It was up and kicking last week for sure.
>>
>> So I'm not exactly sure when it happened or what caused it. All services
>> are up (tserver, master), so the services themselves are not crashing.
>>
>> I guess worst case, I can re-initialize and recreate the tables from HDFS..:-(
>>
>> -S
>> ------------------------------
>> *From:* Mike Miller <[email protected]>
>> *Sent:* Tuesday, March 15, 2022 9:16 AM
>> *To:* [email protected] <[email protected]>
>> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>>
>> What was going on in the tserver before you saw that error? Did it finish
>> recovering after the restart? If it is still recovering, I don't think you
>> will be able to do any scans.
>>
>> On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <
>> [email protected]> wrote:
>>
>> Thanks Mike,
>>
>> That was my first reaction, but the instance is backed up by Puppet and no
>> configuration was updated (I double-checked and ran Puppet manually, as well
>> as automatically after the restart). Since the system was operational
>> yesterday, I think I can rule that out.
>>
>> As for the other error, I did see the exact same error in
>> https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j ,
>> https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14 and
>> https://markmail.org/message/bc7ijdsgqmod5p2h
>> but those are for a much older Accumulo, and the server didn't run out of
>> memory, so I think that must have been fixed.
>>
>>
>> -S
>>
>> ------------------------------
>> *From:* Mike Miller <[email protected]>
>> *Sent:* Tuesday, March 15, 2022 8:47 AM
>> *To:* [email protected] <[email protected]>
>> *Subject:* [External] Re: odd issue with accumulo 1.10.0 starting up
>>
>> Check your configuration. The log message indicates that there is a
>> problem with the internal system user performing operations. The internal
>> system user uses credentials derived from the configuration (such as the
>> instance.secret field). Make sure your configuration is identical across
>> all nodes in your cluster.
>>
>> On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <
>> [email protected]> wrote:
>>
>> Hello,
>>
>> I am getting a slightly odd issue with Accumulo starting up.
>>
>> On the tserver I am seeing
>>
>> [tserver.TabletServer] ERROR: Caller doesn't have permission to get
>> active scans
>> ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)
>>
>> On the master log I am seeing
>>
>> ERROR: read a frame size of 1195725856, which is bigger than the maximum
>> allowable buffer size for ALL connections.
>>
>> From the shell I can list all the tables but cannot scan any. The monitor is
>> showing tablet count 0 and unassigned tablets 1.
>>
>> HDFS fsck is all healthy.
>>
>> Any suggestions?
>>
>> Thanks
>>
>> -S
>>
>>
