This is certainly anecdotal, but we've seen this "ERROR: Read a frame size of (large number)" message on our Accumulo cluster before, where it showed up at a regular and predictable frequency. The root cause was a routine scan run by the security team, looking for vulnerabilities across the entire enterprise (nothing Accumulo-specific). I don't have any additional information about the specifics of the scan. As far as we can tell, it has no impact on our Accumulo cluster apart from these error messages.
--Adam

On Wed, Mar 16, 2022 at 8:35 AM Christopher <ctubb...@apache.org> wrote:

> Since that error message is coming from the libthrift library, and not Accumulo code, we would need a lot more context to even begin helping you troubleshoot it. For example, the complete stack trace that shows the Accumulo code that called into the Thrift library would be extremely helpful.
>
> It's a bit concerning that you're trying to send a single buffer over Thrift that's over a gigabyte in size, according to that number. You've said before that you use live ingest. Are you trying to send a 1GB mutation to a tablet server? Or are you using replication and the stack trace looks like it's sending 1GB of replication data?
>
> On Wed, Mar 16, 2022 at 7:14 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
>
>> Well, I re-initialized Accumulo but I still see
>>
>> ERROR: Read a frame size of 1195725856, which is bigger than the maximum allowable buffer size for ALL connections.
>>
>> Is there a setting that I can increase to get past it?
>>
>> -S
>>
>> ------------------------------
>> *From:* Ligade, Shailesh [USA] <ligade_shail...@bah.com>
>> *Sent:* Tuesday, March 15, 2022 12:47 PM
>> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
>> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>>
>> Not daily, but over the weekend.
>> ------------------------------
>> *From:* Mike Miller <mmil...@apache.org>
>> *Sent:* Tuesday, March 15, 2022 10:39 AM
>> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
>> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>>
>> Why are you bringing the cluster down every night? That is not ideal.
>>
>> On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
>>
>> Thanks Mike,
>>
>> We bring the servers down nightly. These are on AWS.
>> This worked yesterday (Monday), but today (Tuesday) I went to check on it and it was down. I guess I didn't check yesterday; I assume it was up as no one complained, but it was up and kicking last week for sure.
>>
>> So I'm not exactly sure when or what caused it. All services are up (tserver, master), so the services are not crashing themselves.
>>
>> I guess worst case, I can re-initialize and recreate tables from HDFS.. :-(
>>
>> -S
>> ------------------------------
>> *From:* Mike Miller <mmil...@apache.org>
>> *Sent:* Tuesday, March 15, 2022 9:16 AM
>> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
>> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>>
>> What was going on in the tserver before you saw that error? Did it finish recovering after the restart? If it is still recovering, I don't think you will be able to do any scans.
>>
>> On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
>>
>> Thanks Mike,
>>
>> That was my first reaction, but the instance is backed up by Puppet and no configuration was updated (I double-checked and ran Puppet manually, as well as automatically after restart). Since the system was operational yesterday, I think I can rule that out.
>>
>> For the other error, I did see the exact error in:
>>
>> https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j
>> https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14
>> https://markmail.org/message/bc7ijdsgqmod5p2h
>>
>> but those are for a much older Accumulo, and the server didn't run out of memory, so I think that must have been fixed.
>>
>> -S
>>
>> ------------------------------
>> *From:* Mike Miller <mmil...@apache.org>
>> *Sent:* Tuesday, March 15, 2022 8:47 AM
>> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
>> *Subject:* [External] Re: odd issue with accumulo 1.10.0 starting up
>>
>> Check your configuration.
>> The log message indicates that there is a problem with the internal system user performing operations. The internal system user uses credentials derived from the configuration (such as the instance.secret field). Make sure your configuration is identical across all nodes in your cluster.
>>
>> On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
>>
>> Hello,
>>
>> I am getting a slightly odd issue with Accumulo starting up.
>>
>> On the tserver I am seeing
>>
>> [tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
>> ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)
>>
>> In the master log I am seeing
>>
>> ERROR: read a frame size of 1195725856, which is bigger than the maximum allowable buffer size for ALL connections.
>>
>> From the shell I can list all the tables but cannot scan any. The Monitor is showing tablet count 0 and unassigned tablets 1.
>>
>> HDFS fsck is all healthy.
>>
>> Any suggestions?
>>
>> Thanks
>>
>> -S
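
A quick way to test the scan theory at the top of the thread: Thrift's framed transport reads the first four bytes of an incoming message as a big-endian 32-bit length, so a client that is not speaking Thrift shows up as an implausibly large "frame size". The sketch below is only an illustration in Python (the helper name `frame_size_as_text` is hypothetical, not part of Accumulo or Thrift); it turns the reported number back into the four bytes that produced it.

```python
import struct


def frame_size_as_text(frame_size: int) -> str:
    """Return the four bytes behind a reported frame size, decoded as ASCII.

    Hypothetical helper for illustration only. Bytes outside the ASCII
    range are shown as the Unicode replacement character.
    """
    # Pack the number as a big-endian signed 32-bit integer, which is how
    # a framed transport reads the length prefix off the wire.
    raw = struct.pack(">i", frame_size)
    return raw.decode("ascii", errors="replace")


# Frame size taken from the error message quoted in this thread.
print(repr(frame_size_as_text(1195725856)))  # prints 'GET '
```

The value from this thread decodes to "GET ", the first four bytes of an HTTP GET request. That is consistent with an HTTP-based scanner (or any other HTTP client) connecting to a Thrift port rather than with a genuine 1GB mutation, although it does not by itself identify who sent the request.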