Re: [External] Re: odd issue with accumulo 1.10.0 starting up

2022-03-16 Thread Adam J. Shook
This is certainly anecdotal, but we've seen this "ERROR: Read a frame size
of (large number)" before on our Accumulo cluster that would show up at a
regular and predictable frequency. The root cause was due to a routine scan
done by the security team looking for vulnerabilities across the entire
enterprise (nothing Accumulo-specific). I don't have any additional
information about the specifics of the scan. From all that we can tell, it
has no impact on our Accumulo cluster outside of these error messages.

--Adam
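
For reference, 1195725856 is 0x47455420, which is the four ASCII bytes "GET "
-- consistent with an HTTP client (such as a vulnerability scanner) hitting
the Thrift port and having its first bytes read as a frame size. A quick way
to check this, as a standalone Java sketch (not part of the original thread):

    byte[] bytes = java.nio.ByteBuffer.allocate(4).putInt(1195725856).array();
    // prints "GET " -- the leading bytes of an HTTP request interpreted as a Thrift frame size
    System.out.println(new String(bytes, java.nio.charset.StandardCharsets.US_ASCII));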

On Wed, Mar 16, 2022 at 8:35 AM Christopher  wrote:

> Since that error message is coming from the libthrift library, and not
> Accumulo code, we would need a lot more context to even begin helping you
> troubleshoot it. For example, the complete stack trace that shows the
> Accumulo code that called into the Thrift library, would be extremely
> helpful.
>
> It's a bit concerning that you're trying to send a single buffer over
> thrift that's over a gigabyte in size, according to that number. You've
> said before that you use live ingest. Are you trying to send a 1GB mutation
> to a tablet server? Or are you using replication and the stack trace looks
> like it's sending 1GB of replication data?
>
>
> On Wed, Mar 16, 2022 at 7:14 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
>> Well, I re-initialized accumulo but I still see
>>
>> ERROR: Read a frame size of 1195725856, which is bigger than the maximum
>> allowable buffer size for ALL connections.
>>
>> Is there a setting that I can increase to get past it?
>>
>> -S
>>
>>
>> --
>> *From:* Ligade, Shailesh [USA] 
>> *Sent:* Tuesday, March 15, 2022 12:47 PM
>> *To:* user@accumulo.apache.org 
>> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>>
>> Not daily, but over the weekend.
>> --
>> *From:* Mike Miller 
>> *Sent:* Tuesday, March 15, 2022 10:39 AM
>> *To:* user@accumulo.apache.org 
>> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>>
>> Why are you bringing the cluster down every night? That is not ideal.
>>
>> On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <
>> ligade_shail...@bah.com> wrote:
>>
>> Thanks Mike,
>>
>> We bring the servers down nightly; these are on AWS. This worked
>> yesterday (Monday), but today (Tuesday) I went to check on it and it was
>> down. I guess I didn't check yesterday; I assume it was up since no one
>> complained, but it was up and kicking last week for sure.
>>
>> So I'm not exactly sure when or what caused it. All services are up (tserver,
>> master), so the services are not crashing themselves.
>>
>> I guess worst case, I can re-initialize and recreate the tables from HDFS. :-(
>>
>> -S
>> --
>> *From:* Mike Miller 
>> *Sent:* Tuesday, March 15, 2022 9:16 AM
>> *To:* user@accumulo.apache.org 
>> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>>
>> What was going on in the tserver before you saw that error? Did it finish
>> recovering after the restart? If it is still recovering, I don't think you
>> will be able to do any scans.
>>
>> On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <
>> ligade_shail...@bah.com> wrote:
>>
>> Thanks Mike,
>>
>> That was my first reaction, but the instance is backed up by Puppet and no
>> configuration was updated (I double-checked and ran Puppet manually as well
>> as automatically after the restart). Since the system was operational
>> yesterday, I think I can rule that out.
>>
>> For the other error, I did see the exact error at
>> https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j
>> https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14
>> https://markmail.org/message/bc7ijdsgqmod5p2h
>> but those are for a much older Accumulo, and the server didn't go out of
>> memory, so I think that must have been fixed.
>>

Re: [External] RE: accumulo 1.10 replication issue

2021-09-23 Thread Adam J. Shook
Yes, if it is not heavily used then you may see a significant delay. You
can change the defaults using tserver.walog.max.age [1] and
tserver.walog.max.size [2]. If I recall you can change these via the shell
and a restart is not required.

If you aren't seeing much ingestion, then the max age would be what you
want to set to ensure data is replicated within the window you want it to
be replicated.  Please keep in mind that setting either of these values to
very low thresholds will cause the WALs to roll over frequently and could
negatively impact the system, particularly for large Accumulo clusters.

In my experience, using Accumulo replication is a good fit when you have
longer SLAs on replication.  If you are looking for anything in the
near-real-time realm (milliseconds to seconds to maybe even a few minutes),
you'd be better off double writing to multiple Accumulo instances.

--Adam

[1]
https://accumulo.apache.org/1.10/accumulo_user_manual.html#_tserver_walog_max_age
[2]
https://accumulo.apache.org/1.10/accumulo_user_manual.html#_tserver_walog_max_size
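
As a sketch, changing those two properties from the Accumulo shell would look
something like this (the values are only illustrative, not recommendations):

    config -s tserver.walog.max.age=4h
    config -s tserver.walog.max.size=256M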

On Thu, Sep 23, 2021 at 1:31 PM Ligade, Shailesh [USA] <
ligade_shail...@bah.com> wrote:

> Thanks Adam,
>
>
>
> The system is not heavily used. Does that mean it will wait for 1 GB of data
> in the WAL file (or 24 hours) before it replicates?
>
>
>
> I don’t see any errors in any log: source master/tserver or target
> master/tserver.
>
>
>
> The Monitor replication page has the correct status, and once in a while I
> see the In-Progress Replication section flashing by. But I don’t see any new
> data in the target table. ☹
>
>
>
> -S
>
>
>
> *From:* Adam J. Shook 
> *Sent:* Thursday, September 23, 2021 12:10 PM
> *To:* user@accumulo.apache.org
> *Subject:* Re: [External] RE: accumulo 1.10 replication issue
>
>
>
> Yes, inserting via the shell will be enough to test it.
>
>
>
> Note that the replication system uses the write-ahead logs (WAL) to
> replicate the data.  These logs must be closed before any replication can
> occur, so there will be a delay before it shows up in the peer table.  How
> long of a delay depends on how much data is actively being written to the
> TabletServers (and therefore the WAL) and/or how much time has passed since
> the WAL was opened. The default max WAL data size is 1 GB and the max age
> is 24 hours.
>
>
>
> --Adam
>
>
>
> On Thu, Sep 23, 2021 at 11:13 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Thanks Adam,
>
>
>
> I am setting the accumulo.name property in accumulo-site.xml. I think this
> property must be set to the “Instance Name” value; I tried setting it to
> “primary” and I saw an error stating that the instance id was not found in
> ZooKeeper.
>
>
>
> I have few tables to replicate so I am thinking I will set all others
> properties using shell config command
>
>
>
> To test this, I just insert value using shell right? Or do I need to flush
> or compact on the table to see those values on the other side?
>
>
>
> -S
>
>
>
> *From:* Adam J. Shook 
> *Sent:* Thursday, September 23, 2021 11:08 AM
> *To:* user@accumulo.apache.org
> *Subject:* Re: [External] RE: accumulo 1.10 replication issue
>
>
>
> Your configurations look correct to me, and it sounds like it is partially
> working as you are seeing files that need replicated in the Accumulo
> Monitor. I do have the replication.name
> and all replication.peer.* properties defined in accumulo-site.xml. Do you
> have all these properties defined there?  If not, try setting them in
> accumulo-site.xml and restarting your Accumulo services, particularly the
> Master and TabletServers.  The Master may not be queuing work and/or the
> TabletServers may not be looking for work.
>
>
>
> You should see DEBUG-level logs in the TabletServers that say "Looking for
> work in /accumulo//replication/workqueue", so enable debug
> logging if you haven't done so already in the generic_logger.xml file.
>
>
>
> --Adam
>
>
>
> On Thu, Sep 23, 2021 at 6:53 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Thanks for reply
>
>
>
> I am using insert command from shell to insert data.
>
>
>
> Also, a quick question, replication.name
> property can it be set using cli? Will that work or it must 

Re: [External] RE: accumulo 1.10 replication issue

2021-09-23 Thread Adam J. Shook
Yes, inserting via the shell will be enough to test it.

Note that the replication system uses the write-ahead logs (WAL) to
replicate the data.  These logs must be closed before any replication can
occur, so there will be a delay before it shows up in the peer table.  How
long of a delay depends on how much data is actively being written to the
TabletServers (and therefore the WAL) and/or how much time has passed since
the WAL was opened. The default max WAL data size is 1 GB and the max age
is 24 hours.

--Adam
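
For example, a minimal test write from the shell on the source table could
look like this (the row and column names are placeholders):

    table source
    insert row1 fam qual value1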

On Thu, Sep 23, 2021 at 11:13 AM Ligade, Shailesh [USA] <
ligade_shail...@bah.com> wrote:

> Thanks Adam,
>
>
>
> I am setting the accumulo.name property in accumulo-site.xml. I think this
> property must be set to the “Instance Name” value; I tried setting it to
> “primary” and I saw an error stating that the instance id was not found in
> ZooKeeper.
>
>
>
> I have few tables to replicate so I am thinking I will set all others
> properties using shell config command
>
>
>
> To test this, I just insert value using shell right? Or do I need to flush
> or compact on the table to see those values on the other side?
>
>
>
> -S
>
>
>
> *From:* Adam J. Shook 
> *Sent:* Thursday, September 23, 2021 11:08 AM
> *To:* user@accumulo.apache.org
> *Subject:* Re: [External] RE: accumulo 1.10 replication issue
>
>
>
> Your configurations look correct to me, and it sounds like it is partially
> working as you are seeing files that need replicated in the Accumulo
> Monitor. I do have the replication.name
> and all replication.peer.* properties defined in accumulo-site.xml. Do you
> have all these properties defined there?  If not, try setting them in
> accumulo-site.xml and restarting your Accumulo services, particularly the
> Master and TabletServers.  The Master may not be queuing work and/or the
> TabletServers may not be looking for work.
>
>
>
> You should see DEBUG-level logs in the TabletServers that say "Looking for
> work in /accumulo//replication/workqueue", so enable debug
> logging if you haven't done so already in the generic_logger.xml file.
>
>
>
> --Adam
>
>
>
> On Thu, Sep 23, 2021 at 6:53 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Thanks for reply
>
>
>
> I am using insert command from shell to insert data.
>
>
>
> Also, a quick question: can the replication.name property be set using the
> cli? Will that work, or must it be defined in accumulo-site.xml?
>
>
>
> Thanks
>
> -S
>
>
>
>
>
> *From:* d...@etcoleman.com 
> *Sent:* Thursday, September 23, 2021 6:50 AM
> *To:* user@accumulo.apache.org
> *Subject:* [External] RE: accumulo 1.10 replication issue
>
>
>
> How are you inserting the data?
>
>
>
> *From:* Ligade, Shailesh [USA] 
> *Sent:* Wednesday, September 22, 2021 10:22 PM
> *To:* user@accumulo.apache.org
> *Subject:* accumulo 1.10 replication issue
>
>
>
> Hello,
>
>
>
> I am following
>
> Apache Accumulo® User Manual Version 1.10
> <https://accumulo.apache.org/1.10/accumulo_user_manual.html#_replication>
>
>
>
> I want to setup replication from accumulo instance inst1, table source, TO
> inst2, table target
>
> I created a replication user,( same password) on both instances and grant
> Table.READ/WRITE for source and target respectively
>
>
>
> I set the replication.name property to be the same as inst on both
> instances.
>
>
>
> On inst1 Set following properties
>
>
>
>
> replication.peer.inst1=org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,inst2,inst2zoo1:2181,inst2zoo2:2181,inst2zoo3:2181
>
> replication.peer.user.inst2=replication
>
> replication.peer.password.inst2=replication
>
>
>
> set the source table for replication
>
> config -t source -s table.replication=true
>
> config -t source -s table.replication.target.inst2=(number I got for
> target table from inst2 tables -l command)
>
>
>
> and finally I did
>
> online accumulo.replication
>
>
>
> Now when I insert data in source, I get "files needing replication: 1" in
> the monitor replication section. All other values are correct: TABLE –
> source, PEER – inst2, REMOTE ID as the number I set.
>
>
>
> However, my In-Progress Replication section always stays empty and I don’t
> see any data in the inst2 target table.
>
>
>
> No errors that I can see in master log or tserver log where tablet exist.
>
>
>
> Any idea what may be wrong? Is there any way to debug this?
>
>
>
> -S
>
>
>
>
>
>
>
>


Re: [External] RE: accumulo 1.10 replication issue

2021-09-23 Thread Adam J. Shook
Your configurations look correct to me, and it sounds like it is partially
working as you are seeing files that need replicated in the Accumulo
Monitor. I do have the replication.name and all replication.peer.*
properties defined in accumulo-site.xml. Do you have all these properties
defined there?  If not, try setting them in accumulo-site.xml and
restarting your Accumulo services, particularly the Master and
TabletServers.  The Master may not be queuing work and/or the TabletServers
may not be looking for work.

You should see DEBUG-level logs in the TabletServers that say "Looking for
work in /accumulo//replication/workqueue", so enable debug
logging if you haven't done so already in the generic_logger.xml file.

--Adam
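
As a rough sketch, the accumulo-site.xml entries described above would look
something like the following (instance names, peer name, and ZooKeeper hosts
are taken from elsewhere in the thread and are placeholders; note that the
peer name used in replication.peer.<name> should match the name used in the
replication.peer.user.*, replication.peer.password.*, and
table.replication.target.* properties):

  <property>
    <name>replication.name</name>
    <value>inst1</value>
  </property>
  <property>
    <name>replication.peer.inst2</name>
    <value>org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,inst2,inst2zoo1:2181,inst2zoo2:2181,inst2zoo3:2181</value>
  </property>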

On Thu, Sep 23, 2021 at 6:53 AM Ligade, Shailesh [USA] <
ligade_shail...@bah.com> wrote:

> Thanks for reply
>
>
>
> I am using insert command from shell to insert data.
>
>
>
> Also, a quick question: can the replication.name property be set using the
> cli? Will that work, or must it be defined in accumulo-site.xml?
>
>
>
> Thanks
>
> -S
>
>
>
>
>
> *From:* d...@etcoleman.com 
> *Sent:* Thursday, September 23, 2021 6:50 AM
> *To:* user@accumulo.apache.org
> *Subject:* [External] RE: accumulo 1.10 replication issue
>
>
>
> How are you inserting the data?
>
>
>
> *From:* Ligade, Shailesh [USA] 
> *Sent:* Wednesday, September 22, 2021 10:22 PM
> *To:* user@accumulo.apache.org
> *Subject:* accumulo 1.10 replication issue
>
>
>
> Hello,
>
>
>
> I am following
>
> Apache Accumulo® User Manual Version 1.10
> <https://accumulo.apache.org/1.10/accumulo_user_manual.html#_replication>
>
>
>
> I want to setup replication from accumulo instance inst1, table source, TO
> inst2, table target
>
> I created a replication user,( same password) on both instances and grant
> Table.READ/WRITE for source and target respectively
>
>
>
> I set replication.name property to be same as inst on both instances
>
>
>
> On inst1 Set following properties
>
>
>
>
> replication.peer.inst1=org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,inst2,inst2zoo1:2181,inst2zoo2:2181,inst2zoo3:2181
>
> replication.peer.user.inst2=replication
>
> replication.peer.password.inst2=replication
>
>
>
> set the source table for replication
>
> config -t source -s table.replication=true
>
> config -t source -s table.replication.target.inst2=(number I got for
> target table from inst2 tables -l command)
>
>
>
> and finally I did
>
> online accumulo.replication
>
>
>
> Now when I insert data in source, I get "files needing replication: 1" in
> the monitor replication section. All other values are correct: TABLE –
> source, PEER – inst2, REMOTE ID as the number I set.
>
>
>
> However, my In-Progress Replication section always stays empty and I don’t
> see any data in the inst2 target table.
>
>
>
> No errors that I can see in master log or tserver log where tablet exist.
>
>
>
> Any idea what may be wrong? Is there any way to debug this?
>
>
>
> -S
>
>
>
>
>
>
>


Re: Noob questions

2020-04-14 Thread Adam J. Shook
limitVersion = false would *not* set the default VersioningIterator,
effectively keeping every entry you write to Accumulo.  Sounds like it hits
your requirement of "versions never to be removed", though keep in mind
that your static "metadata" qualifier would also never be versioned/deleted.
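
A minimal sketch of that call, assuming an existing Connector named connector
(the table name is a placeholder):

    // limitVersion = false: create the table without the default VersioningIterator,
    // so every version of every entry is retained unless explicitly deleted or aged off.
    connector.tableOperations().create("entity_history", false);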

On Mon, Apr 13, 2020 at 8:47 PM Niclas Hedhman  wrote:

> Ah! I had some misunderstandings implanted in me, and good to get
> corrected.
>
> For
>
> connector.tableOperations.create(String tableName, boolean limitVersion);
>
>
> Will limitVersion=false disable versioning completely and I will always
> only have one version, or will it have a "no limit" and "no removal" policy
> of versions?
>
> Well, to be clear, I am looking for "versions never to be removed", a
> requirement that made me smile and remember "Accumulo can do that
> automatically", rather than implement that at a higher level.
>
> Thanks
>
> On Tue, Apr 14, 2020 at 12:55 AM Adam J. Shook 
> wrote:
>
>> Hi Niclas,
>>
>> 1. Accumulo uses a VersioningIterator for all tables which ensures that
>> you see the latest version of a particular entry, defined as the entry that
>> has the highest value for the timestamp.  Older versions of the same key
>> (row ID + family + qualifier + visibility) are compacted away by Accumulo
>> and will eventually be deleted.  You can set the number of versions you
>> want to keep to something other than the default of 1 (see
>> https://accumulo.apache.org/1.9/accumulo_user_manual.html#_versioning_iterators_and_timestamps
>> ).
>>
>> 2. Related to #1, Accumulo will update the value to the latest version of
>> entry.  I believe if you keep writing the same entry with the same data
>> over and over again, you'll see them if you are keeping more than one
>> version of the same entry.  AFAIK there is no "put if absent" behavior
>> without reading for every write.  You can, of course, configure an existing
>> iterator or write your own to achieve whatever logic you want as far as
>> what versions to keep of what columns of your data model.
>>
>> 3. The "Scanner" will return entries in order.  Related to #1, it will
>> only return the latest version of an entry (by default).  If you are
>> keeping more versions of the same entry, then you would see the newest
>> entry first.  The "BatchScanner" is multi-threaded and communicates to
>> several tablets at once, returning entries out of order.  One common
>> pattern is to use the WholeRowIterator when scanning.  This iterator
>> serializes all entries with the same row into one entry on the server side,
>> then you can deserialize the row on the client side to view the entire
>> contents of a row at once.  The order of the rows themselves is still
>> undefined when using a BatchScanner due to the multi-threaded nature of the
>> scanner.
>>
>> Hope this helps!
>> --Adam
>>
>> On Mon, Apr 13, 2020 at 12:57 AM Niclas Hedhman 
>> wrote:
>>
>>> Hi,
>>> I am steaming new on Accumulo, but tasked to put it into what used to be
>>> Apache Polygene (now in Attic) as an entity store, one that keeps history.
>>>
>>> I have a couple of questions;
>>> 1. Assuming that I can guarantee that no one executes any explicit
>>> deletes, can I rely on the mutation sequences not disappearing over time?
>>>
>>> 2. Part of storing a row, I have a "metadata" qualifier, that contains
>>> static information. But since I don't know whether the row exists without
>>> reading it first, then IIUIC I will fill the "metadata" with the same
>>> information over and over again OR, does Accumulo realize that this is
>>> the same byte[] as before and won't update the value, alternatively
>>> creating a new Key, but pointing to the same Value?  I effectively want a
>>> "putIfAbsent()"
>>>
>>> 3. The Scanner can fetch multiple rows, and constrained by CF and
>>> qualifier. I think that is quite clear. But what does the iterator()
>>> actually return? I presume that it is many key/value pairs, of ALL
>>> timestamped values. But what are the order guarantees here? I get the
>>> impression that within a row->cf->qualifier, the returned values are in
>>> timestamp order, newest first. And I think that within a row, I am
>>> guaranteed that the order is maintained, i.e. row -> cf -> qualifier (all
>>> ascending). But am I also guaranteed that the iterator is "done" with a row
>>> when the row has changed? Or can rows be interleaved in the iterator?
>>>
>>> Thanks in advance
>>> Niclas
>>>
>>


Re: Noob questions

2020-04-13 Thread Adam J. Shook
Hi Niclas,

1. Accumulo uses a VersioningIterator for all tables which ensures that you
see the latest version of a particular entry, defined as the entry that has
the highest value for the timestamp.  Older versions of the same key (row
ID + family + qualifier + visibility) are compacted away by Accumulo and
will eventually be deleted.  You can set the number of versions you want to
keep to something other than the default of 1 (see
https://accumulo.apache.org/1.9/accumulo_user_manual.html#_versioning_iterators_and_timestamps
).
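
For example, keeping the three most recent versions at scan, minor compaction,
and major compaction time can be done from the shell like this (a sketch; the
table name is a placeholder, and the property names follow the manual section
linked above):

    config -t mytable -s table.iterator.scan.vers.opt.maxVersions=3
    config -t mytable -s table.iterator.minc.vers.opt.maxVersions=3
    config -t mytable -s table.iterator.majc.vers.opt.maxVersions=3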

2. Related to #1, Accumulo will update the value to the latest version of
entry.  I believe if you keep writing the same entry with the same data
over and over again, you'll see them if you are keeping more than one
version of the same entry.  AFAIK there is no "put if absent" behavior
without reading for every write.  You can, of course, configure an existing
iterator or write your own to achieve whatever logic you want as far as
what versions to keep of what columns of your data model.

3. The "Scanner" will return entries in order.  Related to #1, it will only
return the latest version of an entry (by default).  If you are keeping
more versions of the same entry, then you would see the newest entry
first.  The "BatchScanner" is multi-threaded and communicates to several
tablets at once, returning entries out of order.  One common pattern is to
use the WholeRowIterator when scanning.  This iterator serializes all
entries with the same row into one entry on the server side, then you can
deserialize the row on the client side to view the entire contents of a row
at once.  The order of the rows themselves is still undefined when using a
BatchScanner due to the multi-threaded nature of the scanner.

Hope this helps!
--Adam
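
A minimal sketch of the WholeRowIterator pattern from point 3, assuming an
existing Connector named connector and a table called "mytable" (classes come
from the org.apache.accumulo.core.client, core.data, core.security, and
core.iterators.user packages):

    BatchScanner scanner = connector.createBatchScanner("mytable", Authorizations.EMPTY, 4);
    scanner.setRanges(Collections.singleton(new Range()));  // scan the whole table
    // serialize each row into a single key/value pair on the server side
    scanner.addScanIterator(new IteratorSetting(50, "wholeRow", WholeRowIterator.class));
    for (Map.Entry<Key,Value> entry : scanner) {
      // decode back into the row's individual entries on the client side
      // (WholeRowIterator.decodeRow can throw IOException)
      SortedMap<Key,Value> row = WholeRowIterator.decodeRow(entry.getKey(), entry.getValue());
      // ... process one whole row at a time ...
    }
    scanner.close();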

On Mon, Apr 13, 2020 at 12:57 AM Niclas Hedhman  wrote:

> Hi,
> I am steaming new on Accumulo, but tasked to put it into what used to be
> Apache Polygene (now in Attic) as an entity store, one that keeps history.
>
> I have a couple of questions;
> 1. Assuming that I can guarantee that no one executes any explicit
> deletes, can I rely on the mutation sequences not disappearing over time?
>
> 2. Part of storing a row, I have a "metadata" qualifier, that contains
> static information. But since I don't know whether the row exists without
> reading it first, then IIUIC I will fill the "metadata" with the same
> information over and over again OR, does Accumulo realize that this is
> the same byte[] as before and won't update the value, alternatively
> creating a new Key, but pointing to the same Value?  I effectively want a
> "putIfAbsent()"
>
> 3. The Scanner can fetch multiple rows, and constrained by CF and
> qualifier. I think that is quite clear. But what does the iterator()
> actually return? I presume that it is many key/value pairs, of ALL
> timestamped values. But what are the order guarantees here? I get the
> impression that within a row->cf->qualifier, the returned values are in
> timestamp order, newest first. And I think that within a row, I am
> guaranteed that the order is maintained, i.e. row -> cf -> qualifier (all
> ascending). But am I also guaranteed that the iterator is "done" with a row
> when the row has changed? Or can rows be interleaved in the iterator?
>
> Thanks in advance
> Niclas
>


Re: Accumulo Tracer?

2020-02-28 Thread Adam J. Shook
I've used the Accumulo Tracer API before to help identify bottlenecks in my
scans.  You can find the most recent traces in the Accumulo Monitor UI, and
there are also some tools you can use to view the contents of the trace
table.  See section 18.10.4 "Viewing Collected Traces" at
https://accumulo.apache.org/1.9/accumulo_user_manual.html.
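
As a rough illustration, tracing a single scan from the shell looks something
like this (a sketch, not from the thread; the table name and row are
placeholders):

    trace on
    scan -t mytable -r somerow
    trace off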

On Fri, Feb 28, 2020 at 6:44 PM mhd wrk  wrote:

> Hi,
>
> Our Accumulo deployment uses custom Authenticator and Authorizer and also
> attaches few custom filters/iterators to tables during scan time. The
> challenge is that we are seeing very slow  response when loading the table
> inside a spark shell and doing a simple count.
> I was thinking of adding logs to all our custom components to collect
> metrics then I came across Accumulo Tracer which seems, somehow, targets
> the same concerns but requires its own custom coding and also, so far, I
> don't find the content of the trace table very easy to read/interpret.
>
> Any suggestions/recommendations?
>
> Thanks,
>


Re: upgrading from 1.8.x to 1.9.x

2019-06-10 Thread Adam J. Shook
1.9.x is effectively the continuation of the 1.8.x bug fix releases.  I've
also upgraded several clusters from 1.8 to 1.9.  There were some issues
identified and fixed in the interim 1.9.x versions that you may experience,
so I would recommend upgrading directly to the latest 1.9.3.

On Mon, Jun 10, 2019 at 3:08 PM Mike Miller  wrote:

> I can't say I have done it personally but 1.8 and 1.9 are very similar so
> you "shouldn't" have any problems.
>
> Here are the upgrade instructions:
> https://accumulo.apache.org/docs/2.x/administration/upgrading
>
> On Mon, Jun 10, 2019 at 2:33 PM Bulldog20630405 
> wrote:
>
>>
>> I did not see any upgrading instructions in the 1.9 user guide.
>>
>> Are there any surprises?
>>
>> Can I simply shut down 1.8.x and replace the binaries and then start up the
>> 1.9 services?
>>
>>
>>


Re: upgrading from Accumulo 1.8.1 to 1.9.2

2019-03-18 Thread Adam J. Shook
It is possible to do a rolling upgrade from 1.8.1 to 1.9.2 -- no need to
shut down the whole cluster.  The 1.9 series is effectively a continuation
of the 1.8 line, however some client methods were deprecated which caused
it to be a 1.9 release per the laws of semantic versioning.  If your code
is using the deprecated ClientConfiguration constructors, you'll need to
update to use the new static methods before upgrading to Accumulo 2.x.  See
release notes regarding the deprecated code here:
https://accumulo.apache.org/release/accumulo-1.9.0/#deprecated-clientconfiguration-api-using-commons-config

Cheers,
--Adam
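
As a small illustration of that client-side change (a sketch; the instance
name and ZooKeeper hosts are placeholders):

    // instead of the deprecated commons-config constructors, use the static
    // factory methods, e.g.:
    ClientConfiguration conf = ClientConfiguration.loadDefault()
        .withInstance("inst1")
        .withZkHosts("zk1:2181,zk2:2181,zk3:2181");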

On Mon, Mar 18, 2019 at 2:23 PM Thomas Adam  wrote:

> Hi,
>
> I was wondering if it is possible to do a rolling upgrade of Accumulo
> from 1.8.1 to 1.9.2. It says that 1.9.x adds additional API calls and
> bug fixes, so I wasn't sure if I need to shut down the whole cluster to
> do the upgrade and it doesn't seem to mention this upgrade in the
> documentation online.
>
> Thank you for your time,
>
> Thomas
>
>


Re: Corrupt WAL

2018-08-22 Thread Adam J. Shook
The code referenced in the PR works to detect and move a WAL, replacing it
with an empty one, but isn't fully wrapped up/merged.  Some priorities were
shifted and this got pushed back, though I do plan on addressing the
comments in the code review Soon™.

I'd suggest upgrading to 1.9.2 once you resolve the issue.  We've been
running it for a while and have not had any WAL-related errors.

--Adam
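
For anyone following along, the WAL inspection step discussed further down the
thread is run against the WAL file in HDFS, roughly like this (the path layout
and mutation count are assumptions/placeholders, not from the thread):

    accumulo org.apache.accumulo.tserver.logger.LogReader -m 25 \
        hdfs://namenode:8020/accumulo/wal/<tserver host+port>/<wal uuid>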

On Tue, Aug 21, 2018 at 6:58 PM Ed Coleman  wrote:

> There has been work done in https://github.com/apache/accumulo/pull/574.
> I'm not certain of the state of the code, but the description may provide
> you with things that you could look at manually.
>
>
> -Original Message-
> From: tech.s...@gmail.com [mailto:tech.s...@gmail.com]
> Sent: Tuesday, August 21, 2018 5:45 PM
> To: user@accumulo.apache.org
> Subject: Re: Corrupt WAL
>
> Was there any success with this workaround strategy?  I am also
> experiencing this issue.
>
> On 2018/06/13 16:30:22, "Adam J. Shook"  wrote:
> > Sorry, I had the error backwards.  There is an OPEN for the WAL and
> > then immediately a COMPACTION_FINISH entry.  This would cause the error.
> >
> > On Wed, Jun 13, 2018 at 11:34 AM, Adam J. Shook 
> > wrote:
> >
> > > Looking at the log I see that the last two entries are
> > > COMPACTION_START of one RFile immediately followed by a
> > > COMPACTION_START of a separate RFile which (I believe) would lead to
> > > the error.  Would this necessarily be an issue if the compactions are
> for separate RFiles?
> > >
> > > This is a dev cluster and I don't necessarily care about it, but is
> > > there a (good) means to do WAL log surgery?  I imagine I can just
> > > chop off bytes until the log is parseable and missing the info about
> the compactions.
> > >
> > > On Tue, Jun 12, 2018 at 2:32 PM, Keith Turner 
> wrote:
> > >
> > >> On Tue, Jun 12, 2018 at 12:10 PM, Adam J. Shook
> > >> 
> > >> wrote:
> > >> > Yes, that is the error.  I'll inspect the logs and report back.
> > >>
> > >> Ok.  The LogReader command has a mechanism to filter which tablet
> > >> is displayed.  If the walog has  alot of data in it, may need to
> > >> use this.
> > >>
> > >> Also, be aware that only 5 mutations are shown for a "many mutations"
> > >> objects in the walog.   The -m options changes this.  May want to see
> > >> more when deciding if the info in the log is important.
> > >>
> > >>
> > >> >
> > >> > On Tue, Jun 12, 2018 at 10:14 AM, Keith Turner 
> > >> wrote:
> > >> >>
> > >> >> Is the message you are seeing "COMPACTION_FINISH (without
> > >> >> preceding COMPACTION_START)" ?  That messages indicates that the
> > >> >> WALs are incomplete, probably as a result of the NN problems.
> > >> >> Could do the following :
> > >> >>
> > >> >> 1) Run the following command to see whats in the log.  Need to
> > >> >> see what is there for the root tablet.
> > >> >>
> > >> >>accumulo org.apache.accumulo.tserver.logger.LogReader
> > >> >>
> > >> >> 2) Replace the log file with an empty file after seeing if there
> > >> >> is anything important in it.
> > >> >>
> > >> >> I think the list of WALs for the root tablet is stored in ZK at
> > >> >> /accumulo//walogs
> > >> >>
> > >> >> On Mon, Jun 11, 2018 at 5:26 PM, Adam J. Shook
> > >> >> 
> > >> >> wrote:
> > >> >> > Hey all,
> > >> >> >
> > >> >> > The root tablet on one of our dev systems isn't loading due to
> > >> >> > an illegal state exception -- COMPACTION_FINISH preceding
> > >> >> > COMPACTION_START.
> > >> What'd
> > >> >> > be
> > >> >> > the best way to mitigate this issue?  This was likely caused
> > >> >> > due to
> > >> both
> > >> >> > of
> > >> >> > our NameNodes failing.
> > >> >> >
> > >> >> > Thank you,
> > >> >> > --Adam
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>
>


Re: Corrupt WAL

2018-06-13 Thread Adam J. Shook
Sorry, I had the error backwards.  There is an OPEN for the WAL and then
immediately a COMPACTION_FINISH entry.  This would cause the error.

On Wed, Jun 13, 2018 at 11:34 AM, Adam J. Shook 
wrote:

> Looking at the log I see that the last two entries are COMPACTION_START of
> one RFile immediately followed by a COMPACTION_START of a separate RFile
> which (I believe) would lead to the error.  Would this necessarily be an
> issue if the compactions are for separate RFiles?
>
> This is a dev cluster and I don't necessarily care about it, but is there
> a (good) means to do WAL log surgery?  I imagine I can just chop off bytes
> until the log is parseable and missing the info about the compactions.
>
> On Tue, Jun 12, 2018 at 2:32 PM, Keith Turner  wrote:
>
>> On Tue, Jun 12, 2018 at 12:10 PM, Adam J. Shook 
>> wrote:
>> > Yes, that is the error.  I'll inspect the logs and report back.
>>
>> Ok.  The LogReader command has a mechanism to filter which tablet is
>> displayed.  If the walog has  alot of data in it, may need to use
>> this.
>>
>> Also, be aware that only 5 mutations are shown for "many mutations"
>> objects in the walog.  The -m option changes this.  May want to see
>> more when deciding if the info in the log is important.
>>
>>
>> >
>> > On Tue, Jun 12, 2018 at 10:14 AM, Keith Turner 
>> wrote:
>> >>
>> >> Is the message you are seeing "COMPACTION_FINISH (without preceding
>> >> COMPACTION_START)" ?  That messages indicates that the WALs are
>> >> incomplete, probably as a result of the NN problems.  Could do the
>> >> following :
>> >>
>> >> 1) Run the following command to see whats in the log.  Need to see
>> >> what is there for the root tablet.
>> >>
>> >>accumulo org.apache.accumulo.tserver.logger.LogReader
>> >>
>> >> 2) Replace the log file with an empty file after seeing if there is
>> >> anything important in it.
>> >>
>> >> I think the list of WALs for the root tablet is stored in ZK at
>> >> /accumulo//walogs
>> >>
>> >> On Mon, Jun 11, 2018 at 5:26 PM, Adam J. Shook 
>> >> wrote:
>> >> > Hey all,
>> >> >
>> >> > The root tablet on one of our dev systems isn't loading due to an
>> >> > illegal
>> >> > state exception -- COMPACTION_FINISH preceding COMPACTION_START.
>> What'd
>> >> > be
>> >> > the best way to mitigate this issue?  This was likely caused due to
>> both
>> >> > of
>> >> > our NameNodes failing.
>> >> >
>> >> > Thank you,
>> >> > --Adam
>> >
>> >
>>
>
>


Re: Corrupt WAL

2018-06-11 Thread Adam J. Shook
The WAL is from 1.9.1.

On Mon, Jun 11, 2018 at 6:33 PM, Christopher  wrote:

> That's what I was thinking it was related to. Do you know if the
> particular WAL file was created from a previous version, from before you
> upgraded?
>
> On Mon, Jun 11, 2018 at 6:00 PM Adam J. Shook 
> wrote:
>
>> Sorry would have been good to include that :)  It's the newest 1.9.1.  I
>> think it relates to https://github.com/apache/accumulo/pull/458, just
>> not sure what the best thing to do here is.
>>
>> On Mon, Jun 11, 2018 at 5:46 PM, Christopher  wrote:
>>
>>> What version are you using?
>>>
>>> On Mon, Jun 11, 2018 at 5:27 PM Adam J. Shook 
>>> wrote:
>>>
>>>> Hey all,
>>>>
>>>> The root tablet on one of our dev systems isn't loading due to an
>>>> illegal state exception -- COMPACTION_FINISH preceding COMPACTION_START.
>>>> What'd be the best way to mitigate this issue?  This was likely caused due
>>>> to both of our NameNodes failing.
>>>>
>>>> Thank you,
>>>> --Adam
>>>>
>>>
>>


Corrupt WAL

2018-06-11 Thread Adam J. Shook
Hey all,

The root tablet on one of our dev systems isn't loading due to an illegal
state exception -- COMPACTION_FINISH preceding COMPACTION_START.  What'd be
the best way to mitigate this issue?  This was likely caused due to both of
our NameNodes failing.

Thank you,
--Adam


Re: Question on missing RFiles

2018-05-16 Thread Adam J. Shook
Thanks for all of your help.  We have a peer cluster that we'll be using to
do some data reconciliation.

On Wed, May 16, 2018 at 11:29 AM, Michael Wall <mjw...@gmail.com> wrote:

> Since the rfiles on disk are "later" than the ones referenced, I tend to
> think old metadata got rewritten.  Since you can't get a timeline to better
> understand what happened, the only thing I can think of is to reingest all
> data since a known good point.  And then do things to make the future better,
> like tweaking which logs you save and upgrading to 1.9.1.  Sorry, I wish I
> had better answers for you.
>
>
> On Wed, May 16, 2018 at 11:25 AM Adam J. Shook <adamjsh...@gmail.com>
> wrote:
>
>> I tried building a timeline but the logs are just not there.  We weren't
>> sending the debug logs to Splunk due to the verbosity, but we may be
>> tweaking the log4j settings a bit to make sure we get the log data stored
>> in the event this happens again.  This very well could be attributed to the
>> recovery failure; hard to say.  I'll be upgrading to 1.9.1 soon.
>>
>> On Mon, May 14, 2018 at 8:53 AM, Michael Wall <mjw...@gmail.com> wrote:
>>
>>> Can you pick some of the files that are missing and search through your
>>> logs to put together a timeline?  See if you can find that file for a
>>> specific tablet.  Then grab all the logs for when a file was created as a
>>> result of a compaction, and when a file was included in compaction for
>>> that table.  Follow compactions for that tablet until you started getting
>>> errors.  Then see what logs you have for WAL replay during that time for
>>> that tablet and the metadata and can try to correlate.
>>>
>>> It's a shame you don't have the GC logs.  If you saw it was GC'd then
>>> showed up in the metadata table again that would help explain what
>>> happened.  Like Christopher mentioned, this could be related to a recovery
>>> failure.
>>>
>>> Mike
>>>
>>> On Sat, May 12, 2018 at 5:26 PM Adam J. Shook <adamjsh...@gmail.com>
>>> wrote:
>>>
>>>> WALs are turned on.  Durability is set to flush for all tables except
>>>> for root and metadata which are sync.  The current rfile names on HDFS
>>>> and in the metadata table are greater than the files that are missing.
>>>>  Searched through all of our current and historical logs in Splunk (which
>>>> are only INFO level or higher).  Issues from the logs:
>>>>
>>>> * Problem reports saying the files are not found
>>>> * IllegalStateException saying the rfile is closed when it tried to
>>>> load the Bloom filter (likely the flappy DataNode)
>>>> * IOException when reading the file saying Stream is closed (likely the
>>>> flappy DataNode)
>>>>
>>>> Nothing in the GC logs -- all the above errors are in the tablet server
>>>> logs.  The logs may have rolled over, though, and our debug logs don't make
>>>> it into Splunk.
>>>>
>>>> --Adam
>>>>
>>>> On Fri, May 11, 2018 at 6:16 PM, Christopher <ctubb...@apache.org>
>>>> wrote:
>>>>
>>>>> Oh, it occurs to me that this may be related to the WAL bugs that
>>>>> Keith fixed for 1.9.1... which could affect the metadata table recovery
>>>>> after a failure.
>>>>>
>>>>> On Fri, May 11, 2018 at 6:11 PM Michael Wall <mjw...@gmail.com> wrote:
>>>>>
>>>>>> Adam,
>>>>>>
>>>>>> Do you have GC logs?  Can you see if those missing RFiles were
>>>>>> removed by the GC process?  That could indicate you somehow got old
>>>>>> metadata info replayed.  Also, the rfiles increment so compare the 
>>>>>> current
>>>>>> rfile names in the srv.dir directory vs what is in the metadata table.  
>>>>>> Are
>>>>>> the existing files after files in the metadata.  Finally, pick a few of 
>>>>>> the
>>>>>> missing files and grep all your master and tserver logs to see if you can
>>>>>> learn anything.  This sounds ungood.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> On Fri, May 11, 2018 at 6:06 PM Christopher <ctubb...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> This is strange. I've only ever seen this when HDFS has reported
>>>>>>> problems, such

Re: Question on missing RFiles

2018-05-16 Thread Adam J. Shook
I tried building a timeline but the logs are just not there.  We weren't
sending the debug logs to Splunk due to the verbosity, but we may be
tweaking the log4j settings a bit to make sure we get the log data stored
in the event this happens again.  This very well could be attributed to the
recovery failure; hard to say.  I'll be upgrading to 1.9.1 soon.

On Mon, May 14, 2018 at 8:53 AM, Michael Wall <mjw...@gmail.com> wrote:

> Can you pick some of the files that are missing and search through your
> logs to put together a timeline?  See if you can find that file for a
> specific tablet.  Then grab all the logs for when a file was created as a
> result of a compaction, and when a file was included in compaction for
> that table.  Follow compactions for that tablet until you started getting
> errors.  Then see what logs you have for WAL replay during that time for
> that tablet and the metadata and can try to correlate.
>
> It's a shame you don't have the GC logs.  If you saw it was GC'd then
> showed up in the metadata table again that would help explain what
> happened.  Like Christopher mentioned, this could be related to a recovery
> failure.
>
> Mike
>
> On Sat, May 12, 2018 at 5:26 PM Adam J. Shook <adamjsh...@gmail.com>
> wrote:
>
>> WALs are turned on.  Durability is set to flush for all tables except for
>> root and metadata which are sync.  The current rfile names on HDFS and
>> in the metadata table are greater than the files that are missing.
>>  Searched through all of our current and historical logs in Splunk (which
>> are only INFO level or higher).  Issues from the logs:
>>
>> * Problem reports saying the files are not found
>> * IllegalStateException saying the rfile is closed when it tried to load
>> the Bloom filter (likely the flappy DataNode)
>> * IOException when reading the file saying Stream is closed (likely the
>> flappy DataNode)
>>
>> Nothing in the GC logs -- all the above errors are in the tablet server
>> logs.  The logs may have rolled over, though, and our debug logs don't make
>> it into Splunk.
>>
>> --Adam
>>
>> On Fri, May 11, 2018 at 6:16 PM, Christopher <ctubb...@apache.org> wrote:
>>
>>> Oh, it occurs to me that this may be related to the WAL bugs that Keith
>>> fixed for 1.9.1... which could affect the metadata table recovery after a
>>> failure.
>>>
>>> On Fri, May 11, 2018 at 6:11 PM Michael Wall <mjw...@gmail.com> wrote:
>>>
>>>> Adam,
>>>>
>>>> Do you have GC logs?  Can you see if those missing RFiles were removed
>>>> by the GC process?  That could indicate you somehow got old metadata info
>>>> replayed.  Also, the rfiles increment so compare the current rfile names in
>>>> the srv.dir directory vs what is in the metadata table.  Are the existing
>>>> files after files in the metadata.  Finally, pick a few of the missing
>>>> files and grep all your master and tserver logs to see if you can learn
>>>> anything.  This sounds ungood.
>>>>
>>>> Mike
>>>>
>>>> On Fri, May 11, 2018 at 6:06 PM Christopher <ctubb...@apache.org>
>>>> wrote:
>>>>
>>>>> This is strange. I've only ever seen this when HDFS has reported
>>>>> problems, such as missing blocks, or another obvious failure. What is your
>>>>> durability settings (were WALs turned on)?
>>>>>
>>>>> On Fri, May 11, 2018 at 12:45 PM Adam J. Shook <adamjsh...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> On one of our clusters, there are a good number of missing RFiles
>>>>>> from HDFS, however HDFS is not/has not reported any missing blocks.  We
>>>>>> were experiencing issues with HDFS; some flapping DataNode processes that
>>>>>> needed more heap.
>>>>>>
>>>>>> I don't anticipate I can do much besides create a bunch of empty
>>>>>> RFiles (open to suggestions).  My question is, Is it possible that 
>>>>>> Accumulo
>>>>>> could have written the metadata for these RFiles but failed to write it 
>>>>>> to
>>>>>> HDFS?  In which case it would have been re-tried later and the data was
>>>>>> persisted to a different RFile?  Or is it an 'RFile is in Accumulo 
>>>>>> metadata
>>>>>> if and only if it is in HDFS' situation?
>>>>>>
>>>>>> Accumulo 1.8.1 on HDFS 2.6.0.
>>>>>>
>>>>>> Thank you,
>>>>>> --Adam
>>>>>>
>>>>>
>>


Re: Question on missing RFiles

2018-05-12 Thread Adam J. Shook
WALs are turned on.  Durability is set to flush for all tables except for
root and metadata which are sync.  The current rfile names on HDFS and in
the metadata table are greater than the files that are missing.   Searched
through all of our current and historical logs in Splunk (which are only
INFO level or higher).  Issues from the logs:

* Problem reports saying the files are not found
* IllegalStateException saying the rfile is closed when it tried to load
the Bloom filter (likely the flappy DataNode)
* IOException when reading the file saying Stream is closed (likely the
flappy DataNode)

Nothing in the GC logs -- all the above errors are in the tablet server
logs.  The logs may have rolled over, though, and our debug logs don't make
it into Splunk.

--Adam

On Fri, May 11, 2018 at 6:16 PM, Christopher <ctubb...@apache.org> wrote:

> Oh, it occurs to me that this may be related to the WAL bugs that Keith
> fixed for 1.9.1... which could affect the metadata table recovery after a
> failure.
>
> On Fri, May 11, 2018 at 6:11 PM Michael Wall <mjw...@gmail.com> wrote:
>
>> Adam,
>>
>> Do you have GC logs?  Can you see if those missing RFiles were removed by
>> the GC process?  That could indicate you somehow got old metadata info
>> replayed.  Also, the rfiles increment so compare the current rfile names in
>> the srv.dir directory vs what is in the metadata table.  Are the existing
>> files after files in the metadata.  Finally, pick a few of the missing
>> files and grep all your master and tserver logs to see if you can learn
>> anything.  This sounds ungood.
>>
>> Mike
>>
>> On Fri, May 11, 2018 at 6:06 PM Christopher <ctubb...@apache.org> wrote:
>>
>>> This is strange. I've only ever seen this when HDFS has reported
>>> problems, such as missing blocks, or another obvious failure. What is your
>>> durability settings (were WALs turned on)?
>>>
>>> On Fri, May 11, 2018 at 12:45 PM Adam J. Shook <adamjsh...@gmail.com>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> On one of our clusters, there are a good number of missing RFiles from
>>>> HDFS, however HDFS is not/has not reported any missing blocks.  We were
>>>> experiencing issues with HDFS; some flapping DataNode processes that needed
>>>> more heap.
>>>>
>>>> I don't anticipate I can do much besides create a bunch of empty RFiles
>>>> (open to suggestions).  My question is, Is it possible that Accumulo could
>>>> have written the metadata for these RFiles but failed to write it to HDFS?
>>>> In which case it would have been re-tried later and the data was persisted
>>>> to a different RFile?  Or is it an 'RFile is in Accumulo metadata if and
>>>> only if it is in HDFS' situation?
>>>>
>>>> Accumulo 1.8.1 on HDFS 2.6.0.
>>>>
>>>> Thank you,
>>>> --Adam
>>>>
>>>


Question on missing RFiles

2018-05-11 Thread Adam J. Shook
Hello all,

On one of our clusters, there are a good number of missing RFiles from
HDFS, however HDFS is not/has not reported any missing blocks.  We were
experiencing issues with HDFS; some flapping DataNode processes that needed
more heap.

I don't anticipate I can do much besides create a bunch of empty RFiles
(open to suggestions).  My question is, Is it possible that Accumulo could
have written the metadata for these RFiles but failed to write it to HDFS?
In which case it would have been re-tried later and the data was persisted
to a different RFile?  Or is it an 'RFile is in Accumulo metadata if and
only if it is in HDFS' situation?

Accumulo 1.8.1 on HDFS 2.6.0.

Thank you,
--Adam


Re: Question on how Accumulo binds to Hadoop

2018-01-31 Thread Adam J. Shook
Yes, it does use RPC to talk to HDFS.  You will need to update the value of
instance.volumes in accumulo-site.xml to reference this address,
haz0-m:8020, instead of the default localhost:9000.

--Adam
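
A sketch of the accumulo-site.xml change being described (the /accumulo volume
path is an assumption; use whatever path the instance was initialized with):

  <property>
    <name>instance.volumes</name>
    <value>hdfs://haz0-m:8020/accumulo</value>
  </property>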

On Wed, Jan 31, 2018 at 4:45 PM, Geoffry Roberts 
wrote:

> I have a situation where Accumulo cannot find Hadoop.
>
> Hadoop is running and I can access hdfs from the cli.
> Zookeeper also says it is ok and I can log in using the client.
> Accumulo init is failing with a connection refused for localhost:9000.
>
> netstat shows nothing listening on 9000.
>
> Now the plot thickens...
>
> The Hadoop I am running is Google's Dataproc and the Hadoop installation
> is not my own.  I have already found a number of differences.
>
> Here's my question:  Does Accumulo use RPC to talk to Hadoop? I ask
> because of things like this:
>
> From hdfs-site.xml:
>
>   <property>
>     <name>dfs.namenode.rpc-address</name>
>     <value>haz0-m:8020</value>
>     <description>
>       RPC address that handles all clients requests. If empty then we'll get
>       the value from fs.default.name. The value of this property will take the
>       form of hdfs://nn-host1:rpc-port.
>     </description>
>   </property>
>
> Or does it use something else?
>
> Thanks
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>


Re: Large number of used ports from tserver

2018-01-26 Thread Adam J. Shook
I checked all tablet servers across all six of our environments and it
seems to be present in all of them, with some having upwards of 73k
connections.

I disabled replication in our dev cluster and restarted the tablet
servers.  Left it running overnight and checked the connections -- a
reasonable number in the single or double digits.  Enabling replication
lead to a quick climb in the CLOSE_WAIT connections to a couple thousand,
leading me to think it is some lingering connection reading a WAL file from
HDFS.

I've opened ACCUMULO-4787
<https://issues.apache.org/jira/browse/ACCUMULO-4787> to track this and we
can move discussion over there.

--Adam

On Thu, Jan 25, 2018 at 12:23 PM, Christopher <ctubb...@apache.org> wrote:

> Interesting. It's possible we're mishandling an IOException from DFSClient
> or something... but it's also possible there's a bug in DFSClient
> somewhere. I found a few similar issues from the past... some might still
> be not fully resolved:
>
> https://issues.apache.org/jira/browse/HDFS-1836
> https://issues.apache.org/jira/browse/HDFS-2028
> https://issues.apache.org/jira/browse/HDFS-6973
> https://issues.apache.org/jira/browse/HBASE-9393
>
> The HBASE issue is interesting, because it indicates a new HDFS feature in
> 2.6.4 to clear readahead buffers/sockets (https://issues.apache.org/
> jira/browse/HDFS-7694). That might be a feature we're not yet utilizing,
> but it would only work on a newer version of HDFS.
>
> I would probably also try to grab some jstacks of the tserver, to try to
> figure out what HDFS client code paths are being taken to see where the
> leak might be occurring. Also, if you have any debug logs for the tserver,
> that might help. There might be some DEBUG or WARN items that indicate
> retries or other failures failures that are occurring, but perhaps handled
> improperly.
>
> It's probably less likely, but it could also be a Java or Linux issue. I
> wouldn't even know where to begin debugging at that level, though, other
> than to check for OS updates.  What JVM are you running?
>
> It's possible it's not a leak... and these are just getting cleaned up too
> slowly. That might be something that can be tuned with sysctl.
>
> On Thu, Jan 25, 2018 at 11:27 AM Adam J. Shook <adamjsh...@gmail.com>
> wrote:
>
>> We're running Ubuntu 14.04, HDFS 2.6.0, ZooKeeper 3.4.6, and Accumulo
>> 1.8.1.  I'm using `lsof -i` and grepping for the tserver PID to list all
>> the connections.  Just now there are ~25k connections for this one tserver,
>> of which 99.9% of them are all writing to various DataNodes on port 50010.
>> It's split about 50/50 for connections that are CLOSED_WAIT and ones that
>> are ESTABLISHED.  No special RPC configuration.
>>
>> On Wed, Jan 24, 2018 at 7:53 PM, Josh Elser <josh.el...@gmail.com> wrote:
>>
>>> +1 to looking at the remote end of the socket and see where they're
>>> going/coming to/from. I've seen a few HDFS JIRA issues filed about sockets
>>> left in CLOSED_WAIT.
>>>
>>> Lucky you, this is a fun Linux rabbit hole to go down :)
>>>
>>> (https://blog.cloudflare.com/this-is-strictly-a-violation-
>>> of-the-tcp-specification/ covers some of the technical details)
>>>
>>> On 1/24/18 6:37 PM, Christopher wrote:
>>>
>>>> I haven't seen that, but I'm curious what OS, Hadoop, ZooKeeper, and
>>>> Accumulo version you're running. I'm assuming you verified that it was the
>>>> TabletServer process holding these TCP sockets open using `netstat -p` and
>>>> cross-referencing the PID with `jps -ml` (or similar)? Are you able to
>>>> confirm based on the port number that these were Thrift connections or
>>>> could they be ZooKeeper or Hadoop connections? Do you have any special
>>>> non-default Accumulo RPC configuration (SSL or SASL)?
>>>>
>>>> On Wed, Jan 24, 2018 at 3:46 PM Adam J. Shook <adamjsh...@gmail.com
>>>> <mailto:adamjsh...@gmail.com>> wrote:
>>>>
>>>> Hello all,
>>>>
>>>> Has anyone come across an issue with a TabletServer occupying a
>>>> large number of ports in a CLOSED_WAIT state?  'Normal' number of
>>>> used ports on a 12-node cluster are around 12,000 to 20,000 ports.
>>>>In one instance, there were over 68k and it was affecting other
>>>> applications from getting a free port and they would fail to start
>>>> (which is how we found this in the first place).
>>>>
>>>> Thank you,
>>>> --Adam
>>>>
>>>>
>>


Re: Large number of used ports from tserver

2018-01-25 Thread Adam J. Shook
We're running Ubuntu 14.04, HDFS 2.6.0, ZooKeeper 3.4.6, and Accumulo
1.8.1.  I'm using `lsof -i` and grepping for the tserver PID to list all
the connections.  Just now there are ~25k connections for this one tserver,
of which 99.9% of them are all writing to various DataNodes on port 50010.
It's split about 50/50 for connections that are CLOSED_WAIT and ones that
are ESTABLISHED.  No special RPC configuration.
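
For reference, a quick way to tally connection states for a single process
looks something like this (a sketch; 12345 stands in for the tserver PID):

    lsof -nP -i -a -p 12345 | awk '{print $NF}' | sort | uniq -c | sort -rn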

On Wed, Jan 24, 2018 at 7:53 PM, Josh Elser <josh.el...@gmail.com> wrote:

> +1 to looking at the remote end of the socket and see where they're
> going/coming to/from. I've seen a few HDFS JIRA issues filed about sockets
> left in CLOSED_WAIT.
>
> Lucky you, this is a fun Linux rabbit hole to go down :)
>
> (https://blog.cloudflare.com/this-is-strictly-a-violation-of
> -the-tcp-specification/ covers some of the technical details)
>
> On 1/24/18 6:37 PM, Christopher wrote:
>
>> I haven't seen that, but I'm curious what OS, Hadoop, ZooKeeper, and
>> Accumulo version you're running. I'm assuming you verified that it was the
>> TabletServer process holding these TCP sockets open using `netstat -p` and
>> cross-referencing the PID with `jps -ml` (or similar)? Are you able to
>> confirm based on the port number that these were Thrift connections or
>> could they be ZooKeeper or Hadoop connections? Do you have any special
>> non-default Accumulo RPC configuration (SSL or SASL)?
>>
>> On Wed, Jan 24, 2018 at 3:46 PM Adam J. Shook <adamjsh...@gmail.com
>> <mailto:adamjsh...@gmail.com>> wrote:
>>
>> Hello all,
>>
>> Has anyone come across an issue with a TabletServer occupying a
>> large number of ports in a CLOSED_WAIT state?  'Normal' number of
>> used ports on a 12-node cluster are around 12,000 to 20,000 ports.
>>  In one instance, there were over 68k and it was affecting other
>> applications from getting a free port and they would fail to start
>> (which is how we found this in the first place).
>>
>> Thank you,
>> --Adam
>>
>>


Large number of used ports from tserver

2018-01-24 Thread Adam J. Shook
Hello all,

Has anyone come across an issue with a TabletServer occupying a large
number of ports in a CLOSED_WAIT state?  'Normal' number of used ports on a
12-node cluster are around 12,000 to 20,000 ports.  In one instance, there
were over 68k and it was affecting other applications from getting a free
port and they would fail to start (which is how we found this in the first
place).

Thank you,
--Adam


log4j SocketNode error on upgrading to 1.8.1

2017-10-24 Thread Adam J. Shook
Anyone run into the below error?  We're upgrading from 1.7.3 to 1.8.1 on
Hadoop 2.6.0, ZooKeeper 3.4.6, and JDK 8u121.  The monitor is continuously
complaining about the log4j socket appender and it eventually crashes.  It
is accepting connections, but they are promptly closed with the
EOFException.  I am guessing a classpath issue, but wondering if this is
known before I find myself down a rabbit hole.  I'm unable to reproduce it
locally, but still trying...

Thanks,
--Adam

2017-10-24 14:13:53,495 [net.SocketNode] ERROR: Could not open
ObjectInputStream to Socket[addr=/x.x.x.x,port=50823,localport=31379]
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2624)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3099)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
at org.apache.log4j.net.SocketNode.<init>(SocketNode.java:56)
at org.apache.accumulo.server.monitor.LogService$SocketServer.run(LogService.java:76)
at java.lang.Thread.run(Thread.java:745)


IPv6-only hosts for MAC

2017-08-29 Thread Adam J. Shook
Howdy folks,

Anyone have any experience running Accumulo on IPv6-only hosts?
Specifically the MiniAccumloCluster?

There is an open issue in the Presto-Accumulo connector (see [1] and [2])
saying the MAC doesn't work in an IPv6-only environment, and the PR comment
thread has some suggestions to change the JVM arguments within the server
and client code to prefer IPv6 addresses.

From a brief look at the Accumulo source code, this might require changes
to make MAC's JVM arguments configurable, changes to the client code, or a
different approach to testing the Presto/Accumulo connector altogether.
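
For reference, the JVM switches usually involved here are the standard
java.net.preferIPv4Stack / java.net.preferIPv6Addresses properties. A
minimal sketch of setting them in the client/test JVM is below (the class
name is made up); as noted above, MAC would also need a way to pass the same
flags to the server processes it forks, which is the part that appears to
need a code change.

    public class PreferIPv6Sketch {
      // Standard JVM networking properties. Setting them here only affects
      // this JVM (the test/client process), not the tserver/master processes
      // that MiniAccumuloCluster forks.
      public static void preferIPv6() {
        System.setProperty("java.net.preferIPv4Stack", "false");
        System.setProperty("java.net.preferIPv6Addresses", "true");
      }
    }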

Any pointers in the right direction would be appreciated.  Looking to get a
heading before I dig myself into a hole on this one.

[1] Issue: https://github.com/prestodb/presto/issues/8789
[2] PR and comment thread: https://github.com/prestodb/presto/pull/8869

Thanks,
--Adam


Skip trash on delete

2017-08-09 Thread Adam J. Shook
Hello all,

Has there ever been discussion of having the Garbage Collector skip the
HDFS trash when deleting WALs and old RFiles as a configurable feature
(assuming it isn't already -- I couldn't find it)?  Outside of the risks
involved in having the files immediately deleted, what'd be the negatives
of supporting this kind of feature?  Is this something we'd be interested
in supporting?
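
For illustration, a minimal sketch of the two behaviors such a switch would
toggle between, using the plain Hadoop FileSystem/Trash API (this is not
Accumulo's GC code, and the names are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.Trash;

    public class TrashVsDeleteSketch {
      // A skip-trash switch would pick between these two code paths.
      public static void remove(FileSystem fs, Configuration conf, Path file,
          boolean skipTrash) throws Exception {
        if (skipTrash) {
          // Delete immediately and permanently (recursive for directories).
          fs.delete(file, true);
        } else {
          // Move into the user's .Trash so the file can still be recovered
          // until the trash checkpoint is expunged.
          Trash.moveToAppropriateTrash(fs, file, conf);
        }
      }
    }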

Thanks,
--Adam


Re: Missing replication metadata

2017-07-24 Thread Adam J. Shook
Thanks, Josh.  As this is our stage cluster, we aren't too worried about
the missing data; I just want to clean up the metadata so the queue looks
better.  I'll take the back-fill approach and see how that goes.

--Adam

On Mon, Jul 24, 2017 at 1:55 PM, Josh Elser <josh.el...@gmail.com> wrote:

>
>
> On 7/24/17 1:44 PM, Adam J. Shook wrote:
>
>> We had some corrupt WAL blocks on our stage environment the other day and
>> opted to delete them.  We now have some missing metadata and about 3k files
>> pending for replication.  I've dug into it a bit and noticed that many of
>> the WALs in the `order` queue of the replication table A) no longer exist
>> in HDFS and B) have no entries in the `repl` section of the replication
>> table.
>>
>> Based on the code, if there are no entries in the `repl` section, then
>> the work will never be queued for completion via ZooKeeper and therefore
>> never finished -- does this make sense?
>>
>
> Yeah, that sounds about right. I'm lamenting that I never wrote up docs
> for the user-manual to cover the table-schema. I should ... do that...
>
> I think the order entry is created when the repl entry is. Would have to
> dig back into code though.
>
>
>> What'd be the suggestion here to proceed?  I'm thinking a one-off tool
>> to backfill the `repl` section
>> should do the trick, but I am wondering if this is something that should be
>> changed in Accumulo?
>>
>
> A tool to back-fill makes sense to me. I'm not sure what we could do in
> Accumulo automatically. Any time there is data-loss (data gone missing or
> old data coming back), Accumulo really can't do anything on its own. As you
> described in your scenario, you made the conscious decision to nuke the
> files with missing blocks. However, providing tools to handle "common"
> failure scenarios outside of our purview sounds like a good idea.
>
> Improving our docs around how to "re-sync" two tables being replicated
> would also be great. We have the hammer via snapshot+export, just need to
> be clear with the instructions.
>
> Cheers,
>> --Adam
>>
>


Missing replication metadata

2017-07-24 Thread Adam J. Shook
We had some corrupt WAL blocks on our stage environment the other day and
opted to delete them.  We now have some missing metadata and about 3k files
pending for replication.  I've dug into it a bit and noticed that many of
the WALs in the `order` queue of the replication table A) no longer exist
in HDFS and B) have no entries in the `repl` section of the replication
table.

Based on the code, if there are no entries in the `repl` section, then the
work will never be queued for completion via ZooKeeper and therefore never
finished -- does this make sense?  What'd be the suggestion here to
proceed?  I'm thinking a one-off tool to backfill the `repl` section should
do the trick, but I am wondering if this is something that should be
changed in Accumulo?
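
For anyone who lands here later, a rough sketch of what such a one-off
back-fill could look like using only the public client API. The column
family names and the status value below are placeholders -- the real layout
and the protobuf Status encoding live in Accumulo's replication schema
classes and should be checked before writing anything:

    import java.util.Map.Entry;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.hadoop.io.Text;

    public class ReplBackfillSketch {
      // Placeholder names -- check the real replication table schema first.
      static final String REPL_TABLE = "accumulo.replication";
      static final Text ORDER_CF = new Text("order"); // assumption
      static final String REPL_CF = "repl";           // assumption

      // Writes a (placeholder) closed status for every WAL referenced in the
      // order section. A real tool would first check that no repl entry
      // exists and that the WAL is actually gone from HDFS.
      public static void backfill(Connector conn, Value closedStatus) throws Exception {
        Scanner scan = conn.createScanner(REPL_TABLE, Authorizations.EMPTY);
        scan.fetchColumnFamily(ORDER_CF);
        BatchWriter bw = conn.createBatchWriter(REPL_TABLE, new BatchWriterConfig());
        for (Entry<Key,Value> e : scan) {
          Mutation m = new Mutation(e.getKey().getRow());
          m.put(REPL_CF, e.getKey().getColumnQualifier().toString(), closedStatus);
          bw.addMutation(m);
        }
        bw.close();
      }
    }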

Cheers,
--Adam


Re: Status record lacked createdTime

2017-02-17 Thread Adam J. Shook
No exception on write -- this is coming from the master when it goes to
assign work to the accumulo.replication table.  Some of the WALs are fairly
old.

Not too sure why it didn't get the attribute; my guess is server failure
before it was able to append the created time.  Some of the WAL files are
empty, others have data in them.  I think a tool will suffice for now when
the issue crops up, but it'll need to get fixed in the Master/GC so that,
after some condition, it will assign it a createdTime so replication will
occur -- or whenever the first metadata entry is added, give it a
createdTime.

--Adam

On Fri, Feb 17, 2017 at 1:40 PM, Josh Elser <josh.el...@gmail.com> wrote:

> Hey Adam,
>
> Thanks for sharing this one.
>
> Adam J. Shook wrote:
>
>> Hello folks,
>>
>> One of our clusters has been throwing a handful of replication errors
>> from the status maker -- see below.  The WAL files in question to not
>> belong to an active tserver -- some investigation in the code shows that
>> the createdTime could not be written and these WALs will sit here until
>> a created time is added.
>>
>
> Does that mean that you saw an exception when the mutation carrying the
> createdTime failed to be written to accumulo.metadata? Or is the cause of
> why that WAL didn't get this 'attribute' still unknown?
>
> I think the kind of fix to make depends on the cause here. e.g. if
> this is just a bug, a standalone tool to fix this case would be good.
> However, if there's an inherent issue where this case might happen and we
> can't guarantee the record was written (server failure), it might be best
> to add some process to the master/gc to eventually add one (e.g. if we see
> the wal has been hanging out in that state, add a createdTime after ~12hrs)
>
>
> I wanted to bring some attention to this -- I think my immediate course
>> of action here is to manually add a createdTime so the files will be
>> replicated, then address this within the Accumulo source code itself.
>> Thoughts?
>>
>> Status record ([begin: 0 end: 0 infiniteEnd: true closed:true]) for
>> hdfs://foo:8020/accumulo/wal/blah/blah in table k was written to
>> metadata table which lacked createtime
>>
>> Thank you,
>> --Adam
>>
>


Re: Improving Accumulo Replication Latency

2017-02-15 Thread Adam J. Shook
Thanks, Josh.  I think the main pain-point is that replication doesn't
occur until the WAL is closed.  We've made some aggressive configuration
changes to Accumulo to reduce the WAL rollover time and minor compaction
frequency to force replication to go faster.  It is down to around 20
minutes or so on our production clusters, but we are kind of at our limit
-- Accumulo is spending a lot more time doing bookkeeping tasks and it is
starting to affect our query performance.
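
For context, this sort of tuning can be done through the public client API;
the property values below are illustrative only, not a recommendation:

    import org.apache.accumulo.core.client.Connector;

    public class ReplicationTuningSketch {
      // Aggressive, illustrative values: smaller WALs roll over (and become
      // eligible for replication) sooner, and a lower minor-compaction log
      // threshold releases WAL references faster -- at the cost of the extra
      // bookkeeping work described above.
      public static void tune(Connector conn, String table) throws Exception {
        conn.instanceOperations().setProperty("tserver.walog.max.size", "256M");
        conn.tableOperations().setProperty(table,
            "table.compaction.minor.logs.threshold", "2");
      }
    }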

My initial thoughts are to increase the replication parallelism and start
replicating the WAL before it is closed (I see a few JIRAs open already
that mention these things), but I haven't done enough digging in the code
base to see what is really available.

Are you free for a bit in the near future to meet up for a bit and talk
replication?  I'll buy lunch!

Cheers,
--Adam

On Wed, Feb 15, 2017 at 2:52 PM, Josh Elser <josh.el...@gmail.com> wrote:

> Hi Adam,
>
> I'm not presently working on anything (too many irons in other fires), but
> I'd be happy to help work through a design doc for improvements.
>
> Do you have a list of pain-points which are the primary causes of latency?
> That would help in identifying the changes to make and how best to
> implement them.
>
> - Josh
>
>
> Adam J. Shook wrote:
>
>> I'm currently scoping what it would take to improve the latency in the
>> replication feature of Accumulo.  I'm interested in knowing what work,
>> if any, is being done to improve replication latency?  If work is being
>> done, would there be some interest in collaborating on that effort?
>>
>> If nothing is currently being planned, I'd be interested in design ideas
>> and pointers from the community for improvements to the existing
>> implementation.  We're looking to get replication down to less than five
>> minutes and are willing to put in the effort to implement the
>> improvements.
>>
>> Thank you for your time!
>>
>> Cheers,
>> --Adam
>>
>


Improving Accumulo Replication Latency

2017-02-15 Thread Adam J. Shook
I'm currently scoping what it would take to improve the latency in the
replication feature of Accumulo.  I'm interested in knowing what work, if
any, is being done to improve replication latency?  If work is being done,
would there be some interest in collaborating on that effort?

If nothing is currently being planned, I'd be interested in design ideas
and pointers from the community for improvements to the existing
implementation.  We're looking to get replication down to less than five
minutes and are willing to put in the effort to implement the improvements.

Thank you for your time!

Cheers,
--Adam


Re: Accumulo Seek performance

2016-09-12 Thread Adam J. Shook
As an aside, this is actually pretty relevant to the work I've been doing
for Presto/Accumulo integration.  It isn't uncommon to have around a
million exact Ranges (that is, Ranges with a single row ID)  spread across
the five Presto worker nodes we use for scanning Accumulo.  Right now,
these ranges get packed into PrestoSplits, 10k ranges per split (an
arbitrary number I chose), and each split is run in parallel (depending on
the overall number of splits, they may be queued for execution).

I'm curious to see the query impact of changing it to use a fixed thread
pool of Scanners over the current BatchScanner implementation.  Maybe I'll
play around with it sometime soon.
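
A rough sketch of the "fixed thread pool of Scanners" variant being
considered, assuming an Accumulo Connector and a list of single-row Ranges
(the pool size and the no-op handling of entries are placeholders):

    import java.util.List;
    import java.util.Map.Entry;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;

    public class ScannerPoolSketch {
      // Run each point-lookup Range in its own Scanner on a fixed pool,
      // instead of handing all Ranges to one BatchScanner.
      public static void lookup(Connector conn, String table, List<Range> ranges)
          throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(16); // arbitrary size
        for (Range r : ranges) {
          pool.submit(() -> {
            try {
              Scanner s = conn.createScanner(table, Authorizations.EMPTY);
              s.setRange(r);
              for (Entry<Key,Value> entry : s) {
                // a real client would consume the entries here
              }
            } catch (Exception ex) {
              throw new RuntimeException(ex);
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
      }
    }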

--Adam

On Mon, Sep 12, 2016 at 2:47 PM, Dan Blum  wrote:

> I think the 450 ranges returned a total of about 7.5M entries, but the
> ranges were in fact quite small relative to the size of the table.
>
> -Original Message-
> From: Josh Elser [mailto:josh.el...@gmail.com]
> Sent: Monday, September 12, 2016 2:43 PM
> To: user@accumulo.apache.org
> Subject: Re: Accumulo Seek performance
>
> What does a "large scan" mean here, Dan?
>
> Sven's original problem statement was running many small/pointed Ranges
> (e.g. point lookups). My observation was that BatchScanners were slower
> than running each in a Scanner when using multiple BS's concurrently.
>
> Dan Blum wrote:
> > I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using
> Scanners was much slower than using a BatchScanner with 11 threads, by
> about a 5:1 ratio. There were 450 ranges.
> >
> > -Original Message-
> > From: Josh Elser [mailto:josh.el...@gmail.com]
> > Sent: Monday, September 12, 2016 1:42 PM
> > To: user@accumulo.apache.org
> > Subject: Re: Accumulo Seek performance
> >
> > I had increased the readahead thread pool to 32 (from 16). I had also
> > increased the minimum thread pool size from 20 to 40. I had 10 tablets
> > with the data block cache turned on (probably only 256M tho).
> >
> > Each tablet had a single file (manually compacted). Did not observe
> > cache rates.
> >
> > I've been working through this with Keith on IRC this morning too. Found
> > that a single batchscanner (one partition) is faster than the Scanner.
> > Two partitions and things started to slow down.
> >
> > Two interesting points to still pursue, IMO:
> >
> > 1. I saw that the tserver-side logging for MultiScanSess was near
> > identical to the BatchScanner timings
> > 2. The minimum server threads did not seem to be taking effect. Despite
> > having the value set to 64, I only saw a few ClientPool threads in a
> > jstack after running the test.
> >
> > Adam Fuchs wrote:
> >> Sorry, Monday morning poor reading skills, I guess. :)
> >>
> >> So, 3000 ranges in 40 seconds with the BatchScanner. In my past
> >> experience HDFS seeks tend to take something like 10-100ms, and I would
> >> expect that time to dominate here. With 60 client threads your
> >> bottleneck should be the readahead pool, which I believe defaults to 16
> >> threads. If you get perfect index caching then you should be seeing
> >> something like 3000/16*50ms = 9,375ms. That's in the right ballpark, but
> >> it assumes no data cache hits. Do you have any idea of how many files
> >> you had per tablet after the ingest? Do you know what your cache hit
> >> rate was?
> >>
> >> Adam
> >>
> >>
> >> On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser wrote:
> >>
> >>  5 iterations, figured that would be apparent from the log messages :)
> >>
> >>  The code is already posted in my original message.
> >>
> >>  Adam Fuchs wrote:
> >>
> >>  Josh,
> >>
> >>  Two questions:
> >>
> >>  1. How many iterations did you do? I would like to see an
> absolute
> >>  number of lookups per second to compare against other
> observations.
> >>
> >>  2. Can you post your code somewhere so I can run it?
> >>
> >>  Thanks,
> >>  Adam
> >>
> >>
> >>  On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser wrote:
> >>
> >>   Sven, et al:
> >>
> >>   So, it would appear that I have been able to reproduce
> this one
> >>   (better late than never, I guess...). tl;dr Serially using
> >>  Scanners
> >>   to do point lookups instead of a BatchScanner is ~20x
> >>  faster. This
> >>   sounds like a pretty serious performance issue to me.
> >>
> >>   Here's a general outline for what I did.
> >>
> >>   * Accumulo 1.8.0
> >>   * Created a table with 1M rows, each row with 10 columns
> >>  using YCSB
> >>   (workloada)
> >>   * Split the table into 9 tablets
> >>   * Computed the set of all rows in the table
> >>

Re: Map Lexicoder

2015-12-29 Thread Adam J. Shook
Agreed, I came to the same conclusion while implementing.  The final result
that I have is a SortedMapLexicoder to avoid any comparisons going
haywire.  Additionally, would it be best to encode the map as an array of
keys followed by an array of values, or encode all key value pairs
back-to-back:

{ a : 1 , b : 2, c : 3 } encoded as

a1b2c3
-or-
abc123

Feels like I should be encoding a list of keys, then the list of values,
and then concatenating these two encoded byte arrays.  I think the end
solution will be to support both?  I'm having a hard time reconciling which
method is better, if any.  Hard to find some good examples of people who
are sorting a list of maps.
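
For what it's worth, a minimal sketch of the first option (sorted key/value
pairs back-to-back, i.e. the a1b2c3 layout) built from the existing
ListLexicoder and PairLexicoder, assuming String keys and Long values; the
class name is made up and this is not a general MapLexicoder:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map.Entry;
    import java.util.SortedMap;

    import org.apache.accumulo.core.client.lexicoder.ListLexicoder;
    import org.apache.accumulo.core.client.lexicoder.LongLexicoder;
    import org.apache.accumulo.core.client.lexicoder.PairLexicoder;
    import org.apache.accumulo.core.client.lexicoder.StringLexicoder;
    import org.apache.accumulo.core.util.ComparablePair;

    public class SortedMapEncodingSketch {
      // Encodes {a=1, b=2, c=3} as the pair sequence a1, b2, c3 (keys and
      // values interleaved); the SortedMap fixes the iteration order so two
      // equal maps always serialize identically.
      public static byte[] encode(SortedMap<String,Long> map) {
        PairLexicoder<String,Long> pairLex =
            new PairLexicoder<>(new StringLexicoder(), new LongLexicoder());
        ListLexicoder<ComparablePair<String,Long>> listLex = new ListLexicoder<>(pairLex);
        List<ComparablePair<String,Long>> pairs = new ArrayList<>();
        for (Entry<String,Long> e : map.entrySet()) {
          pairs.add(new ComparablePair<>(e.getKey(), e.getValue()));
        }
        return listLex.encode(pairs);
      }
    }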

On Tue, Dec 29, 2015 at 2:47 PM, Keith Turner <ke...@deenlo.com> wrote:

>
>
> On Mon, Dec 28, 2015 at 11:47 AM, Adam J. Shook <adamjsh...@gmail.com>
> wrote:
>
>> Hello all,
>>
>> Any suggestions for using a Map Lexicoder (or implementing one)?  I am
>> currently using a new ListLexicoder(new PairLexicoder(some lexicoder, some
>> lexicoder), which is working for single maps.  However, when one of the
>> lexicoders in the Pair is itself a Map (and therefore another
>> ListLexicoder(PairLexicoder)), an exception is being thrown because
>> ArrayList is not Comparable.
>>
>
>
> Since maps do not have a well-defined order of keys and values, comparison
> is tricky.  The purpose of Lexicoders is to encode things in such a way that
> the lexicographical comparison of the serialized data is possible.  With a
> hashmap if I add the same data in the same order to two different hash map
> instances, it's possible that when iterating over those maps I could see the
> data in different orders.   This could lead to two maps constructed in the
> same way at different times (like different JVMs with different
> implementations of HashMap) generating different data that compare as
> different.  Ideally comparison of the two would yield equality.
>
> Something like LinkedHashMap does not have this problem for the same
> insertion order.  If you want things to be comparable regardless of
> insertion order (which I think is more intuitive), then SortedMap seems
> like it would be a good candidate.  So maybe a SortedMapLexicoder would be
> a better thing to offer?
>
>
>> Regards,
>> --Adam
>>
>
>


Re: Map Lexicoder

2015-12-28 Thread Adam J. Shook
Hi Josh,

Thanks for the advice.  I'm with you on using the CQ and Value instead of
putting the whole map into a Value, but what I am working on uses the
relational model of mapping data to Accumulo and expects the value of the
cell to be in the Value.  Certainly some optimization opportunities by
using the 'better' ways for storing data in Accumulo, but I'd like to get
this working before diving into that rabbit hole.

A brief look at the ListLexicoder shows it encodes each element of the list
using a sub-lexicoder and escapes each element (0x00 -> 0x01 0x01 and 0x01 -> 0x01
0x02).  The voodoo here escapes me a little (pun!), but it seems to be
enough to enable multi-dimensional arrays encoded by nesting ListLexicoders
(up to 4D, haven't tried a fifth dimension).  I would expect something
similar could be done using a Map.  Would a MapLexicoder be something worth
contributing to the project?  I'd be happy to give it a stab.
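
For reference, a minimal sketch of the escaping substitution described above
(not the actual Accumulo code):

    import java.io.ByteArrayOutputStream;

    public class EscapeSketch {
      // Rewrites 0x00 -> 0x01 0x01 and 0x01 -> 0x01 0x02 so a raw 0x00 byte
      // can be used unambiguously as the separator between encoded elements.
      public static byte[] escape(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (byte b : in) {
          if (b == 0x00) {
            out.write(0x01);
            out.write(0x01);
          } else if (b == 0x01) {
            out.write(0x01);
            out.write(0x02);
          } else {
            out.write(b);
          }
        }
        return out.toByteArray();
      }
    }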

--Adam

On Mon, Dec 28, 2015 at 12:21 PM, Josh Elser <josh.el...@gmail.com> wrote:

> Looks like you would have to implement some kind of ComparableMap to be
> able to use the PairLexicoder (see that the parameterization requires both
> types in the Pair to implement Comparable). The Pair lexicoder requires
> these Comparable types to align itself with the original goal of the
> Lexicoders: provide byte-array serialization for types whose sort order
> matches the original object's ordering.
>
> Typically, when we have key to value style data we want to put in
> Accumulo, it makes sense to leverage the Column Qualifier and the Value,
> instead of serializing everything into one Accumulo Value. Iterators make
> it easy to do server-side predicates and transformations. My hunch is that
> this is another reason why you don't already see a MapLexicoder provided.
>
> One technical difficulty you might run into implementing a generalized
> MapLexicoder is how you delimit the key and value in one pair and how you
> delimit many pairs from each other. Commonly, the "null" byte (\x00) is
> used as a separator since it doesn't often appear in user-data. I'm not
> sure if some of the other Lexicoders already use this in their
> serialization (e.g. the ListLexicoder might, I haven't looked at the code).
> Nesting Lexicoders generically might be tricky (although not impossible) --
> thought it was worth mentioning to make sure you thought about it.
>
>
> Adam J. Shook wrote:
>
>> Hello all,
>>
>> Any suggestions for using a Map Lexicoder (or implementing one)?  I am
>> currently using a new ListLexicoder(new PairLexicoder(some lexicoder,
>> some lexicoder), which is working for single maps.  However, when one of
>> the lexicoders in the Pair is itself a Map (and therefore another
>> ListLexicoder(PairLexicoder)), an exception is being thrown because
>> ArrayList is not Comparable.
>>
>> Regards,
>> --Adam
>>
>


Map Lexicoder

2015-12-28 Thread Adam J. Shook
Hello all,

Any suggestions for using a Map Lexicoder (or implementing one)?  I am
currently using a new ListLexicoder(new PairLexicoder(some lexicoder, some
lexicoder), which is working for single maps.  However, when one of the
lexicoders in the Pair is itself a Map (and therefore another
ListLexicoder(PairLexicoder)), an exception is being thrown because
ArrayList is not Comparable.

Regards,
--Adam


Midpoint between two splits

2015-12-17 Thread Adam J. Shook
Hello all,

I've got an odd use case that requires me to calculate the midpoint between
two Accumulo splits.  I've been searching through the Accumulo source code
for a little bit trying to find where Accumulo automatically calculates a
new split.  I am assuming that the new split point is somewhere around the
midpoint of two existing splits, and I was hoping to just take that instead
of coding it up myself.

Is this assumption correct?  Could someone point it out to me so I can
borrow it?  Or maybe there is some API call to calculate the midpoint?
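
In case it helps, a naive sketch of a midpoint between two row keys: pad
both to the same length and average them as unsigned big-endian integers.
This is only a rough approximation of a split point -- as noted in the reply
below, Accumulo's own calculation (FileUtil#findMidPoint) works from RFile
index keys rather than raw key bytes:

    import java.math.BigInteger;
    import java.util.Arrays;

    public class SplitMidpointSketch {
      // For equal-length byte arrays, unsigned big-endian numeric order
      // matches lexicographic order, so the averaged value sorts between
      // the two (padded) inputs.
      public static byte[] midpoint(byte[] start, byte[] end) {
        int len = Math.max(start.length, end.length) + 1; // one extra byte of precision
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, len));
        byte[] mid = lo.add(hi).shiftRight(1).toByteArray();
        // toByteArray() may add a sign byte or drop leading zeros; normalize
        // the result back to exactly len bytes.
        byte[] out = new byte[len];
        int copy = Math.min(mid.length, len);
        System.arraycopy(mid, mid.length - copy, out, len - copy, copy);
        return out;
      }
    }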

Thank you,
--Adam


Re: Midpoint between two splits

2015-12-17 Thread Adam J. Shook
Thank you, Eric!  Making this public may be helpful to others down the
road.  I'm unsure how much demand there is for making this kind of
functionality public.

On Thu, Dec 17, 2015 at 3:58 PM, Eric Newton <eric.new...@gmail.com> wrote:

> You'll want to look
> at org.apache.accumulo.server.util.FileUtil#findMidPoint. Note that it
> isn't in the public API, and can change in the future.
>
> -Eric
>
>
> On Thu, Dec 17, 2015 at 1:17 PM, Adam J. Shook <adamjsh...@gmail.com>
> wrote:
>
>> Hello all,
>>
>> I've got an odd use case that requires me to calculate the midpoint
>> between two Accumulo splits.  I've been searching through the Accumulo
>> source code for a little bit trying to find where Accumulo automatically
>> calculates a new split.  I am assuming that the new split point is
>> somewhere around the midpoint of two existing splits, and I was hoping to
>> just take that instead of coding it up myself.
>>
>> Is this assumption correct?  Could someone point it out to me so I can
>> borrow it?  Or maybe there is some API call to calculate the midpoint?
>>
>> Thank you,
>> --Adam
>>
>
>