Re: What is the Communication and Time Complexity for Bulk Inserts?

2012-10-24 Thread Jeff Kubina
is eventually dominated by compactions, which is a merge sort, or O(n log n). -Eric On Thu, Oct 18, 2012 at 11:37 AM, Jeff Kubina jeff.kub...@gmail.com wrote: BatchWriter, but I would be interested in the answer assuming a pre-sorted rfile. On Thu, Oct 18, 2012 at 11:20 AM, Josh Elser

question about the values in the Ingest column on the monitor page

2013-04-09 Thread Jeff Kubina
On the Accumulo monitor page (http://accumulo.apache.org/screenshots.html), are the values in the Ingest column a rate of row ingest? If so, what is the time period over which the rate is measured?

Is there a method in the accumulo api to get the total bytes used and/or total key/value pairs for each tablet?

2013-04-11 Thread Jeff Kubina
Is there a method in the accumulo api to get the total bytes used and/or total key/value pairs for each tablet? I believe I can get the total bytes used per tablet using HDFS file size calls on the tables directory, but what about the total key/value pairs for each tablet?

Re: Is there a method in the accumulo api to get the total bytes used and/or total key/value pairs for each tablet?

2013-04-11 Thread Jeff Kubina
So how would I get the info from the master, or could you point me to the monitor code? Thanks. -- Jeff Kubina 410-988-4436 On Thu, Apr 11, 2013 at 11:06 AM, Keith Turner ke...@deenlo.com wrote: On Thu, Apr 11, 2013 at 7:15 AM, Jeff Kubina jeff.kub...@gmail.com wrote: Is there a method

How to efficiently find lexicographically adjacent records?

2013-08-07 Thread Jeff Kubina
I have records key; value in an Accumulo table where the key is about a 50 long byte string. Given a new key k, I want to find the m records that would precede and succeed the record k;v if it were inserted into the table. Any ideas on how I can do this efficiently? The record k;v will eventually
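
One way to picture the question, as a minimal in-memory sketch rather than an Accumulo answer: with the keys held in a sorted Python list, `bisect` finds where k would be inserted, and the m neighbours on each side are slices around that index. The function name `neighbours` and the sample keys are hypothetical. Against a real table the same idea would need a forward scan starting at k plus some way to read backwards, which the snippet above does not settle.

```python
from bisect import bisect_left, bisect_right


def neighbours(sorted_keys, k, m):
    """Return (preceding, succeeding): up to m keys on each side of k.

    sorted_keys must already be in lexicographic order (bytes compare
    that way in Python). If k is already present it is excluded from
    both lists.
    """
    lo = bisect_left(sorted_keys, k)   # first index whose key is >= k
    hi = bisect_right(sorted_keys, k)  # first index whose key is > k
    preceding = sorted_keys[max(0, lo - m):lo]
    succeeding = sorted_keys[hi:hi + m]
    return preceding, succeeding


# Hypothetical sample data, standing in for the table's keys.
keys = sorted([b"apple", b"banana", b"cherry", b"fig", b"grape"])
before, after = neighbours(keys, b"date", 2)
print(before, after)
```

Each lookup costs O(log n) for the search plus O(m) for the slices, which is the behaviour one would hope to reproduce with range scans on the table itself.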

What API call can I use to get the number of (online) tablet servers of an Accumulo instance?

2013-09-17 Thread Jeff Kubina
What API call can I use to get the number of (online) tablet servers of an Accumulo instance?

(U) I do I tell Accumulo where the jars for my custom formatters and balancers are for a specific table?

2014-08-25 Thread Jeff Kubina
(U) I do I tell Accumulo where the jars for my custom formatters and balancers are for a specific table?

Re: (U) I do I tell Accumulo where the jars for my custom formatters and balancers are for a specific table?

2014-08-25 Thread Jeff Kubina
Sorry, that was supposed to read: How do I tell Accumulo where the jars containing my custom formatters and balancers are for a specific table? -- Jeff Kubina 410-988-4436 On Mon, Aug 25, 2014 at 2:03 PM, Jeff Kubina jeff.kub...@gmail.com wrote: (U) I do I tell Accumulo where the jars for my

Re: (U) I do I tell Accumulo where the jars for my custom formatters and balancers are for a specific table?

2014-08-25 Thread Jeff Kubina
to accumulo-site.xml? -- Jeff Kubina 410-988-4436 On Mon, Aug 25, 2014 at 3:20 PM, dlmar...@comcast.net wrote: Jeff, Which version of Accumulo are you using? Most of the classpath settings are in the accumulo-site.xml file. -- *From: *Jeff Kubina jeff.kub

Question about the TableInfo class

2014-12-08 Thread Jeff Kubina
In the TableInfo class (http://accumulo.apache.org/1.6/apidocs/index.html?org/apache/accumulo/core/master/thrift//class-useTabletServerStatus._Fields.html), is tablets (http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/master/thrift/TableInfo.html#tablets) the total number of tablets

Re: Question about configuring the linux niceness of tablet servers?

2015-08-17 Thread Jeff Kubina
More like a mapreduce task process. -- Jeff Kubina 410-988-4436 On Mon, Aug 17, 2015 at 5:33 PM, William Slacum wsla...@gmail.com wrote: By Hadoop do you mean a Yarn NodeManager process? On Mon, Aug 17, 2015 at 4:21 PM, Jeff Kubina jeff.kub...@gmail.com wrote: On each of the processing

Question about configuring the linux niceness of tablet servers?

2015-08-17 Thread Jeff Kubina
On each of the processing nodes in our cluster we have running 1) HDFS (datanode), 2) Accumulo (tablet server), and 3) Hadoop. Since Accumulo depends on the HDFS, and Hadoop depends on the HDFS and sometimes on Accumulo, we are considering setting the niceness of HDFS to 0 (the current value),

questions regarding accumulo tracing

2015-08-13 Thread Jeff Kubina
To collect traces, Accumulo needs at least one server listed in $ACCUMULO_HOME/conf/tracers. The server collects traces from clients and writes them to the trace table. 1. Regarding the information above about accumulo tracing, if more than one server is listed in $ACCUMULO_HOME/conf/tracers

Re: questions regarding accumulo tracing

2015-08-13 Thread Jeff Kubina
On Thu, Aug 13, 2015 at 2:52 PM, Josh Elser josh.el...@gmail.com wrote: 1. Regarding the information above about accumulo tracing, if more than one server is listed in $ACCUMULO_HOME/conf/tracers how do the clients select the trace server to send their trace data to? Tracers register

Re: How does Accumulo process a r-files for bulk ingesting?

2015-10-16 Thread Jeff Kubina
So if a table has splits 0, 1, 2, 3, 4 and I create an rfile with only splits in the range 1-2 and 3-4, after bulk ingesting will the tablet with range 2-3 also be assigned the rfile? -- Jeff Kubina 410-988-4436 On Wed, Oct 7, 2015 at 12:05 PM, Jeff Kubina <jeff.kub...@gmail.com>
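
The setup in this question can be made concrete with a small sketch. It only contrasts two possible readings: the tablets that actually contain a key from the file, and the tablets that fall anywhere inside the file's first-to-last key span. It does not claim which of the two Accumulo uses when assigning a bulk-ingested file. The helper names and the sample keys are hypothetical, and the tablet layout assumed is that tablet i covers the range (splits[i-1], splits[i]].

```python
from bisect import bisect_left


def tablet_index(splits, key):
    """Index of the tablet holding key, assuming tablet i covers
    (splits[i-1], splits[i]] and the last tablet is unbounded above."""
    return bisect_left(splits, key)


def tablets_with_data(splits, keys):
    """Tablets that contain at least one key from the file."""
    return sorted({tablet_index(splits, k) for k in keys})


def tablets_in_span(splits, keys):
    """Tablets intersecting the file's [first key, last key] span."""
    first = tablet_index(splits, min(keys))
    last = tablet_index(splits, max(keys))
    return list(range(first, last + 1))


splits = ["0", "1", "2", "3", "4"]
keys = ["1a", "1b", "3a", "3b"]  # data only in ranges 1-2 and 3-4

print(tablets_with_data(splits, keys))
print(tablets_in_span(splits, keys))
```

With these inputs the two results differ by exactly one tablet, the one covering the range 2-3, which is the tablet the question asks about.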

How does Accumulo process a r-files for bulk ingesting?

2015-10-07 Thread Jeff Kubina
How does Accumulo process an r-file for bulk ingesting when the key range of an r-file is within one tablet's key range and when the key range of an r-file spans two or more tablets? If the r-file is within one tablet's range I thought the file was "just renamed" and added to the tablet's list of

Re: How does Accumulo process a r-files for bulk ingesting?

2015-10-07 Thread Jeff Kubina
So if the HDFS has a replication factor of m and an r-file has a range that intersects n tablets, then data-locality will never be achieved for max(0,n-m) of the r-files, that is, they will never be on the same node as their tablet server until compaction, correct? -- Jeff Kubina 410-988-4436
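
The bound in this message can be checked with small numbers. The sketch below only evaluates max(0, n-m); the function name is hypothetical. It assumes the best case, in which each of the n tablets is hosted on a different node and every one of the m replicas lands on one of those nodes, so the result is a lower bound and not a prediction of actual block placement.

```python
def tablets_without_locality(n_tablets, replication):
    """Lower bound on the tablets that cannot have a local replica of
    the r-file: only `replication` nodes hold a copy, so at most that
    many of the n tablets can sit next to one."""
    return max(0, n_tablets - replication)


# Replication factor 3, r-file spanning 5 tablets: at least 2 tablets
# read the file remotely until a compaction rewrites it.
print(tablets_without_locality(5, 3))

# An r-file spanning no more tablets than there are replicas can, in
# the best case, be local everywhere.
print(tablets_without_locality(2, 3))
```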

Re: How does Accumulo process a r-files for bulk ingesting?

2015-10-07 Thread Jeff Kubina
Okay, shall I flesh out some details of the performance testing code in this thread or a JIRA? -- Jeff Kubina 410-988-4436 On Wed, Oct 7, 2015 at 10:57 AM, Josh Elser <josh.el...@gmail.com> wrote: > Jeff Kubina wrote: > >> So has testing been done to determine how much a lac

Re: How does Accumulo process a r-files for bulk ingesting?

2015-10-07 Thread Jeff Kubina
So has testing been done to determine how much a lack of data locality of bulk ingest files affects query performance?

Re: How does Accumulo process a r-files for bulk ingesting?

2015-10-07 Thread Jeff Kubina
Moved this thread to dev@. -- Jeff Kubina 410-988-4436 On Wed, Oct 7, 2015 at 11:18 AM, Josh Elser <josh.el...@gmail.com> wrote: > I'd say email is good for discussion on design (dev@ instead of user@ > would probably be more appropriate though). JIRA works better when it

What is the optimal number of tablets for a large table?

2015-10-09 Thread Jeff Kubina
I read the following from the Accumulo manual on tablet merging : > Over time, a table can get very large, so large that it has hundreds of > thousands of split points. Once there are enough tablets to spread a table >

Re: Read and writing rfiles

2015-12-23 Thread Jeff Kubina
> Are the hadoop nodes handling your map-reduce job also running tservers? > Yes. Do the Accumulo log files show the exception? If so, can you post it? Yes, but nothing helpful to track down the cause, it was a very sparse error message. I will try to post the full error messages.

Read and writing rfiles

2015-12-23 Thread Jeff Kubina
I have a mapreduce job that reads rfiles as Accumulo key/value pairs using FileSKVIterator within a RecordReader, partitions/shuffles them based on the byte string of the key, and writes them out as new rfiles using the AccumuloFileOutputFormat. The objective is to create larger rfiles for bulk

default for tserver.total.mutation.queue.max increased from 50M to 1M in 1.7

2016-07-07 Thread Jeff Kubina
the buffer is flushed to calculate how this affects performance? -- Jeff Kubina

Re: default for tserver.total.mutation.queue.max increased from 50M to 1M in 1.7

2016-07-07 Thread Jeff Kubina
Interesting, is it only flushed when the buffer is full or is there a time limit on it also? For example, if 25M of mutations are written and no more when is the buffer flushed? -- Jeff Kubina 410-988-4436 On Thu, Jul 7, 2016 at 11:50 AM, Christopher <ctubb...@apache.org> wrote: > T

Re: Tuned performance profiles for Accumulo

2016-08-30 Thread Jeff Kubina
On Tue, Aug 30, 2016 at 3:51 PM, Christopher wrote: > Has anybody used tuned ("tune D") to manage their system performance > profiles on an Accumulo cluster? > Yes, we use it a lot. > I've recently been looking into tuned, and found it a very convenient tool > for

Re: Lost tablet server lock..SESSION_EXPIRED

2016-10-07 Thread Jeff Kubina
Noe, Do you have a lot (1000s) of "[tserver.TableServer] DEBUG: UpSess ..." messages in your tserver logs prior to the FATAL or "ERROR: Lost tablet server lock" error message? Jeff -- Jeff Kubina 410-988-4436 On Fri, Oct 7, 2016 at 10:34 AM, Noe Detore <ndet...@minerk

setting tserver configs from the accumulo shell

2016-10-04 Thread Jeff Kubina
Does changing the values of tserver configs in the accumulo shell, like "config -s tserver.server.threads.minimum=256", require a restart of all the tservers to become effective?

Re: setting tserver configs from the accumulo shell

2016-10-04 Thread Jeff Kubina
That would be very helpful, but a note in the documentation would be fine initially. Is there an easy way to determine this from the source code? -- Jeff Kubina 410-988-4436 On Tue, Oct 4, 2016 at 8:59 PM, Christopher <ctubb...@apache.org> wrote: > Some do, some don't. One thing we

Re: setting tserver configs from the accumulo shell

2016-10-04 Thread Jeff Kubina
r properties are used on > demand. Some of those on demand properties are probably cached into > internal state for indefinite periods of time. It's hard to say which are > which without investigating each property individually (or through > empirical testing). > > On Tue, Oct 4

how do I list user permissions per table

2016-09-23 Thread Jeff Kubina
From the accumulo shell, how do I list all the users who have access to a specific table?

Re: Lost tablet server lock..SESSION_EXPIRED

2016-10-14 Thread Jeff Kubina
txt> documentation with accumulo is helpful in finding latency issues too. -- Jeff Kubina 410-988-4436 On Thu, Oct 13, 2016 at 10:49 AM, Noe Detore <ndet...@minerkasch.com> wrote: > Yes, seeing a lot of DEBUG:Upsess. Also seeing > [server.GarbageCollectionLogger] > DEBUG:

Re: java.IO.EOFException: ..../accumulo/recovery/.../part-r-00000/index not a SequenceFile.

2016-10-18 Thread Jeff Kubina
On Tue, Oct 18, 2016 at 6:32 AM, Michael Wall wrote: > Take a look at the master logs for where the WAL was sorted to the > /accumulo/recovery/... > directory. Then look to see if those WALs are still around and contain > content. > Checked one of them, yes it is around with

java.IO.EOFException: ..../accumulo/recovery/.../part-r-00000/index not a SequenceFile.

2016-10-17 Thread Jeff Kubina
We had a lot of datanodes lock up nearly simultaneously in our Accumulo instance. Many more of the tservers also went offline. After about two hours we were able to get all the datanodes and tservers back online with no HDFS blocks lost. However we have two tservers throwing about 70 exceptions

Re: java.IO.EOFException: ..../accumulo/recovery/.../part-r-00000/index not a SequenceFile.

2016-10-21 Thread Jeff Kubina
Mike, Yes, thanks for the help. We had to delete the recovered files generated from the WAL a few times but that worked. Then we restarted the two tablets with the TProtocolException exceptions to fix those errors. We saved off the log files for you. Jeff -- Jeff Kubina 410-988-4436 On Fri

Re: New Accumulo Blog Post

2016-11-02 Thread Jeff Kubina
Thanks for the blog post, very interesting read. Some questions ... 1. Are the operations "Writes mutation to tablet servers’ WAL/Sync or flush tablet servers’ WAL" and "Adds mutations to sorted in memory map of each tablet." performed by threads in parallel? 2. Could the latency of hsync-ing

How can I remove or fix a hanging droptable in Accumulo?

2016-10-12 Thread Jeff Kubina
How can I remove or fix a hanging droptable in Accumulo?

Re: How can I remove or fix a hanging droptable in Accumulo?

2016-10-12 Thread Jeff Kubina
Lots of bulk ingest assignments. It's been hung for days and even returned after a reboot of the master and tservers. -- Jeff Kubina 410-988-4436 On Wed, Oct 12, 2016 at 6:28 PM, Michael Wall <mjw...@gmail.com> wrote: > Jeff, > > What is the master doing while the droptable is

Re: New Accumulo Blog Post

2016-12-20 Thread Jeff Kubina
Chris, Any status on the patch to Accumulo to allow customizing the HDFS volume on which the WALs are stored? -- Jeff Kubina 410-988-4436 On Wed, Nov 2, 2016 at 10:34 PM, Christopher <ctubb...@apache.org> wrote: > I'm aware of at least one person who has patched Accumulo

Does a hard shutdown of accumulo clear all fates in zookeeper

2017-04-05 Thread Jeff Kubina
I had three fates in the accumulo fate queue that had no tasks assigned to them and could not be deleted or failed. All create/delete/clone etc commands were timing out. I did a hard shutdown of all services. I had planned to do "accumulo org.apache.accumulo.server.fate.Admin kill " but that