Re: Write or Ingest bottleneck

2016-12-07 Thread Josh Elser

No.

TabletServers are, for the most part, independent of other 
tabletservers. The only caveat is that they will need to send updates to 
the tabletserver hosting the accumulo.metadata table, but these are very 
small in comparison to the amount of data that you are writing.


Architecturally, this is what enables Accumulo to scale near-linearly. 
[1] "The scalability is almost linear in the case of presplit tables"


- Josh

[1] https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf

hujs wrote:

If one tserver ingests slowly, will it affect the ingest rate of the other tservers?





Re: Write or Ingest bottleneck

2016-12-07 Thread hujs
If one tserver ingests slowly, will it affect the ingest rate of the other tservers?





Re: Write or Ingest bottleneck

2016-12-06 Thread Josh Elser



hujs wrote:

Hello, I have a few questions.
1. Suppose I insert data into table 'a', and each tserver in the cluster hosts
at least one tablet of table 'a'. I use letters such as j and k as split
points. If I have four tservers A, B, C, and D, and A, B, and C can each reach
an ingest rate of 90k while D can only reach 50k, will tserver D limit the
cluster's ingest performance?


I don't think I understand this. For a table, tablet ranges are 
disjoint. If you split the table on letters (e.g. 'a', 'f', 'j'), the 
key-values whose key starts with 'a' would reside in only one tablet and 
thus on only one tabletserver.



2. If my rowid is monotonically increasing, such as 1, 2, 3, 4, ..., N, how do
I choose split points? Can I use the remainder of an integer as a split point?
For example, with n % 3 = 0, n % 3 = 1, and n % 3 = 2 as split points,
rowid = 3 would be written to the n % 3 = 0 tablet and rowid = 5 to the
n % 3 = 2 tablet. What can I do?


Remember that Accumulo only deals with bytes and has no context that, in 
your case, the bytes are actually stringified numbers. For example, to 
create ten tablets, it's easy: use the nine split points [1, 2, 3, 4, 5, 
6, 7, 8, 9]. This creates ten tablets: (-inf, 1), [1, 2), [2, 3), ... [9, +inf).


To create 20 tablets, you can do the following: [05, 1, 15, 2, 25, 3, 
35, 4, 45, 5, 55, 6, 65, 7, 75, 8, 85, 9, 95]. This would create 20 
tablets, (-inf, 05), [05, 1), [1, 15), ... [95, +inf).
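
As a concrete illustration, here is a minimal Java sketch of adding split 
points like these through the client API. It builds the nine splits from the 
ten-tablet example above; the Connector "conn" and the table name are 
assumptions for the sketch, not something from this thread:

  import java.util.SortedSet;
  import java.util.TreeSet;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.hadoop.io.Text;

  public class PreSplit {
    // Adds the split points "1".."9"; Accumulo compares them as bytes, not
    // as numbers, so this yields ten tablets: (-inf, 1), [1, 2), ... [9, +inf).
    static void preSplit(Connector conn, String table) throws Exception {
      SortedSet<Text> splits = new TreeSet<>();
      for (int i = 1; i <= 9; i++) {
        splits.add(new Text(Integer.toString(i)));
      }
      conn.tableOperations().addSplits(table, splits);
    }
  }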


You can extend this to create more split points if necessary for 
"numbers", but it also applies to alphabetical data as you described 
earlier. Another common trick is to temporarily reduce the split 
threshold for your table, ingest a corpus of data until you get a 
desired number of split points, and then copy the current split points 
so you can re-apply them later (the addsplits command in the shell can 
read split points, one per line, from a file).
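
A hedged Java sketch of that second trick, capturing the split points of a 
sampled table so they can be re-applied later (the Connector "conn", the 
table name, and the output file are all hypothetical names for the sketch):

  import java.io.PrintWriter;
  import java.util.Collection;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.hadoop.io.Text;

  public class SaveSplits {
    // Writes the table's current split points, one per line, to a file that
    // can be fed back later (e.g. via the shell's addsplits command or the
    // addSplits API call). Assumes the split points are printable strings.
    static void saveSplits(Connector conn, String table, String file) throws Exception {
      Collection<Text> splits = conn.tableOperations().listSplits(table);
      try (PrintWriter out = new PrintWriter(file, "UTF-8")) {
        for (Text split : splits) {
          out.println(split);
        }
      }
    }
  }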


Re: Write or Ingest bottleneck

2016-12-06 Thread hujs
Hello, I have a few questions.
1. Suppose I insert data into table 'a', and each tserver in the cluster hosts
at least one tablet of table 'a'. I use letters such as j and k as split
points. If I have four tservers A, B, C, and D, and A, B, and C can each reach
an ingest rate of 90k while D can only reach 50k, will tserver D limit the
cluster's ingest performance?
2. If my rowid is monotonically increasing, such as 1, 2, 3, 4, ..., N, how do
I choose split points? Can I use the remainder of an integer as a split point?
For example, with n % 3 = 0, n % 3 = 1, and n % 3 = 2 as split points,
rowid = 3 would be written to the n % 3 = 0 tablet and rowid = 5 to the
n % 3 = 2 tablet. What can I do?





Re: Write or Ingest bottleneck

2016-12-01 Thread Josh Elser
No worries on English. It's just difficult to say "I'm not sure what you 
meant" :)


I think your expectations are wrong for your hardware. One 7200rpm SATA 
drive is not going to reach a 300K entries/sec ingest rate with Accumulo 
and HDFS. For these specs, 90K entries/sec on one tserver sounds pretty 
good to me. I'd suggest adding a few more nodes if you want to further 
increase cluster-wide performance and take advantage of the system's 
near-linear scalability.


hujs wrote:

   I am sorry that I did not provide more detailed information earlier and that
my bad English made it difficult for you to understand. My tserver can only
reach 900,000 entries/s in very few cases. I expect my cluster's average ingest
rate to reach 300,000 entries/s. What I wanted to ask was, "Do I need to expand
my cluster to meet my high ingest-rate needs?" This time I list my hardware in
as much detail as possible, along with the CPU and I/O load of each computer
while I am inserting data. My cluster contains four computers; each has a
4-core CPU with 8 hardware threads. The network is an internal network with a
bandwidth of 1 Gb/s. Each computer contains one hard disk: SATA, 1 TB capacity,
7200 RPM, firmware CC43. The following CPU and disk I/O information was
collected while I was doing inserts; it should be sufficient to indicate the
state of my cluster most of the time.
   CPU information for the 4 computers:
   tServer1
top - 17:42:20 up 9 days,  4:16, 10 users,  load average: 0.88, 0.40, 0.20
Tasks: 239 total,   1 running, 237 sleeping,   1 stopped,   0 zombie
%Cpu0  :  2.8 us,  1.8 sy,  0.0 ni, 94.3 id,  0.0 wa,  0.0 hi,  1.1 si,  0.0 st
%Cpu1  :  5.7 us,  2.7 sy,  0.0 ni, 91.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  4.6 us,  1.7 sy,  0.0 ni, 93.4 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu3  :  2.0 us,  2.7 sy,  0.0 ni, 90.9 id,  4.4 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  2.3 us,  0.7 sy,  0.0 ni, 97.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.7 us,  0.3 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.3 us,  1.0 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.7 us,  1.0 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  24432228 total, 24246836 used,   185392 free,        0 buffers
KiB Swap: 16457724 total,  2495092 used, 13962632 free.  1016352 cached Mem
tServer2
[hadoop@slave13 bin]$ top
top - 17:43:55 up 15 days,  7:42,  7 users,  load average: 0.84, 0.44, 0.20
Tasks: 218 total,   2 running, 215 sleeping,   1 stopped,   0 zombie
%Cpu0  :  2.7 us,  2.1 sy,  0.0 ni, 94.1 id,  0.0 wa,  0.0 hi,  1.1 si,  0.0 st
%Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  3.6 us,  1.6 sy,  0.0 ni, 94.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  6.3 us,  1.1 sy,  0.0 ni, 92.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  1.6 us,  0.0 sy,  0.0 ni, 98.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.5 us,  0.0 sy,  0.0 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.5 us,  0.5 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.5 us,  0.5 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  32855648 total, 32619892 used,   235756 free,        0 buffers
KiB Swap:  8191996 total,    90824 used,  8101172 free. 10842092 cached Mem

tServer3
[hadoop@slave10 bin]$ top
top - 17:47:46 up 8 days,  8:10,  9 users,  load average: 0.50, 0.47, 0.35
Tasks: 286 total,   2 running, 283 sleeping,   1 stopped,   0 zombie
%Cpu0  : 30.4 us,  1.4 sy,  0.0 ni, 64.3 id,  1.4 wa,  0.0 hi,  2.4 si,  0.0 st
%Cpu1  :  5.5 us,  2.8 sy,  0.0 ni, 91.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  5.0 us,  3.7 sy,  0.0 ni, 89.4 id,  1.4 wa,  0.0 hi,  0.5 si,  0.0 st
%Cpu3  :  5.5 us,  1.8 sy,  0.0 ni, 92.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  6.9 us,  1.4 sy,  0.0 ni, 91.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  2.3 us,  1.4 sy,  0.0 ni, 92.2 id,  4.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  1.8 us,  1.4 sy,  0.0 ni, 92.2 id,  4.6 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 11.4 us,  0.5 sy,  0.0 ni, 88.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  32689760 total, 32455764 used,   233996 free,        0 buffers
KiB Swap: 16457724 total,  4678984 used, 11778740 free.   712908 cached Mem

Master
top - 17:41:37 up 15 days,  7:41, 14 users,  load average: 0.47, 0.68, 0.66
Tasks: 260 total,   1 running, 259 sleeping,   0 stopped,   0 zombie
%Cpu0  :  2.1 us,  1.0 sy,  0.0 ni, 96.5 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu1  : 12.0 us,  1.0 sy,  0.0 ni, 87.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  5.4 us,  1.0 sy,  0.0 ni, 93.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 12.0 us,  0.3 sy,  0.0 ni, 87.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.7 us,  0.0 sy,  0.0 ni, 99.0 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  1.0 

Re: Write or Ingest bottleneck

2016-12-01 Thread hujs
  I am sorry that I did not provide more detailed information earlier and that
my bad English made it difficult for you to understand. My tserver can only
reach 900,000 entries/s in very few cases. I expect my cluster's average ingest
rate to reach 300,000 entries/s. What I wanted to ask was, "Do I need to expand
my cluster to meet my high ingest-rate needs?" This time I list my hardware in
as much detail as possible, along with the CPU and I/O load of each computer
while I am inserting data. My cluster contains four computers; each has a
4-core CPU with 8 hardware threads. The network is an internal network with a
bandwidth of 1 Gb/s. Each computer contains one hard disk: SATA, 1 TB capacity,
7200 RPM, firmware CC43. The following CPU and disk I/O information was
collected while I was doing inserts; it should be sufficient to indicate the
state of my cluster most of the time.
  CPU information for the 4 computers:
  tServer1
top - 17:42:20 up 9 days,  4:16, 10 users,  load average: 0.88, 0.40, 0.20
Tasks: 239 total,   1 running, 237 sleeping,   1 stopped,   0 zombie
%Cpu0  :  2.8 us,  1.8 sy,  0.0 ni, 94.3 id,  0.0 wa,  0.0 hi,  1.1 si,  0.0 st
%Cpu1  :  5.7 us,  2.7 sy,  0.0 ni, 91.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  4.6 us,  1.7 sy,  0.0 ni, 93.4 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu3  :  2.0 us,  2.7 sy,  0.0 ni, 90.9 id,  4.4 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  2.3 us,  0.7 sy,  0.0 ni, 97.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.7 us,  0.3 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.3 us,  1.0 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.7 us,  1.0 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  24432228 total, 24246836 used,   185392 free,        0 buffers
KiB Swap: 16457724 total,  2495092 used, 13962632 free.  1016352 cached Mem
tServer2
[hadoop@slave13 bin]$ top
top - 17:43:55 up 15 days,  7:42,  7 users,  load average: 0.84, 0.44, 0.20
Tasks: 218 total,   2 running, 215 sleeping,   1 stopped,   0 zombie
%Cpu0  :  2.7 us,  2.1 sy,  0.0 ni, 94.1 id,  0.0 wa,  0.0 hi,  1.1 si,  0.0 st
%Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  3.6 us,  1.6 sy,  0.0 ni, 94.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  6.3 us,  1.1 sy,  0.0 ni, 92.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  1.6 us,  0.0 sy,  0.0 ni, 98.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.5 us,  0.0 sy,  0.0 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.5 us,  0.5 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.5 us,  0.5 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  32855648 total, 32619892 used,   235756 free,        0 buffers
KiB Swap:  8191996 total,    90824 used,  8101172 free. 10842092 cached Mem

tServer3
[hadoop@slave10 bin]$ top
top - 17:47:46 up 8 days,  8:10,  9 users,  load average: 0.50, 0.47, 0.35
Tasks: 286 total,   2 running, 283 sleeping,   1 stopped,   0 zombie
%Cpu0  : 30.4 us,  1.4 sy,  0.0 ni, 64.3 id,  1.4 wa,  0.0 hi,  2.4 si,  0.0 st
%Cpu1  :  5.5 us,  2.8 sy,  0.0 ni, 91.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  5.0 us,  3.7 sy,  0.0 ni, 89.4 id,  1.4 wa,  0.0 hi,  0.5 si,  0.0 st
%Cpu3  :  5.5 us,  1.8 sy,  0.0 ni, 92.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  6.9 us,  1.4 sy,  0.0 ni, 91.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  2.3 us,  1.4 sy,  0.0 ni, 92.2 id,  4.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  1.8 us,  1.4 sy,  0.0 ni, 92.2 id,  4.6 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 11.4 us,  0.5 sy,  0.0 ni, 88.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  32689760 total, 32455764 used,   233996 free,        0 buffers
KiB Swap: 16457724 total,  4678984 used, 11778740 free.   712908 cached Mem

Master
top - 17:41:37 up 15 days,  7:41, 14 users,  load average: 0.47, 0.68, 0.66
Tasks: 260 total,   1 running, 259 sleeping,   0 stopped,   0 zombie
%Cpu0  :  2.1 us,  1.0 sy,  0.0 ni, 96.5 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu1  : 12.0 us,  1.0 sy,  0.0 ni, 87.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  5.4 us,  1.0 sy,  0.0 ni, 93.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 12.0 us,  0.3 sy,  0.0 ni, 87.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.7 us,  0.0 sy,  0.0 ni, 99.0 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  1.0 us,  0.3 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.7 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  32855644 total, 30707652 used,  2147992 free,        0 buffers
KiB Swap:  8191996 total,   163916 used,  8028080 free. 11263484 cached Mem
  IO information in 4 computers:
tServer1
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00  289.0

Re: Write or Ingest bottleneck

2016-11-30 Thread Josh Elser
Each tabletserver is ingesting 90M entries/second? That sounds pretty 
darn good to me for a 3GB heap.


All Accumulo configuration should be consistent across *all* nodes 
running Accumulo processes.


I don't believe you have shared the characteristics of your hardware yet 
either. What are the available resources? How much memory, CPU, network, 
and I/O (number and types of disks)?


I also am not comprehending what you are asking with this question: "Is 
my cluster does not change the scale of the cluster, no way to do to 
upgrade it"


Let's approach this from a different angle: what rates are you 
*expecting* to see and what gives you this expectation?


hujs wrote:

 Thank you. The -Xmx for each machine in my cluster is 2g. When I configure
-Xmx = -Xms = 3g, performance does not improve. I set
tserver.mutation.queue.max to 50M, 100M, or 150M, and performance does not
improve much either. My table's table.durability property has always been
flush. Should these attributes be set on the master machine? My master is a
designated node. When I insert my own records, each tserver's ingest rate only
reaches a little more than 90 million entries/s; running ingest.sh, the ingest
rate is slightly faster. The ingest rate is really not very fast. If I do not
change the scale of the cluster, is there no way to upgrade it? If so, how can
I judge whether my cluster cannot be improved further and instead needs more
hardware resources? What should I do, and do you have other, better
suggestions?





Re: Write or Ingest bottleneck

2016-11-30 Thread hujs
Thank you. The -Xmx for each machine in my cluster is 2g. When I configure
-Xmx = -Xms = 3g, performance does not improve. I set
tserver.mutation.queue.max to 50M, 100M, or 150M, and performance does not
improve much either. My table's table.durability property has always been
flush. Should these attributes be set on the master machine? My master is a
designated node. When I insert my own records, each tserver's ingest rate only
reaches a little more than 90 million entries/s; running ingest.sh, the ingest
rate is slightly faster. The ingest rate is really not very fast. If I do not
change the scale of the cluster, is there no way to upgrade it? If so, how can
I judge whether my cluster cannot be improved further and instead needs more
hardware resources? What should I do, and do you have other, better
suggestions?





Re: Write or Ingest bottleneck

2016-11-28 Thread Josh Elser
Ok, at least one tablet per tabletserver would be good. More than one 
tablet per tabletserver is not an issue. 5-10 per tserver would be OK.


Try increasing tserver.mutation.queue.max to something like 50M or 100M. 
4M is very small.


What is -Xmx in your accumulo-env.sh for ACCUMULO_TSERVER_OPTS?

You can also try setting table.durability=flush in accumulo-site.xml.
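
For reference, a hedged sketch of what those suggestions could look like; the
50M value is just the lower end of the suggested range, the 4g heap is an
assumption, and the exact accumulo-env.sh line may differ from your install:

  In accumulo-site.xml:

    <property>
      <name>tserver.mutation.queue.max</name>
      <value>50M</value>
    </property>
    <property>
      <name>table.durability</name>
      <value>flush</value>
    </property>

  In accumulo-env.sh (tserver heap):

    test -z "$ACCUMULO_TSERVER_OPTS" && export ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx4g -Xms4g"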

hujs wrote:

Sorry, the earlier post was wrong; the network has been bad these days. The
figures can be seen in the "accumulo balance" post. I'm tuning the Accumulo
ingest rate now, and I do not know what is limiting it. I have mainly tried
the following:
   1. Pre-splitting: using the custom split points j and n, the table is
divided into three tablets, one per tserver.
   2. Adjusting table.file.max and tserver.compaction.minor.concurrent.max,
and setting table.durability=flush.
   3. Simulating multiple clients, sending data with multiple BatchWriters.
   4. Enabling the native map.
   5. Setting the tserver memory settings in accumulo-site.xml and
accumulo-env.sh to 2g.
   However, the ingest rate did not improve. My cluster is far from fully
loaded: CPU utilization is below 1/8, memory usage is below 1/5, and I/O wait
is 8-14%. Testing with start-ingest.sh, the average ingest rate is less than
180,000 entries/s. Each machine in my cluster has 8 CPU cores and 32 GB of
memory. I would like to know what affects the ingest rate and how I should
tune my cluster.
I am using Accumulo 1.7.1. The accumulo-site.xml configuration is as follows:




   
<?xml version="1.0" encoding="UTF-8"?>
<configuration>

  <property>
    <name>instance.volumes</name>
    <value>hdfs://master12:9000/accumulo</value>
    <description>comma separated list of URIs for volumes. example:
      hdfs://localhost:9000/accumulo</description>
  </property>

  <property>
    <name>instance.zookeeper.host</name>
    <value>master12:2181,slave13:2181,slave10:2181,slave11:2181</value>
    <description>comma separated list of zookeeper servers</description>
  </property>

  <property>
    <name>logger.dir.walog</name>
    <value>walogs</value>
    <description>The property only needs to be set if upgrading from 1.4 which
      used to store write-ahead logs on the local filesystem. In 1.5
      write-ahead logs are stored in DFS. When 1.5 is started for the first
      time it will copy any 1.4 write ahead logs into DFS. It is possible to
      specify a comma-separated list of directories.</description>
  </property>

  <property>
    <name>instance.secret</name>
    <value>DEFAULT</value>
    <description>A secret unique to a given instance that all servers must
      know in order to communicate with one another. Change it before
      initialization. To change it later use ./bin/accumulo
      org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new
      [newpasswd], and then update this file.</description>
  </property>

  <property>
    <name>tserver.memory.maps.max</name>
    <value>2G</value>
  </property>

  <property>
    <name>tserver.memory.maps.native.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>tserver.cache.data.size</name>
    <value>128M</value>
  </property>

  <property>
    <name>tserver.cache.index.size</name>
    <value>128M</value>
  </property>

  <property>
    <name>trace.token.property.password</name>
    <value>123456</value>
  </property>

  <property>
    <name>trace.user</name>
    <value>root</value>
  </property>

  <property>
    <name>tserver.sort.buffer.size</name>
    <value>500M</value>
  </property>

  <property>
    <name>tserver.walog.max.size</name>
    <value>2G</value>
  </property>

  <property>
    <name>tserver.wal.blocksize</name>
    <value>2G</value>
  </property>

  <property>
    <name>tserver.mutation.queue.max</name>
    <value>4M</value>
  </property>

  <property>
    <name>tserver.compaction.major.concurrent.max</name>
    <value>8</value>
  </property>

  <property>
    <name>tserver.compaction.minor.concurrent.max</name>
    <value>8</value>
  </property>

  <property>
    <name>general.classpaths</name>
    <value>
      $ACCUMULO_HOME/lib/accumulo-server.jar,
      $ACCUMULO_HOME/lib/accumulo-core.jar,
      $ACCUMULO_HOME/lib/accumulo-start.jar,
      $ACCUMULO_HOME/lib/accumulo-fate.jar,
      $ACCUMULO_HOME/lib/accumulo-proxy.jar,
      $ACCUMULO_HOME/lib/[^.].*.jar,
      $ZOOKEEPER_HOME/zookeeper[^.].*.jar,
      $HADOOP_CONF_DIR,
      $HADOOP_PREFIX/share/hadoop/common/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/common/lib/(?!slf4j)[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/hdfs/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/mapreduce/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/yarn/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/yarn/lib/jersey.*.jar,
    </value>
    <description>Classpaths that accumulo checks for updates and class
      files.</description>
  </property>

</configuration>


Thank you, dear Josh Elser.






Re: Write or Ingest bottleneck

2016-11-28 Thread hujs
Sorry, the earlier post was wrong; the network has been bad these days. The
figures can be seen in the "accumulo balance" post. I'm tuning the Accumulo
ingest rate now, and I do not know what is limiting it. I have mainly tried
the following:
  1. Pre-splitting: using the custom split points j and n, the table is
divided into three tablets, one per tserver.
  2. Adjusting table.file.max and tserver.compaction.minor.concurrent.max,
and setting table.durability=flush.
  3. Simulating multiple clients, sending data with multiple BatchWriters.
  4. Enabling the native map.
  5. Setting the tserver memory settings in accumulo-site.xml and
accumulo-env.sh to 2g.
  However, the ingest rate did not improve. My cluster is far from fully
loaded: CPU utilization is below 1/8, memory usage is below 1/5, and I/O wait
is 8-14%. Testing with start-ingest.sh, the average ingest rate is less than
180,000 entries/s. Each machine in my cluster has 8 CPU cores and 32 GB of
memory. I would like to know what affects the ingest rate and how I should
tune my cluster.
I am using Accumulo 1.7.1. The accumulo-site.xml configuration is as follows:


  

  
<?xml version="1.0" encoding="UTF-8"?>
<configuration>

  <property>
    <name>instance.volumes</name>
    <value>hdfs://master12:9000/accumulo</value>
    <description>comma separated list of URIs for volumes. example:
      hdfs://localhost:9000/accumulo</description>
  </property>

  <property>
    <name>instance.zookeeper.host</name>
    <value>master12:2181,slave13:2181,slave10:2181,slave11:2181</value>
    <description>comma separated list of zookeeper servers</description>
  </property>

  <property>
    <name>logger.dir.walog</name>
    <value>walogs</value>
    <description>The property only needs to be set if upgrading from 1.4 which
      used to store write-ahead logs on the local filesystem. In 1.5
      write-ahead logs are stored in DFS. When 1.5 is started for the first
      time it will copy any 1.4 write ahead logs into DFS. It is possible to
      specify a comma-separated list of directories.</description>
  </property>

  <property>
    <name>instance.secret</name>
    <value>DEFAULT</value>
    <description>A secret unique to a given instance that all servers must
      know in order to communicate with one another. Change it before
      initialization. To change it later use ./bin/accumulo
      org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new
      [newpasswd], and then update this file.</description>
  </property>

  <property>
    <name>tserver.memory.maps.max</name>
    <value>2G</value>
  </property>

  <property>
    <name>tserver.memory.maps.native.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>tserver.cache.data.size</name>
    <value>128M</value>
  </property>

  <property>
    <name>tserver.cache.index.size</name>
    <value>128M</value>
  </property>

  <property>
    <name>trace.token.property.password</name>
    <value>123456</value>
  </property>

  <property>
    <name>trace.user</name>
    <value>root</value>
  </property>

  <property>
    <name>tserver.sort.buffer.size</name>
    <value>500M</value>
  </property>

  <property>
    <name>tserver.walog.max.size</name>
    <value>2G</value>
  </property>

  <property>
    <name>tserver.wal.blocksize</name>
    <value>2G</value>
  </property>

  <property>
    <name>tserver.mutation.queue.max</name>
    <value>4M</value>
  </property>

  <property>
    <name>tserver.compaction.major.concurrent.max</name>
    <value>8</value>
  </property>

  <property>
    <name>tserver.compaction.minor.concurrent.max</name>
    <value>8</value>
  </property>

  <property>
    <name>general.classpaths</name>
    <value>
      $ACCUMULO_HOME/lib/accumulo-server.jar,
      $ACCUMULO_HOME/lib/accumulo-core.jar,
      $ACCUMULO_HOME/lib/accumulo-start.jar,
      $ACCUMULO_HOME/lib/accumulo-fate.jar,
      $ACCUMULO_HOME/lib/accumulo-proxy.jar,
      $ACCUMULO_HOME/lib/[^.].*.jar,
      $ZOOKEEPER_HOME/zookeeper[^.].*.jar,
      $HADOOP_CONF_DIR,
      $HADOOP_PREFIX/share/hadoop/common/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/common/lib/(?!slf4j)[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/hdfs/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/mapreduce/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/yarn/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/yarn/lib/jersey.*.jar,
    </value>
    <description>Classpaths that accumulo checks for updates and class
      files.</description>
  </property>

</configuration>


 
Thank you, dear Josh Elser.






Re: Write or Ingest bottleneck

2016-11-18 Thread Josh Elser

Without seeing your images:

Make sure that your split points actually divide up your data.

For example, if you only write data whose rowId starts with the letters 
a-z but you split the table on the numbers 0-9, only one tablet will 
receive the data.


The entry count shown for a table is an approximation. It does not account 
for data that is only resident in memory. If you issue a compaction 
(`compact -t <table> -w` in the Accumulo shell) after you have written all 
of the data, you will see a correct number of results in the Monitor. The 
Monitor is only for informational purposes -- it oftentimes shows 
approximations, not guaranteed consistent results.
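
A hedged Java equivalent of that shell command (the Connector "conn" and the
table name are assumptions for the sketch, not from this thread):

  import org.apache.accumulo.core.client.Connector;

  public class CompactAndWait {
    // Force a full major compaction and block until it finishes, so the
    // Monitor's entry count reflects data that was still only in memory.
    static void compactAndWait(Connector conn, String table) throws Exception {
      conn.tableOperations().compact(table, null, null, true /* flush */, true /* wait */);
    }
  }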


"pre-split table using the threshold points" <- I don't know what you 
mean by "threshold points".


Josh Elser wrote:

Apache mailing lists regularly strip attachments. Can you please host
the images elsewhere and provide links to them?

hjs19890 wrote:

Hello everyone,
I have some problems and need help. See Figure 1: I pre-split the hjs_v table
into multiple tablets, but when I insert a large amount of data into the
cluster, only one tserver ingests data while the other two are idle. I have
done this operation many times, always with the same result.
See Figure 2: the tablets are unbalanced across the three tserver nodes, and
the entry counts displayed for slave10 and slave11 are not correct. I would
like to know what causes this kind of result and how I should deal with such
a problem.
Note: the table was pre-split using the threshold points. My cluster was
balanced three days ago. My cluster consists of one master and three tservers.



Re: Write or Ingest bottleneck

2016-11-18 Thread Josh Elser
Apache mailing lists regularly strip attachments. Can you please host 
the images elsewhere and provide links to them?


hjs19890 wrote:

Hello everyone,
I have some problems and need help. See Figure 1: I pre-split the hjs_v table
into multiple tablets, but when I insert a large amount of data into the
cluster, only one tserver ingests data while the other two are idle. I have
done this operation many times, always with the same result.
See Figure 2: the tablets are unbalanced across the three tserver nodes, and
the entry counts displayed for slave10 and slave11 are not correct. I would
like to know what causes this kind of result and how I should deal with such
a problem.
Note: the table was pre-split using the threshold points. My cluster was
balanced three days ago. My cluster consists of one master and three tservers.



Re: Write or Ingest bottleneck

2016-11-14 Thread Josh Elser

Hi,

What do you mean by "accumulo's ingest rate affects accumulo's insertion 
performance"? Ingest *is* insertion into the database. Please describe 
what you mean by "insertion performance". Are you comparing some custom 
code you have written to the Continuous Ingest client?


60K entries/sec per tabletserver seems to be a reasonable ingest rate 
for the hardware you have described for continuous ingest.


However, you should *definitely* tweak the default configurations. The 
provided configuration is meant to operate Accumulo in less than 3GB of 
resident memory. I would imagine that this is a bottleneck.


A non-exhaustive list of things to check ...

* Increase TabletServer JVM heap size (4-8G)
* Enable the native maps [1]
* Increase tserver.total.mutation.queue.max=256M [2]
* Reduce table durability if your use-case allows it [3]

[1] http://accumulo.apache.org/1.7/accumulo_user_manual.html#_native_map
[2] 
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_tserver_total_mutation_queue_max

[3] http://accumulo.apache.org/blog/2016/11/02/durability-performance.html
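
A hedged sketch of what those changes could look like on a 1.7 install (the
heap size shown is illustrative, and the accumulo-env.sh line may differ from
your template; the native map also requires the native library to be built and
installed, see [1]):

  In accumulo-env.sh:

    test -z "$ACCUMULO_TSERVER_OPTS" && export ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx6g -Xms6g"

  In accumulo-site.xml:

    <property>
      <name>tserver.memory.maps.native.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>tserver.total.mutation.queue.max</name>
      <value>256M</value>
    </property>
    <property>
      <name>table.durability</name>
      <value>flush</value>
    </property>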

hjs19890 wrote:

hi,
  I'm testing the accumulo insertion performance and found that accumulo's 
ingest rate affects accumulo's insertion performance. While looking for 
relevant information, I found the test suite that comes with Accumulo, so I 
ran accumulo-1.7.1/test/system/continuous/start-ingest.sh to test the ingest 
rate of my cluster. The test results averaged an ingest rate of 180,000 
entries/s, which is similar to the results of my insertion-performance tests, 
although the two tests are not directly comparable. This result is not very 
satisfactory.
  My cluster has 4 servers (1 master, 3 tservers); each computer is an 
i7-4700 with 4 cores and 32 GB of memory. The Accumulo version is 1.7.1, and 
the software uses the default configuration. Therefore, I would like to ask a 
few questions:
1. What is the insertion-performance bottleneck?
2. Has my cluster's ingest reached its bottleneck?
3. If it has not reached the bottleneck, how can I tune my cluster?

thanks,