Re: Write or Ingest bottleneck
No. TabletServers are, for the most part, independent of other TabletServers. The only caveat is that they need to send updates to the TabletServer hosting the accumulo.metadata table, but those updates are very small in comparison to the amount of data you are writing. Architecturally, this is what enables Accumulo to scale near-linearly: "the scalability is almost linear in the case of presplit tables" [1].

- Josh

[1] https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf

hujs wrote:
> If a tserver ingests slowly, will it affect the ingest rate of the other tservers?
Re: Write or Ingest bottleneck
If a tserver ingests slowly, will it affect the ingest rate of the other tservers?
Re: Write or Ingest bottleneck
hujs wrote:
> Hello, I have a few questions.
>
> 1. Suppose I insert data into table 'a', and every tserver in the cluster hosts at least one tablet of table 'a'; I use letters such as 'j' and 'k' as split points. If I have four tservers A, B, C, and D, and A, B, and C can each reach an ingest rate of 90k while D can only reach 50k, will D affect the cluster's ingest performance?

I don't think I understand this. For a table, tablet ranges are disjoint. If you split the table on letters (e.g. 'a', 'f', 'j'), the key-values whose key starts with 'a' would reside in only one tablet and thus on only one tabletserver.

> 2. If my rowid is self-increasing, such as 1, 2, 3, 4, ..., N, how do I choose split points? Can I use the remainder of an integer as a split point? For example, with n % 3 = 0, n % 3 = 1, and n % 3 = 2 as split points, rowid = 3 would be written to the "n % 3 = 0" tablet and rowid = 5 to the "n % 3 = 2" tablet. What can I do?

Remember that Accumulo only deals with bytes and has no context that, in your case, the bytes are actually stringified numbers. For example, creating 10 tablets is easy: use the nine split points [1, 2, 3, 4, 5, 6, 7, 8, 9]. This creates the ten tablets (-inf, 1), [1, 2), [2, 3), ... [9, +inf). To create 20 tablets, you can use the nineteen split points [05, 1, 15, 2, 25, 3, 35, 4, 45, 5, 55, 6, 65, 7, 75, 8, 85, 9, 95], which creates the tablets (-inf, 05), [05, 1), [1, 15), ... [95, +inf). You can extend this to create more split points if necessary for "numbers", but it also applies to alphabetical data as you described earlier. (A minimal client-side sketch follows below.)

Another common trick is to temporarily reduce the split threshold for your table, ingest a corpus of data until you get a desired number of split points, copy the current split points, and then reuse them later (the addsplits command in the shell can read split points, one per line, from a file).
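For illustration, here is a minimal sketch of pre-splitting with the Java client API (the instance name, ZooKeeper host, credentials, and table name are hypothetical placeholders):

  import java.util.SortedSet;
  import java.util.TreeSet;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.ZooKeeperInstance;
  import org.apache.accumulo.core.client.security.tokens.PasswordToken;
  import org.apache.hadoop.io.Text;

  public class PreSplit {
    public static void main(String[] args) throws Exception {
      // Hypothetical instance and credentials -- substitute your own.
      Connector conn = new ZooKeeperInstance("myinstance", "zkhost:2181")
          .getConnector("root", new PasswordToken("secret"));

      // Nine split points "1".."9" yield ten tablets:
      // (-inf, 1), [1, 2), ..., [9, +inf)
      SortedSet<Text> splits = new TreeSet<Text>();
      for (char c = '1'; c <= '9'; c++) {
        splits.add(new Text(String.valueOf(c)));
      }
      conn.tableOperations().addSplits("mytable", splits);
    }
  }

The same addSplits() call works for the two-character split points ("05", "1", "15", ...) or for letters; Accumulo compares the raw bytes lexicographically either way.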
Re: Write or Ingest bottleneck
Hello, I have a few questions.

1. Suppose I insert data into table 'a', and every tserver in the cluster hosts at least one tablet of table 'a'; I use letters such as 'j' and 'k' as split points. If I have four tservers A, B, C, and D, and A, B, and C can each reach an ingest rate of 90k while D can only reach 50k, will D affect the cluster's ingest performance?

2. If my rowid is self-increasing, such as 1, 2, 3, 4, ..., N, how do I choose split points? Can I use the remainder of an integer as a split point? For example, with n % 3 = 0, n % 3 = 1, and n % 3 = 2 as split points, rowid = 3 would be written to the "n % 3 = 0" tablet and rowid = 5 to the "n % 3 = 2" tablet. What can I do?
Re: Write or Ingest bottleneck
No worries on the English. It's just difficult to say "I'm not sure what you meant" :)

I think your expectations are wrong for your hardware. One 7200rpm SATA drive is not going to reach a 300K entries/sec ingest rate with Accumulo and HDFS. For these specs, 90K entries/sec on one tserver sounds pretty good to me. I'd suggest adding a few more nodes if you want to further increase cluster-wide performance and take advantage of the near-linear scalability of the system. (A rough back-of-envelope follows the quoted message below.)

hujs wrote:
> I am so sorry that I did not provide more detailed information, and for the bad English that made it difficult to understand. My tservers can only reach 90,000 entries/s in very few cases. I expect my cluster's average ingest rate to reach 300,000 entries/s. [...]
> [hardware details and top/iostat captures snipped; see the full message below]
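To make the disk argument concrete, a hedged back-of-envelope (the bytes-per-entry figure is an assumption; continuous-ingest entries are small, so take ~100 bytes/entry as a round number):

  300,000 entries/s x ~100 bytes/entry  ~=  30 MB/s of raw mutation data
  + write-ahead log writes (durability=flush), replicated by HDFS
  + minor-compaction flushes to RFiles, also replicated by HDFS
  + periodic major compactions re-reading and re-writing those files

Spread over three tservers with one 7200rpm SATA disk each (a disk that also serves the OS, swap, and other HDFS traffic), the per-disk write load can approach or exceed what a single spindle sustains under mixed I/O, so a 300K entries/s target is plausibly disk-bound on this hardware.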
Re: Write or Ingest bottleneck
I am so sorry that I did not provide more detailed information, and for the bad English that made it difficult to understand. My tservers can only reach 90,000 entries/s in very few cases. I expect my cluster's average ingest rate to reach 300,000 entries/s. I wanted to ask: "Do I need to expand my cluster to meet my high ingest rate needs?"

This time I list my hardware in as much detail as possible, together with the CPU and I/O load measured while inserting data. My cluster contains four computers; each CPU has 4 cores and 8 hardware threads. The network is an internal network with 1 Gb/s bandwidth. Each computer contains one SATA hard disk (capacity 1 TB, 7200 RPM, firmware CC43). The following CPU and disk I/O information was captured during an insert and should be representative of my cluster most of the time.

CPU information on the 4 computers (condensed from the original top output):

  Machine   load avg  CPU cores                                 Mem used/total  Swap used
  tServer1  0.88      all 8 cores >90% idle; one core 4.4% wa   23.1/23.3 GiB   2.4 GiB
  tServer2  0.84      mostly >92% idle, but Cpu1 at 100% us     31.1/31.3 GiB   0.1 GiB
  tServer3  0.50      Cpu0 30.4% us; others 88-93% idle,        31.0/31.2 GiB   4.5 GiB
                      up to 4.6% wa
  Master    0.47      87-99% idle                               29.3/31.3 GiB   0.2 GiB

I/O information on the 4 computers:

  tServer1: Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
            sda 0.00 0.00 289.0 [remainder truncated in the archive]
Re: Write or Ingest bottleneck
Each tabletserver is ingesting 90K entries/second? That sounds pretty darn good to me for a 3GB heap.

All Accumulo configuration should be consistent across *all* nodes running Accumulo processes. (A short shell sketch of where the different kinds of properties live follows the quoted message below.)

I don't believe you have shared the characteristics of your hardware yet, either. What are the available resources? How much memory, CPU, network, and I/O (number and types of disks)?

I also am not comprehending what you are asking with this question: "Is my cluster does not change the scale of the cluster, no way to do to upgrade it"

Let's approach this from a different angle: what rates are you *expecting* to see, and what gives you this expectation?

hujs wrote:
> Thank you. In my cluster, -Xmx on each machine is 2g. When I configure -Xmx = -Xms = 3g, performance is not improved. I set tserver.mutation.queue.max to 50m, 100m, or 150m, and performance does not improve much either. My table's table.durability property has always been flush. Should these properties be set on the master machine? My master is a dedicated machine. When I ingest my own records, each tserver node's ingest rate only reaches a bit over 90,000 entries/s; running ingest.sh, the rate is slightly faster. The ingest rate is really not very fast. Is my cluster does not change the scale of the cluster, no way to do to upgrade it? If so, how can I judge whether my cluster's performance cannot be improved further and the hardware needs to be expanded instead? What should I do, and do you have other, better suggestions?
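For what it's worth, a minimal shell sketch of how these properties are usually inspected and set (the table name is hypothetical; the values are just the ones discussed in this thread, and some tserver.* properties only take effect after a tserver restart):

  root@myinstance> config -t mytable -s table.durability=flush
  root@myinstance> config -t mytable -f durability
  root@myinstance> config -s tserver.mutation.queue.max=100M

Properties set this way are stored in ZooKeeper and apply cluster-wide, so they do not belong to any one machine; only accumulo-site.xml and accumulo-env.sh have to be kept in sync across nodes by hand.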
Re: Write or Ingest bottleneck
Thank you. In my cluster, -Xmx on each machine is 2g. When I configure -Xmx = -Xms = 3g, performance is not improved. I set tserver.mutation.queue.max to 50m, 100m, or 150m, and performance does not improve much either. My table's table.durability property has always been flush. Should these properties be set on the master machine? My master is a dedicated machine. When I ingest my own records, each tserver node's ingest rate only reaches a bit over 90,000 entries/s; running ingest.sh, the rate is slightly faster. The ingest rate is really not very fast. Is my cluster does not change the scale of the cluster, no way to do to upgrade it? If so, how can I judge whether my cluster's performance cannot be improved further and the hardware needs to be expanded instead? What should I do, and do you have other, better suggestions?
Re: Write or Ingest bottleneck
Ok. At least one tablet per tabletserver would be good. More than one tablet per tabletserver is not an issue; 5-10 per tserver would be OK.

Try increasing tserver.mutation.queue.max to something like 50M or 100M; 4M is very small. What is -Xmx in your accumulo-env.sh for ACCUMULO_TSERVER_OPTS? You can also try setting table.durability=flush in accumulo-site.xml. (A sketch of these overrides follows the quoted message below.)

hujs wrote:
> Sorry, the previous post was wrong; the network has been bad these days, and I have only just seen your reply. The figure can be seen in the "accumulo balance" post. I'm tuning the Accumulo ingest rate now, and I do not know what is limiting it. I have mainly tried the following:
> 1. Pre-split: using the custom split points 'j' and 'n', the table is divided into three tablets, one per tserver.
> 2. Adjusting table.file.max, tserver.compaction.minor.concurrent.max, and table.durability=flush.
> 3. Simulating multiple clients, sending data with multiple BatchWriters.
> 4. Enabling the native map.
> 5. Setting the memory settings in the tserver's accumulo-site.xml and accumulo-env.sh to 2g.
> However, the ingest rate did not improve. My cluster is far from fully loaded: CPU utilization is below 1/8, memory usage below 1/5, and I/O at 8-14% wa. Testing with start-ingest.sh, the average ingest rate is less than 180,000 entries/s. Each machine in my cluster has 8 CPUs and 32 GB Mem. I would like to know what affects the ingest rate and how to tune my cluster. I am using Accumulo 1.7.1.
> [accumulo-site.xml snipped; see the full message below]
> Thank you, dear Josh Elser.
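As a sketch, the overrides suggested above might look like this in accumulo-site.xml (the values are the ones from this thread, not tested recommendations, and tservers need a restart to pick them up):

  <property>
    <name>tserver.mutation.queue.max</name>
    <value>100M</value>
  </property>
  <property>
    <name>table.durability</name>
    <value>flush</value>
  </property>

and in accumulo-env.sh, something along the lines of (the 4g figure is illustrative):

  export ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx4g -Xms4g"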
Re: Write or Ingest bottleneck
Sorry, the previous post was wrong; the network has been bad these days, and I have only just seen your reply. The figure can be seen in the "accumulo balance" post. I'm tuning the Accumulo ingest rate now, and I do not know what is limiting it. I have mainly tried the following:

1. Pre-split: using the custom split points 'j' and 'n', the table is divided into three tablets, one per tserver.
2. Adjusting table.file.max, tserver.compaction.minor.concurrent.max, and table.durability=flush.
3. Simulating multiple clients, sending data with multiple BatchWriters.
4. Enabling the native map.
5. Setting the memory settings in the tserver's accumulo-site.xml and accumulo-env.sh to 2g.

However, the ingest rate did not improve. My cluster is far from fully loaded: CPU utilization is below 1/8, memory usage below 1/5, and I/O at 8-14% wa. Testing with start-ingest.sh, the average ingest rate is less than 180,000 entries/s. Each machine in my cluster has 8 CPUs and 32 GB Mem. I would like to know what affects the ingest rate and how to tune my cluster. I am using Accumulo 1.7.1. The accumulo-site.xml configuration is as follows (stock property descriptions omitted):

  instance.volumes = hdfs://master12:9000/accumulo
  instance.zookeeper.host = master12:2181,slave13:2181,slave10:2181,slave11:2181
  logger.dir.walog = walogs
  instance.secret = DEFAULT
  tserver.memory.maps.max = 2G
  tserver.memory.maps.native.enabled = true
  tserver.cache.data.size = 128M
  tserver.cache.index.size = 128M
  trace.token.property.password = 123456
  trace.user = root
  tserver.sort.buffer.size = 500M
  tserver.walog.max.size = 2G
  tserver.wal.blocksize = 2G
  tserver.mutation.queue.max = 4M
  tserver.compaction.major.concurrent.max = 8
  tserver.compaction.minor.concurrent.max = 8
  general.classpaths = $ACCUMULO_HOME/lib/accumulo-server.jar,
      $ACCUMULO_HOME/lib/accumulo-core.jar, $ACCUMULO_HOME/lib/accumulo-start.jar,
      $ACCUMULO_HOME/lib/accumulo-fate.jar, $ACCUMULO_HOME/lib/accumulo-proxy.jar,
      $ACCUMULO_HOME/lib/[^.].*.jar, $ZOOKEEPER_HOME/zookeeper[^.].*.jar,
      $HADOOP_CONF_DIR, $HADOOP_PREFIX/share/hadoop/common/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/common/lib/(?!slf4j)[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/hdfs/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/mapreduce/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/yarn/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/yarn/lib/jersey.*.jar

Thank you, dear Josh Elser.
Re: Write or Ingest bottleneck
Without seeing your images:

Make sure that your split points actually divide up your data. For example, if you only write data whose rowId starts with the letters a-z, but you split the table on the numbers 0-9, only one tablet will receive the data. (A shell sketch follows below.)

The entry count for a table is an approximation; it does not account for data that is only resident in memory. If you issue a compaction (`compact -t <table> -w` in the Accumulo shell) after you have written all of the data, you will see a correct count in the Monitor. The Monitor is only for informational purposes -- it oftentimes shows approximations, not guaranteed-consistent results.

"pre-split table using the threshold points" <- I don't know what you mean by "threshold points".

Josh Elser wrote:
> Apache mailing lists regularly strip attachments. Can you please host the images elsewhere and provide links to them?
>
> hjs19890 wrote:
>> Hello everyone, I have some problems and need help. See Figure 1: I pre-split the hjs_v table into multiple tablets, but when I insert a large amount of data into the cluster, only one tserver ingests data while the other two are idle. I have done this operation many times, with the same result. See Figure 2: the tablets are unbalanced across the three tserver nodes, and the entry counts displayed for slave10 and slave11 are not correct. I would like to know what causes this kind of result and how to deal with it. Note: the table was pre-split using the threshold points. My cluster was balanced three days ago. My cluster is composed of one master and three tservers.
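For example, with your hjs_v table, the workflow in the shell might look like this (the split points here are hypothetical letters chosen to match a-z row data):

  root@myinstance> addsplits d h l p t -t hjs_v
  root@myinstance> getsplits -t hjs_v
  ... ingest ...
  root@myinstance> compact -t hjs_v -w

addsplits adds the split points, getsplits verifies them, and compact with -w blocks until the in-memory data has been flushed, after which the entry counts shown in the Monitor should be accurate.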
Re: Write or Ingest bottleneck
Apache mailing lists regularly strip attachments. Can you please host the images elsewhere and provide links to them?

hjs19890 wrote:
> Hello everyone, I have some problems and need help. See Figure 1: I pre-split the hjs_v table into multiple tablets, but when I insert a large amount of data into the cluster, only one tserver ingests data while the other two are idle. I have done this operation many times, with the same result. See Figure 2: the tablets are unbalanced across the three tserver nodes, and the entry counts displayed for slave10 and slave11 are not correct. I would like to know what causes this kind of result and how to deal with it. Note: the table was pre-split using the threshold points. My cluster was balanced three days ago. My cluster is composed of one master and three tservers.
Re: Write or Ingest bottleneck
Hi,

What do you mean by "accumulo's ingest rate affects accumulo's insertion performance"? Ingest *is* insertion into the database. Please describe what you mean by "insertion performance". Are you comparing some custom code you have written to the continuous ingest client?

60K entries/sec per tabletserver seems to be a reasonable ingest rate for the hardware you have described for continuous ingest. However, you should *definitely* tweak the default configuration. The provided configuration is meant to operate Accumulo in less than 3GB of resident memory; I would imagine that this is a bottleneck. A non-exhaustive list of things to check (a client-side sketch follows the quoted message below):

* Increase the TabletServer JVM heap size (4-8G)
* Enable the native maps [1]
* Increase tserver.total.mutation.queue.max=256M [2]
* Reduce table durability if your use case allows it [3]

[1] http://accumulo.apache.org/1.7/accumulo_user_manual.html#_native_map
[2] http://accumulo.apache.org/1.7/accumulo_user_manual.html#_tserver_total_mutation_queue_max
[3] http://accumulo.apache.org/blog/2016/11/02/durability-performance.html

hjs19890 wrote:
> Hi, I'm testing Accumulo's insertion performance and found that the ingest rate affects it. While searching for relevant information, I found that Accumulo ships with its own test suite, so I ran accumulo-1.7.1/test/system/continuous/start-ingest.sh to test my cluster's ingest rate. The test averaged an ingest rate of 180,000 entries/s, similar to the results of my own insertion-performance tests, although the two are not directly comparable. This result is not very satisfying. My cluster has 4 servers (1 master, 3 tservers); each computer is an i7-4700 (4 cores) with 32 GB Mem. The Accumulo version is 1.7.1, with the default configuration. I would therefore like to ask a few questions:
> 1. Where is the insert-performance bottleneck?
> 2. Has my cluster's ingest reached that bottleneck?
> 3. If not, how can I tune my cluster?
> Thanks,
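On the client side, the BatchWriter settings also matter. Here is a minimal, hedged sketch of the knobs I would check, using the 1.7 Java API (the instance name, credentials, and sizing values are illustrative assumptions, not recommendations):

  import java.util.concurrent.TimeUnit;
  import org.apache.accumulo.core.client.BatchWriter;
  import org.apache.accumulo.core.client.BatchWriterConfig;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.ZooKeeperInstance;
  import org.apache.accumulo.core.client.security.tokens.PasswordToken;
  import org.apache.accumulo.core.data.Mutation;
  import org.apache.accumulo.core.data.Value;

  public class IngestSketch {
    public static void main(String[] args) throws Exception {
      // Hypothetical instance and credentials -- substitute your own.
      Connector conn = new ZooKeeperInstance("myinstance", "master12:2181")
          .getConnector("root", new PasswordToken("secret"));

      BatchWriterConfig cfg = new BatchWriterConfig()
          .setMaxMemory(64 * 1024 * 1024)      // client-side mutation buffer
          .setMaxLatency(2, TimeUnit.MINUTES)  // let the buffer fill before a timed flush
          .setMaxWriteThreads(8);              // parallel sends to multiple tservers

      BatchWriter bw = conn.createBatchWriter("hjs_v", cfg);
      for (long i = 0; i < 1000000; i++) {
        // Zero-padded rows sort correctly as bytes (see the split-point discussion)
        Mutation m = new Mutation(String.format("%09d", i));
        m.put("cf", "cq", new Value(("value" + i).getBytes()));
        bw.addMutation(m);
      }
      bw.close();  // flushes any remaining buffered mutations
    }
  }

A single BatchWriter with few write threads can bottleneck the client even when the tservers are idle, which would be consistent with the low CPU and I/O utilization reported earlier in this thread.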