Re: Write or Ingest bottleneck
No. TabletServers are, for the most part, independent of other TabletServers. The only caveat is that they need to send updates to the TabletServer hosting the accumulo.metadata table, but those updates are very small in comparison to the amount of data you are writing. Architecturally, this is what enables Accumulo to scale near-linearly: "the scalability is almost linear in the case of presplit tables" [1].

- Josh

[1] https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf

hujs wrote:
> If a tserver ingests slowly, will it affect the ingest rate of the other tservers?
Re: Write or Ingest bottleneck
If a tserver ingests slowly, will it affect the ingest rate of the other tservers?
Re: Write or Ingest bottleneck
hujs wrote:
> Hello, I have a few questions.
>
> 1. Suppose I insert data into table 'a', and every tserver in the cluster hosts at least one tablet of table 'a'; I use letters such as 'j' and 'k' as split points. If I have four tservers A, B, C, and D, and A, B, and C can each reach an ingest rate of 90k while D can only reach 50k, will D affect the cluster's ingest performance?

I don't think I understand this. For a table, tablet ranges are disjoint. If you split the table on letters (e.g. 'a', 'f', 'j'), the key-values whose key starts with 'a' would reside in only one tablet and thus on only one tabletserver.

> 2. If my rowid is self-increasing, such as 1, 2, 3, 4, ..., N, how do I choose split points? Can I use the remainder of an integer as a split point? For example, with n % 3 = 0, n % 3 = 1, and n % 3 = 2 as split points, rowid = 3 would be written to the "n % 3 = 0" tablet and rowid = 5 to the "n % 3 = 2" tablet. What can I do?

Remember that Accumulo only deals with bytes and has no context that, in your case, the bytes are actually stringified numbers. For example, creating 10 tablets is easy: use the nine split points [1, 2, 3, 4, 5, 6, 7, 8, 9]. This creates the ten tablets (-inf, 1), [1, 2), [2, 3), ... [9, +inf). To create 20 tablets, you can use the nineteen split points [05, 1, 15, 2, 25, 3, 35, 4, 45, 5, 55, 6, 65, 7, 75, 8, 85, 9, 95], which creates the tablets (-inf, 05), [05, 1), [1, 15), ... [95, +inf). You can extend this to create more split points if necessary for "numbers", but it also applies to alphabetical data as you described earlier. (A minimal client-side sketch follows below.)

Another common trick is to temporarily reduce the split threshold for your table, ingest a corpus of data until you get a desired number of split points, copy the current split points, and then reuse them later (the addsplits command in the shell can read split points, one per line, from a file).
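For illustration, here is a minimal sketch of pre-splitting with the Java client API (the instance name, ZooKeeper host, credentials, and table name are hypothetical placeholders):

  import java.util.SortedSet;
  import java.util.TreeSet;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.ZooKeeperInstance;
  import org.apache.accumulo.core.client.security.tokens.PasswordToken;
  import org.apache.hadoop.io.Text;

  public class PreSplit {
    public static void main(String[] args) throws Exception {
      // Hypothetical instance and credentials -- substitute your own.
      Connector conn = new ZooKeeperInstance("myinstance", "zkhost:2181")
          .getConnector("root", new PasswordToken("secret"));

      // Nine split points "1".."9" yield ten tablets:
      // (-inf, 1), [1, 2), ..., [9, +inf)
      SortedSet<Text> splits = new TreeSet<Text>();
      for (char c = '1'; c <= '9'; c++) {
        splits.add(new Text(String.valueOf(c)));
      }
      conn.tableOperations().addSplits("mytable", splits);
    }
  }

The same addSplits() call works for the two-character split points ("05", "1", "15", ...) or for letters; Accumulo compares the raw bytes lexicographically either way.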
Re: Write or Ingest bottleneck
Hello, I have a few questions.

1. Suppose I insert data into table 'a', and every tserver in the cluster hosts at least one tablet of table 'a'; I use letters such as 'j' and 'k' as split points. If I have four tservers A, B, C, and D, and A, B, and C can each reach an ingest rate of 90k while D can only reach 50k, will D affect the cluster's ingest performance?

2. If my rowid is self-increasing, such as 1, 2, 3, 4, ..., N, how do I choose split points? Can I use the remainder of an integer as a split point? For example, with n % 3 = 0, n % 3 = 1, and n % 3 = 2 as split points, rowid = 3 would be written to the "n % 3 = 0" tablet and rowid = 5 to the "n % 3 = 2" tablet. What can I do?
Re: Write or Ingest bottleneck
No worries on the English. It's just difficult to say "I'm not sure what you meant" :)

I think your expectations are wrong for your hardware. One 7200rpm SATA drive is not going to reach a 300K entries/sec ingest rate with Accumulo and HDFS. For these specs, 90K entries/sec on one tserver sounds pretty good to me. I'd suggest adding a few more nodes if you want to further increase cluster-wide performance and take advantage of the near-linear scalability of the system. (A rough back-of-envelope follows the quoted message below.)

hujs wrote:
> I am so sorry that I did not provide more detailed information, and for the bad English that made it difficult to understand. My tservers can only reach 90,000 entries/s in very few cases. I expect my cluster's average ingest rate to reach 300,000 entries/s. [...]
> [hardware details and top/iostat captures snipped; see the full message below]
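To make the disk argument concrete, a hedged back-of-envelope (the bytes-per-entry figure is an assumption; continuous-ingest entries are small, so take ~100 bytes/entry as a round number):

  300,000 entries/s x ~100 bytes/entry  ~=  30 MB/s of raw mutation data
  + write-ahead log writes (durability=flush), replicated by HDFS
  + minor-compaction flushes to RFiles, also replicated by HDFS
  + periodic major compactions re-reading and re-writing those files

Spread over three tservers with one 7200rpm SATA disk each (a disk that also serves the OS, swap, and other HDFS traffic), the per-disk write load can approach or exceed what a single spindle sustains under mixed I/O, so a 300K entries/s target is plausibly disk-bound on this hardware.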
Re: Write or Ingest bottleneck
I am so sorry that I did not provide more detailed information, and for the bad English that made it difficult to understand. My tservers can only reach 90,000 entries/s in very few cases. I expect my cluster's average ingest rate to reach 300,000 entries/s. I wanted to ask: "Do I need to expand my cluster to meet my high ingest rate needs?"

This time I list my hardware in as much detail as possible, together with the CPU and I/O load measured while inserting data. My cluster contains four computers; each CPU has 4 cores and 8 hardware threads. The network is an internal network with 1 Gb/s bandwidth. Each computer contains one SATA hard disk (capacity 1 TB, 7200 RPM, firmware CC43). The following CPU and disk I/O information was captured during an insert and should be representative of my cluster most of the time.

CPU information on the 4 computers (condensed from the original top output):

  Machine   load avg  CPU cores                                 Mem used/total  Swap used
  tServer1  0.88      all 8 cores >90% idle; one core 4.4% wa   23.1/23.3 GiB   2.4 GiB
  tServer2  0.84      mostly >92% idle, but Cpu1 at 100% us     31.1/31.3 GiB   0.1 GiB
  tServer3  0.50      Cpu0 30.4% us; others 88-93% idle,        31.0/31.2 GiB   4.5 GiB
                      up to 4.6% wa
  Master    0.47      87-99% idle                               29.3/31.3 GiB   0.2 GiB

I/O information on the 4 computers:

  tServer1: Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
            sda 0.00 0.00 289.0 [remainder truncated in the archive]
Re: Write or Ingest bottleneck
Each tabletserver is ingesting 90K entries/second? That sounds pretty darn good to me for a 3GB heap.

All Accumulo configuration should be consistent across *all* nodes running Accumulo processes. (A short shell sketch of where the different kinds of properties live follows the quoted message below.)

I don't believe you have shared the characteristics of your hardware yet, either. What are the available resources? How much memory, CPU, network, and I/O (number and types of disks)?

I also am not comprehending what you are asking with this question: "Is my cluster does not change the scale of the cluster, no way to do to upgrade it"

Let's approach this from a different angle: what rates are you *expecting* to see, and what gives you this expectation?

hujs wrote:
> Thank you. In my cluster, -Xmx on each machine is 2g. When I configure -Xmx = -Xms = 3g, performance is not improved. I set tserver.mutation.queue.max to 50m, 100m, or 150m, and performance does not improve much either. My table's table.durability property has always been flush. Should these properties be set on the master machine? My master is a dedicated machine. When I ingest my own records, each tserver node's ingest rate only reaches a bit over 90,000 entries/s; running ingest.sh, the rate is slightly faster. The ingest rate is really not very fast. Is my cluster does not change the scale of the cluster, no way to do to upgrade it? If so, how can I judge whether my cluster's performance cannot be improved further and the hardware needs to be expanded instead? What should I do, and do you have other, better suggestions?
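For what it's worth, a minimal shell sketch of how these properties are usually inspected and set (the table name is hypothetical; the values are just the ones discussed in this thread, and some tserver.* properties only take effect after a tserver restart):

  root@myinstance> config -t mytable -s table.durability=flush
  root@myinstance> config -t mytable -f durability
  root@myinstance> config -s tserver.mutation.queue.max=100M

Properties set this way are stored in ZooKeeper and apply cluster-wide, so they do not belong to any one machine; only accumulo-site.xml and accumulo-env.sh have to be kept in sync across nodes by hand.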
Re: Write or Ingest bottleneck
Thank you. In my cluster, -Xmx on each machine is 2g. When I configure -Xmx = -Xms = 3g, performance is not improved. I set tserver.mutation.queue.max to 50m, 100m, or 150m, and performance does not improve much either. My table's table.durability property has always been flush. Should these properties be set on the master machine? My master is a dedicated machine. When I ingest my own records, each tserver node's ingest rate only reaches a bit over 90,000 entries/s; running ingest.sh, the rate is slightly faster. The ingest rate is really not very fast. Is my cluster does not change the scale of the cluster, no way to do to upgrade it? If so, how can I judge whether my cluster's performance cannot be improved further and the hardware needs to be expanded instead? What should I do, and do you have other, better suggestions?
Re: Write or Ingest bottleneck
Ok. At least one tablet per tabletserver would be good. More than one tablet per tabletserver is not an issue; 5-10 per tserver would be OK.

Try increasing tserver.mutation.queue.max to something like 50M or 100M; 4M is very small. What is -Xmx in your accumulo-env.sh for ACCUMULO_TSERVER_OPTS? You can also try setting table.durability=flush in accumulo-site.xml. (A sketch of these overrides follows the quoted message below.)

hujs wrote:
> Sorry, the previous post was wrong; the network has been bad these days, and I have only just seen your reply. The figure can be seen in the "accumulo balance" post. I'm tuning the Accumulo ingest rate now, and I do not know what is limiting it. I have mainly tried the following:
> 1. Pre-split: using the custom split points 'j' and 'n', the table is divided into three tablets, one per tserver.
> 2. Adjusting table.file.max, tserver.compaction.minor.concurrent.max, and table.durability=flush.
> 3. Simulating multiple clients, sending data with multiple BatchWriters.
> 4. Enabling the native map.
> 5. Setting the memory settings in the tserver's accumulo-site.xml and accumulo-env.sh to 2g.
> However, the ingest rate did not improve. My cluster is far from fully loaded: CPU utilization is below 1/8, memory usage below 1/5, and I/O at 8-14% wa. Testing with start-ingest.sh, the average ingest rate is less than 180,000 entries/s. Each machine in my cluster has 8 CPUs and 32 GB Mem. I would like to know what affects the ingest rate and how to tune my cluster. I am using Accumulo 1.7.1.
> [accumulo-site.xml snipped; see the full message below]
> Thank you, dear Josh Elser.
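As a sketch, the overrides suggested above might look like this in accumulo-site.xml (the values are the ones from this thread, not tested recommendations, and tservers need a restart to pick them up):

  <property>
    <name>tserver.mutation.queue.max</name>
    <value>100M</value>
  </property>
  <property>
    <name>table.durability</name>
    <value>flush</value>
  </property>

and in accumulo-env.sh, something along the lines of (the 4g figure is illustrative):

  export ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx4g -Xms4g"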
Re: Write or Ingest bottleneck
Sorry, the previous post was wrong; the network has been bad these days, and I have only just seen your reply. The figure can be seen in the "accumulo balance" post. I'm tuning the Accumulo ingest rate now, and I do not know what is limiting it. I have mainly tried the following:

1. Pre-split: using the custom split points 'j' and 'n', the table is divided into three tablets, one per tserver.
2. Adjusting table.file.max, tserver.compaction.minor.concurrent.max, and table.durability=flush.
3. Simulating multiple clients, sending data with multiple BatchWriters.
4. Enabling the native map.
5. Setting the memory settings in the tserver's accumulo-site.xml and accumulo-env.sh to 2g.

However, the ingest rate did not improve. My cluster is far from fully loaded: CPU utilization is below 1/8, memory usage below 1/5, and I/O at 8-14% wa. Testing with start-ingest.sh, the average ingest rate is less than 180,000 entries/s. Each machine in my cluster has 8 CPUs and 32 GB Mem. I would like to know what affects the ingest rate and how to tune my cluster. I am using Accumulo 1.7.1. The accumulo-site.xml configuration is as follows (stock property descriptions omitted):

  instance.volumes = hdfs://master12:9000/accumulo
  instance.zookeeper.host = master12:2181,slave13:2181,slave10:2181,slave11:2181
  logger.dir.walog = walogs
  instance.secret = DEFAULT
  tserver.memory.maps.max = 2G
  tserver.memory.maps.native.enabled = true
  tserver.cache.data.size = 128M
  tserver.cache.index.size = 128M
  trace.token.property.password = 123456
  trace.user = root
  tserver.sort.buffer.size = 500M
  tserver.walog.max.size = 2G
  tserver.wal.blocksize = 2G
  tserver.mutation.queue.max = 4M
  tserver.compaction.major.concurrent.max = 8
  tserver.compaction.minor.concurrent.max = 8
  general.classpaths = $ACCUMULO_HOME/lib/accumulo-server.jar,
      $ACCUMULO_HOME/lib/accumulo-core.jar, $ACCUMULO_HOME/lib/accumulo-start.jar,
      $ACCUMULO_HOME/lib/accumulo-fate.jar, $ACCUMULO_HOME/lib/accumulo-proxy.jar,
      $ACCUMULO_HOME/lib/[^.].*.jar, $ZOOKEEPER_HOME/zookeeper[^.].*.jar,
      $HADOOP_CONF_DIR, $HADOOP_PREFIX/share/hadoop/common/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/common/lib/(?!slf4j)[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/hdfs/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/mapreduce/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/yarn/[^.].*.jar,
      $HADOOP_PREFIX/share/hadoop/yarn/lib/jersey.*.jar

Thank you, dear Josh Elser.
Re: Write or Ingest bottleneck
Without seeing your images:

Make sure that your split points actually divide up your data. For example, if you only write data whose rowId starts with the letters a-z, but you split the table on the numbers 0-9, only one tablet will receive the data. (A shell sketch follows below.)

The entry count for a table is an approximation; it does not account for data that is only resident in memory. If you issue a compaction (`compact -t <table> -w` in the Accumulo shell) after you have written all of the data, you will see a correct count in the Monitor. The Monitor is only for informational purposes -- it oftentimes shows approximations, not guaranteed-consistent results.

"pre-split table using the threshold points" <- I don't know what you mean by "threshold points".

Josh Elser wrote:
> Apache mailing lists regularly strip attachments. Can you please host the images elsewhere and provide links to them?
>
> hjs19890 wrote:
>> Hello everyone, I have some problems and need help. See Figure 1: I pre-split the hjs_v table into multiple tablets, but when I insert a large amount of data into the cluster, only one tserver ingests data while the other two are idle. I have done this operation many times, with the same result. See Figure 2: the tablets are unbalanced across the three tserver nodes, and the entry counts displayed for slave10 and slave11 are not correct. I would like to know what causes this kind of result and how to deal with it. Note: the table was pre-split using the threshold points. My cluster was balanced three days ago. My cluster is composed of one master and three tservers.
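For example, with your hjs_v table, the workflow in the shell might look like this (the split points here are hypothetical letters chosen to match a-z row data):

  root@myinstance> addsplits d h l p t -t hjs_v
  root@myinstance> getsplits -t hjs_v
  ... ingest ...
  root@myinstance> compact -t hjs_v -w

addsplits adds the split points, getsplits verifies them, and compact with -w blocks until the in-memory data has been flushed, after which the entry counts shown in the Monitor should be accurate.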
Re: Write or Ingest bottleneck
Apache mailing lists regularly strip attachments. Can you please host the images elsewhere and provide links to them?

hjs19890 wrote:
> Hello everyone, I have some problems and need help. See Figure 1: I pre-split the hjs_v table into multiple tablets, but when I insert a large amount of data into the cluster, only one tserver ingests data while the other two are idle. I have done this operation many times, with the same result. See Figure 2: the tablets are unbalanced across the three tserver nodes, and the entry counts displayed for slave10 and slave11 are not correct. I would like to know what causes this kind of result and how to deal with it. Note: the table was pre-split using the threshold points. My cluster was balanced three days ago. My cluster is composed of one master and three tservers.
Re: Write or Ingest bottleneck
Hi,

What do you mean by "accumulo's ingest rate affects accumulo's insertion performance"? Ingest *is* insertion into the database. Please describe what you mean by "insertion performance". Are you comparing some custom code you have written to the continuous ingest client?

60K entries/sec per tabletserver seems to be a reasonable ingest rate for the hardware you have described for continuous ingest. However, you should *definitely* tweak the default configuration. The provided configuration is meant to operate Accumulo in less than 3GB of resident memory; I would imagine that this is a bottleneck. A non-exhaustive list of things to check (a client-side sketch follows the quoted message below):

* Increase the TabletServer JVM heap size (4-8G)
* Enable the native maps [1]
* Increase tserver.total.mutation.queue.max=256M [2]
* Reduce table durability if your use case allows it [3]

[1] http://accumulo.apache.org/1.7/accumulo_user_manual.html#_native_map
[2] http://accumulo.apache.org/1.7/accumulo_user_manual.html#_tserver_total_mutation_queue_max
[3] http://accumulo.apache.org/blog/2016/11/02/durability-performance.html

hjs19890 wrote:
> Hi, I'm testing Accumulo's insertion performance and found that the ingest rate affects it. While searching for relevant information, I found that Accumulo ships with its own test suite, so I ran accumulo-1.7.1/test/system/continuous/start-ingest.sh to test my cluster's ingest rate. The test averaged an ingest rate of 180,000 entries/s, similar to the results of my own insertion-performance tests, although the two are not directly comparable. This result is not very satisfying. My cluster has 4 servers (1 master, 3 tservers); each computer is an i7-4700 (4 cores) with 32 GB Mem. The Accumulo version is 1.7.1, with the default configuration. I would therefore like to ask a few questions:
> 1. Where is the insert-performance bottleneck?
> 2. Has my cluster's ingest reached that bottleneck?
> 3. If not, how can I tune my cluster?
> Thanks,
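On the client side, the BatchWriter settings also matter. Here is a minimal, hedged sketch of the knobs I would check, using the 1.7 Java API (the instance name, credentials, and sizing values are illustrative assumptions, not recommendations):

  import java.util.concurrent.TimeUnit;
  import org.apache.accumulo.core.client.BatchWriter;
  import org.apache.accumulo.core.client.BatchWriterConfig;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.ZooKeeperInstance;
  import org.apache.accumulo.core.client.security.tokens.PasswordToken;
  import org.apache.accumulo.core.data.Mutation;
  import org.apache.accumulo.core.data.Value;

  public class IngestSketch {
    public static void main(String[] args) throws Exception {
      // Hypothetical instance and credentials -- substitute your own.
      Connector conn = new ZooKeeperInstance("myinstance", "master12:2181")
          .getConnector("root", new PasswordToken("secret"));

      BatchWriterConfig cfg = new BatchWriterConfig()
          .setMaxMemory(64 * 1024 * 1024)      // client-side mutation buffer
          .setMaxLatency(2, TimeUnit.MINUTES)  // let the buffer fill before a timed flush
          .setMaxWriteThreads(8);              // parallel sends to multiple tservers

      BatchWriter bw = conn.createBatchWriter("hjs_v", cfg);
      for (long i = 0; i < 1000000; i++) {
        // Zero-padded rows sort correctly as bytes (see the split-point discussion)
        Mutation m = new Mutation(String.format("%09d", i));
        m.put("cf", "cq", new Value(("value" + i).getBytes()));
        bw.addMutation(m);
      }
      bw.close();  // flushes any remaining buffered mutations
    }
  }

A single BatchWriter with few write threads can bottleneck the client even when the tservers are idle, which would be consistent with the low CPU and I/O utilization reported earlier in this thread.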