You could use YCSB with a custom workload (I don't see a predefined workload for 100% puts without reads):
https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads

HBase also has a utility for running some evaluations via MR or a thread-based client:

$ ./hbase org.apache.hadoop.hbase.PerformanceEvaluation
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \
  <OPTIONS> [-D<property=value>]* <command> <nclients>

Options:
 nomapred        Run multiple clients using threads (rather than use mapreduce)
 rows            Rows each client runs. Default: One million
 size            Total size in GiB. Mutually exclusive with --rows. Default: 1.0.
 sampleRate      Execute test on a sample of total rows. Only supported by randomRead. Default: 1.0
 traceRate       Enable HTrace spans. Initiate tracing every N rows. Default: 0
 table           Alternate table name. Default: 'TestTable'
 multiGet        If >0, when doing RandomRead, perform multiple gets instead of single gets. Default: 0
 compress        Compression type to use (GZ, LZO, ...). Default: 'NONE'
 flushCommits    Used to determine if the test should flush the table. Default: false
 writeToWAL      Set writeToWAL on puts. Default: True
 autoFlush       Set autoFlush on htable. Default: False
 oneCon          all the threads share the same connection. Default: False
 presplit        Create presplit table. Recommended for accurate perf analysis (see guide). Default: disabled
 inmemory        Tries to keep the HFiles of the CF inmemory as far as possible. Not guaranteed that reads are always served from memory. Default: false
 usetags         Writes tags along with KVs. Use with HFile V3. Default: false
 numoftags       Specify the no of tags that would be needed. This works only if usetags is true.
 filterAll       Helps to filter out all the rows on the server side there by not returning any thing back to the client. Helps to check the server side performance. Uses FilterAllFilter internally.
 latency         Set to report operation latencies. Default: False
 bloomFilter     Bloom filter type, one of [NONE, ROW, ROWCOL]
 valueSize       Pass value size to use: Default: 1024
 valueRandom     Set if we should vary value size between 0 and 'valueSize'; set on read for stats on size: Default: Not set.
 valueZipf       Set if we should vary value size between 0 and 'valueSize' in zipf form: Default: Not set.
 period          Report every 'period' rows: Default: opts.perClientRunRows / 10
 multiGet        Batch gets together into groups of N. Only supported by randomRead. Default: disabled
 addColumns      Adds columns to scans/gets explicitly. Default: true
 replicas        Enable region replica testing. Defaults: 1.
 splitPolicy     Specify a custom RegionSplitPolicy for the table.
 randomSleep     Do a random sleep before each get between 0 and entered value. Defaults: 0
 columns         Columns to write per row. Default: 1
 caching         Scan caching to use. Default: 30

 Note: -D properties will be applied to the conf used.
  For example:
   -Dmapreduce.output.fileoutputformat.compress=true
   -Dmapreduce.task.timeout=60000

Command:
 filterScan      Run scan test using a filter to find a specific row based on it's value (make sure to use --rows=20)
 randomRead      Run random read test
 randomSeekScan  Run random seek and scan 100 test
 randomWrite     Run random write test
 scan            Run scan test (read every row)
 scanRange10     Run random seek scan with both start and stop row (max 10 rows)
 scanRange100    Run random seek scan with both start and stop row (max 100 rows)
 scanRange1000   Run random seek scan with both start and stop row (max 1000 rows)
 scanRange10000  Run random seek scan with both start and stop row (max 10000 rows)
 sequentialRead  Run sequential read test
 sequentialWrite Run sequential write test

Args:
 nclients        Integer. Required. Total number of clients (and HRegionServers) running: 1 <= value <= 500

Examples:
 To run a single evaluation client:
 $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1

--
Iain Wright

On Tue, Dec 22, 2015 at 11:12 AM, Kumiko Yada <[email protected]> wrote:

> Forgot to add that I don't want to use bulk insert for this test.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada [mailto:[email protected]]
> Sent: Tuesday, December 22, 2015 11:01 AM
> To: [email protected]
> Subject: Put performance test
>
> Hello,
>
> I wrote a Python script with the happybase library to run a put
> performance test; however, the library crashes when more than 900000
> rows are put. I'd like to run put tests with 1/10/100 million rows.
> Is there any tool that I can use for this?
>
> Thanks
> Kumiko
>
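
For the 1/10/100 million row put tests asked about in the quoted message, the PerformanceEvaluation invocation could be wrapped roughly as follows. This is only a sketch: it uses the nomapred, rows and table options and the sequentialWrite command from the usage text, while the HBASE_BIN path, the table name 'PutPerfTest' and the DRY_RUN switch are assumptions made for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: build PerformanceEvaluation command lines for 1M, 10M and 100M rows.
# Assumptions (not from the thread): the HBASE_BIN location, the table name
# 'PutPerfTest', and DRY_RUN defaulting to 1 so nothing runs by accident.
HBASE_BIN="${HBASE_BIN:-bin/hbase}"
DRY_RUN="${DRY_RUN:-1}"
PE_CLASS="org.apache.hadoop.hbase.PerformanceEvaluation"

# Print the command line for one run with the given per-client row count.
# --nomapred uses a thread-based client instead of a MapReduce job.
pe_command() {
  rows="$1"
  echo "$HBASE_BIN $PE_CLASS --nomapred --rows=$rows --table=PutPerfTest sequentialWrite 1"
}

for rows in 1000000 10000000 100000000; do
  cmd="$(pe_command "$rows")"
  if [ "$DRY_RUN" = "1" ]; then
    # Dry run: show what would be executed.
    echo "$cmd"
  else
    # Real run: relies on word splitting of the unquoted command string.
    $cmd
  fi
done
```

With a single client, rows is the total number of rows written. Setting DRY_RUN=0 runs the three tests one after another against the cluster that the hbase script is configured for; swapping sequentialWrite for randomWrite gives the random-key variant.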

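
A write-only YCSB workload, as suggested at the top of this reply, might look something like the sketch below. The property names are the CoreWorkload ones from the YCSB wiki; the file name, the record counts, the 'hbase10' binding name, the 'family' column family and the conf path are assumptions and depend on the YCSB and HBase versions in use. Older YCSB releases use the package com.yahoo.ycsb.workloads, newer ones site.ycsb.workloads. The script only writes the file and prints a suggested command.

```shell
#!/bin/sh
# Sketch: write an insert-only YCSB workload file and print a load command.
# Assumptions: file name, counts, binding name 'hbase10', column family
# 'family', and the CoreWorkload package (it differs between YCSB releases).
WORKLOAD_FILE="${WORKLOAD_FILE:-workload_putonly}"

# All operation proportions except inserts are set to zero.
cat > "$WORKLOAD_FILE" <<'EOF'
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=1000000
operationcount=1000000
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=1.0
EOF

# Print (do not run) a load command; the load phase inserts recordcount rows.
echo "bin/ycsb load hbase10 -P $WORKLOAD_FILE -cp /path/to/hbase/conf -p table=usertable -p columnfamily=family"
```

Note that the YCSB load phase is insert-only by itself, so raising recordcount to 10 or 100 million and timing the load may already be enough; the proportions above matter for the run phase.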