Kumiko: You can define your own YCSB workload by specifying the readproportion and scanproportion you want.
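For example, a write-only workload could look roughly like the sketch below. The property names come from the standard YCSB CoreWorkload; the file name, row counts, and the HBase binding/table settings in the load command are illustrative and will need adjusting for your setup (the load phase itself is already 100% inserts).

  # workloads/pure-insert (hypothetical file name)
  workload=com.yahoo.ycsb.workloads.CoreWorkload
  recordcount=1000000
  operationcount=1000000
  readproportion=0
  updateproportion=0
  scanproportion=0
  insertproportion=1.0

  # Load 1 million rows into HBase (binding name depends on your YCSB/HBase versions)
  $ bin/ycsb load hbase10 -P workloads/pure-insert \
      -p table=usertable -p columnfamily=cf -s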
FYI

On Tue, Dec 22, 2015 at 11:39 AM, iain wright <[email protected]> wrote:
> You could use YCSB and a custom workload (I don't see a predefined
> workload for 100% puts without reads):
>
> https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads
>
> HBase also has a utility for running some evaluations via MR or a
> thread-based client:
>
> $ ./hbase org.apache.hadoop.hbase.PerformanceEvaluation
> Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \
>   <OPTIONS> [-D<property=value>]* <command> <nclients>
>
> Options:
>  nomapred        Run multiple clients using threads (rather than use mapreduce)
>  rows            Rows each client runs. Default: One million
>  size            Total size in GiB. Mutually exclusive with --rows. Default: 1.0.
>  sampleRate      Execute test on a sample of total rows. Only supported by randomRead. Default: 1.0
>  traceRate       Enable HTrace spans. Initiate tracing every N rows. Default: 0
>  table           Alternate table name. Default: 'TestTable'
>  multiGet        If >0, when doing RandomRead, perform multiple gets instead of single gets. Default: 0
>  compress        Compression type to use (GZ, LZO, ...). Default: 'NONE'
>  flushCommits    Used to determine if the test should flush the table. Default: false
>  writeToWAL      Set writeToWAL on puts. Default: True
>  autoFlush       Set autoFlush on htable. Default: False
>  oneCon          All the threads share the same connection. Default: False
>  presplit        Create presplit table. Recommended for accurate perf analysis (see guide). Default: disabled
>  inmemory        Tries to keep the HFiles of the CF in memory as far as possible. Not guaranteed that reads are always served from memory. Default: false
>  usetags         Writes tags along with KVs. Use with HFile V3. Default: false
>  numoftags       Specify the number of tags that would be needed. This works only if usetags is true.
>  filterAll       Helps to filter out all the rows on the server side, thereby not returning anything back to the client. Helps to check the server-side performance. Uses FilterAllFilter internally.
>  latency         Set to report operation latencies. Default: False
>  bloomFilter     Bloom filter type, one of [NONE, ROW, ROWCOL]
>  valueSize       Pass value size to use: Default: 1024
>  valueRandom     Set if we should vary value size between 0 and 'valueSize'; set on read for stats on size: Default: Not set.
>  valueZipf       Set if we should vary value size between 0 and 'valueSize' in zipf form: Default: Not set.
>  period          Report every 'period' rows: Default: opts.perClientRunRows / 10
>  multiGet        Batch gets together into groups of N. Only supported by randomRead. Default: disabled
>  addColumns      Adds columns to scans/gets explicitly. Default: true
>  replicas        Enable region replica testing. Defaults: 1.
>  splitPolicy     Specify a custom RegionSplitPolicy for the table.
>  randomSleep     Do a random sleep before each get between 0 and entered value. Defaults: 0
>  columns         Columns to write per row. Default: 1
>  caching         Scan caching to use. Default: 30
>
> Note: -D properties will be applied to the conf used.
> For example:
>  -Dmapreduce.output.fileoutputformat.compress=true
>  -Dmapreduce.task.timeout=60000
>
> Command:
>  filterScan      Run scan test using a filter to find a specific row based on its value (make sure to use --rows=20)
>  randomRead      Run random read test
>  randomSeekScan  Run random seek and scan 100 test
>  randomWrite     Run random write test
>  scan            Run scan test (read every row)
>  scanRange10     Run random seek scan with both start and stop row (max 10 rows)
>  scanRange100    Run random seek scan with both start and stop row (max 100 rows)
>  scanRange1000   Run random seek scan with both start and stop row (max 1000 rows)
>  scanRange10000  Run random seek scan with both start and stop row (max 10000 rows)
>  sequentialRead  Run sequential read test
>  sequentialWrite Run sequential write test
>
> Args:
>  nclients        Integer. Required. Total number of clients (and HRegionServers) running: 1 <= value <= 500
>
> Examples:
>  To run a single evaluation client:
>  $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
>
> --
> Iain Wright
>
> This email message is confidential, intended only for the recipient(s) named above and may contain information that is privileged, exempt from disclosure under applicable law. If you are not the intended recipient, do not disclose or disseminate the message to anyone except the intended recipient. If you have received this message in error, or are not the named recipient(s), please immediately notify the sender by return email, and delete all copies of this message.
>
> On Tue, Dec 22, 2015 at 11:12 AM, Kumiko Yada <[email protected]> wrote:
> >
> > Forgot to add that I don't want to use bulk insert for this test.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:[email protected]]
> > Sent: Tuesday, December 22, 2015 11:01 AM
> > To: [email protected]
> > Subject: Put performance test
> >
> > Hello,
> >
> > I wrote a Python script with the happybase library to do the put performance test; however, the library crashes when more than 900,000 rows are put. I'd like to do 1/10/100 million row put tests. Is there any tool that I can use for this?
> >
> > Thanks
> > Kumiko
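For a put-only test at the row counts mentioned above (1/10/100 million), a PerformanceEvaluation invocation could be sketched as follows, using only options from the usage listing quoted above; the client count, presplit count, and value size are illustrative:

  # 10 client threads x 1,000,000 rows each = 10 million puts
  $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation \
      --nomapred --presplit=10 --rows=1000000 --valueSize=1024 \
      --autoFlush=true randomWrite 10

Note that --rows is per client, so the total number of puts scales with nclients.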
