Re: Put performance test

iain wright Tue, 22 Dec 2015 11:41:09 -0800

You could use YCSB with a custom workload (I don't see a predefined workload
for 100% puts without reads)

https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads
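
As a rough sketch of what such a custom workload might look like, the script
below writes an insert-only CoreWorkload properties file. The file name
workload_puts and the record counts are assumptions chosen for illustration,
and the workload class name is the one used by YCSB releases of this period
(newer releases may use a different package), so check it against the YCSB
version installed.

```shell
#!/bin/sh
# Hypothetical file name; put it wherever the YCSB workloads are kept.
WORKLOAD_FILE=./workload_puts

# Write an insert-only workload: no reads, updates, or scans.
cat > "$WORKLOAD_FILE" <<'EOF'
# Assumed class name for YCSB releases of this era; verify for your version.
workload=com.yahoo.ycsb.workloads.CoreWorkload
# Illustrative sizes only: rows inserted by the load phase / ops in the run phase.
recordcount=1000000
operationcount=1000000
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=1
EOF

echo "wrote $WORKLOAD_FILE"
```

Note that the YCSB load phase is itself nothing but inserts of recordcount
rows, so for a pure put test the load phase alone may already be enough.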

HBase also ships a utility, PerformanceEvaluation, for running evaluations
via MapReduce or a thread-based client:

$ ./hbase org.apache.hadoop.hbase.PerformanceEvaluation
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \
  <OPTIONS> [-D<property=value>]* <command> <nclients>

Options:
 nomapred        Run multiple clients using threads (rather than use
                 mapreduce)
 rows            Rows each client runs. Default: One million
 size            Total size in GiB. Mutually exclusive with --rows.
                 Default: 1.0.
 sampleRate      Execute test on a sample of total rows. Only supported by
                 randomRead. Default: 1.0
 traceRate       Enable HTrace spans. Initiate tracing every N rows.
                 Default: 0
 table           Alternate table name. Default: 'TestTable'
 multiGet        If >0, when doing RandomRead, perform multiple gets
                 instead of single gets. Default: 0
 compress        Compression type to use (GZ, LZO, ...). Default: 'NONE'
 flushCommits    Used to determine if the test should flush the table.
                 Default: false
 writeToWAL      Set writeToWAL on puts. Default: True
 autoFlush       Set autoFlush on htable. Default: False
 oneCon          all the threads share the same connection. Default: False
 presplit        Create presplit table. Recommended for accurate perf
                 analysis (see guide).  Default: disabled
 inmemory        Tries to keep the HFiles of the CF inmemory as far as
                 possible. Not guaranteed that reads are always served from
                 memory. Default: false
 usetags         Writes tags along with KVs. Use with HFile V3. Default:
                 false
 numoftags       Specify the no of tags that would be needed. This works
                 only if usetags is true.
 filterAll       Helps to filter out all the rows on the server side there
                 by not returning any thing back to the client.  Helps to
                 check the server side performance.  Uses FilterAllFilter
                 internally.
 latency         Set to report operation latencies. Default: False
 bloomFilter     Bloom filter type, one of [NONE, ROW, ROWCOL]
 valueSize       Pass value size to use: Default: 1024
 valueRandom     Set if we should vary value size between 0 and
                 'valueSize'; set on read for stats on size: Default: Not
                 set.
 valueZipf       Set if we should vary value size between 0 and 'valueSize'
                 in zipf form: Default: Not set.
 period          Report every 'period' rows: Default:
                 opts.perClientRunRows / 10
 multiGet        Batch gets together into groups of N. Only supported by
                 randomRead. Default: disabled
 addColumns      Adds columns to scans/gets explicitly. Default: true
 replicas        Enable region replica testing. Defaults: 1.
 splitPolicy     Specify a custom RegionSplitPolicy for the table.
 randomSleep     Do a random sleep before each get between 0 and entered
                 value. Defaults: 0
 columns         Columns to write per row. Default: 1
 caching         Scan caching to use. Default: 30

 Note: -D properties will be applied to the conf used.
  For example:
   -Dmapreduce.output.fileoutputformat.compress=true
   -Dmapreduce.task.timeout=60000

Command:
 filterScan      Run scan test using a filter to find a specific row based
                 on it's value (make sure to use --rows=20)
 randomRead      Run random read test
 randomSeekScan  Run random seek and scan 100 test
 randomWrite     Run random write test
 scan            Run scan test (read every row)
 scanRange10     Run random seek scan with both start and stop row (max 10
                 rows)
 scanRange100    Run random seek scan with both start and stop row (max 100
                 rows)
 scanRange1000   Run random seek scan with both start and stop row (max
                 1000 rows)
 scanRange10000  Run random seek scan with both start and stop row (max
                 10000 rows)
 sequentialRead  Run sequential read test
 sequentialWrite Run sequential write test

Args:
 nclients        Integer. Required. Total number of clients (and
                 HRegionServers) running: 1 <= value <= 500

Examples:
 To run a single evaluation client:
 $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
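
Since "rows" is counted per client, a target total has to be divided by the
number of clients. The sketch below only builds and prints the command for a
10 million row write test with 10 threaded clients; it does not run it. The
totals are assumptions for illustration, and the "--" option prefix follows
the --rows spelling used in the usage text above, so confirm the exact syntax
against the usage output of the HBase version installed.

```shell
#!/bin/sh
# Illustrative targets (assumptions): 10 million rows over 10 client threads.
TOTAL_ROWS=10000000
NCLIENTS=10

# "rows" is per client, so split the total evenly across the clients.
ROWS_PER_CLIENT=$((TOTAL_ROWS / NCLIENTS))

# Options go before the command, and nclients comes last, per the usage text.
# nomapred runs the clients as threads instead of a MapReduce job.
PE_CMD="bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=${ROWS_PER_CLIENT} sequentialWrite ${NCLIENTS}"

# Dry run: print the command so it can be reviewed before being executed.
echo "$PE_CMD"
```

Swapping sequentialWrite for randomWrite gives the random write variant, and
changing TOTAL_ROWS covers the 1 and 100 million row cases.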

-- 
Iain Wright


On Tue, Dec 22, 2015 at 11:12 AM, Kumiko Yada <[email protected]> wrote:

> I forgot to add that I don't want to use bulk insert for this test.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada [mailto:[email protected]]
> Sent: Tuesday, December 22, 2015 11:01 AM
> To: [email protected]
> Subject: Put performance test
>
> Hello,
>
> I wrote a Python script with the happybase library to run a put
> performance test; however, the library crashes once more than 900000 rows
> are put.  I'd like to run put tests with 1, 10, and 100 million rows.  Is
> there a tool that I can use for this?
>
> Thanks
> Kumiko
>
