Kumiko: You can define your own YCSB workload by specifying the readproportion and scanproportion you want.
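For example, a write-only workload could look roughly like the sketch below. The property names come from the standard YCSB CoreWorkload; the file name, row counts, and the HBase binding/table settings in the load command are illustrative and will need adjusting for your setup (the load phase itself is already 100% inserts).

  # workloads/pure-insert (hypothetical file name)
  workload=com.yahoo.ycsb.workloads.CoreWorkload
  recordcount=1000000
  operationcount=1000000
  readproportion=0
  updateproportion=0
  scanproportion=0
  insertproportion=1.0

  # Load 1 million rows into HBase (binding name depends on your YCSB/HBase versions)
  $ bin/ycsb load hbase10 -P workloads/pure-insert \
      -p table=usertable -p columnfamily=cf -s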
FYI

On Tue, Dec 22, 2015 at 11:39 AM, iain wright <[email protected]> wrote:
> You could use YCSB and a custom workload (I don't see a predefined
> workload for 100% puts without reads):
>
> https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads
>
> HBase also has a utility for running some evaluations via MR or a
> thread-based client:
>
> $ ./hbase org.apache.hadoop.hbase.PerformanceEvaluation
> Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \
>   <OPTIONS> [-D<property=value>]* <command> <nclients>
>
> Options:
>  nomapred        Run multiple clients using threads (rather than use mapreduce)
>  rows            Rows each client runs. Default: One million
>  size            Total size in GiB. Mutually exclusive with --rows. Default: 1.0.
>  sampleRate      Execute test on a sample of total rows. Only supported by randomRead. Default: 1.0
>  traceRate       Enable HTrace spans. Initiate tracing every N rows. Default: 0
>  table           Alternate table name. Default: 'TestTable'
>  multiGet        If >0, when doing RandomRead, perform multiple gets instead of single gets. Default: 0
>  compress        Compression type to use (GZ, LZO, ...). Default: 'NONE'
>  flushCommits    Used to determine if the test should flush the table. Default: false
>  writeToWAL      Set writeToWAL on puts. Default: True
>  autoFlush       Set autoFlush on htable. Default: False
>  oneCon          All the threads share the same connection. Default: False
>  presplit        Create presplit table. Recommended for accurate perf analysis (see guide). Default: disabled
>  inmemory        Tries to keep the HFiles of the CF in memory as far as possible. Not guaranteed that reads are always served from memory. Default: false
>  usetags         Writes tags along with KVs. Use with HFile V3. Default: false
>  numoftags       Specify the number of tags that would be needed. This works only if usetags is true.
>  filterAll       Helps to filter out all the rows on the server side, thereby not returning anything back to the client. Helps to check the server-side performance. Uses FilterAllFilter internally.
>  latency         Set to report operation latencies. Default: False
>  bloomFilter     Bloom filter type, one of [NONE, ROW, ROWCOL]
>  valueSize       Pass value size to use: Default: 1024
>  valueRandom     Set if we should vary value size between 0 and 'valueSize'; set on read for stats on size: Default: Not set.
>  valueZipf       Set if we should vary value size between 0 and 'valueSize' in zipf form: Default: Not set.
>  period          Report every 'period' rows: Default: opts.perClientRunRows / 10
>  multiGet        Batch gets together into groups of N. Only supported by randomRead. Default: disabled
>  addColumns      Adds columns to scans/gets explicitly. Default: true
>  replicas        Enable region replica testing. Defaults: 1.
>  splitPolicy     Specify a custom RegionSplitPolicy for the table.
>  randomSleep     Do a random sleep before each get between 0 and entered value. Defaults: 0
>  columns         Columns to write per row. Default: 1
>  caching         Scan caching to use. Default: 30
>
> Note: -D properties will be applied to the conf used.
> For example:
>  -Dmapreduce.output.fileoutputformat.compress=true
>  -Dmapreduce.task.timeout=60000
>
> Command:
>  filterScan      Run scan test using a filter to find a specific row based on its value (make sure to use --rows=20)
>  randomRead      Run random read test
>  randomSeekScan  Run random seek and scan 100 test
>  randomWrite     Run random write test
>  scan            Run scan test (read every row)
>  scanRange10     Run random seek scan with both start and stop row (max 10 rows)
>  scanRange100    Run random seek scan with both start and stop row (max 100 rows)
>  scanRange1000   Run random seek scan with both start and stop row (max 1000 rows)
>  scanRange10000  Run random seek scan with both start and stop row (max 10000 rows)
>  sequentialRead  Run sequential read test
>  sequentialWrite Run sequential write test
>
> Args:
>  nclients        Integer. Required. Total number of clients (and HRegionServers) running: 1 <= value <= 500
>
> Examples:
>  To run a single evaluation client:
>  $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
>
> --
> Iain Wright
>
> This email message is confidential, intended only for the recipient(s) named above and may contain information that is privileged, exempt from disclosure under applicable law. If you are not the intended recipient, do not disclose or disseminate the message to anyone except the intended recipient. If you have received this message in error, or are not the named recipient(s), please immediately notify the sender by return email, and delete all copies of this message.
>
> On Tue, Dec 22, 2015 at 11:12 AM, Kumiko Yada <[email protected]> wrote:
> >
> > Forgot to add that I don't want to use bulk insert for this test.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:[email protected]]
> > Sent: Tuesday, December 22, 2015 11:01 AM
> > To: [email protected]
> > Subject: Put performance test
> >
> > Hello,
> >
> > I wrote a Python script with the happybase library to do the put performance test; however, the library crashes when more than 900,000 rows are put. I'd like to do 1/10/100 million row put tests. Is there any tool that I can use for this?
> >
> > Thanks
> > Kumiko
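For a put-only test at the row counts mentioned above (1/10/100 million), a PerformanceEvaluation invocation could be sketched as follows, using only options from the usage listing quoted above; the client count, presplit count, and value size are illustrative:

  # 10 client threads x 1,000,000 rows each = 10 million puts
  $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation \
      --nomapred --presplit=10 --rows=1000000 --valueSize=1024 \
      --autoFlush=true randomWrite 10

Note that --rows is per client, so the total number of puts scales with nclients.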
