Answering your first question: running tpcc for 1 minute, on a database
with 64 warehouses (6-7 GB), with a buffer pool of 128 MB (around 1.8% of
the database size) and a hit ratio of ~91%, I get a throughput of 45-50
transactions per second.
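
The ~91% hit ratio can be derived from the statistics collector. A minimal
sketch, assuming the counters were reset right before the run and that the
database is named tpcc:

  -- hit ratio = blks_hit / (blks_hit + blks_read) since the last reset
  SELECT blks_hit::float / nullif(blks_hit + blks_read, 0) AS hit_ratio
  FROM pg_stat_database
  WHERE datname = 'tpcc';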

I ran some experiments and gathered the following information about my tpcc
database and benchmark. The database is created with 64 warehouses.

   Table    |                Index                 | Data Size | Index Size |  Total
------------+--------------------------------------+-----------+------------+---------
 stock      | stock_pkey                           | 2209 MB   | 263 MB     | 2472 MB
 order_line | order_line_pkey                      | 2041 MB   | 678 MB     | 2719 MB
 customer   | idx_customer_name                    | 1216 MB   | 146 MB     | 1420 MB
 customer   | customer_pkey                        | 1216 MB   | 58 MB      | 1420 MB
 history    |                                      | 164 MB    |            | 164 MB
 oorder     | oorder_pkey                          | 134 MB    | 68 MB      | 362 MB
 oorder     | idx_order                            | 134 MB    | 80 MB      | 362 MB
 oorder     | oorder_o_w_id_o_d_id_o_c_id_o_id_key | 134 MB    | 80 MB      | 362 MB
 new_order  | new_order_pkey                       | 27 MB     | 17 MB      | 45 MB
 item       | item_pkey                            | 10168 kB  | 2208 kB    | 12 MB
 district   | district_pkey                        | 776 kB    | 72 kB      | 880 kB
 warehouse  | warehouse_pkey                       | 384 kB    | 16 kB      | 432 kB
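
For reference, a listing like the one above can be produced with a catalog
query along these lines (a sketch; it assumes the tpcc tables live in the
public schema):

  SELECT c.relname AS "Table",
         i.relname AS "Index",
         pg_size_pretty(pg_relation_size(c.oid)) AS "Data Size",
         pg_size_pretty(pg_relation_size(i.oid)) AS "Index Size",
         pg_size_pretty(pg_total_relation_size(c.oid)) AS "Total"
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  LEFT JOIN pg_index x ON x.indrelid = c.oid
  LEFT JOIN pg_class i ON i.oid = x.indexrelid
  WHERE c.relkind = 'r' AND n.nspname = 'public'
  ORDER BY pg_total_relation_size(c.oid) DESC;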

Executing the tpcc benchmark for 1 minute generates about 2.9 million buffer
requests. These requests are distributed over the relations and indexes as
follows (in descending order):

customer                                1383399
stock_pkey                               442600
stock                                    321370
order_line                               255314
order_line_pkey                          156132
oorder                                    58665
oorder_pkey                               57895
customer_pkey                             44471
new_order_pkey                            39552
idx_customer_name                         28286
new_order                                 25861
item_pkey                                 11702
item                                      11606
district                                  11389
district_pkey                              7575
warehouse                                  5276
idx_order                                  4072
oorder_o_w_id_o_d_id_o_c_id_o_id_key       2410
warehouse_pkey                             1998
history                                    1958
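
These counts come from the buffer trace I collected, but the statistics
collector gives comparable per-relation numbers. A sketch, assuming
pg_stat_reset() is called right before the run:

  -- buffer requests = cache hits + reads, per table and per index
  SELECT relname, heap_blks_hit + heap_blks_read AS buffer_requests
  FROM pg_statio_user_tables
  ORDER BY buffer_requests DESC;

  SELECT indexrelname, idx_blks_hit + idx_blks_read AS buffer_requests
  FROM pg_statio_user_indexes
  ORDER BY buffer_requests DESC;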

All this information seems normal to me. However, out of the 2.9 million
buffer requests over a database of ~800k pages (6-7 GB at 8 KB per page),
only ~150k distinct pages are requested. This behavior could be explained by
the benchmark accessing only a small subset of the 64 warehouses instead of
spreading its accesses uniformly over all 64. In other words, I think the
execution time of the benchmark is irrelevant, assuming that the
transactions access warehouses according to a uniform distribution.
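
Following Jeff's suggestion below, the per-relation distinct-block counts
can be double-checked by loading the trace into a scratch table. A sketch,
under the assumption that the trace is a CSV with one (relation name, block
number) row per buffer request:

  CREATE TABLE buffer_trace (relname text, blkno bigint);
  \copy buffer_trace FROM 'trace.csv' WITH (FORMAT csv)

  -- requests vs. distinct blocks touched, per relation
  SELECT relname,
         count(*)              AS requests,
         count(DISTINCT blkno) AS distinct_blocks
  FROM buffer_trace
  GROUP BY relname
  ORDER BY requests DESC;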

On Wed, Oct 15, 2014 at 7:41 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Wed, Oct 15, 2014 at 6:22 AM, Lucas Lersch <lucasler...@gmail.com>
> wrote:
>
>> So is it a possible normal behavior that running tpcc for 10min only
>> access 50% of the database? Furthermore, is there a guideline of parameters
>> for tpcc (# of warehouses, execution time, operations weight)?
>>
>>
> I'm not familiar with your benchmarking tool.  With the one I am most
> familiar with, pgbench, if you run it against a database which is too big
> to fit in memory, it can take a very long time to touch each page once,
> because the constant random disk reads make it run very slowly.  Maybe
> that is something to consider here--how many transactions were actually
> executed during your 10 min run?
>
> Also, the tool might build tables that are only used under certain run
> options.  Perhaps you just aren't choosing the options which invoke usage
> of those tables.  Since you have the trace data, it should be pretty easy
> to count how many distinct blocks are accessed from each relation, and
> compare that to the size of the relations to see which relations are unused
> or lightly used.
>
> Cheers,
>
> Jeff
>



-- 
Lucas Lersch
