Hi
I would also go for passing the option on the TestDFSIO command line (your option 2).
Once your write test is over, you can check how many replicas were
created for each file with
hdfs fsck <path> -files -blocks
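If you want a per-replica-count summary instead of scanning the block list by eye, the fsck output can be piped through a small helper. This is a sketch: `summarize_repl` is a hypothetical name, and it assumes each block line carries a `repl=N` field, as Hadoop 2.x fsck prints it (the exact wording may differ in other versions).

```bash
#!/usr/bin/env bash
# summarize_repl (hypothetical helper): reads the output of
# `hdfs fsck <path> -files -blocks` on stdin and prints how many
# blocks were reported with each replica count.
summarize_repl() {
  # Extract the number from every "repl=N" field (one per block line),
  # then count how many blocks share each value.
  grep -o 'repl=[0-9][0-9]*' |
    cut -d= -f2 |
    sort -n |
    uniq -c |
    awk '{ print "repl=" $2 ": " $1 " block(s)" }'
}
```

Usage, assuming TestDFSIO's default data directory `/benchmarks/TestDFSIO/io_data` (it moves if `test.build.data` is overridden): `hdfs fsck /benchmarks/TestDFSIO/io_data -files -blocks | summarize_repl`. Keep in mind that `repl=` counts live replicas, so it can be lower than the factor you asked for if the cluster has fewer DataNodes. Also, the "Default replication factor" line in the fsck summary should show the cluster default, not the value TestDFSIO used.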
Ulul
On 07/10/2014 09:27, Bart Vandewoestyne wrote:
Hello list,
I would like to experiment with TestDFSIO and run some benchmarks
under different configuration settings. For example, I would like to see
how the block replication factor (dfs.replication) influences the
TestDFSIO results.
I'm using the following version of Hadoop and CDH:
bart@sandy-quad-1:~$ hadoop version
Hadoop 2.3.0-cdh5.1.2
Subversion git://github.sf.cloudera.com/CDH/cdh.git -r 8e266e052e423af592871e2dfe09d54c03f6a0e8
Compiled by jenkins on 2014-08-26T01:36Z
Compiled with protoc 2.5.0
From source with checksum ec11b8ec19ca2bf3e7cb1bbe4ee182
This command was run using /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop/hadoop-common-2.3.0-cdh5.1.2.jar
My main question is how to easily change the replication factor for
each run of TestDFSIO. I see two options:
1) Change the dfs.replication configuration value in Cloudera
Manager, restart my cluster, and re-run TestDFSIO.
2) Pass a different dfs.replication value on the TestDFSIO command
line. On
http://grokbase.com/t/cloudera/cdh-user/131zfsvves/testdfsio-slow-with-replication-1
I see that people run the TestDFSIO benchmark with the '-D
dfs.replication=1' option. Is this the better way to go?
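Option 2 also makes a sweep over several replication factors easy to script. A minimal sketch, assuming a CDH 5 parcel layout for the tests jar (set `JAR` yourself if yours lives elsewhere) and an arbitrary 4 files of 128 MB each; `run`, `run_sweep` and `DRY_RUN` are made up for this sketch. The `-D` generic option is placed before TestDFSIO's own options, because Hadoop's generic option parser only picks it up at the front of the argument list.

```bash
#!/usr/bin/env bash
# Sketch: one TestDFSIO write + read pass per replication factor,
# passing dfs.replication as a generic -D option.

# ASSUMPTION: CDH 5 parcel layout; override JAR if the tests jar lives elsewhere.
JAR=${JAR:-$(echo /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar)}

# Run a command, or only print it when DRY_RUN=1 (convention of this sketch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# run_sweep FACTOR... (hypothetical helper)
run_sweep() {
  local r
  for r in "$@"; do
    # Generic options such as -D must come before TestDFSIO's own options.
    run hadoop jar "$JAR" TestDFSIO -D dfs.replication="$r" -write -nrFiles 4 -fileSize 128
    run hadoop jar "$JAR" TestDFSIO -read -nrFiles 4 -fileSize 128
    # Inspect the written files before -clean removes the benchmark directory.
    run hdfs fsck /benchmarks/TestDFSIO/io_data -files -blocks
    run hadoop jar "$JAR" TestDFSIO -clean
  done
}
```

After sourcing the script, `DRY_RUN=1 run_sweep 1 2 3` prints the commands for factors 1, 2 and 3 without running them; drop `DRY_RUN=1` to run them for real. Cleaning between passes keeps one factor's files from being measured under another.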
Method 1 seems cumbersome. Method 2 runs without errors on my cluster,
but how can I check whether TestDFSIO actually used the replication
factor I specified with the -D option?
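Besides fsck, you can read the replication factor recorded for each file: `hdfs dfs -ls` shows it in the second column, and `hdfs dfs -stat %r <file>` prints just that number. This is the factor the client requested when it created the file, so it reflects the `-D dfs.replication` value directly, whereas fsck's `repl=` counts replicas that actually exist. A small sketch of such a check, where `files_not_at_repl` is a hypothetical helper and `/benchmarks/TestDFSIO/io_data` is assumed to be the default TestDFSIO data directory:

```bash
#!/usr/bin/env bash
# files_not_at_repl EXPECTED (hypothetical helper): reads `hdfs dfs -ls`
# output on stdin and prints the path of every file whose replication
# column differs from EXPECTED.
files_not_at_repl() {
  local expected=$1
  # Only lines whose permission string starts with "-" are files; this skips
  # the "Found N items" header and directories (which show "-" as replication).
  awk -v want="$expected" '$1 ~ /^-/ && $2 != want { print $NF }'
}
```

For example, `hdfs dfs -ls /benchmarks/TestDFSIO/io_data | files_not_at_repl 2` prints nothing if every data file was written with replication 2. Run it before `TestDFSIO -clean`, which deletes the benchmark directory.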
Kind regards,
Bart