While we're on the subject: you can also use the BigPetStore application in Apache Bigtop to do this. Its data is well suited for HBase (semi-structured, transactional, and it features some global patterns that can make for meaningful queries, and so on).
Clone apache/bigtop, then:

    cd bigtop-bigpetstore
    gradle clean package   # build the jar

Then follow the instructions in the README to generate as many records as you want in a distributed context. Each record is around 80 bytes, so about 10^10 records should be on the scale you are looking for.

> On Sep 22, 2014, at 5:14 AM, "[email protected]" <[email protected]> wrote:
>
> Hi,
>
> I need to generate a large amount of test data (4TB) into Hadoop, has anyone
> used PDGF to do so? Could you share your cook book about PDGF in Hadoop (or
> HBase)?
>
> Many Thanks
> Arthur
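As a quick back-of-envelope sketch of the record-count estimate above (assuming roughly 80 bytes per record and a decimal 4 TB target; the exact record size will vary):

```python
# Rough estimate: how many ~80-byte BigPetStore records fill 4 TB?
target_bytes = 4 * 10**12   # 4 TB target (decimal TB, an assumption)
record_bytes = 80           # approximate size of one generated record
records = target_bytes // record_bytes
print(f"{records:.2e} records")  # on the order of 10^10
```

So hitting the full 4 TB would actually take closer to 5 x 10^10 records; 10^10 gets you to the same order of magnitude (~0.8 TB).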
