On 10/08/2013 03:16 PM, Jay Vyas wrote:
Hi folks.
I've been hacking around on the big pet store idea. So far I've only got
the template for the synthetic data set generator:
https://raw.github.com/jayunit100/hadoop-example-jobs/master/src/main/java/org/bigtop/bigpetstore/PetStoreTransactionGeneratorJob.java
This is the "first" phase implementation of a MapReduce job that will
generate a synthetic data set of transactions in a pet store.
It is meant to be configurable, so people can use it to generate as many
transactions as they want. I will also add more "products" to it.
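As a rough illustration of what "configurable" could mean here, the sketch below generates a caller-chosen number of transaction records in plain Java. It is not taken from PetStoreTransactionGeneratorJob: the class name, the product list, the "id,customerId,product" record layout, and the seed parameter are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: none of these names come from the real job.
class TransactionSketch {
    // Illustrative product list (an assumption, not the job's actual products).
    static final String[] PRODUCTS = {"dog-food", "cat-litter", "fish-tank"};

    // Generates `count` records in an assumed "id,customerId,product" layout.
    // A fixed seed makes the output reproducible across runs.
    static List<String> generate(int count, long seed) {
        Random random = new Random(seed);
        List<String> records = new ArrayList<>(count);
        for (int id = 0; id < count; id++) {
            int customerId = random.nextInt(1000);
            String product = PRODUCTS[random.nextInt(PRODUCTS.length)];
            records.add(id + "," + customerId + "," + product);
        }
        return records;
    }
}
```

Calling TransactionSketch.generate(1000, 42L) would return 1000 records. In a real MapReduce job the count would more likely come from the job configuration than from a method argument.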
2) The next step will be to flesh out the transaction data and then
write up aggregations in Hive, Pig, and MapReduce. That will serve
as the ETL blueprint.
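To make "aggregations" concrete, here is a minimal plain-Java sketch of one such aggregation: counting transactions per product. It stands in for the group-and-count logic that a Hive, Pig, or MapReduce version would express. It does not use any Hadoop API, and the "id,customerId,product" record layout and the class name are assumptions for illustration only.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a per-product count; not part of the actual job.
class ProductCountSketch {
    // Counts records per product, assuming an "id,customerId,product" layout.
    static Map<String, Integer> countByProduct(List<String> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.split(",");
            if (fields.length < 3) {
                continue; // skip malformed records instead of failing
            }
            // The product name is the grouping key; add 1 for each record.
            counts.merge(fields[2], 1, Integer::sum);
        }
        return counts;
    }
}
```

In MapReduce terms, extracting the product field corresponds to the map step and the per-key sum to the reduce step.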
3) Then the interesting part will come: Feeding those ETL'd statistics
into an available data store that is Bigtop-supported, i.e. Solr
indices and HBase key-values.
At that point the sample application will be ready and the first
iteration of bigtop.blueprints will be ready to share.
If anyone has initial thoughts or wants to jump in, let me know. :)
Jay Vyas
http://jayunit100.blogspot.com
Looks like a great start!
Can't wait to see the following parts.
Some notes:
* Missing license header
* Package name should probably be org.apache.bigtop.blueprint.bigpetstore
* It would be nice to split all these classes into separate files
* It would be nice to group instance variables at the same location (e.g.
int soFar is declared right in the middle, between two methods)
* It would be nice to extract strings such as "Dud Job", "transactions"
or "transaction_files" into constants
* I have spotted some System.out.println calls
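A rough sketch of how several of these notes could look together: the ASF license header on top, the string literals extracted into constants, the instance variable declared before the methods, and a logger instead of System.out.println. The class name, constant names, and the recordOne method are hypothetical. java.util.logging is used only to keep the example dependency-free, and the real code may prefer another logging library.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// The real file would declare: package org.apache.bigtop.blueprint.bigpetstore;

import java.util.logging.Logger;

// Hypothetical class; only the three string values come from the reviewed code.
class PetStoreConstantsSketch {
    // String literals extracted into named constants (names are assumptions).
    static final String DUD_JOB = "Dud Job";
    static final String TRANSACTIONS = "transactions";
    static final String TRANSACTION_FILES = "transaction_files";

    private static final Logger LOG =
            Logger.getLogger(PetStoreConstantsSketch.class.getName());

    // Instance variables grouped at the top, before any method.
    private int soFar = 0;

    // Increments the counter and logs it instead of printing to stdout.
    int recordOne() {
        soFar++;
        LOG.fine("records so far: " + soFar);
        return soFar;
    }
}
```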
Thanks,
Bruno