I have a problem loading a large amount of data from a MySQL database into a small HBase cluster. The cluster configuration is as follows:

Machine(1): HDFS / primary HDFS node / YARN ResourceManager / YARN NodeManager / MapReduce / History Server / ZooKeeper / RegionServer
Machine(2): YARN NodeManager / secondary HDFS node
Machine(3): YARN NodeManager / ZooKeeper / RegionServer
Machine(5): HBase Master / ZooKeeper / RegionServer

Each machine has the following specifications:
62 GB RAM
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz

The data is loaded as follows: a Java program connects to the MySQL database through the JDBC driver, maps each record it reads to an HBase row, and then inserts that row into HBase. Each record corresponds to a single Java class with about 10 fields of primitive types.

THE PROBLEM: loading the data takes far too long. Where could the problem be?

For example, about 10 million records take about 6 hours to load from MySQL into HBase. Is this normal? Can it be improved? What are the possible reasons that loading data from MySQL into HBase through the Java JDBC driver could be this slow?

--
View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-slow-data-load-tp4060750.html
Sent from the HBase User mailing list archive at Nabble.com.
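
As a rough sanity check on the figures quoted above, here is a small sketch that works out the throughput they imply. It assumes exactly 10 million rows and exactly 6 hours, although both numbers are only approximate, and the class and method names are purely illustrative and not part of the actual loader.

```java
public class LoadThroughput {

    // Average rows written per second for a load of `rows` records
    // that took `hours` hours in total.
    static double rowsPerSecond(long rows, double hours) {
        return rows / (hours * 3600.0);
    }

    public static void main(String[] args) {
        long rows = 10_000_000L; // "about 10 million records" (approximate)
        double hours = 6.0;      // "about 6 hours" (approximate)

        double rate = rowsPerSecond(rows, hours);
        double millisPerRow = 1000.0 / rate;

        // Prints the average rate and the average time spent per row.
        System.out.printf("%.0f rows/s, %.2f ms per row%n", rate, millisPerRow);
    }
}
```

Under those assumptions the load averages roughly 463 rows per second, or a little over 2 ms per row for the whole read, map, and insert path.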
