Using SparkSQL to query Hbase entity takes very long time
Hi all, I am new to Spark and currently I am trying to run a SparkSQL query on an HBase entity. For an entity with about 4000 rows, it takes about 12 seconds. Is that expected? Is there any way to shorten the query process? Here is the code snippet:

    SparkConf sparkConf = new SparkConf().setMaster("spark://serverUrl:port").setAppName("JavaSparkSQLTest");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);
    Configuration hbase_conf = HBaseConfiguration.create();
    hbase_conf.set("hbase.zookeeper.quorum", serverList);
    hbase_conf.set("hbase.regionserver.port", "60020");
    hbase_conf.set("hbase.master", master_url);
    hbase_conf.set(TableInputFormat.INPUT_TABLE, entityName);
    JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
        jsc.newAPIHadoopRDD(hbase_conf, TableInputFormat.class,
            ImmutableBytesWritable.class, Result.class).cache();

    // Generate the schema based on the schema string
    final List<StructField> keyFields = new ArrayList<StructField>();
    for (String fieldName : this.getSchemaString().split(",")) {
        keyFields.add(DataType.createStructField(fieldName, DataType.StringType, true));
    }
    StructType schema = DataType.createStructType(keyFields);

    JavaRDD<Row> rowRDD = hBaseRDD.map(
        new Function<Tuple2<ImmutableBytesWritable, Result>, Row>() {
            public Row call(Tuple2<ImmutableBytesWritable, Result> re) throws Exception {
                return createRow(re, getSchemaString());
            }
        });

    JavaSQLContext sqlContext = new org.apache.spark.sql.api.java.JavaSQLContext(jsc);
    // Apply the schema to the RDD.
    JavaSchemaRDD schemaRDD = sqlContext.applySchema(rowRDD, schema);
    schemaRDD.registerTempTable(queryEntity);
    JavaSchemaRDD retRDD = sqlContext.sql("SELECT * FROM mldata WHERE name = 'Spark'");
    logger.info("retRDD count is " + retRDD.count());

thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-SparkSQL-to-query-Hbase-entity-takes-very-long-time-tp20194.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
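The schema-building loop above just splits a comma-separated schema string into field names before wrapping each one in a StructField. As a minimal, Spark-free sketch of that step (the schema string "name,age,city" is made up for illustration; the real one comes from getSchemaString()):

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaSplit {
    // Mirrors the loop in the snippet above, minus the Spark classes:
    // split the schema string on commas and collect the field names.
    static List<String> fieldNames(String schemaString) {
        List<String> names = new ArrayList<String>();
        for (String fieldName : schemaString.split(",")) {
            names.add(fieldName.trim());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fieldNames("name,age,city")); // [name, age, city]
    }
}
```

In the real code each name then becomes DataType.createStructField(fieldName, DataType.StringType, true), so every column is a nullable string.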
Loading a large Hbase table into SPARK RDD takes quite long time
I am trying to load a large HBase table into a Spark RDD to run a SparkSQL query on the entity. For an entity with about 6 million rows, it takes about 35 seconds to load it into the RDD. Is that expected? Is there any way to shorten the loading process? I have been following some tips from http://hbase.apache.org/book/perf.reading.html to speed up the process, e.g., scan.setCaching(cacheSize) and only adding the necessary attributes/columns to the scan. I am just wondering if there are other ways to improve the speed? Here is the code snippet:

    SparkConf sparkConf = new SparkConf().setMaster("spark://url").setAppName("SparkSQLTest");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);
    Configuration hbase_conf = HBaseConfiguration.create();
    hbase_conf.set("hbase.zookeeper.quorum", url);
    hbase_conf.set("hbase.regionserver.port", "60020");
    hbase_conf.set("hbase.master", url);
    hbase_conf.set(TableInputFormat.INPUT_TABLE, entityName);

    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("MetaInfo"), Bytes.toBytes("col1"));
    scan.addColumn(Bytes.toBytes("MetaInfo"), Bytes.toBytes("col2"));
    scan.addColumn(Bytes.toBytes("MetaInfo"), Bytes.toBytes("col3"));
    scan.setCaching(this.cacheSize);
    hbase_conf.set(TableInputFormat.SCAN, convertScanToString(scan));

    JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
        jsc.newAPIHadoopRDD(hbase_conf, TableInputFormat.class,
            ImmutableBytesWritable.class, Result.class);
    logger.info("count is " + hBaseRDD.cache().count());

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Loading-a-large-Hbase-table-into-SPARK-RDD-takes-quite-long-time-tp20396.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
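The main effect of scan.setCaching is to cut the number of client/region-server round trips: each scanner RPC returns `caching` rows instead of one. A rough back-of-the-envelope sketch, using the 6 million rows from the post (the cache size of 500 is a made-up example value, not from the original code):

```java
public class ScanRpcEstimate {
    // Approximate scanner RPC round trips for a full scan:
    // one RPC fetches `caching` rows, so total RPCs ~= ceil(rows / caching).
    static long rpcCount(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 6_000_000L;                   // table size from the post
        System.out.println(rpcCount(rows, 1));    // default caching in old HBase: 6,000,000 RPCs
        System.out.println(rpcCount(rows, 500));  // caching = 500: 12,000 RPCs
    }
}
```

So even a modest caching value removes the vast majority of network round trips, which is usually where a "slow full scan" spends its time; beyond that, adding only the needed columns (as the snippet already does) reduces bytes transferred per RPC.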
Re: Loading a large Hbase table into SPARK RDD takes quite long time
Hi, Here is the configuration of the cluster:

Workers: 2
For each worker: Cores: 24 Total, 0 Used; Memory: 69.6 GB Total, 0.0 B Used

I didn't set spark.executor.memory, so it should be the default value of 512 MB.

> How much space does one row consisting of only the 3 columns consume?

The 3 columns are very small, probably less than 100 bytes per row.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Loading-a-large-Hbase-table-into-SPARK-RDD-takes-quite-long-time-tp20396p20414.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
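Those two numbers already hint at a problem when calling .cache(): with the default 512 MB executor memory, even the raw cell data may not fit, and Java object overhead typically inflates cached rows several-fold on top of that. A quick sanity check using the figures from this thread:

```java
public class CacheMemoryCheck {
    public static void main(String[] args) {
        long rows = 6_000_000L;                        // table size from the thread
        long bytesPerRow = 100;                        // upper bound quoted above
        long rawBytes = rows * bytesPerRow;            // 600,000,000 bytes of raw cell data
        long defaultExecutorMem = 512L * 1024 * 1024;  // default spark.executor.memory

        // Raw data alone already exceeds the default executor memory,
        // before accounting for Java object overhead in the cached Result objects.
        System.out.println(rawBytes > defaultExecutorMem); // true
    }
}
```

If the cache does not fit, blocks are recomputed or spilled, so raising spark.executor.memory (there is ~69 GB free per worker here) is a natural first thing to try.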
Re: Loading a large Hbase table into SPARK RDD takes quite long time
Hi Ted, Here is the information about the regions:

    Region Server                  Region Count
    http://regionserver1:60030/    44
    http://regionserver2:60030/    39
    http://regionserver3:60030/    55

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Loading-a-large-Hbase-table-into-SPARK-RDD-takes-quite-long-time-tp20396p20417.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
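The region count matters because TableInputFormat produces one input split per region, so the scan runs as one Spark task per region. A small sketch combining the numbers from this thread (regions above, cores from the earlier reply) to estimate parallelism:

```java
public class PartitionEstimate {
    public static void main(String[] args) {
        int regions = 44 + 39 + 55;   // 138 regions -> 138 input splits / tasks
        int totalCores = 2 * 24;      // 2 workers x 24 cores, from the earlier reply

        // Tasks run in "waves" of at most totalCores concurrent tasks.
        int waves = (regions + totalCores - 1) / totalCores;
        System.out.println(regions + " tasks, " + waves + " waves"); // 138 tasks, 3 waves
    }
}
```

So the scan is already reasonably parallel here; the per-task work (RPC round trips, bytes per region) is the more likely bottleneck than the split count.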
Re: KryoRegistrator exception and Kryo class not found while compiling
Is the class com.dataken.spark.examples.MyRegistrator public? If not, change it to public and give it a try. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/KryoRegistrator-exception-and-Kryo-class-not-found-while-compiling-tp10396p20646.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: KryoSerializer exception in Spark Streaming JAVA
    class MyRegistrator implements KryoRegistrator {
        public void registerClasses(Kryo kryo) {
            kryo.register(ImpressionFactsValue.class);
        }
    }

Change this class to public and give it a try. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/KryoSerializer-exception-in-Spark-Streaming-JAVA-tp15479p20647.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
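The reason visibility matters in both threads: Spark does not construct the registrator directly, it loads the class named in spark.kryo.registrator by reflection and instantiates it with its no-arg constructor, which fails for non-public classes. A minimal, Spark-free sketch of that loading pattern (the Registrator interface and class names here are made up for illustration; Spark does the equivalent with KryoRegistrator):

```java
public class RegistratorLoading {
    // Stand-in for KryoRegistrator, just for this sketch.
    public interface Registrator {
        void registerClasses();
    }

    // Public (and static, so it has an accessible no-arg constructor):
    // reflective instantiation succeeds. Were this non-public, the
    // newInstance call below would throw IllegalAccessException.
    public static class MyRegistrator implements Registrator {
        public void registerClasses() {
            // kryo.register(SomeClass.class) would go here in real code
        }
    }

    public static void main(String[] args) throws Exception {
        // Spark does the equivalent of this with the configured class name.
        String name = RegistratorLoading.class.getName() + "$MyRegistrator";
        Registrator r = (Registrator) Class.forName(name)
                .getDeclaredConstructor().newInstance();
        r.registerClasses();
        System.out.println("instantiated " + r.getClass().getSimpleName());
    }
}
```

The same rule explains both errors above: making the registrator class public (and top-level or static) lets the reflective lookup and instantiation succeed.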