Hi,
How can we simply cache an HBase table and run SQL queries on it via the Java API in Spark?
Thanks,
Udbhav Agarwal
Hi,
There are some examples in spark/example
<https://github.com/apache/spark/tree/master/examples> and there are also
some examples in spark package <http://spark-packages.org/>.
And I find this blog
<http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html>
is
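For reference, a minimal sketch of the pattern (Scala shown; the Java API is analogous; the table, family and qualifier names are placeholders, and Spark 1.3+ is assumed for the DataFrame/SQL part): read the table with TableInputFormat, project the columns you need, register the result as a temp table, cache it, and query it with SQL.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SQLContext

case class MyRow(rowkey: String, value: String)   // placeholder schema

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")   // placeholder table name

// Load the table as an RDD of (rowkey, Result) pairs
val hbaseRdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

// Project the cells you need into the case class
val rows = hbaseRdd.map { case (key, result) =>
  MyRow(Bytes.toString(key.get()),
    Bytes.toString(result.getValue(Bytes.toBytes("f1"), Bytes.toBytes("q1"))))
}

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
rows.toDF().registerTempTable("my_table_cached")
sqlContext.cacheTable("my_table_cached")   // keep the table in memory
sqlContext.sql("SELECT rowkey, value FROM my_table_cached LIMIT 10").show()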
Hi Sparkers,
How do I integrate HBase with Spark?
Replies appreciated!
Regards,
Sandeep.v
Or, use the SparkOnHBase lab:
http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
From: Ted Yu
To: Akhil Das
Cc: sandeep vura ; "user@spark.apache.org"
Sent: Monday, February 23, 2015 8:52 AM
Subject: Re: How to integrate HBASE on Spark
Installing HBase on the Hadoop cluster would allow HBase to utilize features
provided by HDFS, such as short-circuit read (see '90.2. Leveraging local
data' under http://hbase.apache.org/book.html#perf.hdfs).
Cheers
On Sun, Feb 22, 2015 at 11:38 PM, Akhil Das
wrote:
If both clusters are on the same network, then I'd suggest installing it on
the Hadoop cluster. If you install it on the Spark cluster itself, HBase might
take up a few CPU cycles and there's a chance for the job to lag.
Thanks
Best Regards
On Mon, Feb 23, 2015 at 12:48 PM, sandeep vura wrote:
Hi
I have installed Spark on a 3-node cluster. The Spark services are up and
running, but I want to integrate HBase with Spark.
Do I need to install HBase on the Hadoop cluster or the Spark cluster?
Please let me know ASAP.
Regards,
Sandeep.v
Hi Siddharth,
With v 4.3 of Phoenix, you can use the PhoenixInputFormat and
OutputFormat classes to pull/push to Phoenix from Spark.
HTH
Thanks
Ravi
On Wed, Feb 11, 2015 at 6:59 AM, Ted Yu wrote:
Connectivity to HBase is also available. You can take a look at:
examples/src/main/python/hbase_inputformat.py
examples/src/main/python/hbase_outputformat.py
examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
examples/src/main/scala/org/apache/spark/examples/pythonconverters
Hi,
I am new to Spark. We have recently moved from Apache Storm to Apache Spark to
build our OLAP tool.
Earlier we were using HBase & Phoenix.
We need to rethink what to use in the case of Spark.
Should we go ahead with HBase, Hive, or Cassandra for query processing with
Spark?
Hi,
In fact, this pull request https://github.com/apache/spark/pull/3920 is for doing
HBase scans. However, it has not been merged yet.
You can also take a look at the example code at
http://spark-packages.org/package/20, which uses Scala and Python to
read data from HBase.
Hope this helps.
Cheers
Gen
Hi,
I am trying to do an HBase scan and read it into a Spark RDD using pyspark. I
have successfully written data to HBase from pyspark, and been able to read a
full table from HBase using the Python example code. Unfortunately I am unable
to find any example code for doing an HBase scan and
Try _fast_serialization=2 or contact PiCloud support
Can any developer who works on that stuff tell me if that problem can be fixed?
Thanks to all for responding.
I finally figured out how to bulk load into HBase using Scala on Spark.
The sample code is here, which others can refer to in the future:
http://www.openkb.info/2015/01/how-to-use-scala-on-spark-to-load-data.html
Thanks!
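For readers who just want the shape of it, here is a rough sketch of the HFile bulk-load pattern (untested; written against the 0.94-era API used in this thread, with table "t1", column family "cf1" and the output path as placeholders):
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{HFileOutputFormat, LoadIncrementalHFiles}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

val conf = HBaseConfiguration.create()
val table = new HTable(conf, "t1")

// Sets up compression, block size, etc. to match the target table
val job = new Job(conf)
HFileOutputFormat.configureIncrementalLoad(job, table)

// Build sorted (rowkey, KeyValue) pairs; HFileOutputFormat requires keys in order
val kvRdd = sc.parallelize(Seq("row1" -> "val1", "row2" -> "val2"))
  .sortByKey()
  .map { case (row, value) =>
    val rowkey = Bytes.toBytes(row)
    val kv = new KeyValue(rowkey, Bytes.toBytes("cf1"), Bytes.toBytes("c1"), Bytes.toBytes(value))
    (new ImmutableBytesWritable(rowkey), kv)
  }

// Write HFiles to HDFS, then hand them over to the region servers
kvRdd.saveAsNewAPIHadoopFile("/tmp/t1_hfiles", classOf[ImmutableBytesWritable],
  classOf[KeyValue], classOf[HFileOutputFormat], job.getConfiguration)
new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/tmp/t1_hfiles"), table)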
On Tue, Jan 27, 2015 at 6:27 PM, Jim Green wrote
Thanks Sun.
My understanding is that saveAsNewAPIHadoopFile saves HFiles on HDFS.
Is it doable to use saveAsNewAPIHadoopDataset to load directly into HBase?
If so, is there any sample code for that?
Thanks!
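For the archives, a hedged sketch of that other route - writing regular Puts through the region servers (not a bulk load) with saveAsNewAPIHadoopDataset and TableOutputFormat. Untested; table, family and qualifier names are placeholders:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

val conf = HBaseConfiguration.create()
conf.set(TableOutputFormat.OUTPUT_TABLE, "t1")

val job = new Job(conf)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
job.setOutputKeyClass(classOf[ImmutableBytesWritable])
job.setOutputValueClass(classOf[Put])

sc.parallelize(Seq("row1" -> "val1", "row2" -> "val2")).map { case (row, value) =>
  // One Put per record, keyed by the row key
  val put = new Put(Bytes.toBytes(row))
  put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c1"), Bytes.toBytes(value))
  (new ImmutableBytesWritable(Bytes.toBytes(row)), put)
}.saveAsNewAPIHadoopDataset(job.getConfiguration)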
On Tue, Jan 27, 2015 at 6:07 PM, fightf...@163.com
wrote:
> Hi, Jim
> Your gen
val kv = new KeyValue(rowkeyBytes, colfam, qual, value)
List(kv)
}
Thanks,
Sun
fightf...@163.com
From: Jim Green
Date: 2015-01-28 04:44
To: Ted Yu
CC: user
Subject: Re: Bulk loading into hbase using saveAsNewAPIHadoopFile
I used below code, and it still failed with
Jim Green wrote:
Thanks Ted. Could you give me a simple example of loading one row of data into
HBase? How should I generate the KeyValue?
I tried multiple times and still cannot figure it out.
On Tue, Jan 27, 2015 at 12:10 PM, Ted Yu wrote:
> Here is the method signature used by HFileOutputFormat :
>
Hi Team,
I need some help writing Scala code to bulk load some data into HBase.
*Env:*
hbase 0.94
spark-1.0.2
I am trying below code to just bulk load some data into hbase table “t1”.
import org.apache.spark._
import org.apache.spark.rdd.NewHadoopRDD
import org.apache.hadoop.hbase
Hello all,
When I try to read data from an HBase table, I get an unread block data
exception. I am running HBase and Spark on a single node (my
workstation). My code is in Java, and I'm running it from the Eclipse
IDE. Here are the versions I'm using :
Cloudera : 2.5.0-cdh5.2.1
Hado
Hi, I am getting this same error. Did you figure out how to solve the
problem? Thanks!
Issue resolved after updating the Hbase version to 0.98.8-hadoop2. Thanks Ted
for all the help!
For future reference: this problem has nothing to do with Spark 1.2.0; it is
simply because I built Spark 1.2.0 with the wrong HBase version.
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday
I doubt anyone would deploy HBase 0.98.x on hadoop-1.
Looks like the hadoop2 profile should be made the default.
Cheers
On Tue, Jan 6, 2015 at 9:49 AM, Max Xu wrote:
Awesome. Thanks again Ted. I remember there is a block in the pom.xml under the
examples folder that defaults the HBase version to hadoop1. I figured this out last
time when I built Spark 1.1.1 but forgot this time:
<id>hbase-hadoop1</id>
<activation>
  <property>
    <name>!hbase.profile</name>
  </property>
</activation>
The default profile is hbase-hadoop1, so you need to specify
-Dhbase.profile=hadoop2
See SPARK-1297
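For example, the build invocation would look something like this (the -P profile name varies by Spark and Hadoop version, so treat it as illustrative):
mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Dhbase.profile=hadoop2 -DskipTests clean package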
Cheers
On Tue, Jan 6, 2015 at 9:11 AM, Max Xu wrote:
Thanks Ted. You are right, hbase-site.xml is in the classpath. But previously I
had it in the classpath too and the app worked fine. I believe I found the
problem: I built Spark 1.2.0 myself and forgot to change the HBase dependency
version to 0.98.8-hadoop2, which is the version I use. When I
I assume hbase-site.xml is in the classpath.
Can you try the code snippet in a standalone program to see if the problem
persists?
Cheers
On Tue, Jan 6, 2015 at 6:42 AM, Max Xu wrote:
Hi all,
I have a Spark streaming application that ingests data from a Kafka topic and
persists received data to Hbase. It works fine with Spark 1.1.1 in YARN cluster
mode. Basically, I use the following code to persist each partition of each RDD
to Hbase:
@Override
void call
also hbase itself works ok:
hbase(main):006:0> scan 'test'
ROW                          COLUMN+CELL
 key1                        column=f1:asd, timestamp=1419463092904, value=456
I am running it in yarn-client mode and I believe hbase-client is part of the
spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar which I am submitting at
launch.
adding another jstack taken during the hanging - http://pastebin.com/QDQrBw70 -
this is of the CoarseGrainedExecutorBackend
bq. "hbase.zookeeper.quorum": "localhost"
Are you running the HBase cluster in standalone mode?
Is the hbase-client jar in the classpath?
Cheers
On Wed, Dec 24, 2014 at 4:11 PM, Antony Mayi wrote:
> I just ran it by hand from the pyspark shell. Here are the steps:
>
> pyspar
e([['testkey', 'f1', 'testqual', 'testval']], 1).map(lambda x:
(x[0], x)).saveAsNewAPIHadoopDataset(... conf=conf,...
keyConverter=keyConv,... valueConverter=valueConv)
then it spills few of the INFO level messages about submitting a tas
I went over the jstack but didn't find any call related to hbase or zookeeper.
Do you find anything important in the logs?
Looks like the container launcher was waiting for the script to return some result:
at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecR
it ?
Thanks
On Wed, Dec 24, 2014 at 4:49 AM, Antony Mayi
wrote:
Hi,
have been using this without any issues with spark 1.1.0 but after upgrading to
1.2.0 saving a RDD from pyspark using saveAsNewAPIHadoopDataset into HBase just
hangs - even when testing with the example from the st
2.0 saving a RDD from pyspark
> using saveAsNewAPIHadoopDataset into HBase just hangs - even when testing
> with the example from the stock hbase_outputformat.py.
>
> anyone having same issue? (and able to solve?)
>
> using hbase 0.98.6 and yarn-client mode.
>
> thanks,
> Antony.
>
>
Hi,
I have been using this without any issues with Spark 1.1.0, but after upgrading to
1.2.0, saving an RDD from pyspark using saveAsNewAPIHadoopDataset into HBase just
hangs - even when testing with the example from the stock hbase_outputformat.py.
anyone having same issue? (and able to solve
2014-12-15 17:52:47, "Aniket Bhatnagar" wrote:
"The reason not using sc.newAPIHadoopRDD is it only support one scan each time."
I am not sure that's true. You can use multiple scans as follows:
val scanStrings = scans.map(scan => convertScanToString(scan))
conf.setSt
Please see
http://stackoverflow.com/questions/18565953/wrong-number-of-arguments-when-a-calling-function-from-class-in-python
Cheers
On Mon, Dec 22, 2014 at 8:04 PM, Antony Mayi wrote:
> using hbase 0.98.6
>
> there is no stack trace, just this short error.
>
> just noti
using hbase 0.98.6
there is no stack trace, just this short error.
I just noticed it does the fallback to toString as in the message; this is what
I get back in Python:
hbase_rdd.collect()
[(u'key1', u'List(cf1:12345:14567890, cf2:123:14567896)')]
so the question is why it fa
Which HBase version are you using?
Can you show the full stack trace?
Cheers
On Mon, Dec 22, 2014 at 11:02 AM, Antony Mayi
wrote:
> Hi,
>
> can anyone please give me some help how to write custom converter of hbase
> data to (for example) tuples of ((family, qualifier, value), )
Hi,
can anyone please give me some help with how to write a custom converter of HBase
data to (for example) tuples of ((family, qualifier, value), ) for pyspark?
I was trying something like this (here trying to get tuples of
("family:qualifier:value", )):
class HBaseResultToTupleConverter extends
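For anyone searching the archives later, the general shape of such a converter is below - a sketch, not tested, assuming the HBase 0.98 Cell API. The class implements Spark's python Converter trait and emits one "family:qualifier:value" string per cell:
import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.api.python.Converter

class HBaseResultToTupleConverter extends Converter[Any, Any] {
  override def convert(obj: Any): Any = {
    val result = obj.asInstanceOf[Result]
    // Emit one entry per cell so pyspark sees every column, not just the first one
    val out = new java.util.ArrayList[String]()
    result.rawCells().foreach { cell =>
      out.add(Bytes.toString(CellUtil.cloneFamily(cell)) + ":" +
        Bytes.toString(CellUtil.cloneQualifier(cell)) + ":" +
        Bytes.toString(CellUtil.cloneValue(cell)))
    }
    out
  }
}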
I'm using JDBCRDD
<https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.rdd.JdbcRDD>
+ Hbase JDBC driver <http://phoenix.apache.org/>+ schemaRDD
<https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD>
make sure to use
Hi,
Can someone help me? Any pointers would help.
Thanks
Subacini
On Fri, Dec 19, 2014 at 10:47 PM, Subacini B wrote:
Hi All,
Is there any API that can be used to write a schemaRDD directly to HBase?
If not, what is the best way to write a schemaRDD to HBase?
Thanks
Subacini
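As far as I know there is no direct schemaRDD-to-HBase API; the usual approach is to iterate the rows yourself and write Puts. A hedged sketch (table, family and column names are placeholders, and the row key is assumed to be the first column):
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

schemaRDD.foreachPartition { rows =>
  // Build the (non-serializable) configuration and table handle per partition
  val conf = HBaseConfiguration.create()
  val table = new HTable(conf, "my_table")
  rows.foreach { row =>
    val put = new Put(Bytes.toBytes(row.getString(0)))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes(row.getString(1)))
    table.put(put)
  }
  table.close()
}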
In case you are still looking for help, there have been multiple discussions
on this mailing list that you can try searching for. Or you can simply use
https://github.com/unicredit/hbase-rdd :-)
Thanks,
Aniket
On Wed Dec 03 2014 at 16:11:47 Ted Yu wrote:
> Which hbase release are you runn
rings : _*)
where convertScanToString is implemented as:
/**
 * Serializes an HBase scan into a string.
 * @param scan Scan to serialize.
 * @return Base64 encoded serialized scan.
 */
private def convertScanToString(scan: Scan) = {
  val proto: ClientProtos.Scan = ProtobufUtil.toScan(scan)
  Base64.encodeBytes(proto.toByteArray)
}
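Putting the pieces together, a sketch of reading with multiple scans via MultiTableInputFormat, using the convertScanToString helper above (untested; each Scan has to carry its table name as an attribute, and the table names here are placeholders):
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Result, Scan}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.MultiTableInputFormat
import org.apache.hadoop.hbase.util.Bytes

val conf = HBaseConfiguration.create()

// One Scan per table (or per row range); each must name its table
val scans = Seq("table1", "table2").map { name =>
  val scan = new Scan()
  scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(name))
  scan
}

conf.setStrings(MultiTableInputFormat.SCANS,
  scans.map(scan => convertScanToString(scan)): _*)

val rdd = sc.newAPIHadoopRDD(conf, classOf[MultiTableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])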
ible put all
> rowkeys
> > into HBaseConfiguration
> > Option 2:
> > sc.newAPIHadoopRDD(conf, classOf[MultiTableInputFormat],
> > classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
> > classOf[org.apache.hadoop.hbase.client.Result])
> >
> >
ges into several parts then use option 2, but I
> prefer option 1. So is there any solution for option 1?
Can you paste the complete code? It looks like at some point you are
passing a Hadoop Configuration object, which is not Serializable. You can look at
this thread for a similar discussion:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-into-HBase-td13378.html
Thanks
Best Regards
Hello, you can have a look at this project hbase-rdd
<https://github.com/unicredit/hbase-rdd> that provides a simple method to
bulk load an rdd to HBase.
fralken
the size of the 3 columns is very small, probably less than 100 bytes.
Hi,
What is your cluster setup? How much memory do you have? How much space
does one row consisting of only the 3 columns consume? Do you run other
stuff in the background?
Best regards
On 04.12.2014 23:57, "bonnahu" wrote:
> I am trying to load a large Hbase table into SPARK
I am trying to load a large HBase table into a Spark RDD to run a SparkSQL
query on the entity. For an entity with about 6 million rows, it takes
about 35 seconds to load it into the RDD. Is this expected? Is there any way to
shorten the loading process? I have been getting some tips from
http
Which HBase release are you running?
If it is 0.98, take a look at:
https://issues.apache.org/jira/browse/SPARK-1297
Thanks
On Dec 2, 2014, at 10:21 PM, Jai wrote:
> I am trying to use Apache Spark with a psuedo distributed Hadoop Hbase
> Cluster and I am looking for some links regardi
You could go through these to start with
http://www.vidyasource.com/blog/Programming/Scala/Java/Data/Hadoop/Analytics/2014/01/25/lighting-a-spark-with-hbase
http://stackoverflow.com/questions/25189527/how-to-process-a-range-of-hbase-rows-using-spark
Thanks
Best Regards
On Wed, Dec 3, 2014 at
I am trying to use Apache Spark with a pseudo-distributed Hadoop HBase
cluster and I am looking for some links regarding the same. Can someone
please guide me through the steps to accomplish this. Thanks a lot for
helping.
Hi all,
I am new to Spark and currently I am trying to run a SparkSQL query on an HBase
entity. For an entity with about 4000 rows, it takes about 12 seconds.
Is this expected? Is there any way to shorten the query process?
Here is the code snippet:
SparkConf sparkConf = new
SparkConf
a:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
This exception occurs at the line "val peopleRows = new NewHadoopRDD" when trying
to read rows from HBase (0.98).
;
}
return CellUtil.cloneValue(cells[0]);
This explains why you only got one row.
In the thread you mentioned, see the code posted by freedafeng which
iterates the Cells in Result.
Cheers
On Wed, Nov 12, 2014 at 1:04 PM, Ted Yu wrote:
> To my knowledge, Spark 1.1 comes with HBase 0.94
> To u
To my knowledge, Spark 1.1 comes with HBase 0.94
To utilize HBase 0.98, you will need:
https://issues.apache.org/jira/browse/SPARK-1297
You can apply the patch and build Spark yourself.
Cheers
On Wed, Nov 12, 2014 at 12:57 PM, Alan Prando wrote:
> Hi Ted! Thanks for anwsering...
>
>
Can you give us a bit more detail:
hbase release you're using.
whether you can reproduce using hbase shell.
I did the following using hbase shell against 0.98.4:
hbase(main):001:0> create 'test', 'f1'
0 row(s) in 2.9140 seconds
=> Hbase::Table - test
hbase(main)
Hi all,
I'm trying to read an HBase table using this example from github
(https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_inputformat.py),
however I have two qualifiers in a column family.
Ex.:
ROW                          COLUMN+CELL
 row1                        column=f1:1, timestamp=1401883411986,
Hi Nick,
I saw the HBase api has experienced lots of changes. If I remember
correctly, the default hbase in spark 1.1.0 is 0.94.6. The one I am using is
0.98.1. To get the column family names and qualifier names, we need to call
different methods for these two different versions. I don't kno
a PR probably later today.
d you write
> HBaseResultToStringConverter to do what you wanted it to do?
Hello all,
I have tested reading an HBase table with Spark 1.1 using
SparkContext.newAPIHadoopRDD. I found the performance is much slower than
reading from Hive. I also tried reading data using HFileScanner on one region
HFile, but the performance is not good. So, how do I improve the performance of
Spark reading
I just wrote a custom converter in Scala to replace HBaseResultToStringConverter.
Just a couple of lines of code.
e is a big limitation. Converting from the
'list()' from the 'Result' is more general and easy to use.
Hello there,
I am wondering how to get the column family names and column qualifier names
when using pyspark to read an HBase table with multiple column families.
I have an HBase table as follows:
hbase(main):007:0> scan 'data1'
ROW
Maybe I should create a private AMI to use for my question No.1? Assuming I
use the default instance type as the base image.. Anyone tried this?
The plan is to create an EC2 cluster and run the (py) spark on it. Input data
is from s3, output data goes to an hbase in a persistent cluster (also EC2).
My questions are:
1. I need to install some software packages on all the workers (sudo apt-get
install ...). Is there a better way to do this
Thanks, Soumitra Kumar,
I didn't know why you put hbase-protocol.jar in SPARK_CLASSPATH while adding
hbase-protocol.jar, hbase-common.jar, hbase-client.jar, and htrace-core.jar in
--jars, but it did work.
Actually, I put all these four jars in SPARK_CLASSPATH along with the HBase conf
directory.
2014-10
I am writing to HBase, following are my options:
export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar
spark-submit \
--jars
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,/opt/cloudera/parcels/CDH/lib
+user@hbase
2014-10-15 20:48 GMT+08:00 Fengyun RAO :
We use Spark 1.1, and HBase 0.98.1-cdh5.1.0, and need to read and write an
HBase table in Spark program.
I notice there are:
spark.driver.extraClassPath
spark.executor.extraClassPath
properties to manage extra classpath, or even the deprecated SPARK_CLASSPATH.
The problem is what classpath or
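What ended up working in this thread (per the messages above): ship the four HBase jars with --jars and keep the HBase conf directory on the driver and executor classpath. Roughly - the jar locations follow the CDH layout quoted above, and /etc/hbase/conf is just an illustrative conf-dir path:
HBASE_LIB=/opt/cloudera/parcels/CDH/lib/hbase   # adjust for your install
spark-submit \
  --jars $HBASE_LIB/hbase-protocol.jar,$HBASE_LIB/hbase-common.jar,$HBASE_LIB/hbase-client.jar,$HBASE_LIB/lib/htrace-core.jar \
  --driver-class-path /etc/hbase/conf \
  --conf spark.executor.extraClassPath=/etc/hbase/conf \
  your-app.jar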
Sean,
I did specify the number of cores to use as follows:
... ...
val sparkConf = new SparkConf()
.setAppName("<<< Reading HBase >>>")
.set("spark.cores.max", "32")
val sc = new SparkContext(sparkConf)
... ...
But that d
You do need to specify the number of executor cores to use. Executors are
not like mappers. After all they may do much more in their lifetime than
just read splits from HBase so would not make sense to determine it by
something that the first line of the program does.
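On YARN that means asking for executors explicitly at submit time, along these lines (numbers purely illustrative):
spark-submit --master yarn-client \
  --num-executors 20 \
  --executor-cores 4 \
  --executor-memory 4g \
  your-app.jar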
On Oct 8, 2014 8:00 AM,
Hi Sean,
Do I need to specify the number of executors when submitting the job? I
suppose the number of executors will be determined by the number of regions
of the table. Just like a MapReduce job, you needn't specify the number of
map tasks when reading from a HBase table.
The scri
How did you run your program? I don't see from your earlier post that
you ever asked for more executors.
On Wed, Oct 8, 2014 at 4:29 AM, Tao Xiao wrote:
> I found the reason why reading HBase is too slow. Although each
> regionserver serves multiple regions for the table I'm rea
I found the reason why reading HBase is too slow. Although each
regionserver serves multiple regions for the table I'm reading, the number
of Spark workers allocated by Yarn is too low. Actually, I could see that
the table has dozens of regions spread over about 20 regionservers, but
onl
-Dhadoop.version=2.4.1
-DskipTests clean package
Now everything is ok.
forgot to copy user list
On Sat, Oct 4, 2014 at 3:12 PM, Nick Pentreath
wrote:
> what version did you put in the pom.xml?
>
> it does seem to be in Maven central:
> http://search.maven.org/#artifactdetails%7Corg.apache.hbase%7Chbase%7C0.98.6-hadoop2%7Cpom
>
>
> org.apa
Hi,
I installed hbase-0.98.6-hadoop2. It's working with no problems.
When I try to run the Spark HBase Python examples (the wordcount examples
work - so it is not a Python issue):
./bin/spark-submit --master local --driver-class-path
./examples/target/spark-examples_2.10-1.1.0.jar
./exa
As far as I know, that feature is not in CDH 5.0.0
FYI
On Wed, Oct 1, 2014 at 9:34 AM, Vladimir Rodionov <
vrodio...@splicemachine.com> wrote:
> Using TableInputFormat is not the fastest way of reading data from HBase.
> Do not expect 100s of Mb per sec. You probably should take a
Using TableInputFormat is not the fastest way of reading data from HBase.
Do not expect 100s of Mb per sec. You probably should take a look at M/R
over HBase snapshots.
https://issues.apache.org/jira/browse/HBASE-8369
-Vladimir Rodionov
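For reference, the snapshot-based read looks roughly like this from Spark (a sketch, untested; it needs HBase 0.98+, and the snapshot name and restore directory are placeholders):
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
import org.apache.hadoop.mapreduce.Job

val conf = HBaseConfiguration.create()
val job = Job.getInstance(conf)
// Reads the snapshot's HFiles directly from HDFS, bypassing the region servers
TableSnapshotInputFormat.setInput(job, "my_snapshot", new Path("/tmp/snapshot_restore"))

val rdd = sc.newAPIHadoopRDD(job.getConfiguration, classOf[TableSnapshotInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])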
On Wed, Oct 1, 2014 at 8:17 AM, Tao Xiao wrote:
> I
This would show whether the slowdown is in HBase code or somewhere else.
>
> Cheers
>
> On Mon, Sep 29, 2014 at 11:40 PM, Tao Xiao
> wrote:
>
>> I checked HBase UI. Well, this table is not completely evenly spread
>> across the nodes, but I think to some extent it c