Can you launch a job which exercises TableInputFormat on the same table
without using Spark?
This would show whether the slowdown is in HBase code or somewhere else.
Cheers
On Mon, Sep 29, 2014 at 11:40 PM, Tao Xiao wrote:
I checked HBase UI. Well, this table is not completely evenly spread across
the nodes, but I think to some extent it can be seen as nearly evenly
spread - at least no single node has too many regions.
Here is a screenshot of HBase UI
<http://imgbin.org/index.php?page=image
HBase TableInputFormat creates one input split per region. You cannot
achieve a high level of parallelism unless you have at least 5-10 regions
per region server. What does this mean? You probably have too few regions.
You can verify that in the HBase Web UI.
-Vladimir Rodionov
On Mon, Sep 29, 2014 at 7:21
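For reference, a minimal sketch of checking this from the Spark side, assuming a SparkContext sc and a hypothetical table name; with TableInputFormat the resulting RDD gets one partition per region:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val hconf = HBaseConfiguration.create()
hconf.set(TableInputFormat.INPUT_TABLE, "mytable") // hypothetical table name
val rdd = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
// One input split (and thus one Spark partition) per region:
println("partitions = " + rdd.partitions.length)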
Hi, Tao,
When I used newAPIHadoopRDD (Accumulo not HBase) I found that I had to
specify executor-memory and num-executors explicitly on the command line or
else I didn't get any parallelism across the cluster.
I used --executor-memory 3G --num-executors 24 but obviously other
parameters wi
Can you look at your HBase UI to check whether your job is just reading from a
single region server?
Best,
--
Nan Zhu
On Monday, September 29, 2014 at 10:21 PM, Tao Xiao wrote:
> I submitted a job in Yarn-Client mode, which simply reads from an HBase table
> containing tens of milli
Are the regions for this table evenly spread across nodes in your cluster ?
Were region servers under (heavy) load when your job ran ?
Cheers
On Mon, Sep 29, 2014 at 7:21 PM, Tao Xiao wrote:
> I submitted a job in Yarn-Client mode, which simply reads from an HBase
> table containing t
I submitted the job in Yarn-Client mode using the following script:
export SPARK_JAR=/usr/games/spark/xt/spark-assembly_2.10-0.9.0-cdh5.0.1-hadoop2.3.0-cdh5.0.1.jar
export HADOOP_CLASSPATH=$(hbase classpath)
export CLASSPATH=$CLASSPATH:/usr/games/spark/xt/SparkDemo-0.0.1-SNAPSHOT.jar:/usr/games
I submitted a job in Yarn-Client mode, which simply reads from an HBase
table containing tens of millions of records and then does a *count* action.
The job runs for a much longer time than I expected, so I wonder whether it
is because there is too much data to read. Actually, there are 20 nodes in
/examples/pythonconverters/HBaseConverters.scala
>
> Cheers
>
> On Wed, Sep 24, 2014 at 9:39 AM, Madabhattula Rajesh Kumar <
> mrajaf...@gmail.com> wrote:
>
>> Hi Team,
>>
>> Could you please point me to an example program for Spark HBase that reads
>> columns and values?
>>
>> Regards,
>> Rajesh
>>
>
>
Cheers
On Wed, Sep 24, 2014 at 9:39 AM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:
> Hi Team,
>
> Could you please point me to an example program for Spark HBase that reads
> columns and values?
>
> Regards,
> Rajesh
>
Hi Team,
Could you please point me to an example program for Spark HBase that reads
columns and values?
Regards,
Rajesh
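For reference, a minimal Scala sketch of reading columns and values, assuming a SparkContext sc and hypothetical table and column names:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

val hconf = HBaseConfiguration.create()
hconf.set(TableInputFormat.INPUT_TABLE, "table1") // hypothetical table name
val hBaseRDD = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
// Extract (rowKey, value) for a hypothetical column cf:col from each row.
val values = hBaseRDD.map { case (key, result) =>
  (Bytes.toString(key.get()),
   Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))))
}
values.take(10).foreach(println)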
(For that time, my program did
not include the HBase export task.)
BTW, I use Spark 1.0.0.
Thank you.
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, September 22, 2014 6:26 PM
To: innowireless TaeYun Kim
Cc: user
Subject: Re: Bulk-load to HBase
On Mon, S
On Mon, Sep 22, 2014 at 10:21 AM, innowireless TaeYun Kim
wrote:
> I have to merge the byte[]s that have the same key.
> If merging is done with reduceByKey(), a lot of intermediate byte[]
> allocations and System.arraycopy() calls are executed, and it is too slow. So I had
> to resort to groupByKey(),
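For reference, a minimal sketch of the groupByKey approach described above, merging with a single output allocation per key; pairs is a hypothetical RDD[(String, Array[Byte])]:

val merged = pairs.groupByKey().mapValues { chunks =>
  // One allocation for the final merged array, then copy each chunk in.
  val out = new Array[Byte](chunks.map(_.length).sum)
  var pos = 0
  for (c <- chunks) {
    System.arraycopy(c, 0, out, pos, c.length)
    pos += c.length
  }
  out
}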
And in fact it actually worked well when I implemented the same process with
the HBase Put class.
So, I assume that it is not the problem.
WithIndex is for excluding the record for the first partition.
I could remove the record after collect() and sort(), but it was easier.
I think that the problem is
, the number of regions is fairly small for the RDD, and the size
> of a region is big.
> This is intentional since the reasonable size of an HBase region is several GB.
> But, for Spark, it is too big for a partition to be handled by an
> executor.
> I thought mapPartitionsWith
first record of each partition.
// The first partition's first record is excluded, since it's not needed.
.collect();
Collections.sort(splitKeys);
// Now we have the split keys
// Create an HBase table
createHBaseTableWithSplitKeys(tableName, splitKey
new HTable(conf, "output")
HFileOutputFormat.configureIncrementalLoad(job, table);
saveAsNewAPIHadoopFile("hdfs://localhost.localdomain:8020/user/cloudera/spark",
classOf[ImmutableBytesWritable], classOf[Put], classOf[HFileOutputF
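For reference, a hedged end-to-end sketch of the bulk-load path these fragments come from. Note that HFileOutputFormat's record writer emits KeyValue cells sorted by row key, so the sketch assumes an RDD[(ImmutableBytesWritable, KeyValue)] named kvRDD rather than Put values:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat
import org.apache.hadoop.mapreduce.Job

val hconf = HBaseConfiguration.create()
val table = new HTable(hconf, "output")
val job = new Job(hconf)
// Configures the total-order partitioner from the table's region boundaries.
HFileOutputFormat.configureIncrementalLoad(job, table)
// kvRDD: an assumed RDD[(ImmutableBytesWritable, KeyValue)], sorted by row key.
kvRDD.saveAsNewAPIHadoopFile(
  "hdfs://localhost.localdomain:8020/user/cloudera/spark", // staging dir, as above
  classOf[ImmutableBytesWritable], classOf[KeyValue],
  classOf[HFileOutputFormat], job.getConfiguration)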
lot of data to be uploaded to HBase and also, I didn't want to
> take the pain of importing generated HFiles into HBase. Is there a way to
> invoke HBase HFile import batch script programmatically?
>
> On 19 September 2014 17:58, innowireless TaeYun Kim <
> taeyun@innowirel
Agreed that the bulk import would be faster. In my case, I wasn't expecting
a lot of data to be uploaded to HBase, and I didn't want to take the
pain of importing generated HFiles into HBase. Is there a way to invoke the
HBase HFile import batch script programmatically?
On 19 Septemb
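On invoking the import step programmatically: the completebulkload tool is backed by a public class, so a hedged sketch (staging path and table name are assumptions) is:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles

val hconf = HBaseConfiguration.create()
val loader = new LoadIncrementalHFiles(hconf)
// Moves the generated HFiles into the regions of the target table.
loader.doBulkLoad(new Path("/user/cloudera/spark"), new HTable(hconf, "output"))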
: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Friday, September 19, 2014 9:20 PM
To: user@spark.apache.org
Subject: RE: Bulk-load to HBase
Thank you for the example code.
Currently I use foreachPartition() + Put(), but your example code can be used
to clean up my code.
BTW, since the data uploaded by Put() goes through the normal HBase write path, it
can be slow.
So, it would be nice if bulk-load could be used, since it
t found saveAsNewAPIHadoopDataset.
>
> Then, can I use HFileOutputFormat with saveAsNewAPIHadoopDataset? Is there
> any example code for that?
>
>
>
> Thanks.
>
>
>
> *From:* innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
> *Sent:* Friday, September 19,
@spark.apache.org
Subject: RE: Bulk-load to HBase
Hi,
After reading several documents, it seems that saveAsHadoopDataset cannot
use HFileOutputFormat.
That's because the saveAsHadoopDataset method uses JobConf, so it belongs to the
old Hadoop API, while HFileOutputFormat is a member of mapr
Am I right?
If so, is there another method to bulk-load to HBase from RDD?
Thanks.
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Friday, September 19, 2014 7:17 PM
To: user@spark.apache.org
Subject: Bulk-load to HBase
Hi,
Is there a way to bulk-load to
Hi,
Is there a way to bulk-load to HBase from RDD?
HBase offers HFileOutputFormat class for bulk loading by MapReduce job, but
I cannot figure out how to use it with saveAsHadoopDataset.
Thanks.
On 14.09.2014 19:21, Reinis Vicups wrote:
>> I did actually try Seans suggestion just before I posted for the first time
>> in this thread. I got an error when doing this and thought that I am not
>> understanding what Sean was suggesting.
>>
>> Now I re-attempted yo
m not understanding what Sean was suggesting.
Now I re-attempted your suggestions with spark 1.0.0-cdh5.1.0, hbase
0.98.1-cdh5.1.0 and hadoop 2.3.0-cdh5.1.0 I am currently using.
I used following:
val mortbayEnforce = "org.mortbay.jetty" % "servlet-api" %
"3.0.20
Yes that was very helpful… ☺
Here are a few more I found on my quest to get HBase working with Spark –
This one details HBase dependencies and Spark classpaths:
http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html
This one has a code overview –
http://www.abcn.net/2014
Btw, there are some examples in the Spark GitHub repo that you may find
helpful. Here's one
<https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala>
related to HBase.
On Tue, Sep 16, 2014 at 1:22 PM, wrote:
> *Hi,*
Hi,
I had a similar situation in which I needed to read data from HBase and work
with the data inside of a Spark context. After much googling, I finally got
mine to work. There are a bunch of steps that you need to do to get this working -
The problem is that the Spark context does not know
The hbase-client module serves client-facing APIs.
The hbase-server module is supposed to host classes used on the server side.
There is still some work to be done so that the above goal is achieved.
On Tue, Sep 16, 2014 at 9:06 AM, Y. Dong wrote:
> Thanks Ted. It is indeed in hbase-server. Just curi
bq. TableInputFormat does not even exist in hbase-client API
It is in hbase-server module.
Take a look at http://hbase.apache.org/book.html#mapreduce.example.read
On Tue, Sep 16, 2014 at 8:18 AM, Y. Dong wrote:
> Hello,
>
> I’m currently using spark-core 1.1 and hbase 0.98.5 and
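For reference, a hedged sbt fragment matching the versions mentioned in this thread; as noted above, hbase-server is the module that hosts TableInputFormat:

// build.sbt -- hbase-server hosts the MapReduce support classes.
libraryDependencies ++= Seq(
  "org.apache.hbase" % "hbase-client" % "0.98.5-hadoop2",
  "org.apache.hbase" % "hbase-common" % "0.98.5-hadoop2",
  "org.apache.hbase" % "hbase-server" % "0.98.5-hadoop2"
)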
Hello,
I’m currently using spark-core 1.1 and hbase 0.98.5 and I want to simply read
from hbase. The Java code is attached. However, the problem is that TableInputFormat
does not even exist in the hbase-client API. Is there any other way I can read from
hbase? Thanks
SparkConf sconf = new SparkConf
I did actually try Seans suggestion just before I posted for the first
time in this thread. I got an error when doing this and thought that I
am not understanding what Sean was suggesting.
Now I re-attempted your suggestions with spark 1.0.0-cdh5.1.0, hbase
0.98.1-cdh5.1.0 and hadoop 2.3.0
sparkConf = new SparkConf().setAppName("HBaseTest")
> | val sc = new SparkContext(sparkConf)
> | val conf = HBaseConfiguration.create()
> | // Other options for configuring scan behavior are available. More
> information available at
> | //
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html
| conf.set(TableInputFormat.INPUT_TABLE, args(0))
| // Initialize hBase table if necessary
| val admin = new HBaseAdmin(conf)
| if (!admin.isTableAvailable(args(0))) {
| val tableDesc =
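| // The transcript is cut off above; the following is a hedged
| // reconstruction of how the linked HBaseTest.scala example continues:
| val tableDesc = new HTableDescriptor(args(0))
| admin.createTable(tableDesc)
| }
| val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
|   classOf[ImmutableBytesWritable], classOf[Result])
| println(hBaseRDD.count())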
building-with-maven.md
>> |+++ docs/building-with-maven.md
>> --
>> File to patch:
>>
>>
>>
>>
>>
>>
>> Please advise.
>> Regards
>> Arthur
>>
>>
>>
>> On 14 Sep, 2014, at 10:48 p
;> |--- docs/building-with-maven.md
>> |+++ docs/building-with-maven.md
>> --
>> File to patch:
>>
>>
>>
>>
>>
>>
>> Please advise.
>> Regards
>> Arthur
>>
>>
>>
>> On
patch:
>
>
>
>
>
>
> Please advise.
> Regards
> Arthur
>
>
>
> On 14 Sep, 2014, at 10:48 pm, Ted Yu wrote:
>
>> Spark examples build against hbase 0.94 by default.
>>
>> If you want to run against 0.98, see:
>> SPARK-1297
>
> Please advise.
> Regards
> Arthur
>
>
>
> On 14 Sep, 2014, at 10:48 pm, Ted Yu wrote:
>
> Spark examples build against hbase 0.94 by default.
>
> If you want to run against 0.98, see:
> SPARK-1297 https://issues.apache.org/jira/browse/SPARK-1297
>
+To build against HBase 0.98.x releases, "hbase-hadoop1" is the default
profile. This means hbase-0.98.x-hadoop1 would be used.
+When building against hadoop-2, the "hbase-hadoop2" profile should be specified.
+
Examples:

{% highlight bash %}
diff --git examples/pom.xm
Spark examples build against hbase 0.94 by default.
If you want to run against 0.98, see:
SPARK-1297 https://issues.apache.org/jira/browse/SPARK-1297
Cheers
On Sun, Sep 14, 2014 at 7:36 AM, arthur.hk.c...@gmail.com <
arthur.hk.c...@gmail.com> wrote:
> Hi,
>
> I have
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
:31: error: object hbase is not a member of package org.apache.hadoop
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
Regards
Arthur
some other jar, you would
have to exclude it from your build.
Hope it helps.
Thanks,
Aniket
On 12 September 2014 02:21, wrote:
> Thank you, Aniket for your hint!
>
> Alas, I am facing a really "hellish" situation, as it seems, because I have
> integration tests using BOTH
This was already answered at the bottom of this same thread -- read below.
On Thu, Sep 11, 2014 at 9:51 PM, wrote:
> class "javax.servlet.ServletRegistration"'s signer information does not
> match signer information of other classes in the same package
> java.lang.SecurityException: class "javax
Thank you, Aniket, for your hint!
Alas, I am facing a really "hellish" situation, as it seems, because I have
integration tests using BOTH Spark and HBase (Minicluster). Thus I get either:
class "javax.servlet.ServletRegistration"'s signer information does not match
si
Dependency hell... My fav problem :).
I had run into a similar issue with hbase and jetty. I can't remember the
exact fix, but here are excerpts from my dependencies that may be relevant:
val hadoop2Common = "org.apache.hadoop" % "hadoop-common" % hadoo
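The excerpt is cut off; a hedged sketch of the exclusion approach it describes (version variables and the exact module list are assumptions):

// Keep a single servlet-api on the classpath by excluding the copies
// that hadoop/hbase drag in transitively.
val hadoop2Common = "org.apache.hadoop" % "hadoop-common" % hadoopVersion excludeAll(
  ExclusionRule(organization = "javax.servlet"),
  ExclusionRule(organization = "org.mortbay.jetty")
)
val hbaseServer = "org.apache.hbase" % "hbase-server" % hbaseVersion excludeAll(
  ExclusionRule(organization = "org.mortbay.jetty")
)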
Hi guys,
any luck with this issue, anyone?
I as well tried all the possible exclusion combos, to no avail.
Thanks for your ideas,
reinis
-Original Message-
> From: "Stephen Boesch"
> To: user
> Date: 28-06-2014 15:12
> Subject: Re: HBase 0.96+ with Spar
kka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>
>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>
>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(Fo
sk(ForkJoinPool.java:1339)
>>
>> at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>
>> at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>>
>> Basically, what I am doing on
Hi,
I am using the following code to write data to hbase. I see the jobs are
sent off, but I never get anything in my hbase database. Spark doesn't throw
any error. How can such a problem be debugged? Is the code below correct for
writing data to hbase?
val conf = HBaseConfiguration.c
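The code above is cut off; for reference, a hedged sketch of this write path. A common cause of "no errors but no rows" is an unflushed client-side write buffer, which closing the table takes care of (table and column names are assumptions):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

// rdd is a hypothetical RDD[(String, String)] of (rowKey, value) pairs.
rdd.foreachPartition { rows =>
  val hconf = HBaseConfiguration.create()
  val table = new HTable(hconf, "output")
  rows.foreach { case (rowKey, value) =>
    val put = new Put(Bytes.toBytes(rowKey))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
    table.put(put)
  }
  table.close() // flushes any buffered Puts
}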
run nc -lk, I type in
> words separated by commas and hit enter i.e. "bill,ted".
>
>
> On Wed, Sep 3, 2014 at 2:36 PM, Ted Yu wrote:
>
>> Adding back user@
>>
>> I am not familiar with the NotSerializableException. Can you show the
>> full stack trace?
>>
>>
hit enter i.e. "bill,ted".
On Wed, Sep 3, 2014 at 2:36 PM, Ted Yu wrote:
> Adding back user@
>
> I am not familiar with the NotSerializableException. Can you show the
> full stack trace?
>
> See SPARK-1297 for changes you need to make so that Spark works with
>
approach I should be using?
Thanks for the help.
On Wed, Sep 3, 2014 at 2:43 PM, Sean Owen-2 [via Apache Spark User List] <
ml-node+s1001560n13385...@n3.nabble.com> wrote:
> This doesn't seem to have to do with HBase per se. Some function is
> getting the StreamingContext into the
This doesn't seem to have to do with HBase per se. Some function is
getting the StreamingContext into the closure and that won't work. Is
this exactly the code? since it doesn't reference a StreamingContext,
but is there maybe a different version in reality that tries to use
S
Adding back user@
I am not familiar with the NotSerializableException. Can you show the full
stack trace?
See SPARK-1297 for changes you need to make so that Spark works with hbase
0.98
Cheers
On Wed, Sep 3, 2014 at 2:33 PM, Kevin Peng wrote:
> Ted,
>
> The hbase-site.xml
I have been trying to understand how spark streaming and hbase connect, but
have not been successful. What I am trying to do is given a spark stream,
process that stream and store the results in an hbase table. So far this is
what I have:
import org.apache.spark.SparkConf
import
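For reference, a hedged sketch of the pattern the replies in this thread converge on: do the HBase writes inside foreachRDD/foreachPartition so the StreamingContext itself is never captured in a closure (table and column names are assumptions):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

// stream is a hypothetical DStream[String] of words from the socket.
stream.foreachRDD { rdd =>
  rdd.foreachPartition { words =>
    // Connection objects are created here, on the worker, not in the driver.
    val hconf = HBaseConfiguration.create()
    val table = new HTable(hconf, "stream_out")
    words.foreach { word =>
      val put = new Put(Bytes.toBytes(word))
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("seen"), Bytes.toBytes("1"))
      table.put(put)
    }
    table.close()
  }
}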
>> Hi,
>>
>> tried
>> mvn -Phbase-hadoop2,hadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests
>> dependency:tree > dep.txt
>>
>> Attached the dep.txt for your information.
>>
>>
>> Regards
>> Arthur
>>
>> On 28 Aug, 2014
"0.98.2" is not an HBase version, but "0.98.2-hadoop2" is:
http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.hbase%22%20AND%20a%3A%22hbase%22
On Thu, Aug 28, 2014 at 2:54 AM, arthur.hk.c...@gmail.com <
arthur.hk.c...@gmail.com> wrote:
> Hi,
>
>
> dep.txt
>
> Attached the dep.txt for your information.
>
>
> Regards
> Arthur
>
> On 28 Aug, 2014, at 12:22 pm, Ted Yu wrote:
>
>> I forgot to include '-Dhadoop.version=2.4.1' in the command below.
>>
>> The modified command passe
Attached the dep.txt for your information.
>
>
> Regards
> Arthur
>
> On 28 Aug, 2014, at 12:22 pm, Ted Yu wrote:
>
> I forgot to include '-Dhadoop.version=2.4.1' in the command below.
>
> The modified command passed.
>
> You can verify the dependence
rsion=2.4.1' in the command below.
The modified command passed.
You can verify the dependence on hbase 0.98 through this command:
mvn -Phbase-hadoop2,hadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests dependency:tree > dep.txt
Cheers
On Wed, Aug 27, 2014 at 8:58 PM, Ted Yu <yuzhih...@gmail.com>
I forgot to include '-Dhadoop.version=2.4.1' in the command below.
The modified command passed.
You can verify the dependence on hbase 0.98 through this command:
mvn -Phbase-hadoop2,hadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests
dependency:tree > dep.txt
Cheers
On Wed, Aug
e pom.xml.rej
>> patching file pom.xml
>> Hunk #1 FAILED at 54.
>> Hunk #2 FAILED at 72.
>> Hunk #3 FAILED at 171.
>> 3 out of 3 hunks FAILED -- saving rejects to file pom.xml.rej
>> can't find file to patch at input line 267
>> Perhaps you should hav
s was:
> --
> |
> |From cd58437897bf02b644c2171404ccffae5d12a2be Mon Sep 17 00:00:00 2001
> |From: tedyu
> |Date: Mon, 11 Aug 2014 15:57:46 -0700
> |Subject: [PATCH 3/4] SPARK-1297 Upgrade HBase dependency to 0.98 - add
> | description to building-with-
aving rejects to file pom.xml.rej
> can't find file to patch at input line 267
> Perhaps you should have used the -p or --strip option?
> The text leading up to this was:
> --
> |
> |From cd58437897bf02b644c2171404ccffae5d12a2be Mon Sep 17 00:00:00 2001
>
m: tedyu
|Date: Mon, 11 Aug 2014 15:57:46 -0700
|Subject: [PATCH 3/4] SPARK-1297 Upgrade HBase dependency to 0.98 - add
| description to building-with-maven.md
|
|---
| docs/building-with-maven.md | 3 +++
| 1 file changed, 3 insertions(+)
|
|diff --git a/docs/building-with-maven.md b/docs/build
:
> https://github.com/apache/spark/pull/1893
>
>
> On Wed, Aug 27, 2014 at 6:57 PM, arthur.hk.c...@gmail.com <
> arthur.hk.c...@gmail.com> wrote:
>
>> (correction: "Compilation Error: Spark 1.0.2 with HBase 0.98", please
>> ignore if duplicated)
>>
e/spark/pull/1893
>
>
> On Wed, Aug 27, 2014 at 6:57 PM, arthur.hk.c...@gmail.com
> wrote:
> (correction: "Compilation Error: Spark 1.0.2 with HBase 0.98", please
> ignore if duplicated)
>
>
> Hi,
>
> I need to use Spark with HBase 0.98 and tried
See SPARK-1297
The pull request is here:
https://github.com/apache/spark/pull/1893
On Wed, Aug 27, 2014 at 6:57 PM, arthur.hk.c...@gmail.com <
arthur.hk.c...@gmail.com> wrote:
> (correction: "Compilation Error: Spark 1.0.2 with HBase 0.98", please
> ignore if duplicated)
>
(correction: "Compilation Error: Spark 1.0.2 with HBase 0.98", please ignore
if duplicated)
Hi,
I need to use Spark with HBase 0.98 and tried to compile Spark 1.0.2 with HBase
0.98,
My steps:
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz
tar -vxf spark-1.0.2.tgz
cd
Hi,
I need to use Spark with HBase 0.98 and tried to compile Spark 1.0.2 with HBase
0.98,
My steps:
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz
tar -vxf spark-1.0.2.tgz
cd spark-1.0.2
edit project/SparkBuild.scala, set HBASE_VERSION
// HBase version; set as appropriate.
val
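The edit is cut off here; given the note elsewhere in this thread that "0.98.2" is not a published HBase artifact but "0.98.2-hadoop2" is, the line presumably ends along the lines of:

val HBASE_VERSION = "0.98.2-hadoop2" // the -hadoop2 qualifier is required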
It looks like the issue I had is that I didn't pull the htrace-core jar into
the Spark classpath.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Issue-Connecting-to-HBase-in-spark-shell-tp12855p12924.html
Sent from the Apache Spark User List mailing
>> But, do you know how I am supposed to set that table name on the jobConf?
>> I don't have access to that object from my client driver.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.
that object from my client driver?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-when-connecting-from-Spark-to-a-Hive-table-backed-by-HBase-tp12284p12331.html
> Sent from the Apache Spark User L
-from-Spark-to-a-Hive-table-backed-by-HBase-tp12284p12331.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Looks like hbaseTableName is null, probably caused by incorrect configuration.
String hbaseTableName = jobConf.get(HBaseSerDe.HBASE_TABLE_NAME);
setHTable(new HTable(HBaseConfiguration.create(jobConf),
Bytes.toBytes(hbaseTableName)));
Here is the definition.
public static final Strin
014 at 12:00 AM, Akhil Das
>> wrote:
>>
>>> Looks like your hiveContext is null. Have a look at this documentation.
>>> <https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables>
>>>
>>> Thanks
>>> Best Regards
>>
egards
>>
>>
>> On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo <
>> ce...@zephyrhealthinc.com> wrote:
>>
>>> Hello:
>>>
>>> I am trying to setup Spark to connect to a Hive table which is backed by
>>> HBase, but I am running
PM, Cesar Arevalo wrote:
>
>> Hello:
>>
>> I am trying to setup Spark to connect to a Hive table which is backed by
>> HBase, but I am running into the following NullPointerException:
>>
>> scala> val hiveCount = hiveContext.sql("select count(*) f
to a Hive table which is backed by
> HBase, but I am running into the following NullPointerException:
>
> scala> val hiveCount = hiveContext.sql("select count(*) from
> dataset_records").collect().head.getLong(0)
> 14/08/18 06:34:29 INFO ParseDriver: Parsing command: select
Hello:
I am trying to setup Spark to connect to a Hive table which is backed by
HBase, but I am running into the following NullPointerException:
scala> val hiveCount = hiveContext.sql("select count(*) from
dataset_records").collect().head.getLong(0)
14/08/18 06:34:29 INFO ParseDr
ad of
>> TableInputFormat.class.
>>
>> Cheers
>>
>>
>> On Wed, Aug 6, 2014 at 5:54 AM, Amit Singh Hora <[hidden email]> wrote:
>>
>>> Hi All,
>>>
>>>
erything
worked.
Best Regards,
Jiajia
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Input-from-Kafka-output-to-HBase-tp8686p11732.html
Sent from the Apache Spark User List mailing list archiv
per? did
> Kafka and HBase share the same zookeeper and port? If not, did u set the
> right config for the HBase job? -- Khanderao
>
>
> On Wed, Jul 2, 2014 at 4:12 PM, JiajiaJing wrote:
>
>> Hi,
>>
>> I am trying to write a program that take input from kafka topics, do
es classOf[TableInputFormat] instead of
> TableInputFormat.class.
>
> Cheers
>
>
> On Wed, Aug 6, 2014 at 5:54 AM, Amit Singh Hora
> wrote:
>
>> Hi All,
>>
>> I am trying to run a SQL query on HBase using spark job ,till now i am
>> able
>> to get t
These two posts should be good for setting up a Spark+HBase environment and using
the results of an HBase table scan as an RDD
settings
http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html
some samples:
http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html
--
View
(the version is quite old)
Attached is a piece of Code (Spark Java API) to connect to HBase.
Thanks
Best Regards
On Thu, Aug 7, 2014 at 1:48 PM, Deepa Jayaveer
wrote:
> Hi
> I read your white paper about " ". We wanted to do a Proof of Concept on
> Spark with HBase.
I hope this has been resolved. Were you connected to the right zookeeper? Did
Kafka and HBase share the same zookeeper and port? If not, did you set the
right config for the HBase job? -- Khanderao
On Wed, Jul 2, 2014 at 4:12 PM, JiajiaJing wrote:
> Hi,
>
> I am trying to write a program that t
Hi
I read your white paper about " ". We wanted to do a Proof of Concept on
Spark with HBase. Not many documents
are available on setting up a Spark cluster in a Hadoop 2
environment. If you have any,
can you please give us some reference URLs?
Also, some sample program to connect to H
Hi All,
I am trying to run a SQL query on HBase using a Spark job. Till now I am able
to get the desired results, but as the data set size increases the Spark job is
taking a long time.
I believe I am doing something wrong, as after going through documentation
and videos discussing Spark performance
n.Seq;
>
> import scala.collection.JavaConverters.*;
>
> import scala.reflect.ClassTag;
>
> public class SparkHBaseMain {
>
> @SuppressWarnings("deprecation")
>
> public static void main(String[] arg){
>
> try{
>
> List jars =
>>
ot;,
"/home/akhld/Downloads/sparkhbasecode/hbase-server-0.96.0-hadoop2.jar",
"/home/akhld/Downloads/sparkhbasecode/hbase-protocol-0.96.0-hadoop2.jar",
"/home/akhld/Downloads/sparkhbasecode/hbase-hadoop2-compat-0.96.0-hadoop2.jar",
"/home/akhld/Downloads/sparkhbas
;post".getBytes(),
>> "title".getBytes());*
>> for(KeyValue kl:kvl){
>> String sb = new String(kl.getValue());
>> System.out.println(sb);
>> }
>
>
>
>
> Thanks
> Best Regards
>
>
> On Thu, Jul 31, 2014 at 10:19 PM, Madabhattula
tle".getBytes());*
> for(KeyValue kl:kvl){
> String sb = new String(kl.getValue());
> System.out.println(sb);
> }
Thanks
Best Regards
On Thu, Jul 31, 2014 at 10:19 PM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:
> Hi Team,
>
> I
Hi Team,
I'm using below code to read table from hbase
Configuration conf = HBaseConfiguration.create();
conf.set(TableInputFormat.INPUT_TABLE, "table1");
JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = sc.newAPIHadoopRDD(
conf,
TableInputFormat.class,
ImmutableBytesWritable.class,
Hi Team,
Could you please let me know an example program/link for Java Spark SQL to join
2 HBase tables.
Regards,
Rajesh
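For reference, a hedged sketch of such a join in Scala (a Java version follows the same shape), using the 1.1-era Spark SQL API; schemas, table names, and column names are assumptions:

import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SQLContext

// Hypothetical row shapes for the two tables.
case class User(id: String, name: String)
case class Order(userId: String, amount: String)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD

// usersRaw and ordersRaw are (ImmutableBytesWritable, Result) RDDs read
// with newAPIHadoopRDD, one per HBase table, as shown earlier in this thread.
val users = usersRaw.map { case (k, r) =>
  User(Bytes.toString(k.get()),
       Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))))
}
val orders = ordersRaw.map { case (k, r) =>
  Order(Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("userId"))),
        Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("amount"))))
}
users.registerTempTable("users")
orders.registerTempTable("orders")
val joined = sqlContext.sql(
  "SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.userId")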
Hi,
I want to use Spark with HBase and I'm confused about how to ingest my data
using HBase's HFileOutputFormat. It recommends calling
configureIncrementalLoad, which does the following:
- Inspects the table to configure a total order partitioner
- Uploads the partitions file to t
Hi Rajesh,
I saw: Warning: Local jar /home/rajesh/hbase-0.96.1.1-hadoop2/lib/hbase
-client-0.96.1.1-hadoop2.jar, does not exist, skipping.
in your log.
I believe this jar contains the HBaseConfiguration class. I'm not sure what went
wrong in your case, but can you try without spaces in --jars
Hi Team,
Now I've changed my code and read the configuration from the hbase-site.xml
file (this file is on the classpath). When I run this program using: mvn
exec:java
-Dexec.mainClass="com.cisco.ana.accessavailability.AccessAvailability", it
works fine. But when I run this program fr