…icated transactional logs (as SQL statements) out of the database for all updates and store them in HBase; then one can go through the HBase data with Spark.
In the past this was prohibitive using database audit, as it had a heavy price on RDBMS performance. However, with Big Data this can be done through th
On 18 October 2016 at 08:18, Jörn Franke wrote:
Careful: HBase with Phoenix is only faster in certain scenarios, when it is about processing small amounts out of a bigger amount of data (it depends on node memory, the operation, etc.). Hive+Tez+ORC can be rather competitive, and LLAP makes sense for interactive ad-hoc queries that are rather similar
()) AS timestamp)
FROM ${DATABASE}.externalMarketData
That works fine. However, HBase is much faster for data retrieval with Phoenix.
When we get Hive with LLAP, I gather Hive will replace HBase.
So in summary we have
1. raw data delivered to HDFS
2. data ingested into Hbase via cron
3. HDFS
I do not see a rationale to have HBase in this scheme of things. Maybe I am missing something?
If data is delivered in HDFS, why not just add partition to an existing
Hive table?
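For illustration, a minimal sketch of that alternative (database, table and path names are all hypothetical), assuming the Hive table is partitioned by date:

// Register a newly landed HDFS directory as a partition of an existing
// date-partitioned Hive table; every name below is made up.
spark.sql(
  """ALTER TABLE marketdata.trades
    |ADD IF NOT EXISTS PARTITION (trade_date = '2016-10-18')
    |LOCATION 'hdfs:///data/landing/trades/2016-10-18'""".stripMargin)

Once the partition is added, the new files are queryable without any HBase hop in between.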
On Tue, Oct 18, 2016 at 8:23 AM, Mich Talebzadeh
wrote:
> Thanks Mike,
>
> My test csv data comes as
It's a quasi-columnar store.
Sort of a hybrid approach.
On Oct 17, 2016, at 4:30 PM, Mich Talebzadeh wrote:
I assume that HBase is more of a columnar data store by virtue of it storing column data together. There are many interpretations of this all over the place. However, it is not columnar in the sense of a column-based (as opposed to row-based) implementation of the relational model.
Dr Mich Talebzadeh
86.31917515824627016510
5f4e3a9d-05cc-41a2-98b3-40810685641e, S03, 2016-10-17T22:02:09,
95.48298277703729129559
And this is my HBase table with one column family:
create 'marketDataHbase', 'price_info'
It is populated every 15 minutes from test.csv files delivered via Kafka and Flume to HDFS.
Remember Big Data isn't relational; it's more of a hierarchical model or record model. Think IMS or Pick (Dick Pick's Revelation, U2, Universe, etc.).
On Oct 17, 2016, at 3:45 PM, Jörn Franke wrote:
It has some implications because it imposes the SQL model on HBase. Internally it translates the SQL queries into custom HBase coprocessors. Keep in mind also that HBase needs a proper key design, and consider how Phoenix designs those keys to get the best performance out of it. I think for OLTP it is a
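To make that key-design point concrete, a hedged sketch of a Phoenix DDL with a composite primary key (the ZooKeeper host, table and column names are assumptions): Phoenix concatenates the primary-key columns, in order, into the underlying HBase rowkey, so leading with the ticker avoids hotspotting on a monotonically increasing date.

import java.sql.DriverManager

// Composite primary key: TICKER leads the rowkey, TRADE_DATE follows.
// Connection string and all names here are hypothetical.
val conn = DriverManager.getConnection("jdbc:phoenix:zookeeper-host:2181")
conn.createStatement().execute(
  """CREATE TABLE IF NOT EXISTS MARKET_DATA (
    |  TICKER     VARCHAR NOT NULL,
    |  TRADE_DATE DATE    NOT NULL,
    |  PRICE      DECIMAL
    |  CONSTRAINT PK PRIMARY KEY (TICKER, TRADE_DATE)
    |)""".stripMargin)
conn.close()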
Skip Phoenix
On Oct 17, 2016, at 2:20 PM, Thakrar, Jayesh wrote:
Ben,
Also look at Phoenix (Apache project) which provides a better (one of the best)
SQL/JDBC layer on top of HBase.
http://phoenix.apache.org/
Cheers,
Jayesh
From: vincent gromakowski
@Mitch
You don’t have a schema in HBase other than the table name and the list of
associated column families.
So you can’t really infer a schema easily…
…UUID in the process.
You would be better off not using HBase and storing the data in Parquet files in a directory partitioned on date. Or rather, the rowkey would be max_ts - TS so that your data is in LIFO order.
Note: I've used the term epoch to describe the max value of a long (8 bytes of 'FF'
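A small sketch of that reversed-timestamp rowkey (the names are hypothetical): zero-padding keeps lexicographic rowkey order aligned with numeric order, so the newest rows sort first.

import org.apache.hadoop.hbase.util.Bytes

// Newest-first (LIFO) rowkey: subtract the event time from Long.MaxValue
// and zero-pad so string order matches numeric order.
def lifoRowKey(id: String, tsMillis: Long): Array[Byte] =
  Bytes.toBytes(f"$id%s-${Long.MaxValue - tsMillis}%019d")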
This will give me an opportunity to start using Structured Streaming. Then, I
can try adding more functionality. If all goes well, then we could transition
off of HBase to a more in-memory data solution that can “spill-over” data for
us.
Ben,
Also look at Phoenix (Apache project) which provides a better (one of the best) SQL/JDBC layer on top of HBase.
http://phoenix.apache.org/
I am afraid this does not work with Spark 2!
Dr Mich Talebzadeh
How about this method of creating DataFrames on HBase tables directly?
I define an RDD for each column in the column family as below, in this case column trade_info:ticker.
//create rdd
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
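Completing that sketch under the usual TableInputFormat key/value types (the table and column names follow the marketDataHbase example above; the DataFrame step assumes spark-shell, where the implicits are in scope):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

// Read the table, pull trade_info:ticker out of each Result, and turn
// the (rowkey, ticker) pairs into a DataFrame.
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "marketDataHbase")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
val tickers = hBaseRDD.map { case (key, result) =>
  (Bytes.toString(key.get()),
   Bytes.toString(result.getValue(Bytes.toBytes("trade_info"), Bytes.toBytes("ticker"))))
}
val df = tickers.toDF("rowkey", "ticker")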
e instantiable in any spark job.
Guys,
Sorry for jumping in late to the game…
If memory serves (which may not be a good thing…):
You can use HiveServer2 as a connection point to HBase. While this doesn't perform well, it's probably the cleanest solution.
I'm not keen on Phoenix… wouldn't recommend it…
The issue is that you're trying to make HBase, a key/value object store, a relational…
Hi,
I have trade data stored in an HBase table. Data arrives in csv format to HDFS and is then loaded into HBase via a periodic load with org.apache.hadoop.hbase.mapreduce.ImportTsv.
The HBase table has one column family, "trade_info", and three columns: ticker, timecreated, price.
The RowKey i
Thanks for all the suggestions. It would seem you guys are right about the
Tableau side of things. The reports don’t need to be real-time, and they won’t
be directly feeding off of the main DMP HBase data. Instead, it’ll be batched
to Parquet or Kudu/Impala or even PostgreSQL.
I originally
Cloudera 5.8 has a very old version of Hive without Tez, but Mich already provided a good alternative. However, you should check whether it contains a recent version of HBase and Phoenix. That being said, I just wonder what the dataflow, data model and the planned analysis are. Maybe there are
Mich,
Unfortunately, we are moving away from Hive and unifying on Spark using CDH 5.8 as our distro. And Tableau released a Spark ODBC/JDBC driver too. I will either try the Phoenix JDBC Server for HBase or push to move faster to Kudu with Impala. We will use Impala as the JDBC in-between
I wouldn't be too surprised if Spark SQL - JDBC data source - Phoenix JDBC server - HBASE would work better.
Without naming specifics, there are at least 4 or 5 different implementations of HBASE sources, each at a varying level of development and with different requirements (HBASE release ve
Mich,
Are you talking about the Phoenix JDBC Server? If so, I forgot about that
alternative.
Thanks,
Ben
Yes. I tried that with the hbase-spark package, but it didn’t work. We were
hoping it would. If it did, we would be using it for everything from Ad Servers
to REST Endpoints and even Reporting Servers. I guess we will have to wait
until they fix it.
I don't think it will work
you can use phoenix on top of hbase
hbase(main):336:0> scan 'tsco', 'LIMIT' => 1
ROW              COLUMN+CELL
 TSCO-1-Apr-08   column=stock_daily:Date, timestamp=1475866783376, value
Great, then I think those packages as a Spark data source should allow you to do exactly that (replace org.apache.spark.sql.jdbc with an HBASE one).
I do think it would be great to get more examples around this, though. It would be great if you could share your experience with this.
Felix,
My goal is to use the Spark SQL JDBC Thriftserver to access HBase tables using just SQL. I have been able to CREATE tables using the statement below in the past:
CREATE TABLE
USING org.apache.spark.sql.jdbc
OPTIONS (
url
"jdbc:postgresql://:/dm?user=&password=&qu
Ben,
I'm not sure I'm following completely.
Is your goal to use Spark to create or access tables in HBASE? If so, the link below and several packages out there support that by having an HBASE data source for Spark. There are some examples of how the Spark code looks in that link a
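Returning to the truncated CREATE TABLE statement quoted above, its full shape with every elided value replaced by an obviously made-up placeholder might look like the following (issued here through a SparkSession rather than beeline):

// All connection details below are placeholders, not the originals.
spark.sql(
  """CREATE TABLE dm_wrapper
    |USING org.apache.spark.sql.jdbc
    |OPTIONS (
    |  url "jdbc:postgresql://dbhost:5432/dm?user=dmuser&password=secret",
    |  dbtable "public.some_table"
    |)""".stripMargin)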
cannot CREATE a wrapper
table on top of a HBase table in Spark SQL?
What do you think? Is this the right approach?
Thanks,
Ben
HBase has released support for Spark
hbase.apache.org/book.html#spark
And if you search you should find several alternative approaches.
On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" wrote:
Does anyone know if Spark can work with HBase tables using Spark SQL? I know in
Hive we are able to create tables on top of an underlying HBase table that can
be accessed using MapReduce jobs. Can the same be done using HiveContext or
SQLContext? We are trying to setup a way to GET and POST
Lately, I've been experimenting with Kudu. It has been a much better experience than with HBase. Using it is much simpler, even from spark-shell.
spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.0.0
It's like going back to rudimentary DB systems where tables have just a primary key and the columns. Additional benefits include a home-grown…
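A hedged sketch of what reading a table back looks like with that kudu-spark package (the master address and table name are made up):

import org.apache.kudu.spark.kudu._

// Load a Kudu table as a DataFrame via the kudu-spark data source.
val df = sqlContext.read
  .options(Map("kudu.master" -> "kudu-master:7051", "kudu.table" -> "trades"))
  .format("org.apache.kudu.spark.kudu")
  .load()
df.show(5)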
Hi,
Looks like you are saving to new.csv but still loading tsco.csv? It's definitely the header.
Suggestion: ticker+date as the row key has the following benefits:
1. Using ticker+date as the row key will enable you to hold multiple tickers in this single HBase table (think composite primary key).
2. Using date itself as the row key will lead to hotspots (look up hotspotting due to monotonically increasing keys).
nded to the end
of each line
Then I run the following command
$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
-Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW_KEY,
stock_daily:open, stock_daily:high, stock_daily:low, stock_daily:close,
stock_daily:volum
Thanks Ayan,
How do you specify ticker+tradedate as the row key in the below?
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=','
-Dimporttsv.columns="HBASE_ROW_KEY, stock_daily:ticker, stock_daily:tradedate,
stock_daily:open,stock_daily:high,stock_daily:low,st
Hi Mitch,
It is more to do with HBase than Spark. The row key can be anything, but essentially what you are doing is inserting and updating the Tesco PLC row. Given your schema, ticker+tradedate seems to be a good row key.
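Since ImportTsv simply takes whichever field sits in the HBASE_ROW_KEY position, one way to get such a composite key is to prepend it to each line before the load; a sketch only, with hypothetical paths:

// Prepend "TICKER-DATE" as a new first field; ImportTsv can then map
// column 0 to HBASE_ROW_KEY. Paths and the column layout are assumptions.
val raw = sc.textFile("hdfs:///data/prices/tsco.csv")
val keyed = raw.map { line =>
  val cols = line.split(",")   // ticker, tradedate, open, high, low, close, volume
  cols(0) + "-" + cols(1) + "," + line
}
keyed.saveAsTextFile("hdfs:///data/prices/tsco_keyed")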
On 3 Oct 2016 18:25, "Mich Talebzadeh" wrote:
Thanks again.
I added that jar file to the classpath and that part worked. I was using spark-shell, so I have to use spark-submit for it to be able to interact with the map-reduce job.
BTW, when I use the command-line utility ImportTsv to load a file into HBase with the following table format
…in the file /etc/spark/conf/classpath.txt. So, we entered the path for the htrace jar into the /etc/spark/conf/classpath.txt file. Then, it worked. We could read/write to HBase.
Thanks Ben.
The thing is, I am using Spark 2 and no stack from CDH!
Is this approach to reading/writing to HBase specific to Cloudera?
Dr Mich Talebzadeh
So far no issues.
Then I do
val conf = HBaseConfiguration.create()
conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hbase-default.xml, hbase-site.xml
val tableName = "testTable"
tableName: String = testTable
But this one fails:
scala> val table = new HTable(conf, tableName)
java.io.IOException: java.lang.reflect.InvocationTargetException
On 22 September 2016 at 17:34, Mich Talebzadeh
wrote:
Hi,
I have been seeing errors at the OS level when running sqoop import or hbase to get data into Hive and HBase respectively.
The gist of the error is in the last line.
2016-09-22 10:49:39,472 [myid:] - INFO [main:Job@1356] - Job job_1474535924802_0003 completed successfully
2016-09-22 10:49
Thanks Das and Ayan.
Do you have any references on how to create a connection pool for HBase inside foreachPartition, as mentioned in the guide? In my case, I have to use a kerberised HBase cluster.
On Wed, Sep 21, 2016 at 6:39 PM, Tathagata Das
wrote:
http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
On Wed, Sep 21, 2016 at 4:26 PM, ayan guha wrote:
The Connection object is not serialisable. You need to implement a getOrCreate function which would run on each executor to create the HBase connection locally.
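A sketch of that pattern, assuming a DStream of (String, String) pairs (the dstream, table and column names are hypothetical). A lazily initialised per-JVM singleton, the getOrCreate the guide describes, would amortise the connection cost across batches; the per-partition version below is the minimal form:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

// Create the non-serialisable connection on the executor, once per
// partition, instead of shipping it from the driver.
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("events"))
    records.foreach { case (k, v) =>
      val put = new Put(Bytes.toBytes(k))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes(v))
      table.put(put)
    }
    table.close()
    conn.close()
  }
}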
On 22 Sep 2016 08:34, "KhajaAsmath Mohammed"
wrote:
Hello Everyone,
I am running a Spark application to push data from Kafka. I am able to get an HBase kerberos connection successfully outside of the function, before calling foreachRDD on the DStream.
The job fails inside foreachRDD, stating that the HBase connection object is not serialized. Could you please let me know
Hi,
Without using Spark there are a couple of options. You can refer to the link:
http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/.
The gist is that you convert the data into HFiles and use the bulk upload
option to get the data quickly into HBase.
HTH
Kabeer.
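A compressed sketch of the HFile route that blog post describes (the paths, table and column names are made up; a real job would also call HFileOutputFormat2.configureIncrementalLoad to match region boundaries):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{HFileOutputFormat2, LoadIncrementalHFiles}
import org.apache.hadoop.hbase.util.Bytes

val conf = HBaseConfiguration.create()
// Build sorted (rowkey, KeyValue) pairs -- HFiles demand rowkey order.
val kvs = sc.textFile("hdfs:///data/huge.csv").map { line =>
  val Array(k, v) = line.split(",", 2)
  (new ImmutableBytesWritable(Bytes.toBytes(k)),
   new KeyValue(Bytes.toBytes(k), Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(v)))
}.sortByKey()
kvs.saveAsNewAPIHadoopFile("hdfs:///tmp/hfiles",
  classOf[ImmutableBytesWritable], classOf[KeyValue],
  classOf[HFileOutputFormat2], conf)
// Hand the finished HFiles to HBase in one metadata-only move.
val conn = ConnectionFactory.createConnection(conf)
val tn = TableName.valueOf("big_table")
new LoadIncrementalHFiles(conf).doBulkLoad(new Path("hdfs:///tmp/hfiles"),
  conn.getAdmin, conn.getTable(tn), conn.getRegionLocator(tn))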
Hi Guys
I have a huge dataset (~ 1TB) which has about a billion records. I have to
transfer it to an HBase table. What is the fastest way of doing it?
--
Thank You
Regards
Punit Naik
There is a HSpark project, https://github.com/yzhou2001/HSpark, providing
native and fast access to HBase.
Currently it only supports Spark 1.4, but any suggestions and contributions
are more than welcome.
Try it out to find its speedups!
On Sat, Sep 3, 2016 at 12:57 PM, Mich Talebzadeh
wrote
Mine is HBase 0.98.
Dr Mich Talebzadeh
I’m using Spark 1.6 and HBase 1.2. Have you got it to work using these versions?
> On Sep 3, 2016, at 12:49 PM, Mich Talebzadeh
> wrote:
>
> I am trying to find a solution for this
>
> ERROR log: error in initSerDe: java.lang.ClassNotFoundException
Mich,
I’m in the same boat. We can use Hive but not Spark.
Cheers,
Ben
On Sep 2, 2016, at 3:37 PM, Mich Talebzadeh wrote:
Hi,
You can create Hive external tables on top of an existing HBase table using the property
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
Example:
hive> show create table hbase_table;
OK
CREATE TABLE `hbase_table`(
`key` int COMMENT '',
`value1` string
You can either read HBase into an RDD and then turn it into a DataFrame, or expose HBase tables through Hive and read from Hive, or use Phoenix.
On 3 Sep 2016 08:08, "KhajaAsmath Mohammed" wrote:
Hi Kim,
I am also looking for the same information. Just got the same requirement today.
Thanks,
Asmath
On Fri, Sep 2, 2016 at 4:46 PM, Benjamin Kim wrote:
I was wondering if anyone has tried to create Spark SQL tables on top of HBase
tables so that data in HBase can be accessed using Spark Thriftserver with SQL
statements? This is similar what can be done using Hive.
Thanks,
Ben
I want to read and write data from HBase using PySpark. I am getting the error below; please help.
My code:
from pyspark import SparkContext, SQLContext
sc = SparkContext()
sqlContext = SQLContext(sc)
sparkconf = {
"hbase.zookeeper.quorum": "localhost",
"hbase.map
The PR will be reviewed soon.
Thanks,
Weiqing
From: Sachin Jain
Date: Sunday, August 28, 2016 at 11:12 PM
To: spats
Cc: user@spark.apache.org
Subject: Re: Issues with Spark On Hbase Connector and
Have you looked at spark-packages.org? There are several different HBase connectors there; not sure if any meet your need or not.
https://spark-packages.org/?q=hbase
HTH,
-Todd
On Tue, Aug 30, 2016 at 5:23 AM, ayan guha wrote:
You can use the RDD-level new Hadoop format API and pass in the appropriate classes.
On 30 Aug 2016 19:13, "Mich Talebzadeh" wrote:
Hi,
Is there an existing interface to read from and write to an HBase table in Spark, similar to the below for Parquet?
val s = spark.read.parquet("oraclehadoop.sales2")
s.write.mode("overwrite").parquet("oraclehadoop.sales4")
Or do I need to write to a Hive table which is alread
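There is no built-in spark.read/write for HBase; a hedged sketch of the RDD-level route mentioned above, for the write side (the table and column names are invented), uses TableOutputFormat and saveAsNewAPIHadoopDataset:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

// Point TableOutputFormat at the target table, then save an RDD of Puts.
val conf = HBaseConfiguration.create()
conf.set(TableOutputFormat.OUTPUT_TABLE, "sales_hbase")
val job = Job.getInstance(conf)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
val puts = sc.parallelize(Seq(("row1", "v1"), ("row2", "v2"))).map { case (k, v) =>
  val put = new Put(Bytes.toBytes(k))
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("val"), Bytes.toBytes(v))
  (new ImmutableBytesWritable(Bytes.toBytes(k)), put)
}
puts.saveAsNewAPIHadoopDataset(job.getConfiguration)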
There is a connection leak problem with the Hortonworks HBase connector if you use HBase 1.2.0.
I tried to use Hortonworks' connector and fell into the same problem. Have a look at HBase issue HBASE-16017 [0]. The fix for this was backported to 1.3.0, 1.4.0 and 2.0.0.
I have raised a tick
Regarding the HBase connector by Hortonworks, https://github.com/hortonworks-spark/shc, it would be great if someone could answer these:
1. What versions of HBase & Spark are expected? I could not run the examples provided using Spark 1.6.0 & HBase 1.2.0.
2. I get an error when I run the example provided her
Thanks Jacek!
I have already set hbase.security.authentication property set to kerberos,
since Hbase with kerberos is working fine.
I tested again after correcting the typo but got same error. Following is
the code, Please have a look:
System.setProperty("java.security.krb5.conf",
Hi Aneela,
My (little to no) understanding of how to make it work is to use the hbase.security.authentication property set to kerberos (see [1]).
Spark on YARN uses it to get the tokens for Hive, HBase et al. (see [2]). It happens when the Client starts a conversation with the YARN RM (see [3]). You should not
Thanks for your response Jacek!
Here is the code for how Spark accesses HBase:
System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
System.setProperty("java.security.auth.login.config",
"/etc/hbase/conf/zk-jaas.conf");
val hconf = HBaseConfiguration.create()
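On a kerberised cluster the piece usually needed next is an explicit UGI login before any HBase call; a sketch reusing the principal and keytab quoted elsewhere in this thread:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.security.UserGroupInformation

// Log in from the keytab before creating the HBase connection.
val hconf = HBaseConfiguration.create()
hconf.set("hbase.security.authentication", "kerberos")
hconf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(hconf)
UserGroupInformation.loginUserFromKeytab(
  "spark/hadoop-master@platalyticsrealm", "/etc/hadoop/conf/spark.keytab")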
Hi,
How do you access HBase? What's the version of Spark?
(I don't see spark packages in the stack trace)
Regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowsk
Hi there,
I have lots of raw data in several Hive tables, where we built a workflow to "join" those records together and restructure them into HBase. It was done using plain MapReduce to generate HFiles, and then incremental loads from HFiles into HBase to guarantee the best performance.
H
Hi all,
I'm trying to run a spark job that accesses HBase with security enabled.
When i run the following command:
*/usr/local/spark-2/bin/spark-submit --keytab /etc/hadoop/conf/spark.keytab
--principal spark/hadoop-master@platalyticsrealm --class
com.platalytics.example.spark.App --master
Hi Ben,
This seems more like a question for community.cloudera.com. However, it would
be in hbase not spark I believe.
https://repository.cloudera.com/artifactory/webapp/#/artifacts/browse/tree/General/cloudera-release-repo/org/apache/hbase/hbase-spark
David Newberger
I would like to know if anyone has tried using the hbase-spark module? I tried
to follow the examples in conjunction with CDH 5.8.0. I cannot find the
HBaseTableCatalog class in the module or in any of the Spark jars. Can someone
help?
Thanks,
Ben
Solved, see:
http://stackoverflow.com/questions/38470114/how-to-connect-hbase-and-spark-using-python/38575095
It is included in Cloudera’s CDH 5.8.
Hbase Spark module will be available with Hbase 2.0. Is that out yet?
So it appears it should be possible to use HBase's new hbase-spark module, if
you follow this pattern:
https://hbase.apache.org/book.html#_sparksql_dataframes
Unfortunately, when I run my example from PySpark, I get the following
exception:
py4j.protocol.Py4JJavaError: An error occurr
I'd like to know whether there's any way to query HBase with Spark SQL via
the PySpark interface. See my question on SO:
http://stackoverflow.com/questions/38470114/how-to-connect-hbase-and-spark-using-python
The new HBase-Spark module in HBase, which introduces the
HBaseContext/JavaHB
Hi, I am doing a bulk load into HBase as HFileFormat, using saveAsNewAPIHadoopFile.
When I try to write, I am getting an exception:
java.io.IOException: Added a key not lexically larger than previous.
The following is the code snippet:
case class HBaseRow(rowKey: ImmutableBytesWritable, kv: KeyValue
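That exception is what the HFile writer throws when cells arrive out of rowkey order, so the usual fix is a global sort on the key before writing. A sketch, assuming rdd holds the (ImmutableBytesWritable, KeyValue) pairs from such a job and outputPath/conf are its existing values:

import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2

// HFiles require strictly increasing keys, so sort before writing.
val sorted = rdd.sortByKey()
sorted.saveAsNewAPIHadoopFile(outputPath,
  classOf[ImmutableBytesWritable], classOf[KeyValue],
  classOf[HFileOutputFormat2], conf)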
Hi Ram, Thanks very much it worked.
Puneet
From: ram kumar
Sent: Thursday, July 07, 2016 6:51 PM
To: Puneet Tripathi
Cc: user@spark.apache.org
Subject: Re: Spark with HBase Error - Py4JJavaError
Hi Puneet,
Have you tried appending
--jars $SPARK_HOME/lib/spark