Financial fraud detection using streaming RDBMS data into Spark & Hbase

2016-12-15 Thread Mich Talebzadeh
icated transactional logs (as SQL statements) out of the database for all updates and store them in Hbase; then one can go through the Hbase data with Spark. In the past this was prohibitive using database audit as it had a heavy price on RDBMS performance. However, with Big Data this can be done through th

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-18 Thread Mich Talebzadeh
On 18 October 2016 at 08:18, Jörn Franke wrote: > Careful Hbase with Phoenix is only in certain scenarios faster. When it is > about processing small amounts out of a bigger amount of data (depends on > node memory, the operation etc). Hive+tez+orc can be rather competitive, > llap ma

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-18 Thread Jörn Franke
Careful: Hbase with Phoenix is faster only in certain scenarios, e.g. when processing small amounts out of a bigger amount of data (depends on node memory, the operation etc). Hive+tez+orc can be rather competitive; llap makes sense for interactive ad-hoc queries that are rather similar

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-17 Thread Mich Talebzadeh
()) AS timestamp) FROM ${DATABASE}.externalMarketData That works fine. However, Hbase is much faster for data retrieval with Phoenix. When we get Hive with LLAP, I gather Hive will replace Hbase. So in summary we have 1. raw data delivered to HDFS 2. data ingested into Hbase via cron 3. HDFS

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-17 Thread ayan guha
I do not see a rationale to have hbase in this scheme of things; maybe I am missing something? If data is delivered in HDFS, why not just add a partition to an existing Hive table? On Tue, Oct 18, 2016 at 8:23 AM, Mich Talebzadeh wrote: > Thanks Mike, > > My test csv data comes as

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Michael Segel
It's a quasi-columnar store. Sort of a hybrid approach. On Oct 17, 2016, at 4:30 PM, Mich Talebzadeh wrote: I assume that Hbase is more of a columnar data store by virtue of it storing column data together. Many interpretations of this are all over

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Mich Talebzadeh
I assume that Hbase is more of a columnar data store by virtue of it storing column data together. Many interpretations of this are all over the place. However, it is not columnar in the sense of a column-based (as opposed to row-based) implementation of the relational model. Dr Mich Talebzadeh LinkedIn

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-17 Thread Mich Talebzadeh
, 86.31917515824627016510 5f4e3a9d-05cc-41a2-98b3-40810685641e, S03, 2016-10-17T22:02:09, 95.48298277703729129559 And this is my Hbase table with one column family create 'marketDataHbase', 'price_info' It is populated every 15 minutes from test.csv files delivered via Kafka and Flume to HDFS

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Jörn Franke
ber Big Data isn’t relational; it’s more of a hierarchy model or record > model. Think IMS or Pick (Dick Pick’s revelation, U2, Universe, etc …) > > >> On Oct 17, 2016, at 3:45 PM, Jörn Franke wrote: >> >> It has some implication because it imposes the SQL model on Hbas

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Michael Segel
e: It has some implication because it imposes the SQL model on Hbase. Internally it translates the SQL queries into custom Hbase processors. Keep also in mind that Hbase needs a proper key design, and how Phoenix designs those keys to get the best performance out of it. I think for oltp i

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Michael Segel
Skip Phoenix On Oct 17, 2016, at 2:20 PM, Thakrar, Jayesh wrote: Ben, Also look at Phoenix (Apache project) which provides a better (one of the best) SQL/JDBC layer on top of HBase. http://phoenix.apache.org/ Cheers, Jayesh From: vincent groma

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Michael Segel
@Mitch You don’t have a schema in HBase other than the table name and the list of associated column families. So you can’t really infer a schema easily… On Oct 17, 2016, at 2:17 PM, Mich Talebzadeh wrote: How about this method of creating Data Fra

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Michael Segel
Sorry for jumping in late to the game… If memory serves (which may not be a good thing…): You can use HiveServer2 as a connection point to HBase. While this doesn’t perform well, it’s probably the cleanest solution. I’m not keen on Phoenix… wouldn’t recommend it…. The issue is that you’re trying

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-17 Thread Michael Segel
UUID in the process. You would be better off not using HBase and storing the data in Parquet files in a directory partitioned on date. Or rather the rowkey would be the max_ts - TS so that your data is in LIFO order. Note: I’ve used the term epoch to describe the max value of a long (8 bytes of ‘FF
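Michael's reverse-timestamp rowkey trick can be sketched in pure Python. The helper name below is illustrative, not from the thread, and it uses Java's `Long.MAX_VALUE` for the "max value of a long" he mentions:

```python
# HBase sorts rows lexicographically by rowkey, so storing a zero-padded
# (max_long - timestamp) makes the newest row sort first (LIFO order).

MAX_LONG = 2**63 - 1  # Java Long.MAX_VALUE, the "max_ts" in the email

def reverse_ts_rowkey(ts_millis: int) -> str:
    """Build a rowkey that sorts newest-first (hypothetical helper)."""
    return f"{MAX_LONG - ts_millis:020d}"

# A newer event yields a lexicographically *smaller* key:
older = reverse_ts_rowkey(1_476_741_729_000)  # 2016-10-17
newer = reverse_ts_rowkey(1_476_828_129_000)  # 2016-10-18
assert newer < older  # newest row is scanned first
```

Because all keys are padded to the same width, lexicographic order matches numeric order, so a plain scan returns rows newest-first without any server-side sorting.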

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Jörn Franke
It has some implication because it imposes the SQL model on Hbase. Internally it translates the SQL queries into custom Hbase processors. Keep also in mind that Hbase needs a proper key design, and how Phoenix designs those keys to get the best performance out of it. I think for oltp it is a

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Benjamin Kim
This will give me an opportunity to start using Structured Streaming. Then, I can try adding more functionality. If all goes well, then we could transition off of HBase to a more in-memory data solution that can “spill-over” data for us. > On Oct 17, 2016, at 11:53 AM, vincent gromakow

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread ayan guha
…) : > > You can use HiveServer2 as a connection point to HBase. > While this doesn’t perform well, it’s probably the cleanest solution. > I’m not keen on Phoenix… wouldn’t recommend it…. > > > The issue is that you’re trying to make HBase, a key/value object store, a > Relat

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Mich Talebzadeh
Ben, *Also look at Phoenix (Apache project) which provides a better (one of the best) SQL/JDBC layer on top of HBase.* *http://phoenix.apache.org/* I am afraid this does not work with Spark 2!

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Thakrar, Jayesh
Ben, Also look at Phoenix (Apache project) which provides a better (one of the best) SQL/JDBC layer on top of HBase. http://phoenix.apache.org/ Cheers, Jayesh From: vincent gromakowski Date: Monday, October 17, 2016 at 1:53 PM To: Benjamin Kim Cc: Michael Segel , Jörn Franke , Mich

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Mich Talebzadeh
How about this method of creating Data Frames on Hbase tables directly. I define an RDD for each column in the column family as below. In this case column trade_info:ticker //create rdd val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf
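The per-column mapping in Mich's Scala snippet can be illustrated with a pure-Python stand-in (the real code uses `TableInputFormat` and `Result.getValue`; the names and sample data below are made up for illustration):

```python
# Pure-Python stand-in for the pattern Mich describes: each HBase Result
# is reduced to (rowkey, value-of-one-column), and the per-column
# collections are later joined on rowkey into DataFrame-shaped rows.
# This mimics the client API; it is not the actual HBase client.

def column_of(result: dict, family: str, qualifier: str):
    """Extract one cell, like Result.getValue(family, qualifier)."""
    return result["cells"].get((family, qualifier))

scan_results = [
    {"row": "r1", "cells": {("trade_info", "ticker"): "TSCO",
                            ("trade_info", "price"): "95.48"}},
    {"row": "r2", "cells": {("trade_info", "ticker"): "MKS",
                            ("trade_info", "price"): "86.31"}},
]

# One "RDD" (here: a list) per column, each keyed by rowkey:
tickers = [(r["row"], column_of(r, "trade_info", "ticker")) for r in scan_results]
prices  = [(r["row"], column_of(r, "trade_info", "price"))  for r in scan_results]

# Join on rowkey to get DataFrame-shaped rows:
rows = [(k, dict(tickers)[k], dict(prices)[k]) for k, _ in tickers]
assert rows[0] == ("r1", "TSCO", "95.48")
```

In the real Spark job each list above is an RDD built with `sc.newAPIHadoopRDD`, and the join/`toDF` happens on the cluster rather than in local memory.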

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread vincent gromakowski
e instantiable in any spark job. > > 2016-10-17 18:17 GMT+02:00 Michael Segel : > >> Guys, >> Sorry for jumping in late to the game… >> >> If memory serves (which may not be a good thing…) : >> >> You can use HiveServer2 as a connection point to HBase

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Benjamin Kim
el Segel: > Guys, > Sorry for jumping in late to the game… > > If memory serves (which may not be a good thing…) : > > You can use HiveServer2 as a connection point to HBase. > While this doesn’t perform well, it’s probably the cle

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread vincent gromakowski
d thing…) : > > You can use HiveServer2 as a connection point to HBase. > While this doesn’t perform well, it’s probably the cleanest solution. > I’m not keen on Phoenix… wouldn’t recommend it…. > > > The issue is that you’re trying to make HBase, a key/value object store, a

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Michael Segel
Guys, Sorry for jumping in late to the game… If memory serves (which may not be a good thing…): You can use HiveServer2 as a connection point to HBase. While this doesn’t perform well, it’s probably the cleanest solution. I’m not keen on Phoenix… wouldn’t recommend it…. The issue is that

Accessing Hbase tables through Spark, this seems to work

2016-10-16 Thread Mich Talebzadeh
Hi, I have trade data stored in an Hbase table. Data arrives in csv format to HDFS and is then loaded into Hbase via a periodic load with org.apache.hadoop.hbase.mapreduce.ImportTsv. The Hbase table has one column family "trade_info" and three columns: ticker, timecreated, price. The RowKey i

Re: Spark SQL Thriftserver with HBase

2016-10-09 Thread Benjamin Kim
Thanks for all the suggestions. It would seem you guys are right about the Tableau side of things. The reports don’t need to be real-time, and they won’t be directly feeding off of the main DMP HBase data. Instead, it’ll be batched to Parquet or Kudu/Impala or even PostgreSQL. I originally

Re: Spark SQL Thriftserver with HBase

2016-10-09 Thread Jörn Franke
wrote: > Cloudera 5.8 has a very old version of Hive without Tez, but Mich provided > already a good alternative. However, you should check if it contains a > recent version of Hbase and Phoenix. That being said, I just wonder what is > the dataflow, data model and the analysis yo

Re: Spark SQL Thriftserver with HBase

2016-10-09 Thread Jörn Franke
Cloudera 5.8 has a very old version of Hive without Tez, but Mich provided already a good alternative. However, you should check if it contains a recent version of Hbase and Phoenix. That being said, I just wonder what is the dataflow, data model and the analysis you plan to do. Maybe there are

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
Mich, Unfortunately, we are moving away from Hive and unifying on Spark using CDH 5.8 as our distro. And Tableau released a Spark ODBC/JDBC driver too. I will either try the Phoenix JDBC Server for HBase or push to move faster to Kudu with Impala. We will use Impala as the JDBC in-between

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Mich Talebzadeh
On 8 October 2016 at 19:40, Felix Cheung > wrote: >> I wouldn't be too surprised Spark SQL - JDBC data source - Phoenix JDBC >> server - HBASE woul

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
On 8 October 2016 at 19:40, Felix Cheung wrote: > I wouldn't be too surprised Spark SQL - JDBC data source - Phoenix JDBC > server - HBASE would work better. > > Without n

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Mich Talebzadeh
- JDBC data source - Phoenix JDBC > server - HBASE would work better. > > Without naming specifics, there are at least 4 or 5 different > implementations of HBASE sources, each at varying level of development and > different requirements (HBASE rel

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
I wouldn't be too surprised Spark SQL - JDBC data source - Phoenix JDBC server - HBASE would work better. Without naming specifics, there are at least 4 or 5 different implementations of HBASE sources, each at varying level of development and different requirements (HBASE release ve

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
Mich, Are you talking about the Phoenix JDBC Server? If so, I forgot about that alternative. Thanks, Ben > On Oct 8, 2016, at 11:21 AM, Mich Talebzadeh > wrote: > > I don't think it will work > > you can use phoenix on top of hbase > > hbase(main):336:0>

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
Yes. I tried that with the hbase-spark package, but it didn’t work. We were hoping it would. If it did, we would be using it for everything from Ad Servers to REST Endpoints and even Reporting Servers. I guess we will have to wait until they fix it. > On Oct 8, 2016, at 11:05 AM, Felix Che

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Mich Talebzadeh
I don't think it will work you can use phoenix on top of hbase hbase(main):336:0> scan 'tsco', 'LIMIT' => 1 ROW COLUMN+CELL TSCO-1-Apr-08 column=stock_daily:Date, timestamp=1475866783376, value

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
Great, then I think those packages as Spark data source should allow you to do exactly that (replace org.apache.spark.sql.jdbc with HBASE one) I do think it will be great to get more examples around this though. Would be great if you could share your experience with this

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
Felix, My goal is to use Spark SQL JDBC Thriftserver to access HBase tables using just SQL. I have been able to CREATE tables using this statement below in the past: CREATE TABLE USING org.apache.spark.sql.jdbc OPTIONS ( url "jdbc:postgresql://:/dm?user=&password=&qu

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
Ben, I'm not sure I'm following completely. Is your goal to use Spark to create or access tables in HBASE? If so, the link below and several packages out there support that by having an HBASE data source for Spark. There are some examples of what the Spark code looks like in that link a

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
cannot CREATE a wrapper table on top of a HBase table in Spark SQL? What do you think? Is this the right approach? Thanks, Ben > On Oct 8, 2016, at 10:33 AM, Felix Cheung wrote: > > HBase has released support for Spark > hbase.apache.org/book.html#spark <http://hbase.apache.org/

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Felix Cheung
HBase has released support for Spark: hbase.apache.org/book.html#spark And if you search you should find several alternative approaches. On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" wrote: Doe

Spark SQL Thriftserver with HBase

2016-10-07 Thread Benjamin Kim
Does anyone know if Spark can work with HBase tables using Spark SQL? I know in Hive we are able to create tables on top of an underlying HBase table that can be accessed using MapReduce jobs. Can the same be done using HiveContext or SQLContext? We are trying to setup a way to GET and POST

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
n Kim" wrote: > >> Lately, I’ve been experimenting with Kudu. It has been a much better >> experience than with HBase. Using it is much simpler, even from spark-shell. >> >> spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.0.0 >> >> It’s like

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Benjamin Kim
’ve been experimenting with Kudu. It has been a much better > experience than with HBase. Using it is much simpler, even from spark-shell. > > spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.0.0 > > It’s like going back to rudimentary DB systems where tables have just a >

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
experience than with HBase. Using it is much simpler, even from spark-shell. > > spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.0.0 > > It’s like going back to rudimentary DB systems where tables have just a > primary key and the columns. Additional benefits include a home-grow

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Benjamin Kim
Lately, I’ve been experimenting with Kudu. It has been a much better experience than with HBase. Using it is much simpler, even from spark-shell. spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.0.0 It’s like going back to rudimentary DB systems where tables have just a primary key and

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
n: ticker+date as row key has following benefits: > > 1. using ticker+date as row key will enable you to hold multiple ticker in > this single hbase table. (Think composite primary key) > 2. Using date itself as row key will lead to hotspots (Look up hotspoting > due to monotonically inc

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
Hi Looks like you are saving to new.csv but still loading tsco.csv? It's definitely the header. Suggestion: ticker+date as row key has the following benefits: 1. using ticker+date as row key will enable you to hold multiple tickers in this single hbase table. (Think composite primary key) 2. Using
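Ayan's composite-rowkey suggestion can be sketched in a few lines. The helper name and sample tickers are illustrative, not from the thread:

```python
# ticker+date as a composite rowkey: keys prefixed by ticker spread
# writes across regions, whereas a monotonically increasing date-only
# key funnels every write into the same ("hot") region.

def composite_rowkey(ticker: str, trade_date: str) -> str:
    return f"{ticker}-{trade_date}"

k1 = composite_rowkey("TSCO", "2016-04-01")
k2 = composite_rowkey("TSCO", "2016-04-04")
k3 = composite_rowkey("SBRY", "2016-04-01")

# Multiple tickers coexist in one table, and rows for a given ticker
# sort contiguously by date -- enabling cheap per-ticker prefix scans:
assert sorted([k1, k2, k3]) == [k3, k1, k2]
```

Since HBase stores rows in lexicographic rowkey order, a scan with start/stop keys `"TSCO-"` to `"TSCO-~"` retrieves one ticker's full history without touching other tickers' rows.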

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
nded to the end of each line Then I run the following command $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW_KEY, stock_daily:open, stock_daily:high, stock_daily:low, stock_daily:close, stock_daily:volum

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
How do you specify ticker+rtrade as row key in the below > > hbase org.apache.hadoop.hbase.mapreduce.ImportTsv > -Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW_KEY, > stock_daily:ticker, stock_daily:tradedate, stock_daily:open,stock_daily: > high,stock_dai

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
Thanks Ayan, How do you specify ticker+rtrade as row key in the below hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW_KEY, stock_daily:ticker, stock_daily:tradedate, stock_daily:open,stock_daily:high,stock_daily:low,st

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
Hi Mitch It is more to do with hbase than spark. The row key can be anything, yes, but essentially what you are doing is inserting and updating the Tesco PLC row. Given your schema, ticker+trade date seems to be a good row key On 3 Oct 2016 18:25, "Mich Talebzadeh" wrote: > thanks again. >

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
thanks again. I added that jar file to the classpath and that part worked. I was using spark shell, so I had to use spark-submit for it to be able to interact with the map-reduce job. BTW when I use the command line utility ImportTsv to load a file into Hbase with the following table format

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-02 Thread Benjamin Kim
in the file /etc/spark/conf/classpath.txt. So, we entered the path for the htrace jar into the /etc/spark/conf/classpath.txt file. Then, it worked. We could read/write to HBase. > On Oct 2, 2016, at 12:52 AM, Mich Talebzadeh > wrote: > > Thanks Ben > > The thing is I am

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-02 Thread Mich Talebzadeh
Thanks Ben The thing is I am using Spark 2 and no stack from CDH! Is this approach to reading/writing to Hbase specific to Cloudera?

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-01 Thread Benjamin Kim
; > So far no issues. > > Then I do > > val conf = HBaseConfiguration.create() > conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, > core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, > yarn-site.xml, hbase-default.xml, hbase

Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-01 Thread Mich Talebzadeh
-default.xml, yarn-site.xml, hbase-default.xml, hbase-site.xml val tableName = "testTable" tableName: String = testTable But this one fails: scala> val table = new HTable(conf, tableName) java.io.IOException: java.lang.reflect.InvocationTargetE

Re: sqoop Imported and Hbase ImportTsv issue with Failed: No enum constant mapreduce.JobCounter.MB_MILLIS_MAPS

2016-09-22 Thread Mich Talebzadeh
On 22 September 2016 at 17:34, Mich Talebzadeh wrote: > Hi , > > I have been seeing errors at OS level when running sqoop import or hbase > ImportTsv to get data into Hive and HBase respectively. > > The gist of the error is at the last line. > > 2016-09-22 10:

sqoop Imported and Hbase ImportTsv issue with Failed: No enum constant mapreduce.JobCounter.MB_MILLIS_MAPS

2016-09-22 Thread Mich Talebzadeh
Hi , I have been seeing errors at OS level when running sqoop import or hbase ImportTsv to get data into Hive and HBase respectively. The gist of the error is at the last line. 2016-09-22 10:49:39,472 [myid:] - INFO [main:Job@1356] - Job job_1474535924802_0003 completed successfully 2016-09-22 10:49

Re: Hbase Connection not serializable in Spark -> foreachrdd

2016-09-22 Thread KhajaAsmath Mohammed
Thanks Das and Ayan. Do you have any references on how to create a connection pool for hbase inside foreachPartition as mentioned in the guide? In my case, I have to use a kerberos hbase cluster. On Wed, Sep 21, 2016 at 6:39 PM, Tathagata Das wrote: > http://spark.apache.org/docs/latest/stream

Re: Hbase Connection not serializable in Spark -> foreachrdd

2016-09-21 Thread Tathagata Das
http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd On Wed, Sep 21, 2016 at 4:26 PM, ayan guha wrote: > Connection object is not serialisable. You need to implement a getorcreate > function which would run on each executors to create

Re: Hbase Connection not serializable in Spark -> foreachrdd

2016-09-21 Thread ayan guha
The Connection object is not serialisable. You need to implement a getOrCreate function which would run on each executor to create the hbase connection locally. On 22 Sep 2016 08:34, "KhajaAsmath Mohammed" wrote: > Hello Everyone, > > I am running spark application to push data from
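The getOrCreate pattern ayan describes can be sketched in pure Python. `HBaseConnection` below is a stand-in class, not a real client:

```python
# Instead of shipping a connection from the driver (it is not
# serialisable), each executor lazily creates -- and caches -- its own
# connection inside foreachPartition.

class HBaseConnection:
    """Stand-in for a real HBase client connection."""
    def put(self, row):
        pass  # placeholder for a real write

_connection = None  # module-level cache: one per executor process

def get_or_create_connection() -> HBaseConnection:
    global _connection
    if _connection is None:
        _connection = HBaseConnection()  # expensive setup runs once
    return _connection

def write_partition(rows):
    # This function is what you would pass to rdd.foreachPartition():
    # it runs on the executor, so the connection is created locally.
    conn = get_or_create_connection()
    for row in rows:
        conn.put(row)

write_partition([1, 2])
write_partition([3])
# Every partition on this executor reuses the same cached connection:
assert get_or_create_connection() is get_or_create_connection()
```

The key point is that the connection is created inside the function executed on the workers, so nothing unserialisable ever crosses the driver/executor boundary.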

Hbase Connection not serializable in Spark -> foreachrdd

2016-09-21 Thread KhajaAsmath Mohammed
Hello Everyone, I am running a spark application to push data from kafka. I am able to get the hbase kerberos connection successfully outside of the function before calling foreachRDD on the DStream. The job fails inside foreachRDD stating that the hbase connection object is not serializable. Could you please let me know

Re: Spark to HBase Fast Bulk Upload

2016-09-19 Thread Kabeer Ahmed
Hi, Without using Spark there are a couple of options. You can refer to the link: http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/. The gist is that you convert the data into HFiles and use the bulk upload option to get the data quickly into HBase. HTH Kabeer. On
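One detail behind the HFile bulk-load route Kabeer links to: HFiles must contain cells in sorted rowkey order, so the preparation job has to sort (and partition by region boundaries) before writing. A minimal pure-Python sketch of that preparation step, with illustrative data:

```python
# HFiles written for completebulkload must hold cells sorted by rowkey;
# in a real Spark/MapReduce job this sort runs distributed, and the data
# is also partitioned to match the target table's region boundaries.

records = [
    ("TSCO-2016-04-04", "201.5"),
    ("SBRY-2016-04-01", "268.0"),
    ("TSCO-2016-04-01", "198.2"),
]

# Sort by rowkey -- the order the HFile writer requires:
hfile_ready = sorted(records, key=lambda kv: kv[0])

assert [k for k, _ in hfile_ready] == [
    "SBRY-2016-04-01", "TSCO-2016-04-01", "TSCO-2016-04-04"]
```

Once the sorted HFiles exist in HDFS, the bulk-load step simply moves them under the table's region directories, which is why this path is so much faster than issuing individual puts.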

Spark to HBase Fast Bulk Upload

2016-09-19 Thread Punit Naik
Hi Guys I have a huge dataset (~ 1TB) which has about a billion records. I have to transfer it to an HBase table. What is the fastest way of doing it? -- Thank You Regards Punit Naik

Re: Spark SQL Tables on top of HBase Tables

2016-09-05 Thread Yan Zhou
There is a HSpark project, https://github.com/yzhou2001/HSpark, providing native and fast access to HBase. Currently it only supports Spark 1.4, but any suggestions and contributions are more than welcome. Try it out to find its speedups! On Sat, Sep 3, 2016 at 12:57 PM, Mich Talebzadeh wrote

Re: Spark SQL Tables on top of HBase Tables

2016-09-03 Thread Mich Talebzadeh
Mine is Hbase-0.98.

Re: Spark SQL Tables on top of HBase Tables

2016-09-03 Thread Benjamin Kim
I’m using Spark 1.6 and HBase 1.2. Have you got it to work using these versions? > On Sep 3, 2016, at 12:49 PM, Mich Talebzadeh > wrote: > > I am trying to find a solution for this > > ERROR log: error in initSerDe: java.lang.ClassNotFo

Re: Spark SQL Tables on top of HBase Tables

2016-09-03 Thread Mich Talebzadeh
e Hive but not Spark. > > Cheers, > Ben > > On Sep 2, 2016, at 3:37 PM, Mich Talebzadeh > wrote: > > Hi, > > You can create Hive external tables on top of existing Hbase table using > the property > > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHa

Re: Spark SQL Tables on top of HBase Tables

2016-09-03 Thread Benjamin Kim
Mich, I’m in the same boat. We can use Hive but not Spark. Cheers, Ben > On Sep 2, 2016, at 3:37 PM, Mich Talebzadeh wrote: > > Hi, > > You can create Hive external tables on top of existing Hbase table using the > property > > STORED BY 'org.apache.hadoop.h

Re: Spark SQL Tables on top of HBase Tables

2016-09-02 Thread Mich Talebzadeh
Hi, You can create Hive external tables on top of existing Hbase table using the property STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' Example hive> show create table hbase_table; OK CREATE TABLE `hbase_table`( `key` int COMMENT '', `value1` strin
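A fuller (hedged) version of the external-table DDL Mich's snippet is cut off mid-way through; the table and column names echo the snippet, but the `hbase.columns.mapping` and `hbase.table.name` values are illustrative, not from the thread:

```sql
-- Sketch of a Hive external table over an existing HBase table.
-- Adjust hbase.columns.mapping to your actual family:qualifier layout.
CREATE EXTERNAL TABLE hbase_table (
  key    INT,
  value1 STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf1:value1"
)
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
```

With this in place, the HBase table becomes queryable through Hive (and anything that speaks to Hive), at the cost of scans going through the storage handler.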

Re: Spark SQL Tables on top of HBase Tables

2016-09-02 Thread ayan guha
You can either read hbase into an RDD and then turn it into a DF, or expose hbase tables via hive and read from hive, or use phoenix On 3 Sep 2016 08:08, "KhajaAsmath Mohammed" wrote: > Hi Kim, > > I am also looking for same information. Just got the same requirement > today

Re: Spark SQL Tables on top of HBase Tables

2016-09-02 Thread KhajaAsmath Mohammed
Hi Kim, I am also looking for same information. Just got the same requirement today. Thanks, Asmath On Fri, Sep 2, 2016 at 4:46 PM, Benjamin Kim wrote: > I was wondering if anyone has tried to create Spark SQL tables on top of > HBase tables so that data in HBase can be accessed using

Spark SQL Tables on top of HBase Tables

2016-09-02 Thread Benjamin Kim
I was wondering if anyone has tried to create Spark SQL tables on top of HBase tables so that data in HBase can be accessed using Spark Thriftserver with SQL statements? This is similar what can be done using Hive. Thanks, Ben

Fwd: Pyspark Hbase Problem

2016-08-31 Thread md mehrab
I want to read and write data from hbase using pyspark. I am getting the below error, please help. My code: from pyspark import SparkContext, SQLContext sc = SparkContext() sqlContext = SQLContext(sc) sparkconf = { "hbase.zookeeper.quorum": "localhost", "hbase.map

Re: Issues with Spark On Hbase Connector and versions

2016-08-30 Thread Weiqing Yang
The PR will be reviewed soon. Thanks, Weiqing From: Sachin Jain Date: Sunday, August 28, 2016 at 11:12 PM To: spats Cc: user Subject: Re: Issues with Spark On Hbase Connector and

Re: Writing to Hbase table from Spark

2016-08-30 Thread Todd Nist
Have you looked at spark-packages.org? There are several different HBase connectors there; not sure if any meet your need or not. https://spark-packages.org/?q=hbase HTH, -Todd On Tue, Aug 30, 2016 at 5:23 AM, ayan guha wrote: > You can use rdd level new hadoop format api and pass

Re: Writing to Hbase table from Spark

2016-08-30 Thread ayan guha
You can use the RDD-level new hadoop format API and pass in the appropriate classes. On 30 Aug 2016 19:13, "Mich Talebzadeh" wrote: > Hi, > > Is there an existing interface to read from and write to Hbase table in > Spark. > > Similar to below for Parquet &g

Writing to Hbase table from Spark

2016-08-30 Thread Mich Talebzadeh
Hi, Is there an existing interface to read from and write to an Hbase table in Spark? Similar to below for Parquet val s = spark.read.parquet("oraclehadoop.sales2") s.write.mode("overwrite").parquet("oraclehadoop.sales4") Or need to write a Hive table which is alread

Re: Issues with Spark On Hbase Connector and versions

2016-08-28 Thread Sachin Jain
There is a connection leak problem with the hortonworks hbase connector if you use hbase 1.2.0. I tried to use hortonworks' connector and fell into the same problem. Have a look at this Hbase issue HBASE-16017 [0]. The fix for this was backported to 1.3.0, 1.4.0 and 2.0.0 I have raised a tick

Issue with Spark HBase connector streamBulkGet method

2016-08-28 Thread BiksN
http://apache-spark-user-list.1001560.n3.nabble.com/Issue-with-Spark-HBase-connector-streamBulkGet-method-tp27613.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Issues with Spark On Hbase Connector and versions

2016-08-27 Thread spats
Regarding the hbase connector by hortonworks https://github.com/hortonworks-spark/shc, it would be great if someone can answer these: 1. What versions of Hbase & Spark are expected? I could not run the examples provided using spark 1.6.0 & hbase 1.2.0 2. I get an error when I run the example provided her

Re: Accessing HBase through Spark with Security enabled

2016-08-21 Thread Aneela Saleem
>> On 15 Aug 2016, at 08:29, Aneela Saleem wrote: >> >> Thanks Jacek! >> >> I have already set the hbase.security.authentication property to >> kerberos, since Hbase with kerberos is working fine. >> >> I tested again after correcting the typo b

Re: Accessing HBase through Spark with Security enabled

2016-08-16 Thread Aneela Saleem
security.authentication property set to > kerberos, since Hbase with kerberos is working fine. > > I tested again after correcting the typo but got same error. Following is > the code, Please have a look: > > System.setProperty("java.security.krb5.conf", "/etc/krb5.conf&

Re: Accessing HBase through Spark with Security enabled

2016-08-15 Thread Steve Loughran
On 15 Aug 2016, at 08:29, Aneela Saleem wrote: Thanks Jacek! I have already set the hbase.security.authentication property to kerberos, since Hbase with kerberos is working fine. I tested again after correcting the typo but got the same error. Following

Re: Accessing HBase through Spark with Security enabled

2016-08-15 Thread Aneela Saleem
Thanks Jacek! I have already set the hbase.security.authentication property to kerberos, since Hbase with kerberos is working fine. I tested again after correcting the typo but got the same error. Following is the code, please have a look: System.setProperty("java.security.krb5.conf",

Re: Accessing HBase through Spark with Security enabled

2016-08-13 Thread Jacek Laskowski
Hi Aneela, My (little to no) understanding of how to make it work is to use hbase.security.authentication property set to kerberos (see [1]). Spark on YARN uses it to get the tokens for Hive, HBase et al (see [2]). It happens when Client starts conversation to YARN RM (see [3]). You should not

Re: Accessing HBase through Spark with Security enabled

2016-08-12 Thread Aneela Saleem
Thanks for your response Jacek! Here is the code, how spark accesses HBase: System.setProperty("java.security.krb5.conf", "/etc/krb5.conf"); System.setProperty("java.security.auth.login.config", "/etc/hbase/conf/zk-jaas.conf"); val hconf = HBaseCon

Re: Accessing HBase through Spark with Security enabled

2016-08-12 Thread Jacek Laskowski
Hi, How do you access HBase? What's the version of Spark? (I don't see spark packages in the stack trace) Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowsk

PySpark read from HBase

2016-08-12 Thread Bin Wang
Hi there, I have lots of raw data in several Hive tables where we built a workflow to "join" those records together and restructured into HBase. It was done using plain MapReduce to generate HFile, and then load incremental from HFile into HBase to guarantee the best performance. H

Accessing HBase through Spark with Security enabled

2016-08-07 Thread Aneela Saleem
Hi all, I'm trying to run a spark job that accesses HBase with security enabled. When i run the following command: */usr/local/spark-2/bin/spark-submit --keytab /etc/hadoop/conf/spark.keytab --principal spark/hadoop-master@platalyticsrealm --class com.platalytics.example.spark.App --master

RE: HBase-Spark Module

2016-07-29 Thread David Newberger
Hi Ben, This seems more like a question for community.cloudera.com. However, it would be in hbase not spark I believe. https://repository.cloudera.com/artifactory/webapp/#/artifacts/browse/tree/General/cloudera-release-repo/org/apache/hbase/hbase-spark David Newberger -Original Message
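For anyone chasing the artifact David links to, the Maven coordinates look roughly like this (the version shown is an assumption based on the CDH 5.8 line discussed in the thread; check the Cloudera repository for the exact string):

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-spark</artifactId>
  <!-- version is a guess; use the one matching your CDH release -->
  <version>1.2.0-cdh5.8.0</version>
</dependency>
```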

HBase-Spark Module

2016-07-29 Thread Benjamin Kim
I would like to know if anyone has tried using the hbase-spark module? I tried to follow the examples in conjunction with CDH 5.8.0. I cannot find the HBaseTableCatalog class in the module or in any of the Spark jars. Can someone help? Thanks, Ben

Re: How to connect HBase and Spark using Python?

2016-07-25 Thread Def_Os
Solved, see: http://stackoverflow.com/questions/38470114/how-to-connect-hbase-and-spark-using-python/38575095 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-connect-HBase-and-Spark-using-Python-tp27372p27409.html Sent from the Apache Spark User List

Re: How to connect HBase and Spark using Python?

2016-07-22 Thread Benjamin Kim
It is included in Cloudera’s CDH 5.8. > On Jul 22, 2016, at 6:13 PM, Mail.com wrote: > > Hbase Spark module will be available with Hbase 2.0. Is that out yet? > >> On Jul 22, 2016, at 8:50 PM, Def_Os wrote: >> >> So it appears it should be possible to use HBas

Re: How to connect HBase and Spark using Python?

2016-07-22 Thread Mail.com
Hbase Spark module will be available with Hbase 2.0. Is that out yet? > On Jul 22, 2016, at 8:50 PM, Def_Os wrote: > > So it appears it should be possible to use HBase's new hbase-spark module, if > you follow this pattern: > https://hbase.apache.org/book.html#

Re: How to connect HBase and Spark using Python?

2016-07-22 Thread Def_Os
So it appears it should be possible to use HBase's new hbase-spark module, if you follow this pattern: https://hbase.apache.org/book.html#_sparksql_dataframes Unfortunately, when I run my example from PySpark, I get the following exception: > py4j.protocol.Py4JJavaError: An error occurr

How to connect HBase and Spark using Python?

2016-07-20 Thread Def_Os
I'd like to know whether there's any way to query HBase with Spark SQL via the PySpark interface. See my question on SO: http://stackoverflow.com/questions/38470114/how-to-connect-hbase-and-spark-using-python The new HBase-Spark module in HBase, which introduces the HBaseContext/JavaHB
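The hbase-spark DataFrame source referenced in the linked book section is driven by a JSON catalog that maps HBase column families and qualifiers to DataFrame columns. A sketch of building such a catalog in plain Python (the table and column names here are invented for illustration; the schema keys follow the Apache HBase book's SparkSQL/DataFrames section):

```python
import json

# Hypothetical mapping: HBase table "books" in the default namespace,
# column family "info", exposed to Spark SQL as (id, title, year).
catalog = {
    "table": {"namespace": "default", "name": "books"},
    "rowkey": "key",
    "columns": {
        "id":    {"cf": "rowkey", "col": "key",   "type": "string"},
        "title": {"cf": "info",   "col": "title", "type": "string"},
        "year":  {"cf": "info",   "col": "year",  "type": "int"},
    },
}
catalog_json = json.dumps(catalog)

# In PySpark the string would then be handed to the connector, e.g.:
#   df = spark.read.options(catalog=catalog_json) \
#             .format("org.apache.hadoop.hbase.spark").load()
print(catalog_json)
```

Keeping the catalog as a Python dict and serializing it with `json.dumps` avoids the quoting mistakes that hand-written JSON strings invite.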

Spark HBase bulk load using hfile format

2016-07-13 Thread yeshwanth kumar
Hi i am doing bulk load into HBase as HFileFormat, by using saveAsNewAPIHadoopFile when i try to write i am getting an exception java.io.IOException: Added a key not lexically larger than previous. following is the code snippet case class HBaseRow(rowKey: ImmutableBytesWritable, kv: KeyValue
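The "Added a key not lexically larger than previous" IOException means the HFile writer requires cells in strictly increasing byte order. A tiny pure-Python illustration of the ordering it enforces, byte-wise on (rowkey, family, qualifier); the row and column names are invented for the example:

```python
# HFile bulk load requires cells sorted byte-wise; sorting the records
# by (rowkey, family, qualifier) before writing avoids the IOException.
cells = [
    (b"row2", b"cf", b"balance"),
    (b"row1", b"cf", b"name"),
    (b"row1", b"cf", b"balance"),
]
sorted_cells = sorted(cells)  # lexicographic sort on the byte tuples

def lexically_increasing(cs):
    """True when every cell key is strictly larger than the previous one."""
    return all(cs[i] < cs[i + 1] for i in range(len(cs) - 1))

print(lexically_increasing(cells))         # unsorted input -> False
print(lexically_increasing(sorted_cells))  # sorted input -> True
```

In Spark the equivalent step is sorting the RDD by the full cell key (e.g. `sortBy` on rowkey plus qualifier, or `repartitionAndSortWithinPartitions`) before calling `saveAsNewAPIHadoopFile`.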

RE: Spark with HBase Error - Py4JJavaError

2016-07-08 Thread Puneet Tripathi
Hi Ram, Thanks very much it worked. Puneet From: ram kumar [mailto:ramkumarro...@gmail.com] Sent: Thursday, July 07, 2016 6:51 PM To: Puneet Tripathi Cc: user@spark.apache.org Subject: Re: Spark with HBase Error - Py4JJavaError Hi Puneet, Have you tried appending --jars $SPARK_HOME/lib/spark
