Re: Spark SQL Thriftserver with HBase

Benjamin Kim Sat, 08 Oct 2016 12:15:57 -0700

Mich,

First and foremost, we have visualization servers that run Tableau for external 
user reports. Second, we have ad servers and REST endpoints 
for cookie sync and segmentation data exchange. These will use JDBC directly 
within the same data-center. When not colocated in the same data-center, they 
will connect to a local database server using JDBC. Either way, using 
JDBC everywhere simplifies the code and unifies it on the JDBC industry 
standard.


Does this make sense?

Thanks,
Ben

> On Oct 8, 2016, at 11:47 AM, Mich Talebzadeh <[email protected]> 
> wrote:
> 
> As with any other design, what is your presentation layer and who are your 
> end users?
> 
> Are they SQL-centric users from a Tableau background, or might they use Spark 
> functional programming?
> 
> It is best to describe the use case.
> 
> HTH
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> On 8 October 2016 at 19:40, Felix Cheung <[email protected] 
> <mailto:[email protected]>> wrote:
> I wouldn't be too surprised if Spark SQL -> JDBC data source -> Phoenix JDBC 
> server -> HBase worked better.
> 
> Without naming specifics, there are at least 4 or 5 different implementations 
> of HBase sources, each at a varying level of development and with different 
> requirements (HBase release version, Kerberos support, etc.).
> 
> 
> _____________________________
> From: Benjamin Kim <[email protected] <mailto:[email protected]>>
> Sent: Saturday, October 8, 2016 11:26 AM
> Subject: Re: Spark SQL Thriftserver with HBase
> To: Mich Talebzadeh <[email protected] 
> <mailto:[email protected]>>
> Cc: <[email protected] <mailto:[email protected]>>, Felix Cheung 
> <[email protected] <mailto:[email protected]>>
> 
> 
> 
> Mich,
> 
> Are you talking about the Phoenix JDBC Server? If so, I forgot about that 
> alternative.
> 
> Thanks,
> Ben
> 
> 
> On Oct 8, 2016, at 11:21 AM, Mich Talebzadeh <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> I don't think it will work.
> 
> You can use Phoenix on top of HBase:
> 
> hbase(main):336:0> scan 'tsco', 'LIMIT' => 1
> ROW              COLUMN+CELL
>  TSCO-1-Apr-08   column=stock_daily:Date, timestamp=1475866783376, value=1-Apr-08
>  TSCO-1-Apr-08   column=stock_daily:close, timestamp=1475866783376, value=405.25
>  TSCO-1-Apr-08   column=stock_daily:high, timestamp=1475866783376, value=406.75
>  TSCO-1-Apr-08   column=stock_daily:low, timestamp=1475866783376, value=379.25
>  TSCO-1-Apr-08   column=stock_daily:open, timestamp=1475866783376, value=380.00
>  TSCO-1-Apr-08   column=stock_daily:stock, timestamp=1475866783376, value=TESCO PLC
>  TSCO-1-Apr-08   column=stock_daily:ticker, timestamp=1475866783376, value=TSCO
>  TSCO-1-Apr-08   column=stock_daily:volume, timestamp=1475866783376, value=49664486
> 
> And the same via Phoenix on top of the HBase table:
> 
> 0: jdbc:phoenix:thin:url=http://rhes564:8765> select
>      substr(to_char(to_date("Date",'dd-MMM-yy')),1,10) AS TradeDate,
>      "close" AS "Day's close",
>      "high" AS "Day's High",
>      "low" AS "Day's Low",
>      "open" AS "Day's Open",
>      "ticker",
>      "volume",
>      (to_number("low")+to_number("high"))/2 AS "AverageDailyPrice"
>    from "tsco"
>    where to_number("volume") > 0
>      and "high" != '-'
>      and to_date("Date",'dd-MMM-yy') > to_date('2015-10-06','yyyy-MM-dd')
>    order by to_date("Date",'dd-MMM-yy')
>    limit 1;
> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
> |  TRADEDATE  | Day's close  | Day's High  | Day's Low  | Day's Open  | ticker  |  volume   | AverageDailyPrice  |
> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
> | 2015-10-07  | 197.00       | 198.05      | 184.84     | 192.20      | TSCO    | 30046994  | 191.445            |
> 
> HTH
> 
> 
> 
> 
> 
> Dr Mich Talebzadeh
> 
> On 8 October 2016 at 19:05, Felix Cheung <[email protected] 
> <mailto:[email protected]>> wrote:
> Great, then I think those packages, used as a Spark data source, should allow 
> you to do exactly that (replace org.apache.spark.sql.jdbc with the HBase one).
> 
> I do think it would be great to get more examples around this, though. It 
> would be great if you could share your experience with this!
> 
> 
> _____________________________
> From: Benjamin Kim <[email protected] <mailto:[email protected]>>
> Sent: Saturday, October 8, 2016 11:00 AM
> Subject: Re: Spark SQL Thriftserver with HBase
> To: Felix Cheung <[email protected] 
> <mailto:[email protected]>>
> Cc: <[email protected] <mailto:[email protected]>>
> 
> 
> Felix,
> 
> My goal is to use the Spark SQL JDBC Thriftserver to access HBase tables using 
> just SQL. In the past, I have been able to CREATE tables using the statement 
> below:
> 
> CREATE TABLE <table-name>
> USING org.apache.spark.sql.jdbc
> OPTIONS (
>   url "jdbc:postgresql://<hostname>:<port>/dm?user=<username>&password=<password>",
>   dbtable "dim.dimension_acamp"
> );
> 
> After doing this, I can access the PostgreSQL table through the Spark SQL JDBC 
> Thriftserver with SQL statements (SELECT, UPDATE, INSERT, etc.). I want to 
> do the same with HBase tables. We tried this using Hive and HiveServer2, but 
> the response times are just too long.
> 
> Thanks,
> Ben
> 
> 
> On Oct 8, 2016, at 10:53 AM, Felix Cheung <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> Ben,
> 
> I'm not sure I'm following completely.
> 
> Is your goal to use Spark to create or access tables in HBase? If so, the link 
> below and several packages out there support that by providing an HBase data 
> source for Spark. There are some examples of what the Spark code looks like in 
> that link as well. On that note, you should also be able to use the HBase 
> data source from a pure SQL (Spark SQL) query, which should work with the 
> Spark SQL JDBC Thrift Server (with USING, see 
> http://spark.apache.org/docs/latest/sql-programming-guide.html#tab_sql_10).
> 
> 
> _____________________________
> From: Benjamin Kim <[email protected] <mailto:[email protected]>>
> Sent: Saturday, October 8, 2016 10:40 AM
> Subject: Re: Spark SQL Thriftserver with HBase
> To: Felix Cheung <[email protected] 
> <mailto:[email protected]>>
> Cc: <[email protected] <mailto:[email protected]>>
> 
> 
> Felix,
> 
> The only alternative is to create what in database terms would be a stored 
> procedure (a UDF) that runs Spark Scala code underneath. That way, I can use 
> the Spark SQL JDBC Thriftserver to execute it from SQL, passing the key and 
> values I want to UPSERT. I wonder whether this is possible, since I cannot 
> CREATE a wrapper table on top of an HBase table in Spark SQL.
> 
> What do you think? Is this the right approach?
> 
> Thanks,
> Ben
> 
> On Oct 8, 2016, at 10:33 AM, Felix Cheung <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> HBase has released support for Spark:
> http://hbase.apache.org/book.html#spark
> 
> And if you search you should find several alternative approaches.
> 
> 
> 
> 
> 
> On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> Does anyone know if Spark can work with HBase tables using Spark SQL? I know 
> in Hive we are able to create tables on top of an underlying HBase table that 
> can be accessed using MapReduce jobs. Can the same be done using HiveContext 
> or SQLContext? We are trying to set up a way to GET and POST data to and from 
> the HBase table using the Spark SQL JDBC Thriftserver from our RESTful API 
> endpoints and/or HTTP web farms. If we can get this to work, then we can load 
> balance the Thriftservers. In addition, this will give us a 
> way to abstract the data storage layer away from the presentation layer code. 
> There is a chance that we will swap out the data storage technology in the 
> future. We are currently experimenting with Kudu.
> 
> Thanks,
> Ben

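Note on the Phoenix route discussed in the thread: the chain Felix suggests (Spark SQL -> JDBC data source -> Phoenix JDBC server -> HBase) could in principle reuse the same CREATE TABLE ... USING org.apache.spark.sql.jdbc pattern that Ben shows for PostgreSQL. The snippet below is only a minimal sketch, not something tested in the thread. It assumes that the Phoenix thin-client JAR is on the Thriftserver classpath, that the thin driver class is org.apache.phoenix.queryserver.client.Driver, and that Spark's JDBC source works against that driver; all of this should be verified for your versions. The helper name jdbc_table_ddl, the host pqs-host, and the table names are hypothetical; the URL shape follows the sqlline prompt shown in the thread.

```python
def jdbc_table_ddl(table, url, dbtable, driver=None):
    """Build a Spark SQL CREATE TABLE statement backed by the JDBC data source.

    Follows the shape of the PostgreSQL statement quoted in the thread:
    CREATE TABLE <name> USING org.apache.spark.sql.jdbc OPTIONS (...).
    """
    options = {"url": url, "dbtable": dbtable}
    if driver is not None:
        # 'driver' is a standard option of Spark's JDBC data source.
        options["driver"] = driver

    # Keep the sketch simple: refuse values that would need quote escaping.
    for key, value in options.items():
        if '"' in value:
            raise ValueError("option %r contains a double quote" % key)

    body = ",\n".join('  {} "{}"'.format(k, v) for k, v in options.items())
    return (
        "CREATE TABLE {}\n"
        "USING org.apache.spark.sql.jdbc\n"
        "OPTIONS (\n{}\n)"
    ).format(table, body)


# Assumption: Phoenix thin-client driver class; verify for your Phoenix release.
PHOENIX_THIN_DRIVER = "org.apache.phoenix.queryserver.client.Driver"

ddl = jdbc_table_ddl(
    table="hbase_via_phoenix",                      # hypothetical Spark table name
    url="jdbc:phoenix:thin:url=http://pqs-host:8765",  # hypothetical host
    dbtable="MY_SCHEMA.MY_TABLE",                   # hypothetical Phoenix table
    driver=PHOENIX_THIN_DRIVER,
)
print(ddl)
```

The printed statement would then be submitted through the Thriftserver like any other SQL, for example from a JDBC client, after which the Phoenix table would be queried under the Spark table name.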