Re: Re: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-08-13 Thread Ted Malaska
d > use Gets as much as possible. > > > > Thanks, > > > > > > *From:* Ted Malaska [mailto:ted.mala...@cloudera.com] > *Sent:* August 12, 2015 9:14 > *To:* Yan Zhou.sc > *Cc:* dev@spark.apache.org; Bing Xiao (Bing); Ted Yu; user > *Subject:* RE: Re: Re: Package Rel

Re: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
); Ted Yu; user Subject: RE: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro" There are a number of ways to bulk load. There is bulk put, partition bulk put, MR bulk load, and now HBASE-14150, which is Spark shuffle bulk load. Let me know if I have missed a bulk loading option. A

RE: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-08-11 Thread Ted Malaska
> > My understanding is that 14181 does not run Spark execution engine at all, > but will make use of Spark Dataframe semantic and/or logic planning to pass > a logic (sub-)plan to the HBase. If true, it might > > be desirable to directly support Dataframe in HBase. > > > >

RE: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
@spark.apache.org; Ted Yu; Bing Xiao (Bing); user Subject: RE: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro" The bulk load code is HBASE-14150 if you are interested. Let me know how it can be made faster. It's just a Spark shuffle and writing HFiles. Unless Astro wrote it

RE: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-08-11 Thread Ted Malaska
d/or logic planning to pass > a logic (sub-)plan to the HBase. If true, it might > > be desirable to directly support Dataframe in HBase. > > > > Thanks, > > > > > > *From:* Ted Malaska [mailto:ted.mala...@cloudera.com] > *Sent:* Wednesday, August 12, 2015 7:28 AM

RE: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
To: Yan Zhou.sc Cc: user; dev@spark.apache.org; Bing Xiao (Bing); Ted Yu Subject: RE: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro" Hey Yan, I've been the one building out this Spark functionality in HBase, so maybe I can help clarify. The hbase-spark module is

RE: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-08-11 Thread Ted Malaska
t > of querying string columns in HBase as integers from Astro. > > > > Thanks, > > > > *From:* Ted Yu [mailto:yuzhih...@gmail.com] > *Sent:* Wednesday, August 12, 2015 7:02 AM > *To:* Yan Zhou.sc > *Cc:* Bing Xiao (Bing); dev@spark.apache.org; u...@spark.apach

RE: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
Subject: Re: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro" Yan: Where can I find performance numbers for Astro (it's close to the middle of August)? Cheers On Tue, Aug 11, 2015 at 3:58 PM, Yan Zhou.sc <yan.zhou...@huawei.com> wrote: Finally I can take a lo

Re: Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-08-11 Thread Ted Yu
Yan: Where can I find performance numbers for Astro (it's close to the middle of August)? Cheers On Tue, Aug 11, 2015 at 3:58 PM, Yan Zhou.sc wrote: > Finally I can take a look at HBASE-14181 now. Unfortunately there is no > design doc mentioned. Superficially it is very similar to Astro with a >

Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
Finally I can take a look at HBASE-14181 now. Unfortunately there is no design doc mentioned. Superficially it is very similar to Astro, the difference being that it is part of the HBase client library, while Astro works as a Spark package and so will evolve and function more closely with Spark SQL/Datafr

Re: Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
OK. Then the question is how to define the boundary between a query engine and built-in processing. If, for instance, the Spark DataFrame functionalities involving shuffling are to be supported inside HBase, in my opinion, it’d be hard not to tag it as a query engine. If, on the other hand, only