Re: Spark on Kudu

2016-04-10 Thread Benjamin Kim
J-D, The priority is populating tables using DataFrames. That’s what I heard the most. It is the same with HBase. But I bet once this is taken care of, the fast querying part would follow because the data is now in Kudu. If SparkSQL integration is there, that would simplify things even more

Re: Spark on Kudu

2016-04-10 Thread Mark Hamstra
> > Do they care being able to insert into Kudu with SparkSQL
I care about insert into Kudu with Spark SQL. I'm currently delaying a refactoring of some Spark SQL-oriented insert functionality while trying to evaluate what to expect from Kudu. Whether Kudu does a good job supporting inserts wit

Re: Spark on Kudu

2016-04-10 Thread Jean-Daniel Cryans
Yup, starting to get a good idea. What are your DS folks looking for in terms of functionality related to Spark? A SparkSQL integration that's as fully featured as Impala's? Do they care about being able to insert into Kudu with SparkSQL or just being able to query real fast? Anything more specific to S

hi

2016-04-10 Thread Darshan Singh

Re: Spark on Kudu

2016-04-10 Thread Benjamin Kim
Yes, we took Kudu for a test run using 0.6 and 0.7 versions. But, since it’s not “production-ready”, upper management doesn’t want to fully deploy it yet. They just want to keep an eye on it though. Kudu was so much simpler and easier to use in every aspect compared to HBase. Impala was great fo

Re: Spark on Kudu

2016-04-10 Thread Jean-Daniel Cryans
On Sun, Apr 10, 2016 at 12:30 AM, Benjamin Kim wrote:
> J-D,
>
> The main thing I hear that Cassandra is being used as an updatable hot
> data store to ensure that duplicates are taken care of and idempotency is
> maintained. Whether data was directly retrieved from Cassandra for
> analytics, rep

Re: Spark on Kudu

2016-04-10 Thread Benjamin Kim
J-D, The main thing I hear is that Cassandra is being used as an updatable hot data store to ensure that duplicates are taken care of and idempotency is maintained. Whether data was directly retrieved from Cassandra for analytics, reports, or searches, it was not clear what its main use was.