Hi,

Now I'm getting this error when trying to write to the table.

import scala.collection.JavaConverters._

val key_seq = Seq("my_id")
val key_list = List("my_id").asJava

kuduContext.createTable(tableName, df.schema, key_seq,
  new CreateTableOptions().setNumReplicas(1).addHashPartitions(key_list, 100))

df.write
  .options(Map("kudu.master" -> kuduMaster, "kudu.table" -> tableName))
  .mode("overwrite")
  .kudu

java.lang.RuntimeException: failed to write 1000 rows from DataFrame to Kudu; sample errors:
Not found: key not found (error 0)
Not found: key not found (error 0)
Not found: key not found (error 0)
Not found: key not found (error 0)
Not found: key not found (error 0)

Does the key field need to be first in the DataFrame?

Thanks,
Ben
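A note on the failure mode, reasoning only from the error above: "key not found" on a brand-new table suggests the writer is issuing updates rather than inserts, which would be explained if the 0.9.0 datasource maps "overwrite" to UPDATE operations and "append" to INSERT. A minimal sketch of an initial load under that assumption (the mode mapping is the assumption; everything else is from the code above). Also, as far as I know Kudu expects the primary key columns to come first in the schema, so ordering the DataFrame with "my_id" first is worth trying as well.

df.write
  .options(Map("kudu.master" -> kuduMaster, "kudu.table" -> tableName))
  .mode("append") // assumed to insert new rows; "overwrite" would try to update existing keys
  .kudu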
> On Jun 14, 2016, at 4:28 PM, Dan Burkert <d...@cloudera.com> wrote:
>
> On Tue, Jun 14, 2016 at 4:20 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
> Dan,
>
> Thanks! It got further. Now, how do I set the primary key to be a column(s) in the DataFrame and set the partitioning? Is it like this?
>
> kuduContext.createTable(tableName, df.schema, Seq("my_id"), new CreateTableOptions().setNumReplicas(1).addHashPartitions("my_id"))
>
> java.lang.IllegalArgumentException: Table partitioning must be specified using setRangePartitionColumns or addHashPartitions
>
> Yep. The `Seq("my_id")` part of that call specifies the set of primary key columns, so in this case you have specified the single PK column "my_id". The `addHashPartitions` call adds hash partitioning to the table, in this case over the column "my_id" (which is good: it must be over one or more PK columns, so here "my_id" is the one and only valid combination). However, the call to `addHashPartitions` also takes the number of buckets as the second param. You shouldn't get the IllegalArgumentException as long as you are specifying either `addHashPartitions` or `setRangePartitionColumns`.
>
> - Dan
>
> Thanks,
> Ben
>
>> On Jun 14, 2016, at 4:07 PM, Dan Burkert <d...@cloudera.com> wrote:
>>
>> Looks like we're missing an import statement in that example. Could you try:
>>
>> import org.kududb.client._
>>
>> and try again?
>>
>> - Dan
>>
>> On Tue, Jun 14, 2016 at 4:01 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>> I encountered an error trying to create a table from a DataFrame based on the documentation.
>>
>> <console>:49: error: not found: type CreateTableOptions
>>        kuduContext.createTable(tableName, df.schema, Seq("key"), new CreateTableOptions().setNumReplicas(1))
>>
>> Is there something I'm missing?
>>
>> Thanks,
>> Ben
>>
>>> On Jun 14, 2016, at 3:00 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>
>>> It's only in Cloudera's maven repo: https://repository.cloudera.com/cloudera/cloudera-repos/org/kududb/kudu-spark_2.10/0.9.0/
>>>
>>> J-D
>>>
>>> On Tue, Jun 14, 2016 at 2:59 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>> Hi J-D,
>>>
>>> I installed Kudu 0.9.0 using CM, but I can't find the kudu-spark jar for spark-shell to use. Can you show me where to find it?
>>>
>>> Thanks,
>>> Ben
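Since the jar is only published to Cloudera's maven repo (the repository link quoted above), one plausible way to pull it into spark-shell is Spark's --repositories/--packages flags, with the artifact coordinates read off that repo path; the exact invocation below is an assumption, not something confirmed in this thread:

// Hypothetical invocation, coordinates implied by the repo URL above:
//   spark-shell --repositories https://repository.cloudera.com/cloudera/cloudera-repos \
//     --packages org.kududb:kudu-spark_2.10:0.9.0
// Then, inside the shell, the imports used in the examples above:
import org.kududb.client._
import org.kududb.spark.kudu._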
>>>> On Jun 8, 2016, at 1:19 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>
>>>> What's in this doc is what's gonna get released: https://github.com/cloudera/kudu/blob/master/docs/developing.adoc#kudu-integration-with-spark
>>>>
>>>> J-D
>>>>
>>>> On Tue, Jun 7, 2016 at 8:52 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>> Will this be documented with examples once 0.9.0 comes out?
>>>>
>>>> Thanks,
>>>> Ben
>>>>
>>>>> On May 28, 2016, at 3:22 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>
>>>>> It will be in 0.9.0.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Sat, May 28, 2016 at 8:31 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>> Hi Chris,
>>>>>
>>>>> Will all this effort be rolled into 0.9.0 and be ready for use?
>>>>>
>>>>> Thanks,
>>>>> Ben
>>>>>
>>>>>> On May 18, 2016, at 9:01 AM, Chris George <christopher.geo...@rms.com> wrote:
>>>>>>
>>>>>> There is some code in review that needs some more refinement. It will allow upsert/insert from a DataFrame using the datasource API. It will also allow the creation and deletion of tables from a DataFrame: http://gerrit.cloudera.org:8080/#/c/2992/
>>>>>>
>>>>>> Example usages will look something like: http://gerrit.cloudera.org:8080/#/c/2992/5/docs/developing.adoc
>>>>>>
>>>>>> -Chris George
>>>>>>
>>>>>> On 5/18/16, 9:45 AM, "Benjamin Kim" <bbuil...@gmail.com> wrote:
>>>>>>
>>>>>> Can someone tell me what the state is of this Spark work?
>>>>>>
>>>>>> Also, does anyone have any sample code on how to update/insert data in Kudu using DataFrames?
>>>>>>
>>>>>> Thanks,
>>>>>> Ben
>>>>>>
>>>>>>> On Apr 13, 2016, at 8:22 AM, Chris George <christopher.geo...@rms.com> wrote:
>>>>>>>
>>>>>>> SparkSQL cannot support these types of statements, but we may be able to implement similar functionality through the API.
>>>>>>> -Chris
>>>>>>>
>>>>>>> On 4/12/16, 5:19 PM, "Benjamin Kim" <bbuil...@gmail.com> wrote:
>>>>>>>
>>>>>>> It would be nice to adhere to the SQL:2003 standard for an "upsert" if it were to be implemented.
>>>>>>>
>>>>>>> MERGE INTO table_name USING table_reference ON (condition)
>>>>>>>   WHEN MATCHED THEN
>>>>>>>     UPDATE SET column1 = value1 [, column2 = value2 ...]
>>>>>>>   WHEN NOT MATCHED THEN
>>>>>>>     INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Ben
>>>>>>>
>>>>>>>> On Apr 11, 2016, at 12:21 PM, Chris George <christopher.geo...@rms.com> wrote:
>>>>>>>>
>>>>>>>> I have a WIP kuduRDD that I made a few months ago. I pushed it into gerrit if you want to take a look: http://gerrit.cloudera.org:8080/#/c/2754/
>>>>>>>> It does predicate pushdown, which the existing input-format-based RDD does not.
>>>>>>>>
>>>>>>>> Within the next two weeks I'm planning to implement a datasource for Spark that will have pushdown predicates and insertion/update functionality (I need to look more at the Cassandra and HBase datasources for the best way to do this). I agree that server-side upsert would be helpful. Having a datasource would give us useful DataFrames and also make Spark SQL usable for Kudu.
>>>>>>>>
>>>>>>>> My reasoning for having a Spark datasource and not using Impala:
>>>>>>>> 1. We have had trouble getting Impala to run fast with high concurrency when compared to Spark.
>>>>>>>> 2. We interact with datasources which do not integrate with Impala.
>>>>>>>> 3. We have custom SQL query planners for extended SQL functionality.
>>>>>>>>
>>>>>>>> -Chris George
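To make the pushdown point concrete: with a datasource, a DataFrame filter can be turned into a Kudu scan predicate instead of being evaluated in Spark after a full table scan. A sketch of what reading through the datasource could look like, modeled on the example usages in the developing.adoc drafts linked above (the option keys and the .kudu implicit are assumptions taken from those docs):

import org.kududb.spark.kudu._

val kuduDF = sqlContext.read
  .options(Map("kudu.master" -> kuduMaster, "kudu.table" -> tableName))
  .kudu

// With predicate pushdown, this filter can run server-side in the Kudu scan:
kuduDF.filter(kuduDF("my_id") > 100).select("my_id").show()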
>>>>>>>> On 4/11/16, 12:22 PM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote:
>>>>>>>>
>>>>>>>> You guys make a convincing point, although on the upsert side we'll need more support from the servers. Right now all you can do is an INSERT then, if you get a dup key, do an UPDATE. I guess we could at least add an API on the client side that would manage it, but it wouldn't be atomic.
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>>>>>> On Mon, Apr 11, 2016 at 9:34 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>>>>> It's pretty simple, actually. I need to support versioned datasets in a Spark SQL environment. Instead of a hack on top of a Parquet data store, I'm hoping (among other reasons) to be able to use Kudu's write and timestamp-based read operations to support not only appending data, but also updating existing data, and even some schema migration. The most typical use case is a dataset that is updated periodically (e.g., weekly or monthly) in which the preliminary data in the previous window (week or month) is updated with values that are expected to remain unchanged from then on, and a new set of preliminary values for the current window needs to be added/appended.
>>>>>>>>
>>>>>>>> Using Kudu's Java API and developing additional functionality on top of what Kudu has to offer isn't too much to ask, but the ease of integration with Spark SQL will gate how quickly we would move to using Kudu and how seriously we'd look at alternatives before making that decision.
>>>>>>>>
>>>>>>>> On Mon, Apr 11, 2016 at 8:14 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>> Mark,
>>>>>>>>
>>>>>>>> Thanks for taking some time to reply in this thread, glad it caught the attention of other folks!
>>>>>>>>
>>>>>>>> On Sun, Apr 10, 2016 at 12:33 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>>>>> Do they care about being able to insert into Kudu with SparkSQL
>>>>>>>>
>>>>>>>> I care about insert into Kudu with Spark SQL. I'm currently delaying a refactoring of some Spark SQL-oriented insert functionality while trying to evaluate what to expect from Kudu. Whether Kudu does a good job supporting inserts with Spark SQL will be a key consideration as to whether we adopt Kudu.
>>>>>>>>
>>>>>>>> I'd like to know more about why SparkSQL insert support is necessary for you. Is it just that you currently do it that way into some database or Parquet, so with minimal refactoring you'd be able to use Kudu? Would re-writing those SQL lines into Scala and directly using the Java API's KuduSession be too much work?
>>>>>>>>
>>>>>>>> Additionally, what do you expect to gain from using Kudu vs. your current solution? If it's not completely clear, I'd love to help you think through it.
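For reference, a sketch of the insert-then-update pattern J-D describes, written in Scala against the Java client; the class and method names reflect my understanding of the 0.9-era API, and the column names are made up. As noted above, this is not atomic: another writer can race in between the failed insert and the update.

import org.kududb.client.KuduClient

val client = new KuduClient.KuduClientBuilder(kuduMaster).build()
val table = client.openTable(tableName)
val session = client.newSession() // synchronous flush mode by default

val insert = table.newInsert()
insert.getRow.addLong("my_id", 42L)
insert.getRow.addString("name", "example")
val response = session.apply(insert)

if (response.hasRowError) {
  // Assuming the row error means the key already exists; a real
  // implementation would inspect the error before retrying as an update.
  val update = table.newUpdate()
  update.getRow.addLong("my_id", 42L)
  update.getRow.addString("name", "example")
  session.apply(update)
}

session.close()
client.shutdown()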
>>>>>>>> On Sun, Apr 10, 2016 at 12:23 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>> Yup, starting to get a good idea.
>>>>>>>>
>>>>>>>> What are your DS folks looking for in terms of functionality related to Spark? A SparkSQL integration that's as fully featured as Impala's? Do they care about being able to insert into Kudu with SparkSQL, or just about being able to query real fast? Anything more specific to Spark that I'm missing?
>>>>>>>>
>>>>>>>> FWIW the plan is to get to 1.0 in late Summer/early Fall. At Cloudera all our resources are committed to making things happen in time, and a more fully featured Spark integration isn't in our plans during that period. I'm really hoping someone in the community will help with Spark, the same way we got a big contribution for the Flume sink.
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>>>>>> On Sun, Apr 10, 2016 at 11:29 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>>>>> Yes, we took Kudu for a test run using the 0.6 and 0.7 versions. But, since it's not "production-ready", upper management doesn't want to fully deploy it yet. They just want to keep an eye on it though. Kudu was so much simpler and easier to use in every aspect compared to HBase. Impala was great for the report writers and analysts to experiment with for the short time it was up. But, once again, the only blocker was the lack of Spark support for our Data Developers/Scientists. So, production-level data population won't happen until then.
>>>>>>>>
>>>>>>>> I hope this helps you get an idea where I am coming from…
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Ben
>>>>>>>>
>>>>>>>>> On Apr 10, 2016, at 11:08 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> On Sun, Apr 10, 2016 at 12:30 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>>>>>> J-D,
>>>>>>>>>
>>>>>>>>> The main thing I hear is that Cassandra is being used as an updatable hot data store to ensure that duplicates are taken care of and idempotency is maintained. Whether data was directly retrieved from Cassandra for analytics, reports, or searches, it was not clear what its main use was. Some also just used it as a staging area to populate downstream tables in Parquet format. The last thing I heard was that CQL was terrible, so that rules out much use of direct queries against it.
>>>>>>>>>
>>>>>>>>> I'm no C* expert, but I don't think CQL is meant for real analytics, just ease of use instead of plainly using the APIs. Even then, Kudu should beat it easily on big scans. Same for HBase. We've done benchmarks against the latter, not the former.
>>>>>>>>>
>>>>>>>>> As for our company, we have been looking for an updatable data store for a long time that can be quickly queried directly, either using Spark SQL or Impala or some other SQL engine, and still handle TBs or PBs of data without performance degradation and many configuration headaches. For now, we are using HBase to take on this role, with Phoenix as a fast way to directly query the data. I can see Kudu as the best way to fill this gap easily, especially being the closest thing to other relational databases out there in familiarity for the many SQL analytics people in our company. The other alternative would be to go with AWS Redshift for the same reasons, but it would come at a cost, of course. If we went with either solution, Kudu or Redshift, it would get rid of the need to extract from HBase to Parquet tables or export to PostgreSQL to support more of the SQL language used by analysts or by the reporting software we use.
>>>>>>>>>
>>>>>>>>> Ok, the usual then *smile*. Looks like we're not too far off with Kudu. Have you folks tried Kudu with Impala yet with those use cases?
>>>>>>>>>
>>>>>>>>> I hope this helps.
>>>>>>>>>
>>>>>>>>> It does, thanks for the nice reply.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Ben
>>>>>>>>>
>>>>>>>>>> On Apr 9, 2016, at 2:00 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Ha, first time I'm hearing about SMACK. Inside Cloudera we like to refer to "Impala + Kudu" as Kimpala, but yeah, it's not as sexy. My colleagues who were also there did say that the hype around Spark isn't dying down.
>>>>>>>>>>
>>>>>>>>>> There's definitely an overlap in the use cases that Cassandra, HBase, and Kudu cater to. I wouldn't go as far as saying that C* is just an interim solution for the use case you describe.
>>>>>>>>>>
>>>>>>>>>> Nothing significant happened in Kudu over the past month; it's a storage engine, so things move slowly *smile*. I'd love to see more contributions on the Spark front. I know there's code out there that could be integrated in kudu-spark, it just needs to land in gerrit. I'm sure folks will happily review it.
>>>>>>>>>>
>>>>>>>>>> Do you have relevant experiences you can share? I'd love to learn more about the use cases for which you envision using Kudu as a C* replacement.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> J-D
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 8, 2016 at 12:45 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>>>>>>> Hi J-D,
>>>>>>>>>>
>>>>>>>>>> My colleagues recently came back from Strata in San Jose. They told me that everything was about Spark, and there is a big buzz about the SMACK stack (Spark, Mesos, Akka, Cassandra, Kafka). I still think that Cassandra is just an interim solution as a low-latency, easily queried data store. I was wondering if anything significant happened in regards to Kudu, especially on the Spark front. Plus, can you come up with your own proposed stack acronym to promote?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Ben
>>>>>>>>>>
>>>>>>>>>>> On Mar 1, 2016, at 12:20 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Ben,
>>>>>>>>>>>
>>>>>>>>>>> AFAIK no one in the dev community committed to any timeline. I know of one person on the Kudu Slack who's working on a better RDD, but that's about it.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> J-D
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 1, 2016 at 11:00 AM, Benjamin Kim <b...@amobee.com> wrote:
>>>>>>>>>>> Hi J-D,
>>>>>>>>>>>
>>>>>>>>>>> Quick question… Is there an ETA for KUDU-1214? I want to target a version of Kudu to begin real testing of Spark against it for our devs. At least then I can tell them what timeframe to anticipate.
>>>>>>>>>>>
>>>>>>>>>>> Just curious,
>>>>>>>>>>> Benjamin Kim
>>>>>>>>>>> Data Solutions Architect
>>>>>>>>>>>
>>>>>>>>>>> [a•mo•bee] (n.) the company defining digital marketing.
>>>>>>>>>>>
>>>>>>>>>>> Mobile: +1 818 635 2900
>>>>>>>>>>> 3250 Ocean Park Blvd, Suite 200 | Santa Monica, CA 90405 | www.amobee.com
>>>>>>>>>>>
>>>>>>>>>>>> On Feb 24, 2016, at 3:51 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> The DStream stuff isn't there at all. I'm not sure if it's needed either.
>>>>>>>>>>>>
>>>>>>>>>>>> The kuduRDD is just leveraging the MR input format; ideally we'd use scans directly.
>>>>>>>>>>>>
>>>>>>>>>>>> The SparkSQL stuff is there, but it doesn't do any sort of pushdown. It's really basic.
>>>>>>>>>>>>
>>>>>>>>>>>> The goal was to provide something for others to contribute to. We have some basic unit tests that others can easily extend. None of us on the team are Spark experts, but we'd be really happy to assist anyone improving the kudu-spark code.
>>>>>>>>>>>>
>>>>>>>>>>>> J-D
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Feb 24, 2016 at 3:41 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>>>>>>>>> J-D,
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like it fulfills most of the basic requirements (Kudu RDD, Kudu DStream) in KUDU-1214. Am I right? Besides shoring up more Spark SQL functionality (DataFrames) and doing the documentation, what more needs to be done? Optimizations?
>>>>>>>>>>>>
>>>>>>>>>>>> I believe that it's a good place to start using Spark with Kudu and compare it to HBase with Spark (not clean).
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Ben
>>>>>>>>>>>>
>>>>>>>>>>>>> On Feb 24, 2016, at 3:10 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> AFAIK no one is working on it, but we did manage to get this in for 0.7.0: https://issues.cloudera.org/browse/KUDU-1321
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's a really simple wrapper, and yes, you can use SparkSQL on Kudu, but it will require a lot more work to make it fast/useful.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>>>
>>>>>>>>>>>>> J-D
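To illustrate what "SparkSQL on Kudu" looks like with the wrapper: once the table is exposed as a DataFrame, plain Spark SQL works against it. The read syntax below is the later 0.9.0 one from the docs linked earlier in the thread, not necessarily the 0.7.0 API, and without pushdown the WHERE clause is evaluated in Spark after the scan:

import org.kududb.spark.kudu._

val kuduDF = sqlContext.read
  .options(Map("kudu.master" -> kuduMaster, "kudu.table" -> tableName))
  .kudu

kuduDF.registerTempTable("kudu_table") // Spark 1.x API
sqlContext.sql("SELECT my_id FROM kudu_table WHERE my_id > 100").show()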
>>>>>>>>>>>>> On Wed, Feb 24, 2016 at 3:08 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>>>>>>>>>> I see KUDU-1214 (https://issues.cloudera.org/browse/KUDU-1214) targeted for 0.8.0, but I see no progress on it. When this is complete, will this mean that Spark will be able to work with Kudu both programmatically and as a client via Spark SQL? Or is there more work that needs to be done on the Spark side for it to work?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just curious.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Ben