September 19, 2018 at 4:35 PM
To: "Thakrar, Jayesh"
Cc: "tigerqu...@outlook.com" , Spark Dev List
Subject: Re: [Discuss] Datasource v2 support for manipulating partitions
What does partition management look like in those systems and what are the
options we would standard
Cc: "tigerqu...@outlook.com" , Spark Dev List
Subject: Re: [Discuss] Datasource v2 support for manipulating partitions
I'm open to exploring the idea of adding partition management as a catalog
API. The approach we're taking is to have an interface for each concern a
catalog might implement, like TableCatalog (proposed in SPARK-24252), but
also FunctionCatalog for stored functions and possibly
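As a rough illustration of the one-interface-per-concern approach described above, here is a minimal Java sketch. Everything in it is hypothetical: `CatalogSketch`, `TableConcern`, `PartitionConcern`, `InMemoryCatalog`, `TablesOnlyCatalog` and `PartitionCommands` are invented for this example and are not the API proposed in SPARK-24252 or any released Spark interface. It treats partition management as an optional concern that a catalog opts into, with the engine checking for that capability before running a partition command.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL base interface: all names here are illustrative, not Spark's API.
interface CatalogSketch {
    String name();
}

// Concern 1 (hypothetical): table metadata, loosely in the spirit of a table catalog.
interface TableConcern extends CatalogSketch {
    void createTable(String table);
    boolean tableExists(String table);
}

// Concern 2 (hypothetical): partition management. A catalog opts in by
// implementing this interface; a partition spec maps column name -> value.
interface PartitionConcern extends CatalogSketch {
    void addPartition(String table, Map<String, String> spec);
    void dropPartition(String table, Map<String, String> spec);
    List<Map<String, String>> listPartitions(String table);
}

// Toy in-memory catalog that implements both concerns.
class InMemoryCatalog implements TableConcern, PartitionConcern {
    private final Map<String, List<Map<String, String>>> partitions = new HashMap<>();

    @Override public String name() { return "in_memory"; }

    @Override public void createTable(String table) {
        partitions.putIfAbsent(table, new ArrayList<>());
    }

    @Override public boolean tableExists(String table) {
        return partitions.containsKey(table);
    }

    @Override public void addPartition(String table, Map<String, String> spec) {
        List<Map<String, String>> parts = requireTable(table);
        Map<String, String> copy = new LinkedHashMap<>(spec); // defensive copy
        if (!parts.contains(copy)) { // adding an existing partition is a no-op
            parts.add(copy);
        }
    }

    @Override public void dropPartition(String table, Map<String, String> spec) {
        requireTable(table).remove(spec); // Map equality is content-based
    }

    @Override public List<Map<String, String>> listPartitions(String table) {
        return Collections.unmodifiableList(requireTable(table));
    }

    private List<Map<String, String>> requireTable(String table) {
        List<Map<String, String>> parts = partitions.get(table);
        if (parts == null) {
            throw new IllegalArgumentException("no such table: " + table);
        }
        return parts;
    }
}

// A catalog that manages tables but has no notion of partitions.
class TablesOnlyCatalog implements TableConcern {
    private final List<String> tables = new ArrayList<>();

    @Override public String name() { return "tables_only"; }

    @Override public void createTable(String table) {
        if (!tables.contains(table)) tables.add(table);
    }

    @Override public boolean tableExists(String table) { return tables.contains(table); }
}

// How an engine might route a partition command: check for the concern
// interface and fail loudly if the catalog does not implement it.
final class PartitionCommands {
    private PartitionCommands() {}

    static void addPartition(CatalogSketch catalog, String table, Map<String, String> spec) {
        if (!(catalog instanceof PartitionConcern)) {
            throw new UnsupportedOperationException(
                "catalog " + catalog.name() + " does not support partition management");
        }
        ((PartitionConcern) catalog).addPartition(table, spec);
    }
}
```

With this shape, adding and dropping partitions works against `InMemoryCatalog`, while routing the same command to `TablesOnlyCatalog` throws `UnsupportedOperationException` instead of being silently ignored. The trade-off is that callers must test for each capability, which is the price of letting simple sources skip partition support entirely.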
Totally agree with you, Dale, that there are situations where we need to expose
partition management for efficiency, performance, and better
control/visibility/manageability.
So, as described, I suggested two things: the ability to do it in the current
V2 API form via options and appropriate
Hi Jayesh,
I get where you are coming from: partitions are just an implementation
optimisation that we really shouldn’t be bothering the end user with.
Unfortunately, that view is like saying an RPC is just a local procedure call
and the details of the network transport should be hidden from the end user.
I am not involved with the design or development of the V2 API, so these could
be naïve comments/thoughts.
Just as Dataset abstracts away from RDD, which otherwise required somewhat more
intimate knowledge of Spark internals, I am guessing the absence of
partition operations is either
I've been following the development of the new data source abstraction with
keen interest. One of the issues that occurred to me as I sat down and
planned how I would implement a data source is how I would support
manipulating partitions.
My reading of the current prototype is that Data