Re: [Discuss] Datasource v2 support for manipulating partitions

2018-09-20 Thread Thakrar, Jayesh
> (September 19, 2018 at 4:35 PM) What does partition management look like in those systems and what are the options we would standard

Re: [Discuss] Datasource v2 support for manipulating partitions

2018-09-19 Thread Ryan Blue
> I'm open to exploring the idea of adding partition management as a catalog

Re: [Discuss] Datasource v2 support for manipulating partitions

2018-09-19 Thread Thakrar, Jayesh
" Cc: "tigerqu...@outlook.com" , Spark Dev List Subject: Re: [Discuss] Datasource v2 support for manipulating partitions I'm open to exploring the idea of adding partition management as a catalog API. The approach we're taking is to have an interface for each concern a catalog migh

Re: [Discuss] Datasource v2 support for manipulating partitions

2018-09-19 Thread Ryan Blue
I'm open to exploring the idea of adding partition management as a catalog API. The approach we're taking is to have an interface for each concern a catalog might implement, like TableCatalog (proposed in SPARK-24252), but also FunctionCatalog for stored functions and possibly
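To make the "one interface per concern" idea concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: the interface names with the `Sketch` suffix, their method signatures, and the `InMemoryCatalog` class are all hypothetical and are not the API proposed in SPARK-24252. Only the general shape (a table concern and a separate, optional partition concern that a catalog may or may not implement) is taken from the message above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base type: anything that can be registered as a catalog.
interface CatalogSketch {
    String name();
}

// Hypothetical table concern. The methods are illustrative, not Spark's.
interface TableCatalogSketch extends CatalogSketch {
    void createTable(String table);
    boolean tableExists(String table);
}

// Hypothetical partition concern, kept separate so that sources without
// explicit partition management simply do not implement it.
interface PartitionCatalogSketch extends CatalogSketch {
    void addPartition(String table, Map<String, String> spec);
    boolean dropPartition(String table, Map<String, String> spec);
    List<Map<String, String>> listPartitions(String table);
}

// Toy catalog that opts in to both concerns and keeps everything in memory.
final class InMemoryCatalog implements TableCatalogSketch, PartitionCatalogSketch {
    private final Map<String, List<Map<String, String>>> tables = new LinkedHashMap<>();

    @Override public String name() { return "in-memory"; }

    @Override public void createTable(String table) {
        tables.putIfAbsent(table, new ArrayList<>());
    }

    @Override public boolean tableExists(String table) {
        return tables.containsKey(table);
    }

    @Override public void addPartition(String table, Map<String, String> spec) {
        List<Map<String, String>> parts = partitionsOf(table);
        Map<String, String> copy = new LinkedHashMap<>(spec);
        // Adding the same partition spec twice is a no-op.
        if (!parts.contains(copy)) {
            parts.add(copy);
        }
    }

    @Override public boolean dropPartition(String table, Map<String, String> spec) {
        // Returns false when the partition was not present.
        return partitionsOf(table).remove(spec);
    }

    @Override public List<Map<String, String>> listPartitions(String table) {
        return Collections.unmodifiableList(new ArrayList<>(partitionsOf(table)));
    }

    private List<Map<String, String>> partitionsOf(String table) {
        List<Map<String, String>> parts = tables.get(table);
        if (parts == null) {
            throw new IllegalArgumentException("No such table: " + table);
        }
        return parts;
    }
}

final class CatalogDemo {
    // An engine would check for the optional concern before planning
    // a partition operation against a given catalog.
    static boolean supportsPartitions(CatalogSketch catalog) {
        return catalog instanceof PartitionCatalogSketch;
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        catalog.createTable("events");
        if (supportsPartitions(catalog)) {
            catalog.addPartition("events", Collections.singletonMap("date", "2018-09-19"));
        }
        System.out.println(catalog.listPartitions("events"));
    }
}
```

The point of the split is that the capability check is a plain `instanceof` test: a catalog that cannot manage partitions never has to stub out partition methods, and callers can fall back to some other path when the concern is absent.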

Re: [Discuss] Datasource v2 support for manipulating partitions

2018-09-18 Thread Thakrar, Jayesh
Totally agree with you, Dale, that there are situations where, for efficiency, performance and better control/visibility/manageability, we need to expose partition management. So as described, I suggested two things - the ability to do it in the current V2 API form via options and appropriate

Re: [Discuss] Datasource v2 support for manipulating partitions

2018-09-17 Thread tigerquoll
Hi Jayesh, I get where you are coming from - partitions are just an implementation optimisation that we really shouldn’t be bothering the end user with. Unfortunately that view is like saying RPC is like a procedure call, and details of the network transport should be hidden from the end user.

Re: [Discuss] Datasource v2 support for manipulating partitions

2018-09-16 Thread Thakrar, Jayesh
I am not involved with the design or development of the V2 API - so these could be naïve comments/thoughts. Just as Dataset abstracts away from RDD, which otherwise required a little more intimate knowledge of Spark internals, I am guessing the absence of partition operations is either

[Discuss] Datasource v2 support for manipulating partitions

2018-09-16 Thread tigerquoll
I've been following the development of the new data source abstraction with keen interest. One of the issues that has occurred to me as I sat down and planned how I would implement a data source is how I would support manipulating partitions. My reading of the current prototype is that Data