Re: [VOTE] SPIP: Spark API for Table Metadata

2019-02-27 Thread Russell Spitzer
+1 (non-binding)

On Wed, Feb 27, 2019, 6:28 PM Ryan Blue  wrote:

> Hi everyone,
>
> In the last DSv2 sync, the consensus was that the table metadata SPIP was
> ready to bring up for a vote. Now that the multi-catalog identifier SPIP
> vote has passed, I'd like to start one for the table metadata API,
> TableCatalog.
>
> The proposal is for adding a TableCatalog interface that will be used by
> v2 plans. That interface has methods to load, create, drop, alter, refresh,
> rename, and check existence for tables. It also specifies the set of
> metadata used to configure tables: schema, partitioning, and key-value
> properties. For more information, please read the SPIP proposal doc.
>
> Please vote in the next 3 days.
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don't think this is a good idea because ...
>
>
> Thanks!
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


Re: [DISCUSS] Spark 3.0 and DataSourceV2

2019-02-27 Thread Wenchen Fan
I'm good with the list from Ryan, thanks!

On Thu, Feb 28, 2019 at 1:00 AM Ryan Blue  wrote:

> I think that's a good plan. Let's get the functionality done, but mark it
> experimental pending a new row API.
>
> So is there agreement on this set of work, then?
>
> On Tue, Feb 26, 2019 at 6:30 PM Matei Zaharia 
> wrote:
>
>> To add to this, we can add a stable interface anytime if the original one
>> was marked as unstable; we wouldn’t have to wait until 4.0. We had a lot of
>> APIs that were experimental in 2.0 and then got stabilized in later 2.x
>> releases for example.
>>
>> Matei
>>
>> > On Feb 26, 2019, at 5:12 PM, Reynold Xin  wrote:
>> >
>> > We will have to fix that before we declare DSv2 is stable, because
>> InternalRow is not a stable API. We don’t necessarily need to do it in 3.0.
>> >
>> > On Tue, Feb 26, 2019 at 5:10 PM Matt Cheah  wrote:
>> > Will that then require an API break down the line? Do we save that for
>> Spark 4?
>> >
>> >
>> >
>> >
>> > -Matt Cheah
>> >
>> >
>> >
>> > From: Ryan Blue 
>> > Reply-To: "rb...@netflix.com" 
>> > Date: Tuesday, February 26, 2019 at 4:53 PM
>> > To: Matt Cheah 
>> > Cc: Sean Owen , Wenchen Fan ,
>> Xiao Li , Matei Zaharia ,
>> Spark Dev List 
>> > Subject: Re: [DISCUSS] Spark 3.0 and DataSourceV2
>> >
>> >
>> >
>> > That's a good question.
>> >
>> >
>> >
>> > While I'd love to have a solution for that, I don't think it is a good
>> idea to delay DSv2 until we have one. That is going to require a lot of
>> internal changes and I don't see how we could make the release date if we
>> are including an InternalRow replacement.
>> >
>> >
>> >
>> > On Tue, Feb 26, 2019 at 4:41 PM Matt Cheah  wrote:
>> >
>> > Reynold made a note earlier about a proper Row API that isn’t
>> InternalRow – is that still on the table?
>> >
>> >
>> >
>> > -Matt Cheah
>> >
>> >
>> >
>> > From: Ryan Blue 
>> > Reply-To: "rb...@netflix.com" 
>> > Date: Tuesday, February 26, 2019 at 4:40 PM
>> > To: Matt Cheah 
>> > Cc: Sean Owen , Wenchen Fan ,
>> Xiao Li , Matei Zaharia ,
>> Spark Dev List 
>> > Subject: Re: [DISCUSS] Spark 3.0 and DataSourceV2
>> >
>> >
>> >
>> > Thanks for bumping this, Matt. I think we can have the discussion here
>> to clarify exactly what we’re committing to and then have a vote thread
>> once we’re agreed.
>> > Getting back to the DSv2 discussion, I think we have a good handle on
>> what would be added:
>> > - Plugin system for catalogs
>> > - TableCatalog interface (I'll start a vote thread for this SPIP shortly)
>> > - TableCatalog implementation backed by SessionCatalog that can load v2 tables
>> > - Resolution rule to load v2 tables using the new catalog
>> > - CTAS logical and physical plan nodes
>> > - Conversions from SQL parsed logical plans to v2 logical plans
>> >
>> > Initially, this will always use the v2 catalog backed by SessionCatalog
>> to avoid dependence on the multi-catalog work. All of those are already
>> implemented and working, so I think it is reasonable that we can get them
>> in.
>> > Then we can consider a few stretch goals:
>> > - Get in as much DDL as we can. I think create and drop table should be easy.
>> > - Multi-catalog identifier parsing and multi-catalog support
>> >
>> > If we get those last two in, it would be great. We can make the call
>> closer to release time. Does anyone want to change this set of work?
>> >
>> >
>> > On Tue, Feb 26, 2019 at 4:23 PM Matt Cheah  wrote:
>> >
>> > What would then be the next steps we'd take to collectively decide on
>> plans and timelines moving forward? Might I suggest scheduling a conference
>> call with appropriate PMCs to put our ideas together? Maybe such a
>> discussion can take place at next week's meeting? Or do we need to have a
>> separate formalized voting thread which is guided by a PMC?
>> >
>> > My suggestion is to try to make concrete steps forward and to avoid
>> letting this slip through the cracks.
>> >
>> > I also think there would be merits to having a project plan and
>> estimates around how long each of the features we want to complete is going
>> to take to implement and review.
>> >
>> > -Matt Cheah
>> >
>> > On 2/24/19, 3:05 PM, "Sean Owen"  wrote:
>> >
>> > Sure, I don't read anyone making these statements though? Let's assume
>> > good intent, that "foo should happen" as "my opinion as a member of
>> > the community, which is not solely up to me, is that foo should
>> > happen". I understand it's possible for a person to make their opinion
>> > over-weighted; this whole style of decision making assumes good actors
>> > and doesn't optimize against bad ones. Not that it can't happen, just
>> > not seeing it here.
>> >
>> > I have never seen any vote on a feature list, by a PMC or otherwise.
>> > We can do that if really needed I guess. But that also isn't the
>> > authoritative process in play here, in contrast.

[VOTE] SPIP: Spark API for Table Metadata

2019-02-27 Thread Ryan Blue
Hi everyone,

In the last DSv2 sync, the consensus was that the table metadata SPIP was
ready to bring up for a vote. Now that the multi-catalog identifier SPIP
vote has passed, I'd like to start one for the table metadata API,
TableCatalog.

The proposal is for adding a TableCatalog interface that will be used by v2
plans. That interface has methods to load, create, drop, alter, refresh,
rename, and check existence for tables. It also specifies the set of
metadata used to configure tables: schema, partitioning, and key-value
properties. For more information, please read the SPIP proposal doc.
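
As a rough sketch of the shape this describes (the method names below are
guesses derived from the operations listed above, and the placeholder types
are illustrative; the authoritative signatures are in the SPIP doc):

import org.apache.spark.sql.types.StructType

// Placeholder types standing in for definitions the SPIP spells out in full.
case class Identifier(namespace: Array[String], name: String)
trait Table
trait Transform   // a partitioning expression
trait TableChange // one change applied by an ALTER TABLE statement

trait TableCatalog {
  def tableExists(ident: Identifier): Boolean
  def loadTable(ident: Identifier): Table
  def createTable(
      ident: Identifier,
      schema: StructType,
      partitions: Array[Transform],
      properties: java.util.Map[String, String]): Table
  def alterTable(ident: Identifier, changes: TableChange*): Table
  def dropTable(ident: Identifier): Boolean
  def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit
  def refreshTable(ident: Identifier): Unit // invalidate cached metadata
}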

Please vote in the next 3 days.

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don't think this is a good idea because ...


Thanks!

-- 
Ryan Blue
Software Engineer
Netflix


Re: [DISCUSS] SPIP: .NET bindings for Apache Spark

2019-02-27 Thread Holden Karau
I’m +1 with Sean's comment on the JIRA: initially starting outside of Spark
is probably the easiest way forward.

On Wed, Feb 27, 2019 at 10:04 AM Sriram Sundaresan <
sriram.sundare...@imaginea.com> wrote:

> I am interested in taking this up. Please let me know how to
> proceed and contribute to this.
>
>
>
> On Wed, Feb 27, 2019 at 11:27 PM Terry Kim  wrote:
>
>> Hi,
>>
>> I have posted a SPIP to JIRA:
>> https://issues.apache.org/jira/browse/SPARK-27006.
>>
>> I look forward to your feedback.
>>
>> Thanks,
>> Terry
>>
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSS] SPIP: .NET bindings for Apache Spark

2019-02-27 Thread Sriram Sundaresan
I am interested in taking this up. Please let me know how to
proceed and contribute to this.



On Wed, Feb 27, 2019 at 11:27 PM Terry Kim  wrote:

> Hi,
>
> I have posted a SPIP to JIRA:
> https://issues.apache.org/jira/browse/SPARK-27006.
>
> I look forward to your feedback.
>
> Thanks,
> Terry
>



[DISCUSS] SPIP: .NET bindings for Apache Spark

2019-02-27 Thread Terry Kim
Hi,

I have posted a SPIP to JIRA:
https://issues.apache.org/jira/browse/SPARK-27006.

I look forward to your feedback.

Thanks,
Terry


Re: [DISCUSS] Spark 3.0 and DataSourceV2

2019-02-27 Thread Ryan Blue
I think that's a good plan. Let's get the functionality done, but mark it
experimental pending a new row API.

So is there agreement on this set of work, then?

On Tue, Feb 26, 2019 at 6:30 PM Matei Zaharia 
wrote:

> To add to this, we can add a stable interface anytime if the original one
> was marked as unstable; we wouldn’t have to wait until 4.0. We had a lot of
> APIs that were experimental in 2.0 and then got stabilized in later 2.x
> releases for example.
>
> Matei
>
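
As a sketch of the mechanism Matei describes, using Spark's existing
org.apache.spark.annotation.Experimental annotation (the trait itself is
hypothetical):

import org.apache.spark.annotation.Experimental

// An API published as experimental in one release can be stabilized, or
// changed, in a later minor release without waiting for a major version.
@Experimental
trait ProposedRowApi {
  def numFields: Int
  def get(ordinal: Int): Any
}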
> > On Feb 26, 2019, at 5:12 PM, Reynold Xin  wrote:
> >
> > We will have to fix that before we declare DSv2 is stable, because
> InternalRow is not a stable API. We don’t necessarily need to do it in 3.0.
> >
> > On Tue, Feb 26, 2019 at 5:10 PM Matt Cheah  wrote:
> > Will that then require an API break down the line? Do we save that for
> Spark 4?
> >
> >
> >
> >
> > -Matt Cheah
> >
> >
> >
> > From: Ryan Blue 
> > Reply-To: "rb...@netflix.com" 
> > Date: Tuesday, February 26, 2019 at 4:53 PM
> > To: Matt Cheah 
> > Cc: Sean Owen , Wenchen Fan ,
> Xiao Li , Matei Zaharia ,
> Spark Dev List 
> > Subject: Re: [DISCUSS] Spark 3.0 and DataSourceV2
> >
> >
> >
> > That's a good question.
> >
> >
> >
> > While I'd love to have a solution for that, I don't think it is a good
> idea to delay DSv2 until we have one. That is going to require a lot of
> internal changes and I don't see how we could make the release date if we
> are including an InternalRow replacement.
> >
> >
> >
> > On Tue, Feb 26, 2019 at 4:41 PM Matt Cheah  wrote:
> >
> > Reynold made a note earlier about a proper Row API that isn’t
> InternalRow – is that still on the table?
> >
> >
> >
> > -Matt Cheah
> >
> >
> >
> > From: Ryan Blue 
> > Reply-To: "rb...@netflix.com" 
> > Date: Tuesday, February 26, 2019 at 4:40 PM
> > To: Matt Cheah 
> > Cc: Sean Owen , Wenchen Fan ,
> Xiao Li , Matei Zaharia ,
> Spark Dev List 
> > Subject: Re: [DISCUSS] Spark 3.0 and DataSourceV2
> >
> >
> >
> > Thanks for bumping this, Matt. I think we can have the discussion here
> to clarify exactly what we’re committing to and then have a vote thread
> once we’re agreed.
> > Getting back to the DSv2 discussion, I think we have a good handle on
> what would be added:
> > - Plugin system for catalogs (a configuration sketch follows below)
> > - TableCatalog interface (I'll start a vote thread for this SPIP shortly)
> > - TableCatalog implementation backed by SessionCatalog that can load v2 tables
> > - Resolution rule to load v2 tables using the new catalog
> > - CTAS logical and physical plan nodes
> > - Conversions from SQL parsed logical plans to v2 logical plans
> >
> > Initially, this will always use the v2 catalog backed by SessionCatalog
> to avoid dependence on the multi-catalog work. All of those are already
> implemented and working, so I think it is reasonable that we can get them
> in.
> > Then we can consider a few stretch goals:
> > - Get in as much DDL as we can. I think create and drop table should be easy.
> > - Multi-catalog identifier parsing and multi-catalog support
> >
> > If we get those last two in, it would be great. We can make the call
> closer to release time. Does anyone want to change this set of work?
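
To make the plugin-system item above concrete: a speculative sketch, assuming
a spark.sql.catalog.<name> = <implementation class> convention for registering
catalog plugins; the catalog class and table names here are hypothetical, not
part of this proposal:

import org.apache.spark.sql.SparkSession

object CatalogPluginSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("catalog-plugin-sketch")
      .master("local[*]")
      // Register a hypothetical catalog implementation under the name "prod".
      .config("spark.sql.catalog.prod", "com.example.ProdCatalog")
      .getOrCreate()

    // With multi-catalog identifier parsing (the second stretch goal), a
    // table in that catalog could be addressed by a catalog-qualified name.
    spark.sql("SELECT * FROM prod.db.events").show()
  }
}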
> >
> >
> > On Tue, Feb 26, 2019 at 4:23 PM Matt Cheah  wrote:
> >
> > What would then be the next steps we'd take to collectively decide on
> plans and timelines moving forward? Might I suggest scheduling a conference
> call with appropriate PMCs to put our ideas together? Maybe such a
> discussion can take place at next week's meeting? Or do we need to have a
> separate formalized voting thread which is guided by a PMC?
> >
> > My suggestion is to try to make concrete steps forward and to avoid
> letting this slip through the cracks.
> >
> > I also think there would be merits to having a project plan and
> estimates around how long each of the features we want to complete is going
> to take to implement and review.
> >
> > -Matt Cheah
> >
> > On 2/24/19, 3:05 PM, "Sean Owen"  wrote:
> >
> > Sure, I don't read anyone making these statements though? Let's assume
> > good intent, that "foo should happen" as "my opinion as a member of
> > the community, which is not solely up to me, is that foo should
> > happen". I understand it's possible for a person to make their opinion
> > over-weighted; this whole style of decision making assumes good actors
> > and doesn't optimize against bad ones. Not that it can't happen, just
> > not seeing it here.
> >
> > I have never seen any vote on a feature list, by a PMC or otherwise.
> > We can do that if really needed I guess. But that also isn't the
> > authoritative process in play here, in contrast.
> >
> > If there's not a more specific subtext or issue here, which is fine to
> > say (on private@ if it's sensitive or something), yes, let's move on
> > in good faith.
> >
> > On Sun, Feb 24, 2019 at 3:45 PM Mark 

Spark-History ACLS

2019-02-27 Thread G, Ajay (Nokia - IN/Bangalore)
Hello,

I was trying Spark History Server ACL security on Spark 2.4. I have written an
authentication filter which handles user authentication. This is the Spark
config I have used:

spark.ui.filters com.ag.spark.AuthenticationFilter

spark.acls.enable true
spark.history.ui.acls.enable true
spark.history.ui.admin.acls ajay
spark.history.ui.admin.acls.groups ajay
spark.ui.view.acls ajay
spark.ui.view.acls.groups ajay
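
For context, a minimal sketch of the kind of javax.servlet.Filter that
spark.ui.filters expects (the class name matches the config above; the
header-based authentication step is a placeholder for whatever the real filter
does). Spark's UI ACL checks use the user returned by
HttpServletRequest.getRemoteUser(), so the filter wraps the request:

package com.ag.spark

import javax.servlet.{Filter, FilterChain, FilterConfig, ServletRequest, ServletResponse}
import javax.servlet.http.{HttpServletRequest, HttpServletRequestWrapper, HttpServletResponse}

class AuthenticationFilter extends Filter {
  override def init(config: FilterConfig): Unit = {}
  override def destroy(): Unit = {}

  override def doFilter(req: ServletRequest, res: ServletResponse, chain: FilterChain): Unit = {
    val httpReq = req.asInstanceOf[HttpServletRequest]
    // Placeholder authentication step: trust a header set by an auth proxy.
    Option(httpReq.getHeader("X-Authenticated-User")) match {
      case Some(user) =>
        // Expose the authenticated user so the History Server's ACL checks,
        // which read getRemoteUser, can see it.
        val wrapped = new HttpServletRequestWrapper(httpReq) {
          override def getRemoteUser(): String = user
        }
        chain.doFilter(wrapped, res)
      case None =>
        res.asInstanceOf[HttpServletResponse]
          .sendError(HttpServletResponse.SC_UNAUTHORIZED, "authentication required")
    }
  }
}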


When ACLs are enabled, all users (even users who don't have view permission)
can access /api/v1/applications; securityManager.setAcl() is only called
internally when I hit a specific application ID.
Is this behaviour expected? If yes, can this be documented in the user guide?
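
For example, with a History Server on the default port 18080 (an assumption),
the behaviour described above would look like this: the first request succeeds
for any user, while only the second triggers the per-application ACL check:

curl http://localhost:18080/api/v1/applications
curl http://localhost:18080/api/v1/applications/<app-id>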


Thanks and Regards,
Ajay