Re: DataSourceV2 hangouts sync

2018-11-01 Thread Ryan Blue
Thanks to everyone that attended the sync! We had some good discussions.
Here are my notes for anyone that missed it or couldn’t join the live
stream. If anyone wants to add to this, please send additional thoughts or
corrections.

*Attendees:*

   - Ryan Blue - Netflix - Using v2 to integrate Iceberg with Spark. SQL,
   DDL/schema evolution, delete support, and hidden partitioning working in
   Netflix’s branch.
   - John Zhuge - Netflix - Working on multi-catalog support.
   - Felix Cheung - Uber - Interested in integrating Uber data sources.
   External catalog would be useful.
   - Reynold Xin - Databricks - Working more on SQL and sources
   - Arun M - Hortonworks - Interested in streaming, unified continuous and
   micro-batch modes
   - Dale Richardson - Private developer - Interested in non-FS based
   partitions
   - Dongjoon Hyun - Hortonworks - Looking at ORC support in v2,
   transactional processing on the write side, data lineage
   - Gengliang Wang - Databricks - ORC with v2 API
   - Hyukjin Kwon - Hortonworks - DSv2 API implementation for Hive
   warehouse, LLAP
   - Kevin Yu - IBM - Design for v2
   - Matt Cheah - Palantir - Interested in integrating a custom data store
   - Thomas D’Silva - Salesforce - Interested in a v2 Phoenix connector
   - Wenchen Fan - Databricks - Proposed DSv2
   - Xiao Li - Databricks - SparkSQL, reviews v2 patches
   - Yuanjian Li - Interested in continuous streaming, new catalog API
*Goals for this sync:*

   - Build consensus around the context that affects roadmap planning
   - Set priorities and some reasonable milestones
   - In the future, we’ll open things up to more general technical
   discussions, but we will be more effective if we are aligned first.

*Conclusions:*

   - Current blocker is Wenchen’s API update. Please review the refactor
   proposal and PR #22547.
   - Catalog support: this is a high-priority blocker for SQL and real use
   of DSv2.
      - Please review the TableCatalog SPIP proposal and the implementation,
      PR #21306.
      - A proposal for incrementally introducing catalogs is in PR #21978.
      - Generic and specific catalog support should use the same API.
      - Replacing the current global catalog will be done in parts with
      more specific APIs like TableCatalog and FunctionCatalog.
   - Behavior compatibility:
      - V2 will have well-defined behavior, primarily implemented by Spark
      to ensure consistency across sources (e.g. CTAS).
      - Users of V1 sources should not see behavior changes when sources are
      updated to use V2.
      - Reconciling these concerns is difficult. File-based sources may
      need to implement compatibility hacks, like checking
      spark.sql.sources.partitionOverwriteMode.
      - Explicit overwrite is preferred to automatic partition overwrite.
      This mechanism could be used to translate some behaviors of INSERT
      OVERWRITE ... PARTITION for V2 sources.
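
To make the partition overwrite concern concrete, here is a minimal Java sketch (not Spark code) of how the set of replaced partitions differs between the `static` and `dynamic` values of spark.sql.sources.partitionOverwriteMode. The class and method names (`OverwritePlanner`, `partitionsToReplace`) are hypothetical, and partitions are modeled as plain strings such as `day=1/hour=0`; a real file-based source would work with its own partition metadata.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only; not part of any Spark API.
final class OverwritePlanner {
    enum Mode { STATIC, DYNAMIC }

    // True if the partition falls under the static part of the PARTITION clause.
    // An empty spec means the statement names no static partition values.
    static boolean matchesStaticSpec(String partition, String staticSpec) {
        return staticSpec.isEmpty()
            || partition.equals(staticSpec)
            || partition.startsWith(staticSpec + "/");
    }

    // Returns the existing partitions that an INSERT OVERWRITE would replace.
    //   existing:   partitions currently in the table
    //   staticSpec: static part of the PARTITION clause, e.g. "day=1"
    //   written:    partitions the job actually produced data for
    static Set<String> partitionsToReplace(Mode mode, List<String> existing,
                                           String staticSpec, List<String> written) {
        Set<String> replaced = new TreeSet<>();
        for (String partition : existing) {
            if (!matchesStaticSpec(partition, staticSpec)) {
                continue; // outside the clause: untouched in either mode
            }
            // Static mode replaces everything under the clause; dynamic mode
            // replaces only the partitions that receive new data.
            if (mode == Mode.STATIC || written.contains(partition)) {
                replaced.add(partition);
            }
        }
        return replaced;
    }
}
```

With existing partitions day=1/hour=0, day=1/hour=1 and day=2/hour=0, a static spec of day=1, and a job that writes only day=1/hour=1, the static mode replaces both day=1 partitions while the dynamic mode replaces only day=1/hour=1. An explicit overwrite would instead state the replaced set directly rather than deriving it from a session setting.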

*Context that affects roadmap planning:*

This is a *summary* of the long discussion, not quotes. It may not be in
the right order, but I think it captures the highlights.

   - The community adopted the SPIP to standardize logical plans and this
   requires a catalog API for sources. With multi-catalog support also
   coming soon, it makes sense to incorporate this in the design and
   planning.
      - Wenchen mentioned that there are two types of catalogs. First,
      generic catalogs that can track tables with configurable
      implementations (like the current catalog that can track Parquet,
      JDBC, JSON, etc. tables). Second, there are specific catalogs that
      expose a certain type of table (like a JDBC catalog that exposes all
      of the tables in a relational DB or a Cassandra catalog).
      - Ryan: we should be able to use the same catalog API for both of
      these use cases.
      - Reynold: Databricks is interested in a catalog API and it should
      replace the existing API. Replacing the existing API is difficult
      because there are many concerns, like tracking functions. The current
      API is complicated and may be difficult to replace.
      - Ryan: Past discussions have suggested replacing the current catalog
      API in pieces, like the proposed TableCatalog API for named tables, a
      FunctionCatalog API to track UDFs, and a PathCatalog for path-based
      tables. I’ve
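
The point that generic and specific catalogs can share one narrow API can be sketched roughly as follows. Everything in this Java sketch is hypothetical: `SketchTableCatalog`, `TableInfo`, `GenericCatalog`, `SingleSourceCatalog` and `CatalogResolver` are illustration names, not the interfaces from the TableCatalog SPIP or PR #21306, and a table is reduced to a name plus a provider string.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a table: its name and the source that serves it.
final class TableInfo {
    final String name;
    final String provider;
    TableInfo(String name, String provider) {
        this.name = name;
        this.provider = provider;
    }
}

// Hypothetical narrow API for named tables only. Functions and path-based
// tables would live in separate interfaces.
interface SketchTableCatalog {
    Optional<TableInfo> loadTable(String name);
}

// Generic catalog: tracks tables of any provider in its own metadata.
final class GenericCatalog implements SketchTableCatalog {
    private final Map<String, TableInfo> tables = new HashMap<>();

    void register(String name, String provider) {
        tables.put(name, new TableInfo(name, provider));
    }

    @Override
    public Optional<TableInfo> loadTable(String name) {
        return Optional.ofNullable(tables.get(name));
    }
}

// Specific catalog: exposes the tables of one external system, all with the
// same provider.
final class SingleSourceCatalog implements SketchTableCatalog {
    private final String provider;
    private final Set<String> remoteTables;

    SingleSourceCatalog(String provider, Set<String> remoteTables) {
        this.provider = provider;
        this.remoteTables = new HashSet<>(remoteTables);
    }

    @Override
    public Optional<TableInfo> loadTable(String name) {
        if (remoteTables.contains(name)) {
            return Optional.of(new TableInfo(name, provider));
        }
        return Optional.empty();
    }
}

// Resolves "catalog.table" identifiers against whichever catalog is
// registered under that name, without caring which kind it is.
final class CatalogResolver {
    private final Map<String, SketchTableCatalog> catalogs = new HashMap<>();

    void addCatalog(String name, SketchTableCatalog catalog) {
        catalogs.put(name, catalog);
    }

    Optional<TableInfo> resolve(String identifier) {
        int dot = identifier.indexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        SketchTableCatalog catalog = catalogs.get(identifier.substring(0, dot));
        if (catalog == null) {
            return Optional.empty();
        }
        return catalog.loadTable(identifier.substring(dot + 1));
    }
}
```

The resolver only depends on the small table interface, which is the property that would let the current global catalog be replaced piece by piece.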

Re: DataSourceV2 hangouts sync

2018-10-31 Thread Arun Mahadevan
Thanks for bringing up the custom metrics API on the list; it's something
that needs to be addressed.

A couple more items worth considering:

1. Possibility to unify the batch, micro-batch and continuous sources
(similar to SPARK-25000).
Right now there is significant code duplication, even between the
micro-batch and continuous sources.
We could attempt a redesign such that a single implementation could
potentially work across modes (by implementing the relevant APIs).
2. Better framework support for end-to-end exactly-once in
streaming (maybe framework-level support for 2PC).
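
As a rough illustration of item 2, a framework-level two-phase commit could look something like the Java sketch below. This is only a sketch under stated assumptions: `TwoPhaseParticipant`, `TwoPhaseCoordinator` and `RecordingParticipant` are hypothetical names, not existing Spark interfaces, and failure handling (timeouts, coordinator crashes, retries) is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant in a two-phase commit, e.g. one sink writer.
interface TwoPhaseParticipant {
    boolean prepare(long epochId); // stage data durably; true means "ready to commit"
    void commit(long epochId);     // make the staged data visible
    void abort(long epochId);      // discard the staged data
}

final class TwoPhaseCoordinator {
    // Runs one epoch. Returns true if every participant committed,
    // false if the epoch was aborted.
    static boolean runEpoch(long epochId,
                            List<? extends TwoPhaseParticipant> participants) {
        List<TwoPhaseParticipant> contacted = new ArrayList<>();
        // Phase 1: ask every participant to prepare.
        for (TwoPhaseParticipant p : participants) {
            contacted.add(p);
            if (!p.prepare(epochId)) {
                // One "no" vote aborts everyone contacted so far,
                // including the participant that voted no.
                for (TwoPhaseParticipant q : contacted) {
                    q.abort(epochId);
                }
                return false;
            }
        }
        // Phase 2: all voted yes, so commit everywhere.
        for (TwoPhaseParticipant p : participants) {
            p.commit(epochId);
        }
        return true;
    }
}

// Toy participant that records the calls it receives, used only to
// exercise the coordinator.
final class RecordingParticipant implements TwoPhaseParticipant {
    private final boolean vote;
    final List<String> log = new ArrayList<>();

    RecordingParticipant(boolean vote) {
        this.vote = vote;
    }

    @Override
    public boolean prepare(long epochId) {
        log.add("prepare");
        return vote;
    }

    @Override
    public void commit(long epochId) {
        log.add("commit");
    }

    @Override
    public void abort(long epochId) {
        log.add("abort");
    }
}
```

If every participant votes yes, each one sees prepare followed by commit; if any participant votes no, every participant contacted up to that point sees prepare followed by abort and nothing becomes visible.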

Thanks,
Arun




Re: DataSourceV2 hangouts sync

2018-10-30 Thread Wenchen Fan
Hi all,

I spent some time thinking about the roadmap, and came up with an initial
list:
SPARK-25390: data source V2 API refactoring
SPARK-24252: add catalog support
SPARK-25531: new write APIs for data source v2
SPARK-25190: better operator pushdown API
Streaming rate control API
Custom metrics API
Migrate existing data sources
Move data source v2 and built-in implementations to individual modules.


Let's have more discussion over the hangout.

Thanks,
Wenchen



Re: DataSourceV2 hangouts sync

2018-10-29 Thread Ryan Blue
Everyone,

There are now 25 guests invited, which is a lot of people to actively
participate in a sync like this.

For those of you who probably won't actively participate, I've added a live
stream. If you don't plan to talk, please use the live stream instead of
the meet/hangout so that we don't end up with so many people that we can't
actually get the discussion going. Here's a link to the stream:

https://stream.meet.google.com/stream/6be59d80-04c7-44dc-9042-4f3b597fc8ba

Thanks!

rb

On Thu, Oct 25, 2018 at 1:09 PM Ryan Blue  wrote:

> Hi everyone,
>
> There's been some great discussion for DataSourceV2 in the last few
> months, but it has been difficult to resolve some of the discussions and I
> don't think that we have a very clear roadmap for getting the work done.
>
> To coordinate better as a community, I'd like to start a regular sync-up
> over google hangouts. We use this in the Parquet community to have more
> effective community discussions about thorny technical issues and to get
> aligned on an overall roadmap. It is really helpful in that community and I
> think it would help us get DSv2 done more quickly.
>
> Here's how it works: people join the hangout, we go around the list to
> gather topics, have about an hour-long discussion, and then send a summary
> of the discussion to the dev list for anyone that couldn't participate.
> That way we can move topics along, but we keep the broader community in the
> loop as well for further discussion on the mailing list.
>
> I'll volunteer to set up the sync and send invites to anyone that wants to
> attend. If you're interested, please reply with the email address you'd
> like to put on the invite list (if there's a way to do this without
> specific invites, let me know). Also for the first sync, please note what
> times would work for you so we can try to account for people in different
> time zones.
>
> For the first one, I was thinking some day next week (time TBD by those
> interested) and starting off with a general roadmap discussion before
> diving into specific technical topics.
>
> Thanks,
>
> rb
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


-- 
Ryan Blue
Software Engineer
Netflix


Re: DataSourceV2 hangouts sync

2018-10-28 Thread Russell Spitzer
Responding for invite

On Fri, Oct 26, 2018, 12:34 PM Ryan Blue  wrote:

> Looks like the majority opinion is for Wednesday. I've sent out an invite
> to everyone that replied and will add more people as I hear more responses.
>
> Thanks, everyone!
>
> On Fri, Oct 26, 2018 at 3:23 AM Gengliang Wang  wrote:
>
>> +1
>>
>> On Oct 26, 2018, at 8:45 AM, Hyukjin Kwon  wrote:
>>
>> I didn't know I live in the same timezone with you Wenchen :D.
>> Monday or Wednesday at 5PM PDT sounds good to me too FWIW.
>>
>> On Fri, Oct 26, 2018 at 8:29 AM, Ryan Blue wrote:
>>
>>> Good point. How about Monday or Wednesday at 5PM PDT then?
>>>
>>> Everyone, please reply to me (no need to spam the list) with which
>>> option works for you and I'll send an invite for the one with the most
>>> votes.
>>>
>>> On Thu, Oct 25, 2018 at 5:14 PM Wenchen Fan wrote:
>>>
>>>> Friday at the bay area is Saturday at my side, it will be great if we
>>>> can pick a day from Monday to Thursday.
>>>>
>>>> On Fri, Oct 26, 2018 at 8:08 AM Ryan Blue wrote:
>>>>
>>>>> Since not many people have replied with a time window, how about we
>>>>> aim for 5PM PDT? That should work for Wenchen and most people here in
>>>>> the bay area.
>>>>>
>>>>> If that makes it so some people can't attend, we can do the next one
>>>>> earlier for people in Europe.
>>>>>
>>>>> If we go with 5PM PDT, then what day works best for everyone?
>>>>>
>>>>> On Thu, Oct 25, 2018 at 5:01 PM Wenchen Fan wrote:
>>>>>
>>>>>> Big +1 on this!
>>>>>>
>>>>>> I live in UTC+8 and I'm available from 8 am, which is 5 pm in the bay
>>>>>> area. Hopefully we can coordinate a time that fits everyone.
>>>>>>
>>>>>> Thanks
>>>>>> Wenchen
>>>>>>
>>>>>> On Fri, Oct 26, 2018 at 7:21 AM Dongjoon Hyun <
>>>>>> dongjoon.h...@gmail.com> wrote:
>>>>>>
>>>>>>> +1. Thank you for volunteering, Ryan!
>>>>>>>
>>>>>>> Bests,
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>> On Thu, Oct 25, 2018 at 4:19 PM Xiao Li wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> On Thu, Oct 25, 2018 at 4:16 PM, Reynold Xin wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> On Thu, Oct 25, 2018 at 4:12 PM Li Jin wrote:
>>>>>>>>>
>>>>>>>>>> Although I am not specifically involved in DSv2, I think having
>>>>>>>>>> this kind of meeting is definitely helpful to discuss, move
>>>>>>>>>> certain effort forward and keep people on the same page. Glad to
>>>>>>>>>> see this kind of working group happening.
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 25, 2018 at 5:58 PM John Zhuge wrote:
>>>>>>>>>>
>>>>>>>>>>> Great idea!
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> John Zhuge

Re: DataSourceV2 hangouts sync

2018-10-26 Thread Ryan Blue
Looks like the majority opinion is for Wednesday. I've sent out an invite
to everyone that replied and will add more people as I hear more responses.

Thanks, everyone!


-- 
Ryan Blue
Software Engineer
Netflix


Re: DataSourceV2 hangouts sync

2018-10-26 Thread Gengliang Wang
+1




Re: DataSourceV2 hangouts sync

2018-10-25 Thread Saikat Kanjilal
Ditto, I’d also like to join. I’m in Seattle, and afternoons generally work
better for me.



Re: DataSourceV2 hangouts sync

2018-10-25 Thread Hyukjin Kwon
I didn't know I live in the same timezone with you Wenchen :D.
Monday or Wednesday at 5PM PDT sounds good to me too FWIW.



Re: DataSourceV2 hangouts sync

2018-10-25 Thread Ryan Blue
Good point. How about Monday or Wednesday at 5PM PDT then?

Everyone, please reply to me (no need to spam the list) with which option
works for you and I'll send an invite for the one with the most votes.

On Thu, Oct 25, 2018 at 5:14 PM Wenchen Fan  wrote:

> Friday in the Bay Area is Saturday on my side, so it would be great if we
> could pick a day from Monday to Thursday.

-- 
Ryan Blue
Software Engineer
Netflix


Re: DataSourceV2 hangouts sync

2018-10-25 Thread Wenchen Fan
Friday in the Bay Area is Saturday on my side, so it would be great if we
could pick a day from Monday to Thursday.

On Fri, Oct 26, 2018 at 8:08 AM Ryan Blue  wrote:

> Since not many people have replied with a time window, how about we aim
> for 5PM PDT? That should work for Wenchen and most people here in the bay
> area.
>
> If that makes it so some people can't attend, we can do the next one
> earlier for people in Europe.
>
> If we go with 5PM PDT, then what day works best for everyone?
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


Re: DataSourceV2 hangouts sync

2018-10-25 Thread Ryan Blue
Since not many people have replied with a time window, how about we aim for
5PM PDT? That should work for Wenchen and most people here in the bay area.

If that makes it so some people can't attend, we can do the next one
earlier for people in Europe.

If we go with 5PM PDT, then what day works best for everyone?

On Thu, Oct 25, 2018 at 5:01 PM Wenchen Fan  wrote:

> Big +1 on this!
>
> I live in UTC+8 and I'm available from 8 am, which is 5 pm in the bay
> area. Hopefully we can coordinate a time that fits everyone.
>
> Thanks
> Wenchen

-- 
Ryan Blue
Software Engineer
Netflix


Re: DataSourceV2 hangouts sync

2018-10-25 Thread Wenchen Fan
Big +1 on this!

I live in UTC+8 and I'm available from 8 am, which is 5 pm in the bay area.
Hopefully we can coordinate a time that fits everyone.

Thanks
Wenchen



On Fri, Oct 26, 2018 at 7:21 AM Dongjoon Hyun 
wrote:

> +1. Thank you for volunteering, Ryan!
>
> Bests,
> Dongjoon.



Re: DataSourceV2 hangouts sync

2018-10-25 Thread Hyukjin Kwon
+1 !

On Fri, Oct 26, 2018 at 7:21 AM Dongjoon Hyun wrote:

> +1. Thank you for volunteering, Ryan!
>
> Bests,
> Dongjoon.



Re: DataSourceV2 hangouts sync

2018-10-25 Thread Dongjoon Hyun
+1. Thank you for volunteering, Ryan!

Bests,
Dongjoon.


On Thu, Oct 25, 2018 at 4:19 PM Xiao Li  wrote:

> +1


Re: DataSourceV2 hangouts sync

2018-10-25 Thread Xiao Li
+1

On Thu, Oct 25, 2018 at 4:16 PM Reynold Xin wrote:

> +1


Re: DataSourceV2 hangouts sync

2018-10-25 Thread Reynold Xin
+1



On Thu, Oct 25, 2018 at 4:12 PM Li Jin  wrote:

> Although I am not specifically involved in DSv2, I think having this kind
> of meeting is definitely helpful to discuss, move certain effort forward
> and keep people on the same page. Glad to see this kind of working group
> happening.


Re: DataSourceV2 hangouts sync

2018-10-25 Thread Li Jin
Although I am not specifically involved in DSv2, I think having this kind
of meeting is definitely helpful to discuss, move certain effort forward
and keep people on the same page. Glad to see this kind of working group
happening.

On Thu, Oct 25, 2018 at 5:58 PM John Zhuge  wrote:

> Great idea!
>
> --
> John Zhuge
>


Re: DataSourceV2 hangouts sync

2018-10-25 Thread John Zhuge
Great idea!

On Thu, Oct 25, 2018 at 1:10 PM Ryan Blue  wrote:

> Hi everyone,
>
> There's been some great discussion for DataSourceV2 in the last few
> months, but it has been difficult to resolve some of the discussions and I
> don't think that we have a very clear roadmap for getting the work done.
>
> To coordinate better as a community, I'd like to start a regular sync-up
> over google hangouts. We use this in the Parquet community to have more
> effective community discussions about thorny technical issues and to get
> aligned on an overall roadmap. It is really helpful in that community and I
> think it would help us get DSv2 done more quickly.
>
> Here's how it works: people join the hangout, we go around the list to
> gather topics, have about an hour-long discussion, and then send a summary
> of the discussion to the dev list for anyone that couldn't participate.
> That way we can move topics along, but we keep the broader community in the
> loop as well for further discussion on the mailing list.
>
> I'll volunteer to set up the sync and send invites to anyone that wants to
> attend. If you're interested, please reply with the email address you'd
> like to put on the invite list (if there's a way to do this without
> specific invites, let me know). Also for the first sync, please note what
> times would work for you so we can try to account for people in different
> time zones.
>
> For the first one, I was thinking some day next week (time TBD by those
> interested) and starting off with a general roadmap discussion before
> diving into specific technical topics.
>
> Thanks,
>
> rb
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


-- 
John Zhuge


Re: DataSourceV2 hangouts sync

2018-10-25 Thread Felix Cheung
Yes please!



From: Ryan Blue 
Sent: Thursday, October 25, 2018 1:10 PM
To: Spark Dev List
Subject: DataSourceV2 hangouts sync

Hi everyone,

There's been some great discussion for DataSourceV2 in the last few months, but 
it has been difficult to resolve some of the discussions and I don't think that 
we have a very clear roadmap for getting the work done.

To coordinate better as a community, I'd like to start a regular sync-up over 
google hangouts. We use this in the Parquet community to have more effective 
community discussions about thorny technical issues and to get aligned on an 
overall roadmap. It is really helpful in that community and I think it would 
help us get DSv2 done more quickly.

Here's how it works: people join the hangout, we go around the list to gather 
topics, have about an hour-long discussion, and then send a summary of the 
discussion to the dev list for anyone that couldn't participate. That way we 
can move topics along, but we keep the broader community in the loop as well 
for further discussion on the mailing list.

I'll volunteer to set up the sync and send invites to anyone that wants to 
attend. If you're interested, please reply with the email address you'd like to 
put on the invite list (if there's a way to do this without specific invites, 
let me know). Also for the first sync, please note what times would work for 
you so we can try to account for people in different time zones.

For the first one, I was thinking some day next week (time TBD by those 
interested) and starting off with a general roadmap discussion before diving 
into specific technical topics.

Thanks,

rb

--
Ryan Blue
Software Engineer
Netflix


DataSourceV2 hangouts sync

2018-10-25 Thread Ryan Blue
Hi everyone,

There's been some great discussion for DataSourceV2 in the last few months,
but it has been difficult to resolve some of the discussions and I don't
think that we have a very clear roadmap for getting the work done.

To coordinate better as a community, I'd like to start a regular sync-up
over google hangouts. We use this in the Parquet community to have more
effective community discussions about thorny technical issues and to get
aligned on an overall roadmap. It is really helpful in that community and I
think it would help us get DSv2 done more quickly.

Here's how it works: people join the hangout, we go around the list to
gather topics, have about an hour-long discussion, and then send a summary
of the discussion to the dev list for anyone that couldn't participate.
That way we can move topics along, but we keep the broader community in the
loop as well for further discussion on the mailing list.

I'll volunteer to set up the sync and send invites to anyone that wants to
attend. If you're interested, please reply with the email address you'd
like to put on the invite list (if there's a way to do this without
specific invites, let me know). Also for the first sync, please note what
times would work for you so we can try to account for people in different
time zones.

For the first one, I was thinking some day next week (time TBD by those
interested) and starting off with a general roadmap discussion before
diving into specific technical topics.

Thanks,

rb

-- 
Ryan Blue
Software Engineer
Netflix