Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Eugene Kirpichov
There will be, but not yet. I'll update the thread when the conference
tells me the video is available.

On Thu, Mar 8, 2018, 7:14 PM OrielResearch Eila Arich-Landkof <
e...@orielresearch.org> wrote:

> Hi Eugene,
> is there a video that I can watch?
>
> Many thanks,
> Eila


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread OrielResearch Eila Arich-Landkof
Hi Eugene,
is there a video that I can watch?

Many thanks,
Eila

On Thu, Mar 8, 2018 at 2:49 PM, Eugene Kirpichov 
wrote:

> Hey all,
>
> The slides for my talk yesterday at Strata San Jose
> https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63696
> have been posted on the talk page. They may be of interest to both users
> and IO authors.
>
> Thanks.
>



-- 
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Chamikara Jayalath
Great talk, Eugene.

Ted, will share more info on Kafka IO for Python soon :)

- Cham

On Thu, Mar 8, 2018 at 4:55 PM Ted Yu  wrote:

> I see.
>
> I have added myself as watcher on BEAM-3788.
>
> Thanks


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Ted Yu
I see.

I have added myself as watcher on BEAM-3788.

Thanks

On Thu, Mar 8, 2018 at 4:51 PM, Eugene Kirpichov 
wrote:

> Hi Ted - KafkaIO is not yet implemented using Splittable DoFns (it was
> implemented before SDFs existed and hasn't been rewritten yet), but it will
> be once more runners support SDFs; currently Dataflow and Flink do.
> +Chamikara Jayalath is currently working on implementing KafkaIO using
> SDFs in the Python SDK.


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Eugene Kirpichov
Hi Ted - KafkaIO is not yet implemented using Splittable DoFns (it was
implemented before SDFs existed and hasn't been rewritten yet), but it will
be once more runners support SDFs; currently Dataflow and Flink do.
+Chamikara Jayalath is currently working on implementing KafkaIO using SDFs
in the Python SDK.

On Thu, Mar 8, 2018 at 4:34 PM Ted Yu  wrote:

> Eugene:
> Very informative talk.
>
> I looked at:
>
> sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTrackerTest.java
>
> Is there an example showing how OffsetRangeTracker works with Kafka
> partitions?
>
> Thanks


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Ted Yu
Eugene:
Very informative talk.

I looked at:
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTrackerTest.java

Is there an example showing how OffsetRangeTracker works with Kafka
partitions?

Thanks

On Thu, Mar 8, 2018 at 3:58 PM, Eugene Kirpichov 
wrote:

> Hi Thomas!
>
> In the case of tailing a Kafka partition, the restriction would be
> [start_offset, infinity), and it would keep being split by checkpointing
> into [start_offset, end_offset) and [end_offset, infinity).


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Eugene Kirpichov
Hi Thomas!

In the case of tailing a Kafka partition, the restriction would be
[start_offset, infinity), and it would keep being split by checkpointing
into [start_offset, end_offset) and [end_offset, infinity).
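To make the restriction arithmetic above concrete, here is a minimal Python sketch. It is not Beam's actual OffsetRangeTracker API; `OffsetRange`, `split_bounded`, `checkpoint` and `tail_partition` are hypothetical names used only for illustration. A bounded range (one with an end offset) can be pre-split into pieces read in parallel, while an unbounded tail read [start_offset, infinity) is instead cut at each checkpoint, leaving a residual that is again unbounded.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical half-open restriction [start, stop) over partition offsets.

    stop is math.inf for a tail read of a partition that is still growing.
    """
    start: int
    stop: float

    def is_unbounded(self) -> bool:
        return math.isinf(self.stop)


def split_bounded(restriction: OffsetRange, num_parts: int) -> List[OffsetRange]:
    """Pre-split a bounded range into contiguous pieces that can be read in parallel."""
    if restriction.is_unbounded():
        raise ValueError("an unbounded range cannot be pre-split; checkpoint it instead")
    size = restriction.stop - restriction.start
    bounds = [restriction.start + size * i // num_parts for i in range(num_parts + 1)]
    # Drop empty pieces that appear when num_parts exceeds the range size.
    return [OffsetRange(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]


def checkpoint(restriction: OffsetRange, next_offset: int) -> Tuple[OffsetRange, OffsetRange]:
    """Split at next_offset, the first offset not yet processed (hypothetical helper).

    Returns (primary, residual): primary [start, next_offset) is the finished work,
    residual [next_offset, stop) is resumed later. For a tail read the residual
    keeps stop == math.inf, so no new restriction is needed per record.
    """
    if not restriction.start <= next_offset <= restriction.stop:
        raise ValueError(f"offset {next_offset} is outside {restriction}")
    return (OffsetRange(restriction.start, next_offset),
            OffsetRange(next_offset, restriction.stop))


def tail_partition(start_offset: int,
                   checkpoint_offsets: Iterable[int]) -> Tuple[List[OffsetRange], OffsetRange]:
    """Simulate tailing a partition, checkpointing at each given offset."""
    residual = OffsetRange(start_offset, math.inf)
    completed = []
    for offset in checkpoint_offsets:
        primary, residual = checkpoint(residual, offset)
        completed.append(primary)
    return completed, residual
```

For example, `tail_partition(100, [150, 175, 230])` produces the completed restrictions [100, 150), [150, 175) and [175, 230), and leaves the residual [230, infinity) to be resumed. The single unbounded restriction is simply cut at each checkpoint rather than replaced whenever new records arrive.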

On Thu, Mar 8, 2018 at 3:52 PM Thomas Weise  wrote:

> Eugene,
>
> I actually had one question regarding the application of SDF to the Kafka
> consumer. Reading through a topic partition can be parallelized by splitting
> a partition into multiple restrictions (for use cases where order does not
> matter). But how would the tail read be managed? I assume there would not
> be a new restriction whenever new records arrive (added latency)? The
> examples on slide 40 show an end offset for Kafka, but for a continuous
> read there wouldn't be an end offset?
>
> Thanks,
> Thomas


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Thomas Weise
Eugene,

I actually had one question regarding the application of SDF to the Kafka
consumer. Reading through a topic partition can be parallelized by splitting
a partition into multiple restrictions (for use cases where order does not
matter). But how would the tail read be managed? I assume there would not
be a new restriction whenever new records arrive (added latency)? The
examples on slide 40 show an end offset for Kafka, but for a continuous
read there wouldn't be an end offset?

Thanks,
Thomas


On Thu, Mar 8, 2018 at 2:59 PM, Thomas Weise  wrote:

> Great, thanks for sharing!


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Thomas Weise
Great, thanks for sharing!


On Thu, Mar 8, 2018 at 12:16 PM, Eugene Kirpichov 
wrote:

> Oops that's just the template I used. Thanks for noticing, will regenerate
> the PDF and reupload when I get to it.


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Raghu Angadi
Terrific! Thanks, Eugene. The slides alone are so good; I can't wait for the
video. Do you know when it might be available?


On Thu, Mar 8, 2018 at 12:16 PM Eugene Kirpichov 
wrote:

> Oops that's just the template I used. Thanks for noticing, will regenerate
> the PDF and reupload when I get to it.


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Ismaël Mejía
Excellent, I loved the 'Nobody writes a paper about their IO API' line. IO is
such an important but undervalued part of Big Data, which is kind of ironic.
Great work, Eugene!

On Thu, Mar 8, 2018 at 9:40 PM, Kenneth Knowles  wrote:
> Love it. Great flashy title, too :-)


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Kenneth Knowles
Love it. Great flashy title, too :-)

On Thu, Mar 8, 2018 at 12:16 PM Eugene Kirpichov 
wrote:

> Oops that's just the template I used. Thanks for noticing, will regenerate
> the PDF and reupload when I get to it.


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Eugene Kirpichov
Oops that's just the template I used. Thanks for noticing, will regenerate
the PDF and reupload when I get to it.

On Thu, Mar 8, 2018, 11:59 AM Dan Halperin  wrote:

> Looks like it was a good talk! Why is it Google Confidential &
> Proprietary, though?
>
> Dan


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Lukasz Cwik
I really like slide 19:
Author: "I made a bigdata programming model"
Reader: "Cool, how does data get in and out?"
Author: "Brb"

On Thu, Mar 8, 2018 at 11:49 AM, Eugene Kirpichov 
wrote:

> Hey all,
>
> The slides for my talk yesterday at Strata San Jose
> https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63696
> have been posted on the talk page. They may be of interest to both users
> and IO authors.
>
> Thanks.
>


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Dan Halperin
Looks like it was a good talk! Why is it Google Confidential & Proprietary,
though?

Dan

On Thu, Mar 8, 2018 at 11:49 AM, Eugene Kirpichov 
wrote:

> Hey all,
>
> The slides for my talk yesterday at Strata San Jose
> https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63696
> have been posted on the talk page. They may be of interest to both users
> and IO authors.
>
> Thanks.
>


"Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Eugene Kirpichov
Hey all,

The slides for my talk yesterday at Strata San Jose
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63696
have been posted on the talk page. They may be of interest to both users
and IO authors.

Thanks.