Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
There will be, but not yet. I'll update the thread when the conference tells me the video is available.

On Thu, Mar 8, 2018, 7:14 PM OrielResearch Eila Arich-Landkof <e...@orielresearch.org> wrote:
> Hi Eugene,
>
> is there a video that I can watch?
>
> Many thanks,
> Eila
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Hi Eugene,

is there a video that I can watch?

Many thanks,
Eila

On Thu, Mar 8, 2018 at 2:49 PM, Eugene Kirpichov wrote:
> Hey all,
>
> The slides for my talk yesterday at Strata San Jose
> https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63696
> have been posted on the talk page. They may be of interest both to users
> and IO authors.
>
> Thanks.

--
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Great talk, Eugene.

Ted, will share more info on Kafka IO for Python soon :)

- Cham

On Thu, Mar 8, 2018 at 4:55 PM Ted Yu wrote:
> I see.
>
> I have added myself as watcher on BEAM-3788.
>
> Thanks
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
I see.

I have added myself as watcher on BEAM-3788.

Thanks

On Thu, Mar 8, 2018 at 4:51 PM, Eugene Kirpichov wrote:
> Hi Ted - KafkaIO is not yet implemented using Splittable DoFns (it was
> implemented before SDFs existed and hasn't been rewritten yet), but it will
> be, once more runners catch up with the support: currently we have Dataflow
> and Flink. +Chamikara Jayalath is currently working on implementing it
> using SDFs in the Python SDK.
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Hi Ted - KafkaIO is not yet implemented using Splittable DoFns (it was implemented before SDFs existed and hasn't been rewritten yet), but it will be, once more runners catch up with the support: currently we have Dataflow and Flink. +Chamikara Jayalath is currently working on implementing it using SDFs in the Python SDK.

On Thu, Mar 8, 2018 at 4:34 PM Ted Yu wrote:
> Eugene:
> Very informative talk.
>
> I looked at:
> sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTrackerTest.java
>
> Is there some example showing how OffsetRangeTracker works with Kafka
> partition(s)?
>
> Thanks
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Eugene:
Very informative talk.

I looked at:
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTrackerTest.java

Is there some example showing how OffsetRangeTracker works with Kafka partition(s)?

Thanks

On Thu, Mar 8, 2018 at 3:58 PM, Eugene Kirpichov wrote:
> Hi Thomas!
>
> In case of tailing a Kafka partition, the restriction would be
> [start_offset, infinity), and it would keep being split by checkpointing
> into [start_offset, end_offset) and [end_offset, infinity)
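To make the question concrete, here is a minimal sketch of the claim-before-emit pattern that an offset-range tracker is usually described as enforcing. It is plain Python, not Beam code: `SimpleOffsetTracker`, `read_partition`, and the list of `(offset, value)` pairs standing in for a Kafka partition are all hypothetical names chosen for illustration, and the real OffsetRangeTracker may differ in its details.

```python
from typing import List, Optional, Tuple


class SimpleOffsetTracker:
    """Hypothetical stand-in for an offset-range tracker (NOT Beam's class).

    It guards the half-open range [start, end): a reader must claim an
    offset before emitting the record stored at that offset.
    """

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end
        self.last_claimed: Optional[int] = None

    def try_claim(self, offset: int) -> bool:
        """Return True if the offset lies in the range, False once past the end."""
        if offset < self.start:
            raise ValueError("offset lies before the start of the range")
        if self.last_claimed is not None and offset <= self.last_claimed:
            raise ValueError("offsets must be claimed in increasing order")
        if offset >= self.end:
            return False  # outside the restriction: the reader must stop
        self.last_claimed = offset
        return True


def read_partition(records: List[Tuple[int, str]],
                   tracker: SimpleOffsetTracker) -> List[str]:
    """Emit values whose offsets the tracker allows.

    `records` is an assumed in-memory stand-in for one Kafka partition,
    sorted by offset.
    """
    emitted = []
    for offset, value in records:
        if offset < tracker.start:
            continue  # skip records before the restriction begins
        if not tracker.try_claim(offset):
            break  # first offset beyond the restriction ends the read
        emitted.append(value)
    return emitted
```

For example, reading a five-record partition with `SimpleOffsetTracker(1, 4)` emits only the records at offsets 1, 2 and 3.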
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Hi Thomas!

In case of tailing a Kafka partition, the restriction would be [start_offset, infinity), and it would keep being split by checkpointing into [start_offset, end_offset) and [end_offset, infinity).

On Thu, Mar 8, 2018 at 3:52 PM Thomas Weise wrote:
> Eugene,
>
> I actually had one question regarding the application of SDF for the Kafka
> consumer. Reading through a topic partition can be parallel by splitting a
> partition into multiple restrictions (for use cases where order does not
> matter). But how would the tail read be managed? I assume there would not
> be a new restriction whenever new records arrive (added latency)? The
> examples on slide 40 show an end offset for Kafka, but for a continuous
> read there wouldn't be an end offset?
>
> Thanks,
> Thomas
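As a rough illustration of the split described above, here is a minimal Python sketch. It is not the Beam API: `OffsetRestriction` and `checkpoint` are hypothetical names, and using `math.inf` for "infinity" is an assumption made only for this example.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OffsetRestriction:
    """Hypothetical half-open offset range [start, end)."""
    start: int
    end: float  # math.inf models a tail read with no end offset


def checkpoint(restriction: OffsetRestriction,
               end_offset: int) -> Tuple[OffsetRestriction, OffsetRestriction]:
    """Split a restriction at end_offset.

    Returns (primary, residual), where primary = [start, end_offset) is the
    part already read and residual = [end_offset, end) is the work that
    remains. For a tail read the residual stays unbounded.
    """
    if not restriction.start <= end_offset <= restriction.end:
        raise ValueError("end_offset must lie within the restriction")
    primary = OffsetRestriction(restriction.start, end_offset)
    residual = OffsetRestriction(end_offset, restriction.end)
    return primary, residual


# Tailing from offset 100: each checkpoint closes off what was read so far
# and leaves a new unbounded restriction to resume from.
tail = OffsetRestriction(100, math.inf)
done, tail = checkpoint(tail, 250)
done_again, tail = checkpoint(tail, 400)
```

Under these assumptions no new restriction is created when records arrive; a new pair appears only when a checkpoint is taken, and the residual always remains of the form [end_offset, infinity).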
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Eugene,

I actually had one question regarding the application of SDF for the Kafka consumer. Reading through a topic partition can be parallelized by splitting a partition into multiple restrictions (for use cases where order does not matter). But how would the tail read be managed? I assume there would not be a new restriction whenever new records arrive (added latency)? The examples on slide 40 show an end offset for Kafka, but for a continuous read there wouldn't be an end offset?

Thanks,
Thomas

On Thu, Mar 8, 2018 at 2:59 PM, Thomas Weise wrote:
> Great, thanks for sharing!
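The bounded case mentioned in the question (one partition split into several restrictions that can be read in parallel) can be sketched as follows. This is plain Python under stated assumptions, not Beam code: `split_offset_range` is a hypothetical helper, and a restriction is modelled simply as a half-open `(start, end)` tuple of offsets.

```python
from typing import List, Tuple


def split_offset_range(start: int, end: int,
                       num_splits: int) -> List[Tuple[int, int]]:
    """Split [start, end) into at most num_splits contiguous half-open ranges.

    Hypothetical helper for illustration only. The ranges cover every offset
    exactly once, so each one can be read independently when record order
    across ranges does not matter.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = end - start
    if total <= 0:
        return []
    num_splits = min(num_splits, total)  # never produce empty ranges
    base, extra = divmod(total, num_splits)
    ranges = []
    lo = start
    for i in range(num_splits):
        # The first `extra` ranges take one additional offset each.
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges
```

This only works when an end offset is known, which is why the tail read needs a different treatment.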
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Great, thanks for sharing!

On Thu, Mar 8, 2018 at 12:16 PM, Eugene Kirpichov wrote:
> Oops that's just the template I used. Thanks for noticing, will regenerate
> the PDF and reupload when I get to it.
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Terrific! Thanks Eugene. Just the slides themselves are so good, can't wait for the video. Do you know when the video might be available?

On Thu, Mar 8, 2018 at 12:16 PM Eugene Kirpichov wrote:
> Oops that's just the template I used. Thanks for noticing, will regenerate
> the PDF and reupload when I get to it.
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Excellent, loved the 'Nobody writes a paper about their IO API'. IO is such an important but less valued part of Big Data, kind of ironic. Great work Eugene!

On Thu, Mar 8, 2018 at 9:40 PM, Kenneth Knowles wrote:
> Love it. Great flashy title, too :-)
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Love it. Great flashy title, too :-)

On Thu, Mar 8, 2018 at 12:16 PM Eugene Kirpichov wrote:
> Oops that's just the template I used. Thanks for noticing, will regenerate
> the PDF and reupload when I get to it.
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Oops that's just the template I used. Thanks for noticing, will regenerate the PDF and reupload when I get to it.

On Thu, Mar 8, 2018, 11:59 AM Dan Halperin wrote:
> Looks like it was a good talk! Why is it Google Confidential &
> Proprietary, though?
>
> Dan
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
I really like slide 19:

Author: "I made a bigdata programming model"
Reader: "Cool, how does data get in and out?"
Author: "Brb"

On Thu, Mar 8, 2018 at 11:49 AM, Eugene Kirpichov wrote:
> Hey all,
>
> The slides for my talk yesterday at Strata San Jose
> https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63696
> have been posted on the talk page. They may be of interest both to users
> and IO authors.
>
> Thanks.
Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Looks like it was a good talk! Why is it Google Confidential & Proprietary, though?

Dan

On Thu, Mar 8, 2018 at 11:49 AM, Eugene Kirpichov wrote:
> Hey all,
>
> The slides for my talk yesterday at Strata San Jose
> https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63696
> have been posted on the talk page. They may be of interest both to users
> and IO authors.
>
> Thanks.
"Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available
Hey all,

The slides for my talk yesterday at Strata San Jose https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63696 have been posted on the talk page. They may be of interest both to users and IO authors.

Thanks.