GSOC - Implement an S3 filesystem for Python SDK

2019-03-12 Thread Pasan Kamburugamuwa
Hi, I am a 3rd-year Software Engineering undergraduate at the Sri Lanka Institute of Information Technology (SLIIT), Sri Lanka. I am interested in this project for GSOC 2019. I have gone through the document and I would like to dive deep into the codebase. So can you please point me to any relevan

Scio v0.7.3 released!

2019-03-12 Thread Filipe Regadas
Hi all! We just released Scio v0.7.3. This version includes only bug fixes and improvements. https://github.com/spotify/scio/releases/tag/v0.7.3 "Vulpes Vulpes" Bug Fixes & Improvements: Fix FileStorage.avroFile (#1727); Fix perf regression in Coder (

Re: Beam Meetups Feb 2019

2019-03-12 Thread Austin Bennett
Hi Teja and All, The video recordings from the recent SF meetup have been posted to the Beam YouTube channel (thanks, Matthias!). Links: General Beam YouTube: https://www.youtube.com/channel/UChNnb_YO_7B0HlW6FhAXZZQ *Beam Introduction*: https://www.youtube.com/watch?v=Ao2NM8rvKZY *TFX*: https:

WindowTypeDescriptor of output PCollection emitted from GroupByKey

2019-03-12 Thread rahul patwari
Hi, I am exploring session windowing in Apache Beam. I have created a pipeline to find the window start time and window end time of the elements emitted from a GroupByKey that groups elements to which session windows were applied. I got the Exception: Exception in thread "main" java.

Re: GSOC - Apache Beam Python SDK

2019-03-12 Thread Pablo Estrada
Oh, if you are not yet subscribed to the ASF slack, you can do so here: https://s.apache.org/slack-invite On Tue, Mar 12, 2019 at 10:30 AM Pablo Estrada wrote: > Hi Pasan! > Welcome to Apache Beam. Happy to have your interest. Can you share what > are your specific questions about the topic? My

Re: GSOC - Apache Beam Python SDK

2019-03-12 Thread Pablo Estrada
Hi Pasan! Welcome to Apache Beam. Happy to have your interest. Can you share what your specific questions about the topic are? My initial advice would be to study the filesystems[1] packages of Beam, and the GCS filesystem[2]. As a piece of advice, you can find us in the ASF Slack: https://s.apache

Performance of stateful DoFn vs CombineByKey

2019-03-12 Thread Steve Niemitz
Hi all. I'm curious if anyone has done any comparison of the performance of a pipeline that uses CombineByKey, vs one that uses a stateful DoFn with combining state. [1] More specifically, if I had a pipeline that had a CombineByKey configured with early firings every N minutes, and I replaced th

GSOC - Apache Beam Python SDK

2019-03-12 Thread Pasan Kamburugamuwa
I am a 3rd-year undergraduate at the Sri Lanka Institute of Information Technology. I would like to do an internship with your organization, and I am highly interested in the Apache Beam project. I want to know how to start contributing to this project. I have gone through the docum