Re: writing status

2017-08-25 Thread Sourabh Bajaj
I think there are other use cases for this, such as triggering a non-Beam
job when the pipeline is done. A basic case I have seen in the past is
building a dashboard of metrics, such as how much data we are processing
each day and how long the pipeline took.

On Fri, Aug 25, 2017 at 10:43 AM Eugene Kirpichov
 wrote:

> Thanks. A couple of questions:
> - For step 1 (read Avro files with unknown schema), I presume you're using
> AvroIO.parseGenericRecords()?
> - For step 4: why do you need the checkpoint: is it because you actually
> want to continuously ingest new Avro files as they keep appearing? In that
> case you might want to take advantage of
> https://github.com/apache/beam/pull/3725 when it's in.
>
> On Fri, Aug 25, 2017 at 10:39 AM Steve Niemitz 
> wrote:
>
> > Having the sink output something I think is a good option.
> >
> > My use case looks something like this:
> > 1) Read a bunch of avro files (all with the same schema, but the schema
> is
> > not known before hand)
> > 2) Use the avro schema + data to generate bigtable mutations
> > 3) write the mutations
> > 4) *once all files are processed and written *-> update a "checkpoint"
> > marker in another bigtable table, which also depends on the schema from
> > (1).
> >
> > I've hacked this up by making the flow in step 4 rely on an output from
> > step 2 that goes through a GroupBy to ensure that all records are at
> least
> > processed by step 2 before step 4 runs, but there's still a race
> condition
> > between the last record being emitted by step 2 and the write in step 3
> > completing.
> >
> > If as you said, the sink emitted a record when it completed, that'd solve
> > the race condition.
> >
> > In summary: right now the flow looks like this (terrible ASCII attempt):
> >
> > Read Avro Files (extract schema + data) (1)
> >  |
> > V
> > Generate mutations (2)
> >  |> [GroupBy -> Take first -> Generate
> mutation
> > -> Bigtable write] (4)
> > V
> > Write mutations (3)
> >
> >
> > On Fri, Aug 25, 2017 at 11:53 AM, Eugene Kirpichov <
> > kirpic...@google.com.invalid> wrote:
> >
> > > > I'd like to know more about both of your use cases; can you clarify?
> > > > I think making sinks output something that can be waited on by another
> > > > pipeline step is a reasonable request, but more details would help
> > > > refine this suggestion.
> > >
> > > On Fri, Aug 25, 2017, 8:46 AM Chamikara Jayalath  >
> > > wrote:
> > >
> > > > Can you do this from the program that runs the Beam job, after job is
> > > > complete (you might have to use a blocking runner or poll for the
> > status
> > > of
> > > > the job) ?
> > > >
> > > > - Cham
> > > >
> > > > On Fri, Aug 25, 2017 at 8:44 AM Steve Niemitz 
> > > wrote:
> > > >
> > > > > I also have a similar use case (but with BigTable) that I feel
> like I
> > > had
> > > > > to hack up to make work.  It'd be great to hear if there is a way
> to
> > do
> > > > > something like this already, or if there are plans in the future.
> > > > >
> > > > > On Fri, Aug 25, 2017 at 9:46 AM, Chaim Turkel 
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > > >   I have a few pipelines that run ETL from different systems into
> > > > > > > BigQuery. I would like to write the status of the ETL after all
> > > > > > > records have been written to BigQuery.
> > > > > > > The problem is that writing to BigQuery is a sink, and you cannot
> > > > > > > have any other steps after the sink.
> > > > > > > I tried a side output, but it is emitted with no correlation to the
> > > > > > > BigQuery write, so I don't know whether the write succeeded or failed.
> > > > > >
> > > > > >
> > > > > > any ideas?
> > > > > > chaim
> > > > > >
> > > > >
> > > >
> > >
> >
>
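
One way to read Cham's suggestion earlier in this thread (do it from the program that launches the job) is sketched below. This is a minimal, stdlib-only sketch, not Beam code: `run_and_record_status`, `run_job` and `write_status` are hypothetical names, and `run_job` is assumed to block until the job reaches a terminal state and return that state as a string. With the Python SDK, `run_job` could presumably wrap `pipeline.run().wait_until_finish()`, but that wiring is an assumption and is not spelled out in the thread.

```python
import time
from typing import Callable, Dict


def run_and_record_status(
    run_job: Callable[[], str],
    write_status: Callable[[Dict[str, object]], None],
) -> Dict[str, object]:
    """Run a blocking job, then record its outcome exactly once."""
    started = time.time()
    status: Dict[str, object] = {"state": "UNKNOWN", "error": None}
    try:
        # Hypothetical contract: run_job blocks until the job has finished
        # and returns its terminal state, e.g. "DONE".
        status["state"] = run_job()
    except Exception as exc:  # the job failed or could not be launched
        status["state"] = "FAILED"
        status["error"] = str(exc)
    status["seconds"] = round(time.time() - started, 3)
    # This line is only reached after the job has ended, so the status write
    # cannot race with the sink inside the pipeline.
    write_status(status)
    return status
```

This covers the "write a status row" and "trigger a follow-up job" cases. It does not help when the follow-up step needs data produced inside the pipeline, as in Steve's checkpoint case; that still needs the sink to emit something downstream.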


Re: Policy for stale PRs

2017-08-16 Thread Sourabh Bajaj
Some projects I have seen close stale PRs after 30 days, saying "Closing
due to lack of activity, please feel free to re-open".

On Wed, Aug 16, 2017 at 12:05 PM Ahmet Altay 
wrote:

> Sounds like we have consensus. Since this is a new policy, I would suggest
> picking the most flexible option for now (90 days) and we can tighten it in
> the future. To answer Kenn's question, I do not know how other projects
> handle this. I did a basic search but could not find a good answer.
>
> What mechanism can we use to close PRs, assuming that the author will be out
> of communication? We can push a commit with a "This closes #xyz #abc" message.
> Is there another way to do this?
>
> Ahmet
>
> On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur  wrote:
>
> > Makes sense to close after a long time of inactivity and no response, and
> > as Kenn mentioned they can always re-open.
> >
> > On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré 
> > wrote:
> >
> > > If we consider the author, it makes sense.
> > >
> > > Regards
> > > JB
> > >
> > > On Aug 15, 2017, 01:29, at 01:29, Ted Yu  wrote:
> > > >The proposal makes sense.
> > > >
> > > >If the author of PR doesn't respond for 90 days, the PR is likely out
> > > >of
> > > >sync with current repo.
> > > >
> > > >Cheers
> > > >
> > > >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay  >
> > > >wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Do we have an existing policy for handling stale PRs? If not could
> we
> > > >come
> > > >> up with one. We are getting close to 100 open PRs. Some of the open
> > > >PRs
> > > >> have not been touched for a while, and if we exclude the pings the
> > > >number
> > > >> will be higher.
> > > >>
> > > >> For example, we could close PRs that have not been updated by the
> > > >original
> > > >> author for 90 days even after multiple attempts to reach them (e.g.
> > > >[1],
> > > >> [2] are such PRs.)
> > > >>
> > > >> What do you think?
> > > >>
> > > >> Thank you,
> > > >> Ahmet
> > > >>
> > > >> [1] https://github.com/apache/beam/pull/1464
> > > >> [2] https://github.com/apache/beam/pull/2949
> > > >>
> > >
> >
>


Re: [CANCEL][VOTE] Release 2.1.0, release candidate #2

2017-07-24 Thread Sourabh Bajaj
I created PR/3627 for cherry picking a fix for BEAM-2636.

On Mon, Jul 24, 2017 at 8:20 AM Ismaël Mejía  wrote:

> Not a blocker but maybe it is worth considering the fix for
> https://issues.apache.org/jira/browse/BEAM-2587 too.
>
> I also was bitten by this issue and I could only get it to work by
> doing a 'pip install --user grpcio-tools' (not sure if this is a
> proper solution but it works for me), however when I validated the
> python only source code it worked out of the box without issue.
>
> On Mon, Jul 24, 2017 at 2:37 PM, Jean-Baptiste Onofré 
> wrote:
> > Awesome !
> >
> > Thanks Aljoscha
> >
> > Regards
> > JB
> >
> >
> > On 07/24/2017 02:32 PM, Aljoscha Krettek wrote:
> >>
> >> I opened a PR against the release-2.1.0 branch:
> >> https://github.com/apache/beam/pull/3625
> >> 
> >>
> >> This should not fail any tests since it was recently reviewed and merged
> >> for the master.
> >>
> >> Best,
> >> Aljoscha
> >>
> >>> On 24. Jul 2017, at 14:09, Jean-Baptiste Onofré 
> wrote:
> >>>
> >>> +1
> >>>
> >>> Definitely good to have it for RC3.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 07/24/2017 02:05 PM, Aljoscha Krettek wrote:
> >>>>
> >>>> When we're cutting a new RC anyways we could also include the fixes for
> >>>> https://issues.apache.org/jira/browse/BEAM-2571. It's an actual bug in
> >>>> the Flink Runner and the fix for that is a set of three fixes that
> >>>> should be easy to cherry-pick on top of the release branch.
> >>>> If we agree I could open a PR for that.
> >>>> Best,
> >>>> Aljoscha
> >
> > On 24. Jul 2017, at 13:47, Aviem Zur  wrote:
> >
> > We also have two tests failing in Spark runner as detailed by the
> > following
> > two tickets:
> > https://issues.apache.org/jira/browse/BEAM-2670
> > https://issues.apache.org/jira/browse/BEAM-2671
> >
> > On Mon, Jul 24, 2017 at 11:44 AM Jean-Baptiste Onofré <
> j...@nanthrax.net>
> > wrote:
> >
> >> Hi all,
> >>
> >> due to https://issues.apache.org/jira/browse/BEAM-2662, I cancel
> this
> >> vote.
> >>
> >> We also have a build issue with the Spark runner that I would like
> to
> >> fix
> >> for RC3:
> >>
> >>
> >>
> >>
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_ValidatesRunner_Spark/2446/
> >>
> >> So, we are going to work on the Spark runner test fix for RC3
> >> (BEAM-2662 is
> >> already fixed on release-2.1.0 branch).
> >>
> >> I will submit RC3 to vote as soon as Spark runner tests are fully
> OK.
> >>
> >> Regards
> >> JB
> >>
> >> On 07/18/2017 06:30 PM, Jean-Baptiste Onofré wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>> Please review and vote on the release candidate #2 for the version
> >>
> >> 2.1.0, as
> >>>
> >>> follows:
> >>>
> >>> [ ] +1, Approve the release
> >>> [ ] -1, Do not approve the release (please provide specific
> comments)
> >>>
> >>>
> >>> The complete staging area is available for your review, which
> >>> includes:
> >>> * JIRA release notes [1],
> >>> * the official Apache source release to be deployed to
> >>> dist.apache.org
> >>
> >> [2],
> >>>
> >>> which is signed with the key with fingerprint C8282E76 [3],
> >>> * all artifacts to be deployed to the Maven Central Repository [4],
> >>> * source code tag "v2.1.0-RC2" [5],
> >>> * website pull request listing the release and publishing the API
> >>
> >> reference
> >>>
> >>> manual [6].
> >>> * Python artifacts are deployed along with the source release to
> the
> >>> dist.apache.org [2].
> >>>
> >>> The vote will be open for at least 72 hours. It is adopted by
> >>> majority
> >>
> >> approval,
> >>>
> >>> with at least 3 PMC affirmative votes.
> >>>
> >>> Thanks,
> >>> JB
> >>>
> >>> [1]
> >>>
> >>
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12340528
> >>>
> >>>
> >>> [2] https://dist.apache.org/repos/dist/dev/beam/2.1.0/
> >>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >>> [4]
> >>
> >>
> https://repository.apache.org/content/repositories/orgapachebeam-1019/
> >>>
> >>> [5] https://github.com/apache/beam/tree/v2.1.0-RC2
> >>> [6] https://github.com/apache/beam-site/pull/270
> >>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>
> >>
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > 

Re: New template PR description spams BEAM-1234

2017-07-18 Thread Sourabh Bajaj
+1 to this

Sent a PR for this https://github.com/apache/beam/pull/3587

On Tue, Jul 18, 2017 at 10:21 AM Eugene Kirpichov
 wrote:

> Hi,
>
> Was there a recent change to the default PR description on Beam?
> https://issues.apache.org/jira/browse/BEAM-1234 just got a couple of
> notifications from unrelated PRs because of the following part of the
> template PR description:
>
>-  Format the pull request title like [BEAM-1234] Fixes bug in
>ApproximateQuantiles, where you replace BEAM-1234 with the appropriate
>JIRA issue.
>
> I'm assuming this is a recent change because the JIRA issue didn't get
> spammed before.
>
> Can we change this to something like "BEAM-"?
>


Re: [VOTE] Release 2.1.0, release candidate #1

2017-07-15 Thread Sourabh Bajaj
Hi JB,

https://github.com/apache/beam/pull/3563 cherrypicks the fix for BEAM-2595
to the release branch, please review.

I wasn't able to reproduce the issue in BEAM-2271 but was hoping
https://github.com/apache/beam/pull/3563 will fix it, so would be great if
you can take a look at it as well.

Thanks
Sourabh

On Fri, Jul 14, 2017 at 10:30 PM Jean-Baptiste Onofré 
wrote:

> Hi Ahmet,
>
> sorry I missed those Jira.
>
> Can you help to cherry-pick and fix both issues on release-2.1.0 branch ?
>
> The release guide already mentions checking the open Jira issues for the
> release target. I just missed these two Jira issues, sorry about that.
>
> I don't think an additional list is required.
>
> I will cancel this vote and cut an RC2 as soon as BEAM-2595 and BEAM-2271
> are addressed.
>
> Thanks,
> Regards
> JB
>
> On 07/14/2017 06:15 AM, Ahmet Altay wrote:
> > -1
> >
> > Thank you JB. Unfortunately I do not want to approve this RC :(. My
> reason
> > is that there are two open issues in the burndown list (
> > https://s.apache.org/beam-2.1.0-burndown). I think we should either fix
> > them or explicitly move them out of the list. BEAM-2595 is a regression
> in
> > usability (not in functionality), and it is fixed in master. We could
> > cherry pick that. BEAM-2271 is an improvement to the release process. I
> > would prefer fixing the process now instead of the next release cycle.
> > However, if we want to release sooner, it is fine to clean the zip files
> > manually.
> >
> > Another point I would like to raise is about the validation process.
> > During the 2.0 release we created a list of things to validate before
> > that release. Should we re-use that list for this and subsequent releases?
> >
> > Ahmet
> >
> > On Tue, Jul 11, 2017 at 6:02 AM, Jean-Baptiste Onofré 
> > wrote:
> >
> >> Hi everyone,
> >>
> >> Please review and vote on the release candidate #1 for the version
> 2.1.0,
> >> as follows:
> >>
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >>
> >> The complete staging area is available for your review, which includes:
> >> * JIRA release notes [1],
> >> * the official Apache source release to be deployed to dist.apache.org
> >> [2], which is signed with the key with fingerprint C8282E76 [3],
> >> * all artifacts to be deployed to the Maven Central Repository [4],
> >> * source code tag "v2.1.0-RC1" [5],
> >> * website pull request listing the release and publishing the API
> >> reference manual [6].
> >> * Python artifacts are deployed along with the source release to the
> >> dist.apache.org [2].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> >> approval, with at least 3 PMC affirmative votes.
> >>
> >> Thanks,
> >> JB
> >>
> >> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?proje
> >> ctId=12319527=12340528
> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.1.0/
> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1018/
> >> [5] https://github.com/apache/beam/tree/v2.1.0-RC1
> >> [6] https://github.com/apache/beam-site/pull/270
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Passing pipeline options into PTransforms and Filesystems in Python

2017-07-11 Thread Sourabh Bajaj
I'm not sure ValueProviders address the issue of getting credentials to
underlying libraries or the FileSystem, though, as they are only exposed at
the PTransform level.

E.g., if I were using Flink on AWS and reading data from GCS, we currently
don't have a way for TextIO to get credentials it can use to read from GCS.
We just rely on other libraries to do that work, and they assume you have
the gcloud tool installed. This is partly because TextIO does not expose
an option to pass an extra credential object when accessing the FileSystem.

On a tangential note we currently rely on credentials being passed as part
of the serialized object such as in the JdbcIO; the password is just part
of the connection string and then serialized with the DoFn itself. It might
be worth considering exposing a credential provider system similar to value
providers (or a type of value provider) where one could use a KMS if they
choose to.
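
A minimal sketch of what such a credential provider could look like, using only the standard library. Everything here is hypothetical and none of it is Beam API: `CredentialProvider`, `QueryFn`, and the `DB_PASSWORD_DEMO` environment variable are names made up for illustration. The point it shows is that the serialized object carries only a lookup key, and the secret is resolved on the worker when it is needed.

```python
import os
from typing import Callable, Optional


class CredentialProvider:
    """Hypothetical deferred credential: stores a lookup key, never the secret."""

    def __init__(self, key: str, lookup: Optional[Callable[[str], str]] = None):
        self.key = key
        # Optional custom lookup; a KMS client call could be plugged in here.
        self._lookup = lookup

    def get(self) -> str:
        if self._lookup is not None:
            return self._lookup(self.key)
        # Default (assumed) behaviour: read an environment variable on the worker.
        return os.environ[self.key]


class QueryFn:
    """Stand-in for a DoFn that needs a database password at run time."""

    def __init__(self, url: str, password: CredentialProvider):
        self.url = url
        self.password = password  # only the key name travels with this object

    def process(self, row_id: int) -> str:
        secret = self.password.get()  # resolved here, on the worker
        # Return a value that proves the secret was available without leaking it.
        return f"{self.url}?id={row_id}&auth={len(secret)}"
```

Compared with putting the password in the connection string, the state that gets serialized with the function holds only `key`, so the secret never appears in the staged pipeline.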

On Tue, Jul 11, 2017 at 4:49 PM Sourabh Bajaj <sourabhba...@google.com>
wrote:

> We do the latter of treating constants as StaticValueProviders in the
> pipeline right now.
>
> On Tue, Jul 11, 2017 at 4:47 PM Dmitry Demeshchuk <dmi...@postmates.com>
> wrote:
>
>> Thanks a lot for the input, folks!
>>
>> Also, thanks for telling me about the concept of ValueProvider, Kenneth!
>> This was a good reminder to myself that some stuff that's described in the
>> Dataflow docs (I discovered
>> https://cloud.google.com/dataflow/docs/templates/creating-templates after
>> having read your reply) doesn't necessarily exist in the Beam
>> documentation.
>>
>> I do agree with Thomas' (and Robert's, in the JIRA bug) point that we may
>> often want to supply separate credentials for separate steps. It increases
>> the verbosity, and raises a question of what to do about filesystems
>> (ReadFromText and WriteToText), but it also has a lot of value.
>>
>> As for accessing pipeline options, what if PTransforms were treating
>> pipeline options as a NestedValueProvider of a sort?
>>
>> import apache_beam as beam
>>
>> class MyDoFn(beam.DoFn):
>>     def process(self, item):
>>         # We fetch pipeline options in runtime
>>         # or, it could look like opts = self.pipeline_options()
>>         opts = self.pipeline_options.get()
>>
>> Alternatively, we could treat each individual option as a ValueProvider
>> object, even if really it's just a constant.
>>
>>
>> On Tue, Jul 11, 2017 at 4:00 PM, Robert Bradshaw <
>> rober...@google.com.invalid> wrote:
>>
>> > Templates, including ValueProviders, were recently added to the Python
>> > SDK. +1 to pursuing this train of thought (and as I mentioned on the
>> > bug, and has been mentioned here, we don't want to add PipelineOptions
>> > access to PTransforms/at construction time).
>> >
>> > On Tue, Jul 11, 2017 at 3:21 PM, Kenneth Knowles <k...@google.com.invalid
>> >
>> > wrote:
>> > > Hi Dmitry,
>> > >
>> > > This is a very worthwhile discussion that has recently come up on
>> > > StackOverflow, here: https://stackoverflow.com/a/45024542/4820657
>> > >
>> > > We actually recently _removed_ the PipelineOptions from
>> Pipeline.apply in
>> > > Java since they tend to cause transforms to have implicit changes that
>> > make
>> > > them non-portable. Baking in credentials would probably fall into this
>> > > category.
>> > >
>> > > The other aspect to this is that we want to be able to build a
>> pipeline
>> > and
>> > > run it later, in an environment chosen when we decide to run it. So
>> > > PipelineOptions are really for running, not building, a Pipeline. You
>> can
>> > > still use them for arg parsing and passing specific values to
>> transforms
>> > -
>> > > that is essentially orthogonal and just accidentally conflated.
>> > >
>> > > I can't speak to the state of Python SDK's maturity in this regard,
>> but
>> > > there is a concept of a "ValueProvider" that is a deferred value that
>> can
>> > > be specified by PipelineOptions when you run your pipeline. This may
>> be
>> > > what you want. You build a PTransform passing some of its
>> configuration
>> > > parameters as ValueProvider and at run time you set them to actual
>> values
>> > > that are passed to the UDFs in your pipeline.
>> > >
>> > > Hope this helps. Despite not being deeply involved in Python, I
>> wanted to
>> > > lay out the terri

Re: Passing pipeline options into PTransforms and Filesystems in Python

2017-07-11 Thread Sourabh Bajaj
We do the latter of treating constants as StaticValueProviders in the
pipeline right now.

On Tue, Jul 11, 2017 at 4:47 PM Dmitry Demeshchuk 
wrote:

> Thanks a lot for the input, folks!
>
> Also, thanks for telling me about the concept of ValueProvider, Kenneth!
> This was a good reminder to myself that some stuff that's described in the
> Dataflow docs (I discovered
> https://cloud.google.com/dataflow/docs/templates/creating-templates after
> having read your reply) doesn't necessarily exist in the Beam
> documentation.
>
> I do agree with Thomas' (and Robert's, in the JIRA bug) point that we may
> often want to supply separate credentials for separate steps. It increases
> the verbosity, and raises a question of what to do about filesystems
> (ReadFromText and WriteToText), but it also has a lot of value.
>
> As for accessing pipeline options, what if PTransforms were treating
> pipeline options as a NestedValueProvider of a sort?
>
> import apache_beam as beam
>
> class MyDoFn(beam.DoFn):
>     def process(self, item):
>         # We fetch pipeline options in runtime
>         # or, it could look like opts = self.pipeline_options()
>         opts = self.pipeline_options.get()
>
> Alternatively, we could treat each individual option as a ValueProvider
> object, even if really it's just a constant.
>
>
> On Tue, Jul 11, 2017 at 4:00 PM, Robert Bradshaw <
> rober...@google.com.invalid> wrote:
>
> > Templates, including ValueProviders, were recently added to the Python
> > SDK. +1 to pursuing this train of thought (and as I mentioned on the
> > bug, and has been mentioned here, we don't want to add PipelineOptions
> > access to PTransforms/at construction time).
> >
> > On Tue, Jul 11, 2017 at 3:21 PM, Kenneth Knowles  >
> > wrote:
> > > Hi Dmitry,
> > >
> > > This is a very worthwhile discussion that has recently come up on
> > > StackOverflow, here: https://stackoverflow.com/a/45024542/4820657
> > >
> > > We actually recently _removed_ the PipelineOptions from Pipeline.apply
> in
> > > Java since they tend to cause transforms to have implicit changes that
> > make
> > > them non-portable. Baking in credentials would probably fall into this
> > > category.
> > >
> > > The other aspect to this is that we want to be able to build a pipeline
> > and
> > > run it later, in an environment chosen when we decide to run it. So
> > > PipelineOptions are really for running, not building, a Pipeline. You
> can
> > > still use them for arg parsing and passing specific values to
> transforms
> > -
> > > that is essentially orthogonal and just accidentally conflated.
> > >
> > > I can't speak to the state of Python SDK's maturity in this regard, but
> > > there is a concept of a "ValueProvider" that is a deferred value that
> can
> > > be specified by PipelineOptions when you run your pipeline. This may be
> > > what you want. You build a PTransform passing some of its configuration
> > > parameters as ValueProvider and at run time you set them to actual
> values
> > > that are passed to the UDFs in your pipeline.
> > >
> > > Hope this helps. Despite not being deeply involved in Python, I wanted
> to
> > > lay out the territory so someone else could comment further without
> > having
> > > to go into background.
> > >
> > > Kenn
> > >
> > > On Tue, Jul 11, 2017 at 3:03 PM, Dmitry Demeshchuk <
> dmi...@postmates.com
> > >
> > > wrote:
> > >
> > >> Hi folks,
> > >>
> > >> Sometimes, it would be very useful if PTransforms had access to global
> > >> pipeline options, such as various credentials, settings and so on.
> > >>
> > >> Per conversation in https://issues.apache.org/jira/browse/BEAM-2572,
> > I'd
> > >> like to kick off a discussion about that.
> > >>
> > >> This would be beneficial for at least one major use case: support for
> > >> different cloud providers (AWS, Azure, etc) and an ability to specify
> > each
> > >> provider's credentials just once in the pipeline options.
> > >>
> > >> It looks like the trickiest part is not to make the PTransform objects
> > have
> > >> access to pipeline options (we could possibly just modify the
> > >> Pipeline.apply
> > >>  > >> apache_beam/pipeline.py#L355>
> > >> method), but to actually pass these options down the road, such as to
> > DoFn
> > >> objects and FileSystem objects.
> > >>
> > >> I'm still in the process of reading the code and understanding of what
> > this
> > >> could look like, so any input would be really appreciated.
> > >>
> > >> Thank you.
> > >>
> > >> --
> > >> Best regards,
> > >> Dmitry Demeshchuk.
> > >>
> >
>
>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>
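
The deferred-value idea Kenn describes in this thread can be modelled in a few lines. The sketch below is a toy, stdlib-only model and deliberately does not use Beam's own classes: `StaticValue`, `RuntimeValue` and `NestedValue` are hypothetical stand-ins for the roles that static, runtime and nested value providers play, and the class-level `runtime_options` dict is an assumption made to keep the example self-contained.

```python
from typing import Any, Callable, Dict, Optional


class StaticValue:
    """A constant wrapped so it offers the same interface as a deferred value."""

    def __init__(self, value: Any):
        self._value = value

    def is_accessible(self) -> bool:
        return True

    def get(self) -> Any:
        return self._value


class RuntimeValue:
    """A value looked up by name in options that only exist once the job runs."""

    runtime_options: Optional[Dict[str, Any]] = None  # set when the job starts

    def __init__(self, name: str, default: Any = None):
        self.name = name
        self.default = default

    def is_accessible(self) -> bool:
        return RuntimeValue.runtime_options is not None

    def get(self) -> Any:
        if not self.is_accessible():
            raise RuntimeError(f"option {self.name!r} is not available yet")
        return RuntimeValue.runtime_options.get(self.name, self.default)


class NestedValue:
    """Derives a value from another deferred value, still lazily."""

    def __init__(self, inner: Any, translator: Callable[[Any], Any]):
        self.inner = inner
        self.translator = translator

    def is_accessible(self) -> bool:
        return self.inner.is_accessible()

    def get(self) -> Any:
        return self.translator(self.inner.get())
```

A transform built at construction time holds one of these objects and only calls `get()` inside its processing code, which is why the same pipeline definition can be run later with different options.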


[Proposal] Submitting pipelines to Runners in another language

2017-07-05 Thread Sourabh Bajaj
Hi,

I wanted to share a proposal for submitting pipelines from SDK X
(Python/Go) to runners written in another language Y (Java) (Flink / Spark
/ Apex) using the Runner API. Please find the doc here

.

As always comments and feedback are welcome.

Thanks
Sourabh


Re: [DISCUSSION] Encouraging more contributions

2017-06-29 Thread Sourabh Bajaj
The Rust community is trying an interesting experiment for encouraging more
diversity in the contributors:
https://blog.rust-lang.org/2017/06/27/Increasing-Rusts-Reach.html

On Fri, Apr 28, 2017 at 12:05 PM Sourabh Bajaj <sourabhba...@google.com>
wrote:

> I think they can probably reach out to the mentor for questions like: How
> to navigate the code base? What parts of the code could they use as a
> pattern? This could be done using the preferred mode of communication based
> on the contributor.
>
> My opinion is that large projects and communities may come across as
> intimidating to first time contributors, so being as welcoming and
> encouraging is important.
>
> On Thu, Apr 27, 2017 at 8:52 PM Aviem Zur <aviem...@gmail.com> wrote:
>
>> @
>> Sourabh Bajaj
>>
>> The mentoring on starter tickets is an interesting Idea. How would it
>> technically work?.
>>
>> A new contributor assigns a starter ticket to themselves. What happens
>> from
>> there?
>>
>> On Tue, Apr 25, 2017 at 12:01 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>>
>> > I think it is important to clarify that the developer documentation
>> > discussed in this thread is of two kinds:
>> >
>> > 6.1. Documents with proposals and new designs, those covered by the
>> > Beam Improvement Proposal (BEAM-566), and that we need to put with a
>> > single file index (I remember there was a google dir for this but not
>> > sure it is still valid, and in any case probably the website is a
>> > better place for this). Is there any progress on this?
>> >
>> > 6.2. Documentation about how things work, so new developers can get
>> > into developing features/fixes for the project, those are the kind
>> > that Kenneth/Etienne mention and include Stephen’s IO guide but could
>> > be definitely expanded to include things like how does the different
>> > runner translation works, or some details on triggers/materialization
>> > of panes/windows from the SDK point of view. However the hard part of
>> > this documents is that they should be maintained e.g. updated when the
>> > code evolves so they don’t get outdated as JB mentions.
>> >
>> > On Tue, Apr 25, 2017 at 10:47 AM, Wesley Tanaka
>> > <wtan...@yahoo.com.invalid> wrote:
>> > > These are the ones I've come across so far, are there others?
>> > >
>> > > * Dynamic DoFn https://s.apache.org/a-new-dofn
>> > >
>> > > ** Splittable DoFn (Obsoletes Source API)
>> > http://s.apache.org/splittable-do-fn
>> > >
>> > > ** State and Timers for DoFn: https://s.apache.org/beam-state
>> > >
>> > >
>> > > * Lateness https://s.apache.org/beam-lateness
>> > >
>> > >
>> > > * Metrics API http://s.apache.org/beam-metrics-api
>> > >
>> > > ** I/O Metrics https://s.apache.org/standard-io-metrics
>> > >
>> > >
>> > > * Runner API http://s.apache.org/beam-runner-api
>> > >
>> > > ** https://s.apache.org/beam-runner-composites
>> > >
>> > > ** https://s.apache.org/beam-side-inputs-1-pager
>> > >
>> > >
>> > > * Fn API http://s.apache.org/beam-fn-api
>> > >
>> > > ---
>> > > Wesley Tanaka
>> > > https://wtanaka.com/
>> > >
>> > >
>> > > On Monday, April 24, 2017, 2:45:45 PM HST, Sourabh Bajaj <
>> > sourabhba...@google.com.INVALID> wrote:
>> > > For 6. I think having them in one page on the website where we can
>> find
>> > the
>> > > design docs more easily would be great.
>> > >
>> > > 7. For low-hanging-fruit, one thing I really liked from some Mozilla
>> > > projects was assigning a mentor on the ticket. Someone you can reach
>> out
>> > to
>> > > if you have questions. I think this makes the entry barrier really low
>> > for
>> > > first time contributors who might feel intimidated asking questions
>> > > completely in public.
>> > >
>> > > On Mon, Apr 24, 2017 at 10:06 AM Kenneth Knowles
>> <k...@google.com.invalid
>> > >
>> > > wrote:
>> > >
>> > >> I like the subject Etienne has brought up, and will give it a number
>> in
>> > >> this list :-)
>> > >>
>> > >> 6. Have more technical reference docs (not just workspace set up) for
>> > >> c

Re: [PROPOSAL] for AWS Aurora relational database connector

2017-06-13 Thread Sourabh Bajaj
+1 for S3 being more of a FS

@Madhusudan, can you point to some documentation on how to do row-range
queries in Aurora? From a quick scan, it follows the MySQL 5.6 syntax, so
you would still need an ORDER BY for the IO to do exactly-once reads. I'd
like to learn more about how the questions raised by Eugene are handled.

Thanks
Sourabh

On Mon, Jun 12, 2017 at 9:32 PM Jean-Baptiste Onofré 
wrote:

> Hi,
>
> I think it's a mix of filesystem and IO. For S3, I see more a Beam
> filesystem
> than a pure IO.
>
> WDYT ?
>
> Regards
> JB
>
> On 06/13/2017 02:43 AM, tarush grover wrote:
> > Hi All,
> >
> > I think this can be added under java --> io --> aws-cloud-platform with
> > more io connectors can be added into it eg. S3 also.
> >
> > Regards,
> > Tarush
> >
> > On Mon, Jun 12, 2017 at 4:03 AM, Madhusudan Borkar 
> > wrote:
> >
> >> Yes, I believe so. Thanks for the Jira.
> >>
> >> Madhu Borkar
> >>
> >> On Sat, Jun 10, 2017 at 10:36 PM, Jean-Baptiste Onofré  >
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I created a Jira to add custom splitting to JdbcIO (but it's not so
> >>> trivial, depending on the backend).
> >>>
> >>> Regarding your proposal it sounds interesting, but do you think we will
> >>> have really "parallel" read of the split ? I think splitting makes
> sense
> >> if
> >>> we can do parallel read: if we split to read on an unique backend, it
> >>> doesn't bring lot of improvement.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>>
> >>> On 06/10/2017 09:28 PM, Madhusudan Borkar wrote:
> >>>
> >>>> Hi,
> >>>> We are proposing to develop a connector for AWS Aurora. Aurora, being a
> >>>> cluster for a relational database (MySQL), has no Java API for
> >>>> reading/writing other than a JDBC client. Although there is a JdbcIO
> >>>> available, it looks like it doesn't work in parallel. The proposal is to
> >>>> provide split functionality and then use a transform to parallelize the
> >>>> operation. As mentioned above, this is a typical SQL-based database and
> >>>> not comparable with the likes of Hive. The Hive implementation is based
> >>>> on an abstraction over the HDFS file system of Hadoop, which provides
> >>>> splits. Here none of these are applicable.
> >>>> During implementation of the Hive connector there was a lot of discussion
> >>>> about how to implement the connector while strictly following the Beam
> >>>> design principles using a bounded source. I am not sure how an Aurora
> >>>> connector will fit into these design principles.
> >>>> Here is our proposal.
> >>>> 1. Split functionality: if the table contains 'x' rows, it will be split
> >>>> into 'n' bundles in the split method. This would be done as follows:
> >>>> noOfSplits = 'x' * size of a single row / bundleSize hint from runner.
> >>>> 2. Each of these 'pseudo' splits would then be read in parallel.
> >>>> 3. Each of these reads will use a db connection from a connection pool.
> >>>> This will provide better benchmarking. Please let us know your views.
> >>>>
> >>>> Thanks
> >>>> Madhu Borkar
> >>>>
> >>>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>>
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
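
The split arithmetic in Madhu's proposal, together with the ORDER BY concern raised at the top of this thread, can be sketched as follows. This is a minimal illustration under stated assumptions, not a connector design: `plan_splits` and `split_queries` are hypothetical helpers, the table and column names are placeholders, and the generated SQL assumes MySQL-style `LIMIT ... OFFSET ...` over a unique ordering column on a table that does not change while it is being read.

```python
import math
from typing import List


def plan_splits(row_count: int, row_bytes: int, bundle_bytes: int) -> int:
    """noOfSplits = rows * size of one row / bundle size hint, rounded up."""
    if row_count <= 0:
        return 1
    return max(1, math.ceil(row_count * row_bytes / bundle_bytes))


def split_queries(table: str, order_col: str, row_count: int, splits: int) -> List[str]:
    """Build one query per split; every query uses the same fixed ordering."""
    if row_count <= 0:
        return []
    per_split = math.ceil(row_count / splits)
    queries = []
    for offset in range(0, row_count, per_split):
        # Without ORDER BY the database may return rows in a different order
        # for each query, so the ranges could overlap or miss rows.
        queries.append(
            f"SELECT * FROM {table} ORDER BY {order_col} "
            f"LIMIT {per_split} OFFSET {offset}"
        )
    return queries
```

For example, 1000 rows of 100 bytes with a 25000-byte bundle hint give 4 splits of 250 rows each. Whether such reads actually run in parallel against a single backend is the open question JB raises above.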


Re: graph generator?

2017-05-25 Thread Sourabh Bajaj
+1

On Thu, May 25, 2017 at 11:10 AM Lukasz Cwik 
wrote:

> +1 on Runner API to dot format or plantuml format.
>
> On Thu, May 25, 2017 at 11:06 AM, Dan Halperin  >
> wrote:
>
> > I think that a util that converted from the Runner API definition of a
> > pipeline into some sort of graph format (like DOT?) would be generally
> > useful. By using the Runner API, the tool would be SDK- and
> > Runner-independent view of the pipeline.
> >
> > On Thu, May 25, 2017 at 10:54 AM, Jean-Baptiste Onofré 
> > wrote:
> >
> > > Hi,
> > >
> > > If you mean a graphical tool, no, it's up to each execution engine
> > > (it's what we showed last week at ApacheCon with Davor).
> > >
> > > Some tools can graphically generate the graph with the corresponding
> > > Beam pipeline.
> > >
> > > Regards
> > > JB
> > >
> > >
> > > On 05/25/2017 07:48 PM, Romain Manni-Bucau wrote:
> > >
> > >> Hello guys,
> > >>
> > >> does beam have a graph generator from a pipeline? Not sure the current
> > >> API fully allows bypassing the runner to just get the beam graph, but it
> > >> can help to have a small main generating a png/svg/ascii/ditaa (or a
> > >> maven/gradle plugin ;))
> > >>
> > >> Needed that in my hazelcast-jet work to visualize the graph. I have a
> > >> quick and dirty impl based on jung (BSD license :() based on the jet
> > >> graph, but think it should be pretty trivial to use the beam graph
> > >> directly based on a pipeline visitor.
> > >>
> > >> Mainly sending this mail to share it in case anyone needs it more than
> > >> anything else:
> > >> https://gist.github.com/rmannibucau/b5f4e310b40ce414f95f6e22530bbe6e
> > >>
> > >> Romain Manni-Bucau
> > >> @rmannibucau | Blog | Old Blog | Github
> > >> <https://github.com/rmannibucau> | LinkedIn | JavaEE Factory
> > >>
> > >>
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
>
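
The idea discussed in this thread, turning a pipeline graph into DOT, can be sketched as follows. This is a minimal illustration, not the Runner API: the pipeline is assumed to be available already as a plain list of (producer, consumer) transform-name pairs, and to_dot and the transform names are hypothetical.

```python
def to_dot(edges, graph_name="pipeline"):
    """Render (source, destination) name pairs as a Graphviz DOT digraph."""

    def quote(name):
        # DOT string literals need embedded backslashes and quotes escaped.
        return '"%s"' % name.replace("\\", "\\\\").replace('"', '\\"')

    # Collect node names in first-seen order so the output is stable.
    nodes = []
    for src, dst in edges:
        for node in (src, dst):
            if node not in nodes:
                nodes.append(node)

    lines = ["digraph %s {" % graph_name]
    lines.extend("  %s;" % quote(node) for node in nodes)
    lines.extend("  %s -> %s;" % (quote(src), quote(dst)) for src, dst in edges)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical three-step pipeline, for illustration only.
dot_text = to_dot([("ReadAvro", "GenerateMutations"),
                   ("GenerateMutations", "WriteMutations")])
```

The resulting text can be passed to the Graphviz `dot` tool to produce a png or svg. A real tool would walk the pipeline definition to build the edge list instead of hard-coding it.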


Re: [Road map][R SDK]

2017-05-23 Thread Sourabh Bajaj
Hi,

1. I don't think there is an R SDK in development within the main repository;
I'm not sure whether someone is building one in a fork.

2. There might be some demand for it in the data science community, but
based on the few emails we've seen on the mailing list, people have so far
been using rpy2 and doing the processing via the Python SDK.

3. The biggest blocker in building an R SDK might be good support for gRPC,
as the Fn API uses that to communicate between the runner and the worker.

Others might have more thoughts on this.

Best,
Sourabh

On Tue, May 23, 2017 at 8:19 AM AndrasNagy  wrote:

> Hello,
>
> After a bit of googling i decided to ask the following:
>
> Is there an "official" roadmap for 2017?
> If yes, please send me a link. (i found one for 2016, but not so sure if
> that was retrospective)
>
> Is there demand for a R SDK?
> Is there already in construction something?
>
> If yes I would like to ask for a git repo
>
> Thanks in advance!
>
> NagyAndras
>


Re: Website homepage visual refresh

2017-05-16 Thread Sourabh Bajaj
+1 this is great.

On Tue, May 16, 2017 at 10:18 AM Jesse Anderson 
wrote:

> Nice work!
>
> On Tue, May 16, 2017 at 10:09 AM Davor Bonaci  wrote:
>
> > I think it is great too -- since it is an obvious improvement, let's
> > merge and iterate!
> >
> > On Tue, May 16, 2017 at 6:06 AM, Jean-Baptiste Onofré 
> > wrote:
> >
> > > Hi Jeremy,
> > >
> > > great job ! I like the new look'n feel.
> > >
> > > Thanks !
> > > Regards
> > > JB
> > >
> > >
> > > On 05/16/2017 07:44 AM, Jeremy Weinstein wrote:
> > >
> > > >> Hi Beam community! fran...@apache.org and I have been working on a
> > > >> project to refresh the visual design of the Beam website. We have the
> > > >> following few goals:
> > > >>
> > > >> a) Breathe some life into the website homepage
> > > >> b) Simplify and clean up the project's CSS and various supporting
> > > >> files
> > > >> c) Make it a little more fun and engaging for new developers to start
> > > >> learning about Beam and enter into the content
> > > >> d) Help explain Beam to passive and interested non-users
> > > >>
> > > >> I'd like the community's help on a few things.
> > > >>
> > > >> 1) First and foremost, any feedback on the design update is welcome.
> > > >> 2) Secondly, there is a section on the homepage for testimonials/quotes
> > > >> from Beam users and/or organizations about their usage of Beam. We
> > > >> could set this up on a rotational basis to cycle through quotes, but to
> > > >> start, if anyone knows of any good quotes, posts, or tweets about Beam,
> > > >> I'd like to source those and place them into the "A collaborative
> > > >> effort" section. Please send them over to me and I can flow them into
> > > >> the build.
> > > >>
> > > >> We're hoping to refresh the site before or soon after the first stable
> > > >> release. For this first pass we've focused on the main landing page,
> > > >> but next up we'd like to improve several of the inside pages, as well
> > > >> as update the code toggles, and simplify a bit of the navigational
> > > >> structure.
> > > >>
> > > >> Sending this PR [1] out now as an FYI and to solicit feedback. We'll
> > > >> make a few more improvements based on suggestions, as well as a few
> > > >> tweaks to TODOs in the header and footer. Feedback is welcome - thanks
> > > >> everyone!
> > > >>
> > > >> [1] https://github.com/apache/beam-site/pull/244 +
> > > >> http://apache-beam-website-pull-requests.storage.googleapis.com/244/index.html
> > >>
> > >>
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
> --
> Thanks,
>
> Jesse
>


Re: Congratulations Davor!

2017-05-04 Thread Sourabh Bajaj
Congrats!!
On Thu, May 4, 2017 at 7:48 AM Mingmin Xu  wrote:

> Congratulations @Davor!
>
>
> > On May 4, 2017, at 7:08 AM, Amit Sela  wrote:
> >
> > Congratulations Davor!
> >
> >> On Thu, May 4, 2017, 10:02 JingsongLee  wrote:
> >>
> >> Congratulations!
> >> --
> >> From:Jesse Anderson 
> >> Time:2017 May 4 (Thu) 21:36
> >> To:dev 
> >> Subject:Re: Congratulations Davor!
> >> Congrats!
> >>
> >>> On Thu, May 4, 2017, 6:20 AM Aljoscha Krettek 
> wrote:
> >>>
> >>> Congrats! :-)
>  On 4. May 2017, at 14:34, Kenneth Knowles 
> >>> wrote:
> 
>  Awesome!
> 
> > On Thu, May 4, 2017 at 1:19 AM, Ted Yu  wrote:
> >
> > Congratulations, Davor!
> >
> > On Thu, May 4, 2017 at 12:45 AM, Aviem Zur wrote:
> >
> >> Congrats Davor! :)
> >>
> >> On Thu, May 4, 2017 at 10:42 AM Jean-Baptiste Onofré <
> >> j...@nanthrax.net>
> >> wrote:
> >>
> >>> Congrats ! Well deserved ;)
> >>>
> >>> Regards
> >>> JB
> >>>
>  On 05/04/2017 09:30 AM, Jason Kuster wrote:
>  Hi all,
> 
>  The ASF has just published a blog post[1] welcoming new members of the
>  Apache Software Foundation, and our own Davor Bonaci is among them!
>  Congratulations and thank you to Davor for all of your work for the Beam
>  community, and the ASF at large. Well deserved.
> 
>  Best,
> 
>  Jason
> 
>  [1] https://blogs.apache.org/foundation/entry/the-apache-software-foundation-welcomes
> 
>  P.S. I dug through the list to make sure I wasn't missing any other Beam
>  community members; if I have, my sincerest apologies and please recognize
>  them on this or a new thread.
> 
> >>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>>
> >>
> >
> >>>
> >>> --
> >> Thanks,
> >>
> >> Jesse
> >>
> >>
>


Re: [DISCUSSION] Encouraging more contributions

2017-04-24 Thread Sourabh Bajaj
For 6, I think collecting the design docs on a single page of the website,
where they are easier to find, would be great.

7. For low-hanging fruit, one thing I really liked from some Mozilla
projects was assigning a mentor on the ticket: someone you can reach out to
if you have questions. I think this lowers the entry barrier for first-time
contributors who might feel intimidated asking questions completely in
public.

On Mon, Apr 24, 2017 at 10:06 AM Kenneth Knowles 
wrote:

> I like the subject Etienne has brought up, and will give it a number in
> this list :-)
>
> 6. Have more technical reference docs (not just workspace set up) for
> contributors.
>
> I think this overlaps a lot with a prior discussion about where to collect
> design proposals [1]. Design docs used to be just dropped into a public
> folder, but that got disorganized. And that thread was about work in
> progress, so JIRA was a good place for details after a dev@ thread agrees
> on a proposal. At this point, the designs are pretty solid conceptually or
> even implemented and we could start to build out deeper technical bits on
> the web site, or at least some place that people can find it. We do have
> the Testing Guide and the PTransform Style Guide and somewhere near there
> we could have deeper references. I think we need a broader vision for the
> "table of contents" here.
>
> For my docs (triggers, lateness, runner API, side inputs, state, coders) I
> haven't had time, but I do intend to both translate from GDoc to some other
> format and also rewrite versions for users where appropriate. Probably this
> will mean coming up with that table of contents.
>
> Kenn
>
> [1]
>
> https://lists.apache.org/thread.html/%3c6bc60c88-cf91-4fff-eae6-fea6ee06f...@nanthrax.net%3E
>
>
> On Mon, Apr 24, 2017 at 9:33 AM, Neelesh Salian 
> wrote:
>
> > Agreed. I have some old JIRAs that I am cleaning up.
> >
> > Thank you for bringing this up.
> >
> > On Mon, Apr 24, 2017 at 9:29 AM, Jean-Baptiste Onofré 
> > wrote:
> >
> > > Same also for Slack, github comments, etc.
> > >
> > > From an Apache perspective, it should happen on the mailing list,
> > > eventually referencing a central wiki/faq/whatever.
> > >
> > > Regards
> > > JB
> > >
> > >
> > > On 04/24/2017 06:23 PM, Mingmin Xu wrote:
> > >
> > >> Many design documents are scattered across the mailing list and JIRA
> > >> comments; it would be a big help to put them in a centralized list. I
> > >> would also expect more wiki pages/blogs providing in-depth analysis,
> > >> such as the translation from a pipeline to a runner-specific topology,
> > >> or the window/trigger implementation. Without this knowledge, it's hard
> > >> to touch the core concepts.
> > >>
> > >> On Mon, Apr 24, 2017 at 6:03 AM, Jean-Baptiste Onofré <
> j...@nanthrax.net>
> > >> wrote:
> > >>
> > >> Got it. By experience on other Apache projects, it's really hard to
> > >>> maintain ;)
> > >>>
> > >>> Regards
> > >>> JB
> > >>>
> > >>>
> > >>> On 04/24/2017 02:56 PM, Etienne Chauchot wrote:
> > >>>
> > >>> Hi JB,
> > 
> >  I was proposing a FAQ (or another form), not something about IDE setup.
> >  The FAQ could group Q/A in the same place, for example "what is a
> >  source, how do I use it to implement an IO".
> > 
> >  Etienne
> > 
> > 
> >  Le 24/04/2017 à 14:19, Jean-Baptiste Onofré a écrit :
> > 
> >  Hi Etienne,
> > >
> > > What about the contribution guide ? I think it's covered in the
> > > IntelliJ
> > > and
> > > Eclipse setup sections.
> > >
> > > Regards
> > > JB
> > >
> > > On 04/24/2017 02:12 PM, Etienne Chauchot wrote:
> > >
> > > Hi all,
> > >>
> > >> I definitely agree with everything that is said in this thread.
> > >>
> > >> I might suggest another nice-to-have:
> > >>
> > >> To ease the work of a new contributor, it would be nice to have some
> > >> sort of programming guide oriented not to pipeline writers but to
> > >> sdk/runner/io/... writers.
> > >>
> > >> I know that new contributors have the docs available in the Google
> > >> Drive, the ML, the code base, and the availability of beamers, but
> > >> maybe having key points in a common place (like a FAQ for
> > >> sdk/runner/io/... writers, for example) would be interesting.
> > >>
> > >> Best,
> > >>
> > >> Etienne
> > >>
> > >>
> > >> Le 24/04/2017 à 09:14, Jean-Baptiste Onofré a écrit :
> > >>
> > >> Hi,
> > >>>
> > >>> I think we already tag the newbie jira ("low hanging fruit" ;)).
> > >>>
> > >>> Good idea for domain of interest/concept.
> > >>>
> > >>> Regards
> > >>> JB
> > >>>
> > >>> On 04/24/2017 09:01 AM, Ankur Chauhan wrote:
> > >>>
> > >>> Might I suggest adding 

Re: [PROPOSAL] Remove KeyedCombineFn

2017-04-21 Thread Sourabh Bajaj
+1

On Fri, Apr 21, 2017 at 10:53 AM Thomas Groh 
wrote:

> A happy +1. This simplifies the code base, and if we find a compelling use,
> it shouldn't be too bad to add it back in.
>
> On Fri, Apr 21, 2017 at 10:24 AM, Kenneth Knowles 
> wrote:
>
> > Hi all,
> >
> > I propose that we remove KeyedCombineFn before the first stable release.
> >
> > I don't think it adds enough value for the complexity it adds to e.g.
> > CombineWithContext [1] and state [2, 3], and it doesn't seem to me that
> > users really use it when we might expect. I am happy to be demonstrated
> > wrong.
> >
> > It is very likely that you have never written [4, 5] or thought about
> > KeyedCombineFn. So for context, here are excerpts from signatures just to
> > show the difference from CombineFn:
> >
> > CombineFn<InputT, AccumT, OutputT> {
> >   AccumT createAccumulator();
> >   AccumT addInput(AccumT accum, InputT input);
> >   AccumT mergeAccumulators(Iterable<AccumT> accums);
> >   OutputT extractOutput(AccumT accum);
> > }
> >
> > KeyedCombineFn<K, InputT, AccumT, OutputT> {
> >   AccumT createAccumulator(K key);
> >   AccumT addInput(K key, AccumT accum, InputT input);
> >   AccumT mergeAccumulators(K key, Iterable<AccumT> accums);
> >   OutputT extractOutput(K key, AccumT accum);
> > }
> >
> > So what are the particular reasons for this, versus a CombineFn that has
> > KVs as its input and accumulator types?
> >
> >  - There are some performance improvements potentially from not passing
> > keys around, based on the assumption they are always available.
> >
> >  - There is also a spec difference because it only has to be associative
> > and commutative per key, cannot be applied in a global combine, and
> > addInput is automatically key preserving.
> >
> > But in fact, in all of my code crawling the class is almost never used
> > (even over the course of its history at Google) and even the few uses I
> > found were often mistakes where the key is totally ignored, probably
> > because a user thinks "I am doing a keyed combine so I need a keyed
> > combine function". So the number of users actually affected is about
> > zero.
> >
> > I would be curious if anyone has a compelling case for keeping
> > KeyedCombineFn.
> >
> > Kenn
> >
> > [1]
> > https://github.com/yafengguo/Apache-beam/blob/master/sdks/
> > java/core/src/main/java/org/apache/beam/sdk/transforms/
> > CombineWithContext.java
> > [2] https://issues.apache.org/jira/browse/BEAM-1336
> > [3] https://github.com/apache/beam/pull/2627
> > [4]
> > https://github.com/search?l=Java=KeyedCombineFn=
> > advsearch=Code=%E2%9C%93
> > [5] https://www.google.com/search?q=KeyedCombineFn
> >
>
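
The alternative raised in the proposal above, a CombineFn that has KVs as its input and accumulator types, can be sketched in plain Python. This is only an illustration: KvMaxFn and combine_per_key are hypothetical stand-ins that follow the four-method shape quoted above and do not use any Beam classes.

```python
class KvMaxFn(object):
    """Keeps the maximum value per key; inputs and accumulators are (key, value)."""

    def create_accumulator(self):
        # No key is known yet, so the empty accumulator carries none.
        return None

    def add_input(self, accum, kv):
        key, value = kv
        if accum is None:
            return (key, value)
        acc_key, best = accum
        if acc_key != key:
            raise ValueError("accumulator and input belong to different keys")
        return (key, max(best, value))

    def merge_accumulators(self, accums):
        merged = None
        for accum in accums:
            if accum is None:
                continue
            # A non-empty accumulator has the same shape as an input KV.
            merged = accum if merged is None else self.add_input(merged, accum)
        return merged

    def extract_output(self, accum):
        return accum


def combine_per_key(fn, kvs):
    """Tiny local driver standing in for a runner's per-key combine."""
    accums = {}
    for key, value in kvs:
        current = accums.get(key, fn.create_accumulator())
        accums[key] = fn.add_input(current, (key, value))
    return {key: fn.extract_output(accum) for key, accum in accums.items()}


result = combine_per_key(KvMaxFn(), [("a", 1), ("b", 5), ("a", 3)])
```

The key travels inside every accumulator here, which is the overhead the quoted message mentions when it notes the potential performance benefit of not passing keys around.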


Re: Beam File System in the Python SDK

2017-03-20 Thread Sourabh Bajaj
Thanks for the feedback Tibor. I think in the first iteration we can
probably assume that the underlying filesystem is taking care of permission
enforcement. Once we have a few sources using the FS API we can maybe
revisit this, as we'll surely learn a few things from that. Thoughts?

On Sun, Mar 19, 2017 at 12:16 AM Tibor Kiss <tk...@hortonworks.com> wrote:

> Thanks for putting this together, Sourabh!
> I made two comments in the document (error handling, with statement).
>
> Are there any plans to support permissions (mode bits or acls) in the FS
> API?
> I believe most (if not all) of the underlying filesystems support (some
> sort of) permission enforcement.
>
> Thanks,
> Tibor
>
> > On Mar 17, 2017, at 10:14 PM, Sourabh Bajaj 
> > <sourabhba...@google.com.INVALID>
> wrote:
> >
> > Wanted to share the design proposal
> > <https://docs.google.com/document/d/10qD0RXmdI0240wPShaGDRm9Zt9a_ess-ABlvYx2LZFA/edit#heading=h.czvx1winvche>
> > for the Beam File System API in Python. I have marked the places where it
> > might be slightly different from the current Java implementation, mainly
> > around error handling. As always, feedback and comments are welcome.
> >
> > Thanks
> > Sourabh
> >
> > On Wed, Mar 1, 2017 at 4:44 PM Chamikara Jayalath <chamik...@apache.org>
> > wrote:
> >
> >> Great! Thanks Sourabh.
> >>
> >> - Cham
> >>
> >> On Wed, Mar 1, 2017 at 3:58 PM Robert Bradshaw
> <rober...@google.com.invalid
> >>>
> >> wrote:
> >>
> >>> Much needed! Added a couple of comments.
> >>>
> >>> On Wed, Mar 1, 2017 at 3:08 PM, Sourabh Bajaj <
> >>> sourabhba...@google.com.invalid> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> BEAM-1441 <https://issues.apache.org/jira/browse/BEAM-1441> is a
> >>>> ticket for implementing the Beam File System in the Python SDK similar
> >>>> to the one introduced in BEAM-59
> >>>> <https://issues.apache.org/jira/browse/BEAM-59>. I tried to take a pass
> >>>> on the implementation in #2136
> >>>> <https://github.com/apache/beam/pull/2136> and followed the Java API as
> >>>> closely as possible. Please feel free to give your comments here or on
> >>>> the pull request directly.
> >>>>
> >>>> Reference: Original design doc
> >>>> <https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#>
> >>>>
> >>>>
> >>>> Thanks
> >>>> Sourabh
> >>>>
> >>>
> >>
>
>


Beam File System in the Python SDK

2017-03-01 Thread Sourabh Bajaj
Hi,

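BEAM-1441 <https://issues.apache.org/jira/browse/BEAM-1441> is a ticket for
implementing the Beam File System in the Python SDK similar to the one
introduced in BEAM-59 <https://issues.apache.org/jira/browse/BEAM-59>. I
tried to take a pass on the implementation in #2136
<https://github.com/apache/beam/pull/2136> and followed the Java API as
closely as possible. Please feel free to give your comments here or on the
pull request directly.

Reference: Original design doc
<https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#>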

Thanks
Sourabh


Re: Release 0.6.0

2017-02-27 Thread Sourabh Bajaj
+1 for the new release

On Mon, Feb 27, 2017 at 2:06 PM Davor Bonaci  wrote:

> +1 -- let's get it started!
>
> On Mon, Feb 27, 2017 at 2:01 PM, Ahmet Altay 
> wrote:
>
> > Hi all,
> >
> > It's been about a month since the last release. I would like to propose
> > starting the next release. There are no release-blocking bugs in JIRA
> > [1]. Are there any release-blocking issues I am missing?
> >
> > Unless there is an objection I will volunteer to manage this release.
> > This will be the first release with Python content. In case there are
> > issues with that, it might be easier for me to resolve and document those
> > as part of the release process.
> >
> > Thank you,
> > Ahmet
> >
> > [1]
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%200.6.0%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> >
>


Re: Interest in a (virtual) contributor meeting?

2017-02-22 Thread Sourabh Bajaj
+1

On Wed, Feb 22, 2017 at 2:11 PM Kenneth Knowles 
wrote:

> +1 !!
>
> On Wed, Feb 22, 2017 at 6:19 AM, Kobi Salant 
> wrote:
>
> > +1
> >
> > On Feb 22, 2017 2:54 PM, "Aljoscha Krettek" wrote:
> >
> > > +1
> > >
> > > On Wed, 22 Feb 2017 at 10:08 JingsongLee 
> > wrote:
> > >
> > > > +1
> > > >
> > > >
> > > > Sent from Alibaba Mail for iPhone
> > > > -- Original Message --
> > > > From: Davor Bonaci <da...@apache.org>
> > > > Date: 2017-02-22 11:19:12
> > > > To: dev@beam.apache.org <dev@beam.apache.org>
> > > > Subject: Interest in a (virtual) contributor meeting?
> > > >
> > > > In the early days of the project, we held a few meetings for the
> > > > initial community to get to know each other. Since then, the community
> > > > has grown a huge amount, but we haven't organized any get-togethers.
> > > >
> > > > I wanted to gauge interest in a potential video conference call in the
> > > > near future. No specific agenda -- simply a chance for everyone to meet
> > > > others and see the faces of people we share a common passion with. Of
> > > > course, an open discussion on any topic of interest to the contributor
> > > > community is welcome. This would be strictly informal -- any decisions
> > > > are reserved for the mailing list discussions.
> > > >
> > > > If you'd be interested in attending, please reply back. If there's
> > > > sufficient interest, I'd be happy to try to organize something in the
> > > > near future.
> > > > Thanks!
> > > >
> > > > Davor
> > > >
> > >
> >
>


Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread Sourabh Bajaj
Congrats!!

On Thu, Jan 26, 2017 at 5:02 PM Jason Kuster 
wrote:

> Congrats all! Very exciting. :)
>
> On Thu, Jan 26, 2017 at 4:48 PM, Jesse Anderson 
> wrote:
>
> > Welcome!
> >
> > On Thu, Jan 26, 2017, 7:27 PM Davor Bonaci  wrote:
> >
> > > Please join me and the rest of Beam PMC in welcoming the following
> > > contributors as our newest committers. They have significantly
> > > contributed to the project in different ways, and we look forward to
> > > many more contributions in the future.
> > >
> > > * Stas Levin
> > > Stas has contributed across the breadth of the project, from the Spark
> > > runner to the core pieces and Java SDK. Looking at code contributions
> > > alone, he authored 43 commits and reported 25 issues. Stas is very
> > > active on the mailing lists too, contributing to good discussions and
> > > proposing improvements to the Beam model.
> > >
> > > * Ahmet Altay
> > > Ahmet is a major contributor to the Python SDK, both in terms of design
> > > and code contribution. Looking at code contributions alone, he authored
> > > 98 commits and reviewed dozens of pull requests. With Python SDK’s
> > > imminent merge to the master branch, Ahmet contributed towards
> > > establishing a new major component in Beam.
> > >
> > > * Pei He
> > > Pei has been contributing to Beam since its inception, accumulating a
> > > total of 118 commits since February. He has made several major
> > > contributions, most recently by redesigning IOChannelFactory /
> > > FileSystem APIs (in progress), which would extend Beam’s portability to
> > > many additional file systems and cloud providers.
> > > Congratulations to all three! Welcome!
> > >
> > > Davor
> > >
> >
>
>
>
> --
> ---
> Jason Kuster
> Apache Beam (Incubating) / Google Cloud Dataflow
>