[jira] [Created] (BEAM-1164) Allow a DoFn to opt in to mutating its input
Frances Perry created BEAM-1164:

Summary: Allow a DoFn to opt in to mutating its input
Key: BEAM-1164
URL: https://issues.apache.org/jira/browse/BEAM-1164
Project: Beam
Issue Type: Bug
Components: beam-model
Reporter: Frances Perry
Priority: Minor

Runners generally can't tell if a DoFn is mutating inputs, but assuming so by default carries a significant performance cost from unnecessary copying (around sibling fusion, etc). So instead the model prohibits mutating inputs, and the Direct Runner validates this behavior. (See: http://beam.incubator.apache.org/contribute/design-principles/#make-efficient-things-easy-rather-than-make-easy-things-efficient)

However, if users are processing a small number of large records by making incremental changes (for example, genomics use cases), the cost of the immutability requirement can be very large. As a workaround, users sometimes do suboptimal things (fusing ParDos by hand) or undefined things when they believe the immutability requirement is unnecessarily strict (adding no-op coders in places they hope the runner won't be materializing things, mutating things anyway when they don't expect sibling fusion to happen, etc).

We should consider adding a signal (MutatingDoFn?) that users explicitly opt in to, declaring that their code may mutate inputs. The runner can then use this declaration either to prevent optimizations that would break in the face of mutation or to insert additional copies as needed so that optimizations preserve semantics.

See this related user@ discussion: https://lists.apache.org/thread.html/f39689f54147117f3fc54c498eff1a20fa73f1be5b5cad5b6f816fd3@%3Cuser.beam.apache.org%3E

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
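To make the sibling-fusion hazard in BEAM-1164 concrete, here is a minimal Python sketch. It is not Beam code: `normalize`, `count_lowercase`, and `run_fused` are hypothetical names, and `run_fused` only stands in for a runner that hands the same element to two sibling steps, with or without a defensive copy.

```python
import copy


def normalize(record):
    # Sibling A: mutates its input in place (the behavior the model forbids).
    record["bases"] = record["bases"].upper()
    return record


def count_lowercase(record):
    # Sibling B: only reads its input.
    return sum(1 for ch in record["bases"] if ch.islower())


def run_fused(element, copy_inputs):
    # Hypothetical stand-in for a runner that fuses two sibling steps.
    # With copy_inputs=False both siblings see the very same object.
    first_input = copy.deepcopy(element) if copy_inputs else element
    normalize(first_input)
    return count_lowercase(element)


# Without a copy, sibling B observes sibling A's mutation.
shared = run_fused({"bases": "acgt"}, copy_inputs=False)
# With a copy, sibling B sees the original element.
copied = run_fused({"bases": "acgt"}, copy_inputs=True)
```

The copy restores correct results but is exactly the cost the ticket wants to avoid paying by default, which is why an explicit opt-in signal would let a runner copy only where a step declares that it mutates.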
[jira] [Updated] (BEAM-1070) Service Account Based Authentication Broken
[ https://issues.apache.org/jira/browse/BEAM-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry updated BEAM-1070:

    Assignee: Ahmet Altay (was: Frances Perry)

> Service Account Based Authentication Broken
>
> Key: BEAM-1070
> URL: https://issues.apache.org/jira/browse/BEAM-1070
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
> Environment: CentOS Linux release 7.1.1503 (Core)
> Python 2.7.5
> Reporter: Stephen Reichling
> Assignee: Ahmet Altay
> Priority: Critical
>
> {{sdks/python/apache_beam/internal/auth.py}} calls into the {{oauth2client.service_account.ServiceAccountCredentials.from_p12_keyfile}} method with invalid and incorrectly-ordered parameters. Compare the [function signature of ServiceAccountCredentials.from_p12_keyfile|https://github.com/google/oauth2client/blob/ae73312942d3cf0e98f097dfbb40f136c2a7c463/oauth2client/service_account.py#L300-L303] with [how it is invoked|https://github.com/apache/incubator-beam/blob/9ded359daefc6040d61a1f33c77563474fcb09b6/sdks/python/apache_beam/internal/auth.py#L150-L154].
> This causes a runtime error when one attempts to use a service account to authenticate with the Google Dataflow APIs.
> The specific problems are:
> - the {{client_scopes}} variable (a list) is passed as a positional parameter where the function signature expects the {{private_key_password}} parameter (a string).
> - a keyed parameter, {{user_agent}}, is passed but no such parameter is defined in the function signature.
> - no value is provided for {{private_key_password}}. All p12 key files for service accounts issued by Google Cloud have the password {{notasecret}} as documented [here|https://support.google.com/cloud/answer/6158849?hl=en#serviceaccounts], so it's currently not possible to use a Google-issued p12 key file with this implementation.
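The three problems listed in BEAM-1070 can be reproduced without oauth2client installed. The sketch below uses a hypothetical stand-in, `from_p12_keyfile_stub`, whose parameter order is assumed to mirror the linked signature (the names `service_account_email` and `filename` are assumptions, since the ticket names only `private_key_password`); the scope URL and account name are illustrative values only.

```python
def from_p12_keyfile_stub(service_account_email, filename,
                          private_key_password=None, scopes=''):
    # Hypothetical stand-in: it only records how the arguments were bound.
    return {
        'service_account_email': service_account_email,
        'filename': filename,
        'private_key_password': private_key_password,
        'scopes': scopes,
    }


client_scopes = ['https://www.googleapis.com/auth/cloud-platform']
key_file = '/some/path/keyfile.p12'

# Problem 1: a list passed positionally lands in private_key_password.
misbound = from_p12_keyfile_stub('My Service Account Name', key_file,
                                 client_scopes)

# Problem 2: an undefined keyword parameter raises TypeError at call time.
try:
    from_p12_keyfile_stub('My Service Account Name', key_file,
                          client_scopes, user_agent='some-agent')
    broken_error = None
except TypeError as exc:
    broken_error = str(exc)

# One possible corrected call shape: keyword arguments, with the password
# that the ticket says Google-issued p12 files use.
fixed = from_p12_keyfile_stub('My Service Account Name', key_file,
                              private_key_password='notasecret',
                              scopes=client_scopes)
```

Passing the trailing arguments by keyword is a defensive choice here: it keeps the call correct even if the positional order of optional parameters differs from what the caller remembers.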
[jira] [Updated] (BEAM-1068) Service Account Credentials File Specified via Pipeline Option Ignored
[ https://issues.apache.org/jira/browse/BEAM-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry updated BEAM-1068:

    Assignee: Ahmet Altay (was: Frances Perry)

> Service Account Credentials File Specified via Pipeline Option Ignored
>
> Key: BEAM-1068
> URL: https://issues.apache.org/jira/browse/BEAM-1068
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
> Environment: CentOS Linux release 7.1.1503 (Core)
> Python 2.7.5
> Reporter: Stephen Reichling
> Assignee: Ahmet Altay
> Priority: Minor
>
> When writing a pipeline that authenticates with Google Dataflow APIs using a service account, specifying the path to that service account's credentials file in the {{PipelineOptions}} object passed in to the pipeline does not work; it only works when passed as a command-line flag.
> For example, if I write code like so:
> {code}
> pipelineOptions = options.PipelineOptions()
> gcOptions = pipelineOptions.view_as(options.GoogleCloudOptions)
> gcOptions.service_account_name = 'My Service Account Name'
> gcOptions.service_account_key_file = '/some/path/keyfile.p12'
> pipeline = beam.Pipeline(options=pipelineOptions)
> # ... add stages to the pipeline
> pipeline.run()
> {code}
> and execute it like so:
> {{python ./my_pipeline.py}}
> ...the service account I specify will not be used.
> Only if I were to execute the code like so:
> {{python ./my_pipeline.py --service_account_name 'My Service Account Name' --service_account_key_file /some/path/keyfile.p12}}
> ...does it actually use the service account.
> The problem appears to be rooted in `auth.py`, which reconstructs the {{PipelineOptions}} object directly from {{sys.argv}} rather than using the instance passed in to the pipeline:
> https://github.com/apache/incubator-beam/blob/9ded359daefc6040d61a1f33c77563474fcb09b6/sdks/python/apache_beam/internal/auth.py#L129-L130
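The root cause described in BEAM-1068 can be illustrated with plain argparse. This is a sketch under assumptions, not the Beam implementation: `parse_flags`, `key_file_from_argv`, and `key_file_from_instance` are hypothetical helpers, and an argparse namespace stands in for the options object.

```python
import argparse


def parse_flags(argv):
    # Hypothetical options parser with the two flags named in the ticket.
    parser = argparse.ArgumentParser()
    parser.add_argument('--service_account_name', default=None)
    parser.add_argument('--service_account_key_file', default=None)
    known, _unknown = parser.parse_known_args(argv)
    return known


def key_file_from_argv(argv):
    # Mirrors the reported behavior: options are rebuilt from the command
    # line, so values set in code are never seen.
    return parse_flags(argv).service_account_key_file


def key_file_from_instance(opts):
    # Reads the options instance that the caller actually configured.
    return opts.service_account_key_file


# The script is started with no flags; the key file is set in code.
command_line = []
opts = parse_flags(command_line)
opts.service_account_key_file = '/some/path/keyfile.p12'

seen_by_reparse = key_file_from_argv(command_line)
seen_by_instance = key_file_from_instance(opts)
```

Re-parsing yields nothing because the command line never carried the flag, while reading from the configured instance returns the path, which matches the two execution outcomes the reporter describes.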
[jira] [Assigned] (BEAM-666) Add accurate "How to Run" instructions for each of the WC examples
[ https://issues.apache.org/jira/browse/BEAM-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry reassigned BEAM-666:

    Assignee: Frances Perry (was: Hadar Hod)

> Add accurate "How to Run" instructions for each of the WC examples
>
> Key: BEAM-666
> URL: https://issues.apache.org/jira/browse/BEAM-666
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Hadar Hod
> Assignee: Frances Perry
[jira] [Assigned] (BEAM-667) Include code snippets from real examples
[ https://issues.apache.org/jira/browse/BEAM-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry reassigned BEAM-667:

    Assignee: Frances Perry (was: Hadar Hod)

> Include code snippets from real examples
>
> Key: BEAM-667
> URL: https://issues.apache.org/jira/browse/BEAM-667
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Hadar Hod
> Assignee: Frances Perry
[jira] [Commented] (BEAM-667) Include code snippets from real examples
[ https://issues.apache.org/jira/browse/BEAM-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654554#comment-15654554 ]

Frances Perry commented on BEAM-667:

These need to be redone given https://github.com/apache/incubator-beam/pull/1315

> Include code snippets from real examples
>
> Key: BEAM-667
> URL: https://issues.apache.org/jira/browse/BEAM-667
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Hadar Hod
> Assignee: Hadar Hod
[jira] [Commented] (BEAM-900) Spark quickstart instructions
[ https://issues.apache.org/jira/browse/BEAM-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15641199#comment-15641199 ]

Frances Perry commented on BEAM-900:

Amit, could you help find this an owner? Thanks!

> Spark quickstart instructions
>
> Key: BEAM-900
> URL: https://issues.apache.org/jira/browse/BEAM-900
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Frances Perry
> Assignee: Amit Sela
>
> After initial quickstart structure is pushed, add commandlines for Spark execution to quickstart.md and detailed Spark setup instructions to learn/runners/spark.md
[jira] [Updated] (BEAM-900) Spark quickstart instructions
[ https://issues.apache.org/jira/browse/BEAM-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry updated BEAM-900:

    Assignee: Amit Sela (was: James Malone)

> Spark quickstart instructions
>
> Key: BEAM-900
> URL: https://issues.apache.org/jira/browse/BEAM-900
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Frances Perry
> Assignee: Amit Sela
>
> After initial quickstart structure is pushed, add commandlines for Spark execution to quickstart.md and detailed Spark setup instructions to learn/runners/spark.md
[jira] [Comment Edited] (BEAM-899) Flink quickstart instructions
[ https://issues.apache.org/jira/browse/BEAM-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15641198#comment-15641198 ]

Frances Perry edited comment on BEAM-899 at 11/6/16 5:34 AM:

Aljoscha, could you help make sure this finds an owner? Thanks!

was (Author: frances):
Aljosha, could you help make sure this finds an owner? Thanks!

> Flink quickstart instructions
>
> Key: BEAM-899
> URL: https://issues.apache.org/jira/browse/BEAM-899
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Frances Perry
> Assignee: Aljoscha Krettek
>
> After initial quickstart structure is pushed, add commandlines for Flink execution to quickstart.md and detailed Flink setup instructions to learn/runners/flink.md
[jira] [Updated] (BEAM-899) Flink quickstart instructions
[ https://issues.apache.org/jira/browse/BEAM-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry updated BEAM-899:

    Assignee: Aljoscha Krettek (was: James Malone)

> Flink quickstart instructions
>
> Key: BEAM-899
> URL: https://issues.apache.org/jira/browse/BEAM-899
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Frances Perry
> Assignee: Aljoscha Krettek
>
> After initial quickstart structure is pushed, add commandlines for Flink execution to quickstart.md and detailed Flink setup instructions to learn/runners/flink.md
[jira] [Commented] (BEAM-899) Flink quickstart instructions
[ https://issues.apache.org/jira/browse/BEAM-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15641198#comment-15641198 ]

Frances Perry commented on BEAM-899:

Aljosha, could you help make sure this finds an owner? Thanks!

> Flink quickstart instructions
>
> Key: BEAM-899
> URL: https://issues.apache.org/jira/browse/BEAM-899
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Frances Perry
> Assignee: Aljoscha Krettek
>
> After initial quickstart structure is pushed, add commandlines for Flink execution to quickstart.md and detailed Flink setup instructions to learn/runners/flink.md
[jira] [Created] (BEAM-919) Remove remaining old use/learn links from website src
Frances Perry created BEAM-919:

Summary: Remove remaining old use/learn links from website src
Key: BEAM-919
URL: https://issues.apache.org/jira/browse/BEAM-919
Project: Beam
Issue Type: Bug
Components: website
Reporter: Frances Perry
Assignee: James Malone
Priority: Minor

We still have old links lingering after the website refactoring. For example, the release guide (https://github.com/apache/incubator-beam-site/blob/asf-site/src/contribute/release-guide.md) still links to "/use/..." in a bunch of places.

impact: links still work because of redirects, but it's tech debt we should fix.
[jira] [Updated] (BEAM-905) Archetype pom needs to generalize dependencies
[ https://issues.apache.org/jira/browse/BEAM-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry updated BEAM-905:

    Affects Version/s: 0.4.0-incubating

> Archetype pom needs to generalize dependencies
>
> Key: BEAM-905
> URL: https://issues.apache.org/jira/browse/BEAM-905
> Project: Beam
> Issue Type: Bug
> Affects Versions: 0.4.0-incubating
> Environment: Currently the archetype pom includes the direct runner and the dataflow one, but not the others. It should do the same magic as the main examples.
> Reporter: Frances Perry
> Assignee: Pei He
[jira] [Commented] (BEAM-909) Starter archetype's pom doesn't include the right dependencies
[ https://issues.apache.org/jira/browse/BEAM-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635305#comment-15635305 ]

Frances Perry commented on BEAM-909:

Whoops, typo. Meant BEAM-905, which is about the example archetype.

> Starter archetype's pom doesn't include the right dependencies
>
> Key: BEAM-909
> URL: https://issues.apache.org/jira/browse/BEAM-909
> Project: Beam
> Issue Type: Bug
> Affects Versions: 0.4.0-incubating
> Reporter: Frances Perry
>
> Repro:
> $ mvn archetype:generate -DarchetypeGroupId=org.apache.beam -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-starter -DarchetypeVersion=LATEST -DgroupId=org.example -DartifactId=beam-starter -Dversion="0.1" -DinteractiveMode=false
> The resulting pom doesn't seem to have dependencies on any runners or a profile for enabling them.
[jira] [Created] (BEAM-909) Starter archetype's pom doesn't include the right dependencies
Frances Perry created BEAM-909:

Summary: Starter archetype's pom doesn't include the right dependencies
Key: BEAM-909
URL: https://issues.apache.org/jira/browse/BEAM-909
Project: Beam
Issue Type: Bug
Affects Versions: 0.4.0-incubating
Reporter: Frances Perry

Repro:
$ mvn archetype:generate -DarchetypeGroupId=org.apache.beam -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-starter -DarchetypeVersion=LATEST -DgroupId=org.example -DartifactId=beam-starter -Dversion="0.1" -DinteractiveMode=false

The resulting pom doesn't seem to have dependencies on any runners or a profile for enabling them.
[jira] [Commented] (BEAM-909) Starter archetype's pom doesn't include the right dependencies
[ https://issues.apache.org/jira/browse/BEAM-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635282#comment-15635282 ]

Frances Perry commented on BEAM-909:

(Related to BEAM-904, although that one includes only Direct and Dataflow.)

> Starter archetype's pom doesn't include the right dependencies
>
> Key: BEAM-909
> URL: https://issues.apache.org/jira/browse/BEAM-909
> Project: Beam
> Issue Type: Bug
> Affects Versions: 0.4.0-incubating
> Reporter: Frances Perry
>
> Repro:
> $ mvn archetype:generate -DarchetypeGroupId=org.apache.beam -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-starter -DarchetypeVersion=LATEST -DgroupId=org.example -DartifactId=beam-starter -Dversion="0.1" -DinteractiveMode=false
> The resulting pom doesn't seem to have dependencies on any runners or a profile for enabling them.
[jira] [Commented] (BEAM-899) Flink quickstart instructions
[ https://issues.apache.org/jira/browse/BEAM-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634565#comment-15634565 ]

Frances Perry commented on BEAM-899:

The quickstart uses archetypes, so unfortunately the instructions will need a step to hand edit the pom until BEAM-905 is fixed and released.

> Flink quickstart instructions
>
> Key: BEAM-899
> URL: https://issues.apache.org/jira/browse/BEAM-899
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Frances Perry
> Assignee: James Malone
>
> After initial quickstart structure is pushed, add commandlines for Flink execution to quickstart.md and detailed Flink setup instructions to learn/runners/flink.md
[jira] [Commented] (BEAM-900) Spark quickstart instructions
[ https://issues.apache.org/jira/browse/BEAM-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634564#comment-15634564 ]

Frances Perry commented on BEAM-900:

The quickstart uses archetypes, so unfortunately the instructions will need a step to hand edit the pom until BEAM-905 is fixed and released.

> Spark quickstart instructions
>
> Key: BEAM-900
> URL: https://issues.apache.org/jira/browse/BEAM-900
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Frances Perry
> Assignee: James Malone
>
> After initial quickstart structure is pushed, add commandlines for Spark execution to quickstart.md and detailed Spark setup instructions to learn/runners/spark.md
[jira] [Commented] (BEAM-904) Dataflow setup instructions
[ https://issues.apache.org/jira/browse/BEAM-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634567#comment-15634567 ]

Frances Perry commented on BEAM-904:

If BEAM-899 and BEAM-900 are fixed before this is released, they will likely include a hack that will need to be removed.

> Dataflow setup instructions
>
> Key: BEAM-904
> URL: https://issues.apache.org/jira/browse/BEAM-904
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Frances Perry
> Assignee: Melissa Pashniak
>
> As you are working on the Dataflow Runner page, please include the getting started instructions, as I'm linking there from the quickstart. Thanks!
[jira] [Created] (BEAM-905) Archetype pom needs to generalize dependencies
Frances Perry created BEAM-905:

Summary: Archetype pom needs to generalize dependencies
Key: BEAM-905
URL: https://issues.apache.org/jira/browse/BEAM-905
Project: Beam
Issue Type: Bug
Environment: Currently the archetype pom includes the direct runner and the dataflow one, but not the others. It should do the same magic as the main examples.
Reporter: Frances Perry
Assignee: Pei He
[jira] [Created] (BEAM-904) Dataflow setup instructions
Frances Perry created BEAM-904:

Summary: Dataflow setup instructions
Key: BEAM-904
URL: https://issues.apache.org/jira/browse/BEAM-904
Project: Beam
Issue Type: Sub-task
Components: website
Reporter: Frances Perry
Assignee: Melissa Pashniak

As you are working on the Dataflow Runner page, please include the getting started instructions, as I'm linking there from the quickstart. Thanks!
[jira] [Created] (BEAM-902) Add runner toggles
Frances Perry created BEAM-902:

Summary: Add runner toggles
Key: BEAM-902
URL: https://issues.apache.org/jira/browse/BEAM-902
Project: Beam
Issue Type: Sub-task
Components: website
Reporter: Frances Perry
Assignee: Abdullah Bashir

As discussed on pull/752, extend the language toggle support to be able to toggle commandlines between different runners.
[jira] [Created] (BEAM-900) Spark quickstart instructions
Frances Perry created BEAM-900:

Summary: Spark quickstart instructions
Key: BEAM-900
URL: https://issues.apache.org/jira/browse/BEAM-900
Project: Beam
Issue Type: Sub-task
Components: website
Reporter: Frances Perry
Assignee: James Malone

After initial quickstart structure is pushed, add commandlines for Spark execution to quickstart.md and detailed Spark setup instructions to learn/runners/spark.md
[jira] [Created] (BEAM-899) Flink quickstart instructions
Frances Perry created BEAM-899:

Summary: Flink quickstart instructions
Key: BEAM-899
URL: https://issues.apache.org/jira/browse/BEAM-899
Project: Beam
Issue Type: Sub-task
Components: website
Reporter: Frances Perry
Assignee: James Malone

After initial quickstart structure is pushed, add commandlines for Flink execution to quickstart.md and detailed Flink setup instructions to learn/runners/flink.md
[jira] [Created] (BEAM-895) Transport.newStorageClient requires credentials
Frances Perry created BEAM-895:

Summary: Transport.newStorageClient requires credentials
Key: BEAM-895
URL: https://issues.apache.org/jira/browse/BEAM-895
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Reporter: Frances Perry
Assignee: Davor Bonaci
Fix For: 0.4.0-incubating

Transport.newStorageClient requires credentials, even if those aren't needed.

Impact: Examples use publicly accessible files on Google Cloud Storage, but reading those still requires the user to authenticate with Google Cloud Storage.

java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Unable to get application default credentials. Please see https://developers.google.com/accounts/docs/application-default-credentials for details on how to specify credentials. This version of the SDK is dependent on the gcloud core component version 2015.02.05 or newer to be able to get credentials from the currently authorized user via gcloud auth.
    at org.apache.beam.sdk.util.Credentials.getCredential(Credentials.java:123)
    at org.apache.beam.sdk.util.GcpCredentialFactory.getCredential(GcpCredentialFactory.java:43)
    at org.apache.beam.sdk.options.GcpOptions$GcpUserCredentialsFactory.create(GcpOptions.java:264)
    at org.apache.beam.sdk.options.GcpOptions$GcpUserCredentialsFactory.create(GcpOptions.java:254)
    at org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:549)
    at org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:490)
    at org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:152)
    at com.sun.proxy.$Proxy52.getGcpCredential(Unknown Source)
    at org.apache.beam.sdk.util.Transport.newStorageClient(Transport.java:148)
    at org.apache.beam.sdk.util.GcsUtil$GcsUtilFactory.create(GcsUtil.java:96)
    at org.apache.beam.sdk.util.GcsUtil$GcsUtilFactory.create(GcsUtil.java:84)
    at org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:549)
    at org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:490)
    at org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:152)
    at com.sun.proxy.$Proxy52.getGcsUtil(Unknown Source)
    at org.apache.beam.sdk.util.GcsIOChannelFactory.match(GcsIOChannelFactory.java:43)
    at org.apache.beam.sdk.io.TextIO$Read$Bound.apply(TextIO.java:283)
    at org.apache.beam.sdk.io.TextIO$Read$Bound.apply(TextIO.java:195)
    at org.apache.beam.sdk.runners.PipelineRunner.apply(PipelineRunner.java:76)
    at org.apache.beam.runners.direct.DirectRunner.apply(DirectRunner.java:226)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:400)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:323)
    at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:58)
    at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:173)
    at org.apache.beam.examples.WordCount.main(WordCount.java:195)
    ... 6 more
Caused by: java.io.IOException: The Application Default Credentials are not available. They are available if running on Google App Engine, Google Compute Engine, or Google Cloud Shell. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
    at com.google.api.client.googleapis.auth.oauth2.DefaultCredentialProvider.getDefaultCredential(DefaultCredentialProvider.java:98)
    at com.google.api.client.googleapis.auth.oauth2.GoogleCredential.getApplicationDefault(GoogleCredential.java:213)
    at com.google.api.client.googleapis.auth.oauth2.GoogleCredential.getApplicationDefault(GoogleCredential.java:191)
    at org.apache.beam.sdk.util.Credentials.getCredential(Credentials.java:121)
    ... 30 more
[jira] [Assigned] (BEAM-892) revamp quickstart
[ https://issues.apache.org/jira/browse/BEAM-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry reassigned BEAM-892:

    Assignee: Frances Perry (was: James Malone)

> revamp quickstart
>
> Key: BEAM-892
> URL: https://issues.apache.org/jira/browse/BEAM-892
> Project: Beam
> Issue Type: Bug
> Components: website
> Reporter: Frances Perry
> Assignee: Frances Perry
>
> We need to make this quickstart actually a quickstart!
[jira] [Created] (BEAM-892) revamp quickstart
Frances Perry created BEAM-892:

Summary: revamp quickstart
Key: BEAM-892
URL: https://issues.apache.org/jira/browse/BEAM-892
Project: Beam
Issue Type: Bug
Components: website
Reporter: Frances Perry
Assignee: James Malone

We need to make this quickstart actually a quickstart!
[jira] [Updated] (BEAM-752) infrastructure for toggling code snippets in documentation
[ https://issues.apache.org/jira/browse/BEAM-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry updated BEAM-752:

    Assignee: Abdullah Bashir (was: James Malone)

> infrastructure for toggling code snippets in documentation
>
> Key: BEAM-752
> URL: https://issues.apache.org/jira/browse/BEAM-752
> Project: Beam
> Issue Type: Improvement
> Components: website
> Reporter: Frances Perry
> Assignee: Abdullah Bashir
> Labels: starter
>
> Once the python sdk gets merged to the master branch, a lot of our documentation (programming guide, walkthroughs, etc) will need to support multiple languages.
> The hope is that the vast bulk of the prose can be written about Beam concepts in a language independent way. But for code snippets it would be great to be able to toggle languages.
> Goals:
> * Support tabbed language toggles for both code and small sections of text.
> * Support easily changing the default per-user-visit so that the entire file (or even better entire site) defaults to showing a specific language
[jira] [Updated] (BEAM-842) dependency.py: package not found when running on Windows
[ https://issues.apache.org/jira/browse/BEAM-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry updated BEAM-842:

    Assignee: Ahmet Altay (was: Frances Perry)

> dependency.py: package not found when running on Windows
>
> Key: BEAM-842
> URL: https://issues.apache.org/jira/browse/BEAM-842
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
> Affects Versions: 0.4.0-incubating
> Environment: Windows 10, Python 2.7.11
> Reporter: Matthias Baetens
> Assignee: Ahmet Altay
> Priority: Minor
> Labels: newbie
>
> When splitting your pipeline into multiple files and configuring your project according to the Juliaset example (https://cloud.google.com/dataflow/pipelines/dependencies-python#multiple-file-dependencies), the Pipeline still crashes when using Windows.
> This is caused by setuptools defaulting to a .zip on Windows, while the current Beam code looks for a .tar.gz (dependency.py, line 400). When changing this line to: output_files = glob.glob(os.path.join(temp_dir, '*.zip')), it works.
> Suggestion: checking the OS would probably solve this issue.
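As an alternative to the OS check suggested in BEAM-842, a lookup could accept either archive extension. The sketch below illustrates that idea only; `find_sdist` is a hypothetical helper, not the code in dependency.py, and the archive file name is made up for the demonstration.

```python
import glob
import os
import tempfile


def find_sdist(temp_dir):
    # Accept both archive formats instead of branching on the platform.
    matches = []
    for pattern in ('*.tar.gz', '*.zip'):
        matches.extend(glob.glob(os.path.join(temp_dir, pattern)))
    if not matches:
        raise RuntimeError('no source distribution found in %s' % temp_dir)
    # Sort so the result is deterministic if several archives exist.
    return sorted(matches)[0]


# Simulate a build directory that contains only a .zip archive.
with tempfile.TemporaryDirectory() as build_dir:
    open(os.path.join(build_dir, 'juliaset-0.0.1.zip'), 'w').close()
    found = os.path.basename(find_sdist(build_dir))
```

Matching on the files actually produced avoids depending on which format a given platform or setuptools version defaults to.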
[jira] [Created] (BEAM-835) add Intellij instructions to the contribution guide
Frances Perry created BEAM-835:

Summary: add Intellij instructions to the contribution guide
Key: BEAM-835
URL: https://issues.apache.org/jira/browse/BEAM-835
Project: Beam
Issue Type: Bug
Components: website
Reporter: Frances Perry
Priority: Minor

Add Intellij-specific instructions to the contribution guide, to go alongside the Eclipse instructions.
[jira] [Updated] (BEAM-193) Port existing Dataflow SDK documentation to Beam Programming Guide
[ https://issues.apache.org/jira/browse/BEAM-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry updated BEAM-193:

    Assignee: Melissa Pashniak (was: Devin Donnelly)

> Port existing Dataflow SDK documentation to Beam Programming Guide
>
> Key: BEAM-193
> URL: https://issues.apache.org/jira/browse/BEAM-193
> Project: Beam
> Issue Type: Task
> Components: website
> Reporter: Devin Donnelly
> Assignee: Melissa Pashniak
>
> There is an extensive amount of documentation on the Dataflow SDK programming model and classes. Port this documentation over as a new Beam Programming Guide covering the following major topics:
> - Programming model overview
> - Pipeline structure
> - PCollections
> - Transforms
> - I/O
[jira] [Updated] (BEAM-505) Fill in the learn/runners/direct portion of the website
[ https://issues.apache.org/jira/browse/BEAM-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry updated BEAM-505:

    Assignee: Melissa Pashniak (was: James Malone)

> Fill in the learn/runners/direct portion of the website
>
> Key: BEAM-505
> URL: https://issues.apache.org/jira/browse/BEAM-505
> Project: Beam
> Issue Type: Bug
> Components: website
> Reporter: Frances Perry
> Assignee: Melissa Pashniak
>
> As per https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit.
> Should be a landing page for the Direct runner
[jira] [Updated] (BEAM-508) Fill in the learn/runners/dataflow portion of the website
[ https://issues.apache.org/jira/browse/BEAM-508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry updated BEAM-508:

    Assignee: Melissa Pashniak (was: James Malone)

> Fill in the learn/runners/dataflow portion of the website
>
> Key: BEAM-508
> URL: https://issues.apache.org/jira/browse/BEAM-508
> Project: Beam
> Issue Type: Bug
> Components: website
> Reporter: Frances Perry
> Assignee: Melissa Pashniak
>
> As per https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit.
> Should be a landing page for Dataflow-runner-specific content
[jira] [Commented] (BEAM-749) Syntax highlight on website
[ https://issues.apache.org/jira/browse/BEAM-749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593317#comment-15593317 ]

Frances Perry commented on BEAM-749:

Confirmed James isn't currently working on this. Reassigning to myself.

> Syntax highlight on website
>
> Key: BEAM-749
> URL: https://issues.apache.org/jira/browse/BEAM-749
> Project: Beam
> Issue Type: Improvement
> Components: website
> Reporter: Frances Perry
> Assignee: James Malone
>
> We should be able to enable Rouge on the website in order to get syntax highlighting in the programming guide, walkthroughs, etc.
> https://jekyllrb.com/docs/templates/
[jira] [Assigned] (BEAM-749) Syntax highlight on website
[ https://issues.apache.org/jira/browse/BEAM-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry reassigned BEAM-749: -- Assignee: Frances Perry (was: James Malone) > Syntax highlight on website > --- > > Key: BEAM-749 > URL: https://issues.apache.org/jira/browse/BEAM-749 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Frances Perry >Assignee: Frances Perry > > We should be able to enable Rouge on the website in order to get syntax > highlighting in the programming guide, walkthroughs, etc. > https://jekyllrb.com/docs/templates/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (BEAM-602) make feature branches more discoverable
[ https://issues.apache.org/jira/browse/BEAM-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry resolved BEAM-602. Resolution: Fixed Fix Version/s: Not applicable > make feature branches more discoverable > --- > > Key: BEAM-602 > URL: https://issues.apache.org/jira/browse/BEAM-602 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Frances Perry >Assignee: Frances Perry > Fix For: Not applicable > > > We have great things happening on feature branches, but they are a bit hidden. > - update the contribution guide to add instructions for working on branches > - add a page under contribute/ that lists the feature branches, links to > their JIRAs, etc. > - add a quick link from pages in use/ and learn/ to help make this > discoverable for adventurous users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (BEAM-721) Travis CI fails to run Python tox tests on Mac
[ https://issues.apache.org/jira/browse/BEAM-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry resolved BEAM-721. Resolution: Fixed Fix Version/s: Not applicable > Travis CI fails to run Python tox tests on Mac > -- > > Key: BEAM-721 > URL: https://issues.apache.org/jira/browse/BEAM-721 > Project: Beam > Issue Type: Bug > Components: sdk-py > Environment: Mac >Reporter: Pablo Estrada >Assignee: Frances Perry > Fix For: Not applicable > > > Some Travis CI runs on Mac are failing because the test script cannot find > tox. > See: https://travis-ci.org/apache/incubator-beam/jobs/165306424#L86 > The travis.yml file does attempt to install tox (See: > https://github.com/apache/incubator-beam/blob/python-sdk/.travis.yml#L66) > Looking at the logs, it seems that tox is available in a different directory > (/usr/local), while TOX_HOME is set to $HOME/Library/Python/2.7/bin. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
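The mismatch described in BEAM-721 (tox installed in one directory, the script looking in another) can be illustrated with a small lookup helper. This is only a sketch, not the fix that was applied to the Travis configuration. It assumes Python 3 for `shutil.which`, whereas the build in the report used Python 2.7. The helper name `find_tool` and the `/usr/local/bin` fallback are hypothetical.

```python
import os
import shutil


def find_tool(name, extra_dirs=()):
    """Return the full path to `name`, or None if it cannot be found.

    Directories on PATH are searched first, then each directory in
    `extra_dirs` (fallback locations supplied by the caller).
    """
    found = shutil.which(name)
    if found:
        return found
    for directory in extra_dirs:
        # expanduser lets callers pass "~/..." style directories.
        found = shutil.which(name, path=os.path.expanduser(directory))
        if found:
            return found
    return None


if __name__ == "__main__":
    # The first directory mirrors TOX_HOME from the report; the second is an
    # assumed location under /usr/local.
    print(find_tool("tox", ["~/Library/Python/2.7/bin", "/usr/local/bin"]))
```

Searching several candidate directories avoids hard-coding a single install location, which is the failure mode the report describes.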
[jira] [Reopened] (BEAM-753) Travis failure (cannot import name locked_file)
[ https://issues.apache.org/jira/browse/BEAM-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry reopened BEAM-753: (doh, wrong tab ;-) ) > Travis failure (cannot import name locked_file) > --- > > Key: BEAM-753 > URL: https://issues.apache.org/jira/browse/BEAM-753 > Project: Beam > Issue Type: Bug > Components: sdk-py >Reporter: Ahmet Altay >Assignee: Ahmet Altay > Fix For: Not applicable > > > ERROR: Failure: ImportError (cannot import name locked_file) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/loader.py", > line 418, in loadTestsFromName > addr.filename, addr.module) > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/importer.py", > line 47, in importFromPath > return self.importFromDir(dir_path, fqname) > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/importer.py", > line 94, in importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/__init__.py", > line 78, in > from apache_beam import io > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/__init__.py", > line 21, in > from apache_beam.io.avroio import * > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/avroio.py", > line 29, in > from apache_beam.io import filebasedsource > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/filebasedsource.py", > line 31, in > from apache_beam.io import concat_source > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/concat_source.py", > line 24, in > from apache_beam.io import iobase > File > 
"/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/iobase.py", > line 818, in > from apache_beam.runners.dataflow.native_io.iobase import * > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/runners/__init__.py", > line 23, in > from apache_beam.runners.dataflow_runner import DataflowPipelineRunner > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/runners/dataflow_runner.py", > line 43, in > from apache_beam.internal.clients import dataflow as dataflow_api > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/internal/clients/dataflow/__init__.py", > line 23, in > from apitools.base.py import * > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/.tox/py27/local/lib/python2.7/site-packages/apitools/base/py/__init__.py", > line 22, in > from apitools.base.py.credentials_lib import * > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/.tox/py27/local/lib/python2.7/site-packages/apitools/base/py/credentials_lib.py", > line 50, in > from oauth2client import locked_file > ImportError: cannot import name locked_file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (BEAM-753) Travis failure (cannot import name locked_file)
[ https://issues.apache.org/jira/browse/BEAM-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry resolved BEAM-753. Resolution: Fixed Fix Version/s: Not applicable > Travis failure (cannot import name locked_file) > --- > > Key: BEAM-753 > URL: https://issues.apache.org/jira/browse/BEAM-753 > Project: Beam > Issue Type: Bug > Components: sdk-py >Reporter: Ahmet Altay >Assignee: Ahmet Altay > Fix For: Not applicable > > > ERROR: Failure: ImportError (cannot import name locked_file) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/loader.py", > line 418, in loadTestsFromName > addr.filename, addr.module) > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/importer.py", > line 47, in importFromPath > return self.importFromDir(dir_path, fqname) > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/importer.py", > line 94, in importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/__init__.py", > line 78, in > from apache_beam import io > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/__init__.py", > line 21, in > from apache_beam.io.avroio import * > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/avroio.py", > line 29, in > from apache_beam.io import filebasedsource > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/filebasedsource.py", > line 31, in > from apache_beam.io import concat_source > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/concat_source.py", > line 24, in > from apache_beam.io import iobase > File > 
"/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/iobase.py", > line 818, in > from apache_beam.runners.dataflow.native_io.iobase import * > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/runners/__init__.py", > line 23, in > from apache_beam.runners.dataflow_runner import DataflowPipelineRunner > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/runners/dataflow_runner.py", > line 43, in > from apache_beam.internal.clients import dataflow as dataflow_api > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/internal/clients/dataflow/__init__.py", > line 23, in > from apitools.base.py import * > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/.tox/py27/local/lib/python2.7/site-packages/apitools/base/py/__init__.py", > line 22, in > from apitools.base.py.credentials_lib import * > File > "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/.tox/py27/local/lib/python2.7/site-packages/apitools/base/py/credentials_lib.py", > line 50, in > from oauth2client import locked_file > ImportError: cannot import name locked_file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-751) infrastructure for extracting code snippets into documentation
[ https://issues.apache.org/jira/browse/BEAM-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-751: --- Issue Type: Improvement (was: Bug) > infrastructure for extracting code snippets into documentation > -- > > Key: BEAM-751 > URL: https://issues.apache.org/jira/browse/BEAM-751 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Frances Perry >Assignee: James Malone > Labels: starter > > As we fill in more and more documentation, the number of code snippets is > going to drastically increase, and we should ensure the quality of those > snippets by automatically extracting them from code that is regularly > compiled and tested. > Goals: > * automatically extract code snippets from incubator-beam for use in the beam > website documentation > * use stable references so folks editing the code can clearly tell what > documentation changes this will result in (good: specially formatted comment, > bad: line number) > * freshness (is live possible? or at least during the general 'jekyll build' > phase?) > The best we've found so far is using jekyll-gist with gist-it, but that would > rely on fragile line numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
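The "specially formatted comment" idea in BEAM-751 could look roughly like the sketch below. It is not the tooling the project adopted. The `[START name]` / `[END name]` marker syntax and the function name `extract_snippets` are assumptions made for illustration.

```python
import re

# Hypothetical marker syntax: "[START name]" and "[END name]" inside a comment.
_START = re.compile(r"\[START\s+(\w+)\]")
_END = re.compile(r"\[END\s+(\w+)\]")


def extract_snippets(source):
    """Map each snippet name to the text between its START and END markers."""
    snippets = {}
    open_names = {}  # snippet name -> lines collected so far
    for line in source.splitlines():
        start = _START.search(line)
        end = _END.search(line)
        if start:
            open_names[start.group(1)] = []
        elif end:
            name = end.group(1)
            if name not in open_names:
                raise ValueError("END without START: " + name)
            snippets[name] = "\n".join(open_names.pop(name))
        else:
            # Marker lines themselves are never included in a snippet.
            for lines in open_names.values():
                lines.append(line)
    if open_names:
        raise ValueError("unclosed snippets: " + ", ".join(sorted(open_names)))
    return snippets
```

Because snippets are addressed by name rather than by line number, someone editing the tested source can see which documentation snippet a change affects, and inserting lines elsewhere in the file does not shift the reference.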
[jira] [Updated] (BEAM-749) Syntax highlight on website
[ https://issues.apache.org/jira/browse/BEAM-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-749: --- Issue Type: Improvement (was: Bug) > Syntax highlight on website > --- > > Key: BEAM-749 > URL: https://issues.apache.org/jira/browse/BEAM-749 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Frances Perry >Assignee: James Malone > > We should be able to enable Rouge on the website in order to get syntax > highlighting in the programming guide, walkthroughs, etc. > https://jekyllrb.com/docs/templates/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-752) infrastructure for toggling code snippets in documentation
Frances Perry created BEAM-752: -- Summary: infrastructure for toggling code snippets in documentation Key: BEAM-752 URL: https://issues.apache.org/jira/browse/BEAM-752 Project: Beam Issue Type: Improvement Components: website Reporter: Frances Perry Assignee: James Malone Once the python sdk gets merged to the master branch, a lot of our documentation (programming guide, walkthroughs, etc) will need to support multiple languages. The hope is that the vast bulk of the prose can be written about Beam concepts in a language independent way. But for code snippets it would be great to be able to toggle languages. Goals: * Support tabbed language toggles for both code and small sections of text. * Support easily changing the default per-user-visit so that the entire file (or even better entire site) defaults to showing a specific language -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-751) infrastructure for extracting code snippets into documentation
Frances Perry created BEAM-751: -- Summary: infrastructure for extracting code snippets into documentation Key: BEAM-751 URL: https://issues.apache.org/jira/browse/BEAM-751 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: James Malone As we fill in more and more documentation, the number of code snippets is going to drastically increase, and we should ensure the quality of those snippets by automatically extracting them from code that is regularly compiled and tested. Goals: * automatically extract code snippets from incubator-beam for use in the beam website documentation * use stable references so folks editing the code can clearly tell what documentation changes this will result in (good: specially formatted comment, bad: line number) * freshness (is live possible? or at least during the general 'jekyll build' phase?) The best we've found so far is using jekyll-gist with gist-it, but that would rely on fragile line numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-749) Syntax highlight on website
Frances Perry created BEAM-749: -- Summary: Syntax highlight on website Key: BEAM-749 URL: https://issues.apache.org/jira/browse/BEAM-749 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: James Malone We should be able to enable Rouge on the website in order to get syntax highlighting in the programming guide, walkthroughs, etc. https://jekyllrb.com/docs/templates/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-728) Javadoc should clearly separate facts from runner requirements
Frances Perry created BEAM-728: -- Summary: Javadoc should clearly separate facts from runner requirements Key: BEAM-728 URL: https://issues.apache.org/jira/browse/BEAM-728 Project: Beam Issue Type: Bug Components: sdk-java-core Reporter: Frances Perry Assignee: Davor Bonaci The javadoc for View.asMap() says the map needs to fit in memory. That's not true in all runners. (For example, Dataflow has distributed map support.) https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/View.java This is likely just one specific case of a more general issue -- different runners will have common constraints on the scalability of portions of the model. Currently these are documented in the capability matrix on the website, but for usability we should consider surfacing these constraints on particularly relevant methods. But keeping things in sync in multiple locations is hard... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types
[ https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553995#comment-15553995 ] Frances Perry commented on BEAM-570: Assigning to Konstantinos to follow up after #1053 is in. > Update AvroSource to support more compression types > --- > > Key: BEAM-570 > URL: https://issues.apache.org/jira/browse/BEAM-570 > Project: Beam > Issue Type: Improvement > Components: sdk-py >Reporter: Chamikara Jayalath >Assignee: Konstantinos Katsiapis > > Python AvroSource [1] currently supports only 'deflate' compression. We should > update it to support other compression types supported by the Avro library > (e.g.: snappy, bzip2). > [1] > https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-570) Update AvroSource to support more compression types
[ https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-570: --- Assignee: Konstantinos Katsiapis > Update AvroSource to support more compression types > --- > > Key: BEAM-570 > URL: https://issues.apache.org/jira/browse/BEAM-570 > Project: Beam > Issue Type: Improvement > Components: sdk-py >Reporter: Chamikara Jayalath >Assignee: Konstantinos Katsiapis > > Python AvroSource [1] currently supports only 'deflate' compression. We should > update it to support other compression types supported by the Avro library > (e.g.: snappy, bzip2). > [1] > https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py -- This message was sent by Atlassian JIRA (v6.3.4#6332)
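One way to support more than a single codec, as BEAM-570 requests, is a lookup table from codec name to decompression function. The sketch below is not the code in avroio.py. The names `DECOMPRESSORS` and `decompress_block` are hypothetical, and it covers only codecs available in the Python standard library. Snappy is left out because it needs a third-party package.

```python
import bz2
import zlib


def _inflate_raw(data):
    # Avro's 'deflate' codec stores raw DEFLATE data (RFC 1951), so the
    # negative wbits value tells zlib not to expect a zlib header.
    return zlib.decompress(data, -15)


# Codec name -> function that decompresses the bytes of one data block.
DECOMPRESSORS = {
    "null": lambda data: data,
    "deflate": _inflate_raw,
    "bzip2": bz2.decompress,
}


def decompress_block(codec, data):
    """Decompress `data` using the named codec, or raise ValueError."""
    try:
        decompress = DECOMPRESSORS[codec]
    except KeyError:
        raise ValueError("unsupported codec: %r" % codec)
    return decompress(data)
```

With this shape, adding a codec means adding one table entry, and an unknown codec name fails with an explicit error instead of being read as 'deflate'.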
[jira] [Created] (BEAM-602) make feature branches more discoverable
Frances Perry created BEAM-602: -- Summary: make feature branches more discoverable Key: BEAM-602 URL: https://issues.apache.org/jira/browse/BEAM-602 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: Frances Perry We have great things happening on feature branches, but they are a bit hidden. - update the contribution guide to add instructions for working on branches - add a page under contribute/ that lists the feature branches, links to their JIRAs, etc. - add a quick link from pages in use/ and learn/ to help make this discoverable for adventurous users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-590) Port examples web docs from Dataflow to Beam website.
[ https://issues.apache.org/jira/browse/BEAM-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437677#comment-15437677 ] Frances Perry commented on BEAM-590: Is this a dup of BEAM-194? > Port examples web docs from Dataflow to Beam website. > - > > Key: BEAM-590 > URL: https://issues.apache.org/jira/browse/BEAM-590 > Project: Beam > Issue Type: New Feature > Components: examples-java >Reporter: Pei He >Priority: Minor > > I am removing references to dataflow website in examples, such as: > https://cloud.google.com/dataflow/java-sdk/wordcount-example > Creating this issue to track web docs that we might want to port to Beam. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (BEAM-276) Add PCollections Section
[ https://issues.apache.org/jira/browse/BEAM-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry closed BEAM-276. -- Resolution: Fixed Fix Version/s: Not applicable Done by Devin: http://beam.incubator.apache.org/learn/programming-guide/#pcollection > Add PCollections Section > > > Key: BEAM-276 > URL: https://issues.apache.org/jira/browse/BEAM-276 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Devin Donnelly > Fix For: Not applicable > > > Add section with overview and usage of PCollection class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-277) Add Transforms Section
[ https://issues.apache.org/jira/browse/BEAM-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435328#comment-15435328 ] Frances Perry commented on BEAM-277: Partially completed: http://beam.incubator.apache.org/learn/programming-guide/#transforms > Add Transforms Section > -- > > Key: BEAM-277 > URL: https://issues.apache.org/jira/browse/BEAM-277 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Devin Donnelly > > Document general transforms usage and ParDo usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (BEAM-275) Add Pipelines Section
[ https://issues.apache.org/jira/browse/BEAM-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry closed BEAM-275. -- Resolution: Fixed Fix Version/s: Not applicable Completed by Devin: http://beam.incubator.apache.org/learn/programming-guide/#pipeline > Add Pipelines Section > - > > Key: BEAM-275 > URL: https://issues.apache.org/jira/browse/BEAM-275 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Devin Donnelly > Fix For: Not applicable > > > Document overview and usage of Pipeline object, including creation and > options assignment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (BEAM-274) Add Programming Guide Skeleton
[ https://issues.apache.org/jira/browse/BEAM-274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry closed BEAM-274. -- Resolution: Fixed Fix Version/s: Not applicable > Add Programming Guide Skeleton > -- > > Key: BEAM-274 > URL: https://issues.apache.org/jira/browse/BEAM-274 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Devin Donnelly > Fix For: Not applicable > > > Creating headings, front matter, and TOC for table of contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-274) Add Programming Guide Skeleton
[ https://issues.apache.org/jira/browse/BEAM-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435322#comment-15435322 ] Frances Perry commented on BEAM-274: Looks like this was already completed: http://beam.incubator.apache.org/learn/programming-guide/ Sorry for the miscommunication. I'll do a pass over Devin's issues and close the ones he finished. > Add Programming Guide Skeleton > -- > > Key: BEAM-274 > URL: https://issues.apache.org/jira/browse/BEAM-274 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Devin Donnelly > Fix For: Not applicable > > > Creating headings, front matter, and TOC for table of contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-566) Implement proposal process
Frances Perry created BEAM-566: -- Summary: Implement proposal process Key: BEAM-566 URL: https://issues.apache.org/jira/browse/BEAM-566 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: Frances Perry As discussed on the dev list... - Update contribution guide to explain what the design doc / proposal should include (like is done in https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals) - Clearly track the open proposals (potentially in JIRA with a known label and incrementing proposal IDs). - Set expectations around the timelines for proposals -- both to ensure enough feedback is gathered and perhaps inactive proposals are archived. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-555) Documentation in BigQueryIO.java has awkward cut-and-paste error.
[ https://issues.apache.org/jira/browse/BEAM-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-555: --- Assignee: Frank Yellin (was: Davor Bonaci) > Documentation in BigQueryIO.java has awkward cut-and-paste error. > - > > Key: BEAM-555 > URL: https://issues.apache.org/jira/browse/BEAM-555 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Frank Yellin >Assignee: Frank Yellin >Priority: Trivial > Original Estimate: 5m > Remaining Estimate: 5m > > Twice in the documentation, the sample code reads from > samples.weather_stations and calls the resulting TableRow "shakespeare". > I suspect that these lines of code were copied from a different example, and > then only partially modified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-556) typo in documentation
[ https://issues.apache.org/jira/browse/BEAM-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-556: --- Assignee: Frank Yellin (was: Frances Perry) > typo in documentation > - > > Key: BEAM-556 > URL: https://issues.apache.org/jira/browse/BEAM-556 > Project: Beam > Issue Type: Bug > Components: sdk-py >Reporter: Frank Yellin >Assignee: Frank Yellin >Priority: Trivial > Original Estimate: 2m > Remaining Estimate: 2m > > transform.py: > ergument -> argument > in documentation for parse_label_and_args -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-541) Add more documentation on Java DoFn Annotations
Frances Perry created BEAM-541: -- Summary: Add more documentation on Java DoFn Annotations Key: BEAM-541 URL: https://issues.apache.org/jira/browse/BEAM-541 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: James Malone Priority: Minor https://github.com/apache/incubator-beam-site/pull/36 made the basic documentation changes that correspond to BEAM-498, but we should add more details on how to use the advanced configurations for window access, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-515) Add feature logo and incubator logo
[ https://issues.apache.org/jira/browse/BEAM-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405326#comment-15405326 ] Frances Perry commented on BEAM-515: Submitted https://github.com/apache/incubator-beam-site/pull/33 > Add feature logo and incubator logo > --- > > Key: BEAM-515 > URL: https://issues.apache.org/jira/browse/BEAM-515 > Project: Beam > Issue Type: Bug > Components: website >Affects Versions: Not applicable >Reporter: Daniel Halperin >Assignee: Frances Perry >Priority: Critical > > Excerpt from: > http://mail-archives.apache.org/mod_mbox/incubator-general/201608.mbox/%3C7E0226B1-0386-499C-8473-61A8E51A691B%40classsoftware.com%3E > A feather ASF logo would be a nice addition as well. [4] > http://www.apache.org/foundation/press/kit/#links > While we're in there, I believe we still need to add the Apache Incubator egg > logo. http://incubator.apache.org/images/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (BEAM-514) Add all mandatory links
[ https://issues.apache.org/jira/browse/BEAM-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry reassigned BEAM-514: -- Assignee: Frances Perry (was: James Malone) > Add all mandatory links > --- > > Key: BEAM-514 > URL: https://issues.apache.org/jira/browse/BEAM-514 > Project: Beam > Issue Type: Bug > Components: website >Affects Versions: Not applicable >Reporter: Daniel Halperin >Assignee: Frances Perry > > Excerpt from: > http://mail-archives.apache.org/mod_mbox/incubator-general/201608.mbox/%3C7E0226B1-0386-499C-8473-61A8E51A691B%40classsoftware.com%3E > > Branding wise I think you are missing a few of the > required links [3] including a link back to the Apache homepage. > http://www.apache.org/foundation/marks/pmcs.html#navigation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (BEAM-515) Add feature logo and incubator logo
[ https://issues.apache.org/jira/browse/BEAM-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry reassigned BEAM-515: -- Assignee: Frances Perry (was: James Malone) > Add feature logo and incubator logo > --- > > Key: BEAM-515 > URL: https://issues.apache.org/jira/browse/BEAM-515 > Project: Beam > Issue Type: Bug > Components: website >Affects Versions: Not applicable >Reporter: Daniel Halperin >Assignee: Frances Perry >Priority: Critical > > Excerpt from: > http://mail-archives.apache.org/mod_mbox/incubator-general/201608.mbox/%3C7E0226B1-0386-499C-8473-61A8E51A691B%40classsoftware.com%3E > A feather ASF logo would be a nice addition as well. [4] > http://www.apache.org/foundation/press/kit/#links > While we're in there, I believe we still need to add the Apache Incubator egg > logo. http://incubator.apache.org/images/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-516) Update navigation for Javadoc
[ https://issues.apache.org/jira/browse/BEAM-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404687#comment-15404687 ] Frances Perry commented on BEAM-516: Added a basic link. Didn't do anything fancy yet with a latest link -- so leaving this bug to track that. > Update navigation for Javadoc > -- > > Key: BEAM-516 > URL: https://issues.apache.org/jira/browse/BEAM-516 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Ismaël Mejía >Assignee: Frances Perry >Priority: Minor > Attachments: screenshot.png > > > The link to the latest version of the Java documentation disappeared with the > recent changes to the website. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-516) Update navigation for Javadoc
[ https://issues.apache.org/jira/browse/BEAM-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-516: --- Assignee: James Malone (was: Frances Perry) > Update navigation for Javadoc > -- > > Key: BEAM-516 > URL: https://issues.apache.org/jira/browse/BEAM-516 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Ismaël Mejía >Assignee: James Malone >Priority: Minor > Attachments: screenshot.png > > > The link to the latest version of the Java documentation disappeared with the > recent changes to the website. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (BEAM-500) Update website layout
[ https://issues.apache.org/jira/browse/BEAM-500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry closed BEAM-500. -- Resolution: Fixed Fix Version/s: Not applicable > Update website layout > - > > Key: BEAM-500 > URL: https://issues.apache.org/jira/browse/BEAM-500 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Frances Perry >Assignee: Frances Perry > Fix For: Not applicable > > > As discussed on dev@, update the website layout to use this: > https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-500) Update website layout
[ https://issues.apache.org/jira/browse/BEAM-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404682#comment-15404682 ] Frances Perry commented on BEAM-500: Filed bugs to fill in remaining missing content. Closing this root issue. > Update website layout > - > > Key: BEAM-500 > URL: https://issues.apache.org/jira/browse/BEAM-500 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Frances Perry >Assignee: Frances Perry > > As discussed on dev@, update the website layout to use this: > https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-516) The Javadoc link disappeared in the website refactoring
[ https://issues.apache.org/jira/browse/BEAM-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404075#comment-15404075 ] Frances Perry commented on BEAM-516: [~jbonofre] Wow, that looks awesome ;-) [~iemejia] Thanks for the report! I'll assign this to myself, since I just created a Java SDK subdirectory as part of BEAM-500. In the meantime, the workaround is to go to the URL directly: http://beam.incubator.apache.org/javadoc/0.1.0-incubating/ > The Javadoc link disappeared in the website refactoring > --- > > Key: BEAM-516 > URL: https://issues.apache.org/jira/browse/BEAM-516 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Ismaël Mejía >Assignee: James Malone >Priority: Minor > Attachments: screenshot.png > > > The link to the latest version of the Java documentation disappeared with the > recent changes to the website. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-512) Fill in the contribute/testing section of the website
Frances Perry created BEAM-512: -- Summary: Fill in the contribute/testing section of the website Key: BEAM-512 URL: https://issues.apache.org/jira/browse/BEAM-512 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: James Malone As per https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-511) Fill in the contribute/technical-vision section of the website
Frances Perry created BEAM-511: -- Summary: Fill in the contribute/technical-vision section of the website Key: BEAM-511 URL: https://issues.apache.org/jira/browse/BEAM-511 Project: Beam Issue Type: Bug Reporter: Frances Perry As per https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-509) Fill in the learn/resources portion of the website
Frances Perry created BEAM-509: -- Summary: Fill in the learn/resources portion of the website Key: BEAM-509 URL: https://issues.apache.org/jira/browse/BEAM-509 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: James Malone As per https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit Do a nicer curation of great Beam articles, videos, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-508) Fill in the learn/runners/dataflow portion of the website
Frances Perry created BEAM-508: -- Summary: Fill in the learn/runners/dataflow portion of the website Key: BEAM-508 URL: https://issues.apache.org/jira/browse/BEAM-508 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: James Malone As per https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit. Should be a landing page for Dataflow-runner-specific content -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-507) Fill in the learn/runners/spark portion of the website
Frances Perry created BEAM-507: -- Summary: Fill in the learn/runners/spark portion of the website Key: BEAM-507 URL: https://issues.apache.org/jira/browse/BEAM-507 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: James Malone As per https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit. Should be a landing page for Spark-specific information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-506) Fill in the learn/runners/flink portion of the website
Frances Perry created BEAM-506: -- Summary: Fill in the learn/runners/flink portion of the website Key: BEAM-506 URL: https://issues.apache.org/jira/browse/BEAM-506 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: James Malone As per https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit. Should be a landing page for Flink-specific details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-505) Fill in the learn/runners/direct portion of the website
Frances Perry created BEAM-505: -- Summary: Fill in the learn/runners/direct portion of the website Key: BEAM-505 URL: https://issues.apache.org/jira/browse/BEAM-505 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: James Malone As per https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit. Should be a landing page for the Direct runner -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-504) Fill in the learn/sdks/java portion of the website
[ https://issues.apache.org/jira/browse/BEAM-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-504: --- Summary: Fill in the learn/sdks/java portion of the website (was: Fill in use/sdks/java portion of the website) > Fill in the learn/sdks/java portion of the website > -- > > Key: BEAM-504 > URL: https://issues.apache.org/jira/browse/BEAM-504 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Frances Perry >Assignee: James Malone > > As per > https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit. > Should be a landing page for Java-SDK-specific content like existing IO > connectors, javadoc, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-504) Fill in use/sdks/java portion of the website
Frances Perry created BEAM-504: -- Summary: Fill in use/sdks/java portion of the website Key: BEAM-504 URL: https://issues.apache.org/jira/browse/BEAM-504 Project: Beam Issue Type: Bug Components: website Reporter: Frances Perry Assignee: James Malone As per https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit. Should be a landing page for Java-SDK-specific content like existing IO connectors, javadoc, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-500) Update website layout
[ https://issues.apache.org/jira/browse/BEAM-500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-500: --- Summary: Update website layout (was: Update website layou) > Update website layout > - > > Key: BEAM-500 > URL: https://issues.apache.org/jira/browse/BEAM-500 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Frances Perry >Assignee: Frances Perry > > As discussed on dev@, update the website layout to use this: > https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-500) Update website layout
[ https://issues.apache.org/jira/browse/BEAM-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402601#comment-15402601 ] Frances Perry commented on BEAM-500: (To be clear, this is just the page/navigation structure. The skin / main page is covered in BEAM-501.) > Update website layout > - > > Key: BEAM-500 > URL: https://issues.apache.org/jira/browse/BEAM-500 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Frances Perry >Assignee: Frances Perry > > As discussed on dev@, update the website layout to use this: > https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-500) Update website layout
[ https://issues.apache.org/jira/browse/BEAM-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402598#comment-15402598 ] Frances Perry commented on BEAM-500: Devin started this process in https://github.com/apache/incubator-beam-site/pull/25 I'll do the next round. > Update website layout > - > > Key: BEAM-500 > URL: https://issues.apache.org/jira/browse/BEAM-500 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Frances Perry >Assignee: Frances Perry > > As discussed on dev@, update the website layout to use this: > https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-501) Update website skin
Frances Perry created BEAM-501: -- Summary: Update website skin Key: BEAM-501 URL: https://issues.apache.org/jira/browse/BEAM-501 Project: Beam Issue Type: Improvement Components: website Reporter: Frances Perry Assignee: Jean-Baptiste Onofré Update the main landing page and website skin as discussed here https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-500) Update website layou
Frances Perry created BEAM-500: -- Summary: Update website layou Key: BEAM-500 URL: https://issues.apache.org/jira/browse/BEAM-500 Project: Beam Issue Type: Improvement Components: website Reporter: Frances Perry Assignee: Frances Perry As discussed on dev@, update the website layout to use this: https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-434) When examples write output to file it creates many output files instead of one
[ https://issues.apache.org/jira/browse/BEAM-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373374#comment-15373374 ] Frances Perry commented on BEAM-434: Leaving the sharding unconstrained, so that the runner can choose a bundling that performs well, is pretty key to the model. So I think it's pretty important to introduce users to this idea in the examples. The direct runner should be careful to create a small (but variable) number of files to show that the default is *not* one or a fixed number. I'd prefer we fix this in a way that is *not* specific to TextIO.Write -- the same thing will happen in many other places. Can we wait for Thomas to return from vacation tomorrow and get his opinion? > When examples write output to file it creates many output files instead of one > -- > > Key: BEAM-434 > URL: https://issues.apache.org/jira/browse/BEAM-434 > Project: Beam > Issue Type: Bug > Components: examples-java >Reporter: Amit Sela >Assignee: Amit Sela >Priority: Minor > > When using `TextIO.Write.to("/path/to/output")` without any restrictions on > the number of shards, it might generate many output files (depending on your > input). For WordCount, for example, you'll get as many output files as there are unique > words in your input. > Since I think examples are expected to execute in a friendly manner, to let users "see" > what they do rather than to optimize for performance, I suggest using > `withoutSharding()` when writing the example output to an output file. > Examples I could find that behave this way: > org.apache.beam.examples.WordCount > org.apache.beam.examples.complete.TfIdf > org.apache.beam.examples.cookbook.DeDupExample -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-320) Provide Beam keyturn binary distributions embedding runners and execution runtime
[ https://issues.apache.org/jira/browse/BEAM-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311053#comment-15311053 ] Frances Perry commented on BEAM-320: Ready to use distributions for common usage patterns sounds like a good idea -- it will make things much easier for users. For the Dataflow Runner, I think Google may prefer to provide a Google-built binary distribution based on Beam instead of providing this convenience as part of Beam, because for Google Cloud Platform customers, we may want to package in a few additional libraries for interacting with other Google Cloud Platform services. It doesn't sound right to complicate Beam with those dependencies. But I can definitely see there are some that would make sense as part of Beam. And in any case, we should make all of these distributions easy to find via documentation on the Beam site. (Also keep in mind, that there will be multiple SDKs, so we likely want to name things to include both the runner and the sdk -- beam-java-spark, etc.) > Provide Beam keyturn binary distributions embedding runners and execution > runtime > - > > Key: BEAM-320 > URL: https://issues.apache.org/jira/browse/BEAM-320 > Project: Beam > Issue Type: Wish > Components: build-system >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré > > Now, the only distribution Beam provides is the source distribution. > For new users, it could be interesting to have ready-to-use binary > distribution embedding the SDK, a specific runner with the backend execution > runtime. > For instance, we could provide: > - beam-spark-xxx.tar.gz containing SDK, Spark runner, Spark > - beam-flink-xxx.tar.gz containing SDK, Flink runner, Flink > Thoughts ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-262) Native Runners | Direct Compiler
[ https://issues.apache.org/jira/browse/BEAM-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276724#comment-15276724 ] Frances Perry commented on BEAM-262: I'm not sure why you think using Flink or Spark for execution is overkill for what Beam does? Creating a backend that can handle all Beam pipelines at scale is a huge undertaking! I agree with Davor that building backends is generally beyond the scope of Beam currently. Right now we're looking at creating the best programming model for writing data processing pipelines that generalizes functionality of a number of current distributed processing backends. Each backend has its own strengths in terms of what use cases it handles well, and users can choose which one fits their needs. Having a single backend that automatically does the best thing would be great, but I don't think it's feasible yet. > Native Runners | Direct Compiler > - > > Key: BEAM-262 > URL: https://issues.apache.org/jira/browse/BEAM-262 > Project: Beam > Issue Type: Improvement > Components: runner-ideas >Reporter: Suminda Dharmasena >Assignee: Davor Bonaci > > Having to depend on other frameworks to do the heavy lifting means that the > quirks, limitations, and overhead of the other platform limit what can be > achieved. Hence, is it possible to have Beam directly generate code for LLVM, > JVM and .Net platforms without dependence on any other platform? > Also, perhaps code could be generated in high-level > languages like C/C++, Java, C#, F#, Rust, Julia, D, Nim, etc., rather than directly as native code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-193) Port existing Dataflow SDK documentation to Beam Programming Guide
[ https://issues.apache.org/jira/browse/BEAM-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-193: --- Assignee: Devin Donnelly (was: James Malone) > Port existing Dataflow SDK documentation to Beam Programming Guide > -- > > Key: BEAM-193 > URL: https://issues.apache.org/jira/browse/BEAM-193 > Project: Beam > Issue Type: Task > Components: website >Reporter: Devin Donnelly >Assignee: Devin Donnelly > > There is an extensive amount of documentation on the Dataflow SDK programming > model and classes. Port this documentation over as a new Beam Programming > Guide covering the following major topics: > - Programming model overview > - Pipeline structure > - PCollections > - Transforms > - I/O -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-192) Create new landing page for Apache Beam Documentation
[ https://issues.apache.org/jira/browse/BEAM-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-192: --- Assignee: Devin Donnelly (was: James Malone) > Create new landing page for Apache Beam Documentation > - > > Key: BEAM-192 > URL: https://issues.apache.org/jira/browse/BEAM-192 > Project: Beam > Issue Type: Task > Components: website >Reporter: Devin Donnelly >Assignee: Devin Donnelly > > Revise the current stopgap Apache Beam landing page. > - Explain the benefits of the Beam programming model > - Disclose the status of the various Beam SDKs and runners > - Provide an easy place to access release notes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-194) Create a walkthrough of Beam examples in mobile gaming domain
[ https://issues.apache.org/jira/browse/BEAM-194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-194: --- Assignee: Devin Donnelly (was: James Malone) > Create a walkthrough of Beam examples in mobile gaming domain > - > > Key: BEAM-194 > URL: https://issues.apache.org/jira/browse/BEAM-194 > Project: Beam > Issue Type: Task > Components: website >Reporter: Devin Donnelly >Assignee: Devin Donnelly > > The Beam SDKs provide a series of example pipelines in the mobile gaming > domain. The Dataflow documentation contains a detailed walkthrough of these > examples, explaining the use case, pipeline design, and some of the code. > Port these examples to the Beam website for Beam users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-138) Extend TextIO to new protocols (and maybe rename to FileIO)
[ https://issues.apache.org/jira/browse/BEAM-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204593#comment-15204593 ] Frances Perry commented on BEAM-138: TextIO is really about a specific file format -- it requires newline-delimited records. It'd be great to increase the number of things it can read those from though. [~dhalp...@google.com] You probably know the status of generalizing the file system? > Extend TextIO to new protocols (and maybe rename to FileIO) > --- > > Key: BEAM-138 > URL: https://issues.apache.org/jira/browse/BEAM-138 > Project: Beam > Issue Type: Improvement > Components: sdk-java-extensions >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré > > The current TextIO supports: > - local files, when using a path directly, like /path/to... > - Google Service files, using a path like gs:... > On the other hand, we have a contribution (from Tom) to support HDFS. > From a user perspective, it would be easier to use a single IO supporting > different protocols: > - file: > - gs: > - hdfs: > - mvn: > - ... > It would also be convenient to be able to combine protocols and eventually > use a different coder (for instance xml:hdfs:). > In that case, maybe it would make sense to rename TextIO to a generic FileIO. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
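The scheme-based dispatch proposed in BEAM-138 might look roughly like the sketch below. It is plain Java with no Beam dependency, and every name in it (`SchemeDispatchSketch`, `schemeOf`, `backendFor`, the backend labels) is hypothetical and chosen only for illustration; it assumes that a bare path such as /path/to... means a local file and that only the first scheme in a path is examined, so combined forms like xml:hdfs: are not handled.

```java
import java.util.Map;

// Illustrative only: not part of the Beam SDK.
class SchemeDispatchSketch {
    // Hypothetical mapping from a path scheme to a human-readable backend label.
    static final Map<String, String> BACKENDS = Map.of(
            "file", "local file system",
            "gs", "Google storage",
            "hdfs", "HDFS");

    // Returns the scheme of a path; a path without a scheme is assumed to be local.
    static String schemeOf(String path) {
        int colon = path.indexOf(':');
        if (path.startsWith("/") || colon <= 0) {
            return "file";
        }
        return path.substring(0, colon);
    }

    // Looks up the backend for a path and rejects schemes this sketch does not know.
    static String backendFor(String path) {
        String scheme = schemeOf(path);
        String backend = BACKENDS.get(scheme);
        if (backend == null) {
            throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
        return backend;
    }

    public static void main(String[] args) {
        for (String path : new String[] {"/path/to/input.txt", "gs://bucket/input.txt", "hdfs://namenode/input.txt"}) {
            System.out.println(path + " -> " + backendFor(path));
        }
    }
}
```

Keeping the lookup in a table means that supporting a new protocol is a matter of registering one more entry, which is the convenience the proposal is after; the record format (newline-delimited text) stays a separate concern.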
[jira] [Commented] (BEAM-137) Add implicit conf/pipeline-default.conf options file
[ https://issues.apache.org/jira/browse/BEAM-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204607#comment-15204607 ] Frances Perry commented on BEAM-137: [~lcwik] Do you have plans for generalizing PipelineOptions in a multi-runner world? How would that affect this? > Add implicit conf/pipeline-default.conf options file > > > Key: BEAM-137 > URL: https://issues.apache.org/jira/browse/BEAM-137 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Jean-Baptiste Onofré >Assignee: Davor Bonaci > > Right now, most users provide the pipeline options via the main arguments. > For instance, it's the classic way to provide pipeline input, etc. > For convenience, it would be great if the pipeline looked for options in > conf/[pipeline_name]-default.conf by default and let the > main arguments override them. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
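The precedence proposed in BEAM-137 (defaults from a conf file, overridden by the main arguments) can be sketched in plain Java as follows. This is not the Beam PipelineOptions API. The class and method names are hypothetical, the defaults are assumed to use the Java properties format (the issue does not specify one), and the arguments are assumed to look like `--name=value`. To keep the sketch self-contained, the defaults are passed in as text instead of being read from conf/[pipeline_name]-default.conf.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Illustrative only: not part of the Beam SDK.
class DefaultOptionsSketch {
    // Merges default options (properties-format text, an assumption) with
    // command-line arguments of the form --name=value; arguments win.
    static Map<String, String> resolve(String defaultsText, String[] args) {
        Properties defaults = new Properties();
        try {
            defaults.load(new StringReader(defaultsText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> options = new TreeMap<>();
        for (String name : defaults.stringPropertyNames()) {
            options.put(name, defaults.getProperty(name));
        }
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("--") && eq > 2) {
                // Later writes replace the default with the same name.
                options.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return options;
    }

    public static void main(String[] args) {
        String defaults = "input=/data/in.txt\nrunner=direct\n";
        System.out.println(resolve(defaults, new String[] {"--runner=spark"}));
    }
}
```

Loading the defaults first and applying the arguments second is what gives the arguments priority; an option that appears only in the defaults is kept unchanged.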
[jira] [Commented] (BEAM-91) Retractions
[ https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177206#comment-15177206 ] Frances Perry commented on BEAM-91: --- Did you mean "backsies"? ;-) > Retractions > --- > > Key: BEAM-91 > URL: https://issues.apache.org/jira/browse/BEAM-91 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Tyler Akidau >Assignee: Frances Perry > Original Estimate: 672h > Remaining Estimate: 672h > > We still haven't added retractions to Beam, even though they're a core part > of the model. We should document all the necessary aspects (uncombine, > reverting DoFn output with DoOvers, sink integration, source-level > retractions, etc), and then implement them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-79) Gearpump runner
[ https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173229#comment-15173229 ] Frances Perry commented on BEAM-79: --- Happy to assign it to you, as you will clearly be the expert on Gearpump ;-) But please note that Beam is still very much under construction and there are a number of breaking API changes likely in the near future. So please reach out before getting beyond the early design phase / determining how well the models align. If you haven't yet, I'd start with these resources: http://beam.incubator.apache.org/getting_started/ > Gearpump runner > --- > > Key: BEAM-79 > URL: https://issues.apache.org/jira/browse/BEAM-79 > Project: Beam > Issue Type: New Feature > Components: runner-ideas >Reporter: Tyler Akidau >Assignee: Manu Zhang > > Intel is submitting Gearpump (http://www.gearpump.io) to ASF > (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of > low-level primitives a la MillWheel, with some higher level primitives like > non-merging windowing mixed in. Seems like it would make a nice Beam runner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-79) Gearpump runner
[ https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-79: -- Assignee: Manu Zhang (was: James Malone) > Gearpump runner > --- > > Key: BEAM-79 > URL: https://issues.apache.org/jira/browse/BEAM-79 > Project: Beam > Issue Type: New Feature > Components: runner-ideas >Reporter: Tyler Akidau >Assignee: Manu Zhang > > Intel is submitting Gearpump (http://www.gearpump.io) to ASF > (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of > low-level primitives a la MillWheel, with some higher level primitives like > non-merging windowing mixed in. Seems like it would make a nice Beam runner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-79) Gearpump runner
[ https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-79: -- Assignee: James Malone > Gearpump runner > --- > > Key: BEAM-79 > URL: https://issues.apache.org/jira/browse/BEAM-79 > Project: Beam > Issue Type: New Feature > Components: runner-ideas >Reporter: Tyler Akidau >Assignee: James Malone > > Intel is submitting Gearpump (http://www.gearpump.io) to ASF > (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of > low-level primitives a la MillWheel, with some higher level primitives like > non-merging windowing mixed in. Seems like it would make a nice Beam runner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-77) Reorganize Directory structure
Frances Perry created BEAM-77: - Summary: Reorganize Directory structure Key: BEAM-77 URL: https://issues.apache.org/jira/browse/BEAM-77 Project: Beam Issue Type: Task Components: project-management Reporter: Frances Perry Assignee: Frances Perry Now that we've done the initial Dataflow code drop, we will restructure directories to provide space for additional SDKs and Runners. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-14) Add data integration DSL
[ https://issues.apache.org/jira/browse/BEAM-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148117#comment-15148117 ] Frances Perry commented on BEAM-14: --- I think there are a few rough concepts in here that may need model extensions, but in general this seems to be about supporting a different DSL on top of the existing model. > Add data integration DSL > > > Key: BEAM-14 > URL: https://issues.apache.org/jira/browse/BEAM-14 > Project: Beam > Issue Type: New Feature > Components: sdk-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré > > Even though users would still be able to use the API directly, it would be great > to provide a DSL on top of the API covering batch and streaming data > processing but also data integration. > Instead of designing a pipeline as a chain of apply() calls wrapping functions > (DoFn), we can provide a fluent DSL allowing users to directly leverage > turnkey functions. > For instance, a user would be able to design a pipeline like: > {code} > .from("kafka:localhost:9092?topic=foo").reduce(...).split(...).wiretap(...).map(...).to("jms:queue:foo…."); > {code} > The DSL will allow the use of existing pipelines, for instance: > {code} > .from("cxf:...").reduce().pipeline("other").map().to("kafka:localhost:9092?topic=foo=all") > {code} > So it means that we will have to create an IO Sink that can trigger the > execution of a target pipeline: (from("trigger:other") triggering the > pipeline execution when another pipeline design starts with > pipeline("other")). We can also imagine mixing the runners: the pipeline() > can be on one runner, the from("trigger:other") can be on another runner. > It's not trivial, but it will give strong flexibility and key value for Beam. > In a second step, we can provide DSLs in different languages (the first one > would be Java, but why not provide XML, Akka, Scala DSLs). 
> We can note in previous examples that the DSL would also provide data > integration support to Beam in addition to data processing. Data Integration > is an extension of the Beam API to support some Enterprise Integration Patterns > (EIPs). As we would need metadata for data integration (even if metadata can > also be interesting in stream/batch data processing pipelines), we can provide > a DataxMessage built on top of PCollection. A DataxMessage would contain: > structured headers > binary payload > For instance, the headers can contain an Avro schema to describe the payload. > The headers can also contain useful information coming from the IO Source > (for instance the partition/path where the data comes from, …). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
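As a rough illustration of the DataxMessage idea from BEAM-14 (structured headers plus a binary payload), a value type along the following lines would do. The issue does not define the class, so everything here is an assumption: the class name, the choice of string-to-string headers, and the header keys `source.path` and `avro.schema` are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hypothetical shape for the proposed DataxMessage.
final class DataxMessageSketch {
    private final Map<String, String> headers; // structured headers (assumed string-valued)
    private final byte[] payload;              // opaque binary payload

    public DataxMessageSketch(Map<String, String> headers, byte[] payload) {
        // Defensive copies keep the message immutable after construction.
        this.headers = Collections.unmodifiableMap(new LinkedHashMap<>(headers));
        this.payload = payload.clone();
    }

    public Map<String, String> headers() {
        return headers;
    }

    public byte[] payload() {
        return payload.clone();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("source.path", "/data/part-0");   // hypothetical header: where the data came from
        headers.put("avro.schema", "{\"type\":\"string\"}"); // hypothetical header: schema describing the payload
        DataxMessageSketch message =
                new DataxMessageSketch(headers, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(message.headers().keySet() + " / " + message.payload().length + " bytes");
    }
}
```

The message is immutable by construction, which sits comfortably with the Beam model's expectation that elements are not mutated once they are in a PCollection.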
[jira] [Updated] (BEAM-12) Apply GroupByKey transforms on PCollection of normal type other than KV
[ https://issues.apache.org/jira/browse/BEAM-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-12: -- Assignee: Frances Perry Priority: Trivial (was: Major) Component/s: sdk-java-core If you need to do something to the elements to extract the key before grouping, you can use a ParDo (or a derivative like MapElements). So something like: input.apply(ParDo.of(new ExtractFn())) .apply(GroupByKey.create()); I'm not sure what you meant by automatically extracting keys from data -- that sounds like something that would be application- or domain-specific. As always, if you find yourself using a pattern often in your applications, you can create your own composite PTransform to do it more compactly. > Apply GroupByKey transforms on PCollection of normal type other than KV > --- > > Key: BEAM-12 > URL: https://issues.apache.org/jira/browse/BEAM-12 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: bakeypan >Assignee: Frances Perry >Priority: Trivial > > Currently the GroupByKey transform can only be applied to a PCollection<KV<K, V>>. So I > have to transform a PCollection<V> into a PCollection<KV<K, V>> before I can > apply GroupByKey. > I think we can do better by applying GroupByKey to a PCollection of an ordinary type > other than KV. The user could supply a custom key-extraction function, or we could > offer a default one. Like this: > PCollection<V> input = ... > PCollection<KV<K, Iterable<V>>> result = input.apply(GroupByKey.<K, V>create(new ExtractFn())); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
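The pattern discussed in BEAM-12 (extract a key from each element, then group by it) can be illustrated without Beam using Java streams. This is a plain-Java analogue, not the Beam API: the key function plays the role of the ExtractFn applied in a ParDo, and `Collectors.groupingBy` stands in for GroupByKey. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: an in-memory analogue of "ParDo to extract a key, then GroupByKey".
class ExtractThenGroupSketch {
    // Groups the input elements by the key that extractKey computes for each one.
    // A TreeMap is used so that the keys come back in sorted order.
    static <K extends Comparable<K>, V> Map<K, List<V>> groupByExtractedKey(
            List<V> input, Function<V, K> extractKey) {
        return input.stream()
                .collect(Collectors.groupingBy(extractKey, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana");
        // Key each word by its first letter, then group.
        Map<Character, List<String>> grouped = groupByExtractedKey(words, w -> w.charAt(0));
        System.out.println(grouped);
    }
}
```

Wrapping the two steps in one helper mirrors the suggestion in the comment above: when the extract-then-group pattern recurs, it can be packaged once (in Beam, as a composite PTransform) instead of widening GroupByKey itself.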