[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/c66525cc
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/c66525cc
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/c66525cc

Branch: refs/heads/asf-site
Commit: c66525cc62c38bfd10b3a295ed97f036aa3b856a
Parents: 920a0be
Author: Ismaël Mejía
Authored: Tue Jun 27 11:57:06 2017 +0200
Committer: Ismaël Mejía
Committed: Tue Jun 27 11:57:06 2017 +0200

--
 .../documentation/io/built-in/hadoop/index.html | 42
 1 file changed, 42 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/c66525cc/content/documentation/io/built-in/hadoop/index.html
--
diff --git a/content/documentation/io/built-in/hadoop/index.html b/content/documentation/io/built-in/hadoop/index.html
index a18c9b9..ce66332 100644
--- a/content/documentation/io/built-in/hadoop/index.html
+++ b/content/documentation/io/built-in/hadoop/index.html
@@ -362,6 +362,48 @@
+Amazon DynamoDB - DynamoDBInputFormat
+
+To read data from Amazon DynamoDB, use org.apache.hadoop.dynamodb.read.DynamoDBInputFormat.
+DynamoDBInputFormat implements the older org.apache.hadoop.mapred.InputFormat interface. To make it compatible with HadoopInputFormatIO, which uses the newer abstract class org.apache.hadoop.mapreduce.InputFormat,
+a wrapper API is required that acts as an adapter between HadoopInputFormatIO and DynamoDBInputFormat (or, in general, any InputFormat implementing org.apache.hadoop.mapred.InputFormat).
+The example below uses one such available wrapper API: https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/MapReduceInputFormatWrapper.java
+
+Configuration dynamoDBConf = new Configuration();
+Job job = Job.getInstance(dynamoDBConf);
+com.twitter.elephantbird.mapreduce.input.MapReduceInputFormatWrapper.setInputFormat(org.apache.hadoop.dynamodb.read.DynamoDBInputFormat.class, job);
+dynamoDBConf = job.getConfiguration();
+dynamoDBConf.setClass("key.class", Text.class, WritableComparable.class);
+dynamoDBConf.setClass("value.class", org.apache.hadoop.dynamodb.DynamoDBItemWritable.class, Writable.class);
+dynamoDBConf.set("dynamodb.servicename", "dynamodb");
+dynamoDBConf.set("dynamodb.input.tableName", "table_name");
+dynamoDBConf.set("dynamodb.endpoint", "dynamodb.us-west-1.amazonaws.com");
+dynamoDBConf.set("dynamodb.regionid", "us-west-1");
+dynamoDBConf.set("dynamodb.throughput.read", "1");
+dynamoDBConf.set("dynamodb.throughput.read.percent", "1");
+dynamoDBConf.set("dynamodb.version", "2011-12-05");
+dynamoDBConf.set(DynamoDBConstants.DYNAMODB_ACCESS_KEY_CONF, "aws_access_key");
+dynamoDBConf.set(DynamoDBConstants.DYNAMODB_SECRET_KEY_CONF, "aws_secret_key");
+
+
+
+ # The Beam SDK for Python does not support Hadoop InputFormat IO.
+
+
+
+Call Read transform as follows:
+
+PCollection<KV<Text, DynamoDBItemWritable>> dynamoDBData =
+  p.apply("read",
+  HadoopInputFormatIO.<Text, DynamoDBItemWritable>read()
+  .withConfiguration(dynamoDBConf));
+
+
+
+ # The Beam SDK for Python does not support Hadoop InputFormat IO.
+
+
+
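The adapter idea described in the DynamoDB section can be illustrated independently of Hadoop. What follows is only a minimal sketch in Python (the Beam SDK for Python does not support Hadoop InputFormat IO, so this is conceptual). OldStyleInputFormat, NewStyleInputFormat, InputFormatAdapter, and read_all are hypothetical names invented for this illustration; they are not the elephant-bird, Hadoop, or Beam APIs.

```python
class OldStyleInputFormat:
    """Hypothetical stand-in for a reader written against an older interface."""

    def __init__(self, rows):
        self._rows = list(rows)

    def get_record_reader(self):
        # Old-style contract: return an iterator over (key, value) pairs.
        return iter(self._rows)


class NewStyleInputFormat:
    """Hypothetical stand-in for the newer abstract class a consumer expects."""

    def create_record_reader(self):
        raise NotImplementedError


class InputFormatAdapter(NewStyleInputFormat):
    """Wraps an old-style format so that it satisfies the new-style contract."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def create_record_reader(self):
        # Delegate to the wrapped object, translating the method name.
        return self._wrapped.get_record_reader()


def read_all(input_format):
    """A consumer that only knows the new-style contract."""
    return list(input_format.create_record_reader())


legacy = OldStyleInputFormat([("k1", "v1"), ("k2", "v2")])
records = read_all(InputFormatAdapter(legacy))
```

The consumer never calls the old-style method directly, which is the role the wrapper plays between HadoopInputFormatIO and an InputFormat written against org.apache.hadoop.mapred.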
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/9d94c4bc
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/9d94c4bc
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/9d94c4bc

Branch: refs/heads/asf-site
Commit: 9d94c4bc5818d24de48524f7c016b773e62bf373
Parents: 2220c8e
Author: Ahmet Altay
Authored: Mon Jun 26 15:56:23 2017 -0700
Committer: Ahmet Altay
Committed: Mon Jun 26 15:56:23 2017 -0700

--
 content/contribute/maturity-model/index.html | 2 +-
 content/get-started/quickstart-py/index.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/9d94c4bc/content/contribute/maturity-model/index.html
--
diff --git a/content/contribute/maturity-model/index.html b/content/contribute/maturity-model/index.html
index ad623c6..3a3bcf9 100644
--- a/content/contribute/maturity-model/index.html
+++ b/content/contribute/maturity-model/index.html
@@ -281,7 +281,7 @@ graduation process and is no longer being maintained.
 QU50 The project strives to respond to documented bug reports in a timely manner.
- YES. The project has resolved 550 issues <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Resolved%2C%20Closed)> during incubation. Even further, all project components <https://issues.apache.org/jira/browse/BEAM/?selectedTab%3Dcom.atlassian.jira.jira-projects-plugin:components-panel=undefined&selectedTab=com.atlassian.jira.jira-projects-plugin:components-panel> have designated a single committer who gets assigned all newly filed issues for a triage/re-assignment to ensure timely action.
+ YES. The project has resolved 550 issues <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Resolved%2C%20Closed)> during incubation. Even further, all project components <https://issues.apache.org/jira/projects/BEAM?selectedItem=com.atlassian.jira.jira-projects-plugin%3Acomponents-page&selectedTab%3Dcom.atlassian.jira.jira-projects-plugin%3Acomponents-panel=undefined> have designated a single committer who gets assigned all newly filed issues for a triage/re-assignment to ensure timely action.

http://git-wip-us.apache.org/repos/asf/beam-site/blob/9d94c4bc/content/get-started/quickstart-py/index.html
--
diff --git a/content/get-started/quickstart-py/index.html b/content/get-started/quickstart-py/index.html
index d9e1f98..c56034d 100644
--- a/content/get-started/quickstart-py/index.html
+++ b/content/get-started/quickstart-py/index.html
@@ -263,7 +263,7 @@ environment's directories.
 For example, to run wordcount.py, run:
-python -m apache_beam.examples.wordcount --input MANIFEST.in --output counts
+python -m apache_beam.examples.wordcount --input <PATH_TO_INPUT_FILE> --output counts
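For orientation, the kind of result a word count produces can be sketched in plain Python. This is not the apache_beam.examples.wordcount implementation; the count_words helper and its tokenization rule are assumptions made purely for illustration.

```python
import collections
import re


def count_words(lines):
    """Count word occurrences across an iterable of text lines."""
    counts = collections.Counter()
    for line in lines:
        # Assumed tokenization: runs of ASCII letters and apostrophes.
        counts.update(re.findall(r"[A-Za-z']+", line))
    return counts


sample = ["to be or not to be", "that is the question"]
word_counts = count_words(sample)
```

With the two sample lines, "to" and "be" are each counted twice and every other word once.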
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/94543119
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/94543119
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/94543119

Branch: refs/heads/asf-site
Commit: 945431195c1470e916f1ae96daa32ce51c42f21a
Parents: d3fb678
Author: Ahmet Altay
Authored: Fri Jun 23 17:35:48 2017 -0700
Committer: Ahmet Altay
Committed: Fri Jun 23 17:35:48 2017 -0700

--
 .../sdks/python-pipeline-dependencies/index.html | 9 +
 1 file changed, 9 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/94543119/content/documentation/sdks/python-pipeline-dependencies/index.html
--
diff --git a/content/documentation/sdks/python-pipeline-dependencies/index.html b/content/documentation/sdks/python-pipeline-dependencies/index.html
index 02ada4c..a856add 100644
--- a/content/documentation/sdks/python-pipeline-dependencies/index.html
+++ b/content/documentation/sdks/python-pipeline-dependencies/index.html
@@ -199,6 +199,15 @@
 --extra_package /path/to/package/package-name
+
+where package-name is the package's tarball. If you have the setup.py for that
+package then you can build the tarball with the following command:
+
+ python setup.py sdist
+
+
+
+See the sdist documentation <https://docs.python.org/2/distutils/sourcedist.html> for more details on this command.
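As a small illustration of how the tarball path feeds into the --extra_package option, the sketch below assembles command-line arguments for one or more local tarballs. The helper name extra_package_args, the example file name, the .tar.gz check, and the idea of passing the option once per tarball are assumptions for this sketch rather than documented behavior.

```python
import os


def extra_package_args(tarball_paths):
    """Build --extra_package arguments for local source tarballs."""
    args = []
    for path in tarball_paths:
        # Assumption: the sdist output is a .tar.gz file; reject anything else early.
        if not path.endswith(".tar.gz"):
            raise ValueError("expected a .tar.gz tarball, got: %s" % path)
        # Use an absolute path so the argument does not depend on the working directory.
        args.extend(["--extra_package", os.path.abspath(path)])
    return args


# Hypothetical tarball name, for illustration only.
pipeline_args = extra_package_args(["dist/my_package-0.1.tar.gz"])
```

The resulting list can be appended to the other arguments used to launch the pipeline.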
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/b49b5e01
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/b49b5e01
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/b49b5e01

Branch: refs/heads/asf-site
Commit: b49b5e01cd00513bf3d8884da4f200a91dd172b7
Parents: 993d2c4
Author: Ismaël Mejía
Authored: Thu Jun 15 07:44:52 2017 +0200
Committer: Ismaël Mejía
Committed: Thu Jun 15 07:44:52 2017 +0200

--
 content/documentation/io/built-in/index.html | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/b49b5e01/content/documentation/io/built-in/index.html
--
diff --git a/content/documentation/io/built-in/index.html b/content/documentation/io/built-in/index.html
index 5860f77..d88360b 100644
--- a/content/documentation/io/built-in/index.html
+++ b/content/documentation/io/built-in/index.html
@@ -170,6 +170,7 @@
 Apache Cassandra <https://github.com/apache/beam/tree/master/sdks/java/io/cassandra>
 Apache Hadoop InputFormat
 Apache HBase <https://github.com/apache/beam/tree/master/sdks/java/io/hbase>
+Apache Hive (HCatalog) <https://github.com/apache/beam/tree/master/sdks/java/io/hcatalog>
 MongoDB <https://github.com/apache/beam/tree/master/sdks/java/io/mongodb>
 JDBC <https://github.com/apache/beam/tree/master/sdks/java/io/jdbc>
 Google BigQuery <https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery>
@@ -210,10 +211,6 @@
 BEAM-607 <https://issues.apache.org/jira/browse/BEAM-607>
-Apache Hive | Java
-BEAM-1158 <https://issues.apache.org/jira/browse/BEAM-1158>
-
-
 Apache Parquet | Java
 BEAM-214 <https://issues.apache.org/jira/browse/BEAM-214>
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/110da3a9
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/110da3a9
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/110da3a9

Branch: refs/heads/asf-site
Commit: 110da3a9926792abfe9e36fb4341192d32cb449c
Parents: df9f33d
Author: Ismaël Mejía
Authored: Sat Jun 10 02:17:28 2017 +0200
Committer: Ismaël Mejía
Committed: Sat Jun 10 02:17:28 2017 +0200

--
 content/documentation/io/built-in/index.html | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/110da3a9/content/documentation/io/built-in/index.html
--
diff --git a/content/documentation/io/built-in/index.html b/content/documentation/io/built-in/index.html
index bb6b642..5860f77 100644
--- a/content/documentation/io/built-in/index.html
+++ b/content/documentation/io/built-in/index.html
@@ -167,6 +167,7 @@
 Google Cloud PubSub <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io>
+Apache Cassandra <https://github.com/apache/beam/tree/master/sdks/java/io/cassandra>
 Apache Hadoop InputFormat
 Apache HBase <https://github.com/apache/beam/tree/master/sdks/java/io/hbase>
 MongoDB <https://github.com/apache/beam/tree/master/sdks/java/io/mongodb>
@@ -205,10 +206,6 @@
 BEAM-1237 <https://issues.apache.org/jira/browse/BEAM-1237>
-Apache Cassandra | Java
-BEAM-245 <https://issues.apache.org/jira/browse/BEAM-245>
-
-
 Apache DistributedLog | Java
 BEAM-607 <https://issues.apache.org/jira/browse/BEAM-607>
[2/3] beam-site git commit: Regenerate website
http://git-wip-us.apache.org/repos/asf/beam-site/blob/5c993c61/content/contribute/runner-guide/index.html
--
diff --git a/content/contribute/runner-guide/index.html b/content/contribute/runner-guide/index.html
new file mode 100644
index 000..2dc6917
--- /dev/null
+++ b/content/contribute/runner-guide/index.html
@@ -0,0 +1,1375 @@
+Runner Authoring Guide
+
+This guide walks through how to implement a new runner. It is aimed at someone
+who has a data processing system and wants to use it to execute a Beam
+pipeline. The guide starts from the basics, to help you evaluate the work
+ahead. Then the sections become more and more detailed, to be a resource
+throughout the development of your runner.
+
+Topics covered:
+
+  Basics of the Beam model
+    Pipeline
+    PTransforms
+    PCollections
+      Bounded vs Unbounded
+      Timestamps
+      Watermarks
+      Windowed elements
+      Coder
+      Windowing Strategy
+    User-Defined Functions (UDFs)
+    Runner
+  Implementing the Beam Primitives
+    What if you haven't implemented some of these features?
+    Implementing the ParDo primitive
+      Bundles
+      The DoFn Lifecycle
+      DoFnRunner(s)
+      Side Inputs
+      State and Timers
+      Splittable DoFn
+    Implementing the GroupByKey (and window) primitive
+      Group By Encoded Bytes
+      Window Merging
+      Implementing via GroupByKeyOnly
+      GroupAlsoByWindow
+      Dropping late data
+      Triggering
+      TimestampCombiner
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f8d9fc15
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f8d9fc15
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f8d9fc15

Branch: refs/heads/asf-site
Commit: f8d9fc15b9c7c14f5229e529f8d2813a1c462adc
Parents: 8ff65fe
Author: Ismaël Mejía
Authored: Tue Jun 6 09:33:30 2017 +0200
Committer: Ismaël Mejía
Committed: Tue Jun 6 09:33:30 2017 +0200

--
 .../documentation/io/built-in/hadoop/index.html | 31
 1 file changed, 31 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f8d9fc15/content/documentation/io/built-in/hadoop/index.html
--
diff --git a/content/documentation/io/built-in/hadoop/index.html b/content/documentation/io/built-in/hadoop/index.html
index ed7ee0b..e323674 100644
--- a/content/documentation/io/built-in/hadoop/index.html
+++ b/content/documentation/io/built-in/hadoop/index.html
@@ -330,6 +330,37 @@
 The org.elasticsearch.hadoop.mr.EsInputFormat's EsInputFormat key class is org.apache.hadoop.io.Text Text, and its value class is org.elasticsearch.hadoop.mr.LinkedMapWritable LinkedMapWritable. Both key and value classes have Beam Coders.
+HCatalog - HCatInputFormat
+
+To read data using HCatalog, use org.apache.hive.hcatalog.mapreduce.HCatInputFormat, which needs the following properties to be set:
+
+Configuration hcatConf = new Configuration();
+hcatConf.setClass("mapreduce.job.inputformat.class", HCatInputFormat.class, InputFormat.class);
+hcatConf.setClass("key.class", LongWritable.class, Object.class);
+hcatConf.setClass("value.class", HCatRecord.class, Object.class);
+hcatConf.set("hive.metastore.uris", "thrift://metastore-host:port");
+
+org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(hcatConf, "my_database", "my_table", "my_filter");
+
+
+
+ # The Beam SDK for Python does not support Hadoop InputFormat IO.
+
+
+
+Call Read transform as follows:
+
+PCollection<KV<LongWritable, HCatRecord>> hcatData =
+  p.apply("read",
+  HadoopInputFormatIO.<LongWritable, HCatRecord>read()
+  .withConfiguration(hcatConf));
+
+
+
+ # The Beam SDK for Python does not support Hadoop InputFormat IO.
+
+
+
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/c59f3122
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/c59f3122
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/c59f3122

Branch: refs/heads/asf-site
Commit: c59f31220f73badccb4ddb9b74e7533ee13829b8
Parents: 13e2e31
Author: Ahmet Altay
Authored: Wed May 31 10:55:11 2017 -0700
Committer: Ahmet Altay
Committed: Wed May 31 10:55:11 2017 -0700

--
 .../contribute/contribution-guide/index.html | 48 +---
 1 file changed, 42 insertions(+), 6 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/c59f3122/content/contribute/contribution-guide/index.html
--
diff --git a/content/contribute/contribution-guide/index.html b/content/contribute/contribution-guide/index.html
index 110f966..a651497 100644
--- a/content/contribute/contribution-guide/index.html
+++ b/content/contribute/contribution-guide/index.html
@@ -152,6 +152,7 @@
 Obtain a GitHub account
 Fork the repository on GitHub
 Clone the repository locally
+ [Python SDK Only] Set up a virtual environment
 [Optional] IDE Setup
 IntelliJ
 Enable Annotation Processing
@@ -171,7 +172,11 @@
 Create a branch in your fork
 Syncing and pushing your branch
- Testing
+ Testing
+ Java SDK
+ Python SDK
+
+
 Review
@@ -280,8 +285,6 @@
 Fork the repository on GitHub
 Go to the Beam GitHub mirror <https://github.com/apache/beam/> and fork the repository to your own private account. This will be your private workspace for staging changes.
-We recommend enabling Travis-CI continuous integration coverage on your private fork in order to easily test your changes before proposing a pull request. Go to Travis-CI <https://travis-ci.org>, log in with your GitHub account, and enable coverage for your repository.
-
 Clone the repository locally
 You are now ready to create the development environment on your local machine. Feel free to repeat these steps on all machines that you want to use for development.
@@ -302,6 +305,10 @@ $ cd beam
 You are now ready to start developing!
+[Python SDK Only] Set up a virtual environment
+
+We recommend setting up a virtual environment for developing the Python SDK. Please see the instructions in Quickstart (Python) for setting up a virtual environment.
+
 [Optional] IDE Setup
 Depending on your preferred development environment, you may need to prepare it to develop Beam code.
@@ -475,12 +482,41 @@ $ git checkout -b <my-branch> origin/master
 Testing
 All code should have appropriate unit testing coverage. New code should have new tests in the same contribution. Bug fixes should include a regression test to prevent the issue from reoccurring.
-For contributions to the Java code, run unit tests locally via Maven. Alternatively, you can use Travis-CI.
+Java SDK
+
+For contributions to the Java code, run unit tests locally via Maven.
 $ mvn clean verify
+Python SDK
+
+For contributions to the Python code, you can use the command given below to run unit tests locally. If you update any of the cythonized <http://cython.org> files in the Python SDK, you must install the 'cython' package before running the following command to properly test your code. We recommend setting up a virtual environment before testing your code.
+
+$ python setup.py test
+
+
+
+You can use the following command to run a single test method.
+
+$ python setup.py test -s <module>.<test class>.<test method>
+
+
+
+To check for lint errors locally, install the 'tox' package and run the following command.
+
+$ pip install tox
+$ tox -e lint
+
+
+
+Beam supports running Python SDK tests using Maven. For this, navigate to the root directory of your Apache Beam clone and execute the following command. Currently this cannot be run from a virtual environment.
+
+$ mvn clean verify -pl sdks/python
+
+
+
 Review
 Once the initial code is complete and the tests pass, it's time to start the code review process. We review and discuss all code, no matter who authors it. It's a great way to build community, since you can learn from other developers, and they become familiar with your contribution. It also builds a strong project by encouraging a high quality bar and keeping code consistent throughout the project.
@@ -512,7 +548,7 @@ $ git checkout -b <my-branch> origin/master
 Code Review and Revision
 During the code review process, don't rebase your branch or otherwise modify published commits, since this can remove existing comment history and be confusing t
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/047a2107
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/047a2107
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/047a2107

Branch: refs/heads/asf-site
Commit: 047a2107bdfacb5c2d89241c874ce033391e579a
Parents: 1f78f64
Author: Ahmet Altay
Authored: Wed May 24 16:16:00 2017 -0700
Committer: Ahmet Altay
Committed: Wed May 24 16:16:00 2017 -0700

--
 content/documentation/io/built-in/index.html | 4
 .../documentation/programming-guide/index.html | 22 ++--
 .../sdks/python-custom-io/index.html | 20 +-
 .../sdks/python-type-safety/index.html | 1 -
 .../get-started/wordcount-example/index.html | 2 +-
 5 files changed, 26 insertions(+), 23 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/047a2107/content/documentation/io/built-in/index.html
--
diff --git a/content/documentation/io/built-in/index.html b/content/documentation/io/built-in/index.html
index 79e47f7..6b3de1b 100644
--- a/content/documentation/io/built-in/index.html
+++ b/content/documentation/io/built-in/index.html
@@ -255,6 +255,10 @@
 RestIO | Java
 BEAM-1946 <https://issues.apache.org/jira/browse/BEAM-1946>
+
+TikaIO | Java
+BEAM-2328 <https://issues.apache.org/jira/browse/BEAM-2328>
+

http://git-wip-us.apache.org/repos/asf/beam-site/blob/047a2107/content/documentation/programming-guide/index.html
--
diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html
index 56b5481..7b27b37 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -402,14 +402,14 @@
-p = beam.Pipeline(options=pipeline_options)
+with beam.Pipeline(options=pipeline_options) as p:
 
-lines = (p
-         | beam.Create([
-             'To be, or not to be: that is the question: ',
-             'Whether \'tis nobler in the mind to suffer ',
-             'The slings and arrows of outrageous fortune, ',
-             'Or to take arms against a sea of troubles, ']))
+  lines = (p
+           | beam.Create([
+               'To be, or not to be: that is the question: ',
+               'Whether \'tis nobler in the mind to suffer ',
+               'The slings and arrows of outrageous fortune, ',
+               'Or to take arms against a sea of troubles, ']))
@@ -1128,10 +1128,10 @@ guest, [[], [order4]]
     lower_bound=pvalue.AsSingleton(avg_word_len)))
 
 # Mix and match.
-small_but_nontrivial = words | beam.FlatMap(filter_using_length,
-                                            lower_bound=2,
-                                            upper_bound=pvalue.AsSingleton(
-                                                avg_word_len))
+small_but_nontrivial = words | beam.FlatMap(
+    filter_using_length,
+    lower_bound=2,
+    upper_bound=pvalue.AsSingleton(avg_word_len))
 
 # We can also pass side inputs to a ParDo transform, which will get passed to its process method.

http://git-wip-us.apache.org/repos/asf/beam-site/blob/047a2107/content/documentation/sdks/python-custom-io/index.html
--
diff --git a/content/documentation/sdks/python-custom-io/index.html b/content/documentation/sdks/python-custom-io/index.html
index d6eb9f2..2952ea4 100644
--- a/content/documentation/sdks/python-custom-io/index.html
+++ b/content/documentation/sdks/python-custom-io/index.html
@@ -368,8 +368,8 @@
 To read data from the source in your pipeline, use the Read transform:
 
-p = beam.Pipeline(options=PipelineOptions())
-numbers = p | 'ProduceNumbers' >> beam.io.Read(CountingSource(count))
+with beam.Pipeline(options=PipelineOptions()) as p:
+  numbers = p | 'ProduceNumbers' >> beam.io.Read(CountingSource(count))
@@ -512,11 +512,11 @@ numbers = p | 'ProduceNumbers' >> beam.io.Read(CountingSource(count))
 The following code demonstrates how to write to the sink using the Write transform.
 
-p = beam.Pipeline(options=PipelineOptions())
-kvs = p | 'CreateKVs' >> beam.Create(KVs)
+with beam.Pipeline(options=PipelineOptions()) as p:
+  kvs = p | 'CreateKVs' >> beam.Create(KVs)
 
-kvs | 'WriteToSimpleKV' >> beam.io.Write(
-    SimpleKVSink('http://url_to_simple_kv/', final_table_name))
+  kvs | 'WriteToSimpleKV' >> beam.io.Write(
+      SimpleKVSink('http://url_to_simple_kv/', final_table_name))
@@ -569,10 +569,10 @@ numbers = p | 'ProduceNumbers' >> ReadFromCountingSource(count)
 Finally, write to the sink:
 
-p =
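The change from assigning p = beam.Pipeline(...) to using "with beam.Pipeline(...) as p:" relies on Python's context-manager protocol. The sketch below uses a hypothetical MiniPipeline class, not Beam's Pipeline, to show the general pattern of running queued work when the block exits without an error. How Beam itself behaves on exit should be checked against the SDK documentation.

```python
class MiniPipeline:
    """Hypothetical pipeline that collects steps and runs them on exit."""

    def __init__(self):
        self.steps = []
        self.results = []
        self.ran = False

    def add(self, func):
        # Queue a step; nothing executes yet.
        self.steps.append(func)
        return self

    def run(self):
        # Execute the queued steps in order and keep their results.
        self.results = [step() for step in self.steps]
        self.ran = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Run only if the block finished without raising.
        if exc_type is None:
            self.run()
        # Returning False lets any exception propagate to the caller.
        return False


with MiniPipeline() as p:
    p.add(lambda: "read").add(lambda: "write")
```

Inside the block the steps are only declared. The work happens once, at the end of the block, which is why the pipeline-building statements in the diff above are indented under the with statement.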
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/0b678294
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/0b678294
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/0b678294

Branch: refs/heads/asf-site
Commit: 0b678294a57237842a57e7afa86c31f5c8eb1e55
Parents: 0c8e759
Author: Davor Bonaci
Authored: Wed May 17 04:14:58 2017 -0700
Committer: Davor Bonaci
Committed: Wed May 17 04:14:58 2017 -0700

--
 .../2017/05/17/beam-first-stable-release.html | 308 +++
 content/blog/index.html | 16 +
 content/feed.xml | 148 ++---
 content/index.html | 10 +-
 4 files changed, 442 insertions(+), 40 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/0b678294/content/blog/2017/05/17/beam-first-stable-release.html
--
diff --git a/content/blog/2017/05/17/beam-first-stable-release.html b/content/blog/2017/05/17/beam-first-stable-release.html
new file mode 100644
index 000..fabace3
--- /dev/null
+++ b/content/blog/2017/05/17/beam-first-stable-release.html
@@ -0,0 +1,308 @@
+Apache Beam publishes the first stable release
+May 17, 2017 • Davor Bonaci [@BonaciDavor <https://twitter.com/BonaciDavor>] & Dan Halperin
+
+The Apache Beam community is pleased to https://blogs.apache.org/foundation/e
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/30a93035 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/30a93035 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/30a93035 Branch: refs/heads/asf-site Commit: 30a93035d450b23e45a64fb8d884b4d57e861707 Parents: 101e813 Author: Ahmet Altay Authored: Mon May 15 11:21:28 2017 -0700 Committer: Ahmet Altay Committed: Mon May 15 11:21:28 2017 -0700 -- content/documentation/index.html | 2 +- content/get-started/index.html | 2 +- content/get-started/wordcount-example/index.html | 6 +++--- content/index.html | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/30a93035/content/documentation/index.html -- diff --git a/content/documentation/index.html b/content/documentation/index.html index 7d006b8..1399ac2 100644 --- a/content/documentation/index.html +++ b/content/documentation/index.html @@ -200,7 +200,7 @@ Beam is designed to enable pipelines to be portable across different runners. However, given every runner has different capabilities, they also have different abilities to implement the core concepts in the Beam model. The Capability Matrix provides a detailed comparison of runner functionality. -Once you have chosen which runner to use, see that runnerâs page for more information about any initial runner-specific setup as well as any required or optional PipelineOptions for configuring itâs execution. You may also want to refer back to the Quickstart for instructions on executing the sample WordCount pipeline. +Once you have chosen which runner to use, see that runnerâs page for more information about any initial runner-specific setup as well as any required or optional PipelineOptions for configuring itâs execution. You may also want to refer back to the Quickstart for Java or Python for instructions on executing the sample WordCount pipeline. 
http://git-wip-us.apache.org/repos/asf/beam-site/blob/30a93035/content/get-started/index.html -- diff --git a/content/get-started/index.html b/content/get-started/index.html index b33241c..bfcd605 100644 --- a/content/get-started/index.html +++ b/content/get-started/index.html @@ -158,7 +158,7 @@ Learn about the Beam model, the currently available Beam SDKs and Runners, and Beamâs native I/O connectors. -Quickstart +Quickstart for Java or Python Learn how to set up a Beam project and run a simple example Beam pipeline on your local machine. http://git-wip-us.apache.org/repos/asf/beam-site/blob/30a93035/content/get-started/wordcount-example/index.html -- diff --git a/content/get-started/wordcount-example/index.html b/content/get-started/wordcount-example/index.html index 333cfb9..f0c027b 100644 --- a/content/get-started/wordcount-example/index.html +++ b/content/get-started/wordcount-example/index.html @@ -211,7 +211,7 @@ Minimal WordCount demonstrates a simple pipeline that can read from a text file, apply transforms to tokenize and count the words, and write the data to an output text file. This example hard-codes the locations for its input and output files and doesnât perform any error checking; it is intended to only show you the âbare bonesâ of creating a Beam pipeline. This lack of parameterization makes this particular pipeline less portable across different runners than standard Beam pipelines. In later examples, we will parameterize the pipelineâs input and output sources and show other best practices. -To run this example, follow the instructions in the Quickstart java or python. To view the full code, see https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java";>MinimalWordCount. +To run this example, follow the instructions in the Quickstart for Java or Python. 
To view the full code, see MinimalWordCount (https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java). Key Concepts: @@ -383,7 +383,7 @@ Figure 1: The pipeline data flow. This section assumes that you have a good understanding of the basic concepts in building a pipeline. If you feel that you aren’t at that point yet, read the above section, Minimal WordCount. -To run this example, follow the instructions in the Quickstart java or python. To view the full code, see https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examp
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/12121557 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/12121557 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/12121557 Branch: refs/heads/asf-site Commit: 12121557b155ec7d0aea865afa5c6f2801217d56 Parents: 9fb214f Author: Davor Bonaci Authored: Fri May 12 16:17:49 2017 -0700 Committer: Davor Bonaci Committed: Fri May 12 16:17:49 2017 -0700 -- .../documentation/programming-guide/index.html | 2 +- .../sdks/python-custom-io/index.html| 2 +- .../get-started/wordcount-example/index.html| 80 +++- 3 files changed, 47 insertions(+), 37 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/12121557/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index 1e6e9a7..3b3cb14 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -1939,7 +1939,7 @@ Subsequent transforms, however, are applied to the result of the unix_timestamp = extract_timestamp_from_log_entry(element) # Wrap and emit the current entry and new timestamp in a # TimestampedValue. 
-yield beam.TimestampedValue(element, unix_timestamp) +yield beam.window.TimestampedValue(element, unix_timestamp) timestamped_items = items | 'timestamp' >> beam.ParDo(AddTimestampDoFn()) http://git-wip-us.apache.org/repos/asf/beam-site/blob/12121557/content/documentation/sdks/python-custom-io/index.html -- diff --git a/content/documentation/sdks/python-custom-io/index.html b/content/documentation/sdks/python-custom-io/index.html index 629ef0f..fb2646f 100644 --- a/content/documentation/sdks/python-custom-io/index.html +++ b/content/documentation/sdks/python-custom-io/index.html @@ -464,7 +464,7 @@ numbers = p | 'ProduceNumbers' >> beam.io.Read(CountingSource(count)) FileSink -If your data source uses files, you can derive your Sink and Writer classes from the FileSink and FileSinkWriter classes, which can be found in the https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/fileio.py";>fileio.py module. These classes implement code common sinks that interact with files, including: +If your data source uses files, you can derive your Sink and Writer classes from the FileBasedSink and FileBasedSinkWriter classes, which can be found in the https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsink.py";>filebasedsink.py module. 
These classes implement code common sinks that interact with files, including: Setting file headers and footers http://git-wip-us.apache.org/repos/asf/beam-site/blob/12121557/content/get-started/wordcount-example/index.html -- diff --git a/content/get-started/wordcount-example/index.html b/content/get-started/wordcount-example/index.html index 5cc32f3..333cfb9 100644 --- a/content/get-started/wordcount-example/index.html +++ b/content/get-started/wordcount-example/index.html @@ -172,6 +172,7 @@ Dataflow Runner Apache Spark Runner Apache Flink Runner + Apache Apex Runner Testing your Pipeline via PAssert @@ -228,7 +229,7 @@ Creating the Pipeline -The first step in creating a Beam pipeline is to create a PipelineOptions object. This object lets us set various options for our pipeline, such as the pipeline runner that will execute our pipeline and any runner-specific configuration required by the chosen runner. In this example we set these options programmatically, but more often command-line arguments are used to set PipelineOptions. +The first step in creating a Beam pipeline is to create a PipelineOptions object. This object lets us set various options for our pipeline, such as the pipeline runner that will execute our pipeline and any runner-specific configuration required by the chosen runner. In this example we set these options programmatically, but more often command-line arguments are used to set PipelineOptions. You can specify a runner for executing your pipeline, such as the DataflowRunner or SparkRunner. If you omit specifying a runner, as in this example, your pipeline will be executed locally using the DirectRunner. In the next sections, we will specify the pipeline’s runner. @@ -273,7 +274,7 @@ The Minimal WordCount pipeline contains several transforms to read data into the pipeline, manipulate or otherwise transf
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/0a96efa2 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/0a96efa2 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/0a96efa2 Branch: refs/heads/asf-site Commit: 0a96efa27864751af87488af44109fc946156c3f Parents: 1e58519 Author: Davor Bonaci Authored: Thu May 11 14:20:25 2017 -0700 Committer: Davor Bonaci Committed: Thu May 11 14:20:25 2017 -0700 -- content/contribute/release-guide/index.html | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/0a96efa2/content/contribute/release-guide/index.html -- diff --git a/content/contribute/release-guide/index.html b/content/contribute/release-guide/index.html index 2ef2032..105bd87 100644 --- a/content/contribute/release-guide/index.html +++ b/content/contribute/release-guide/index.html @@ -555,16 +555,15 @@ TAG="v${VERSION}-RC${RC_NUM}" cp ${BEAM_ROOT}/target/apache-beam-${VERSION}-source-release.zip . cp ${BEAM_ROOT}/target/apache-beam-${VERSION}-source-release.zip.asc . - cp ${BEAM_ROOT}/sdks/python/target/apache-beam-${VERSION}.zip - apache-beam-${VERSION}-python.zip + cp ${BEAM_ROOT}/sdks/python/target/apache-beam-${VERSION}.zip apache-beam-${VERSION}-python.zip Create hashes for source files and sign the python source file file - sha1sum apache-beam-${VERSION}.tar.gz > apache-beam-${VERSION}.tar.gz.sha1 - md5sum apache-beam-${VERSION}.tar.gz > apache-beam-${VERSION}.tar.gz.md5 + sha1sum apache-beam-${VERSION}-source-release.zip > apache-beam-${VERSION}-source-release.zip.sha1 + md5sum apache-beam-${VERSION}-source-release.zip > apache-beam-${VERSION}-source-release.zip.md5 gpg --armor --detach-sig apache-beam-${VERSION}-python.zip sha1sum apache-beam-${VERSION}-python.zip > apache-beam-${VERSION}-python.zip.sha1 md5sum apache-beam-${VERSION}-python.zip > apache-beam-${VERSION}-python.zip.md5
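For readers who want to reproduce the checksum step of the diff above without `sha1sum`/`md5sum`, the following is a minimal Python sketch. The helper name `write_checksum` is hypothetical and not part of the release guide; it assumes the conventional `<hex digest>  <file name>` line layout that those tools emit, and it does not cover the `gpg --armor --detach-sig` signing step.

```python
import hashlib
from pathlib import Path


def write_checksum(path, algorithm):
    """Write `<path>.<algorithm>` next to `path` and return the new file's Path.

    `algorithm` is a hashlib name such as "sha1" or "md5" (hypothetical helper,
    mirroring `sha1sum FILE > FILE.sha1` from the release guide diff).
    """
    source = Path(path)
    digest = hashlib.new(algorithm)
    # Read in 1 MiB chunks so large release archives are not loaded into memory at once.
    with source.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    target = source.with_name(source.name + "." + algorithm)
    # Two spaces between digest and name, matching the text-mode output of sha1sum/md5sum.
    target.write_text("{}  {}\n".format(digest.hexdigest(), source.name))
    return target
```

Calling `write_checksum("apache-beam-${VERSION}-python.zip", "sha1")` with the shell variable expanded would correspond to the `sha1sum` line in the diff; `"md5"` corresponds to the `md5sum` line.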
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/d5140a40 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/d5140a40 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/d5140a40 Branch: refs/heads/asf-site Commit: d5140a4096a9958d9d674bf80eb9f24f0a59006b Parents: 212b9c7 Author: Ahmet Altay Authored: Wed May 10 22:33:17 2017 -0700 Committer: Ahmet Altay Committed: Wed May 10 22:33:17 2017 -0700 -- content/documentation/io/built-in/hadoop/index.html | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/d5140a40/content/documentation/io/built-in/hadoop/index.html -- diff --git a/content/documentation/io/built-in/hadoop/index.html b/content/documentation/io/built-in/hadoop/index.html index 7fda51a..c5f051a 100644 --- a/content/documentation/io/built-in/hadoop/index.html +++ b/content/documentation/io/built-in/hadoop/index.html @@ -219,7 +219,7 @@ Read data with configuration and key translation -For example scenario: Beam Coder is not available for key class hence key translation is required. +For example, a Beam Coder is not available for Key class, so key translation is required. p.apply("read", HadoopInputFormatIO.read() @@ -234,7 +234,7 @@ Read data with configuration and value translation -For example scenario: Beam Coder is not available for value class hence value translation is required. +For example, a Beam Coder is not available for Value class, so value translation is required. p.apply("read", HadoopInputFormatIO. read() @@ -249,7 +249,7 @@ Read data with configuration, value translation and key translation -For example scenario: Beam Coders are not available for both Key class and Value class of InputFormat hence key and value translation is required. 
+For example, Beam Coders are not available for both Key class and Value classes of InputFormat, so key and value translation are required. p.apply("read", HadoopInputFormatIO. read()
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/d2d65698 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/d2d65698 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/d2d65698 Branch: refs/heads/asf-site Commit: d2d65698b846f9e94a4c5d6122ac705869262ca4 Parents: f2fea83 Author: Ahmet Altay Authored: Wed May 10 13:37:01 2017 -0700 Committer: Ahmet Altay Committed: Wed May 10 13:37:01 2017 -0700 -- content/get-started/quickstart-py/index.html | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/d2d65698/content/get-started/quickstart-py/index.html -- diff --git a/content/get-started/quickstart-py/index.html b/content/get-started/quickstart-py/index.html index ac498d9..9c9ff2e 100644 --- a/content/get-started/quickstart-py/index.html +++ b/content/get-started/quickstart-py/index.html @@ -157,6 +157,7 @@ Set up your environment + Check your Python version Install pip Install Python virtual environment @@ -175,9 +176,17 @@ Set up your environment +Check your Python version + +The Beam SDK for Python requires Python version 2.7.x. Check that you have version 2.7.x by running: + +python --version + + + Install pip -Install pip (https://pip.pypa.io/en/stable/installing/), Python’s package manager. Check that you have version 7.0.0 or newer, by running: +Install pip (https://pip.pypa.io/en/stable/installing/), Python’s package manager. Check that you have version 7.0.0 or newer by running: pip --version
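The "Check your Python version" step added in this commit can also be scripted. The sketch below is only an illustration: the function name `is_supported_python` is hypothetical, and the 2.7.x requirement is taken from the page text as of this commit, so it should not be read as the requirement of later Beam releases.

```python
import sys


def is_supported_python(version_info=None):
    """Return True for Python 2.7.x, the version the quickstart text above asks for.

    `version_info` defaults to the running interpreter; any tuple whose first two
    items are (major, minor) can be passed instead (hypothetical helper).
    """
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == (2, 7)


if __name__ == "__main__":
    # Prints the interpreter version and whether it matches the 2.7.x requirement.
    print(sys.version.split()[0], is_supported_python())
```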
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/aa01f116 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/aa01f116 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/aa01f116 Branch: refs/heads/asf-site Commit: aa01f1161bbbfc0060ab76ebad6bfcee7826e963 Parents: 8ed3f24 Author: Ahmet Altay Authored: Tue May 9 21:24:35 2017 -0700 Committer: Ahmet Altay Committed: Tue May 9 21:24:35 2017 -0700 -- content/documentation/runners/flink/index.html | 2 +- .../sdks/java-extensions/index.html | 237 +++ content/documentation/sdks/java/index.html | 9 + 3 files changed, 247 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/aa01f116/content/documentation/runners/flink/index.html -- diff --git a/content/documentation/runners/flink/index.html b/content/documentation/runners/flink/index.html index 64ae583..deb04c8 100644 --- a/content/documentation/runners/flink/index.html +++ b/content/documentation/runners/flink/index.html @@ -287,7 +287,7 @@ -See the reference documentation for the FlinkPipelineOptionshttps://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py";>PipelineOptions interface (and its subinterfaces) for the complete list of pipeline configuration options. +See the reference documentation for the FlinkPipelineOptionshttps://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py";>PipelineOptions interface (and its subinterfaces) for the complete list of pipeline configuration options. 
Additional information and caveats http://git-wip-us.apache.org/repos/asf/beam-site/blob/aa01f116/content/documentation/sdks/java-extensions/index.html -- diff --git a/content/documentation/sdks/java-extensions/index.html b/content/documentation/sdks/java-extensions/index.html new file mode 100644 index 000..ee93238 --- /dev/null +++ b/content/documentation/sdks/java-extensions/index.html @@ -0,0 +1,237 @@ + + + + + + + + + Beam Java SDK Extensions + + + + + https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";> + + + https://beam.apache.org/documentation/sdks/java-extensions/"; data-proofer-ignore> + https://beam.apache.org/feed.xml";> + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ +(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + +ga('create', 'UA-73650088-1', 'auto'); +ga('send', 'pageview'); + + + + + + + + + + + + + + + +Toggle navigation + + + + + + + + + Get Started + + Beam Overview +Quickstart - Java +Quickstart - Python + + Example Walkthroughs + WordCount + Mobile Gaming + + Resources + Downloads + Support + + + + Documentation + + Using the Documentation + + Beam Concepts + Programming Guide + Additional Resources + + Pipeline Fundamentals + Design Your Pipeline + Create Your Pipeline + Test Your Pipeline + Pipeline I/O + + SDKs + Java SDK + Java SDK API Reference + +Python SDK +Python SDK API Reference + + + Runners + Capability Matrix + Direct Runner + Apache Apex Runner + Apache Flink Runner + Apache Spark Runner + Cloud Dataflow Runner + + + + Contribute + + Get Started Contributing + +Guides + Contribution Guide +Testing Guide +Release Guide +PTransform Style Guide + +Technical Refer
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/75c59b81 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/75c59b81 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/75c59b81 Branch: refs/heads/asf-site Commit: 75c59b817f85f9fdc91aa8cee3261968257a6c53 Parents: 69ee8d5 Author: Ahmet Altay Authored: Tue May 9 14:24:56 2017 -0700 Committer: Ahmet Altay Committed: Tue May 9 14:24:56 2017 -0700 -- .../documentation/io/built-in/hadoop/index.html | 373 +++ content/documentation/io/built-in/index.html| 2 +- 2 files changed, 374 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/75c59b81/content/documentation/io/built-in/hadoop/index.html -- diff --git a/content/documentation/io/built-in/hadoop/index.html b/content/documentation/io/built-in/hadoop/index.html new file mode 100644 index 000..7fda51a --- /dev/null +++ b/content/documentation/io/built-in/hadoop/index.html @@ -0,0 +1,373 @@ + + + + + + + + + Apache Hadoop InputFormat IO + + + + + https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";> + + + https://beam.apache.org/documentation/io/built-in/hadoop/"; data-proofer-ignore> + https://beam.apache.org/feed.xml";> + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ +(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + +ga('create', 'UA-73650088-1', 'auto'); +ga('send', 'pageview'); + + + + + + + + + + + + + + + +Toggle navigation + + + + + + + + + Get Started + + Beam Overview +Quickstart - Java +Quickstart - Python + + Example Walkthroughs + WordCount + Mobile Gaming + + Resources + Downloads + Support + + + + Documentation + + Using the Documentation + + Beam Concepts + Programming Guide + 
Additional Resources + + Pipeline Fundamentals + Design Your Pipeline + Create Your Pipeline + Test Your Pipeline + Pipeline I/O + + SDKs + Java SDK + Java SDK API Reference + +Python SDK +Python SDK API Reference + + + Runners + Capability Matrix + Direct Runner + Apache Apex Runner + Apache Flink Runner + Apache Spark Runner + Cloud Dataflow Runner + + + + Contribute + + Get Started Contributing + +Guides + Contribution Guide +Testing Guide +Release Guide +PTransform Style Guide + +Technical References +Design Principles + Ongoing Projects +Source Repository + + Promotion +Presentation Materials +Logos and Design + +Maturity Model +Team + + + +Blog + + + + https://www.apache.org/foundation/press/kit/feather_small.png"; alt="Apache Logo" style="height:24px;">Apache Software Foundation + +http://www.apache.org/";>ASF Homepage +http://www.apache.org/licenses/";>License +http://www.apache.org/security/";>Security +http://www.apache.org/foundation/thanks.html";>Thanks +http://www.apache.org/foundation/sponsorship.html";>Sponsorship +https://www.apache.org/foundation/policies/conduct";>Code of Conduct + + + + + + + + + + + + + + +Pipeline I/O Table of Contents + +Hadoop InputFormat IO + +A HadoopInputFormatIO is a transform for reading data from any source that imple
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/a9cd2275 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/a9cd2275 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/a9cd2275 Branch: refs/heads/asf-site Commit: a9cd2275d3d0bb2b1362dab839cf7437db683ba9 Parents: 8f84d6a Author: Davor Bonaci Authored: Tue May 9 10:17:54 2017 -0700 Committer: Davor Bonaci Committed: Tue May 9 10:17:54 2017 -0700 -- content/documentation/io/built-in/index.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/a9cd2275/content/documentation/io/built-in/index.html -- diff --git a/content/documentation/io/built-in/index.html b/content/documentation/io/built-in/index.html index 8a664f5..cf262d3 100644 --- a/content/documentation/io/built-in/index.html +++ b/content/documentation/io/built-in/index.html @@ -178,7 +178,7 @@ JMS (https://github.com/apache/beam/tree/master/sdks/java/io/jms) Apache Kafka (https://github.com/apache/beam/tree/master/sdks/java/io/kafka) Amazon Kinesis (https://github.com/apache/beam/tree/master/sdks/java/io/kinesis) -Google Cloud PubSub (https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub) +Google Cloud PubSub (https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io) Apache Hadoop InputFormat (https://github.com/apache/beam/tree/master/sdks/java/io/hadoop)
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/35d4d4f3 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/35d4d4f3 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/35d4d4f3 Branch: refs/heads/asf-site Commit: 35d4d4f35af03418d1b9609aebeed005f0cfd729 Parents: d56bbc8 Author: Ahmet Altay Authored: Mon May 8 16:26:38 2017 -0700 Committer: Ahmet Altay Committed: Mon May 8 16:26:38 2017 -0700 -- content/documentation/programming-guide/index.html | 4 ++-- content/get-started/quickstart-py/index.html | 2 +- content/get-started/wordcount-example/index.html | 6 +++--- 3 files changed, 6 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/35d4d4f3/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index bc71346..1e6e9a7 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -244,7 +244,7 @@ import apache_beam as beam -from apache_beam.utils.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import PipelineOptions p = beam.Pipeline(options=PipelineOptions()) @@ -268,7 +268,7 @@ import apache_beam as beam -from apache_beam.utils.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import PipelineOptions p = beam.Pipeline(options=PipelineOptions()) http://git-wip-us.apache.org/repos/asf/beam-site/blob/35d4d4f3/content/get-started/quickstart-py/index.html -- diff --git a/content/get-started/quickstart-py/index.html b/content/get-started/quickstart-py/index.html index 7c8f2d3..ac498d9 100644 --- a/content/get-started/quickstart-py/index.html +++ b/content/get-started/quickstart-py/index.html @@ -268,7 +268,7 @@ environment’s directories. 
For example, to run wordcount.py, run: -python -m apache_beam.examples.wordcount --input README.md --output counts +python -m apache_beam.examples.wordcount --input MANIFEST.in --output counts http://git-wip-us.apache.org/repos/asf/beam-site/blob/35d4d4f3/content/get-started/wordcount-example/index.html -- diff --git a/content/get-started/wordcount-example/index.html b/content/get-started/wordcount-example/index.html index 030c3e2..5cc32f3 100644 --- a/content/get-started/wordcount-example/index.html +++ b/content/get-started/wordcount-example/index.html @@ -210,7 +210,7 @@ Minimal WordCount demonstrates a simple pipeline that can read from a text file, apply transforms to tokenize and count the words, and write the data to an output text file. This example hard-codes the locations for its input and output files and doesn’t perform any error checking; it is intended to only show you the “bare bones” of creating a Beam pipeline. This lack of parameterization makes this particular pipeline less portable across different runners than standard Beam pipelines. In later examples, we will parameterize the pipeline’s input and output sources and show other best practices. -To run this example, follow the instructions in the Beam Examples README (https://github.com/apache/beam/blob/master/examples/java/README.md#building-and-running). To view the full code, see MinimalWordCount (https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java). +To run this example, follow the instructions in the Quickstart java or python. To view the full code, see MinimalWordCount (https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java). Key Concepts: @@ -380,7 +380,7 @@ Figure 1: The pipeline data flow. This section assumes that you have a good understanding of the basic concepts in building a pipeline. 
If you feel that you aren’t at that point yet, read the above section, Minimal WordCount. -To run this example, follow the instructions in the Beam Examples README (https://github.com/apache/beam/blob/master/examples/java/README.md#building-and-running). To view the full code, see WordCount (https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java). +To run this example, follow the instructions in the Quickstart java or python. To view the full code, see WordCount (https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java). New Concepts: @@ -511,7 +511,7 @@
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f45be5f7 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f45be5f7 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f45be5f7 Branch: refs/heads/asf-site Commit: f45be5f7ec4d60202e5b421929083b507767c941 Parents: 5e18a82 Author: Davor Bonaci Authored: Thu May 4 15:35:39 2017 -0700 Committer: Davor Bonaci Committed: Thu May 4 15:35:39 2017 -0700 -- content/get-started/downloads/index.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/f45be5f7/content/get-started/downloads/index.html -- diff --git a/content/get-started/downloads/index.html b/content/get-started/downloads/index.html index 9aedfeb..394142a 100644 --- a/content/get-started/downloads/index.html +++ b/content/get-started/downloads/index.html @@ -151,7 +151,7 @@ -Apache Beam Downloads +Apache Beam™ Downloads The easiest way to use Apache Beam is via one of the released versions in a central repository. Java SDK is available on Maven Central Repository (https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22),
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/722bdfb7 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/722bdfb7 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/722bdfb7 Branch: refs/heads/asf-site Commit: 722bdfb7820a87c318f48fb015b3e7341930900c Parents: 8ea4481 Author: Davor Bonaci Authored: Thu May 4 00:38:28 2017 -0700 Committer: Davor Bonaci Committed: Thu May 4 00:38:28 2017 -0700 -- .../pipelines/create-your-pipeline/index.html | 89 +-- .../documentation/programming-guide/index.html | 160 +-- 2 files changed, 116 insertions(+), 133 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/722bdfb7/content/documentation/pipelines/create-your-pipeline/index.html -- diff --git a/content/documentation/pipelines/create-your-pipeline/index.html b/content/documentation/pipelines/create-your-pipeline/index.html index 8911488..6cfe938 100644 --- a/content/documentation/pipelines/create-your-pipeline/index.html +++ b/content/documentation/pipelines/create-your-pipeline/index.html @@ -154,14 +154,7 @@ Create Your Pipeline - Creating Your Pipeline Object - Configuring Pipeline Options - Setting PipelineOptions from Command-Line Arguments - Creating Custom Options - - - - + Creating Your Pipeline Object Reading Data Into Your Pipeline Applying Transforms to Process Pipeline Data Writing or Outputting Your Final Pipeline Data @@ -185,7 +178,7 @@ In the Beam SDKs, each pipeline is represented by an explicit object of type Pipeline. Each Pipeline object is an independent entity that encapsulates both the data the pipeline operates over and the transforms that get applied to that data. -To create a pipeline, declare a Pipeline object, and pass it some configuration options, which are explained in a section below. 
You pass the configuration options by creating an object of type PipelineOptions, which you can build by using the static method PipelineOptionsFactory.create(). +To create a pipeline, declare a Pipeline object, and pass it some configuration options. // Start by defining the options for the pipeline. PipelineOptions options = PipelineOptionsFactory.create(); @@ -195,75 +188,6 @@ -Configuring Pipeline Options - -Use the pipeline options to configure different aspects of your pipeline, such as the pipeline runner that will execute your pipeline and any runner-specific configuration required by the chosen runner. Your pipeline options will potentially include information such as your project ID or a location for storing files. - -When you run the pipeline on a runner of your choice, a copy of the PipelineOptions will be available to your code. For example, you can read PipelineOptions from a DoFn’s Context. - -Setting PipelineOptions from Command-Line Arguments - -While you can configure your pipeline by creating a PipelineOptions object and setting the fields directly, the Beam SDKs include a command-line parser that you can use to set fields in PipelineOptions using command-line arguments. - -To read options from the command-line, construct your PipelineOptions object as demonstrated in the following example code: - -MyOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create(); - - - -This interprets command-line arguments that follow the format: - ---
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/b4f5243c Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/b4f5243c Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/b4f5243c Branch: refs/heads/asf-site Commit: b4f5243c2329b6c1f130f20eee68a440f05ae722 Parents: 7f914af Author: Davor Bonaci Authored: Wed May 3 14:54:30 2017 -0700 Committer: Davor Bonaci Committed: Wed May 3 14:54:30 2017 -0700 -- content/contribute/work-in-progress/index.html | 6 ++ 1 file changed, 6 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/b4f5243c/content/contribute/work-in-progress/index.html -- diff --git a/content/contribute/work-in-progress/index.html b/content/contribute/work-in-progress/index.html index 07abdf1..215f9db 100644 --- a/content/contribute/work-in-progress/index.html +++ b/content/contribute/work-in-progress/index.html @@ -192,6 +192,12 @@ - thread (https://lists.apache.org/thread.html/e38ac4e4914a6cb1b865b1f32a6ca06c2be28ea4aa0f6b18393de66f@%3Cdev.beam.apache.org%3E) + + Beam SQL DSL + DSL_SQL (https://github.com/apache/beam/tree/DSL_SQL) + dsl-sql (https://issues.apache.org/jira/browse/BEAM/component/12332480) + BEAM-301 (https://issues.apache.org/jira/browse/BEAM-301) +
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/b76eb56b Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/b76eb56b Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/b76eb56b Branch: refs/heads/asf-site Commit: b76eb56b6986f99aa73ce38f8e719803ab18fb4e Parents: bf62dc9 Author: Ahmet Altay Authored: Tue May 2 18:26:45 2017 -0700 Committer: Ahmet Altay Committed: Tue May 2 18:26:45 2017 -0700 -- .../mobile-gaming-example/index.html| 161 ++- 1 file changed, 157 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/b76eb56b/content/get-started/mobile-gaming-example/index.html -- diff --git a/content/get-started/mobile-gaming-example/index.html b/content/get-started/mobile-gaming-example/index.html index 054024c..7a3fbb4 100644 --- a/content/get-started/mobile-gaming-example/index.html +++ b/content/get-started/mobile-gaming-example/index.html @@ -188,12 +188,24 @@ + + Adapt for: + +Java SDK +Python SDK + + + This section provides a walkthrough of a series of example Apache Beam pipelines that demonstrate more complex functionality than the basic WordCount examples. The pipelines in this section process data from a hypothetical game that users play on their mobile phones. The pipelines demonstrate processing at increasing levels of complexity; the first pipeline, for example, shows how to run a batch analysis job to obtain relatively simple score data, while the later pipelines use Beam’s windowing and triggers features to provide low-latency data analysis and more complex intelligence about user’s play patterns. - + Note: These examples assume some familiarity with the Beam programming model. If you haven’t already, we recommend familiarizing yourself with the programming model documentation and running a basic example pipeline before continuing. 
Note also that these examples use the Java 8 lambda syntax, and thus require Java 8. However, you can create pipelines with equivalent functionality using Java 7. + + Note: These examples assume some familiarity with the Beam programming model. If you haven’t already, we recommend familiarizing yourself with the programming model documentation and running a basic example pipeline before continuing. + + Every time a user plays an instance of our hypothetical mobile game, they generate a data event. Each data event consists of the following information: @@ -224,10 +236,14 @@ The UserScore pipeline is the simplest example for processing mobile game data. UserScore determines the total score per user over a finite data set (for example, one day’s worth of scores stored on the game server). Pipelines like UserScore are best run periodically after all relevant data has been gathered. For example, UserScore could run as a nightly job over data gathered during that day. - + Note: See https://github.com/apache/beam/blob/master/examples/java8/src/main/java/org/apache/beam/examples/complete/game/UserScore.java UserScore on GitHub for the complete example pipeline program. + + Note: See https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/user_score.py UserScore on GitHub for the complete example pipeline program. + + What Does UserScore Do? In a day’s worth of scoring data, each user ID may have multiple records (if the user plays more than one instance of the game during the analysis window), each with their own score value and timestamp. If we want to determine the total score over all the instances a user plays during the day, our pipeline will need to group all the records together per individual user. @@ -283,6 +299,28 @@ +class ExtractAndSumScore(beam.PTransform): + """A transform to extract key/score information and sum the scores. + The constructor argument `field` determines whether 'team' or 'user' info is + extracted.
+ """ + def __init__(self, field): +super(ExtractAndSumScore, self).__init__() +self.field = field + + def expand(self, pcoll): +return (pcoll +| beam.Map(lambda info: (info[self.field], info['score'])) +| beam.CombinePerKey(sum_ints)) + +def configure_bigquery_write(): + return [ + ('user', 'STRING', lambda e: e[0]), + ('total_score', 'INTEGER', lambda e: e[1]), + ] + + + ExtractAndSumScore is written to be more general, in that you can pass in the field by which you want to group the data (in the case of our game, by unique user or unique team). This means we can re-use ExtractAndSumScore in other pipelines that group score data by team, for example. Here’s
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/5a7282fe Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/5a7282fe Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/5a7282fe Branch: refs/heads/asf-site Commit: 5a7282fe42e9a396673aacfac8f466284dffa1ea Parents: 6289505 Author: Ahmet Altay Authored: Tue May 2 17:59:11 2017 -0700 Committer: Ahmet Altay Committed: Tue May 2 17:59:11 2017 -0700 -- content/documentation/io/built-in/index.html | 2 +- .../documentation/programming-guide/index.html | 46 +--- .../sdks/python-type-safety/index.html | 10 ++--- 3 files changed, 45 insertions(+), 13 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/5a7282fe/content/documentation/io/built-in/index.html -- diff --git a/content/documentation/io/built-in/index.html b/content/documentation/io/built-in/index.html index 639a877..cf262d3 100644 --- a/content/documentation/io/built-in/index.html +++ b/content/documentation/io/built-in/index.html @@ -170,7 +170,7 @@ Java https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java AvroIO -https://github.com/apache/beam/tree/master/sdks/java/io/hdfs Apache Hadoop HDFS +https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system Apache Hadoop File System https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java TextIO https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ XML http://git-wip-us.apache.org/repos/asf/beam-site/blob/5a7282fe/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index b57f89c..5ad4bea 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html
@@ -1200,18 +1200,18 @@ guest, [[], [order4]] yield element else: # Emit this word's long length to the 'above_cutoff_lengths' output. - yield pvalue.OutputValue( + yield pvalue.TaggedOutput( 'above_cutoff_lengths', len(element)) if element.startswith(marker): # Emit this word to a different output with the 'marked strings' tag. - yield pvalue.OutputValue('marked strings', element) + yield pvalue.TaggedOutput('marked strings', element) # Producing multiple outputs is also available in Map and FlatMap. # Here is an example that uses FlatMap and shows that the tags do not need to be specified ahead of time. def even_odd(x): - yield pvalue.OutputValue('odd' if x % 2 else 'even', x) + yield pvalue.TaggedOutput('odd' if x % 2 else 'even', x) if x % 10 == 0: yield x @@ -1267,7 +1267,17 @@ guest, [[], [order4]] - Python code snippet coming soon (BEAM-1926) +# The CountWords Composite Transform inside the WordCount pipeline. +class CountWords(beam.PTransform): + + def expand(self, pcoll): +return (pcoll +# Convert lines of text into individual words. +| 'ExtractWords' >> beam.ParDo(ExtractWordsFn()) +# Count the number of times each word occurs. +| beam.combiners.Count.PerElement() +# Format each word and count into a printable string. +| 'FormatCounts' >> beam.ParDo(FormatCountsFn())) @@ -1286,7 +1296,10 @@ guest, [[], [order4]] - Python code snippet coming soon (BEAM-1926) +class ComputeWordLengths(beam.PTransform): + def expand(self, pcoll): +# transform logic goes here +return pcoll | beam.Map(lambda x: len(x)) @@ -1307,7 +1320,10 @@ guest, [[], [order4]] - Python code snippet coming soon (BEAM-1926) +class ComputeWordLengths(beam.PTransform): + def expand(self, pcoll): +# transform logic goes here +return pcoll | beam.Map(lambda x: len(x)) @@ -1920,7 +1936,7 @@ Subsequent transforms, however, are applied to the result of the The Beam SDK for Python does not support triggers. + # The Beam SDK for Python does not support triggers. 
@@ -1956,6 +1972,10 @@ Subsequent transforms, however, are applied to the result of the # The Beam SDK for Python does not support triggers. + + + This code sample sets a time-based trigger for a PCollection, which emits results one minute after the first element in that window has been processed. The last line in the code sample, .discardingFiredPanes(), is the window’s accumulation mode. Window Accumulation Modes @@ -2009,
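The `even_odd` snippet in the programming-guide hunk above relies on the SDK routing each yielded value either to a tagged output or to the main output. The following is a minimal plain-Python sketch of that routing idea, not Beam's implementation: `TaggedOutput` here is a hypothetical stand-in for `pvalue.TaggedOutput`, and `route_outputs` and the `'main'` tag name are assumptions made purely for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for pvalue.TaggedOutput; this is not the Beam class.
TaggedOutput = namedtuple('TaggedOutput', ['tag', 'value'])


def even_odd(x):
    # Same logic as the FlatMap example in the diff: tag every element as
    # 'odd' or 'even', and also emit multiples of 10 untagged.
    yield TaggedOutput('odd' if x % 2 else 'even', x)
    if x % 10 == 0:
        yield x


def route_outputs(fn, elements, main_tag='main'):
    """Group everything fn yields by tag; untagged values go to main_tag."""
    outputs = defaultdict(list)
    for element in elements:
        for result in fn(element):
            if isinstance(result, TaggedOutput):
                outputs[result.tag].append(result.value)
            else:
                outputs[main_tag].append(result)
    return dict(outputs)


routed = route_outputs(even_odd, [1, 2, 3, 10, 20])
print(routed)
```

As in the diff's comment, the tags are discovered while the function runs, so nothing has to be declared ahead of time in this sketch either.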
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/1b931625 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/1b931625 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/1b931625 Branch: refs/heads/asf-site Commit: 1b931625fc692dd81ead8ab4ce8e315d96d5dea6 Parents: faa12ff Author: Ahmet Altay Authored: Fri Apr 28 17:39:13 2017 -0700 Committer: Ahmet Altay Committed: Fri Apr 28 17:39:13 2017 -0700 -- content/documentation/programming-guide/index.html | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/1b931625/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index 439d48b..b57f89c 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -774,6 +774,19 @@ guest, [[], [order4]] pc = ... +class AverageFn(beam.CombineFn): + def create_accumulator(self): +return (0.0, 0) + + def add_input(self, (sum, count), input): +return sum + input, count + 1 + + def merge_accumulators(self, accumulators): +sums, counts = zip(*accumulators) +return sum(sums), sum(counts) + + def extract_output(self, (sum, count)): +return sum / count if count else float('NaN') @@ -794,6 +807,7 @@ guest, [[], [order4]] # sum combines the elements in the input PCollection. # The resulting PCollection, called result, contains one value: the sum of all the elements in the input PCollection. pc = ... +average = pc | beam.CombineGlobally(AverageFn()) @@ -927,6 +941,7 @@ guest, [[], [order4]] # Provide an int value with the desired number of result partitions, and a partitioning function (partition_fn in this example). # Returns a tuple of PCollection objects containing each of the resulting partitions as individual PCollection objects. +students = ... 
def partition_fn(student, num_partitions): return int(get_percentile(student) * num_partitions / 100) @@ -1019,7 +1034,7 @@ guest, [[], [order4]] # Optional, positional, and keyword arguments are all supported. Deferred arguments are unwrapped into their actual values. # For example, using pvalue.AsIter(pcoll) at pipeline construction time results in an iterable of the actual elements of pcoll being passed into each process invocation. # In this example, side inputs are passed to a FlatMap transform as extra arguments and consumed by filter_using_length. - +words = ... # Callable takes additional arguments. def filter_using_length(word, lower_bound, upper_bound=float('inf')): if lower_bound <= len(word) <= upper_bound:
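The `AverageFn` added in the hunk above declares `add_input(self, (sum, count), input)`; tuple parameters in a signature are Python 2 only syntax and were removed in Python 3. The sketch below restates the same four methods in Python 3 compatible form as a plain class (deliberately not a `beam.CombineFn` subclass, so it runs without the SDK), together with a hypothetical `combine_globally` helper that imitates how per-bundle accumulators might be built and then merged. The helper and the bundle layout are assumptions for illustration, not the runner's actual behaviour.

```python
class AverageFn:
    """Plain-Python mirror of the four CombineFn methods shown in the diff."""

    def create_accumulator(self):
        # (running sum, element count)
        return (0.0, 0)

    def add_input(self, accumulator, element):
        total, count = accumulator
        return total + element, count + 1

    def merge_accumulators(self, accumulators):
        totals, counts = zip(*accumulators)
        return sum(totals), sum(counts)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float('nan')


def combine_globally(fn, bundles):
    """Hypothetical driver: one accumulator per bundle, then a single merge."""
    partials = []
    for bundle in bundles:
        acc = fn.create_accumulator()
        for element in bundle:
            acc = fn.add_input(acc, element)
        partials.append(acc)
    if not partials:
        # With no input at all, fall back to a single empty accumulator.
        partials.append(fn.create_accumulator())
    return fn.extract_output(fn.merge_accumulators(partials))


average = combine_globally(AverageFn(), [[1, 2], [3], [4, 5, 6]])
print(average)
```

Because the accumulator carries both the sum and the count, the result does not depend on how the elements happen to be split into bundles, which is the property that lets a combine run in parallel.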
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/a163bcf4 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/a163bcf4 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/a163bcf4 Branch: refs/heads/asf-site Commit: a163bcf42026b00146a2e21df034d03f1efb2acc Parents: 4348e5f Author: Davor Bonaci Authored: Thu Apr 27 16:20:30 2017 -0700 Committer: Davor Bonaci Committed: Thu Apr 27 16:20:30 2017 -0700 -- .../documentation/programming-guide/index.html | 279 +++ content/images/trigger-accumulation.png | Bin 0 -> 11144 bytes 2 files changed, 221 insertions(+), 58 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/a163bcf4/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index 38f7bfc..439d48b 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -156,7 +156,7 @@ The Beam Programming Guide is intended for Beam users who want to use the Beam SDKs to create data processing pipelines. It provides guidance for using the Beam SDK classes to build and test your pipeline. It is not intended as an exhaustive reference, but as a language-agnostic, high-level guide to programmatically building your Beam pipeline. As the programming guide is filled out, the text will include code samples in multiple languages to help illustrate how to implement Beam concepts in your pipelines. - Adapt for: + Adapt for: Java SDK Python SDK @@ -215,7 +215,7 @@ PCollection: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. 
Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline. -Transform: A Transform represents a data processing operation, or a step, in your pipeline. Every Transform takes one or more PCollection objects as input, perfroms a processing function that you provide on the elements of that PCollection, and produces one or more output PCollection objects. +Transform: A Transform represents a data processing operation, or a step, in your pipeline. Every Transform takes one or more PCollection objects as input, performs a processing function that you provide on the elements of that PCollection, and produces one or more output PCollection objects. I/O Source and Sink: Beam provides Source and Sink APIs to represent reading and writing data, respectively. Source encapsulates the code necessary to read data into your Beam pipeline from some external source, such as cloud file storage or a subscription to a streaming data source. Sink likewise encapsulates the code necessary to write the elements of a PCollection to an external data sink. @@ -258,6 +258,7 @@ # Will parse the arguments passed into the application and construct a PipelineOptions object. # Note that --help will print registered options. +import apache_beam as beam from apache_beam.utils.pipeline_options import PipelineOptions p = beam.Pipeline(options=PipelineOptions()) @@ -285,7 +286,7 @@ public static void main(String[] args) { // Create the pipeline. -PipelineOptions options = +PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create(); Pipeline p = Pipeline.create(options); @@ -321,7 +322,7 @@ "Or to take arms against a sea of troubles, "); // Create the pipeline. 
-PipelineOptions options = +PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create(); Pipeline p = Pipeline.create(options); @@ -333,15 +334,12 @@ p = beam.Pipeline(options=pipeline_options) -(p - | beam.Create([ - 'To be, or not to be: that is the question: ', - 'Whether \'tis nobler in the mind to suffer ', - 'The slings and arrows of outrageous fortune, ', - 'Or to take arms against a sea of troubles, ']) - | beam.io.WriteToText(my_options.output)) - -result = p.run() +lines = (p + | beam.Create([ + 'To be, or not to be: that is the question: ', + 'Whether \'tis nobler in the mind to suffer ', + 'The slings and arrows of outrageous fortune, ', + 'Or to tak
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/daa87382 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/daa87382 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/daa87382 Branch: refs/heads/asf-site Commit: daa873826c26aeaa839dd67c46d9ba6d792886c2 Parents: 676f675 Author: Davor Bonaci Authored: Tue Apr 25 18:29:14 2017 -0700 Committer: Davor Bonaci Committed: Tue Apr 25 18:29:14 2017 -0700 -- content/contribute/work-in-progress/index.html | 6 -- content/documentation/sdks/python-custom-io/index.html | 2 ++ 2 files changed, 2 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/daa87382/content/contribute/work-in-progress/index.html -- diff --git a/content/contribute/work-in-progress/index.html b/content/contribute/work-in-progress/index.html index f2eeab1..90eff24 100644 --- a/content/contribute/work-in-progress/index.html +++ b/content/contribute/work-in-progress/index.html @@ -192,12 +192,6 @@ - https://lists.apache.org/thread.html/e38ac4e4914a6cb1b865b1f32a6ca06c2be28ea4aa0f6b18393de66f@%3Cdev.beam.apache.org%3E thread - - Beam SQL DSL - https://github.com/apache/beam/tree/DSL_SQL DSL_SQL - https://issues.apache.org/jira/browse/BEAM/component/12332480 dsl-sql - https://issues.apache.org/jira/browse/BEAM-301 BEAM-301 - http://git-wip-us.apache.org/repos/asf/beam-site/blob/daa87382/content/documentation/sdks/python-custom-io/index.html -- diff --git a/content/documentation/sdks/python-custom-io/index.html b/content/documentation/sdks/python-custom-io/index.html index d078e33..629ef0f 100644 --- a/content/documentation/sdks/python-custom-io/index.html +++ b/content/documentation/sdks/python-custom-io/index.html @@ -342,6 +342,7 @@ class CountingSource(iobase.BoundedSource): def __init__(self, count): +self.records_read = Metrics.counter(self.__class__, 'recordsRead') self._count = count def
estimate_size(self): @@ -359,6 +360,7 @@ for i in range(self._count): if not range_tracker.try_claim(i): return + self.records_read.inc() yield i def split(self, desired_bundle_size, start_position=None,
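The two lines added to `CountingSource` in this commit increment a `recordsRead` counter only after `try_claim` succeeds, so the metric counts records actually emitted rather than positions merely visited. A small plain-Python sketch of that ordering follows. `SimpleCounter` and `SimpleRangeTracker` are hypothetical stand-ins invented here for illustration; they are not the Beam `Metrics` or range tracker classes, and the real tracker has more responsibilities than a bounds check.

```python
class SimpleCounter:
    """Hypothetical stand-in for the object returned by Metrics.counter."""

    def __init__(self):
        self.value = 0

    def inc(self, n=1):
        self.value += n


class SimpleRangeTracker:
    """Hypothetical tracker: a claim succeeds only inside [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def try_claim(self, position):
        return self.start <= position < self.stop


def read(count, range_tracker, records_read):
    # Mirrors the loop in the diff: stop at the first failed claim, and
    # count a record only once its position has been claimed.
    for i in range(count):
        if not range_tracker.try_claim(i):
            return
        records_read.inc()
        yield i


counter = SimpleCounter()
records = list(read(5, SimpleRangeTracker(0, 3), counter))
print(records, counter.value)
```

Incrementing before the claim check would over-count by one whenever the tracker rejects a position, which is presumably why the diff places `inc()` after the `return`.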
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/5b11965c Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/5b11965c Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/5b11965c Branch: refs/heads/asf-site Commit: 5b11965c209c3d5fe08a0b93776d2b749ef63e82 Parents: e98da81 Author: Davor Bonaci Authored: Fri Apr 21 11:13:41 2017 -0700 Committer: Davor Bonaci Committed: Fri Apr 21 11:13:41 2017 -0700 -- .../documentation/programming-guide/index.html | 100 ++- 1 file changed, 95 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/5b11965c/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index edb184b..38f7bfc 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -398,7 +398,7 @@ -Because Beam uses a generic apply method for PCollection, you can both chain transforms sequentially and also apply transforms that contain other transforms nested within (called composite transforms in the Beam SDKs). +Because Beam uses a generic apply method for PCollection, you can both chain transforms sequentially and also apply transforms that contain other transforms nested within (called composite transforms in the Beam SDKs). How you apply your pipeline’s transforms determines the structure of your pipeline. The best way to think of your pipeline is as a directed acyclic graph, where the nodes are PCollections and the edges are transforms. For example, you can chain transforms to create a sequential pipeline, like this one: @@ -434,7 +434,7 @@ [Branching Graph Graphic] -You can also build your own composite transforms that nest multiple sub-steps inside a single, larger transform.
Composite transforms are particularly useful for building a reusable sequence of simple steps that get used in a lot of different places. +You can also build your own composite transforms that nest multiple sub-steps inside a single, larger transform. Composite transforms are particularly useful for building a reusable sequence of simple steps that get used in a lot of different places. Transforms in the Beam SDK @@ -1242,9 +1242,99 @@ guest, [[], [order4]] Composite Transforms - - Note: This section is in progress (https://issues.apache.org/jira/browse/BEAM-1452";>BEAM-1452). - +Transforms can have a nested structure, where a complex transform performs multiple simpler transforms (such as more than one ParDo, Combine, GroupByKey, or even other composite transforms). These transforms are called composite transforms. Nesting multiple transforms inside a single composite transform can make your code more modular and easier to understand. + +The Beam SDK comes packed with many useful composite transforms. See the API reference pages for a list of transforms: + + Pre-written Beam transforms for Java + Pre-written Beam transforms for Python + + +An example of a composite transform + +The CountWords transform in the WordCount example program is an example of a composite transform. CountWords is a PTransform subclass that consists of multiple nested transforms. + +In its expand method, the CountWords transform applies the following transform operations: + + + It applies a ParDo on the input PCollection of text lines, producing an output PCollection of individual words. + It applies the Beam SDK library transform Count on the PCollection of words, producing a PCollection of key/value pairs. Each key represents a word in the text, and each value represents the number of times that word appeared in the original data. + + +Note that this is also an example of nested composite transforms, as Count is, by itself, a composite transform. 
+ +Your composite transform’s parameters and return value must match the initial input type and final return type for the entire transform, even if the transform’s intermediate data changes type multiple times. + + public static class CountWords extends PTransform, + PCollection >> { +@Override +public PCollection > expand(PCollection lines) { + + // Convert lines of text into individual words. + PCollection words = lines.apply( + ParDo.of(new ExtractWordsFn())); + + // Count the number of times each word occurs. + PCollection > wordCounts = + words.apply(Count. perElement()); + + return wordCounts; +} + } + + + + Python code snippet coming so
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/d557ce3f Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/d557ce3f Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/d557ce3f Branch: refs/heads/asf-site Commit: d557ce3fc592e9e17a61c47416dbc3ca765d85bc Parents: 84eef87 Author: Davor Bonaci Authored: Wed Apr 19 12:08:06 2017 -0700 Committer: Davor Bonaci Committed: Wed Apr 19 12:08:06 2017 -0700 -- content/beam/capability/2016/03/17/capability-matrix.html | 4 ++-- .../capability/2016/04/03/presentation-materials.html | 4 ++-- content/beam/release/2016/06/15/first-release.html | 4 ++-- .../update/2016/10/11/strata-hadoop-world-and-beam.html | 2 +- content/blog/2016/08/03/six-months.html | 2 +- content/blog/2016/10/20/test-stream.html | 2 +- content/blog/index.html | 2 +- content/contribute/contribution-guide/index.html | 10 +- content/contribute/index.html | 4 ++-- content/contribute/release-guide/index.html | 8 content/documentation/sdks/java/index.html | 2 +- content/documentation/sdks/python/index.html | 2 +- content/feed.xml | 10 +- content/get-started/mobile-gaming-example/index.html | 2 +- content/v2/index.html | 2 +- 15 files changed, 30 insertions(+), 30 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/d557ce3f/content/beam/capability/2016/03/17/capability-matrix.html -- diff --git a/content/beam/capability/2016/03/17/capability-matrix.html b/content/beam/capability/2016/03/17/capability-matrix.html index 82da83a..9f08727 100644 --- a/content/beam/capability/2016/03/17/capability-matrix.html +++ b/content/beam/capability/2016/03/17/capability-matrix.html @@ -167,9 +167,9 @@ While we’d love to have a world where all runners support the full suite of semantics included in the Beam Model (formerly referred to as the http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf Dataflow Model), practically speaking, there will always be 
certain features that some runners can’t provide. For example, a Hadoop-based runner would be inherently batch-based and may be unable to (easily) implement support for unbounded collections. However, that doesn’t prevent it from being extremely useful for a large set of uses. In other cases, the implementations provided by one runner may have slightly different semantics that those provided by another (e.g. even though the current suite of runners all support exactly-once delivery guarantees, an http://samza.apache.org/ Apache Samza runner, which would be a welcome addition, would currently only support at-least-once). -To help clarify things, we’ve been working on enumerating the key features of the Beam model in a capability matrix for all existing runners, categorized around the four key questions addressed by the model: What / Where / When / How (if you’re not familiar with those questions, you might want to read through http://oreilly.com/ideas/the-world-beyond-batch-streaming-102 Streaming 102 for an overview). This table will be maintained over time as the model evolves, our understanding grows, and runners are created or features added. +To help clarify things, we’ve been working on enumerating the key features of the Beam model in a capability matrix for all existing runners, categorized around the four key questions addressed by the model: What / Where / When / How (if you’re not familiar with those questions, you might want to read through http://oreilly.com/ideas/the-world-beyond-batch-streaming-102 Streaming 102 for an overview). This table will be maintained over time as the model evolves, our understanding grows, and runners are created or features added. -Included below is a summary snapshot of our current understanding of the capabilities of the existing runners (see the live version for full details, descriptions, and Jira links); since integration is still under way, the system as whole isn’t yet in a completely stable, usable state. 
But that should be changing in the near future, and we’ll be updating loud and clear on this blog when the first supported Beam 1.0 release happens. +Included below is a summary snapshot of our current understanding of the capabilities of the existing runners (see the live version for full details, descriptions, and Jira links); since integration is still under way, the system as whole isn’t yet in a completely stable, usable state. B
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/456b2310 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/456b2310 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/456b2310 Branch: refs/heads/asf-site Commit: 456b231016499ea76990cfb355bea771b3a3649a Parents: dce0b39 Author: Davor Bonaci Authored: Wed Apr 19 12:06:25 2017 -0700 Committer: Davor Bonaci Committed: Wed Apr 19 12:06:25 2017 -0700 -- content/documentation/io/built-in/index.html | 1 - 1 file changed, 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/456b2310/content/documentation/io/built-in/index.html -- diff --git a/content/documentation/io/built-in/index.html b/content/documentation/io/built-in/index.html index ca089db..639a877 100644 --- a/content/documentation/io/built-in/index.html +++ b/content/documentation/io/built-in/index.html @@ -183,7 +183,6 @@ https://github.com/apache/beam/tree/master/sdks/java/io/hadoop Apache Hadoop InputFormat https://github.com/apache/beam/tree/master/sdks/java/io/hbase Apache HBase -https://github.com/apache/beam/tree/master/sdks/java/io/elasticsearch Elasticsearch https://github.com/apache/beam/tree/master/sdks/java/io/mongodb MongoDB https://github.com/apache/beam/tree/master/sdks/java/io/jdbc JDBC https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery Google BigQuery
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/ab9c8578 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/ab9c8578 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/ab9c8578 Branch: refs/heads/asf-site Commit: ab9c8578359f4c9fadc11fae251298e5b57645af Parents: f936a3d Author: Aljoscha Krettek Authored: Wed Apr 19 11:31:14 2017 +0200 Committer: Aljoscha Krettek Committed: Wed Apr 19 11:31:14 2017 +0200 -- content/documentation/runners/flink/index.html | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/ab9c8578/content/documentation/runners/flink/index.html -- diff --git a/content/documentation/runners/flink/index.html b/content/documentation/runners/flink/index.html index b0b8fdc..64ae583 100644 --- a/content/documentation/runners/flink/index.html +++ b/content/documentation/runners/flink/index.html @@ -153,6 +153,14 @@ Using the Apache Flink Runner + + Adapt for: + +Java SDK +Python SDK + + + The Apache Flink Runner can be used to execute Beam pipelines using https://flink.apache.org Apache Flink. When using the Flink Runner you will create a jar file containing your job that can be executed on a regular Flink cluster. It’s also possible to execute a Beam pipeline using Flink’s local execution mode without setting up a cluster. This is helpful for development and debugging of your pipeline. The Flink Runner and Flink are suitable for large scale, continuous jobs, and provide: @@ -187,8 +195,7 @@ Specify your dependency -You must specify your dependency on the Flink Runner. - +When using Java, you must specify your dependency on the Flink Runner in your pom.xml.org.apache.beam beam-runners-flink_2.10 @@ -198,6 +205,8 @@ +This section is not applicable to the Beam SDK for Python. 
+ Executing a pipeline on a Flink cluster For executing a pipeline on a Flink cluster you need to package your program along will all dependencies in a so-called fat jar. How you do this depends on your build system but if you follow along the Beam Quickstart this is the command that you have to run: @@ -278,7 +287,7 @@ -See the reference documentation for the FlinkPipelineOptions https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py PipelineOptions interface (and its subinterfaces) for the complete list of pipeline configuration options. +See the reference documentation for the FlinkPipelineOptions https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py PipelineOptions interface (and its subinterfaces) for the complete list of pipeline configuration options. Additional information and caveats
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/c22cf487 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/c22cf487 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/c22cf487 Branch: refs/heads/asf-site Commit: c22cf48760c7f236bb3a449dd93d4a4adaae15b0 Parents: 97964d7 Author: Davor Bonaci Authored: Tue Apr 18 16:27:07 2017 -0700 Committer: Davor Bonaci Committed: Tue Apr 18 16:27:07 2017 -0700 -- content/contribute/release-guide/index.html | 78 1 file changed, 67 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/c22cf487/content/contribute/release-guide/index.html -- diff --git a/content/contribute/release-guide/index.html b/content/contribute/release-guide/index.html index 18ccaa8..2c34a00 100644 --- a/content/contribute/release-guide/index.html +++ b/content/contribute/release-guide/index.html @@ -165,6 +165,7 @@ GPG Key Access to Apache Nexus repository Website development setup + Register to PyPI Create a new version in JIRA @@ -173,6 +174,8 @@ Verify that a Release Build Works Update and Verify Javadoc Create a release branch + Update the Python SDK version + Update release specific configurations Checklist to proceed to the next step @@ -196,7 +199,8 @@ Finalize the release - Deploy artifacts to Maven Central Repository + Deploy artifacts to Maven Central Repository + Deploy Python artifacts to PyPI Deploy source release to dist.apache.org @@ -291,7 +295,7 @@ sub 2048R/BA4D50BE 2016-02-23 Here, the key ID is the 8-digit hex string in the pub line: 845E6689. -Now, add your Apache GPG key to the Beamâs KEYS file both in https://dist.apache.org/repos/dist/dev/beam/KEYS";>dev and https://dist.apache.org/repos/dist/release/beam/KEYS";>release repositories at dist.apache.org. Follow the instructions listed at the top of these files. 
+Now, add your Apache GPG key to the Beam's KEYS file both in dev (https://dist.apache.org/repos/dist/dev/beam/KEYS) and release (https://dist.apache.org/repos/dist/release/beam/KEYS) repositories at dist.apache.org. Follow the instructions listed at the top of these files. (Note: Only PMC members have write access to the release repository. If you end up getting 403 errors, ask on the mailing list for assistance.) Configure git to use this key when signing code by giving it your key ID, as follows: @@ -344,6 +348,10 @@ export GPG_AGENT_INFO Get ready for updating the Beam website by following the website development instructions. +Register to PyPI + +Release manager needs to have an account with PyPI. If you need one, register at PyPI (https://pypi.python.org/pypi?%3Aaction=register_form). You also need to be a maintainer (or an owner) of the apache-beam (https://pypi.python.org/pypi/apache-beam) package in order to push a new release. Ask on the mailing list for assistance. + Create a new version in JIRA When contributors resolve an issue in JIRA, they are tagging it with a release that will contain their changes. With the release currently underway, new issues should be resolved against a subsequent future release. Therefore, you should create a release item for this subsequent release, as follows: @@ -384,7 +392,7 @@ export GPG_AGENT_INFO Verify that a Release Build Works -Run mvn -Prelease to ensure that the build processes that are specific to that +Run mvn -Prelease clean install to ensure that the build processes that are specific to that profile are in good shape. Update and Verify Javadoc @@ -457,6 +465,21 @@ DEVELOPMENT_VERSION="${NEXT_VERSION}-SNAPSHOT" The rest of this guide assumes that commands are run in the root of a repository on ${BRANCH_NAME} with the above environment variables set.
+Update the Python SDK version + +In the master branch, update Python SDK version identifier (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/version.py) to the next development version (e.g. 1.2.3.dev to 1.3.0.dev). + +In the release branch, update the Python SDK version to the release version (e.g. 1.2.3.dev to 1.2.3). + +Update release specific configurations + + + Update archetypes: +example (https://github.com/apache/beam/commit/d375cfa126fd7be9c34f39c2b9b856f324bf) + Update runner specific configurations: +example (https://github.com/apache/beam/commit/f572328ce23e70adee8001e3d10f1479bd9a380d) + + Checklist to proceed to the next step @@ -523,16 +546,28 @@ TAG="v${VERSION}-RC${RC_NUM}" Make a di
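The two version updates described in "Update the Python SDK version" can be sketched as a pair of small helpers. This is only an illustration: the function names are hypothetical, and the rule "bump the minor version and reset the patch" is inferred from the single example in the guide (1.2.3.dev to 1.3.0.dev), not from a documented policy.

```python
import re

# Matches development versions of the form MAJOR.MINOR.PATCH.dev, e.g. 1.2.3.dev.
_DEV_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.dev$")


def _parse(dev_version):
    """Split a development version into its three numeric parts."""
    match = _DEV_VERSION.match(dev_version)
    if match is None:
        raise ValueError("expected MAJOR.MINOR.PATCH.dev, got %r" % dev_version)
    return tuple(int(part) for part in match.groups())


def release_version(dev_version):
    """Version for the release branch: drop the .dev suffix (1.2.3.dev -> 1.2.3)."""
    return "%d.%d.%d" % _parse(dev_version)


def next_development_version(dev_version):
    """Version for master (assumed rule): bump minor, reset patch (1.2.3.dev -> 1.3.0.dev)."""
    major, minor, _patch = _parse(dev_version)
    return "%d.%d.0.dev" % (major, minor + 1)
```

A release manager would still edit version.py by hand; the helpers only make the intended before/after values explicit.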
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/dba78bbc Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/dba78bbc Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/dba78bbc Branch: refs/heads/asf-site Commit: dba78bbcf138f1ac635115edc028c93487a27e14 Parents: 4ef79d1 Author: Davor Bonaci Authored: Tue Apr 18 16:12:29 2017 -0700 Committer: Davor Bonaci Committed: Tue Apr 18 16:12:29 2017 -0700 -- content/get-started/beam-overview/index.html | 61 +++--- content/images/logos/runners/apex.png| Bin 0 -> 3717 bytes content/images/logos/runners/dataflow.png| Bin 0 -> 8277 bytes content/images/logos/runners/flink.png | Bin 0 -> 4584 bytes content/images/logos/runners/spark.png | Bin 0 -> 2701 bytes content/images/logos/sdks/java.png | Bin 0 -> 3726 bytes content/images/logos/sdks/python.png | Bin 0 -> 3735 bytes 7 files changed, 19 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/dba78bbc/content/get-started/beam-overview/index.html -- diff --git a/content/get-started/beam-overview/index.html b/content/get-started/beam-overview/index.html index caac276..c284b15 100644 --- a/content/get-started/beam-overview/index.html +++ b/content/get-started/beam-overview/index.html @@ -153,7 +153,7 @@ Apache Beam Overview -Apache Beam is an open source, unified programming model that you can use to create a data processing pipeline. You start by building a program that defines the pipeline using one of the open source Beam SDKs. The pipeline is then executed by one of Beam's supported distributed processing back-ends, which include Apache Apex (http://apex.apache.org), Apache Flink (http://flink.apache.org), Apache Spark (http://spark.apache.org), and Google Cloud Dataflow (https://cloud.google.com/dataflow). +Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines.
Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam's supported distributed processing back-ends, which include Apache Apex (http://apex.apache.org), Apache Flink (http://flink.apache.org), Apache Spark (http://spark.apache.org), and Google Cloud Dataflow (https://cloud.google.com/dataflow). Beam is particularly useful for Embarrassingly Parallel (http://en.wikipedia.org/wiki/Embarassingly_parallel) data processing tasks, in which the problem can be decomposed into many smaller bundles of data that can be processed independently and in parallel. You can also use Beam for Extract, Transform, and Load (ETL) tasks and pure data integration. These tasks are useful for moving data between different storage media and data sources, transforming data into a more desirable format, or loading data onto a new system. @@ -163,24 +163,10 @@ Beam currently supports the following language-specific SDKs: - - - Language - SDK Status - - - Java - Active Development - - - Python - Active Development - - - Other - TBD - - + + Java + Python + Apache Beam Pipeline Runners @@ -188,32 +174,16 @@ Beam currently supports Runners that work with the following distributed processing back-ends: - - - Runner - Status - - - Apache Apex - Active Development - - - Apache Flink - Active Development - - - Apache Spark - Active Development - - - Google Cloud Dataflow - Active Development - - + + Apache Apex + Apache Flink + Apache Spark + Google Cloud Dataflow + Note: You can always execute your pipeline locally for testing and debugging purposes. -Getting Started with Apache Beam +Get Started Get started using Beam for your data processing tasks. @@ -224,8 +194,15 @@ See the WordCount Examples Walkthrough for examples that introduce various features of the SDKs. + +Dive into the Documentation section for in-depth concepts and reference materials for the Beam model, SDKs, and runners.
+ +Contribute + +Beam is an Apache Software Foundation (http://www.apache.org) project, available under the Apache v2 license. Beam is an open source community and contributions are greatly appreciated! If you'd like to contribute, please see the Contribute section. + http://git-wip-us.apache.org/repos/asf/beam-site/blob/dba78bbc/content/images/logos/runners/apex.png -- diff --git a/content/images/logos/runners/apex.png b/content/images/logos/runners/apex.png new file mode 100644 index 000..9cc0367 Binary files /dev/null and b/content/ima
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/394bfe70 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/394bfe70 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/394bfe70 Branch: refs/heads/asf-site Commit: 394bfe70319d92ad68d1c13a40db936445e0bd99 Parents: f23d9cb Author: Davor Bonaci Authored: Tue Apr 18 15:45:02 2017 -0700 Committer: Davor Bonaci Committed: Tue Apr 18 15:45:02 2017 -0700 -- .../documentation/runners/dataflow/index.html | 79 content/documentation/runners/direct/index.html | 30 ++-- 2 files changed, 90 insertions(+), 19 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/394bfe70/content/documentation/runners/dataflow/index.html -- diff --git a/content/documentation/runners/dataflow/index.html b/content/documentation/runners/dataflow/index.html index 2f3d9b0..4dda742 100644 --- a/content/documentation/runners/dataflow/index.html +++ b/content/documentation/runners/dataflow/index.html @@ -153,6 +153,14 @@ Using the Google Cloud Dataflow Runner + + Adapt for: + +Java SDK +Python SDK + + + The Google Cloud Dataflow Runner uses the https://cloud.google.com/dataflow/service/dataflow-service-desc";>Cloud Dataflow managed service. When you run your pipeline with the Cloud Dataflow service, the runner uploads your executable code and dependencies to a Google Cloud Storage bucket and creates a Cloud Dataflow job, which executes your pipeline on managed resources in Google Cloud Platform. The Cloud Dataflow Runner and service are suitable for large scale, continuous jobs, and provide: @@ -202,8 +210,7 @@ Specify your dependency -You must specify your dependency on the Cloud Dataflow Runner. - +When using Java, you must specify your dependency on the Cloud Dataflow Runner in your pom.xml.org.apache.beam beam-runners-google-cloud-dataflow-java @@ -213,6 +220,8 @@ +This section is not applicable to the Beam SDK for Python. 
+ Authentication Before running your pipeline, you must authenticate with the Google Cloud Platform. Run the following command to get https://developers.google.com/identity/protocols/application-default-credentials";>Application Default Credentials. @@ -223,7 +232,8 @@ Pipeline options for the Cloud Dataflow Runner -When executing your pipeline with the Cloud Dataflow Runner, set these pipeline options. +When executing your pipeline with the Cloud Dataflow Runner (Java), consider these common pipeline options. +When executing your pipeline with the Cloud Dataflow Runner (Python), consider these common pipeline options. @@ -231,39 +241,80 @@ Description Default Value + runner The pipeline runner to use. This option allows you to determine the pipeline runner at runtime. - Set to dataflow to run on the Cloud Dataflow Service. + Set to dataflow or DataflowRunner to run on the Cloud Dataflow Service. + project The project ID for your Google Cloud Project. If not set, defaults to the default project in the current environment. The default project is set via gcloud. - + + + streaming Whether streaming mode is enabled or disabled; true if enabled. Set to true if running pipelines with unbounded PCollections. false + - tempLocation - Optional. Path for temporary files. If set to a valid Google Cloud Storage URL that begins with gs://, tempLocation is used as the default value for gcpTempLocation. + +tempLocation +temp_location + + +Optional. +Required. +Path for temporary files. Must be a valid Google Cloud Storage URL that begins with gs://. +If set, tempLocation is used as the default value for gcpTempLocation. + No default value. - + + + gcpTempLocation Cloud Storage bucket path for temporary files. Must be a valid Cloud Storage URL that begins with gs://. If not set, defaults to the value of tempLocation, provided that tempLocation is a valid Cloud Storage URL. If tempLocation is not a valid Cloud Storage URL, you must set gcpTempLocation. 
+ - stagingLocation + +stagingLocation +staging_location + Optional. Cloud Storage bucket path for staging your binary and any temporary files. Must be a valid Cloud Storage URL that begins with gs://. - If not set, defaults to a staging directory within gcpTempLocation. + +If not set, defaults to a staging directory within gcpTempLocation. +If not set, defaults to a staging directory within temp_location. + + + + + + save_main_session + Save the main session state so that pickled functions and
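As a rough illustration of the Python-side options in the table above (runner, project, temp_location, staging_location), the following sketch assembles a flag list and enforces the gs:// requirement the table states. The helper name dataflow_args is hypothetical, and the assumption that each option is passed as --<option name>=<value> is mine, modelled on the Java quickstart command; check the SDK's pipeline options reference before relying on it.

```python
def dataflow_args(project, temp_location, staging_location=None):
    """Build command-line flags for running a Python pipeline on Cloud Dataflow.

    Assumption: options are spelled --<name>=<value>, using the snake_case
    names from the options table. This helper is illustrative, not Beam API.
    """
    # Both locations must be Cloud Storage URLs according to the table.
    for name, value in (("temp_location", temp_location),
                        ("staging_location", staging_location)):
        if value is not None and not value.startswith("gs://"):
            raise ValueError(
                "%s must be a Cloud Storage URL that begins with gs://" % name)

    args = [
        "--runner=DataflowRunner",
        "--project=%s" % project,
        "--temp_location=%s" % temp_location,
    ]
    # staging_location is optional; per the table it falls back to a staging
    # directory within temp_location when it is not set.
    if staging_location is not None:
        args.append("--staging_location=%s" % staging_location)
    return args
```

For example, dataflow_args("my-project", "gs://my-bucket/tmp") yields three flags, where my-project and my-bucket are placeholder names.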
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/031683d3 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/031683d3 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/031683d3 Branch: refs/heads/asf-site Commit: 031683d36dd6802aa807540de0c73f6f3c149de8 Parents: 679ed21 Author: Davor Bonaci Authored: Tue Apr 18 15:42:58 2017 -0700 Committer: Davor Bonaci Committed: Tue Apr 18 15:42:58 2017 -0700 -- .../documentation/programming-guide/index.html | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/031683d3/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index d7b1253..edb184b 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -1156,7 +1156,7 @@ guest, [[], [order4]] -# To emit elements to a side output PCollection, invoke with_outputs() on the ParDo, optionally specifying the expected tags for the output. +# To emit elements to multiple output PCollections, invoke with_outputs() on the ParDo, and specify the expected tags for the outputs. # with_outputs() returns a DoOutputsTuple object. Tags specified in with_outputs are attributes on the returned DoOutputsTuple object. # The tags give access to the corresponding output PCollections. @@ -1205,9 +1205,9 @@ guest, [[], [order4]] -# Inside your ParDo's DoFn, you can emit an element to a side output by wrapping the value and the output tag (str). -# using the pvalue.SideOutputValue wrapper class. -# Based on the previous example, this shows the DoFn emitting to the main and side outputs. +# Inside your ParDo's DoFn, you can emit an element to a specific output by wrapping the value and the output tag (str). 
+# using the pvalue.OutputValue wrapper class. +# Based on the previous example, this shows the DoFn emitting to the main output and two additional outputs. class ProcessWords(beam.DoFn): @@ -1216,19 +1216,19 @@ guest, [[], [order4]] # Emit this short word to the main output. yield element else: - # Emit this word's long length to a side output. - yield pvalue.SideOutputValue( + # Emit this word's long length to the 'above_cutoff_lengths' output. + yield pvalue.OutputValue( 'above_cutoff_lengths', len(element)) if element.startswith(marker): - # Emit this word to a different side output. - yield pvalue.SideOutputValue('marked strings', element) + # Emit this word to a different output with the 'marked strings' tag. + yield pvalue.OutputValue('marked strings', element) -# Side outputs are also available in Map and FlatMap. +# Producing multiple outputs is also available in Map and FlatMap. # Here is an example that uses FlatMap and shows that the tags do not need to be specified ahead of time. def even_odd(x): - yield pvalue.SideOutputValue('odd' if x % 2 else 'even', x) + yield pvalue.OutputValue('odd' if x % 2 else 'even', x) if x % 10 == 0: yield x
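To make the routing behaviour of the even_odd example concrete without a Beam installation, here is a plain-Python simulation. Tagged and route are hypothetical stand-ins written for this sketch; they are not the pvalue.OutputValue class or any Beam API, and they only mimic the idea that untagged values go to the main output while tagged values go to the output named by their tag.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a tagged output value; not the Beam class.
Tagged = namedtuple("Tagged", ["tag", "value"])

MAIN = "main"  # label used here for the untagged (main) output


def even_odd(x):
    """Same shape as the example above: tag every element, and also
    emit multiples of 10 untagged, i.e. to the main output."""
    yield Tagged("odd" if x % 2 else "even", x)
    if x % 10 == 0:
        yield x


def route(fn, elements):
    """Apply fn to each element and collect results per output tag."""
    outputs = defaultdict(list)
    for element in elements:
        for result in fn(element):
            if isinstance(result, Tagged):
                outputs[result.tag].append(result.value)
            else:
                outputs[MAIN].append(result)
    return dict(outputs)


results = route(even_odd, [1, 2, 10])
# results["odd"] == [1], results["even"] == [2, 10], results["main"] == [10]
```

Note that the tags are discovered as elements flow through, which mirrors the point that they do not need to be specified ahead of time.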
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/d59128ff Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/d59128ff Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/d59128ff Branch: refs/heads/asf-site Commit: d59128ffaf98b4a2d16e6fd5f3ba58eae43c3dfa Parents: 1ce8785 Author: Davor Bonaci Authored: Tue Apr 18 11:31:50 2017 -0700 Committer: Davor Bonaci Committed: Tue Apr 18 11:31:50 2017 -0700 -- content/documentation/io/built-in/index.html | 4 .../design-your-pipeline-additional-outputs.png| Bin 0 -> 32797 bytes 2 files changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/d59128ff/content/documentation/io/built-in/index.html -- diff --git a/content/documentation/io/built-in/index.html b/content/documentation/io/built-in/index.html index 797f12b..639a877 100644 --- a/content/documentation/io/built-in/index.html +++ b/content/documentation/io/built-in/index.html @@ -266,6 +266,10 @@ RabbitMQJava https://issues.apache.org/jira/browse/BEAM-1240";>BEAM-1240 + +RestIOJava +https://issues.apache.org/jira/browse/BEAM-1946";>BEAM-1946 + http://git-wip-us.apache.org/repos/asf/beam-site/blob/d59128ff/content/images/design-your-pipeline-additional-outputs.png -- diff --git a/content/images/design-your-pipeline-additional-outputs.png b/content/images/design-your-pipeline-additional-outputs.png new file mode 100644 index 000..a4fae32 Binary files /dev/null and b/content/images/design-your-pipeline-additional-outputs.png differ
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/67180c86 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/67180c86 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/67180c86 Branch: refs/heads/asf-site Commit: 67180c866a193e3d16a486f88e13e7eeacec3de4 Parents: e296472 Author: Ahmet Altay Authored: Thu Apr 13 16:34:06 2017 -0700 Committer: Ahmet Altay Committed: Thu Apr 13 16:34:06 2017 -0700 -- content/get-started/quickstart-java/index.html | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/67180c86/content/get-started/quickstart-java/index.html -- diff --git a/content/get-started/quickstart-java/index.html b/content/get-started/quickstart-java/index.html index 51f5e4c..6994a1d 100644 --- a/content/get-started/quickstart-java/index.html +++ b/content/get-started/quickstart-java/index.html @@ -253,8 +253,9 @@ You can monitor the running job by visiting the Flink dashboard at http:///tmp \ - --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs:// /counts" \ + -Dexec.args="--runner=DataflowRunner --project= \ + --gcpTempLocation=gs:// /tmp \ + --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs:// /counts" \ -Pdataflow-runner
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/ee61b91d Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/ee61b91d Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/ee61b91d Branch: refs/heads/asf-site Commit: ee61b91da10dd908159be3031ba1912a301c98ed Parents: d4e83a2 Author: Ismaël Mejía Authored: Wed Apr 12 09:56:43 2017 +0200 Committer: Ismaël Mejía Committed: Wed Apr 12 09:56:43 2017 +0200 -- content/documentation/io/built-in/index.html | 4 1 file changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/ee61b91d/content/documentation/io/built-in/index.html -- diff --git a/content/documentation/io/built-in/index.html b/content/documentation/io/built-in/index.html index 0cb0338..797f12b 100644 --- a/content/documentation/io/built-in/index.html +++ b/content/documentation/io/built-in/index.html @@ -247,6 +247,10 @@ https://issues.apache.org/jira/browse/BEAM-1893";>BEAM-1893 +JSONJava +https://issues.apache.org/jira/browse/BEAM-1581";>BEAM-1581 + + MemcachedJava https://issues.apache.org/jira/browse/BEAM-1678";>BEAM-1678
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/4810a749 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/4810a749 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/4810a749 Branch: refs/heads/asf-site Commit: 4810a749ad56574fa8fb829c8d43be9e4f46a032 Parents: 732378d Author: Ismaël Mejía Authored: Tue Apr 11 00:16:49 2017 +0200 Committer: Ismaël Mejía Committed: Tue Apr 11 00:16:49 2017 +0200 -- content/documentation/io/built-in/index.html | 40 --- 1 file changed, 28 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/4810a749/content/documentation/io/built-in/index.html -- diff --git a/content/documentation/io/built-in/index.html b/content/documentation/io/built-in/index.html index 5eed2ea..0cb0338 100644 --- a/content/documentation/io/built-in/index.html +++ b/content/documentation/io/built-in/index.html @@ -215,36 +215,52 @@ NameLanguageJIRA +AMQPJava +https://issues.apache.org/jira/browse/BEAM-1237";>BEAM-1237 + + Apache CassandraJava https://issues.apache.org/jira/browse/BEAM-245";>BEAM-245 -Apache ParquetJava -https://issues.apache.org/jira/browse/BEAM-214";>BEAM-214 +Apache DistributedLogJava +https://issues.apache.org/jira/browse/BEAM-607";>BEAM-607 -RedisJava -https://issues.apache.org/jira/browse/BEAM-1017";>BEAM-1017 +Apache HiveJava +https://issues.apache.org/jira/browse/BEAM-1158";>BEAM-1158 -MemcachedJava -https://issues.apache.org/jira/browse/BEAM-1678";>BEAM-1678 +Apache ParquetJava +https://issues.apache.org/jira/browse/BEAM-214";>BEAM-214 Apache SolrJava https://issues.apache.org/jira/browse/BEAM-1236";>BEAM-1236 -RabbitMQJava -https://issues.apache.org/jira/browse/BEAM-1240";>BEAM-1240 +Apache SqoopJava +https://issues.apache.org/jira/browse/BEAM-67";>BEAM-67 -AMQPJava -https://issues.apache.org/jira/browse/BEAM-1237";>BEAM-1237 +CouchbaseJava
+https://issues.apache.org/jira/browse/BEAM-1893";>BEAM-1893 -Apache HiveJava -https://issues.apache.org/jira/browse/BEAM-1158";>BEAM-1158 +MemcachedJava +https://issues.apache.org/jira/browse/BEAM-1678";>BEAM-1678 + + +Neo4jJava +https://issues.apache.org/jira/browse/BEAM-1857";>BEAM-1857 + + +RedisJava +https://issues.apache.org/jira/browse/BEAM-1017";>BEAM-1017 + + +RabbitMQJava +https://issues.apache.org/jira/browse/BEAM-1240";>BEAM-1240
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/6bded068 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/6bded068 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/6bded068 Branch: refs/heads/asf-site Commit: 6bded068a06eb7b17cf881d0871cb30f2e986084 Parents: 8c9cda3 Author: Ahmet Altay Authored: Thu Apr 6 16:11:57 2017 -0700 Committer: Ahmet Altay Committed: Thu Apr 6 16:11:57 2017 -0700 -- .../documentation/io/authoring-java/index.html | 9 ++ .../io/authoring-overview/index.html| 97 +++- content/documentation/io/io-toc/index.html | 11 ++- 3 files changed, 90 insertions(+), 27 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/6bded068/content/documentation/io/authoring-java/index.html -- diff --git a/content/documentation/io/authoring-java/index.html b/content/documentation/io/authoring-java/index.html index 5128d93..7f2a308 100644 --- a/content/documentation/io/authoring-java/index.html +++ b/content/documentation/io/authoring-java/index.html @@ -159,6 +159,15 @@ Note: This guide is still in progress. There is an open issue to finish the guide: BEAM-1025 (https://issues.apache.org/jira/browse/BEAM-1025). +Example I/O Transforms +Currently, Apache Beam's I/O transforms use a variety of different +styles.
These transforms are good examples to follow: + + DatastoreIO (https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreIO.java) - ParDo based database read and write that conforms to the PTransform style guide + BigtableIO (https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java) - Good test examples, and demonstrates Dynamic Work Rebalancing + JdbcIO (https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java) - Demonstrates reading using single ParDo+GroupByKey when data stores cannot be read in parallel + + Next steps Testing I/O Transforms http://git-wip-us.apache.org/repos/asf/beam-site/blob/6bded068/content/documentation/io/authoring-overview/index.html -- diff --git a/content/documentation/io/authoring-overview/index.html b/content/documentation/io/authoring-overview/index.html index 73fffa2..5e36676 100644 --- a/content/documentation/io/authoring-overview/index.html +++ b/content/documentation/io/authoring-overview/index.html @@ -157,54 +157,107 @@ A guide for users who need to connect to a data store that isn't supported by the Built-in I/O Transforms - - Note: This guide is still in progress. There is an open issue to finish the guide: BEAM-1025 (https://issues.apache.org/jira/browse/BEAM-1025). - - Introduction - Example I/O Transforms Suggested steps for implementers Read transforms - When to implement using the Source API + When to implement using the Source API Write transforms - When to implement using the Sink API + When to implement using the Sink API Introduction -TODO +This guide covers how to implement I/O transforms in the Beam model. Beam pipelines use these read and write transforms to import data for processing, and write data to a store.
+ +Reading and writing data in Beam is a parallel task, and using ParDos, GroupByKeys, etc. is usually sufficient. Rarely, you will need the more specialized Source and Sink classes for specific features. There are changes coming soon (SplittableDoFn, BEAM-65 (https://issues.apache.org/jira/browse/BEAM-65)) that will make Source unnecessary. -Example I/O Transforms -TODO +As you work on your I/O Transform, be aware that the Beam community is excited to help those building new I/O Transforms and that there are many examples and helper classes. Suggested steps for implementers -TODO + + Check out this guide and come up with your design. If you'd like, you can email the Beam dev mailing list with any questions you might have. It's good to check there to see if anyone else is working on the same I/O Transform. + If you are planning to contribute your I/O transform to the Beam community, you'll be going through the normal Beam contribution life cycle - see the Apache Beam Contribution Guide for more details. + As you're working on your IO transform, see the PTransform Style Guide for specific information about writing I/O Transforms. + Read transforms -TODO +Read transforms take data from outside of the Beam pipeline and produce PCollecti
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/234bccff Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/234bccff Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/234bccff Branch: refs/heads/asf-site Commit: 234bccffc8178b0a7922e56b791cc0aacbba1986 Parents: 815311d Author: Ahmet Altay Authored: Thu Apr 6 16:07:34 2017 -0700 Committer: Ahmet Altay Committed: Thu Apr 6 16:07:34 2017 -0700 -- content/documentation/io/built-in/index.html | 42 +++ 1 file changed, 42 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/234bccff/content/documentation/io/built-in/index.html -- diff --git a/content/documentation/io/built-in/index.html b/content/documentation/io/built-in/index.html index fe5405a..5eed2ea 100644 --- a/content/documentation/io/built-in/index.html +++ b/content/documentation/io/built-in/index.html @@ -181,6 +181,7 @@ https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io";>Google Cloud PubSub +https://github.com/apache/beam/tree/master/sdks/java/io/hadoop";>Apache Hadoop InputFormat https://github.com/apache/beam/tree/master/sdks/java/io/hbase";>Apache HBase https://github.com/apache/beam/tree/master/sdks/java/io/mongodb";>MongoDB https://github.com/apache/beam/tree/master/sdks/java/io/jdbc";>JDBC @@ -205,6 +206,47 @@ +In-Progress I/O Transforms + +This table contains I/O transforms that are currently planned or in-progress. Status information can be found on the JIRA issue, or on the GitHub PR linked to by the JIRA issue (if there is one). 
+ + + +NameLanguageJIRA + + +Apache CassandraJava +https://issues.apache.org/jira/browse/BEAM-245";>BEAM-245 + + +Apache ParquetJava +https://issues.apache.org/jira/browse/BEAM-214";>BEAM-214 + + +RedisJava +https://issues.apache.org/jira/browse/BEAM-1017";>BEAM-1017 + + +MemcachedJava +https://issues.apache.org/jira/browse/BEAM-1678";>BEAM-1678 + + +Apache SolrJava +https://issues.apache.org/jira/browse/BEAM-1236";>BEAM-1236 + + +RabbitMQJava +https://issues.apache.org/jira/browse/BEAM-1240";>BEAM-1240 + + +AMQPJava +https://issues.apache.org/jira/browse/BEAM-1237";>BEAM-1237 + + +Apache HiveJava +https://issues.apache.org/jira/browse/BEAM-1158";>BEAM-1158 + +
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f011e303 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f011e303 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f011e303 Branch: refs/heads/asf-site Commit: f011e303c71c81675ec6d7e535cd815fb5c32074 Parents: 0dd610f Author: Ahmet Altay Authored: Mon Apr 3 15:52:46 2017 -0700 Committer: Ahmet Altay Committed: Mon Apr 3 15:52:46 2017 -0700 -- content/get-started/quickstart-py/index.html | 37 ++- 1 file changed, 36 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/f011e303/content/get-started/quickstart-py/index.html -- diff --git a/content/get-started/quickstart-py/index.html b/content/get-started/quickstart-py/index.html index ce774cb..7c8f2d3 100644 --- a/content/get-started/quickstart-py/index.html +++ b/content/get-started/quickstart-py/index.html @@ -163,7 +163,10 @@ Get Apache Beam Create and activate a virtual environment - Download and install + Download and install + Extra Requirements + + Execute a pipeline locally @@ -227,6 +230,38 @@ environment's directories. +Extra Requirements + +The above installation will not install all the extra dependencies for using features like the Google Cloud Dataflow runner. Information on what extra packages are required for different features is highlighted below. It is possible to install multiple extra requirements using something like pip install apache-beam[feature1,feature2].
+ + + Google Cloud Platform + + Installation Command: pip install apache-beam[gcp] + Required for: + + Google Cloud Dataflow Runner + GCS IO + Datastore IO + BigQuery IO + + + + + Tests + + Installation Command: pip install apache-beam[test] + Required for developing on beam and running unittests + + + Docs + + Installation Command: pip install apache-beam[docs] + Generating API documentation using Sphinx + + + + Execute a pipeline locally The Apache Beam https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples";>examples directory has many examples. All examples can be run locally by passing the required arguments described in the example script.
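The extras listed above (gcp, test, docs) can be combined into a single requirement. The small helper below is a sketch with a hypothetical name, requirement_with_extras; it joins the extras without spaces, because a space inside the brackets would make a shell split the requirement into two separate arguments. Some shells also treat square brackets as glob characters, so quoting the resulting string on the command line may be necessary.

```python
def requirement_with_extras(package, extras):
    """Return a pip requirement such as 'apache-beam[gcp,test]'.

    Illustrative helper, not part of pip or Beam. Blank extras are ignored
    and surrounding whitespace is stripped so the result contains no spaces.
    """
    names = [name.strip() for name in extras if name.strip()]
    if not names:
        return package
    return "%s[%s]" % (package, ",".join(names))


# The combined requirement can then be passed to pip as one quoted argument.
command = ["pip", "install", requirement_with_extras("apache-beam", ["gcp", "test"])]
```

Building the command as a list, as in the last line, sidesteps shell quoting entirely when it is run through something like the subprocess module.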
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/3ee76398 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/3ee76398 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/3ee76398 Branch: refs/heads/asf-site Commit: 3ee763988c610d145b6de6a6b532510aa3ba27e3 Parents: 30d2dc7 Author: Ahmet Altay Authored: Mon Apr 3 14:33:08 2017 -0700 Committer: Ahmet Altay Committed: Mon Apr 3 14:33:08 2017 -0700 -- content/contribute/contribution-guide/index.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/3ee76398/content/contribute/contribution-guide/index.html -- diff --git a/content/contribute/contribution-guide/index.html b/content/contribute/contribution-guide/index.html index f5fefc1..b029d4c 100644 --- a/content/contribute/contribution-guide/index.html +++ b/content/contribute/contribution-guide/index.html @@ -283,7 +283,7 @@ One-time Setup [Potentially] Submit Contributor License Agreement -Apache Software Foundation (ASF) desires that all contributors of ideas, code, or documentation to the Apache projects complete, sign, and submit an https://www.apache.org/licenses/icla.txt";>Individual Contributor License Agreement (ICLA). The purpose of this agreement is to clearly define the terms under which intellectual property has been contributed to the ASF and thereby allow us to defend the project should there be a legal dispute regarding the software at some future time. +Apache Software Foundation (ASF) desires that all contributors of ideas, code, or documentation to the Apache projects complete, sign, and submit an https://www.apache.org/licenses/icla.pdf";>Individual Contributor License Agreement (ICLA). 
The purpose of this agreement is to clearly define the terms under which intellectual property has been contributed to the ASF and thereby allow us to defend the project should there be a legal dispute regarding the software at some future time. We require you to have an ICLA on file with the Apache Secretary for larger contributions only. For smaller ones, however, we rely on clause five of the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0#contributions), describing licensing of intentionally submitted contributions and do not require an ICLA in that case.
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/8de1bcc6 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/8de1bcc6 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/8de1bcc6 Branch: refs/heads/asf-site Commit: 8de1bcc66bf5e9db6e05cd8f2fceb9905a7291bf Parents: 6261b1a Author: Davor Bonaci Authored: Sun Apr 2 09:51:17 2017 +0200 Committer: Davor Bonaci Committed: Sun Apr 2 09:51:17 2017 +0200 -- content/documentation/runners/capability-matrix/index.html | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/8de1bcc6/content/documentation/runners/capability-matrix/index.html -- diff --git a/content/documentation/runners/capability-matrix/index.html b/content/documentation/runners/capability-matrix/index.html index 04174c7..b1cca24 100644 --- a/content/documentation/runners/capability-matrix/index.html +++ b/content/documentation/runners/capability-matrix/index.html @@ -1372,7 +1372,7 @@ -Partially: streaming, non-merging windowsState is supported in streaming mode for non-merging windows. SetState and MapState are not yet supported. +Partially: streaming, non-merging windowsState is supported for non-merging windows. SetState and MapState are not yet supported. @@ -1866,7 +1866,7 @@ -Partially: streaming, non-merging windowsThe Flink runner support timers in non-merging windows when run in streaming mode. +Partially: streaming, non-merging windowsThe Flink Runner supports timers in non-merging windows.
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/7453a1a8 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/7453a1a8 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/7453a1a8 Branch: refs/heads/asf-site Commit: 7453a1a820a0d76fab7384e3504e6ae6bd4d15ce Parents: 81bafde Author: Davor Bonaci Authored: Mon Mar 27 15:36:54 2017 -0700 Committer: Davor Bonaci Committed: Mon Mar 27 15:36:54 2017 -0700 -- content/contribute/testing/index.html | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/7453a1a8/content/contribute/testing/index.html -- diff --git a/content/contribute/testing/index.html b/content/contribute/testing/index.html index c0ebd6d..f9bc735 100644 --- a/content/contribute/testing/index.html +++ b/content/contribute/testing/index.html @@ -167,13 +167,13 @@ Testing Types Unit - RunnableOnService (Working Title) + ValidatesRunner (Working Title) E2E Testing Systems E2E Testing Framework - RunnableOnService Tests + ValidatesRunner Tests Effective use of the TestPipeline JUnit rule API Surface testing @@ -331,11 +331,11 @@ details on those testing types. Correctness - E2E Test, @RunnableonService (https://github.com/apache/beam/blob/master/runners/pom.xml#L47) + E2E Test, @ValidatesRunner (https://github.com/apache/beam/blob/master/runners/pom.xml#L47) WordCountIT (https://github.com/apache/beam/blob/master/examples/java/src/test/java/org/apache/beam/examples/WordCountIT.java#L78), ParDoTest (https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java) - E2E, @RunnableonService + E2E, @ValidatesRunner Postcommit @@ -450,7 +450,7 @@ viewed. Running in postcommit removes as stringent of a time constraint, which gives us the ability to do some more comprehensive testing. 
In postcommit we have a test -suite running the RunnableOnService tests against each supported runner, and +suite running the ValidatesRunner tests against each supported runner, and another for running the full set of E2E tests against each runner. Currently-supported runners are Dataflow, Flink, Spark, and Gearpump, with others soon to follow. Work is ongoing to enable Flink, Spark, and Gearpump in @@ -475,9 +475,9 @@ importance of testing, Beam has a robust set of unit tests, as well as testing coverage measurement tools, which protect the codebase from simple to moderate breakages. Beam Java unit tests are written in JUnit. -RunnableOnService (Working Title) +ValidatesRunner (Working Title) -RunnableOnService tests contain components of both component and end-to-end +ValidatesRunner tests contain components of both component and end-to-end tests. They fulfill the typical purpose of a component test - they are meant to test a well-scoped piece of Beam functionality or the interactions between two such pieces and can be run in a component-test-type fashion against the @@ -487,7 +487,7 @@ functionality, but runner functionality as well. They are more lightweight than a traditional end-to-end test and, because of their well-scoped nature, provide good signal as to what exactly is working or broken against a particular runner. -The name "RunnableOnService" is an artifact of when Beam was still the Google +The name "ValidatesRunner" is an artifact of when Beam was still the Google Cloud Dataflow SDK and will be changing (https://issues.apache.org/jira/browse/BEAM-655) to something more indicative of its use in the coming months. @@ -537,9 +537,9 @@ environments. We currently provide the ability to run against the DirectRunner, against a local Spark instance, a local Flink instance, and against the Google Cloud Dataflow service. 
-RunnableOnService Tests +ValidatesRunner Tests -RunnableOnService tests are tests built to use the Beam TestPipeline class, +ValidatesRunner tests are tests built to use the Beam TestPipeline class, which enables test authors to write simple functionality verification. They are meant to use some of the built-in utilities of the SDK, namely PAssert, to verify that the simple pipelines they run end in the correct state. @@ -568,7 +568,7 @@ due to the one of the following scenarios: Abandoned node detection is automatically enabled when a real pipeline runner (i.e. not a CrashingRunner) and/or a -@NeedsRunner / @RunnableOnService annotation are detected. +@NeedsRunner / @ValidatesRunner annotation are detected. Consider the following test:
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/b17c1985 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/b17c1985 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/b17c1985 Branch: refs/heads/asf-site Commit: b17c19853df3dd0112aab223f305b5d8fa6888c0 Parents: c40342d Author: Davor Bonaci Authored: Wed Mar 22 10:16:23 2017 -0700 Committer: Davor Bonaci Committed: Wed Mar 22 10:16:23 2017 -0700 -- .../pipelines/design-your-pipeline/index.html | 88 ++-- 1 file changed, 82 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/b17c1985/content/documentation/pipelines/design-your-pipeline/index.html -- diff --git a/content/documentation/pipelines/design-your-pipeline/index.html b/content/documentation/pipelines/design-your-pipeline/index.html index c575446..8e0e51f 100644 --- a/content/documentation/pipelines/design-your-pipeline/index.html +++ b/content/documentation/pipelines/design-your-pipeline/index.html @@ -190,7 +190,7 @@ Figure 1: A linear pipeline. -However, your pipeline can be significantly more complex. A pipeline represents a Directed Acyclic Graph (https://en.wikipedia.org/wiki/Directed_acyclic_graph) of steps. It can have multiple input sources, multiple output sinks, and its operations (transforms) can output multiple PCollections. The following examples show some of the different shapes your pipeline can take. +However, your pipeline can be significantly more complex. A pipeline represents a Directed Acyclic Graph (https://en.wikipedia.org/wiki/Directed_acyclic_graph) of steps. It can have multiple input sources, multiple output sinks, and its operations (PTransforms) can both read and output multiple PCollections. The following examples show some of the different shapes your pipeline can take. Branching PCollections @@ -205,7 +205,28 @@ -Figure 2: A pipeline with multiple transforms. 
Note that the PCollection of the database table rows is processed by two transforms. +Figure 2: A pipeline with multiple transforms. Note that the PCollection of the database table rows is processed by two transforms. See the example code below: +PCollection<String> dbRowCollection = ...; + +PCollection<String> aCollection = dbRowCollection.apply("aTrans", ParDo.of(new DoFn<String, String>(){ + @ProcessElement + public void processElement(ProcessContext c) { +if(c.element().startsWith("A")){ + c.output(c.element()); +} + } +})); + +PCollection<String> bCollection = dbRowCollection.apply("bTrans", ParDo.of(new DoFn<String, String>(){ + @ProcessElement + public void processElement(ProcessContext c) { +if(c.element().startsWith("B")){ + c.output(c.element()); +} + } +})); + + A single transform that uses side outputs @@ -232,7 +253,37 @@ if (starts with 'A') { outputToPCollectionA } else if (starts with 'B') { outputToPCollectionB } -where each element in the input PCollection is processed once. +where each element in the input PCollection is processed once. See the example code below: +//define main stream and side output +final TupleTag<String> mainStreamTag = new TupleTag<String>(){}; +final TupleTag<String> sideoutTag = new TupleTag<String>(){}; + +PCollectionTuple mixedCollection = +dbRowCollection.apply( +ParDo +// Specify the tag for the main output, wordsBelowCutoffTag. +.withOutputTags(mainStreamTag, +// Specify the tags for the two side outputs as a TupleTagList. +TupleTagList.of(sideoutTag)) +.of(new DoFn<String, String>() { + @ProcessElement +public void processElement(ProcessContext c) { + if(c.element().startsWith("A")){//output to main stream +c.output(c.element()); + }else if(c.element().startsWith("B")){//emit as Side outputs +c.sideOutput(sideoutTag, c.element()); + } +} +} +)); + +// get subset of main stream +mixedCollection.get(mainStreamTag).apply(...); + +// get subset of Side output +mixedCollection.get(sideoutTag).apply(...); + + You can use either mechanism to produce multiple output PCollections. 
However, using side outputs makes more sense if the transform's computation per element is time-consuming. @@ -245,12 +296,21 @@ Join - You can use the CoGroupByKey transform in the Beam SDK to perform a relational join between two PCollections. The PCollections must be keyed (i.e. they must be collections of key/value pairs) and they must use the same key type. -The example depicted in Figure 4 below is a continuation of the example illustrated in Figure 2 in
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/7d9208ea Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/7d9208ea Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/7d9208ea Branch: refs/heads/asf-site Commit: 7d9208ea92fb34c294455b0fb96066678abaf9f0 Parents: af0821f Author: Ahmet Altay Authored: Mon Mar 20 17:57:45 2017 -0700 Committer: Ahmet Altay Committed: Mon Mar 20 17:57:45 2017 -0700 -- content/get-started/quickstart-py/index.html | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/7d9208ea/content/get-started/quickstart-py/index.html -- diff --git a/content/get-started/quickstart-py/index.html b/content/get-started/quickstart-py/index.html index b717987..ce774cb 100644 --- a/content/get-started/quickstart-py/index.html +++ b/content/get-started/quickstart-py/index.html @@ -221,8 +221,11 @@ environment's directories. Download and install -Install the latest Python SDK from PyPI: - pip install apache-beam +Install the latest Python SDK from PyPI: + +pip install apache-beam + + Execute a pipeline locally @@ -234,14 +237,13 @@ environment's directories. -# As part of the initial setup, install gcp specific extra components. -pip install dist/apache-beam-*.tar.gz .[gcp] +# As part of the initial setup, install Google Cloud Platform specific extra components. +pip install apache-beam[gcp] python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \ --output gs:///counts \ --runner DataflowRunner \ --project your-gcp-project \ - --temp_location gs:// /tmp/ \ - --sdk_location dist/apache-beam-*.tar.gz + --temp_location gs:// /tmp/
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/fa7d6168 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/fa7d6168 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/fa7d6168 Branch: refs/heads/asf-site Commit: fa7d61680afb67248c2b9ae55417bc0ce7148c9b Parents: ddb6079 Author: Davor Bonaci Authored: Mon Mar 20 14:56:42 2017 -0700 Committer: Davor Bonaci Committed: Mon Mar 20 14:56:42 2017 -0700 -- .../documentation/programming-guide/index.html | 248 ++- content/images/fixed-time-windows.png | Bin 0 -> 11717 bytes content/images/session-windows.png | Bin 0 -> 16697 bytes content/images/sliding-time-windows.png | Bin 0 -> 16537 bytes content/images/unwindowed-pipeline-bounded.png | Bin 0 -> 9589 bytes content/images/windowing-pipeline-bounded.png | Bin 0 -> 13325 bytes content/images/windowing-pipeline-unbounded.png | Bin 0 -> 21890 bytes 7 files changed, 245 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/fa7d6168/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index 19853df..9d0a3b6 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -369,7 +369,7 @@ The bounded (or unbounded) nature of your PCollection affects how Beam processes your data. A bounded PCollection can be processed using a batch job, which might read the entire data set once, and perform processing in a job of finite length. An unbounded PCollection must be processed using a streaming job that runs continuously, as the entire collection can never be available for processing at any one time. 
-When performing an operation that groups elements in an unbounded PCollection, Beam requires a concept called Windowing to divide a continuously updating data set into logical windows of finite size. Beam processes each window as a bundle, and processing continues as the data set is generated. These logical windows are determined by some characteristic associated with a data element, such as a timestamp. +When performing an operation that groups elements in an unbounded PCollection, Beam requires a concept called windowing to divide a continuously updating data set into logical windows of finite size. Beam processes each window as a bundle, and processing continues as the data set is generated. These logical windows are determined by some characteristic associated with a data element, such as a timestamp. Element timestamps @@ -1522,8 +1522,250 @@ tree, [2] The Beam SDK for Python does not support annotating data types with a default coder. If you would like to set a default coder, use the method described in the previous section, Setting the default coder for a type. - - +Working with windowing + +Windowing subdivides a PCollection according to the timestamps of its individual elements. Transforms that aggregate multiple elements, such as GroupByKey and Combine, work implicitly on a per-window basis; that is, they process each PCollection as a succession of multiple, finite windows, though the entire collection itself may be of unbounded size. + +A related concept, called triggers, determines when to emit the results of aggregation as unbounded data arrives. Using a trigger can help to refine the windowing strategy for your PCollection to deal with late-arriving data or to provide early results. See the triggers section for more information. + +Windowing basics + +Some Beam transforms, such as GroupByKey and Combine, group multiple elements by a common key. 
Ordinarily, that grouping operation groups all of the elements that have the same key within the entire data set. With an unbounded data set, it is impossible to collect all of the elements, since new elements are constantly being added and may be infinitely many (e.g. streaming data). If you are working with unbounded PCollections, windowing is especially useful. + +In the Beam model, any PCollection (including unbounded PCollections) can be subdivided into logical windows. Each element in a PCollection is assigned to one or more windows according to the PCollection's windowing function, and each individual window contains a finite number of elements. Grouping transforms then consider each PCollection's elements on a per-window basis. GroupByKey, for example, implicitly groups the elements of a PCollection by key and window. + +Caution: The default windowing behavior is to assign all elements of a PCollection to a single, glob
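As an illustration of the "group by key and window" idea described in this diff, here is a minimal sketch in plain Python. It does not use the Beam SDK: the function names, the (key, value, timestamp) tuple layout, and the choice of fixed-size, non-overlapping windows are all assumptions made for the example, not Beam's API.

```python
from collections import defaultdict


def fixed_window_start(timestamp, size):
    """Return the start of the fixed-size window that contains `timestamp`.

    Hypothetical helper: windows are assumed to be [start, start + size).
    """
    return timestamp - (timestamp % size)


def group_by_key_and_window(elements, size):
    """Group (key, value, timestamp) tuples by (key, window start).

    Each window holds a finite number of values even if the overall
    stream of elements never ends.
    """
    groups = defaultdict(list)
    for key, value, timestamp in elements:
        groups[(key, fixed_window_start(timestamp, size))].append(value)
    return dict(groups)


# Example input: timestamps in seconds, windows of 60 seconds.
events = [("a", 1, 5), ("a", 2, 65), ("b", 3, 10), ("a", 4, 59)]
grouped = group_by_key_and_window(events, 60)
```

Here the two "a" values with timestamps 5 and 59 land in the same window, while the "a" value at timestamp 65 falls into the next one, so a per-window aggregation would treat them separately.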
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/bf1a64e4 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/bf1a64e4 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/bf1a64e4 Branch: refs/heads/asf-site Commit: bf1a64e408a2d0443ed855f0aa9591fd48ac233e Parents: 9a89089 Author: Davor Bonaci Authored: Mon Mar 20 09:18:37 2017 -0700 Committer: Davor Bonaci Committed: Mon Mar 20 09:18:37 2017 -0700 -- content/contribute/contribution-guide/index.html | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/bf1a64e4/content/contribute/contribution-guide/index.html -- diff --git a/content/contribute/contribution-guide/index.html b/content/contribute/contribution-guide/index.html index af492a9..f5fefc1 100644 --- a/content/contribute/contribution-guide/index.html +++ b/content/contribute/contribution-guide/index.html @@ -627,9 +627,7 @@ $ git checkout -b finish-pr-github/pr/ ;] \n\nThis closes # ' \ -finish-pr- +$ git merge --no-ff -m 'This closes # ' finish-pr-
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/0765aa05 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/0765aa05 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/0765aa05 Branch: refs/heads/asf-site Commit: 0765aa05dbe3d4d736bdffe926367a2eb13f1989 Parents: 13a88d2 Author: Davor Bonaci Authored: Sun Mar 19 20:41:27 2017 -0700 Committer: Davor Bonaci Committed: Sun Mar 19 20:41:27 2017 -0700 -- content/contribute/team/index.html | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/0765aa05/content/contribute/team/index.html -- diff --git a/content/contribute/team/index.html b/content/contribute/team/index.html index 4564b10..95f5af6 100644 --- a/content/contribute/team/index.html +++ b/content/contribute/team/index.html @@ -266,9 +266,9 @@ Pei He pei pei [at] apache [dot] org - Google + Alibaba committer - -8 + +8
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/144599fd Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/144599fd Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/144599fd Branch: refs/heads/asf-site Commit: 144599fd88ce871ead449cd13ab3418fec702c28 Parents: 41b8245 Author: Ismaël Mejía Authored: Sat Mar 18 14:56:30 2017 +0100 Committer: Ismaël Mejía Committed: Sat Mar 18 14:56:30 2017 +0100 -- content/contribute/team/index.html | 9 + 1 file changed, 9 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/144599fd/content/contribute/team/index.html -- diff --git a/content/contribute/team/index.html b/content/contribute/team/index.html index 8d58a55..4564b10 100644 --- a/content/contribute/team/index.html +++ b/content/contribute/team/index.html @@ -379,6 +379,15 @@ -8 + + Aviem Zur + aviemzur + aviemzur [at] apache [dot] org + PayPal + committer + +2 + +
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/598706dc Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/598706dc Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/598706dc Branch: refs/heads/asf-site Commit: 598706dcada2f85dd8475320149df035245fc0d7 Parents: 1fef905 Author: Davor Bonaci Authored: Fri Mar 17 18:38:30 2017 -0700 Committer: Davor Bonaci Committed: Fri Mar 17 18:38:30 2017 -0700 -- content/contribute/team/index.html | 9 + 1 file changed, 9 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/598706dc/content/contribute/team/index.html -- diff --git a/content/contribute/team/index.html b/content/contribute/team/index.html index f0ef73d..8d58a55 100644 --- a/content/contribute/team/index.html +++ b/content/contribute/team/index.html @@ -272,6 +272,15 @@ + Chamikara Jayalath + chamikara + chamikara [at] apache [dot] org + Google + committer + -8 + + + Eugene Kirpichov jkff jkff [at] apache [dot] org
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/b6cc120b Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/b6cc120b Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/b6cc120b Branch: refs/heads/asf-site Commit: b6cc120bb0d58b0c5bc27b850c8963d163b567c2 Parents: 3095913 Author: Davor Bonaci Authored: Fri Mar 17 14:14:29 2017 -0700 Committer: Davor Bonaci Committed: Fri Mar 17 14:14:29 2017 -0700 -- content/contribute/team/index.html | 9 + 1 file changed, 9 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/b6cc120b/content/contribute/team/index.html -- diff --git a/content/contribute/team/index.html b/content/contribute/team/index.html index 1356e7d..81fb4d7 100644 --- a/content/contribute/team/index.html +++ b/content/contribute/team/index.html @@ -307,6 +307,15 @@ + Ismaël Mejía + iemejia + iemejia [at] apache [dot] org + Talend + committer + +1 + + + Maximilian Michels mxm mxm [at] apache [dot] org
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/64b743de Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/64b743de Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/64b743de Branch: refs/heads/asf-site Commit: 64b743dea1a51c96961447bc5202842f29a33bf2 Parents: 2f34f2d Author: Davor Bonaci Authored: Fri Mar 17 13:42:42 2017 -0700 Committer: Davor Bonaci Committed: Fri Mar 17 13:42:42 2017 -0700 -- content/contribute/team/index.html | 9 + 1 file changed, 9 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/64b743de/content/contribute/team/index.html -- diff --git a/content/contribute/team/index.html b/content/contribute/team/index.html index ebb1034..1356e7d 100644 --- a/content/contribute/team/index.html +++ b/content/contribute/team/index.html @@ -271,6 +271,15 @@ + Eugene Kirpichov + jkff + jkff [at] apache [dot] org + Google + committer + -8 + + + Kenneth Knowles kenn kenn [at] apache [dot] org
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/4acb6411 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/4acb6411 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/4acb6411 Branch: refs/heads/asf-site Commit: 4acb6411a230a543930e2672f1181ea64ad49094 Parents: be9e207 Author: Davor Bonaci Authored: Thu Mar 16 16:21:09 2017 -0700 Committer: Davor Bonaci Committed: Thu Mar 16 16:21:09 2017 -0700 -- content/blog/2017/03/16/python-sdk-release.html | 255 +++ content/blog/index.html | 16 ++ content/feed.xml| 166 ++-- content/index.html | 4 +- 4 files changed, 347 insertions(+), 94 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/4acb6411/content/blog/2017/03/16/python-sdk-release.html -- diff --git a/content/blog/2017/03/16/python-sdk-release.html b/content/blog/2017/03/16/python-sdk-release.html new file mode 100644 index 000..cb1320c --- /dev/null +++ b/content/blog/2017/03/16/python-sdk-release.html @@ -0,0 +1,255 @@ + + + + + + + + + Python SDK released in Apache Beam 0.6.0 + + + + + https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";> + + + https://beam.apache.org/blog/2017/03/16/python-sdk-release.html"; data-proofer-ignore> + https://beam.apache.org/feed.xml";> + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ +(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + +ga('create', 'UA-73650088-1', 'auto'); +ga('send', 'pageview'); + + + + + + + + + + + + + + + +Toggle navigation + + + + + + + + + Get Started + + Beam Overview +Quickstart - Java +Quickstart - Python + + Example Walkthroughs + WordCount + Mobile Gaming + + Resources + Downloads + Support + + + + Documentation + + Using the Documentation 
+ + Beam Concepts + Programming Guide + Additional Resources + + Pipeline Fundamentals + Design Your Pipeline + Create Your Pipeline + Test Your Pipeline + + SDKs + Java SDK + Java SDK API Reference + +Python SDK +Python SDK API Reference + + + Runners + Capability Matrix + Direct Runner + Apache Apex Runner + Apache Flink Runner + Apache Spark Runner + Cloud Dataflow Runner + + + + Contribute + + Get Started Contributing + +Guides + Contribution Guide +Testing Guide +Release Guide +PTransform Style Guide + +Technical References +Design Principles + Ongoing Projects +Source Repository + + Promotion +Presentation Materials +Logos and Design + +Maturity Model +Team + + + +Blog + + + + https://www.apache.org/foundation/press/kit/feather_small.png"; alt="Apache Logo" style="height:24px;">Apache Software Foundation + +http://www.apache.org/";>ASF Homepage +http://www.apache.org/licenses/";>License +http://www.apache.org/security/";>Security +http://www.apache.org/foundation/thanks.html";>Thanks +http://www.apache.org/foundation/sponsorship.html";>Sponsorship +https://www.apache.org/foundation/policies/conduct";>Code of Conduct + + + + + + + + + + + + + + + + +http://schema.org/BlogPosting";
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/0dd4a1e8 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/0dd4a1e8 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/0dd4a1e8 Branch: refs/heads/asf-site Commit: 0dd4a1e8dc747150297d669ccbcdce8dfeb4e309 Parents: 5111853 Author: Davor Bonaci Authored: Thu Mar 16 12:34:38 2017 -0700 Committer: Davor Bonaci Committed: Thu Mar 16 12:34:38 2017 -0700 -- content/contribute/testing/index.html| 5 ++--- content/documentation/index.html | 2 +- content/get-started/beam-overview/index.html | 2 +- content/get-started/downloads/index.html | 12 ++-- content/get-started/quickstart-py/index.html | 20 ++-- 5 files changed, 16 insertions(+), 25 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/0dd4a1e8/content/contribute/testing/index.html -- diff --git a/content/contribute/testing/index.html b/content/contribute/testing/index.html index 6fb76df..fb004c9 100644 --- a/content/contribute/testing/index.html +++ b/content/contribute/testing/index.html @@ -413,9 +413,8 @@ details on those testing types. Python SDK -The Python SDK is currently under development on a feature branch. We have initial -postcommit tests by a Jenkins build; precommit testing and a full testing -matrix will be coming soon. +The Python SDK has postcommit tests by a Jenkins build; precommit testing and a +full testing matrix will be coming soon. 
Testing Scenarios http://git-wip-us.apache.org/repos/asf/beam-site/blob/0dd4a1e8/content/documentation/index.html -- diff --git a/content/documentation/index.html b/content/documentation/index.html index e998965..af5afb0 100644 --- a/content/documentation/index.html +++ b/content/documentation/index.html @@ -177,7 +177,7 @@ Java SDK - [Under Development] Python SDK + Python SDK Runners http://git-wip-us.apache.org/repos/asf/beam-site/blob/0dd4a1e8/content/get-started/beam-overview/index.html -- diff --git a/content/get-started/beam-overview/index.html b/content/get-started/beam-overview/index.html index f45903b..7a2b1f1 100644 --- a/content/get-started/beam-overview/index.html +++ b/content/get-started/beam-overview/index.html @@ -173,7 +173,7 @@ Python - Coming Soon + Active Development Other http://git-wip-us.apache.org/repos/asf/beam-site/blob/0dd4a1e8/content/get-started/downloads/index.html -- diff --git a/content/get-started/downloads/index.html b/content/get-started/downloads/index.html index 4053000..3438ee7 100644 --- a/content/get-started/downloads/index.html +++ b/content/get-started/downloads/index.html @@ -152,8 +152,9 @@ Apache Beam Downloads -The easiest way to use Apache Beam is via one of the released versions in the -Maven Central Repository (https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22). +The easiest way to use Apache Beam is via one of the released versions in a central repository. +Java SDK is available on Maven Central Repository (https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22), +and Python SDK is available on PyPI (https://pypi.python.org/pypi/apache-beam). 
For example, if you are developing using Maven and want to use the SDK for Java with the DirectRunner, add the following dependencies to your @@ -173,6 +174,13 @@ Java with the DirectRunner, add the follo +Similarly in Python, if you are using PyPI and want to use the SDK for Python with +DirectRunner, add the following requirement to your setup.py file: + +apache-beam==0.6.0 + + + Additionally, you may want to depend on additional SDK modules, such as IO connectors or other extensions, and additional runners to execute your pipeline at scale. http://git-wip-us.apache.org/repos/asf/beam-site/blob/0dd4a1e8/content/get-started/quickstart-py/index.html -- diff --git a/content/get-started/quickstart-py/index.html b/content/get-started/quickstart-py/index.html index b143d54..153eac6 100644 --- a/content/get-started/quickstart-py/index.html +++ b/content/get-started/quickstart-py/index.html @@ -220,24 +220,8 @@ environment's directories. Download and install - - -Clone the Apache Beam repo from GitHub: - git clone https://github.com/apache/beam.git - - -Navigate to the python directory: - cd beam/sdks/python/ - - -Create the Apache Beam Python SDK installation packa
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/e1eb3fa8
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/e1eb3fa8
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/e1eb3fa8

Branch: refs/heads/asf-site
Commit: e1eb3fa8cae561d8c1416e540bb7d82c9f478270
Parents: dd81235
Author: Davor Bonaci
Authored: Tue Mar 14 16:43:42 2017 -0700
Committer: Davor Bonaci
Committed: Tue Mar 14 16:43:42 2017 -0700

--
 content/.htaccess | 15 +++
 1 file changed, 15 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/e1eb3fa8/content/.htaccess
--
diff --git a/content/.htaccess b/content/.htaccess
new file mode 100644
index 000..06fc74b
--- /dev/null
+++ b/content/.htaccess
@@ -0,0 +1,15 @@
+RewriteEngine On
+
+# This is a 301 (permanent) redirect from HTTP to HTTPS.
+
+# The next rule applies conditionally:
+# * the host is "beam.apache.org",
+# * the host comparison is case insensitive (NC),
+# * HTTPS is not used.
+RewriteCond %{HTTP_HOST} ^beam\.apache\.org [NC]
+RewriteCond %{HTTPS} !on
+
+# Rewrite the URL as follows:
+# * Redirect (R) permanently (301) to https://beam.apache.org/,
+# * Stop processing more rules (L).
+RewriteRule ^(.*)$ https://beam.apache.org/$1 [L,R=301]
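
The decision made by the rewrite rules in this commit can be sketched in a few lines of Python. This is only an illustrative model of the two RewriteCond lines and the RewriteRule, not how Apache evaluates them; the function name `redirect_target` and its parameters are hypothetical, and the sketch assumes the per-directory (.htaccess) behaviour in which the path reaches the rule without its leading slash.

```python
import re
from typing import Optional

# Mirrors: RewriteCond %{HTTP_HOST} ^beam\.apache\.org [NC]
# (NC = case-insensitive comparison; the pattern is anchored at the start only.)
HOST_PATTERN = re.compile(r"^beam\.apache\.org", re.IGNORECASE)


def redirect_target(host: str, https_on: bool, path: str) -> Optional[str]:
    """Return the URL for the 301 redirect, or None if the rule does not apply.

    `path` is assumed to arrive without a leading slash, as it would for a
    rule placed in an .htaccess file (hypothetical helper, for illustration).
    """
    # Both conditions must hold: the host matches and HTTPS is not in use.
    if not HOST_PATTERN.match(host) or https_on:
        return None
    # Mirrors: RewriteRule ^(.*)$ https://beam.apache.org/$1 [L,R=301]
    return "https://beam.apache.org/" + path
```

For instance, a plain-HTTP request for `get-started/` on host `BEAM.apache.org` would be redirected, while the same request over HTTPS would pass through untouched.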
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/b0ba36e5 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/b0ba36e5 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/b0ba36e5 Branch: refs/heads/asf-site Commit: b0ba36e56d0c82567929550abab046386c87d7f1 Parents: b91ee7d Author: Davor Bonaci Authored: Tue Mar 14 13:59:25 2017 -0700 Committer: Davor Bonaci Committed: Tue Mar 14 13:59:25 2017 -0700 -- .../contribute/contribution-guide/index.html| 61 +++- 1 file changed, 58 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/b0ba36e5/content/contribute/contribution-guide/index.html -- diff --git a/content/contribute/contribution-guide/index.html b/content/contribute/contribution-guide/index.html index 92e009a..9663ea4 100644 --- a/content/contribute/contribution-guide/index.html +++ b/content/contribute/contribution-guide/index.html @@ -170,9 +170,15 @@ IntelliJ Enable Annotation Processing Checkstyle + Code Style + + + Eclipse + Initial setup + Checkstyle + Code Style - Eclipse @@ -368,11 +374,29 @@ clicking the "Check Module" button. The scan should report no errors. Note: Selecting "Check Project" may report some errors from the archetype modules as they are not configured for Checkstyle validation. +Code Style +IntelliJ supports code styles within the IDE. Use one of the following to ensure your code style +matches the project's checkstyle enforcements. + + + (Option 1) Configure IntelliJ to use "beam-codestyle.xml". + + Go to Settings -> Code Style -> Java. + Click the cogwheel icon next to "Scheme" and select Import Scheme -> Eclipse XML Profile. + Select "sdks/java/build-tools/src/main/resources/beam/beam-codestyle.xml". + Click "OK". + Click "Apply" and "OK". + + + (Option 2) Install https://plugins.jetbrains.com/plugin/8527-google-java-format Google Java Format plugin. 
+ + Eclipse -Use a recent eclipse version that includes m2e. Currently we recommend Eclipse Neon. -Start eclipse with a fresh workspace in a separate directory from your checkout. +Use a recent Eclipse version that includes m2e. Currently we recommend Eclipse Neon. +Start Eclipse with a fresh workspace in a separate directory from your checkout. +Initial setup Install m2e-apt: Beam uses apt annotation processing to provide auto generated code. One example is the usage of https://github.com/google/auto/tree/master/value Google AutoValue. By default m2e does not support this and you will see compile errors. @@ -406,6 +430,37 @@ Start eclipse with a fresh workspace in a separate directory from your checkout. You now should have all the beam projects imported into eclipse and should see no compile errors. +Checkstyle +Eclipse supports checkstyle within the IDE using the Checkstyle plugin. + + + Install the https://marketplace.eclipse.org/content/checkstyle-plug Checkstyle plugin. + Configure Checkstyle plugin by going to Preferences - Checkstyle. + + Click "New...". + Select "External Configuration File" for type. + Click "Browse..." and select "sdks/java/build-tools/src/main/resources/beam/checkstyle.xml". + Enter "Beam Checks" under "Name:". + Click "OK", then "OK". + + + + +Code Style +Eclipse supports code styles within the IDE. Use one of the following to ensure your code style +matches the project's checkstyle enforcements. + + + (Option 1) Configure Eclipse to use "beam-codestyle.xml". + + Go to Preferences -> Java -> Code Style -> Formatter. + Click "Import..." and select "sdks/java/build-tools/src/main/resources/beam/beam-codestyle.xml". + Click "Apply" and "OK". + + + (Option 2) Install https://github.com/google/google-java-format#eclipse Google Java Format plugin. + + Create a branch in your fork You'll work on your contribution in a branch in your own (forked) repository. 
Create a local branch, initialized with the state of the branch you expect your changes to be merged into. Keep in mind that we use several branches, including master, feature-specific, and release-specific branches. If you are unsure, initialize with the state of the master branch.
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/07748b57 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/07748b57 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/07748b57 Branch: refs/heads/asf-site Commit: 07748b578f352ed0a84f244b8f37f7928df27d1f Parents: 9cdf959 Author: Davor Bonaci Authored: Thu Mar 9 16:29:14 2017 -0800 Committer: Davor Bonaci Committed: Thu Mar 9 16:29:14 2017 -0800 -- content/contribute/ptransform-style-guide/index.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/07748b57/content/contribute/ptransform-style-guide/index.html -- diff --git a/content/contribute/ptransform-style-guide/index.html b/content/contribute/ptransform-style-guide/index.html index 3006e9e..9e61954 100644 --- a/content/contribute/ptransform-style-guide/index.html +++ b/content/contribute/ptransform-style-guide/index.html @@ -240,7 +240,7 @@ As a rule of thumb: expose these if you anticipate that the full packaged Do: - Respect language-specific naming conventions, e.g. name classes in CamelCase in Java and Python, functions in snakeCase in Java but with_underscores in Python, etc. + Respect language-specific naming conventions, e.g. name classes in PascalCase in Java and Python, functions in camelCase in Java but snake_case in Python, etc. Name factory functions so that either the function name is a verb, or referring to the transform reads like a verb: e.g. MongoDbIO.read(), Flatten.iterables(). In typed languages, name PTransform classes also like verbs (e.g.: MongoDbIO.Read, Flatten.Iterables). Name families of transforms for interacting with a storage system using the word "IO": MongoDbIO, JdbcIO.
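
The corrected naming rule in this diff (PascalCase for classes in both languages, camelCase for Java functions, snake_case for Python functions) can be illustrated with a small sketch. The class `WordLengths` and the helper `camel_to_snake` are hypothetical names invented for this example; they are not part of any Beam SDK, and the converter only handles simple camelCase names.

```python
import re


class WordLengths:
    """Hypothetical class; its name follows PascalCase, as in Java and Python."""

    def compute_lengths(self, words):
        # Method name in snake_case, the Python convention.
        # The Java equivalent would be named computeLengths (camelCase).
        return [len(word) for word in words]


def camel_to_snake(name: str) -> str:
    """Convert a simple Java-style camelCase name to Python-style snake_case."""
    # Insert an underscore before every capital letter except a leading one,
    # then lower-case the whole name.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()
```

Under these assumptions, `camel_to_snake("computeLengths")` gives `compute_lengths`, which matches the method name used on the class above.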
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/51425e85 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/51425e85 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/51425e85 Branch: refs/heads/asf-site Commit: 51425e85ee9644a1547557364f0258b78669886f Parents: 9e1f59f Author: Ahmet Altay Authored: Thu Mar 9 09:54:43 2017 -0800 Committer: Ahmet Altay Committed: Thu Mar 9 09:54:43 2017 -0800 -- content/documentation/programming-guide/index.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/51425e85/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index 0752642..50bf398 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -1390,7 +1390,7 @@ tree, [2] p.begin() .apply(TextIO.Read.named("ReadNumbers") .from("gs://my_bucket/path/to/numbers-*.txt") -.withCoder(TextualIntegerCoder.of()));``` +.withCoder(TextualIntegerCoder.of()));
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/c0f78d7a Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/c0f78d7a Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/c0f78d7a Branch: refs/heads/asf-site Commit: c0f78d7a3b45047671ef85697cfadbf97113660e Parents: f2a4d29 Author: Davor Bonaci Authored: Tue Mar 7 09:06:03 2017 -0800 Committer: Davor Bonaci Committed: Tue Mar 7 09:06:03 2017 -0800 -- content/documentation/programming-guide/index.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/c0f78d7a/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index a6403e0..0752642 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -464,7 +464,7 @@ In such roles, ParDo is a common intermediate step in a pipeline. You might use it to extract certain fields from a set of raw input records, or convert raw input into a different format; you might also use ParDo to convert processed data into a format suitable for output, like database table rows or printable strings. -When you apply a ParDo transform, you'll need to provide user code in the form of a DoFn object. DoFn is a Beam SDK class that defines a distribured processing function. +When you apply a ParDo transform, you'll need to provide user code in the form of a DoFn object. DoFn is a Beam SDK class that defines a distributed processing function. When you create a subclass of DoFn, note that your subclass should adhere to the General Requirements for Writing User Code for Beam Transforms.
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/8b84da8a Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/8b84da8a Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/8b84da8a Branch: refs/heads/asf-site Commit: 8b84da8ab6f210e90fcf53c125af4ce16cb787d7 Parents: c71edc8 Author: Davor Bonaci Authored: Mon Mar 6 15:21:07 2017 -0800 Committer: Davor Bonaci Committed: Mon Mar 6 15:21:07 2017 -0800 -- content/contribute/contribution-guide/index.html | 6 ++ .../documentation/pipelines/create-your-pipeline/index.html| 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/8b84da8a/content/contribute/contribution-guide/index.html -- diff --git a/content/contribute/contribution-guide/index.html b/content/contribute/contribution-guide/index.html index 5f8ec98..92e009a 100644 --- a/content/contribute/contribution-guide/index.html +++ b/content/contribute/contribution-guide/index.html @@ -156,6 +156,7 @@ Engage Mailing list(s) JIRA issue tracker + Online discussions Design @@ -254,6 +255,11 @@ For moderate or large contributions, you should not start coding or writing a design document unless there is a corresponding JIRA issue assigned to you for that work. Simple changes, like fixing typos, do not require an associated issue. +Online discussions +We don't have an official IRC channel. Most of the online discussions happen in the https://apachebeam.slack.com/ Apache Beam Slack channel. If you want access, you need to send an email to the user mailing list to mailto:u...@beam.apache.org?subject=Regarding Beam Slack Channel&body=Hello%0D%0A%0ACan someone please add me to the Beam slack channel?%0D%0A%0AThanks. request access. + +Chat rooms are great for quick questions or discussions on specialized topics. 
Remember that we strongly encourage communication via the mailing lists, and we prefer to discuss more complex subjects by email. Developers should be careful to move or duplicate all the official or useful discussions to the issue tracking system and/or the dev mailing list. + Design To avoid potential frustration during the code review cycle, we encourage you to clearly scope and design non-trivial contributions with the Beam community before you start coding. http://git-wip-us.apache.org/repos/asf/beam-site/blob/8b84da8a/content/documentation/pipelines/create-your-pipeline/index.html -- diff --git a/content/documentation/pipelines/create-your-pipeline/index.html b/content/documentation/pipelines/create-your-pipeline/index.html index c7cf3fb..e360299 100644 --- a/content/documentation/pipelines/create-your-pipeline/index.html +++ b/content/documentation/pipelines/create-your-pipeline/index.html @@ -272,7 +272,7 @@ The following example code shows how to apply a TextIO.Read root transform to read data from a text file. The transform is applied to a Pipeline object p, and returns a pipeline data set in the form of a PCollection: PCollection lines = p.apply( - apply("ReadLines", TextIO.Read.from("gs://some/inputData.txt")); + "ReadLines", TextIO.Read.from("gs://some/inputData.txt"));
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/1b1757bb Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/1b1757bb Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/1b1757bb Branch: refs/heads/asf-site Commit: 1b1757bb9157e733d3aea7c7367f21a8eca10755 Parents: f1f7063 Author: Davor Bonaci Authored: Mon Feb 27 18:10:21 2017 -0800 Committer: Davor Bonaci Committed: Mon Feb 27 18:10:21 2017 -0800 -- content/blog/2017/02/01/graduation-media-recap.html | 2 +- content/feed.xml| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/1b1757bb/content/blog/2017/02/01/graduation-media-recap.html -- diff --git a/content/blog/2017/02/01/graduation-media-recap.html b/content/blog/2017/02/01/graduation-media-recap.html index 70e317f..81409a7 100644 --- a/content/blog/2017/02/01/graduation-media-recap.html +++ b/content/blog/2017/02/01/graduation-media-recap.html @@ -188,7 +188,7 @@ include: Datanami: "https://www.datanami.com/2017/01/10/google-lauds-outside-influence-apache-beam/ Google Lauds Outside Influence on Apache Beam" by Alex Woodie. InfoWorld / JavaWorld: "http://www.infoworld.com/article/3156598/big-data/apache-beam-unifies-batch-and-streaming-for-big-data.html Apache Beam unifies batch and streaming for big data" by Serdar Yegulalp, and republished in http://www.javaworld.com/article/3156598/big-data/apache-beam-unifies-batch-and-streaming-for-big-data.html JavaWorld. JAXenter: "https://jaxenter.com/apache-beam-interview-131314.html In a way, Apache Beam is the glue that connects many big data systems together" by Kypriani Sinaris. - OStatic: "http://ostatic.com/blog/apache-beam-unifies-batch-and-streaming-data-processing Apache Beam Unifies Batch and Streaming Data Processing" by Sam Dean. + OStatic: "Apache Beam Unifies Batch and Streaming Data Processing" by Sam Dean. 
Enterprise Apps Today: "http://www.enterpriseappstoday.com/business-intelligence/data-analytics/apache-beam-graduates-to-help-define-streaming-data-processing.html Apache Beam Graduates to Help Define Streaming Data Processing" by Sean Michael Kerner. The Register: "http://www.theregister.co.uk/2017/01/10/google_must_be_ibeamiing_as_apache_announces_its_new_top_level_projects/ Google must be Beaming as Apache announces its new top-level projects" by Alexander J. Martin. SiliconANGLE: "http://siliconangle.com/blog/2017/01/11/apache-software-foundation-announces-2-top-level-projects/ Apache Software Foundation announces two more top-level open source projects" by Mike Wheatley. http://git-wip-us.apache.org/repos/asf/beam-site/blob/1b1757bb/content/feed.xml -- diff --git a/content/feed.xml b/content/feed.xml index 5c0cd90..63a513b 100644 --- a/content/feed.xml +++ b/content/feed.xml @@ -619,7 +619,7 @@ include:Datanami: "Google Lauds Outside Influence on Apache Beam" by Alex Woodie. InfoWorld / JavaWorld: "Apache Beam unifies batch and streaming for big data" by Serdar Yegulalp, and republished in JavaWorld. JAXenter: "In a way, Apache Beam is the glue that connects many big data systems together" by Kypriani Sinaris. -OStatic: "Apache Beam Unifies Batch and Streaming Data Processing" by Sam Dean. +OStatic: "Apache Beam Unifies Batch and Streaming Data Processing" by Sam Dean. Enterprise Apps Today: "Apache Beam Graduates to Help Define Streaming Data Processing" by Sean Michael Kerner. The Register: "Google must be Beaming as Apache announces its new top-level projects" by Alexander J. Martin.
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/582a7f45 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/582a7f45 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/582a7f45 Branch: refs/heads/asf-site Commit: 582a7f452eed30d815cfddc6af331812e63ccf61 Parents: 2f35897 Author: Davor Bonaci Authored: Mon Feb 27 16:18:18 2017 -0800 Committer: Davor Bonaci Committed: Mon Feb 27 16:18:18 2017 -0800 -- content/contribute/contribution-guide/index.html | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/582a7f45/content/contribute/contribution-guide/index.html -- diff --git a/content/contribute/contribution-guide/index.html b/content/contribute/contribution-guide/index.html index 580b7e6..5f8ec98 100644 --- a/content/contribute/contribution-guide/index.html +++ b/content/contribute/contribution-guide/index.html @@ -419,9 +419,9 @@ $ git checkout -borigin/master Remember to always use --rebase parameter to avoid extraneous merge commits. -To push your local, committed changes to your (forked) repository on GitHub, run: +Then you can push your local, committed changes to your (forked) repository on GitHub. Since rebase may change that branch's history, you may need to force push. You'll run: -$ git push +$ git push --force @@ -438,12 +438,14 @@ $ git checkout -b origin/master Once the initial code is complete and the tests pass, it's time to start the code review process. We review and discuss all code, no matter who authors it. It's a great way to build community, since you can learn from other developers, and they become familiar with your contribution. It also builds a strong project by encouraging a high quality bar and keeping code consistent throughout the project. Create a pull request -Organize your commits to make your reviewer's job easier. 
Use the following command to re-order, squash, edit, or change description of individual commits. +Organize your commits to make your reviewer's job easier. Reviewers normally prefer multiple small pull requests, instead of a single large pull request. Within a pull request, a relatively small number of commits that break the problem into logical steps is preferred. For most pull requests, you'll squash your changes down to 1 commit. You can use the following command to re-order, squash, edit, or change description of individual commits. $ git rebase -i origin/master +You'll then push to your branch on GitHub. Note: when updating your commit after pull request feedback and use squash to get back to one commit, you will need to do a force submit to the branch on your repo. + Navigate to the https://github.com/apache/beam Beam GitHub mirror to create a pull request. The title of the pull request should be strictly in the following format: [BEAM- ]
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f0af7937 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f0af7937 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f0af7937 Branch: refs/heads/asf-site Commit: f0af7937291e92e703af6b52fff10c257cbddf01 Parents: 1f44221 Author: Davor Bonaci Authored: Mon Feb 27 16:12:13 2017 -0800 Committer: Davor Bonaci Committed: Mon Feb 27 16:12:13 2017 -0800 -- content/documentation/programming-guide/index.html | 1 + content/documentation/sdks/java/index.html | 1 + 2 files changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/f0af7937/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index 75a04e5..a6403e0 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -1316,6 +1316,7 @@ tree, [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io Google Cloud PubSub +https://github.com/apache/beam/tree/master/sdks/java/io/hbase Apache HBase https://github.com/apache/beam/tree/master/sdks/java/io/mongodb MongoDB https://github.com/apache/beam/tree/master/sdks/java/io/jdbc JDBC https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery Google BigQuery http://git-wip-us.apache.org/repos/asf/beam-site/blob/f0af7937/content/documentation/sdks/java/index.html -- diff --git a/content/documentation/sdks/java/index.html b/content/documentation/sdks/java/index.html index ce0a6be..fc3123e 100644 --- a/content/documentation/sdks/java/index.html +++ b/content/documentation/sdks/java/index.html @@ -169,6 +169,7 @@ Amazon Kinesis Apache Hadoop's FileInputFormat in Hadoop Distributed File System (HDFS) 
+ Apache HBase Apache Kafka Avro Files Google BigQuery
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/5a77ddc4 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/5a77ddc4 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/5a77ddc4 Branch: refs/heads/asf-site Commit: 5a77ddc4dfff92c2cc4b4d99d6a30adc8b2f4388 Parents: e7721f1 Author: Ahmet Altay Authored: Fri Feb 24 15:27:01 2017 -0800 Committer: Ahmet Altay Committed: Fri Feb 24 15:27:01 2017 -0800 -- content/get-started/quickstart-py/index.html | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/5a77ddc4/content/get-started/quickstart-py/index.html -- diff --git a/content/get-started/quickstart-py/index.html b/content/get-started/quickstart-py/index.html index 8301f3b..f73be31 100644 --- a/content/get-started/quickstart-py/index.html +++ b/content/get-started/quickstart-py/index.html @@ -235,7 +235,7 @@ environment's directories. Install the Apache Beam SDK - pip install dist/apache-beam-sdk-*.tar.gz .[gcp] + pip install dist/apache-beam-*.tar.gz @@ -245,7 +245,18 @@ environment's directories. For example, to run wordcount.py, run: -python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt --output output.txt +python -m apache_beam.examples.wordcount --input README.md --output counts + + + +# As part of the initial setup, install gcp specific extra components. +pip install dist/apache-beam-*.tar.gz .[gcp] +python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \ + --output gs:///counts \ + --runner DataflowRunner \ + --project your-gcp-project \ + --temp_location gs:// /tmp/ \ + --sdk_location dist/apache-beam-*.tar.gz
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/69f32664 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/69f32664 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/69f32664 Branch: refs/heads/asf-site Commit: 69f32664158293f6cc087cfe7d43edb462671e46 Parents: f3494f0 Author: Ahmet Altay Authored: Fri Feb 24 12:48:56 2017 -0800 Committer: Ahmet Altay Committed: Fri Feb 24 12:48:56 2017 -0800 -- content/get-started/quickstart-py/index.html | 12 1 file changed, 4 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/69f32664/content/get-started/quickstart-py/index.html -- diff --git a/content/get-started/quickstart-py/index.html b/content/get-started/quickstart-py/index.html index 28a5a77..8301f3b 100644 --- a/content/get-started/quickstart-py/index.html +++ b/content/get-started/quickstart-py/index.html @@ -222,24 +222,20 @@ environment's directories. -Clone the Apache Beam repo from GitHub: +Clone the Apache Beam repo from GitHub: git clone https://github.com/apache/beam.git -Navigate to the python directory: +Navigate to the python directory: cd beam/sdks/python/ -Create the Apache Beam Python SDK installation package: +Create the Apache Beam Python SDK installation package: python setup.py sdist -Navigate to the dist directory: - cd dist/ - - Install the Apache Beam SDK - pip install apache-beam-sdk-*.tar.gz + pip install dist/apache-beam-sdk-*.tar.gz .[gcp]
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/1a607ad8 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/1a607ad8 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/1a607ad8 Branch: refs/heads/asf-site Commit: 1a607ad8ea7667addbfe27e097a1d4ca912bf15a Parents: 3c0c532 Author: Stas Levin Authored: Fri Feb 24 17:36:38 2017 +0200 Committer: Stas Levin Committed: Fri Feb 24 17:36:38 2017 +0200 -- content/contribute/testing/index.html | 157 + 1 file changed, 157 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/1a607ad8/content/contribute/testing/index.html -- diff --git a/content/contribute/testing/index.html b/content/contribute/testing/index.html index 92a0e1f..9b41686 100644 --- a/content/contribute/testing/index.html +++ b/content/contribute/testing/index.html @@ -173,6 +173,8 @@ Testing Systems E2E Testing Framework RunnableOnService Tests + Effective use of the TestPipeline JUnit rule + API Surface testing @@ -542,6 +544,161 @@ which enables test authors to write simple functionality verification. They are meant to use some of the built-in utilities of the SDK, namely PAssert, to verify that the simple pipelines they run end in the correct state. +Effective use of the TestPipeline JUnit rule + +TestPipeline is a JUnit rule designed to facilitate testing pipelines. +In combination with PAssert, the two can be used for testing and +writing assertions over pipelines. However, in order for these assertions +to be effective, the constructed pipeline must be run by a pipeline +runner. If the pipeline is not run (i.e., executed) then the +constructed PAssert statements will not be triggered, and will thus +be ineffective. + +To prevent such cases, TestPipeline has some protection mechanisms in place. 
+ +Abandoned node detection (performed automatically) + +Abandoned nodes are PTransforms, PAsserts included, that were not +executed by the pipeline runner. Abandoned nodes are most likely to occur +due to one of the following scenarios: + + Lack of a pipeline.run() statement at the end of a test. + Addition of PTransforms after the pipeline has already run. + + +Abandoned node detection is automatically enabled when a real pipeline +runner (i.e. not a CrashingRunner) and/or a +@NeedsRunner / @RunnableOnService annotation are detected. + +Consider the following test: + +// Note the @Rule annotation here +@Rule +public final transient TestPipeline pipeline = TestPipeline.create(); + +@Test +@Category(NeedsRunner.class) +public void myPipelineTest() throws Exception { + +final PCollection<String> pCollection = + pipeline +.apply("Create", Create.of(WORDS).withCoder(StringUtf8Coder.of())) +.apply( +"Map1", +MapElements.via( +new SimpleFunction<String, String>() { + + @Override + public String apply(final String input) { +return WHATEVER; + } +})); + +PAssert.that(pCollection).containsInAnyOrder(WHATEVER); + +/* ERROR: pipeline.run() is missing, PAsserts are ineffective */ +} + + + +# Unsupported in Beam's Python SDK. + + + +The PAssert at the end of this test method will not be executed, since +pipeline is never run, making this test ineffective. If this test method +is run using an actual pipeline runner, an exception will be thrown +indicating that there was no run() invocation in the test. + +Exceptions that are thrown prior to executing a pipeline, will fail +the test unless handled by an ExpectedException rule. 
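
The idea behind abandoned node detection described above can be modelled with a toy class. This is a conceptual sketch only: `ToyPipeline` and its methods are hypothetical, written in plain Python for illustration, and do not correspond to TestPipeline or to any API in a Beam SDK. It covers the two scenarios listed above: a missing run() call, and transforms added after the pipeline has already run.

```python
class ToyPipeline:
    """Toy stand-in for a test pipeline (hypothetical, not a Beam API)."""

    def __init__(self):
        # Names of transforms applied since the last run().
        self._pending = []

    def apply(self, name):
        # Record the transform; it stays "abandoned" until run() is called.
        self._pending.append(name)
        return self

    def run(self):
        # Everything applied so far counts as executed.
        self._pending.clear()

    def verify_no_abandoned_nodes(self):
        """Raise if any applied transform was never executed."""
        if self._pending:
            raise AssertionError(
                "abandoned nodes (applied but never run): "
                + ", ".join(self._pending)
            )
```

In this model, a test harness would call `verify_no_abandoned_nodes()` after each test method, so that a test which builds assertions but never calls `run()` fails loudly instead of passing silently.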
+ +Consider the following test: + +// Note the @Rule annotation here +@Rule +public final transient TestPipeline pipeline = TestPipeline.create(); + +@Test +public void testReadingFailsTableDoesNotExist() throws Exception { + final String table = "TEST-TABLE"; + + BigtableIO.Read read = + BigtableIO.read() + .withBigtableOptions(BIGTABLE_OPTIONS) + .withTableId(table) + .withBigtableService(service); + + // Exception will be thrown by read.validate() when read is applied. + thrown.expect(IllegalArgumentException.class); + thrown.expectMessage(String.format("Table %s does not exist", table)); + + pipeline.apply(read); +} + + + +# Unsupported in Beam's Python SDK. + + + +The application of the read transform throws an exception, which is then +handled by the thrown ExpectedException rule. +In light of this exception, the fact this test has abandoned nodes +(the read transform) does not play a role since the test fails before +the pipeline would have been executed (had there been a run()
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/7612eb22 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/7612eb22 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/7612eb22 Branch: refs/heads/asf-site Commit: 7612eb22994999ac7367d530af313b7a92e63d83 Parents: f5d8735 Author: Ahmet Altay Authored: Wed Feb 22 19:37:58 2017 -0800 Committer: Ahmet Altay Committed: Wed Feb 22 19:37:58 2017 -0800 -- content/documentation/programming-guide/index.html | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/7612eb22/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index 89f18e5..75a04e5 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -1332,8 +1332,8 @@ tree, [2] -https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/google_cloud_platform/bigquery.py";>Google BigQuery -https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/google_cloud_platform/datastore";>Google Cloud Datastore +https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py";>Google BigQuery +https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/gcp/datastore";>Google Cloud Datastore
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/77d285ff
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/77d285ff
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/77d285ff

Branch: refs/heads/asf-site
Commit: 77d285ff8445e6a9902741f21c7a63bcd07ff47e
Parents: dca566f
Author: Davor Bonaci
Authored: Wed Feb 22 13:33:58 2017 -0800
Committer: Davor Bonaci
Committed: Wed Feb 22 13:33:58 2017 -0800

--
 .../sdks/python-custom-io/index.html | 48 ++--
 content/documentation/sdks/python/index.html | 4 +-
 2 files changed, 26 insertions(+), 26 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/77d285ff/content/documentation/sdks/python-custom-io/index.html
--
diff --git a/content/documentation/sdks/python-custom-io/index.html b/content/documentation/sdks/python-custom-io/index.html
index c43f606..7592a4a 100644
--- a/content/documentation/sdks/python-custom-io/index.html
+++ b/content/documentation/sdks/python-custom-io/index.html
@@ -6,7 +6,7 @@
- Beam Custom Sources and Sinks for Python
+ Apache Beam: Creating New Sources and Sinks with the Python SDK
@@ -146,24 +146,24 @@
-Beam Custom Sources and Sinks for Python
+Creating New Sources and Sinks with the Python SDK
-The Beam SDK for Python provides an extensible API that you can use to create custom data sources and sinks. This tutorial shows how to create custom sources and sinks using Beam’s Source and Sink API (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py).
+The Apache Beam SDK for Python provides an extensible API that you can use to create new data sources and sinks. This tutorial shows how to create new sources and sinks using Beam’s Source and Sink API (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py).
- Create a custom source by extending the BoundedSource and RangeTracker interfaces.
- Create a custom sink by implementing the Sink and Writer classes.
+ Create a new source by extending the BoundedSource and RangeTracker interfaces.
+ Create a new sink by implementing the Sink and Writer classes.
-Why Create a Custom Source or Sink
+Why Create a New Source or Sink
-You’ll need to create a custom source or sink if you want your pipeline to read data from (or write data to) a storage system for which the Beam SDK for Python does not provide native support.
+You’ll need to create a new source or sink if you want your pipeline to read data from (or write data to) a storage system for which the Beam SDK for Python does not provide native support.
-In simple cases, you may not need to create a custom source or sink. For example, if you need to read data from an SQL database using an arbitrary query, none of the advanced Source API features would benefit you. Likewise, if you’d like to write data to a third-party API via a protocol that lacks deduplication support, the Sink API wouldn’t benefit you. In such cases it makes more sense to use a ParDo.
+In simple cases, you may not need to create a new source or sink. For example, if you need to read data from an SQL database using an arbitrary query, none of the advanced Source API features would benefit you. Likewise, if you’d like to write data to a third-party API via a protocol that lacks deduplication support, the Sink API wouldn’t benefit you. In such cases it makes more sense to use a ParDo.
-However, if you’d like to use advanced features such as dynamic splitting and size estimation, you should use Beam’s APIs and create a custom source or sink.
+However, if you’d like to use advanced features such as dynamic splitting and size estimation, you should use Beam’s APIs and create a new source or sink.
-Basic Code Requirements for Custom Sources and Sinks
+Basic Code Requirements for New Sources and Sinks
 Services use the classes you provide to read and/or write data using multiple worker instances in parallel. As such, the code you provide for Source and Sink subclasses must meet some basic requirements:
@@ -185,9 +185,9 @@
 You can use test harnesses and utility methods available in the source_test_utils module (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/source_test_utils.py) to develop tests for your source.
-Creating a Custom Source
+Creating a New Source
-You should create a custom source if you’d like to use the advanced features that the Source API provides:
+You should create a new source if you’d like to use the advanced features that the Source API provides:
 Dynamic splitting
@@ -198,9
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f0a4fde4
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f0a4fde4
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f0a4fde4

Branch: refs/heads/asf-site
Commit: f0a4fde4d376b18ca138767b6e67af149c8715d2
Parents: 46ca242
Author: Davor Bonaci
Authored: Wed Feb 22 13:28:49 2017 -0800
Committer: Davor Bonaci
Committed: Wed Feb 22 13:28:49 2017 -0800

--
 content/documentation/programming-guide/index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f0a4fde4/content/documentation/programming-guide/index.html
--
diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html
index 0aa0575..20cf8c5 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -1328,8 +1328,8 @@ tree, [2]
-Google BigQuery (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/bigquery.py)
-Google Cloud Datastore (https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/datastore)
+Google BigQuery (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/google_cloud_platform/bigquery.py)
+Google Cloud Datastore (https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/google_cloud_platform/datastore)
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/83a6e401
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/83a6e401
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/83a6e401

Branch: refs/heads/asf-site
Commit: 83a6e4011f7eae8003fe85a7c51cb824440cffa4
Parents: a7e8b60
Author: Ahmet Altay
Authored: Fri Feb 17 13:17:53 2017 -0800
Committer: Ahmet Altay
Committed: Fri Feb 17 13:17:53 2017 -0800

--
 .../sdks/python-custom-io/index.html | 613 +++
 content/documentation/sdks/python/index.html | 5 +
 2 files changed, 618 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/83a6e401/content/documentation/sdks/python-custom-io/index.html
--
diff --git a/content/documentation/sdks/python-custom-io/index.html b/content/documentation/sdks/python-custom-io/index.html
new file mode 100644
index 000..c43f606
--- /dev/null
+++ b/content/documentation/sdks/python-custom-io/index.html
@@ -0,0 +1,613 @@
+ Beam Custom Sources and Sinks for Python
+
+Beam Custom Sources and Sinks for Python
+
+The Beam SDK for Python provides an extensible API that you can use to create custom data sources and sinks. This tutorial shows how to create custom
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/575e4598
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/575e4598
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/575e4598

Branch: refs/heads/asf-site
Commit: 575e45987aa00dca16ee562441dd071362c873b1
Parents: 466edb3
Author: Davor Bonaci
Authored: Wed Feb 15 14:54:18 2017 -0800
Committer: Davor Bonaci
Committed: Wed Feb 15 14:54:18 2017 -0800

--
 content/blog/2017/02/13/stateful-processing.html | 19 +--
 .../runners/capability-matrix/index.html | 12 ++--
 content/feed.xml | 19 +--
 3 files changed, 32 insertions(+), 18 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/575e4598/content/blog/2017/02/13/stateful-processing.html
--
diff --git a/content/blog/2017/02/13/stateful-processing.html b/content/blog/2017/02/13/stateful-processing.html
index 833b2fd..a936f4b 100644
--- a/content/blog/2017/02/13/stateful-processing.html
+++ b/content/blog/2017/02/13/stateful-processing.html
@@ -328,7 +328,7 @@ unique and consistent.
 Before diving into the code for how to do this in a Beam SDK, I’ll go over this example from the level of the model. In pictures, you want to write a transform that maps input to output like this:
-
+
 The order of the elements A, B, C, D, E is arbitrary, hence their assigned indices are arbitrary, but downstream transforms just need to be OK with this.
@@ -400,10 +400,17 @@ key+window pairs, like this:
 keys and windows are independent dimensions)
 You can provide the opportunity for parallelism by making sure that table has
-enough columns, either via many keys in few windows - for example, a globally
-windowed stateful computation keyed by user ID - or via many windows over few
-keys - for example, a fixed windowed stateful computation over a global key.
-Caveat: all Beam runners today parallelize only over the key.
+enough columns. You might have many keys and many windows, or you might have
+many of just one or the other:
+
+
+ Many keys in few windows, for example a globally windowed stateful computation
+keyed by user ID.
+ Many windows over few keys, for example a fixed windowed stateful computation
+over a global key.
+
+
+Caveat: all Beam runners today parallelize only over the key.
 Most often your mental model of state can be focused on only a single column of the table, a single key+window pair. Cross-column interactions do not occur
@@ -610,7 +617,7 @@ outputs from the ParDo that will be proce
 output, then you cannot use a Filter transform to reduce data volume downstream.
 Stateful processing lets you address both the latency problem of side inputs
-and the cost problem of excessive uninterseting output. Here is the code, using
+and the cost problem of excessive uninteresting output. Here is the code, using
 only features I have already introduced:
 new DoFn, KV >() {

http://git-wip-us.apache.org/repos/asf/beam-site/blob/575e4598/content/documentation/runners/capability-matrix/index.html
--
diff --git a/content/documentation/runners/capability-matrix/index.html b/content/documentation/runners/capability-matrix/index.html
index 60f62b2..88da8eb 100644
--- a/content/documentation/runners/capability-matrix/index.html
+++ b/content/documentation/runners/capability-matrix/index.html
@@ -441,7 +441,7 @@
-Keyed State
+Stateful Processing
@@ -1353,7 +1353,7 @@
-Keyed State
+Stateful Processing
@@ -1362,22 +1362,22 @@
-Partially: non-merging windows. Keyed state is fully supported for non-merging windows.
+Partially: non-merging windows. State is supported for non-merging windows. SetState and MapState are not yet supported.
-Partially: streaming, non-merging windows. Keyed state is supported in streaming mode for non-merging windows.
+Partially: streaming, non-merging windows. State is supported in streaming mode for non-merging windows. SetState and MapState are not yet supported.
-No: not implemented. Spark supports keyed state with mapWithState() so support shuold be straight forward.
+No: not implemented. Spark supports per-key state with mapWithState() so support should be straightforward.
-No: not implemented. Apex supports keyed state, so adding support for this should be easy.
+No: not implemented. Apex supports per-key state, so adding support for this should be easy.

http://git-wip-us.apache.org/r
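
Editor's sketch for the stateful-processing hunk in this commit: the key+window table and the "assign an index to each element" example can be pictured in plain Python. This is not Beam API. The function name assign_indices and the sample tuples are hypothetical, and a dictionary stands in for the per-key, per-window state cell that a runner would manage on the pipeline's behalf.

```python
from collections import defaultdict


def assign_indices(elements):
    """Give each element an index that is unique within its (key, window) pair.

    `elements` is an iterable of (key, window, value) tuples. The counter
    dictionary plays the role of one state cell per key+window "column".
    """
    next_index = defaultdict(int)  # (key, window) -> next unused index
    indexed = []
    for key, window, value in elements:
        cell = (key, window)
        indexed.append((key, window, next_index[cell], value))
        next_index[cell] += 1
    return indexed


# Hypothetical input: two user keys, two windows.
elements = [
    ("user1", "w0", "A"),
    ("user2", "w0", "B"),
    ("user1", "w0", "C"),
    ("user1", "w1", "D"),
]
result = assign_indices(elements)
for row in result:
    print(row)
```

Each (key, window) cell counts independently, which is why separate cells could in principle be processed in parallel; as the post's caveat says, runners at the time parallelized only over the key.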
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/aa9b7fea
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/aa9b7fea
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/aa9b7fea

Branch: refs/heads/asf-site
Commit: aa9b7fea05f5e1177ff40f7b3977cdfb9ec0dd19
Parents: 8bc6392
Author: Ahmet Altay
Authored: Mon Feb 13 12:11:35 2017 -0800
Committer: Ahmet Altay
Committed: Mon Feb 13 12:11:35 2017 -0800

--
 content/documentation/programming-guide/index.html | 9 -
 content/get-started/wordcount-example/index.html | 4 ++--
 2 files changed, 6 insertions(+), 7 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/aa9b7fea/content/documentation/programming-guide/index.html
--
diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html
index f02fd40..0aa0575 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -515,7 +515,7 @@
 Inside your DoFn subclass, you’ll write a method annotated with @ProcessElement where you provide the actual processing logic. You don’t need to manually extract the elements from the input collection; the Beam SDKs handle that for you. Your @ProcessElement method should accept an object of type ProcessContext. The ProcessContext object gives you access to an input element and a method for emitting an output element:
-Inside your DoFn subclass, you’ll write a method process where you provide the actual processing logic. You don’t need to manually extract the elements from the input collection; the Beam SDKs handle that for you. Your process method should accept an object of type context. The context object gives you access to an input element and output is emitted by using yield or return statement inside process method.
+Inside your DoFn subclass, you’ll write a method process where you provide the actual processing logic. You don’t need to manually extract the elements from the input collection; the Beam SDKs handle that for you. Your process method should accept an object of type element. This is the input element and output is emitted by using yield or return statement inside process method.
 static class ComputeWordLengthFn extends DoFn{
 @ProcessElement
@@ -610,11 +610,11 @@
 Using GroupByKey
-GroupByKey is a Beam transform for processing collections of key/value pairs. It’s a parallel reduction operation, analagous to the Shuffle phase of a Map/Shuffle/Reduce-style algorithm. The input to GroupByKey is a collection of key/value pairs that represents a multimap, where the collection contains multiple pairs that have the same key, but different values. Given such a collection, you use GroupByKey to collect all of the values associated with each unique key.
+GroupByKey is a Beam transform for processing collections of key/value pairs. It’s a parallel reduction operation, analogous to the Shuffle phase of a Map/Shuffle/Reduce-style algorithm. The input to GroupByKey is a collection of key/value pairs that represents a multimap, where the collection contains multiple pairs that have the same key, but different values. Given such a collection, you use GroupByKey to collect all of the values associated with each unique key.
 GroupByKey is a good way to aggregate data that has something in common. For example, if you have a collection that stores records of customer orders, you might want to group together all the orders from the same postal code (wherein the “key” of the key/value pair is the postal code field, and the “value” is the remainder of the record).
-Let’s examine the mechanics of GroupByKey with a simple xample case, where our data set consists of words from a text file and the line number on which they appear. We want to group together all the line numbers (values) that share the same word (key), letting us see all the places in the text where a particular word appears.
+Let’s examine the mechanics of GroupByKey with a simple example case, where our data set consists of words from a text file and the line number on which they appear. We want to group together all the line numbers (values) that share the same word (key), letting us see all the places in the text where a particular word appears.
 Our input is a PCollection of key/value pairs where each word is a key, and the value is a line number in the file where the word appears. Here’s a list of the key/value pairs in the input collection:
@@ -1046,7 +1046,7 @@ tree, [2]
 # We can also pass side inputs to a ParDo transform, which will get pa
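
Editor's sketch for the GroupByKey paragraphs in this commit: the grouping semantics (a multimap of key/value pairs collapsed to one entry per unique key) can be shown in plain Python without the Beam SDK. The helper group_by_key and the sample (word, line number) pairs are hypothetical and only mirror the words-and-line-numbers example the guide describes; a real pipeline would apply the GroupByKey transform to a PCollection instead.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect every value that shares a key into one list per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# Hypothetical (word, line number) pairs.
pairs = [("cat", 1), ("dog", 5), ("and", 1), ("cat", 5), ("and", 2)]
grouped = group_by_key(pairs)
print(grouped)
```

The sketch runs sequentially on one machine; the point of the Beam transform is that the same grouping happens as a parallel shuffle across workers.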
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/e627b278
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/e627b278
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/e627b278

Branch: refs/heads/asf-site
Commit: e627b27880ea4b7159063de5f0eab1bdd59a511b
Parents: 2dd2c59
Author: Ahmet Altay
Authored: Fri Feb 10 12:05:21 2017 -0800
Committer: Ahmet Altay
Committed: Fri Feb 10 12:05:21 2017 -0800

--
 .../python-pipeline-dependencies/index.html | 316 +++
 content/documentation/sdks/python/index.html | 3 +
 2 files changed, 319 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/e627b278/content/documentation/sdks/python-pipeline-dependencies/index.html
--
diff --git a/content/documentation/sdks/python-pipeline-dependencies/index.html b/content/documentation/sdks/python-pipeline-dependencies/index.html
new file mode 100644
index 000..4107f5d
--- /dev/null
+++ b/content/documentation/sdks/python-pipeline-dependencies/index.html
@@ -0,0 +1,316 @@
+ Managing Python Pipeline Dependencies
+
+Managing Python Pipeline Dependencies
+
+ Note: This page is only applicable to runners that do remote execution.
+
+When you run y
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/6ebcb08c Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/6ebcb08c Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/6ebcb08c Branch: refs/heads/asf-site Commit: 6ebcb08cb503a3e58101fc73be11649116111c65 Parents: f277339 Author: Frances Perry Authored: Wed Feb 8 10:53:36 2017 -0800 Committer: Frances Perry Committed: Wed Feb 8 10:53:36 2017 -0800 -- .../documentation/programming-guide/index.html | 327 --- 1 file changed, 274 insertions(+), 53 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/6ebcb08c/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index dee4869..9830735 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -208,7 +208,7 @@ PCollection: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline. -Transform: A Transform represents a data processing operation, or a step, in your pipeline. Every Transform takes one or more PCollection objects as input, performs a processing function that you provide on the elements of that PCollection, and produces one or more output PCollection objects. +Transform: A Transform represents a data processing operation, or a step, in your pipeline. 
Every Transform takes one or more PCollection objects as input, perfroms a processing function that you provide on the elements of that PCollection, and produces one or more output PCollection objects. I/O Source and Sink: Beam provides Source and Sink APIs to represent reading and writing data, respectively. Source encapsulates the code necessary to read data into your Beam pipeline from some external source, such as cloud file storage or a subscription to a streaming data source. Sink likewise encapsulates the code necessary to write the elements of a PCollection to an external data sink. @@ -248,11 +248,13 @@ -from apache_beam.utils.pipeline_options import PipelineOptions - -# Will parse the arguments passed into the application and construct a PipelineOptions +# Will parse the arguments passed into the application and construct a PipelineOptions object. # Note that --help will print registered options. + +from apache_beam.utils.pipeline_options import PipelineOptions + p = beam.Pipeline(options=PipelineOptions()) + @@ -286,13 +288,8 @@ -import apache_beam as beam - -# Create the pipeline. -p = beam.Pipeline() +lines = p | 'ReadMyFile' >> beam.io.ReadFromText('gs://some/inputData.txt') -# Read the text file into a PCollection. -lines = p | 'ReadMyFile' >> beam.io.Read(beam.io.TextFileSource("protocol://path/to/some/inputData.txt")) @@ -327,20 +324,18 @@ -import apache_beam as beam +p = beam.Pipeline(options=pipeline_options) -# python list -lines = [ - "To be, or not to be: that is the question: ", - "Whether 'tis nobler in the mind to suffer ", - "The slings and arrows of outrageous fortune, ", - "Or to take arms against a sea of troubles, " -] +(p + | beam.Create([ + 'To be, or not to be: that is the question: ', + 'Whether \'tis nobler in the mind to suffer ', + 'The slings and arrows of outrageous fortune, ', + 'Or to take arms against a sea of troubles, ']) + | beam.io.WriteToText(my_options.output)) -# Create the pipeline. 
-p = beam.Pipeline() +result = p.run() -collection = p | 'ReadMyLines' >> beam.Create(lines) @@ -401,8 +396,8 @@ How you apply your pipeline’s transforms determines the structure of your pipeline. The best way to think of your pipeline is as a directed acyclic graph, where the nodes are PCollections and the edges are transforms. For example, you can chain transforms to create a sequential pipeline, like this one: [Final Output PCollection] = [Initial Input PCollection].apply([First Transform]) - .apply([Second Transform]) - .apply([Third Transform]) +.apply([Second Transform]) +.appl
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/6e3389ee Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/6e3389ee Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/6e3389ee Branch: refs/heads/asf-site Commit: 6e3389eec8befdb044ab7d48b78d6115b258f475 Parents: e347f09 Author: Davor Bonaci Authored: Wed Feb 8 10:28:46 2017 -0800 Committer: Davor Bonaci Committed: Wed Feb 8 10:28:46 2017 -0800 -- .../runners/capability-matrix/index.html| 32 ++-- 1 file changed, 16 insertions(+), 16 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/6e3389ee/content/documentation/runners/capability-matrix/index.html -- diff --git a/content/documentation/runners/capability-matrix/index.html b/content/documentation/runners/capability-matrix/index.html index 2217896..60f62b2 100644 --- a/content/documentation/runners/capability-matrix/index.html +++ b/content/documentation/runners/capability-matrix/index.html @@ -445,17 +445,17 @@ -✕ (https://issues.apache.org/jira/browse/BEAM-25";>BEAM-25) +✓ -✕ +~ -✕ +~ @@ -939,17 +939,17 @@ -✕ (https://issues.apache.org/jira/browse/BEAM-27";>BEAM-27) +✓ -✕ +~ -✕ +~ @@ -1357,27 +1357,27 @@ -No: storage per key, per window(https://issues.apache.org/jira/browse/BEAM-25";>BEAM-25)Allows fine-grained access to per-key, per-window persistent state. Necessary for certain use cases (e.g. high-volume windows which store large amounts of data, but typically only access small portions of it; complex state machines; etc.) that are not easily or efficiently addressed via Combine or GroupByKey+ParDo. +Yes: storage per key, per windowAllows fine-grained access to per-key, per-window persistent state. Necessary for certain use cases (e.g. high-volume windows which store large amounts of data, but typically only access small portions of it; complex state machines; etc.) 
that are not easily or efficiently addressed via Combine or GroupByKey+ParDo. -No: pending model supportDataflow already supports keyed state internally, so adding support for this should be easy once the Beam model exposes it. +Partially: non-merging windowsKeyed state is fully supported for non-merging windows. -No: pending model supportFlink already supports keyed state, so adding support for this should be easy once the Beam model exposes it. +Partially: streaming, non-merging windowsKeyed state is supported in streaming mode for non-merging windows. -No: pending model supportSpark supports keyed state with mapWithState() so support shuold be straight forward. +No: not implementedSpark supports keyed state with mapWithState() so support shuold be straight forward. -No: pending model supportApex supports keyed state, so adding support for this should be easy once the Beam model exposes it. +No: not implementedApex supports keyed state, so adding support for this should be easy. @@ -1851,27 +1851,27 @@ -No: delayed processing callbacks(https://issues.apache.org/jira/browse/BEAM-27";>BEAM-27)A fine-grained mechanism for performing work at some point in the future, in either the event-time or processing-time domain. Useful for orchestrating delayed events, timeouts, etc in complex state per-key, per-window state machines. +Yes: delayed processing callbacksA fine-grained mechanism for performing work at some point in the future, in either the event-time or processing-time domain. Useful for orchestrating delayed events, timeouts, etc in complex state per-key, per-window state machines. -No: pending model supportDataflow already supports timers internally, so adding support for this should be easy once the Beam model exposes it. +Partially: non-merging windowsDataflow supports timers in non-merging windows. -No: pending model supportFlink already supports timers internally, so adding support for this should be easy once the Beam model exposes it. 
+Partially: streaming, non-merging windowsThe Flink runner support timers in non-merging windows when run in streaming mode. -No: pending model support +No: not implemented -No: pending model support +No: not implemented
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/56642bb5
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/56642bb5
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/56642bb5

Branch: refs/heads/asf-site
Commit: 56642bb51a247e520034919be7137a78ee14a4e9
Parents: 23a524f
Author: Ahmet Altay
Authored: Fri Feb 3 19:04:13 2017 -0800
Committer: Ahmet Altay
Committed: Fri Feb 3 19:04:13 2017 -0800

--
 .../documentation/programming-guide/index.html | 18 +-
 .../sdks/python-type-safety/index.html | 361 +++
 content/documentation/sdks/python/index.html | 15 +-
 .../get-started/wordcount-example/index.html | 10 +-
 4 files changed, 387 insertions(+), 17 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/56642bb5/content/documentation/programming-guide/index.html
--
diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html
index b301096..2f2e03f 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -1035,9 +1035,9 @@ tree, [2]
 # The only change is that the first arguments are self and a context, rather than the PCollection element itself.
 class FilterUsingLength(beam.DoFn):
-  def process(self, context, lower_bound, upper_bound=float('inf')):
-    if lower_bound <= len(context.element) <= upper_bound:
-      yield context.element
+  def process(self, element, lower_bound, upper_bound=float('inf')):
+    if lower_bound <= len(element) <= upper_bound:
+      yield element
 small_words = words | beam.ParDo(FilterUsingLength(), 0, 3)
@@ -1166,17 +1166,17 @@ tree, [2]
 class ProcessWords(beam.DoFn):
-  def process(self, context, cutoff_length, marker):
-    if len(context.element) <= cutoff_length:
+  def process(self, element, cutoff_length, marker):
+    if len(element) <= cutoff_length:
       # Emit this short word to the main output.
-      yield context.element
+      yield element
     else:
       # Emit this word's long length to a side output.
       yield pvalue.SideOutputValue(
-          'above_cutoff_lengths', len(context.element))
-    if context.element.startswith(marker):
+          'above_cutoff_lengths', len(element))
+    if element.startswith(marker):
       # Emit this word to a different side output.
-      yield pvalue.SideOutputValue('marked strings', context.element)
+      yield pvalue.SideOutputValue('marked strings', element)
 # Side outputs are also available in Map and FlatMap.

http://git-wip-us.apache.org/repos/asf/beam-site/blob/56642bb5/content/documentation/sdks/python-type-safety/index.html
--
diff --git a/content/documentation/sdks/python-type-safety/index.html b/content/documentation/sdks/python-type-safety/index.html
new file mode 100644
index 000..40928cf
--- /dev/null
+++ b/content/documentation/sdks/python-type-safety/index.html
@@ -0,0 +1,361 @@
+ + + + + + + + + Ensuring Python Type Safety + + + + + https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";> + + + https://beam.apache.org/documentation/sdks/python-type-safety/"; data-proofer-ignore> + https://beam.apache.org/feed.xml";> + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ +(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + +ga('create', 'UA-73650088-1', 'auto'); +ga('send', 'pageview'); + + + + + + + + + + + + + + + +Toggle navigation + + + + + + + + + Get Started + + Beam Overview +Quickstart - Java +Quickstart - Python + + Example Walkthroughs + WordCount + Mobile Gaming + + Resources + Downloads + Support + + + + Documentation + + Using the Documentation + + Beam Concepts + Programming Guide + Additional Resources + + Pipeline Fundamentals + Design Your Pipeline + Cr
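The signature change in the programming-guide hunks of this commit (process receives the element itself instead of a context wrapper) can be sketched in plain Python. This is only an illustration under stated assumptions: it does not import Beam, FilterUsingLength is an ordinary class here rather than a beam.DoFn subclass, and run_dofn is a hypothetical helper (not a Beam API) that imitates a ParDo by calling process once per element and passing the extra arguments through.

```python
class FilterUsingLength:
    """Plain-Python stand-in for the DoFn in the diff (no Beam import)."""

    def process(self, element, lower_bound, upper_bound=float('inf')):
        # After the change, `element` is the value itself, not a context
        # object whose `.element` attribute has to be read.
        if lower_bound <= len(element) <= upper_bound:
            yield element


def run_dofn(dofn, elements, *args, **kwargs):
    """Hypothetical helper: call process() per element and flatten the outputs."""
    outputs = []
    for element in elements:
        outputs.extend(dofn.process(element, *args, **kwargs))
    return outputs


words = ['a', 'tree', 'of', 'pipelines']
# Mirrors `beam.ParDo(FilterUsingLength(), 0, 3)` from the diff:
# lower_bound=0, upper_bound=3.
small_words = run_dofn(FilterUsingLength(), words, 0, 3)
print(small_words)
```

The extra positional arguments (0 and 3) reach process unchanged, which is the same calling convention the diff shows for beam.ParDo; only the first parameter differs between the old and new signatures.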
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/debe8c2c
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/debe8c2c
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/debe8c2c

Branch: refs/heads/asf-site
Commit: debe8c2cedc871d4c1c5a1b3ba3a91c94436a1fb
Parents: da5ee69
Author: Davor Bonaci
Authored: Thu Feb 2 16:06:38 2017 -0800
Committer: Davor Bonaci
Committed: Thu Feb 2 16:06:38 2017 -0800

--
 content/documentation/programming-guide/index.html | 2 +-
 content/get-started/wordcount-example/index.html | 17 -
 2 files changed, 9 insertions(+), 10 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/debe8c2c/content/documentation/programming-guide/index.html
--
diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html
index d6ccda9..b301096 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -208,7 +208,7 @@
 PCollection: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline.
-Transform: A Transform represents a data processing operation, or a step, in your pipeline. Every Transform takes one or more PCollection objects as input, perfroms a processing function that you provide on the elements of that PCollection, and produces one or more output PCollection objects.
+Transform: A Transform represents a data processing operation, or a step, in your pipeline. Every Transform takes one or more PCollection objects as input, performs a processing function that you provide on the elements of that PCollection, and produces one or more output PCollection objects.
 I/O Source and Sink: Beam provides Source and Sink APIs to represent reading and writing data, respectively. Source encapsulates the code necessary to read data into your Beam pipeline from some external source, such as cloud file storage or a subscription to a streaming data source. Sink likewise encapsulates the code necessary to write the elements of a PCollection to an external data sink.

http://git-wip-us.apache.org/repos/asf/beam-site/blob/debe8c2c/content/get-started/wordcount-example/index.html
--
diff --git a/content/get-started/wordcount-example/index.html b/content/get-started/wordcount-example/index.html
index 1cc4f9a..eb4deec 100644
--- a/content/get-started/wordcount-example/index.html
+++ b/content/get-started/wordcount-example/index.html
@@ -556,13 +556,12 @@ Figure 1: The pipeline data flow.
 class FilterTextFn(beam.DoFn):
   """A DoFn that filters for a specific key based on a regular expression."""
-  # A custom aggregator can track values in your pipeline as it runs. Create
-  # custom aggregators matched_word and unmatched_words.
-  matched_words = beam.Aggregator('matched_words')
-  umatched_words = beam.Aggregator('umatched_words')
-
   def __init__(self, pattern):
     self.pattern = pattern
+    # A custom metric can track values in your pipeline as it runs. Create
+    # custom metrics matched_word and unmatched_words.
+    self.matched_words = Metrics.counter(self.__class__, 'matched_words')
+    self.umatched_words = Metrics.counter(self.__class__, 'umatched_words')
   def process(self, context):
     word, _ = context.element
@@ -572,8 +571,8 @@ Figure 1: The pipeline data flow.
       # Logging UI.
       logging.info('Matched %s', word)
-      # Add 1 to the custom aggregator matched_words
-      context.aggregate_to(self.matched_words, 1)
+      # Add 1 to the custom metric counter matched_words
+      self.matched_words.inc()
       yield context.element
     else:
       # Log at the "DEBUG" level each element that is not matched. Different
@@ -583,8 +582,8 @@ Figure 1: The pipeline data flow.
       # Logger. This log message will not be visible in the Cloud Logger.
       logging.debug('Did not match %s', word)
-      # Add 1 to the custom aggregator umatched_words
-      context.aggregate_to(self.umatched_words, 1)
+      # Add 1 to the custom metric counter umatched_words
+      self.umatched_words.inc()
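The wordcount hunks in this commit move from class-level aggregators updated through the context to per-instance counters created in __init__ and bumped with inc(). A rough plain-Python sketch of that pattern follows. It rests on several assumptions: FakeCounter is a hypothetical stand-in and not the Beam Metrics API, process takes the element directly as a simplification (the diff still passes a context), and the counter name is spelled unmatched_words in full here.

```python
import re


class FakeCounter:
    """Hypothetical stand-in for a metrics counter; not a Beam class."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.value = 0

    def inc(self, n=1):
        self.value += n


class FilterTextFn:
    """Plain-Python version of the filter in the diff (no Beam import)."""

    def __init__(self, pattern):
        self.pattern = pattern
        # Counters are created per instance in __init__, as in the new code,
        # rather than as class attributes.
        self.matched_words = FakeCounter(self.__class__, 'matched_words')
        self.unmatched_words = FakeCounter(self.__class__, 'unmatched_words')

    def process(self, element):
        word, _ = element
        if re.match(self.pattern, word):
            # Replaces context.aggregate_to(self.matched_words, 1).
            self.matched_words.inc()
            yield element
        else:
            self.unmatched_words.inc()


fn = FilterTextFn(r'Flourish|stomach')
pairs = [('Flourish', 3), ('king', 10), ('stomach', 1)]
kept = [out for pair in pairs for out in fn.process(pair)]
print(kept, fn.matched_words.value, fn.unmatched_words.value)
```

One consequence of the new shape is that the counter update no longer needs the context object at all, so process depends on it only for reading the element.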
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/5387243d Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/5387243d Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/5387243d Branch: refs/heads/asf-site Commit: 5387243d0dc6057fbf5854c3d26a13c2832efa95 Parents: b76e36a Author: Davor Bonaci Authored: Wed Feb 1 11:32:41 2017 -0800 Committer: Davor Bonaci Committed: Wed Feb 1 11:32:41 2017 -0800 -- .../blog/2017/02/01/graduation-media-recap.html | 235 +++ content/blog/index.html | 22 ++ content/feed.xml| 93 +--- content/index.html | 4 +- 4 files changed, 317 insertions(+), 37 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/5387243d/content/blog/2017/02/01/graduation-media-recap.html -- diff --git a/content/blog/2017/02/01/graduation-media-recap.html b/content/blog/2017/02/01/graduation-media-recap.html new file mode 100644 index 000..f3176ac --- /dev/null +++ b/content/blog/2017/02/01/graduation-media-recap.html @@ -0,0 +1,235 @@ + + + + + + + + + Media recap of the Apache Beam graduation + + + + + https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";> + + + https://beam.apache.org/blog/2017/02/01/graduation-media-recap.html"; data-proofer-ignore> + https://beam.apache.org/feed.xml";> + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ +(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + +ga('create', 'UA-73650088-1', 'auto'); +ga('send', 'pageview'); + + + + + + + + + + + + + + + +Toggle navigation + + + + + + + + + Get Started + + Beam Overview +Quickstart - Java +Quickstart - Python + + Example Walkthroughs + WordCount + Mobile Gaming + + Resources + Downloads + Support + + + + Documentation + + Using 
the Documentation + + Beam Concepts + Programming Guide + Additional Resources + + Pipeline Fundamentals + Design Your Pipeline + Create Your Pipeline + Test Your Pipeline + + SDKs + Java SDK + Java SDK API Reference + +Python SDK + + Runners + Capability Matrix + Direct Runner + Apache Apex Runner + Apache Flink Runner + Apache Spark Runner + Cloud Dataflow Runner + + + + Contribute + + Get Started Contributing + +Guides + Contribution Guide +Testing Guide +Release Guide +PTransform Style Guide + +Technical References +Design Principles + Ongoing Projects +Source Repository + + Promotion +Presentation Materials +Logos and Design + +Maturity Model +Team + + + +Blog + + + + https://www.apache.org/foundation/press/kit/feather_small.png"; alt="Apache Logo" style="height:24px;">Apache Software Foundation + +http://www.apache.org/";>ASF Homepage +http://www.apache.org/licenses/";>License +http://www.apache.org/security/";>Security +http://www.apache.org/foundation/thanks.html";>Thanks +http://www.apache.org/foundation/sponsorship.html";>Sponsorship +https://www.apache.org/foundation/policies/conduct";>Code of Conduct + + + + + + + + + + + + + + + + +http://schema.org/BlogPosting";> + + +Media recap
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/66c0a704
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/66c0a704
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/66c0a704

Branch: refs/heads/asf-site
Commit: 66c0a7042272d8e0f8944b5c8c655733a04a42ae
Parents: f67cdd1
Author: Davor Bonaci
Authored: Tue Jan 31 14:38:28 2017 -0800
Committer: Davor Bonaci
Committed: Tue Jan 31 14:38:28 2017 -0800

--
 content/contribute/team/index.html | 27 +++
 1 file changed, 27 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/66c0a704/content/contribute/team/index.html
--
diff --git a/content/contribute/team/index.html b/content/contribute/team/index.html
index 3281eb3..47dcc98 100644
--- a/content/contribute/team/index.html
+++ b/content/contribute/team/index.html
@@ -177,6 +177,15 @@
+ Ahmet Altay
+ altay
+ altay [at] apache [dot] org
+ Google
+ committer
+ -8
+
+
+
 Jesse Anderson
 jesseanderson
 jesseanderson [at] apache [dot] org
@@ -249,6 +258,15 @@
+ Pei He
+ pei
+ pei [at] apache [dot] org
+ Google
+ committer
+ -8
+
+
+
 Kenneth Knowles
 kenn
 kenn [at] apache [dot] org
@@ -267,6 +285,15 @@
+ Stas Levin
+ staslevin
+ staslevin [at] apache [dot] org
+ PayPal
+ committer
+ +2
+
+
+
 Maximilian Michels
 mxm
 mxm [at] apache [dot] org
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/e8cb676b Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/e8cb676b Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/e8cb676b Branch: refs/heads/asf-site Commit: e8cb676b0f5f7c4f531f8ad93700a0951a0c791a Parents: f9eb9fc Author: Davor Bonaci Authored: Mon Jan 30 23:08:19 2017 -0800 Committer: Davor Bonaci Committed: Mon Jan 30 23:08:19 2017 -0800 -- .../2016/10/11/strata-hadoop-world-and-beam.html| 2 +- content/contribute/work-in-progress/index.html | 6 -- content/documentation/programming-guide/index.html | 16 content/documentation/runners/dataflow/index.html | 2 +- content/documentation/runners/direct/index.html | 4 ++-- content/documentation/runners/flink/index.html | 2 +- content/feed.xml| 2 +- content/get-started/quickstart-py/index.html| 4 ++-- 8 files changed, 16 insertions(+), 22 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/e8cb676b/content/beam/update/2016/10/11/strata-hadoop-world-and-beam.html -- diff --git a/content/beam/update/2016/10/11/strata-hadoop-world-and-beam.html b/content/beam/update/2016/10/11/strata-hadoop-world-and-beam.html index a02c380..defada4 100644 --- a/content/beam/update/2016/10/11/strata-hadoop-world-and-beam.html +++ b/content/beam/update/2016/10/11/strata-hadoop-world-and-beam.html @@ -166,7 +166,7 @@ The Data Engineers are looking to Beam as a way to https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code";>future-proof, meaning that code is portable between the various Big Data frameworks. In fact, many of the attendees were still on Hadoop MapReduce and looking to transition to a new framework. They’re realizing that continually rewriting code isn’t the most productive approach. -Data Scientists are really interested in using Beam. They interested in having a single API for doing analysis instead of several different APIs.
We talked about Beam’s progress on the Python API. If you want to take a peek, it’s being actively developed on a https://github.com/apache/beam/tree/python-sdk";>feature branch. As Beam matures, we’re looking to add other supported languages. +Data Scientists are really interested in using Beam. They interested in having a single API for doing analysis instead of several different APIs. We talked about Beam’s progress on the Python API. If you want to take a peek, it’s being actively developed on a https://github.com/apache/beam/tree/master/sdks/python";>feature branch. As Beam matures, we’re looking to add other supported languages. We heard https://twitter.com/jessetanderson/status/781124173108305920";>loud and clear from Beam users that great runner support is crucial to adoption. We have great Apache Flink support. During the conference we had some more volunteers offer their help on the Spark runner. http://git-wip-us.apache.org/repos/asf/beam-site/blob/e8cb676b/content/contribute/work-in-progress/index.html -- diff --git a/content/contribute/work-in-progress/index.html b/content/contribute/work-in-progress/index.html index 992cf46..caa00d5 100644 --- a/content/contribute/work-in-progress/index.html +++ b/content/contribute/work-in-progress/index.html @@ -182,12 +182,6 @@ https://github.com/apache/beam/blob/gearpump-runner/runners/gearpump/README.md";>README - Python SDK - https://github.com/apache/beam/tree/python-sdk";>python-sdk - https://issues.apache.org/jira/browse/BEAM/component/12328910";>sdk-py - https://github.com/apache/beam/blob/python-sdk/sdks/python/README.md";>README - - Apache Spark 2.0 Runner https://github.com/apache/beam/tree/runners-spark2";>runners-spark2 - http://git-wip-us.apache.org/repos/asf/beam-site/blob/e8cb676b/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index 80eee5f..bee08bc 100644 ---
a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -229,13 +229,13 @@ Creating the pipeline -The Pipeline abstraction encapsulates all the data and steps in your data processing task. Your Beam driver program typically starts by constructing a Pipelinehttps://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/pipeline.py";>Pipeline object, and then using that object as t
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/38c9a1e6
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/38c9a1e6
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/38c9a1e6

Branch: refs/heads/asf-site
Commit: 38c9a1e6f06e86bb0b8406c7b2688b4598ad29a7
Parents: f911065
Author: Davor Bonaci
Authored: Mon Jan 30 09:14:01 2017 -0800
Committer: Davor Bonaci
Committed: Mon Jan 30 09:14:01 2017 -0800

--
 content/contribute/release-guide/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/38c9a1e6/content/contribute/release-guide/index.html
--
diff --git a/content/contribute/release-guide/index.html b/content/contribute/release-guide/index.html
index 624df88..5ef3dfd 100644
--- a/content/contribute/release-guide/index.html
+++ b/content/contribute/release-guide/index.html
@@ -580,7 +580,7 @@ The complete staging area is available for your review, which includes:
 * source code tag "v1.2.3-RC3" [5],
 * website pull request listing the release and publishing the API reference manual [6].
-The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PPMC affirmative votes.
+The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.
 Thanks,
 Release Manager
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/491b9071
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/491b9071
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/491b9071

Branch: refs/heads/asf-site
Commit: 491b90710b0645af49b23e8cdc92395ff98dad9d
Parents: 65dbaab
Author: Davor Bonaci
Authored: Sat Jan 28 10:24:52 2017 -0800
Committer: Davor Bonaci
Committed: Sat Jan 28 10:24:52 2017 -0800

--
 content/get-started/beam-overview/index.html | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/491b9071/content/get-started/beam-overview/index.html
--
diff --git a/content/get-started/beam-overview/index.html b/content/get-started/beam-overview/index.html
index c864a68..178ff1e 100644
--- a/content/get-started/beam-overview/index.html
+++ b/content/get-started/beam-overview/index.html
@@ -177,7 +177,7 @@
 Apache Beam Pipeline Runners
-The Beam Pipeline Runners translate the data processing pipeline you define with your Beam program into the API compatible with the distributed processing back-end of your choice. When you run your Beam program, you’ll need to specify the appropriate runner for the back-end where you want to execute your pipeline.
+The Beam Pipeline Runners translate the data processing pipeline you define with your Beam program into the API compatible with the distributed processing back-end of your choice. When you run your Beam program, you’ll need to specify an appropriate runner for the back-end where you want to execute your pipeline.
 Beam currently supports Runners that work with the following distributed processing back-ends:
@@ -188,19 +188,19 @@
 Apache Apex
- In Development
+ Active Development
 Apache Flink
- In Development
+ Active Development
 Apache Spark
- In Development
+ Active Development
 Google Cloud Dataflow
- In Development
+ Active Development
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/eaca9f54 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/eaca9f54 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/eaca9f54 Branch: refs/heads/asf-site Commit: eaca9f54181b93be18282835398eba9c99fb530f Parents: df96c7d Author: Davor Bonaci Authored: Tue Jan 10 03:03:09 2017 -0800 Committer: Davor Bonaci Committed: Tue Jan 10 03:03:09 2017 -0800 -- content/blog/2017/01/10/beam-graduates.html | 242 ++ content/blog/index.html | 20 + content/contribute/maturity-model/index.html | 4 +- content/feed.xml | 864 ++ content/index.html | 4 +- 5 files changed, 334 insertions(+), 800 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/eaca9f54/content/blog/2017/01/10/beam-graduates.html -- diff --git a/content/blog/2017/01/10/beam-graduates.html b/content/blog/2017/01/10/beam-graduates.html new file mode 100644 index 000..a7fcbef --- /dev/null +++ b/content/blog/2017/01/10/beam-graduates.html @@ -0,0 +1,242 @@ + + + + + + + + + Apache Beam established as a new top-level project + + + + + https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";> + + + http://beam.apache.org/blog/2017/01/10/beam-graduates.html"; data-proofer-ignore> + http://beam.apache.org/feed.xml";> + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ +(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + +ga('create', 'UA-73650088-1', 'auto'); +ga('send', 'pageview'); + + + + + + + + + + + + + + + +Toggle navigation + + + + + + + + + Get Started + + Beam Overview + Quickstart + + Example Walkthroughs + WordCount + Mobile Gaming + + Resources + Downloads + Support + + + + Documentation + + Using the 
Documentation + + Beam Concepts + Programming Guide + Additional Resources + + Pipeline Fundamentals + Design Your Pipeline + Create Your Pipeline + Test Your Pipeline + + SDKs + Java SDK + Java SDK API Reference + +Python SDK + + Runners + Capability Matrix + Direct Runner + Apache Apex Runner + Apache Flink Runner + Apache Spark Runner + Cloud Dataflow Runner + + + + Contribute + + Get Started Contributing + +Guides + Contribution Guide +Testing Guide +Release Guide + +Technical References +Design Principles + Ongoing Projects +Source Repository + + Promotion +Presentation Materials +Logos and Design + +Maturity Model +Team + + + +Blog + + + + https://www.apache.org/foundation/press/kit/feather_small.png"; alt="Apache Logo" style="height:24px;">Apache Software Foundation + +http://www.apache.org/";>ASF Homepage +http://www.apache.org/licenses/";>License +http://www.apache.org/security/";>Security +http://www.apache.org/foundation/thanks.html";>Thanks +http://www.apache.org/foundation/sponsorship.html";>Sponsorship +https://www.apache.org/foundation/policies/conduct";>Code of Conduct + + + + + + + + + + + + + + + + +http://schema.org/BlogPosting";> + + +Apache Beam established as a new top-level project +Jan
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/ac3b3d8d Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/ac3b3d8d Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/ac3b3d8d Branch: refs/heads/asf-site Commit: ac3b3d8d3ac20a20a33e568c885bae43bb406925 Parents: 61d1fd6 Author: Davor Bonaci Authored: Tue Jan 10 01:57:15 2017 -0800 Committer: Davor Bonaci Committed: Tue Jan 10 01:57:15 2017 -0800 -- .../beam/capability/2016/03/17/capability-matrix.html | 13 + .../capability/2016/04/03/presentation-materials.html | 13 + .../python/sdk/2016/02/25/python-sdk-now-public.html | 13 + content/beam/release/2016/06/15/first-release.html | 13 + .../2016/10/11/strata-hadoop-world-and-beam.html | 13 + .../update/website/2016/02/22/beam-has-a-logo.html | 13 + content/blog/2016/05/18/splitAtFraction-method.html| 13 + .../2016/05/27/where-is-my-pcollection-dot-map.html| 13 + .../blog/2016/06/13/flink-batch-runner-milestone.html | 13 + content/blog/2016/08/03/six-months.html| 13 + content/blog/2016/10/20/test-stream.html | 13 + content/blog/2017/01/09/added-apex-runner.html | 13 + content/blog/index.html| 13 + content/coming-soon.html | 13 + content/contribute/contribution-guide/index.html | 13 + content/contribute/design-principles/index.html| 13 + content/contribute/index.html | 13 + content/contribute/logos/index.html| 13 + content/contribute/maturity-model/index.html | 13 + content/contribute/presentation-materials/index.html | 13 + content/contribute/release-guide/index.html| 13 + content/contribute/source-repository/index.html| 13 + content/contribute/team/index.html | 13 + content/contribute/testing/index.html | 13 + content/contribute/work-in-progress/index.html | 13 + content/documentation/index.html | 13 + .../pipelines/create-your-pipeline/index.html | 13 + .../pipelines/design-your-pipeline/index.html | 13 + .../pipelines/test-your-pipeline/index.html| 13 
+ content/documentation/programming-guide/index.html | 13 + content/documentation/resources/index.html | 13 + content/documentation/runners/apex/index.html | 13 + .../documentation/runners/capability-matrix/index.html | 13 + content/documentation/runners/dataflow/index.html | 13 + content/documentation/runners/direct/index.html| 13 + content/documentation/runners/flink/index.html | 13 + content/documentation/runners/spark/index.html | 13 + content/documentation/sdks/java/index.html | 13 + content/documentation/sdks/python/index.html | 13 + content/get-started/beam-overview/index.html | 13 + content/get-started/downloads/index.html | 13 + content/get-started/index.html | 13 + content/get-started/mobile-gaming-example/index.html | 13 + content/get-started/quickstart/index.html | 13 + content/get-started/support/index.html | 13 + content/get-started/wordcount-example/index.html | 13 + content/index.html | 13 + content/privacy_policy/index.html | 13 + 48 files changed, 432 insertions(+), 192 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/ac3b3d8d/content/beam/capability/2016/03/17/capability-matrix.html -- diff --git a/content/beam/capability/2016/03/17/capability-matrix.html b/content/beam/capability/2016/03/17/capability-matrix.html index 7c19748..285b4b4 100644 --- a/content/beam/capability/2016/03/17/capability-matrix.html +++ b/content/beam/capability/2016/03/17/capability-matrix.html @@ -949,10 +949,15 @@ - © Copyright 2016 -http://www.apache.org";>The Apache Software Foundation. All Rights Reserved. -Privacy
[2/3] beam-site git commit: Regenerate website
Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/8e9900da
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/8e9900da
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/8e9900da

Branch: refs/heads/asf-site
Commit: 8e9900dab746c0ce7ffc490452b5d1effa855fcd
Parents: add05e1
Author: Davor Bonaci
Authored: Tue Jan 10 01:05:30 2017 -0800
Committer: Davor Bonaci
Committed: Tue Jan 10 01:05:30 2017 -0800

--
 content/index.html | 10 --
 1 file changed, 10 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/beam-site/blob/8e9900da/content/index.html
--
diff --git a/content/index.html b/content/index.html
index 54682f7..1dc94cd 100644
--- a/content/index.html
+++ b/content/index.html
@@ -184,16 +184,6 @@
 May 27, 2016 - Where's my PCollection.map()?
-May 18, 2016 - Dynamic work rebalancing for Beam
-
-Apr 3, 2016 - Apache Beam Presentation Materials
-
-Mar 17, 2016 - Clarifying & Formalizing Runner Capabilities
-
-Feb 25, 2016 - Dataflow Python SDK is now public!
-
-Feb 22, 2016 - Apache Beam has a logo!
-
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/bb97dd38 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/bb97dd38 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/bb97dd38 Branch: refs/heads/asf-site Commit: bb97dd381e7420ce08580770ddb0580b467b37c7 Parents: 578e1ac Author: Davor Bonaci Authored: Mon Jan 9 18:03:28 2017 -0800 Committer: Davor Bonaci Committed: Mon Jan 9 18:03:28 2017 -0800 -- .../contribute/contribution-guide/index.html| 36 ++-- content/contribute/testing/index.html | 9 +++-- 2 files changed, 22 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/bb97dd38/content/contribute/contribution-guide/index.html -- diff --git a/content/contribute/contribution-guide/index.html b/content/contribute/contribution-guide/index.html index 1c77574..667bc8a 100644 --- a/content/contribute/contribution-guide/index.html +++ b/content/contribute/contribution-guide/index.html @@ -149,7 +149,7 @@ Engage Mailing list(s) - Apache JIRA + JIRA issue tracker Design @@ -207,7 +207,7 @@ -The Apache Beam community welcomes contributions from anyone with a passion for data processing! Beam has many different opportunities for contributions – write new examples, add new user-facing libraries (new statistical libraries, new IO connectors, etc), work on the core programming model, build specific runners (Apache Flink, Apache Spark, Google Cloud Dataflow, etc), or participate on the documentation effort. +The Apache Beam community welcomes contributions from anyone with a passion for data processing! Beam has many different opportunities for contributions – write new examples, add new user-facing libraries (new statistical libraries, new IO connectors, etc), work on the core programming model, build specific runners (Apache Apex, Apache Flink, Apache Spark, Google Cloud Dataflow, etc), or participate on the documentation effort.
We use a review-then-commit workflow in Beam for all contributions. @@ -233,12 +233,12 @@ Engage Mailing list(s) -We discuss design and implementation issues on d...@beam.apache.org mailing list, which is archived https://lists.apache.org/list.html?d...@beam.apache.org";>here. Join by emailing mailto:dev-subscr...@beam.apache.org";>dev-subscr...@beam.apache.org. +We discuss design and implementation issues on the d...@beam.apache.org mailing list, which is archived https://lists.apache.org/list.html?d...@beam.apache.org";>here. Join by emailing mailto:dev-subscr...@beam.apache.org";>dev-subscr...@beam.apache.org. -If interested, you can also join the other mailing lists too. +If interested, you can also join the other mailing lists. -Apache JIRA -We use https://issues.apache.org/jira/browse/BEAM";>Apache JIRA as an issue tracking and project management tool, as well as a way to communicate among a very diverse and distributed set of contributors. To be able to gather feedback, avoid frustration, and avoid duplicated efforts all Beam-related work should be tracked there. +JIRA issue tracker +We use the Apache Software Foundation’s https://issues.apache.org/jira/browse/BEAM";>JIRA as an issue tracking and project management tool, as well as a way to communicate among a very diverse and distributed set of contributors. To be able to gather feedback, avoid frustration, and avoid duplicated efforts all Beam-related work should be tracked there. If you do not already have an Apache JIRA account, sign up https://issues.apache.org/jira/";>here. @@ -264,7 +264,7 @@ [Potentially] Submit Contributor License Agreement Apache Software Foundation (ASF) desires that all contributors of ideas, code, or documentation to the Apache projects complete, sign, and submit an https://www.apache.org/licenses/icla.txt";>Individual Contributor License Agreement (ICLA).
The purpose of this agreement is to clearly define the terms under which intellectual property has been contributed to the ASF and thereby allow us to defend the project should there be a legal dispute regarding the software at some future time. -We require you to have an ICLA on file with the Apache Secretary for larger contributions only. For smaller ones, however, we rely on http://www.apache.org/licenses/LICENSE-2.0#contributions";>clause five of the Apache License, Version 2.0, describing licensing of intentionally submitted contributions and do not require an ICLA in that case. +We require you to have an ICLA on file with the Apache Secretary for larger contributions only. For smaller ones, however, we rely on http://www.apache.org/licenses/LICENSE-2.0#
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/e7547825 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/e7547825 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/e7547825 Branch: refs/heads/asf-site Commit: e75478257414633893bd6037176fde9cf38c772b Parents: 430a84c Author: Davor Bonaci Authored: Mon Jan 9 17:37:52 2017 -0800 Committer: Davor Bonaci Committed: Mon Jan 9 17:37:52 2017 -0800 -- content/blog/2016/01/08/added-apex-runner.html | 211 content/blog/index.html| 16 ++ content/index.html | 2 + 3 files changed, 229 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/e7547825/content/blog/2016/01/08/added-apex-runner.html -- diff --git a/content/blog/2016/01/08/added-apex-runner.html b/content/blog/2016/01/08/added-apex-runner.html new file mode 100644 index 000..abb22bb --- /dev/null +++ b/content/blog/2016/01/08/added-apex-runner.html @@ -0,0 +1,211 @@ + + + + + + + + + Release 0.4.0 adds a runner for Apache Apex + + + + + https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";> + + + http://beam.apache.org/blog/2016/01/08/added-apex-runner.html"; data-proofer-ignore> + http://beam.apache.org/feed.xml";> + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ +(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + +ga('create', 'UA-73650088-1', 'auto'); +ga('send', 'pageview'); + + + + + + + + + + + + + + + +Toggle navigation + + + + + + + + + Get Started + + Beam Overview + Quickstart + + Example Walkthroughs + WordCount + Mobile Gaming + + Resources + Downloads + Support + + + + Documentation + + Using the Documentation + + Beam Concepts + Programming Guide + Additional Resources + + Pipeline 
Fundamentals + Design Your Pipeline + Create Your Pipeline + Test Your Pipeline + + SDKs + Java SDK + Java SDK API Reference + +Python SDK + + Runners + Capability Matrix + Direct Runner + Apache Apex Runner + Apache Flink Runner + Apache Spark Runner + Cloud Dataflow Runner + + + + Contribute + + Get Started Contributing + +Guides + Contribution Guide +Testing Guide +Release Guide + +Technical References +Design Principles + Ongoing Projects +Source Repository + + Promotion +Presentation Materials +Logos and Design + +Maturity Model +Team + + + +Blog + + + + https://www.apache.org/foundation/press/kit/feather_small.png"; alt="Apache Logo" style="height:24px;">Apache Software Foundation + +http://www.apache.org/";>ASF Homepage +http://www.apache.org/licenses/";>License +http://www.apache.org/security/";>Security +http://www.apache.org/foundation/thanks.html";>Thanks +http://www.apache.org/foundation/sponsorship.html";>Sponsorship +https://www.apache.org/foundation/policies/conduct";>Code of Conduct + + + + + + + + + + + + + + + + +http://schema.org/BlogPosting";> + + +Release 0.4.0 adds a runner for Apache Apex +Jan 8, 2016 ⢠Thomas Weise [https://twitter.com/thweise";>@thweise] + + + + +The latest release 0.4.0 of https://beam.apac
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f12cc8c3 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f12cc8c3 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f12cc8c3 Branch: refs/heads/asf-site Commit: f12cc8c3632840d7b4158bd2cd1d0567a65f2742 Parents: c6d0390 Author: Davor Bonaci Authored: Thu Dec 29 11:31:32 2016 -0800 Committer: Davor Bonaci Committed: Thu Dec 29 11:31:32 2016 -0800 -- content/.htaccess | 15 +++ 1 file changed, 15 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/f12cc8c3/content/.htaccess -- diff --git a/content/.htaccess b/content/.htaccess new file mode 100644 index 000..06fc74b --- /dev/null +++ b/content/.htaccess @@ -0,0 +1,15 @@ +RewriteEngine On + +# This is a 301 (permanent) redirect from HTTP to HTTPS. + +# The next rule applies conditionally: +# * the host is "beam.apache.org", +# * the host comparison is case insensitive (NC), +# * HTTPS is not used. +RewriteCond %{HTTP_HOST} ^beam\.apache\.org [NC] +RewriteCond %{HTTPS} !on + +# Rewrite the URL as follows: +# * Redirect (R) permanently (301) to https://beam.apache.org/, +# * Stop processing more rules (L). +RewriteRule ^(.*)$ https://beam.apache.org/$1 [L,R=301]
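The redirect logic described by the comments in the .htaccess file above can be sketched in Python as a rough model. This is only an illustration under stated assumptions: the function name `redirect_target` is hypothetical, the `path` argument is assumed to be the per-directory path as mod_rewrite would see it (no leading slash), and the sketch covers just the two conditions and one rule in this commit, not mod_rewrite's full semantics.

```python
import re
from typing import Optional


def redirect_target(host: str, https_on: bool, path: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None when the rule does not apply.

    Hypothetical helper that mirrors the committed rules, not Apache itself.
    """
    # RewriteCond %{HTTP_HOST} ^beam\.apache\.org [NC]
    # The pattern is anchored at the start only and compared case-insensitively.
    host_matches = re.match(r"^beam\.apache\.org", host, re.IGNORECASE) is not None

    # RewriteCond %{HTTPS} !on
    if host_matches and not https_on:
        # RewriteRule ^(.*)$ https://beam.apache.org/$1 [L,R=301]
        return "https://beam.apache.org/" + path

    # Either the host differs or HTTPS is already in use: no redirect.
    return None
```

For example, a plain-HTTP request for `blog/` on host `BEAM.apache.org` would map to `https://beam.apache.org/blog/`, while the same request over HTTPS would be left alone.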
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/6682c44a Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/6682c44a Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/6682c44a Branch: refs/heads/asf-site Commit: 6682c44abefa98697cfd91442f50218f04faf033 Parents: b2ae6a9 Author: Davor Bonaci Authored: Wed Dec 28 18:42:20 2016 -0800 Committer: Davor Bonaci Committed: Wed Dec 28 18:42:20 2016 -0800 -- content/contribute/maturity-model/index.html | 4 +- content/contribute/release-guide/index.html | 88 --- content/get-started/downloads/index.html | 6 +- 3 files changed, 19 insertions(+), 79 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/6682c44a/content/contribute/maturity-model/index.html -- diff --git a/content/contribute/maturity-model/index.html b/content/contribute/maturity-model/index.html index 10b68a0..be18b0f 100644 --- a/content/contribute/maturity-model/index.html +++ b/content/contribute/maturity-model/index.html @@ -237,7 +237,7 @@ RE10 Releases consist of source code, distributed using standard and open archive formats that are expected to stay readable in the long term. [6] - YES. https://dist.apache.org/repos/dist/release/incubator/beam/";>Source releases are distributed via dist.apache.org and linked from the website. + YES. https://dist.apache.org/repos/dist/release/beam/";>Source releases are distributed via dist.apache.org and linked from the website. RE20 @@ -247,7 +247,7 @@ RE30 Releases are signed and/or distributed along with digests that can be reliably used to validate the downloaded archives. - YES. All releases are signed, and the https://dist.apache.org/repos/dist/release/incubator/beam/KEYS";>KEYS file is provided on dist.apache.org. + YES. All releases are signed, and the https://dist.apache.org/repos/dist/release/beam/KEYS";>KEYS file is provided on dist.apache.org. 
RE40 http://git-wip-us.apache.org/repos/asf/beam-site/blob/6682c44a/content/contribute/release-guide/index.html -- diff --git a/content/contribute/release-guide/index.html b/content/contribute/release-guide/index.html index 97b9ef5..a1505bb 100644 --- a/content/contribute/release-guide/index.html +++ b/content/contribute/release-guide/index.html @@ -196,7 +196,7 @@ Promote the release Apache mailing lists - ASF press release + Recordkeeping Beam blog Social media Checklist to declare the process completed @@ -279,7 +279,7 @@ sub 2048R/BA4D50BE 2016-02-23 Here, the key ID is the 8-digit hex string in the pub line: 845E6689. -Now, add your Apache GPG key to the Beam’s KEYS file both in https://dist.apache.org/repos/dist/dev/incubator/beam/KEYS";>dev and https://dist.apache.org/repos/dist/release/incubator/beam/KEYS";>release repositories at dist.apache.org. Follow the instructions listed at the top of these files. +Now, add your Apache GPG key to the Beam’s KEYS file both in https://dist.apache.org/repos/dist/dev/beam/KEYS";>dev and https://dist.apache.org/repos/dist/release/beam/KEYS";>release repositories at dist.apache.org. Follow the instructions listed at the top of these files. Configure git to use this key when signing code by giving it your key ID, as follows: @@ -382,8 +382,8 @@ export GPG_AGENT_INFO Set up a few environment variables to simplify Maven commands that follow. (We use bash Unix syntax in this guide.) -VERSION="1.2.3-incubating" -NEXT_VERSION="1.2.4-incubating" +VERSION="1.2.3" +NEXT_VERSION="1.2.4" BRANCH_NAME="release-${VERSION}" DEVELOPMENT_VERSION="${NEXT_VERSION}-SNAPSHOT" @@ -474,7 +474,7 @@ TAG="v${VERSION}-RC${RC_NUM}" If you have not already, check out the Beam section of the dev repository on dist.apache.org via Subversion. 
In a fresh directory: - svn co https://dist.apache.org/repos/dist/dev/incubator/beam + svn co https://dist.apache.org/repos/dist/dev/beam @@ -504,7 +504,7 @@ TAG="v${VERSION}-RC${RC_NUM}" -Verify that files are https://dist.apache.org/repos/dist/dev/incubator/beam";>present. +Verify that files are https://dist.apache.org/repos/dist/dev/beam";>present. @@ -551,7 +551,7 @@ TAG="v${VERSION}-RC${RC_NUM}" Maven artifacts deployed to the staging repository of https://repository.apache.org/content/repositories/";>repository.apache.org - Source distribution deployed to the dev repository of https://dist.apache.org/repos/dist/dev/incubator/beam/";>dist.apache.org + Source distri
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/00c736ce Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/00c736ce Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/00c736ce Branch: refs/heads/asf-site Commit: 00c736ce6dd742d2673007d68edb6920e9795f9f Parents: 303fb26 Author: Davor Bonaci Authored: Tue Dec 27 17:54:19 2016 -0800 Committer: Davor Bonaci Committed: Tue Dec 27 17:54:19 2016 -0800 -- content/documentation/index.html| 14 +++- content/documentation/resources/index.html | 2 +- content/documentation/sdks/java/index.html | 32 +++- content/documentation/sdks/python/index.html| 2 - .../mobile-gaming-example/index.html| 37 + .../get-started/wordcount-example/index.html| 77 ++- content/images/gaming-example-basic.png | Bin 0 -> 63121 bytes 7 files changed, 88 insertions(+), 76 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/documentation/index.html -- diff --git a/content/documentation/index.html b/content/documentation/index.html index 1259a66..4767f70 100644 --- a/content/documentation/index.html +++ b/content/documentation/index.html @@ -146,7 +146,7 @@ Apache Beam Documentation -Get in-depth conceptual information and reference material for the Beam Model, SDKs and Runners: +This section provides in-depth conceptual information and reference material for the Beam Model, SDKs, and Runners: Concepts @@ -157,6 +157,14 @@ Visit Additional Resources for some of our favorite articles and talks about Beam. +Pipeline Fundamentals + + + Design Your Pipeline by planning your pipeline’s structure, choosing transforms to apply to your data, and determining your input and output methods. + Create Your Pipeline using the classes in the Beam SDKs. + Test Your Pipeline to minimize debugging a pipeline’s remote execution. + + SDKs Find status and reference information on all of the available Beam SDKs. 
@@ -183,9 +191,9 @@ Choosing a Runner -Beam is designed to enable pipelines to be portable across different runners. However, given every runner has different capabilities, they also have different abilities to implement the core concepts in the Beam model. The Capability Matrix provides a detailed comparison of runner functionality. +Beam is designed to enable pipelines to be portable across different runners. However, given every runner has different capabilities, they also have different abilities to implement the core concepts in the Beam model. The Capability Matrix provides a detailed comparison of runner functionality. -Once you have chosen which runner to use, see that runner’s page for more information about any initial runner-specific setup as well as any required or optional PipelineOptions for configuring it’s execution. You may also want to refer back to the Quickstart for instructions on executing the sample WordCount pipeline. +Once you have chosen which runner to use, see that runner’s page for more information about any initial runner-specific setup as well as any required or optional PipelineOptions for configuring it’s execution. You may also want to refer back to the Quickstart for instructions on executing the sample WordCount pipeline. 
http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/documentation/resources/index.html -- diff --git a/content/documentation/resources/index.html b/content/documentation/resources/index.html index 21a707f..9aceea2 100644 --- a/content/documentation/resources/index.html +++ b/content/documentation/resources/index.html @@ -187,7 +187,7 @@ Hadoop Summit, San Jose, CA, 2016 -Presented by Davor Bonacci, Apache Beam PPMC member +Presented by Davor Bonaci, Apache Beam PPMC member https://www.youtube.com/embed/7DZ8ONmeP5A"; frameborder="0" allowfullscreen=""> http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/documentation/sdks/java/index.html -- diff --git a/content/documentation/sdks/java/index.html b/content/documentation/sdks/java/index.html index 3b3ead4..bd29d00 100644 --- a/content/documentation/sdks/java/index.html +++ b/content/documentation/sdks/java/index.html @@ -146,11 +146,37 @@ Apache Beam Java SDK -This page is under construction (https://issues.apache.org/jira/browse/BEAM-504";>BEAM-504). +The Java SDK for Apache Beam provides a simple, powerful API for building both batch and streaming parallel data pr
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/d83813c4 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/d83813c4 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/d83813c4 Branch: refs/heads/asf-site Commit: d83813c417fa04f748ef29cba5e48955b9d5c60b Parents: 94c2366 Author: Davor Bonaci Authored: Tue Dec 27 17:39:31 2016 -0800 Committer: Davor Bonaci Committed: Tue Dec 27 17:39:31 2016 -0800 -- content/contribute/release-guide/index.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/d83813c4/content/contribute/release-guide/index.html -- diff --git a/content/contribute/release-guide/index.html b/content/contribute/release-guide/index.html index 74d3b96..97b9ef5 100644 --- a/content/contribute/release-guide/index.html +++ b/content/contribute/release-guide/index.html @@ -524,7 +524,7 @@ TAG="v${VERSION}-RC${RC_NUM}" -Ddoctitle="Apache Beam SDK for Java, version ${VERSION}" \ -Dwindowtitle="Apache Beam SDK for Java, version ${VERSION}" \ -Dmaven.javadoc.failOnError=false \ - -DexcludePackageNames="org.apache.beam.examples,org.apache.beam.runners.dataflow.internal,org.apache.beam.runners.flink.examples,org.apache.beam.runners.flink.translation,org.apache.beam.runners.spark.examples,org.apache.beam.runners.spark.translation,org.apache.beam.sdk.microbenchmarks.coders.generated,org.apache.beam.sdk.microbenchmarks.transforms.generated,org.openjdk.jmh.infra.generated" + 
-DexcludePackageNames="org.apache.beam.examples,org.apache.beam.runners.dataflow.internal,org.apache.beam.runners.flink.examples,org.apache.beam.runners.flink.translation,org.apache.beam.runners.spark.examples,org.apache.beam.runners.spark.translation,org.apache.beam.runners.apex.translation,org.apache.beam.sdk.microbenchmarks.coders.generated,org.apache.beam.sdk.microbenchmarks.transforms.generated,org.openjdk.jmh.infra.generated"
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/bee9579e Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/bee9579e Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/bee9579e Branch: refs/heads/asf-site Commit: bee9579e89bef81336228b22b6964cae3ca63c83 Parents: 7240745 Author: Davor Bonaci Authored: Tue Dec 27 17:10:01 2016 -0800 Committer: Davor Bonaci Committed: Tue Dec 27 17:10:01 2016 -0800 -- content/contribute/team/index.html | 178 +--- 1 file changed, 71 insertions(+), 107 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/bee9579e/content/contribute/team/index.html -- diff --git a/content/contribute/team/index.html b/content/contribute/team/index.html index e4f3aa2..a303c25 100644 --- a/content/contribute/team/index.html +++ b/content/contribute/team/index.html @@ -166,129 +166,102 @@ - Aljoscha Krettek - aljoscha - aljoscha [at] apache [dot] org - data Artisans - committer, PPMC - +1 - - - - Amit Sela - amitsela - amitsela [at] apache [dot] org - PayPal - committer, PPMC - +2 - - - - Ben Chambers - bchambers - bchambers [at] apache [dot] org + Tyler Akidau + takidau + takidau [at] apache [dot] org Google - committer, PPMC + committer, PMC -8 - Craig Chambers - - - Google - committer, PPMC + Jesse Anderson + jesseanderson + jesseanderson [at] apache [dot] org + Smoking Hand + committer -8 - Dan Halperin - dhalperi - dhalperi [at] apache [dot] org + Davor Bonaci + davor + davor [at] apache [dot] org Google - committer, PPMC + committer, PMC Chair -8 - Davor Bonaci - davor - davor [at] apache [dot] org + Robert Bradshaw + robertwb + robertwb [at] apache [dot] org Google - committer, PPMC + committer, PMC -8 - Frances Perry - frances - frances [at] apache [dot] org + Ben Chambers + bchambers + bchambers [at] apache [dot] org Google - committer, PPMC + committer, PMC -8 - James Malone - jamesmalone - jamesmalone [at] apache 
[dot] org + Luke Cwik + lcwik + lcwik [at] apache [dot] org Google - committer, PPMC + committer, PMC -8 - Jean-Baptiste Onofré - jbonofre - jbonofre [at] apache [dot] org - Talend - champion, committer, PPMC + Stephan Ewen + sewen + sewen [at] apache [dot] org + data Artisans + committer, PMC +1 - Jesse Anderson - jesseanderson - jesseanderson [at] apache [dot] org - Smoking Hand + Thomas Groh + tgroh + tgroh [at] apache [dot] org + Google committer -8 - Josh Wills - jwills - jwills [at] apache [dot] org - - committer, PPMC + Dan Halperin + dhalperi + dhalperi [at] apache [dot] org + Google + committer, PMC -8 - Kostas Tzoumas - ktzoumas - ktzoumas [at] apache [dot] org - data Artisans - committer, PPMC - +1 - - - Kenneth Knowles kenn kenn [at] apache [dot] org Google - committer, PPMC + committer, PMC -8 - Luke Cwik - lcwik - lcwik [at] apache [dot] org - Google - committer, PPMC - -8 + Aljoscha Krettek + aljoscha + aljoscha [at] apache [dot] org + data Artisans + committer, PMC + +1 @@ -296,61 +269,52 @@ mxm mxm [at] apache [dot] org data Artisans - committer, PPMC + committer,
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/7ce54294 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/7ce54294 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/7ce54294 Branch: refs/heads/asf-site Commit: 7ce542945e1bad899d91ed85bf5878aa7d19b0aa Parents: 0b5e9d2 Author: Davor Bonaci Authored: Tue Dec 27 16:29:14 2016 -0800 Committer: Davor Bonaci Committed: Tue Dec 27 16:29:14 2016 -0800 -- content/downloads/beam-doap.rdf | 39 1 file changed, 39 insertions(+) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/7ce54294/content/downloads/beam-doap.rdf -- diff --git a/content/downloads/beam-doap.rdf b/content/downloads/beam-doap.rdf new file mode 100644 index 000..8b1cbd8 --- /dev/null +++ b/content/downloads/beam-doap.rdf @@ -0,0 +1,39 @@ + + +http://usefulinc.com/ns/doap#"; + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"; + xmlns:asfext="http://projects.apache.org/ns/asfext#"; + xmlns:foaf="http://xmlns.com/foaf/0.1/";> + + http://beam.apache.org";> +2016-12-21 +http://www.apache.org/licenses/LICENSE-2.0"; /> +Apache Beam +http://beam.apache.org"; /> +http://beam.apache.org"; /> +Apache Beam is a programming model, SDKs, and runners for defining and executing data processing pipelines. +Apache Beam is a unified programming model for both batch and streaming data processing, enabling efficient execution across diverse distributed execution engines and providing extensibility points for connecting to different technologies and user communities. +https://issues.apache.org/jira/browse/BEAM"; /> +http://beam.apache.org/get-started/support/"; /> +http://beam.apache.org/get-started/downloads/"; /> +Java +Python +http://projects.apache.org/category/big-data"; /> + +