This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push: new d90bb3bc550 Publishing website 2023/07/11 22:16:02 at commit 33518fb d90bb3bc550 is described below commit d90bb3bc5504632b0875bc75b703150a6efd9b19 Author: jenkins <bui...@apache.org> AuthorDate: Tue Jul 11 22:16:02 2023 +0000 Publishing website 2023/07/11 22:16:02 at commit 33518fb --- website/generated-content/contribute/index.xml | 5 +++- .../contribute/release-guide/index.html | 6 +++-- .../io/built-in/google-bigquery/index.html | 28 ++++++++++++++++++---- website/generated-content/sitemap.xml | 2 +- 4 files changed, 32 insertions(+), 9 deletions(-) diff --git a/website/generated-content/contribute/index.xml b/website/generated-content/contribute/index.xml index 0af6e7da522..47ea65bedc0 100644 --- a/website/generated-content/contribute/index.xml +++ b/website/generated-content/contribute/index.xml @@ -1201,6 +1201,7 @@ You don&rsquo;t need to wait for the action to complete to start running the <ol> <li>Clone the repo at the selected RC tag.</li> <li>Run gradle publish to push java artifacts into Maven staging repo.</li> +<li>Stage SDK docker images to <a href="https://hub.docker.com/search?q=apache%2Fbeam&amp;type=image">docker hub Apache organization</a>.</li> </ol> </li> </ul> @@ -1235,9 +1236,11 @@ Some additional validation should be done during the rc validation step.</li> <p><strong>The script will:</strong></p> <ol> <li>Clone the repo at the selected RC tag.</li> -<li>Stage source release into dist.apache.org dev <a href="https://dist.apache.org/repos/dist/dev/beam/">repo</a>.</li> +<li>Stage source release into dist.apache.org dev <a href="https://dist.apache.org/repos/dist/dev/beam/">repo</a>. 
+Skip this step if you already did it with the build_release_candidate GitHub Actions workflow.</li> <li>Stage, sign, and hash the Python source distribution and wheels into the dist.apache.org dev repo Python directory</li> <li>Stage SDK docker images to <a href="https://hub.docker.com/search?q=apache%2Fbeam&amp;type=image">docker hub Apache organization</a>. +Skip this step if you already did it with the build_release_candidate GitHub Actions workflow. Note: if you are not a member of the <a href="https://hub.docker.com/orgs/apache/teams/beam"><code>beam</code> DockerHub team</a> you will need help with this step. Please email <code>dev@</code> and ask a member of the <code>beam</code> DockerHub team for help.</li> <li>Create a PR to update beam-site; changes include: diff --git a/website/generated-content/contribute/release-guide/index.html b/website/generated-content/contribute/release-guide/index.html index 31108ece97c..28af8361a77 100644 --- a/website/generated-content/contribute/release-guide/index.html +++ b/website/generated-content/contribute/release-guide/index.html @@ -142,12 +142,14 @@ The final state of the repository should match this diagram:</p><p><img src=/ima adjust the version, and add the tag locally. If it looks good, run it again with <code>--push-tag</code>. If you already have a clone that includes the <code>${COMMIT_REF}</code> then you can omit <code>--clone</code>. This is perfectly safe since the script does not depend on the current working tree.</p><p>See the source of the script for more details, or to run commands manually in case of a problem.</p><h3 id=run-build_release_candidate-github-action-to-create-a-release-candidate>Run build_release_candidate GitHub Action to create a release candidate</h3><p>Note: This step is partially automated (in progress), so part of the rc creation is done by GitHub Actions and the rest is done by a script.
-You don’t need to wait for the action to complete to start running the script.</p><ul><li><p><strong>Action</strong> <a href=https://github.com/apache/beam/actions/workflows/build_release_candidate.yml>build_release_candidate</a> (click <code>run workflow</code>)</p></li><li><p><strong>The script will:</strong></p><ol><li>Clone the repo at the selected RC tag.</li><li>Run gradle publish to push java artifacts into Maven staging repo.</li></ol></li></ul><h4 id=tasks-you-need-to-do-m [...] +You don’t need to wait for the action to complete to start running the script.</p><ul><li><p><strong>Action</strong> <a href=https://github.com/apache/beam/actions/workflows/build_release_candidate.yml>build_release_candidate</a> (click <code>run workflow</code>)</p></li><li><p><strong>The script will:</strong></p><ol><li>Clone the repo at the selected RC tag.</li><li>Run gradle publish to push java artifacts into Maven staging repo.</li><li>Stage SDK docker images to <a href="http [...] They should contain all relevant parts for each module, including <code>pom.xml</code>, jar, test jar, javadoc, etc. Artifact names should follow <a href=https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22>the existing format</a> in which artifact name mirrors directory structure, e.g., <code>beam-sdks-java-io-kafka</code>. Carefully review any new artifacts. Some additional validation should be done during the rc validation step.</li></ol></li></ol><h3 id=run-build_release_candidatesh-to-create-a-release-candidate>Run build_release_candidate.sh to create a release candidate</h3><ul><li><p><strong>Script:</strong> <a href=https://github.com/apache/beam/blob/master/release/src/main/scripts/build_release_candidate.sh>build_release_candidate.sh</a></p></li><li><p><strong>Usage</strong></p><pre><code>./beam/release/src/main/scripts/build_release_ [...] 
-</code></pre></li><li><p><strong>The script will:</strong></p><ol><li>Clone the repo at the selected RC tag.</li><li>Stage source release into dist.apache.org dev <a href=https://dist.apache.org/repos/dist/dev/beam/>repo</a>.</li><li>Stage, sign, and hash the Python source distribution and wheels into the dist.apache.org dev repo Python directory</li><li>Stage SDK docker images to <a href="https://hub.docker.com/search?q=apache%2Fbeam&type=image">docker hub Apache organization</a>. +</code></pre></li><li><p><strong>The script will:</strong></p><ol><li>Clone the repo at the selected RC tag.</li><li>Stage source release into dist.apache.org dev <a href=https://dist.apache.org/repos/dist/dev/beam/>repo</a>. +Skip this step if you already did it with the build_release_candidate GitHub Actions workflow.</li><li>Stage, sign, and hash the Python source distribution and wheels into the dist.apache.org dev repo Python directory</li><li>Stage SDK docker images to <a href="https://hub.docker.com/search?q=apache%2Fbeam&type=image">docker hub Apache organization</a>. +Skip this step if you already did it with the build_release_candidate GitHub Actions workflow. Note: if you are not a member of the <a href=https://hub.docker.com/orgs/apache/teams/beam><code>beam</code> DockerHub team</a> you will need help with this step. Please email <code>dev@</code> and ask a member of the <code>beam</code> DockerHub team for help.</li><li>Create a PR to update beam-site; changes include:<ul><li>Copy Python doc into beam-site</li><li>Copy Java doc into beam-site</li><li><strong>NOTE</strong>: Do not merge this PR until after an RC has been approved (see “Finalize the Release”).</li></ul></li></ol></li></ul><h4 id=tasks-you-need-to-do-manually-1>Tasks you need to do manually</h4><ol><li [...] Please note that dependencies for the SDKs with different Python versions vary.
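The "stage, sign, and hash" step above can be sketched in Python for the hashing half. This is a minimal illustration only, not the actual logic of build_release_candidate.sh (which also signs artifacts with GPG and uploads them); the artifact filename used in the demo is hypothetical.

```python
import hashlib
import os
import tempfile


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading in chunks so
    large release artifacts (source tarballs, wheels) fit in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Hypothetical stand-in for a staged artifact; the real script hashes
    # the source release and Python wheels before uploading to dist.apache.org.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".tar.gz") as f:
        f.write(b"example artifact contents")
        path = f.name
    print(sha512_of(path))
    os.remove(path)
```

The chunked read mirrors how command-line tools such as `sha512sum` process files; the resulting digest is what gets published alongside the artifact so validators can verify their downloads.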
diff --git a/website/generated-content/documentation/io/built-in/google-bigquery/index.html b/website/generated-content/documentation/io/built-in/google-bigquery/index.html index 896c92d6904..5d99d128df3 100644 --- a/website/generated-content/documentation/io/built-in/google-bigquery/index.html +++ b/website/generated-content/documentation/io/built-in/google-bigquery/index.html @@ -327,7 +327,11 @@ GitHub</a>.</p><div class="language-java snippet"><div class="notebook-skip code <span class=k>return</span> <span class=n>rows</span><span class=o>;</span> <span class=o>}</span> -<span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=c1># The SDK for Python does not support the BigQuery Storage API.</span></code></pre></div></div></div><p>The following code snippet reads wit [...] 
+<span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>max_temperatures</span> <span class=o>=</span> <span class=p>(</span> + <span class=n>pipeline</span> + <span class=o>|</span> <span class=s1>'ReadTableWithStorageAPI'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>ReadFromBigQuery</span><span class=p>(</span> + <span class=n>table</span><span class=o>=</span><span class=n>table_spec</span><span class=p>,</span> <span class=n>method</span><span class=o>=</span><span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>ReadFromBigQuery</span><span class=o>.</span><span class=n>Method</span><span class=o>.</span><span class=n>DIRECT_READ</span><span class=p>)</span> + <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=k>lambda</span> <span class=n>elem</span><span class=p>:</span> <span class=n>elem</span><span class=p>[</span><span class=s1>'max_temperature'</span><span class=p>]))</span></code></pre></div></div></div><p>The following code snippet reads with a query string.</p><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=cop [...] 
<span class=kn>import</span> <span class=nn>org.apache.beam.sdk.Pipeline</span><span class=o>;</span> <span class=kn>import</span> <span class=nn>org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO</span><span class=o>;</span> <span class=kn>import</span> <span class=nn>org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead.Method</span><span class=o>;</span> @@ -623,7 +627,7 @@ be replaced.</p><div class="language-java snippet"><div class="notebook-skip cod You can either keep retrying, or return the failed records in a separate <code>PCollection</code> using the <code>WriteResult.getFailedInserts()</code> method.</p><h3 id=storage-write-api>Using the Storage Write API</h3><p>Starting with version 2.36.0 of the Beam SDK for Java, you can use the <a href=https://cloud.google.com/bigquery/docs/write-api>BigQuery Storage Write API</a> -from the BigQueryIO connector.</p><h4 id=exactly-once-semantics>Exactly-once semantics</h4><p>To write to BigQuery using the Storage Write API, set <code>withMethod</code> to +from the BigQueryIO connector.</p><p>Starting with version 2.47.0 of the Beam SDK for Python, the SDK also supports the BigQuery Storage Write API.</p><p class=language-py>The BigQuery Storage Write API in the Python SDK currently has some limitations on supported data types. As this method makes use of cross-language transforms, we are limited to the types supported at the cross-language boundary. For example, <code>apache_beam.utils.timestamp.Timestamp</code> is needed to write a <code>TIMESTAMP</code> BigQuery [...] <a href=https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html#STORAGE_WRITE_API><code>Method.STORAGE_WRITE_API</code></a>.
Here’s an example transform that writes to BigQuery using the Storage Write API and exactly-once semantics:</p><p><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java><span class=n>WriteResult</span> <span class=n>writeResult</span> <span class=o>=</span> [...] <span class=n>BigQueryIO</span><span class=o>.</span><span class=na>writeTableRows</span><span class=o>()</span> @@ -631,7 +635,10 @@ Here’s an example transform that writes to BigQuery using the Storage Write AP <span class=o>.</span><span class=na>withWriteDisposition</span><span class=o>(</span><span class=n>WriteDisposition</span><span class=o>.</span><span class=na>WRITE_APPEND</span><span class=o>)</span> <span class=o>.</span><span class=na>withCreateDisposition</span><span class=o>(</span><span class=n>CreateDisposition</span><span class=o>.</span><span class=na>CREATE_NEVER</span><span class=o>)</span> <span class=o>.</span><span class=na>withMethod</span><span class=o>(</span><span class=n>Method</span><span class=o>.</span><span class=na>STORAGE_WRITE_API</span><span class=o>)</span> -<span class=o>);</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=c1># The SDK for Python does not support the BigQuery Storage API.</span></code></pre></div></div></div></p><p>If you want to change the behav [...] 
+<span class=o>);</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>quotes</span> <span class=o>|</span> <span class=s2>"WriteTableWithStorageAPI"</span> <span class=o>>></span> <span class=n>be [...] + <span class=n>table_spec</span><span class=p>,</span> + <span class=n>schema</span><span class=o>=</span><span class=n>table_schema</span><span class=p>,</span> + <span class=n>method</span><span class=o>=</span><span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>WriteToBigQuery</span><span class=o>.</span><span class=n>Method</span><span class=o>.</span><span class=n>STORAGE_WRITE_API</span><span class=p>)</span></code></pre></div></div></div></p><p>If you want to change the behavior of BigQueryIO so that all the BigQuery sinks for your pipeline use the Storage Write API by default, set the <a href=https://github.com/apache/beam/blob/2c18ce0ccd7705473aa9ecc443dcdbe223dd9449/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java#L82-L86><code>UseStorageWriteApi</code> option</a>.</p><p>If your pipeline needs to create the table (in case it doesn’t exist and you specified the create disposition as <code>CREATE_IF_NEEDED</code>), you must provide a @@ -645,12 +652,23 @@ binary protocol.</p><p><div class="language-java snippet"><div class="notebook-s <span class=k>new</span> <span class=n>TableFieldSchema</span><span class=o>()</span> <span class=o>.</span><span class=na>setName</span><span class=o>(</span><span class=s>"user_name"</span><span class=o>)</span> <span class=o>.</span><span class=na>setType</span><span class=o>(</span><span class=s>"STRING"</span><span class=o>)</span> - 
<span class=o>.</span><span class=na>setMode</span><span class=o>(</span><span class=s>"REQUIRED"</span><span class=o>)));</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=c1># The SDK [...] + <span class=o>.</span><span class=na>setMode</span><span class=o>(</span><span class=s>"REQUIRED"</span><span class=o>)));</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>table_sche [...] + <span class=s1>'fields'</span><span class=p>:</span> <span class=p>[{</span> + <span class=s1>'name'</span><span class=p>:</span> <span class=s1>'source'</span><span class=p>,</span> <span class=s1>'type'</span><span class=p>:</span> <span class=s1>'STRING'</span><span class=p>,</span> <span class=s1>'mode'</span><span class=p>:</span> <span class=s1>'NULLABLE'</span> + <span class=p>},</span> <span class=p>{</span> + <span class=s1>'name'</span><span class=p>:</span> <span class=s1>'quote'</span><span class=p>,</span> <span class=s1>'type'</span><span class=p>:</span> <span class=s1>'STRING'</span><span class=p>,</span> <span class=s1>'mode'</span><span class=p>:</span> <span class=s1>'REQUIRED'</span> + <span class=p>}]</span> +<span class=p>}</span></code></pre></div></div></div></p><p>For streaming pipelines, you need to set two additional parameters: the number of streams and the triggering frequency.</p><p><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button 
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java><span class=n>BigQueryIO</span><span class=o>.</span><span class=na>writeTableRows</span><span class=o>()</span> <span class=c1>// ... </span><span class=c1></span> <span class=o>.</span><span class=na>withTriggeringFrequency</span><span class=o>(</span><span class=n>Duration</span><span class=o>.</span><span class=na>standardSeconds</span><span class=o>(</span><span class=n>5</span><span class=o>))</span> <span class=o>.</span><span class=na>withNumStorageWriteApiStreams</span><span class=o>(</span><span class=n>3</span><span class=o>)</span> -<span class=o>);</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=c1># The SDK for Python does not support the BigQuery Storage API.</span></code></pre></div></div></div></p><p>The number of streams defines t [...] 
+<span class=o>);</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=c1># The Python SDK doesn't currently support setting the number of write streams</span> +<span class=n>quotes</span> <span class=o>|</span> <span class=s2>"StorageWriteAPIWithFrequency"</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>WriteToBigQuery</span><span class=p>(</span> + <span class=n>table_spec</span><span class=p>,</span> + <span class=n>schema</span><span class=o>=</span><span class=n>table_schema</span><span class=p>,</span> + <span class=n>method</span><span class=o>=</span><span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>WriteToBigQuery</span><span class=o>.</span><span class=n>Method</span><span class=o>.</span><span class=n>STORAGE_WRITE_API</span><span class=p>,</span> + <span class=n>triggering_frequency</span><span class=o>=</span><span class=mi>5</span><span class=p>)</span></code></pre></div></div></div></p><p>The number of streams defines the parallelism of the BigQueryIO Write transform and roughly corresponds to the number of Storage Write API streams that the pipeline uses. 
You can set it explicitly on the transform via <a href=https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withNumStorageWriteApiStreams-int-><code>withNumStorageWriteApiStreams</code></a> diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml index 75596dc1504..43a9b580b16 100644 --- a/website/generated-content/sitemap.xml +++ b/website/generated-content/sitemap.xml @@ -1 +1 @@ -<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/categories/blog/</loc><lastmod>2023-07-11T11:31:17-04:00</lastmod></url><url><loc>/blog/</loc><lastmod>2023-07-11T11:31:17-04:00</lastmod></url><url><loc>/categories/</loc><lastmod>2023-07-11T11:31:17-04:00</lastmod></url><url><loc>/blog/managing-beam-dependencies-in-java/</loc><lastmod>2023-07-11T11:31:17-04:00</lastmod> [...] \ No newline at end of file +<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/categories/blog/</loc><lastmod>2023-07-11T15:34:39-04:00</lastmod></url><url><loc>/blog/</loc><lastmod>2023-07-11T15:34:39-04:00</lastmod></url><url><loc>/categories/</loc><lastmod>2023-07-11T15:34:39-04:00</lastmod></url><url><loc>/blog/managing-beam-dependencies-in-java/</loc><lastmod>2023-07-11T15:34:39-04:00</lastmod> [...] \ No newline at end of file
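The schema passed to <code>beam.io.WriteToBigQuery</code> in the snippets diffed above is a plain Python dictionary. A minimal sketch of building and sanity-checking such a schema follows; the field names mirror the quotes example in the doc, while the <code>validate_schema</code> helper is purely illustrative and not part of the Beam API.

```python
# Table schema in the dictionary form accepted by beam.io.WriteToBigQuery:
# a top-level 'fields' list, each entry carrying 'name', 'type', and 'mode'.
table_schema = {
    'fields': [
        {'name': 'source', 'type': 'STRING', 'mode': 'NULLABLE'},
        {'name': 'quote', 'type': 'STRING', 'mode': 'REQUIRED'},
    ]
}


def validate_schema(schema: dict) -> None:
    """Illustrative sanity check (not a Beam API): each field entry must
    name a column, give it a BigQuery type, and set a valid mode."""
    for field in schema['fields']:
        assert {'name', 'type', 'mode'} <= field.keys()
        assert field['mode'] in {'NULLABLE', 'REQUIRED', 'REPEATED'}


validate_schema(table_schema)
```

Such a dictionary would then be passed as the <code>schema=</code> argument of <code>WriteToBigQuery</code> alongside <code>method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API</code>, exactly as the diffed snippets show.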