[jira] [Closed] (BEAM-3333) Create Elasticsearch IO compatible with ES 6.x

2017-12-11 Thread Fokko van der Wal (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko van der Wal closed BEAM-3333.
---
Resolution: Duplicate

> Create Elasticsearch IO compatible with ES 6.x
> --
>
> Key: BEAM-3333
> URL: https://issues.apache.org/jira/browse/BEAM-3333
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Fokko van der Wal
>Assignee: Etienne Chauchot
>Priority: Minor
> Fix For: 2.2.0
>
>
> The current Elasticsearch IO is only compatible with Elasticsearch v 2.x and 
> v 5.x. The aim is to have an IO compatible with ES v 6.x. Beyond being able 
> to address v6.x Elasticsearch instances, we could also leverage the 
> Elasticsearch pipeline API and better split the dataset (staying as close as 
> possible to desiredBundleSize) thanks to the new ES split API that allows 
> splitting ES shards.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-3333) Create Elasticsearch IO compatible with ES 6.x

2017-12-11 Thread Fokko van der Wal (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko van der Wal updated BEAM-3333:

Description: The current Elasticsearch IO is only compatible with 
Elasticsearch v 2.x and v 5.x. The aim is to have an IO compatible with ES v 
6.x. Beyond being able to address v6.x Elasticsearch instances, we could also 
leverage the Elasticsearch pipeline API and better split the dataset (staying 
as close as possible to desiredBundleSize) thanks to the new ES split API that 
allows splitting ES shards.  (was: The current Elasticsearch IO (see 
https://issues.apache.org/jira/browse/BEAM-425) is only compatible with 
Elasticsearch v 2.x. The aim is to have an IO compatible with ES v 5.x. Beyond 
being able to address v5.x Elasticsearch instances, we could also leverage the 
Elasticsearch pipeline API and better split the dataset (staying as close as 
possible to desiredBundleSize) thanks to the new ES split API that allows 
splitting ES shards.)

> Create Elasticsearch IO compatible with ES 6.x
> --
>
> Key: BEAM-3333
> URL: https://issues.apache.org/jira/browse/BEAM-3333
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Fokko van der Wal
>Assignee: Etienne Chauchot
>Priority: Minor
> Fix For: 2.2.0
>
>
> The current Elasticsearch IO is only compatible with Elasticsearch v 2.x and 
> v 5.x. The aim is to have an IO compatible with ES v 6.x. Beyond being able 
> to address v6.x Elasticsearch instances, we could also leverage the 
> Elasticsearch pipeline API and better split the dataset (staying as close as 
> possible to desiredBundleSize) thanks to the new ES split API that allows 
> splitting ES shards.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3333) Create Elasticsearch IO compatible with ES 6.x

2017-12-11 Thread Fokko van der Wal (JIRA)
Fokko van der Wal created BEAM-3333:
---

 Summary: Create Elasticsearch IO compatible with ES 6.x
 Key: BEAM-3333
 URL: https://issues.apache.org/jira/browse/BEAM-3333
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-extensions
Reporter: Fokko van der Wal
Assignee: Etienne Chauchot
Priority: Minor
 Fix For: 2.2.0


The current Elasticsearch IO (see 
https://issues.apache.org/jira/browse/BEAM-425) is only compatible with 
Elasticsearch v 2.x. The aim is to have an IO compatible with ES v 5.x. Beyond 
being able to address v5.x Elasticsearch instances, we could also leverage the 
Elasticsearch pipeline API and better split the dataset (staying as close as 
possible to desiredBundleSize) thanks to the new ES split API that allows 
splitting ES shards.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3332) AfterProcessingTimer trigger not firing if invoked exactly on time

2017-12-11 Thread Shen Li (JIRA)
Shen Li created BEAM-3332:
-

 Summary: AfterProcessingTimer trigger not firing if invoked 
exactly on time
 Key: BEAM-3332
 URL: https://issues.apache.org/jira/browse/BEAM-3332
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Affects Versions: 2.2.0, 2.1.0, 2.0.0
Reporter: Shen Li
Assignee: Kenneth Knowles
Priority: Trivial


I occasionally run into an issue where the processing-time trigger is invoked 
on time, but TriggerStateMachineRunner#shouldFire() returns false. After 
comparing the time instants, I found that this happens when the trigger is 
invoked exactly on time. The cause is that 
AfterDelayFromFirstElementStateMachine does the following:

{quote}return delayedUntil != null
&& getCurrentTime(context) != null
&& getCurrentTime(context).isAfter(delayedUntil);{quote}

which only returns true when the current processing time is strictly after 
(not equal to) delayedUntil. Should it instead be 
!getCurrentTime(context).isBefore(delayedUntil)?
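
To illustrate the boundary behavior, a minimal standalone check (assuming 
Joda-Time, which Beam uses for processing-time instants; the variable names 
mirror the snippet above and are not from the trigger code itself):

{code}
import org.joda.time.Instant;

public class TriggerBoundaryCheck {
  public static void main(String[] args) {
    Instant delayedUntil = new Instant(1000L);
    Instant currentTime = new Instant(1000L);  // trigger invoked exactly on time

    // Current check: strictly-after comparison is false at the boundary.
    System.out.println(currentTime.isAfter(delayedUntil));    // false

    // Suggested check: at-or-after comparison is true at the boundary.
    System.out.println(!currentTime.isBefore(delayedUntil));  // true
  }
}
{code}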





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3194) Support annotating that a DoFn requires stable / deterministic input for replay/retry

2017-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287091#comment-16287091
 ] 

ASF GitHub Bot commented on BEAM-3194:
--

kennknowles closed pull request #4135: [BEAM-3194] Add @RequiresStableInput 
annotation
URL: https://github.com/apache/beam/pull/4135
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
index 3e023db679d..9f8dd45d110 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
@@ -567,6 +567,29 @@ public Duration getAllowedTimestampSkew() {
   @Target(ElementType.METHOD)
   public @interface ProcessElement {}
 
+  /**
+   * Experimental - no backwards compatibility guarantees. The exact 
name or usage of this
+   * feature may change.
+   *
+   * Annotation that may be added to a {@link ProcessElement} or {@link 
OnTimer} method to
+   * indicate that the runner must ensure that the observable contents of the 
input {@link
+   * PCollection} or mutable state must be stable upon retries.
+   *
+   * This is important for sinks, which must ensure exactly-once semantics 
when writing to a
+   * storage medium outside of your pipeline. A general pattern for a basic 
sink is to write a
+   * {@link DoFn} that can perform an idempotent write, and annotate that it 
requires stable input.
+   * Combined, these allow the write to be freely retried until success.
+   *
+   * An example of an unstable input would be anything computed using 
nondeterministic logic. In
+   * Beam, any user-defined function is permitted to be nondeterministic, and 
any {@link
+   * PCollection} is permitted to be recomputed in any manner.
+   */
+  @Documented
+  @Experimental
+  @Retention(RetentionPolicy.RUNTIME)
+  @Target(ElementType.METHOD)
+  public @interface RequiresStableInput {}
+
   /**
* Annotation for the method to use to finish processing a batch of elements.
* The method annotated with this must satisfy the following constraints:
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java
index bfad69ea776..1e126611452 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java
@@ -426,6 +426,12 @@ public static TimerParameter 
timerParameter(TimerDeclaration decl) {
 @Override
 public abstract List extraParameters();
 
+/**
+ * Whether this method requires stable input, expressed via {@link
+ * org.apache.beam.sdk.transforms.DoFn.RequiresStableInput}.
+ */
+public abstract boolean requiresStableInput();
+
 /** Concrete type of the {@link RestrictionTracker} parameter, if present. 
*/
 @Nullable
 public abstract TypeDescriptor trackerT();
@@ -440,12 +446,14 @@ public static TimerParameter 
timerParameter(TimerDeclaration decl) {
 static ProcessElementMethod create(
 Method targetMethod,
 List extraParameters,
+boolean requiresStableInput,
 TypeDescriptor trackerT,
 @Nullable TypeDescriptor windowT,
 boolean hasReturnValue) {
   return new AutoValue_DoFnSignature_ProcessElementMethod(
   targetMethod,
   Collections.unmodifiableList(extraParameters),
+  requiresStableInput,
   trackerT,
   windowT,
   hasReturnValue);
@@ -487,6 +495,13 @@ public boolean isSplittable() {
 @Override
 public abstract Method targetMethod();
 
+/**
+ * Whether this method requires stable input, expressed via {@link
+ * org.apache.beam.sdk.transforms.DoFn.RequiresStableInput}. For timers, 
this means that any
+ * state must be stably persisted prior to calling it.
+ */
+public abstract boolean requiresStableInput();
+
 /** The window type used by this method, if any. */
 @Nullable
 public abstract TypeDescriptor windowT();
@@ -498,10 +513,15 @@ public boolean isSplittable() {
 static OnTimerMethod create(
 Method targetMethod,
 String id,
+boolean requiresStableInput,
 TypeDescriptor windowT,
 List extraParameters) {
   return new AutoValue_DoFnSignature_OnTimerMethod(
-  id, targetMethod, windowT, 
Collections.unmodifiableList(extraParameters));
+  id,
+  targetMethod,
+  requiresStableInput,
+  
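
To make the intended use of the annotation concrete, here is a hypothetical 
sink DoFn written against the javadoc above (the class, method, and helper 
names are illustrative sketches, not part of the PR):

{code}
import org.apache.beam.sdk.transforms.DoFn;

/** Hypothetical sink: idempotent write + stable input = safe retries. */
class IdempotentWriteFn extends DoFn<String, Void> {

  @ProcessElement
  @RequiresStableInput  // ask the runner to stabilize the input before (re)invocation
  public void processElement(ProcessContext c) {
    writeIdempotently(c.element());
  }

  // Placeholder for an idempotent write, e.g. a keyed upsert into an external store.
  private void writeIdempotently(String record) {}
}
{code}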

[beam] 01/01: Merge pull request #4135: [BEAM-3194] Add @RequiresStableInput annotation

2017-12-11 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 91196ccbeb4ca33fc0b99c6f6367d0811ab2057a
Merge: 17a2aba fc29e67
Author: Kenn Knowles 
AuthorDate: Mon Dec 11 20:40:35 2017 -0800

Merge pull request #4135: [BEAM-3194] Add @RequiresStableInput annotation

 .../java/org/apache/beam/sdk/transforms/DoFn.java  | 23 ++
 .../beam/sdk/transforms/reflect/DoFnSignature.java | 22 -
 .../sdk/transforms/reflect/DoFnSignatures.java |  9 +++--
 .../sdk/transforms/reflect/DoFnSignaturesTest.java | 12 +++
 4 files changed, 63 insertions(+), 3 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" .


[beam] branch master updated (17a2aba -> 91196cc)

2017-12-11 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 17a2aba  Merge pull request #4197
 add fc29e67  Add @RequiresStableInput annotation
 new 91196cc  Merge pull request #4135: [BEAM-3194] Add 
@RequiresStableInput annotation

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../java/org/apache/beam/sdk/transforms/DoFn.java  | 23 ++
 .../beam/sdk/transforms/reflect/DoFnSignature.java | 22 -
 .../sdk/transforms/reflect/DoFnSignatures.java |  9 +++--
 .../sdk/transforms/reflect/DoFnSignaturesTest.java | 12 +++
 4 files changed, 63 insertions(+), 3 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
['"commits@beam.apache.org" '].


[jira] [Commented] (BEAM-3239) Enable debug server for python sdk workers

2017-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286325#comment-16286325
 ] 

ASF GitHub Bot commented on BEAM-3239:
--

asfgit closed pull request #4178: [BEAM-3239] Adding debug server to sdk worker 
to get threaddumps
URL: https://github.com/apache/beam/pull/4178
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/worker/sdk_worker_main.py 
b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py
index 684269eee9b..1db8b29175f 100644
--- a/sdks/python/apache_beam/runners/worker/sdk_worker_main.py
+++ b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py
@@ -14,13 +14,14 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-
 """SDK Fn Harness entry point."""
 
+import BaseHTTPServer
 import json
 import logging
 import os
 import sys
+import threading
 import traceback
 
 from google.protobuf import text_format
@@ -34,6 +35,51 @@
 # This module is experimental. No backwards-compatibility guarantees.
 
 
+class StatusServer(object):
+
+  @classmethod
+  def get_thread_dump(cls):
+    lines = []
+    frames = sys._current_frames()  # pylint: disable=protected-access
+
+    for t in threading.enumerate():
+      lines.append('--- Thread #%s name: %s ---\n' % (t.ident, t.name))
+      lines.append(''.join(traceback.format_stack(frames[t.ident])))
+
+    return lines
+
+  def start(self, status_http_port=0):
+    """Executes the serving loop for the status server.
+
+    Args:
+      status_http_port(int): Binding port for the debug server.
+        Default is 0 which means any free unsecured port
+    """
+
+    class StatusHttpHandler(BaseHTTPServer.BaseHTTPRequestHandler):
+      """HTTP handler for serving stacktraces of all threads."""
+
+      def do_GET(self):  # pylint: disable=invalid-name
+        """Return all thread stacktraces information for GET request."""
+        self.send_response(200)
+        self.send_header('Content-Type', 'text/plain')
+        self.end_headers()
+
+        for line in StatusServer.get_thread_dump():
+          self.wfile.write(line)
+
+      def log_message(self, f, *args):
+        """Do not log any messages."""
+        pass
+
+    self.httpd = httpd = BaseHTTPServer.HTTPServer(
+        ('localhost', status_http_port), StatusHttpHandler)
+    logging.info('Status HTTP server running at %s:%s', httpd.server_name,
+                 httpd.server_port)
+
+    httpd.serve_forever()
+
+
 def main(unused_argv):
   """Main entry point for SDK Fn Harness."""
   if 'LOGGING_API_SERVICE_DESCRIPTOR' in os.environ:
@@ -49,6 +95,12 @@ def main(unused_argv):
   else:
     fn_log_handler = None
 
+  # Start status HTTP server thread.
+  thread = threading.Thread(target=StatusServer().start)
+  thread.daemon = True
+  thread.setName('status-server-demon')
+  thread.start()
+
   if 'PIPELINE_OPTIONS' in os.environ:
     sdk_pipeline_options = json.loads(os.environ['PIPELINE_OPTIONS'])
   else:
@@ -89,8 +141,8 @@ def main(unused_argv):
 def _load_main_session(semi_persistent_directory):
   """Loads a pickled main session from the path specified."""
   if semi_persistent_directory:
-    session_file = os.path.join(
-        semi_persistent_directory, 'staged', names.PICKLED_MAIN_SESSION_FILE)
+    session_file = os.path.join(semi_persistent_directory, 'staged',
+                                names.PICKLED_MAIN_SESSION_FILE)
     if os.path.isfile(session_file):
       pickler.load_session(session_file)
     else:
diff --git a/sdks/python/apache_beam/runners/worker/sdk_worker_main_test.py 
b/sdks/python/apache_beam/runners/worker/sdk_worker_main_test.py
new file mode 100644
index 000..9305c990b10
--- /dev/null
+++ b/sdks/python/apache_beam/runners/worker/sdk_worker_main_test.py
@@ -0,0 +1,44 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""Tests for 

[jira] [Commented] (BEAM-2469) Handling Kinesis shards splits and merges

2017-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286318#comment-16286318
 ] 

ASF GitHub Bot commented on BEAM-2469:
--

pawel-kaczmarczyk opened a new pull request #4243: [BEAM-2469] Handling Kinesis 
shards splits and merges
URL: https://github.com/apache/beam/pull/4243
 
 
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [X] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [X] Each commit in the pull request should have a meaningful subject line 
and body.
- [X] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [X] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
- [X] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   This pull request tries to address the stream resharding operations 
described in detail 
[here](http://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding.html).
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Handling Kinesis shards splits and merges
> -
>
> Key: BEAM-2469
> URL: https://issues.apache.org/jira/browse/BEAM-2469
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Paweł Kaczmarczyk
>Assignee: Paweł Kaczmarczyk
>
> Kinesis stream consists of 
> [shards|http://docs.aws.amazon.com/streams/latest/dev/key-concepts.html#shard]
>  that allow for capacity scaling. In order to increase/decrease the 
> capacity, shards have to be split or merged. Such operations are currently 
> not handled properly and will end with errors.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Dataflow #4510

2017-12-11 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Spark #1107

2017-12-11 Thread Apache Jenkins Server
See 


Changes:

[arostami] Ensure temp file is closed before returning.

[altay] Adding debug server to sdk worker to get threaddumps

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam2 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision d65dea667935f153477a2b94d19aaef6eda21f9e (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d65dea667935f153477a2b94d19aaef6eda21f9e
Commit message: "This closes #4178"
 > git rev-list 41da1ab4ddf530741d2d264136ae97215cc1ae0a # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Spark] $ /bin/bash -xe /tmp/jenkins8985590904124184272.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Spark] $ /bin/bash -xe /tmp/jenkins9146292681428964803.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Spark] $ /bin/bash -xe /tmp/jenkins941444664696429816.sh
+ pip install --user -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied: absl-py in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: setuptools in /usr/lib/python2.7/dist-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 16))
Requirement already satisfied: colorlog[windows]==2.6.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied: futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied: PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied: pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Requirement already satisfied: numpy in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 22))
Requirement already satisfied: functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied: contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Requirement already satisfied: pywinrm in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: six in 
/home/jenkins/.local/lib/python2.7/site-packages (from absl-py->-r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: MarkupSafe in 
/usr/local/lib/python2.7/dist-packages (from jinja2>=2.7->-r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: colorama; extra == "windows" in 
/usr/lib/python2.7/dist-packages (from colorlog[windows]==2.6.0->-r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: xmltodict in 
/home/jenkins/.local/lib/python2.7/site-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: requests-ntlm>=0.3.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: requests>=2.9.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: ntlm-auth>=1.0.2 in 
/home/jenkins/.local/lib/python2.7/site-packages (from 

Jenkins build is unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #4513

2017-12-11 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-3328) Create and relate portability service servers in the Universal Local Runner

2017-12-11 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh updated BEAM-3328:
--
Description: Initially this requires creating a Logging Service, Data 
Service, and Control Service instance, and making them available to any remote 
container.

> Create and relate portability service servers in the Universal Local Runner
> ---
>
> Key: BEAM-3328
> URL: https://issues.apache.org/jira/browse/BEAM-3328
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>  Labels: portability
>
> Initially this requires creating a Logging Service, Data Service, and Control 
> Service instance, and making them available to any remote container.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3329) Add an in-process 'container' manager for testing in the Universal Local Runner

2017-12-11 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-3329:
-

 Summary: Add an in-process 'container' manager for testing in the 
Universal Local Runner
 Key: BEAM-3329
 URL: https://issues.apache.org/jira/browse/BEAM-3329
 Project: Beam
  Issue Type: New Feature
  Components: runner-core
Reporter: Thomas Groh
Priority: Minor


For unit testing remote stage execution, a local 'container' would be useful to 
verify the 

This necessitates adding a test-only dependency on the Java SDK harness from 
the ULR, and adding an in-process channel.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3331) Construct executable stages for a Pipeline within the Universal Local Runner

2017-12-11 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-3331:
-

 Summary: Construct executable stages for a Pipeline within the 
Universal Local Runner
 Key: BEAM-3331
 URL: https://issues.apache.org/jira/browse/BEAM-3331
 Project: Beam
  Issue Type: New Feature
  Components: runner-core
Reporter: Thomas Groh


This generally splits a Pipeline into multiple distinct executable stages, each 
of which can be invoked over a Fn API RPC into a single environment.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2899) Universal Local Runner

2017-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287004#comment-16287004
 ] 

ASF GitHub Bot commented on BEAM-2899:
--

tgroh closed pull request #4197: [BEAM-2899] Port DataBufferingOutboundObserver
URL: https://github.com/apache/beam/pull/4197
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/SdkHarnessClient.java
 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/SdkHarnessClient.java
index 5b47a581804..d51318c9351 100644
--- 
a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/SdkHarnessClient.java
+++ 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/SdkHarnessClient.java
@@ -25,7 +25,7 @@
 import java.util.concurrent.Future;
 import java.util.concurrent.atomic.AtomicLong;
 import org.apache.beam.model.fnexecution.v1.BeamFnApi;
-import org.apache.beam.runners.fnexecution.data.FnDataReceiver;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
 
 /**
  * A high-level client for an SDK harness.
diff --git 
a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/FnDataService.java
 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/FnDataService.java
index 1be01a7b851..fcbcea10b86 100644
--- 
a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/FnDataService.java
+++ 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/FnDataService.java
@@ -20,6 +20,7 @@
 
 import com.google.common.util.concurrent.ListenableFuture;
 import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
 import org.apache.beam.sdk.fn.data.LogicalEndpoint;
 import org.apache.beam.sdk.util.WindowedValue;
 
diff --git a/sdks/java/fn-execution/build.gradle 
b/sdks/java/fn-execution/build.gradle
index 34be858783a..45213a75e5f 100644
--- a/sdks/java/fn-execution/build.gradle
+++ b/sdks/java/fn-execution/build.gradle
@@ -26,6 +26,7 @@ dependencies {
   shadow project(path: ":beam-model-parent:beam-model-pipeline", 
configuration: "shadow")
   shadow project(path: ":beam-model-parent:beam-model-fn-execution", 
configuration: "shadow")
   shadow project(path: 
":beam-sdks-parent:beam-sdks-java-parent:beam-sdks-java-core", configuration: 
"shadow")
+  shadow library.java.slf4j_api
   shadow library.java.grpc_core
   shadow library.java.grpc_stub
   shadow library.java.grpc_netty
diff --git a/sdks/java/fn-execution/pom.xml b/sdks/java/fn-execution/pom.xml
index 6e32e032bc6..82fc3ec7ce9 100644
--- a/sdks/java/fn-execution/pom.xml
+++ b/sdks/java/fn-execution/pom.xml
@@ -45,6 +45,8 @@
   beam-model-fn-execution
 
 
+  
 
   org.apache.beam
   beam-sdks-java-core
@@ -81,6 +83,11 @@
   guava
 
 
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-api</artifactId>
+    </dependency>
+
 
 
   com.google.auto.value
diff --git 
a/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataBufferingOutboundObserver.java
 
b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataBufferingOutboundObserver.java
new file mode 100644
index 000..21c73509511
--- /dev/null
+++ 
b/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataBufferingOutboundObserver.java
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.fn.data;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.protobuf.ByteString;
+import io.grpc.stub.StreamObserver;
+import java.io.IOException;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A 

[beam] branch master updated (7237e59 -> 17a2aba)

2017-12-11 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 7237e59  Merge pull request #4240 from aaltay/cancel
 add 3937847  Port DataBufferingOutboundObserver
 new 17a2aba  Merge pull request #4197

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../fnexecution/control/SdkHarnessClient.java  |  2 +-
 .../runners/fnexecution/data/FnDataService.java|  1 +
 sdks/java/fn-execution/build.gradle|  1 +
 sdks/java/fn-execution/pom.xml |  7 ++
 .../data/BeamFnDataBufferingOutboundObserver.java  | 57 ---
 .../apache/beam/sdk/fn}/data/FnDataReceiver.java   | 15 ++--
 .../BeamFnDataBufferingOutboundObserverTest.java   | 80 ++
 7 files changed, 100 insertions(+), 63 deletions(-)
 copy sdks/java/{harness/src/main/java/org/apache/beam/fn/harness => 
fn-execution/src/main/java/org/apache/beam/sdk/fn}/data/BeamFnDataBufferingOutboundObserver.java
 (73%)
 rename 
{runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution => 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn}/data/FnDataReceiver.java
 (75%)
 copy sdks/java/{harness/src/test/java/org/apache/beam/fn/harness => 
fn-execution/src/test/java/org/apache/beam/sdk/fn}/data/BeamFnDataBufferingOutboundObserverTest.java
 (69%)

-- 
To stop receiving notification emails like this one, please contact
['"commits@beam.apache.org" '].


[beam] 01/01: Merge pull request #4197

2017-12-11 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 17a2abaf2c1beca922da224ce142e09541ecc43d
Merge: 7237e59 3937847
Author: Thomas Groh 
AuthorDate: Mon Dec 11 18:45:07 2017 -0800

Merge pull request #4197

[BEAM-2899] Port DataBufferingOutboundObserver

 .../fnexecution/control/SdkHarnessClient.java  |   2 +-
 .../runners/fnexecution/data/FnDataService.java|   1 +
 sdks/java/fn-execution/build.gradle|   1 +
 sdks/java/fn-execution/pom.xml |   7 +
 .../data/BeamFnDataBufferingOutboundObserver.java  | 133 
 .../apache/beam/sdk/fn}/data/FnDataReceiver.java   |  15 +-
 .../BeamFnDataBufferingOutboundObserverTest.java   | 169 +
 7 files changed, 323 insertions(+), 5 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" .


[jira] [Updated] (BEAM-3327) Create and Manage Containers in the Universal Local Runner

2017-12-11 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh updated BEAM-3327:
--
Labels: portability  (was: )

> Create and Manage Containers in the Universal Local Runner
> --
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>  Labels: portability
>
> This permits remote stage execution for arbitrary environments



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3327) Create and Manage Containers in the Universal Local Runner

2017-12-11 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-3327:
-

 Summary: Create and Manage Containers in the Universal Local Runner
 Key: BEAM-3327
 URL: https://issues.apache.org/jira/browse/BEAM-3327
 Project: Beam
  Issue Type: New Feature
  Components: runner-core
Reporter: Thomas Groh
Assignee: Thomas Groh


This permits remote stage execution for arbitrary environments



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3328) Create and relate portability service servers in the Universal Local Runner

2017-12-11 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-3328:
-

 Summary: Create and relate portability service servers in the 
Universal Local Runner
 Key: BEAM-3328
 URL: https://issues.apache.org/jira/browse/BEAM-3328
 Project: Beam
  Issue Type: New Feature
  Components: runner-core
Reporter: Thomas Groh
Assignee: Thomas Groh






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3330) Add a representation for a Fused Stage in the Universal Local Runner

2017-12-11 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-3330:
-

 Summary: Add a representation for a Fused Stage in the Universal 
Local Runner
 Key: BEAM-3330
 URL: https://issues.apache.org/jira/browse/BEAM-3330
 Project: Beam
  Issue Type: New Feature
  Components: runner-core
Reporter: Thomas Groh


This needs to be some executable stage that can be converted directly into a 
ProcessBundleDescriptor and provided as a RegisterBundleRequest. It should be 
usable as a token within a single execution of the Universal Local Runner to 
indicate a stage that can be executed within a single environment.

There should initially be a single input PCollection, some number of output 
PCollections, and RemoteGrpcPort read and write nodes associated with those 
PCollections. Determining what is placed within a single executable stage lies 
elsewhere (see BEAM-3331).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3326) Execute a Stage via the portability framework in the ReferenceRunner

2017-12-11 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-3326:
-

 Summary: Execute a Stage via the portability framework in the 
ReferenceRunner
 Key: BEAM-3326
 URL: https://issues.apache.org/jira/browse/BEAM-3326
 Project: Beam
  Issue Type: New Feature
  Components: runner-core
Reporter: Thomas Groh
Assignee: Thomas Groh


This is the supertask for remote execution in the Universal Local Runner 
(BEAM-2899).

This executes a stage remotely via portability framework APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-3013) The Python worker should report lulls

2017-12-11 Thread Pablo Estrada (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-3013.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> The Python worker should report lulls
> -
>
> Key: BEAM-3013
> URL: https://issues.apache.org/jira/browse/BEAM-3013
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
> Fix For: 2.3.0
>
>
> Whenever too much time has been spent on the same state (e.g. > 5 minutes), 
> the worker should report it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-3284) Python SDK, dataflow runner, the method modify_job_status is calling the wrong API endpoint

2017-12-11 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-3284.
---
Resolution: Fixed

> Python SDK, dataflow runner, the method modify_job_status is calling the 
> wrong API endpoint
> ---
>
> Key: BEAM-3284
> URL: https://issues.apache.org/jira/browse/BEAM-3284
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.1.0, 2.2.0
>Reporter:  Ron Mahoney
>Assignee: Ahmet Altay
>Priority: Minor
> Fix For: 2.3.0
>
>
> In the Python SDK, for dataflow runner, the method modify_job_status is 
> calling the wrong API endpoint.  Discovered while trying to cancel a job by 
> setting the status to JOB_STATE_CANCELLED, received the following error:
> {noformat}
> WARNING:root:Retry with exponential backoff: waiting for 154.109699453 
> seconds before retrying modify_job_state because we caught exception: 
> TypecheckError: Type of arg is " 'apache_beam.runners.dataflow.internal.clients.dataflow.dataflow_v1b3_messages.DataflowProjectsLocationsJobsUpdateRequest'>",
>  not " 'apache_beam.runners.dataflow.internal.clients.dataflow.dataflow_v1b3_messages.DataflowProjectsJobsUpdateRequest'>"
> {noformat}
> The following change fixes this:
> {noformat}
> diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py 
> b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
> index edac9d7d5..1124ee182 100644
> --- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
> +++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
> @@ -512,7 +512,7 @@ class DataflowApplicationClient(object):
># Other states could only be set by the service.
>return False
> -request = dataflow.DataflowProjectsLocationsJobsUpdateRequest()
> +request = dataflow.DataflowProjectsJobsUpdateRequest()
>  request.jobId = job_id
>  request.projectId = self.google_cloud_options.project
>  request.location = self.google_cloud_options.region
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2104) Consider adding interactive notebooks as examples

2017-12-11 Thread Pablo Estrada (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-2104:

Labels: community-onboarding  (was: )

> Consider adding interactive notebooks as examples
> -
>
> Key: BEAM-2104
> URL: https://issues.apache.org/jira/browse/BEAM-2104
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: community-onboarding
>
> For an example see:
> https://github.com/quantopian/zipline/blob/master/zipline/examples/buyapple.ipynb
> These notebooks, hosted on GitHub, might be nice interactive basic examples 
> for first-time users.
> cc: [~tibor.k...@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2265) Python word count with DirectRunner gets stuck during application termination on Windows

2017-12-11 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-2265:
-

Assignee: (was: Ahmet Altay)

> Python word count with DirectRunner gets stuck during application termination 
> on Windows
> 
>
> Key: BEAM-2265
> URL: https://issues.apache.org/jira/browse/BEAM-2265
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.0.0
>Reporter: Luke Cwik
>Priority: Minor
>
> Using virtualenv 15 + python 2.7.13 + pip 9.0.1 on Windows 2016
> Example logs from DirectRunner:
> {code}
> (beamRC2)PS C:\Users\lcwik\.virtualenvs\beamRC2> python -m 
> apache_beam.examples.wordcount --input ".\input\*" --output local_counts
> No handlers could be found for logger "oauth2client.contrib.multistore_file"
> INFO:root:Missing pipeline option (runner). Executing pipeline using the 
> default runner: DirectRunner.
> INFO:root:Running pipeline with DirectRunner.
> {code}
> Application gets stuck here, pressing ctrl-z gets it unstuck and the 
> remainder below is logged
> {code}
> INFO:root:Starting finalize_write threads with num_shards: 1, batches: 1, 
> num_threads: 1
> INFO:root:Renamed 1 shards in 0.14 seconds.
> INFO:root:number of empty lines: 47851
> INFO:root:average word length: 4
> {code}
> Output is correct, so it seems as though the bug is somewhere in shutdown.
> Happens when using a local or gs path with the DirectRunner. Enabling DEBUG 
> logging did not add any additional details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3323) Create a generator of finite-but-unbounded PCollection's for integration testing

2017-12-11 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286911#comment-16286911
 ] 

Eugene Kirpichov commented on BEAM-3323:


That's a pretty good idea. Yes it does, though only in runners that support 
both SDF and TestStream, which right now is only the direct runner :) We could 
also implement GenerateSequence.from().to().asUnbounded() via an SDF to start 
getting some more SDF coverage.
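
For reference, a minimal sketch of the two forms that exist today (the 
".asUnbounded()" option discussed here is a proposal, not an existing API):

{code}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.values.PCollection;

class GenerateSequenceForms {
  static void build(Pipeline p) {
    // Finite range: currently materializes as a *bounded* PCollection.
    PCollection<Long> bounded = p.apply("Bounded", GenerateSequence.from(0).to(10000));

    // No upper bound: an unbounded PCollection emitting all longs >= 0.
    PCollection<Long> unbounded = p.apply("Unbounded", GenerateSequence.from(0));
  }
}
{code}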

> Create a generator of finite-but-unbounded PCollection's for integration 
> testing
> 
>
> Key: BEAM-3323
> URL: https://issues.apache.org/jira/browse/BEAM-3323
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>
> Several IOs have features that exhibit nontrivial behavior when writing 
> unbounded PCollection's - e.g. WriteFiles with windowed writes; BigQueryIO. 
> We need to be able to write integration tests for these features.
> Currently we have two ways to generate an unbounded PCollection without 
> reading from a real-world external streaming system such as pubsub or kafka:
> 1) TestStream, which only works in direct runner - sufficient for some tests 
> but not all: definitely not sufficient for large-scale tests or for tests 
> that need to interact with a real instance of the external system (e.g. 
> BigQueryIO). It is also quite verbose to use.
> 2) GenerateSequence.from(0) without a .to(), which returns an infinite amount 
> of data.
> GenerateSequence.from(a).to(b) returns a finite amount of data, but returns 
> it as a bounded PCollection, and doesn't report the watermark.
> I think the right thing to do here, for now, is to make 
> GenerateSequence.from(a).to(b) have an option (e.g. ".asUnbounded()") where 
> it will return an unbounded PCollection, go through UnboundedSource (or 
> potentially via SDF in runners that support it), and track the watermark 
> properly (or via a configurable watermark fn).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-3284) Python SDK, dataflow runner, the method modify_job_status is calling the wrong API endpoint

2017-12-11 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-3284:
--
Fix Version/s: 2.3.0

> Python SDK, dataflow runner, the method modify_job_status is calling the 
> wrong API endpoint
> ---
>
> Key: BEAM-3284
> URL: https://issues.apache.org/jira/browse/BEAM-3284
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.1.0, 2.2.0
>Reporter:  Ron Mahoney
>Assignee: Ahmet Altay
>Priority: Minor
> Fix For: 2.3.0
>
>
> In the Python SDK, for dataflow runner, the method modify_job_status is 
> calling the wrong API endpoint.  Discovered while trying to cancel a job by 
> setting the status to JOB_STATE_CANCELLED, received the following error:
> {noformat}
> WARNING:root:Retry with exponential backoff: waiting for 154.109699453 
> seconds before retrying modify_job_state because we caught exception: 
> TypecheckError: Type of arg is "<class 
> 'apache_beam.runners.dataflow.internal.clients.dataflow.dataflow_v1b3_messages.DataflowProjectsLocationsJobsUpdateRequest'>",
>  not "<class 
> 'apache_beam.runners.dataflow.internal.clients.dataflow.dataflow_v1b3_messages.DataflowProjectsJobsUpdateRequest'>"
> {noformat}
> The following change fixes this:
> {noformat}
> diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py 
> b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
> index edac9d7d5..1124ee182 100644
> --- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
> +++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
> @@ -512,7 +512,7 @@ class DataflowApplicationClient(object):
># Other states could only be set by the service.
>return False
> -request = dataflow.DataflowProjectsLocationsJobsUpdateRequest()
> +request = dataflow.DataflowProjectsJobsUpdateRequest()
>  request.jobId = job_id
>  request.projectId = self.google_cloud_options.project
>  request.location = self.google_cloud_options.region
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3323) Create a generator of finite-but-unbounded PCollection's for integration testing

2017-12-11 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286900#comment-16286900
 ] 

Kenneth Knowles commented on BEAM-3323:
---

Doesn't TestStream + SDF yield plenty of events?

> Create a generator of finite-but-unbounded PCollection's for integration 
> testing
> 
>
> Key: BEAM-3323
> URL: https://issues.apache.org/jira/browse/BEAM-3323
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>
> Several IOs have features that exhibit nontrivial behavior when writing 
> unbounded PCollection's - e.g. WriteFiles with windowed writes; BigQueryIO. 
> We need to be able to write integration tests for these features.
> Currently we have two ways to generate an unbounded PCollection without 
> reading from a real-world external streaming system such as pubsub or kafka:
> 1) TestStream, which only works in direct runner - sufficient for some tests 
> but not all: definitely not sufficient for large-scale tests or for tests 
> that need to interact with a real instance of the external system (e.g. 
> BigQueryIO). It is also quite verbose to use.
> 2) GenerateSequence.from(0) without a .to(), which returns an infinite amount 
> of data.
> GenerateSequence.from(a).to(b) returns a finite amount of data, but returns 
> it as a bounded PCollection, and doesn't report the watermark.
> I think the right thing to do here, for now, is to make 
> GenerateSequence.from(a).to(b) have an option (e.g. ".asUnbounded()") where 
> it will return an unbounded PCollection, go through UnboundedSource (or 
> potentially via SDF in runners that support it), and track the watermark 
> properly (or via a configurable watermark fn).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PerformanceTests_Python #664

2017-12-11 Thread Apache Jenkins Server
See 


Changes:

[dimon.ivan] - Adds ability to provide configuration function for 
BigtableOptions. -

[dimon.ivan] Fixes JavaDoc problem with @Override

[dimon.ivan] - Removes local instance related JavaDoc - Adds iinfo to toString 
and

[dimon.ivan] Simplifies if condition around deciding apply or not optiosn

[dariusz.aniszewski] added support for passing extra mvn properties to pkb to 
be transferred

[altay] Use the correct endpoint for cancel call

[github] Renaming properties form IO target counter name.

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam2 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 7237e59771a0c59006d0d7a2eec189ed2c425216 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 7237e59771a0c59006d0d7a2eec189ed2c425216
Commit message: "Merge pull request #4240 from aaltay/cancel"
 > git rev-list d65dea667935f153477a2b94d19aaef6eda21f9e # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins6770678708940152441.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/jenkins539400234703237035.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins8929637158788072715.sh
+ pip install --user -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied: absl-py in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: setuptools in /usr/lib/python2.7/dist-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 16))
Requirement already satisfied: colorlog[windows]==2.6.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied: futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied: PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied: pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Requirement already satisfied: numpy in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 22))
Requirement already satisfied: functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied: contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Requirement already satisfied: pywinrm in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: six in 
/home/jenkins/.local/lib/python2.7/site-packages (from absl-py->-r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: MarkupSafe in 
/usr/local/lib/python2.7/dist-packages (from jinja2>=2.7->-r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: colorama; extra == "windows" in 
/usr/lib/python2.7/dist-packages (from colorlog[windows]==2.6.0->-r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: xmltodict in 
/home/jenkins/.local/lib/python2.7/site-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: 

[jira] [Commented] (BEAM-3323) Create a generator of finite-but-unbounded PCollection's for integration testing

2017-12-11 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286863#comment-16286863
 ] 

Eugene Kirpichov commented on BEAM-3323:


I think the other big inconvenience with TestStream is that it's geared 
toward a small number of events. I'd like to be able to write tests that 
emit, say, a few tens of thousands of events and then quiesce, e.g. for writing 
an integration test of triggered BigQuery loads. I suppose we could refactor 
TestStream into something usable in this fashion (e.g. let it take a function 
of the (time -> event) sort), but fixing GenerateSequence seems a lot easier, 
and such a refactoring of TestStream is likely to result in something similar 
anyway.

> Create a generator of finite-but-unbounded PCollection's for integration 
> testing
> 
>
> Key: BEAM-3323
> URL: https://issues.apache.org/jira/browse/BEAM-3323
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>
> Several IOs have features that exhibit nontrivial behavior when writing 
> unbounded PCollection's - e.g. WriteFiles with windowed writes; BigQueryIO. 
> We need to be able to write integration tests for these features.
> Currently we have two ways to generate an unbounded PCollection without 
> reading from a real-world external streaming system such as pubsub or kafka:
> 1) TestStream, which only works in direct runner - sufficient for some tests 
> but not all: definitely not sufficient for large-scale tests or for tests 
> that need to interact with a real instance of the external system (e.g. 
> BigQueryIO). It is also quite verbose to use.
> 2) GenerateSequence.from(0) without a .to(), which returns an infinite amount 
> of data.
> GenerateSequence.from(a).to(b) returns a finite amount of data, but returns 
> it as a bounded PCollection, and doesn't report the watermark.
> I think the right thing to do here, for now, is to make 
> GenerateSequence.from(a).to(b) have an option (e.g. ".asUnbounded()") where 
> it will return an unbounded PCollection, go through UnboundedSource (or 
> potentially via SDF in runners that support it), and track the watermark 
> properly (or via a configurable watermark fn).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3325) optional parameter is not supported in BeamSQL

2017-12-11 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-3325:


 Summary: optional parameter is not supported in BeamSQL
 Key: BEAM-3325
 URL: https://issues.apache.org/jira/browse/BEAM-3325
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Xu Mingmin


I'm writing a UDF with optional parameters; the method code is:
{code}
  public static String eval(String kvString, String keyName
  , @Parameter(name = "pd", optional = true) String pairSeperator
  , @Parameter(name = "kvd", optional = true) String kvSeperator){
//...
  }
{code}
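
For reference, here is a self-contained sketch of such a UDF end to end (the 
class name, fallback defaults, and parsing logic are illustrative; @Parameter 
is Calcite's org.apache.calcite.linq4j.function.Parameter):

{code}
import org.apache.beam.sdk.extensions.sql.BeamSqlUdf;
import org.apache.calcite.linq4j.function.Parameter;

/** Illustrative UDF: extracts the value for a key from a "k1:v1,k2:v2" string. */
public class KvExtract implements BeamSqlUdf {
  public static String eval(
      String kvString,
      String keyName,
      @Parameter(name = "pd", optional = true) String pairSeparator,
      @Parameter(name = "kvd", optional = true) String kvSeparator) {
    String pd = pairSeparator == null ? "," : pairSeparator;   // fallback when omitted
    String kvd = kvSeparator == null ? ":" : kvSeparator;      // fallback when omitted
    for (String pair : kvString.split(java.util.regex.Pattern.quote(pd))) {
      String[] kv = pair.split(java.util.regex.Pattern.quote(kvd), 2);
      if (kv.length == 2 && kv[0].equals(keyName)) {
        return kv[1];
      }
    }
    return null;
  }
}
{code}

When the optional arguments are omitted in SQL (e.g. {{KV_EXT(f_str, 
'itemId')}}, column name illustrative), Calcite fills the missing positions 
with DEFAULT() operators, which is exactly what appears in the plan below.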

And I see this error when assembling a Beam pipeline:
{code}
Execution Plan: 
BeamProjectRel(ITEM_ID=[CAST(KV_EXT($0, 'itemId', DEFAULT(), 
DEFAULT())):BIGINT], TRANSACTION_ID=[CAST(KV_EXT($0, 'transactionId', 
DEFAULT(), DEFAULT())):BIGINT], TRANSACTION_TIME=[KV_EXT($0, 'createdDT', 
DEFAULT(), DEFAULT())])
  BeamIOSourceRel(table=[[]])

Exception in thread "main" java.lang.RuntimeException: 
java.lang.UnsupportedOperationException: Operator: DEFAULT is not supported yet!
at .
Caused by: java.lang.UnsupportedOperationException: Operator: DEFAULT is not 
supported yet!
at 
org.apache.beam.sdk.extensions.sql.impl.interpreter.BeamSqlFnExecutor.buildExpression(BeamSqlFnExecutor.java:425)
at 
org.apache.beam.sdk.extensions.sql.impl.interpreter.BeamSqlFnExecutor.buildExpression(BeamSqlFnExecutor.java:202)
at 
org.apache.beam.sdk.extensions.sql.impl.interpreter.BeamSqlFnExecutor.buildExpression(BeamSqlFnExecutor.java:202)
at 
org.apache.beam.sdk.extensions.sql.impl.interpreter.BeamSqlFnExecutor.(BeamSqlFnExecutor.java:126)
at 
org.apache.beam.sdk.extensions.sql.impl.rel.BeamProjectRel.buildBeamPipeline(BeamProjectRel.java:70)
at 
org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner.compileBeamPipeline(BeamQueryPlanner.java:116)
at 
org.apache.beam.sdk.extensions.sql.BeamSqlCli.compilePipeline(BeamSqlCli.java:112)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3015) Need logging from C threads in Cython

2017-12-11 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286849#comment-16286849
 ] 

Pablo Estrada commented on BEAM-3015:
-

This issue is currently paused, since for Python logging we'd like to reuse 
the logging library's setup.

> Need logging from C threads in Cython
> -
>
> Key: BEAM-3015
> URL: https://issues.apache.org/jira/browse/BEAM-3015
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2637) Post commit test for mobile gaming examples

2017-12-11 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286832#comment-16286832
 ] 

Pablo Estrada commented on BEAM-2637:
-

Is this fixed?

> Post commit test for mobile gaming examples
> ---
>
> Key: BEAM-2637
> URL: https://issues.apache.org/jira/browse/BEAM-2637
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>
> We need post commit test for mobile gaming examples to prevent failures 
> beyond DirectRunner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2814) test_as_singleton_with_different_defaults test is flaky

2017-12-11 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286842#comment-16286842
 ] 

Ahmet Altay commented on BEAM-2814:
---

The PR was about adding an additional debugging message. I do not know if we 
can close it yet. Do you know if the test is still flaky?

> test_as_singleton_with_different_defaults test is flaky
> ---
>
> Key: BEAM-2814
> URL: https://issues.apache.org/jira/browse/BEAM-2814
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Critical
>
> {{test_as_singleton_with_different_defaults}} is flaky and failed in the post 
> commit test 3013, but there is no related change to trigger this.
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/3013/consoleFull
> (https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2017-08-28_11_08_56-17324181904913254210?project=apache-beam-testing)
> Dataflow error from the console:
>   (b4d390f9f9e033b4): Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 582, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 166, in execute
> op.start()
>   File "apache_beam/runners/worker/operations.py", line 294, in 
> apache_beam.runners.worker.operations.DoOperation.start 
> (apache_beam/runners/worker/operations.c:10607)
> def start(self):
>   File "apache_beam/runners/worker/operations.py", line 295, in 
> apache_beam.runners.worker.operations.DoOperation.start 
> (apache_beam/runners/worker/operations.c:10501)
> with self.scoped_start_state:
>   File "apache_beam/runners/worker/operations.py", line 323, in 
> apache_beam.runners.worker.operations.DoOperation.start 
> (apache_beam/runners/worker/operations.c:10322)
> self.dofn_runner = common.DoFnRunner(
>   File "apache_beam/runners/common.py", line 378, in 
> apache_beam.runners.common.DoFnRunner.__init__ 
> (apache_beam/runners/common.c:10018)
> self.do_fn_invoker = DoFnInvoker.create_invoker(
>   File "apache_beam/runners/common.py", line 154, in 
> apache_beam.runners.common.DoFnInvoker.create_invoker 
> (apache_beam/runners/common.c:5212)
> return PerWindowInvoker(
>   File "apache_beam/runners/common.py", line 219, in 
> apache_beam.runners.common.PerWindowInvoker.__init__ 
> (apache_beam/runners/common.c:7109)
> input_args, input_kwargs, [si[global_window] for si in side_inputs])
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/sideinputs.py",
>  line 63, in __getitem__
> _FilteringIterable(self._iterable, target_window), self._view_options)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/pvalue.py", line 
> 332, in _from_runtime_iterable
> 'PCollection with more than one element accessed as '
> ValueError: PCollection with more than one element accessed as a singleton 
> view.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3008) BigtableIO should use ValueProviders

2017-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286771#comment-16286771
 ] 

ASF GitHub Bot commented on BEAM-3008:
--

chamikaramj closed pull request #4205: [BEAM-3008] Adds BigtableOptions 
configurator to the BigtableIO
URL: https://github.com/apache/beam/pull/4205
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
index 8b4609da224..febdc1f53b0 100644
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
@@ -122,43 +122,19 @@
  * idempotent transformation to that row.
  *
  * To configure a Cloud Bigtable sink, you must supply a table id, a 
project id, an instance id
- * and optionally and optionally a {@link BigtableOptions} to provide more 
specific connection
- * configuration, for example:
+ * and optionally a configuration function for {@link BigtableOptions} to 
provide more specific
+ * connection configuration, for example:
  *
  * {@code
  * PCollection> data = ...;
  *
  * data.apply("write",
  * BigtableIO.write()
- * .setProjectId("project")
- * .setInstanceId("instance")
+ * .withProjectId("project")
+ * .withInstanceId("instance")
  * .withTableId("table"));
  * }
  *
- * Using local emulator
- *
- * In order to use local emulator for Bigtable you should use:
- *
- * {@code
- * BigtableOptions.Builder optionsBuilder =
- * new BigtableOptions.Builder()
- * .setUsePlaintextNegotiation(true)
- * .setCredentialOptions(CredentialOptions.nullCredential())
- * .setDataHost("127.0.0.1") // network interface where Bigtable 
emulator is bound
- * .setInstanceAdminHost("127.0.0.1")
- * .setTableAdminHost("127.0.0.1")
- * .setPort(LOCAL_EMULATOR_PORT))
- *
- * PCollection> data = ...;
- *
- * data.apply("write",
- * BigtableIO.write()
- * .withBigtableOptions(optionsBuilder)
- * .setProjectId("project")
- * .setInstanceId("instance")
- * .withTableId("table");
- * }
- *
  * Experimental
  *
  * This connector for Cloud Bigtable is considered experimental and may 
break or receive
@@ -239,12 +215,23 @@ public static Write write() {
 @Nullable
 abstract BigtableService getBigtableService();
 
-/** Returns the Google Cloud Bigtable instance being read from, and other 
parameters. */
+/**
+ * Returns the Google Cloud Bigtable instance being read from, and other 
parameters.
+ * @deprecated will be replaced by bigtable options configurator.
+ */
+@Deprecated
 @Nullable
 public abstract BigtableOptions getBigtableOptions();
 
 public abstract boolean getValidate();
 
+/**
+ * Configurator of the effective Bigtable Options.
+ */
+@Nullable
+abstract SerializableFunction getBigtableOptionsConfigurator();
+
 abstract Builder toBuilder();
 
 @AutoValue.Builder
@@ -260,12 +247,17 @@ public static Write write() {
 
   abstract Builder setTableId(String tableId);
 
+  /** @deprecated will be replaced by bigtable options configurator. */
+  @Deprecated
   abstract Builder setBigtableOptions(BigtableOptions options);
 
   abstract Builder setBigtableService(BigtableService bigtableService);
 
   abstract Builder setValidate(boolean validate);
 
+  abstract Builder setBigtableOptionsConfigurator(
+SerializableFunction 
optionsConfigurator);
+
   abstract Read build();
 }
 
@@ -302,7 +294,10 @@ public Read withInstanceId(String instanceId) {
  * indicated by {@link #withProjectId(String)}, and using any other 
specified customizations.
  *
  * Does not modify this object.
+ *
+ * @deprecated will be replaced by bigtable options configurator.
  */
+@Deprecated
 public Read withBigtableOptions(BigtableOptions options) {
   checkArgument(options != null, "options can not be null");
   return withBigtableOptions(options.toBuilder());
@@ -320,17 +315,29 @@ public Read withBigtableOptions(BigtableOptions options) {
  * will have no effect on the returned {@link BigtableIO.Read}.
  *
  * Does not modify this object.
+ *
+ * @deprecated will be replaced by 

[jira] [Commented] (BEAM-2265) Python word count with DirectRunner gets stuck during application termination on Windows

2017-12-11 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286836#comment-16286836
 ] 

Pablo Estrada commented on BEAM-2265:
-

Should this be closed?

> Python word count with DirectRunner gets stuck during application termination 
> on Windows
> 
>
> Key: BEAM-2265
> URL: https://issues.apache.org/jira/browse/BEAM-2265
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.0.0
>Reporter: Luke Cwik
>Assignee: Ahmet Altay
>Priority: Minor
>
> Using virtualenv 15 + python 2.7.13 + pip 9.0.1 on Windows 2016
> Example logs from DirectRunner:
> {code}
> (beamRC2)PS C:\Users\lcwik\.virtualenvs\beamRC2> python -m 
> apache_beam.examples.wordcount --input ".\input\*" --output l
> ocal_counts
> No handlers could be found for logger "oauth2client.contrib.multistore_file"
> INFO:root:Missing pipeline option (runner). Executing pipeline using the 
> default runner: DirectRunner.
> INFO:root:Running pipeline with DirectRunner.
> {code}
> Application gets stuck here, pressing ctrl-z gets it unstuck and the 
> remainder below is logged
> {code}
> INFO:root:Starting finalize_write threads with num_shards: 1, batches: 1, 
> num_threads: 1
> INFO:root:Renamed 1 shards in 0.14 seconds.
> INFO:root:number of empty lines: 47851
> INFO:root:average word length: 4
> {code}
> Output is correct, so it seems as though the bug is somewhere in shutdown.
> Happens when using a local or gs path with the DirectRunner. Enabling DEBUG 
> logging did not add any additional details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3323) Create a generator of finite-but-unbounded PCollection's for integration testing

2017-12-11 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286837#comment-16286837
 ] 

Kenneth Knowles commented on BEAM-3323:
---

+1 to allowing users to just explicitly make finite unbounded PCollections. 
That's the lowest-hanging fruit.

Beyond that, it would be more accurate to say that TestStream is only 
_supported_ by the direct runner. To the extent that implementing it in 
distributed runners is infeasible, we should open a design discussion.

TestStream makes some nondeterministic things deterministic so they are 
testable directly. The obvious alternative is to not test those details (like 
whether an element showed up before/after a watermark-driven timer) directly, 
but only test properties that should be true for all instantiations of the 
nondeterminism. That is all you can do with GenerateSequence alone. It is 
great to have such properties available, but it is most useful to weight 
testing toward corner cases.

For tests at scale that interact with a real deployment of an IO endpoint, our 
PKB framework would be used, right?
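
To make the determinism point concrete, here is a minimal TestStream sketch 
(direct runner only; the coder and values are illustrative) that pins down, by 
construction, whether an element shows up before or after a watermark-driven 
timer:

{code}
TestStream<String> events =
    TestStream.create(StringUtf8Coder.of())
        .addElements(TimestampedValue.of("on-time", new Instant(0L)))
        .advanceWatermarkTo(new Instant(10000L)) // watermark-driven timers fire here
        .addElements(TimestampedValue.of("late", new Instant(5000L)))
        .advanceWatermarkToInfinity();
{code}

With GenerateSequence alone, the interleaving of elements and watermark 
advancement is up to the runner, so only properties that hold under every 
interleaving can be asserted.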

> Create a generator of finite-but-unbounded PCollection's for integration 
> testing
> 
>
> Key: BEAM-3323
> URL: https://issues.apache.org/jira/browse/BEAM-3323
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>
> Several IOs have features that exhibit nontrivial behavior when writing 
> unbounded PCollection's - e.g. WriteFiles with windowed writes; BigQueryIO. 
> We need to be able to write integration tests for these features.
> Currently we have two ways to generate an unbounded PCollection without 
> reading from a real-world external streaming system such as pubsub or kafka:
> 1) TestStream, which only works in direct runner - sufficient for some tests 
> but not all: definitely not sufficient for large-scale tests or for tests 
> that need to interact with a real instance of the external system (e.g. 
> BigQueryIO). It is also quite verbose to use.
> 2) GenerateSequence.from(0) without a .to(), which returns an infinite amount 
> of data.
> GenerateSequence.from(a).to(b) returns a finite amount of data, but returns 
> it as a bounded PCollection, and doesn't report the watermark.
> I think the right thing to do here, for now, is to make 
> GenerateSequence.from(a).to(b) have an option (e.g. ".asUnbounded()") where 
> it will return an unbounded PCollection, go through UnboundedSource (or 
> potentially via SDF in runners that support it), and track the watermark 
> properly (or via a configurable watermark fn).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2814) test_as_singleton_with_different_defaults test is flaky

2017-12-11 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286830#comment-16286830
 ] 

Pablo Estrada commented on BEAM-2814:
-

Can this be closed?

> test_as_singleton_with_different_defaults test is flaky
> ---
>
> Key: BEAM-2814
> URL: https://issues.apache.org/jira/browse/BEAM-2814
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Critical
>
> {{test_as_singleton_with_different_defaults}} is flaky and failed in the post 
> commit test 3013, but there is no related change to trigger this.
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/3013/consoleFull
> (https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2017-08-28_11_08_56-17324181904913254210?project=apache-beam-testing)
> Dataflow error from the console:
>   (b4d390f9f9e033b4): Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 582, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 166, in execute
> op.start()
>   File "apache_beam/runners/worker/operations.py", line 294, in 
> apache_beam.runners.worker.operations.DoOperation.start 
> (apache_beam/runners/worker/operations.c:10607)
> def start(self):
>   File "apache_beam/runners/worker/operations.py", line 295, in 
> apache_beam.runners.worker.operations.DoOperation.start 
> (apache_beam/runners/worker/operations.c:10501)
> with self.scoped_start_state:
>   File "apache_beam/runners/worker/operations.py", line 323, in 
> apache_beam.runners.worker.operations.DoOperation.start 
> (apache_beam/runners/worker/operations.c:10322)
> self.dofn_runner = common.DoFnRunner(
>   File "apache_beam/runners/common.py", line 378, in 
> apache_beam.runners.common.DoFnRunner.__init__ 
> (apache_beam/runners/common.c:10018)
> self.do_fn_invoker = DoFnInvoker.create_invoker(
>   File "apache_beam/runners/common.py", line 154, in 
> apache_beam.runners.common.DoFnInvoker.create_invoker 
> (apache_beam/runners/common.c:5212)
> return PerWindowInvoker(
>   File "apache_beam/runners/common.py", line 219, in 
> apache_beam.runners.common.PerWindowInvoker.__init__ 
> (apache_beam/runners/common.c:7109)
> input_args, input_kwargs, [si[global_window] for si in side_inputs])
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/sideinputs.py",
>  line 63, in __getitem__
> _FilteringIterable(self._iterable, target_window), self._view_options)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/pvalue.py", line 
> 332, in _from_runtime_iterable
> 'PCollection with more than one element accessed as '
> ValueError: PCollection with more than one element accessed as a singleton 
> view.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Apex #2974

2017-12-11 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-3042) Add tracking of bytes read / time spent when reading side inputs

2017-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286814#comment-16286814
 ] 

ASF GitHub Bot commented on BEAM-3042:
--

aaltay closed pull request #4241: [BEAM-3042] Renaming properties form IO 
target counter name.
URL: https://github.com/apache/beam/pull/4241
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/python/apache_beam/utils/counters.py 
b/sdks/python/apache_beam/utils/counters.py
index ae974344259..e2e0a1a730b 100644
--- a/sdks/python/apache_beam/utils/counters.py
+++ b/sdks/python/apache_beam/utils/counters.py
@@ -29,19 +29,18 @@
 from apache_beam.transforms import cy_combiners
 
 # Information identifying the IO being measured by a counter.
-IOTargetName = namedtuple('IOTargetName', ['side_input_step_name',
-   'side_input_index',
-   'original_shuffle_step_name'])
+IOTargetName = namedtuple('IOTargetName', ['requesting_step_name',
+   'input_index'])
 
 
 def side_input_id(step_name, input_index):
   """Create an IOTargetName that identifies the reading of a side input."""
-  return IOTargetName(step_name, input_index, None)
+  return IOTargetName(step_name, input_index)
 
 
 def shuffle_id(step_name):
   """Create an IOTargetName that identifies a GBK step."""
-  return IOTargetName(None, None, step_name)
+  return IOTargetName(step_name, None)
 
 
 _CounterName = namedtuple('_CounterName', ['name',


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add tracking of bytes read / time spent when reading side inputs
> 
>
> Key: BEAM-3042
> URL: https://issues.apache.org/jira/browse/BEAM-3042
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>
> It is difficult for Dataflow users to understand how modifying a pipeline or 
> data set can affect how much inter-transform IO is used in their job. The 
> intent of this feature request is to help users understand how side inputs 
> behave when they are consumed.
> This will allow users to understand how much time and how much data their 
> pipeline uses to read/write to inter-transform IO. Users will also be able to 
> modify their pipelines and understand how their changes affect these IO 
> metrics.
> For further information, please review the internal Google doc 
> go/insights-transform-io-design-doc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3284) Python SDK, dataflow runner, the method modify_job_status is calling the wrong API endpoint

2017-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286815#comment-16286815
 ] 

ASF GitHub Bot commented on BEAM-3284:
--

aaltay closed pull request #4240: [BEAM-3284] Use the correct endpoint for 
modify job state
URL: https://github.com/apache/beam/pull/4240
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py 
b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index 6253c80f83b..38c6df1ab12 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -936,7 +936,7 @@ def _is_in_terminal_state(self):
 return self._job.currentState in [
 values_enum.JOB_STATE_STOPPED, values_enum.JOB_STATE_DONE,
 values_enum.JOB_STATE_FAILED, values_enum.JOB_STATE_CANCELLED,
-values_enum.JOB_STATE_DRAINED]
+values_enum.JOB_STATE_UPDATED, values_enum.JOB_STATE_DRAINED]
 
   def wait_until_finish(self, duration=None):
 if not self._is_in_terminal_state():
@@ -957,7 +957,7 @@ def wait_until_finish(self, duration=None):
 
   # TODO: Merge the termination code in poll_for_job_completion and
   # _is_in_terminal_state.
-  terminated = (str(self._job.currentState) != 'JOB_STATE_RUNNING')
+  terminated = self._is_in_terminal_state()
   assert duration or terminated, (
   'Job did not reach to a terminal state after waiting indefinitely.')
 
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py 
b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
index 64c4ac98ac2..dd6bf95706e 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
@@ -569,7 +569,7 @@ def modify_job_state(self, job_id, new_state):
 request.location = self.google_cloud_options.region
 request.job = dataflow.Job(requestedState=new_state)
 
-self._client.projects_jobs.Update(request)
+self._client.projects_locations_jobs.Update(request)
 return True
 
   @retry.with_exponential_backoff()  # Using retry defaults from utils/retry.py


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Python SDK, dataflow runner, the method modify_job_status is calling the 
> wrong API endpoint
> ---
>
> Key: BEAM-3284
> URL: https://issues.apache.org/jira/browse/BEAM-3284
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.1.0, 2.2.0
>Reporter:  Ron Mahoney
>Assignee: Ahmet Altay
>Priority: Minor
>
> In the Python SDK's Dataflow runner, the method modify_job_state is calling 
> the wrong API endpoint. This was discovered while trying to cancel a job by 
> setting the state to JOB_STATE_CANCELLED; the following error was received:
> {noformat}
> WARNING:root:Retry with exponential backoff: waiting for 154.109699453 
> seconds before retrying modify_job_state because we caught exception: 
> TypecheckError: Type of arg is "<class 'apache_beam.runners.dataflow.internal.clients.dataflow.dataflow_v1b3_messages.DataflowProjectsLocationsJobsUpdateRequest'>",
>  not "<class 'apache_beam.runners.dataflow.internal.clients.dataflow.dataflow_v1b3_messages.DataflowProjectsJobsUpdateRequest'>"
> {noformat}
> The following change fixes this:
> {noformat}
> diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py 
> b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
> index edac9d7d5..1124ee182 100644
> --- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
> +++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
> @@ -512,7 +512,7 @@ class DataflowApplicationClient(object):
># Other states could only be set by the service.
>return False
> -request = dataflow.DataflowProjectsLocationsJobsUpdateRequest()
> +request = dataflow.DataflowProjectsJobsUpdateRequest()
>  request.jobId = job_id
>  request.projectId = self.google_cloud_options.project
>  request.location = self.google_cloud_options.region
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2484) Add streaming examples for Python

2017-12-11 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286816#comment-16286816
 ] 

Pablo Estrada commented on BEAM-2484:
-

Can we mark this issue as closed?

> Add streaming examples for Python
> -
>
> Key: BEAM-2484
> URL: https://issues.apache.org/jira/browse/BEAM-2484
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-python
>Reporter: David Cavazos
>Priority: Trivial
>
> Port the mobile gaming example to Python



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[beam] branch master updated (b97df71 -> 7237e59)

2017-12-11 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from b97df71  Merge pull request #4241 from pabloem/patch-2
 add 8bed72e  Use the correct endpoint for cancel call
 new 7237e59  Merge pull request #4240 from aaltay/cancel

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/runners/dataflow/dataflow_runner.py| 4 ++--
 sdks/python/apache_beam/runners/dataflow/internal/apiclient.py | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
['"commits@beam.apache.org" '].


[beam] branch master updated (f7fc4bd -> b97df71)

2017-12-11 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from f7fc4bd  Merge pull request #4205 from dmytroivanov4206/master
 add b7396c4  Renaming properties form IO target counter name.
 new b97df71  Merge pull request #4241 from pabloem/patch-2

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/utils/counters.py | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
['"commits@beam.apache.org" '].


[beam] 01/01: Merge pull request #4240 from aaltay/cancel

2017-12-11 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 7237e59771a0c59006d0d7a2eec189ed2c425216
Merge: b97df71 8bed72e
Author: Ahmet Altay 
AuthorDate: Mon Dec 11 16:05:49 2017 -0800

Merge pull request #4240 from aaltay/cancel

[BEAM-3284] Use the correct endpoint for modify job state

 sdks/python/apache_beam/runners/dataflow/dataflow_runner.py| 4 ++--
 sdks/python/apache_beam/runners/dataflow/internal/apiclient.py | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" .


[beam] 01/01: Merge pull request #4241 from pabloem/patch-2

2017-12-11 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit b97df712ba7588a4b1547eb52808f464cef895a3
Merge: f7fc4bd b7396c4
Author: Ahmet Altay 
AuthorDate: Mon Dec 11 16:05:28 2017 -0800

Merge pull request #4241 from pabloem/patch-2

[BEAM-3042] Renaming properties form IO target counter name.

 sdks/python/apache_beam/utils/counters.py | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" .


[jira] [Created] (BEAM-3324) symtab.go shouldn't read entire file into memory

2017-12-11 Thread Bill Neubauer (JIRA)
Bill Neubauer created BEAM-3324:
---

 Summary: symtab.go shouldn't read entire file into memory
 Key: BEAM-3324
 URL: https://issues.apache.org/jira/browse/BEAM-3324
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Reporter: Bill Neubauer
Assignee: Henning Rohde
Priority: Minor


The implementation of symtab.go reads the entire binary into memory. This is 
wasteful of memory, and it should just use os.File as the backing reader. If 
performance becomes an issue, we can use a modest amount of memory to cache 
lookups and avoid filesystem reads.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[beam] 01/01: Merge pull request #4205 from dmytroivanov4206/master

2017-12-11 Thread chamikara
This is an automated email from the ASF dual-hosted git repository.

chamikara pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit f7fc4bd30b19a6e6a6f8835a76a4d875557363a9
Merge: 4b09488 871fd72
Author: Chamikara Jayalath 
AuthorDate: Mon Dec 11 15:45:24 2017 -0800

Merge pull request #4205 from dmytroivanov4206/master

[BEAM-3008] Adds BigtableOptions configurator to the BigtableIO

 .../beam/sdk/io/gcp/bigtable/BigtableIO.java   | 228 ++---
 .../beam/sdk/io/gcp/bigtable/BigtableIOTest.java   |  17 +-
 2 files changed, 172 insertions(+), 73 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" .


[beam] branch master updated (4b09488 -> f7fc4bd)

2017-12-11 Thread chamikara
This is an automated email from the ASF dual-hosted git repository.

chamikara pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 4b09488  Merge pull request #4238 from 
DariuszAniszewski/custom-properties-in-pkb
 add fac6bf0  - Adds ability to provide configuration function for 
BigtableOptions. - Deprecates previous BigtableOptions providing methods.
 add 814e320  Fixes JavaDoc problem with @Override
 add 0b87e46  - Removes local instance related JavaDoc - Adds iinfo to 
toString and populateDisplayData about effective BigtableOptions
 add 871fd72  Simplifies if condition around deciding apply or not optiosn 
configurator
 new f7fc4bd  Merge pull request #4205 from dmytroivanov4206/master

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../beam/sdk/io/gcp/bigtable/BigtableIO.java   | 228 ++---
 .../beam/sdk/io/gcp/bigtable/BigtableIOTest.java   |  17 +-
 2 files changed, 172 insertions(+), 73 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
['"commits@beam.apache.org" '].


[jira] [Commented] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2017-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286682#comment-16286682
 ] 

ASF GitHub Bot commented on BEAM-3060:
--

chamikaramj closed pull request #4238: [BEAM-3060] added support for passing 
extra mvn properties to pkb
URL: https://github.com/apache/beam/pull/4238
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/java/io/file-based-io-tests/pom.xml 
b/sdks/java/io/file-based-io-tests/pom.xml
index fc523f614fd..44119ec79ff 100644
--- a/sdks/java/io/file-based-io-tests/pom.xml
+++ b/sdks/java/io/file-based-io-tests/pom.xml
@@ -124,6 +124,11 @@
 
-beam_it_class=${fileBasedIoItClass}
 
 
-beam_it_options=${integrationTestPipelineOptions}
+
+
-beam_extra_mvn_properties=${pkbExtraProperties}
 
 
 
diff --git a/sdks/java/io/pom.xml b/sdks/java/io/pom.xml
index 0f8bc78fbe1..07e1b5cb9ff 100644
--- a/sdks/java/io/pom.xml
+++ b/sdks/java/io/pom.xml
@@ -37,6 +37,7 @@
 
 
 
+
   
 
   


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add performance tests for commonly used file-based I/O PTransforms
> --
>
> Key: BEAM-3060
> URL: https://issues.apache.org/jira/browse/BEAM-3060
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Chamikara Jayalath
>Assignee: Szymon Nieradka
>
> We recently added a performance testing framework [1] that can be used to do 
> following.
> (1) Execute Beam tests using PerfkitBenchmarker
> (2) Manage Kubernetes-based deployments of data stores.
> (3) Easily publish benchmark results. 
> I think it will be useful to add performance tests for commonly used 
> file-based I/O PTransforms using this framework. I suggest looking into 
> following formats initially.
> (1) AvroIO
> (2) TextIO
> (3) Compressed text using TextIO
> (4) TFRecordIO
> It should be possible to run these tests easily for various Beam runners 
> (Direct, Dataflow, Flink, Spark, etc.) and file systems (GCS, local, HDFS, 
> etc.).
> In the initial version, tests can be made manually triggerable for PRs 
> through Jenkins. Later, we could make some of these tests run periodically 
> and publish benchmark results (to BigQuery) through PerfkitBenchmarker.
> [1] https://beam.apache.org/documentation/io/testing/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[beam] 01/01: Merge pull request #4238 from DariuszAniszewski/custom-properties-in-pkb

2017-12-11 Thread chamikara
This is an automated email from the ASF dual-hosted git repository.

chamikara pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 4b094889cbd620f871736725471a99f0a1f311fa
Merge: d65dea6 5a6dacf
Author: Chamikara Jayalath 
AuthorDate: Mon Dec 11 14:41:42 2017 -0800

Merge pull request #4238 from DariuszAniszewski/custom-properties-in-pkb

[BEAM-3060] added support for passing extra mvn properties to pkb

 sdks/java/io/file-based-io-tests/pom.xml | 5 +
 sdks/java/io/pom.xml | 1 +
 2 files changed, 6 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" .


[jira] [Commented] (BEAM-3199) Upgrade to Elasticsearch 6.x

2017-12-11 Thread Jeroen Steggink (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286578#comment-16286578
 ] 

Jeroen Steggink commented on BEAM-3199:
---

Upgrading to Elasticsearch 6.x is not trivial. I would recommend splitting the 
Maven artifacts to have separate ones for Elasticsearch 2.x - 5.x and 
Elasticsearch 6.x, just like the testing modules are split now. Another caveat 
is the elasticsearch-tests-common artifact. It's not compatible with version 
6.x, since some of the queries it uses were deprecated in version 5.x and 
removed in 6.x.

> Upgrade to Elasticsearch 6.x
> 
>
> Key: BEAM-3199
> URL: https://issues.apache.org/jira/browse/BEAM-3199
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, 
> it makes sense to upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3042) Add tracking of bytes read / time spent when reading side inputs

2017-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286601#comment-16286601
 ] 

ASF GitHub Bot commented on BEAM-3042:
--

pabloem closed pull request #3943: [BEAM-3042] Add tracking of bytes read / 
time spent when reading side inputs
URL: https://github.com/apache/beam/pull/3943
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/python/apache_beam/runners/worker/opcounters.py 
b/sdks/python/apache_beam/runners/worker/opcounters.py
index f4ba6b9a9a8..c997d23a39d 100644
--- a/sdks/python/apache_beam/runners/worker/opcounters.py
+++ b/sdks/python/apache_beam/runners/worker/opcounters.py
@@ -25,6 +25,7 @@
 import random
 
 from apache_beam.utils.counters import Counter
+from apache_beam.utils.counters import CounterName
 
 # This module is experimental. No backwards-compatibility guarantees.
 
@@ -42,6 +43,58 @@ def value(self):
 return self._value
 
 
+class TransformIoCounter(object):
+
+  def add_bytes_read(self, n):
+pass
+
+  def __enter__(self):
+self.enter()
+
+  def __exit__(self, unused_exc_type, unused_exc_value, unused_traceback):
+self.exit()
+
+  def enter(self):
+pass
+
+  def exit(self):
+pass
+
+  def check_step(self):
+pass
+
+
+class SideInputReadCounter(TransformIoCounter):
+
+  def __init__(self, counter_factory, state_sampler, io_target):
+self._counter_factory = counter_factory
+self._state_sampler = state_sampler
+self._bytes_read_cache = 0
+self.io_target = io_target
+self.check_step()
+
+  def check_step(self):
+current_state = self._state_sampler.current_state()
+operation_name = current_state.name.step_name
+self.scoped_state = self._state_sampler.scoped_state(
+operation_name, 'read-sideinput', io_target=self.io_target)
+self.bytes_read_counter = self._counter_factory.get_counter(
+CounterName('bytes-read',
+step_name=operation_name,
+io_target=self.io_target),
+Counter.SUM)
+
+  def add_bytes_read(self, n):
+if n > 0:
+  self.bytes_read_counter.update(n)
+
+  def enter(self):
+self.scoped_state.__enter__()
+
+  def exit(self):
+self.scoped_state.__exit__(None, None, None)
+
+
 class OperationCounters(object):
   """The set of basic counters to attach to an Operation."""
 
diff --git a/sdks/python/apache_beam/runners/worker/operations.py 
b/sdks/python/apache_beam/runners/worker/operations.py
index ed3f3b8f466..132a61fb131 100644
--- a/sdks/python/apache_beam/runners/worker/operations.py
+++ b/sdks/python/apache_beam/runners/worker/operations.py
@@ -42,6 +42,7 @@
 from apache_beam.transforms.combiners import PhasedCombineFnExecutor
 from apache_beam.transforms.combiners import curry_combine_fn
 from apache_beam.transforms.window import GlobalWindows
+from apache_beam.utils import counters
 from apache_beam.utils.windowed_value import WindowedValue
 
 # Allow some "pure mode" declarations.
@@ -281,7 +282,7 @@ def _read_side_inputs(self, tags_and_types):
 # Note that for each tag there could be several read operations in the
 # specification. This can happen for instance if the source has been
 # sharded into several files.
-for side_tag, view_class, view_options in tags_and_types:
+for i, (side_tag, view_class, view_options) in enumerate(tags_and_types):
   sources = []
   # Using the side_tag in the lambda below will trigger a pylint warning.
   # However in this case it is fine because the lambda is used right away
@@ -293,7 +294,13 @@ def _read_side_inputs(self, tags_and_types):
 if not isinstance(si, operation_specs.WorkerSideInputSource):
   raise NotImplementedError('Unknown side input type: %r' % si)
 sources.append(si.source)
-  iterator_fn = sideinputs.get_iterator_fn_for_sources(sources)
+
+  si_counter = opcounters.SideInputReadCounter(
+  self.counter_factory, self.state_sampler,
+  # Inputs are 1-indexed, so we add 1 to i in the side input id
+  counters.side_input_id(self.operation_name, i+1))
+  iterator_fn = sideinputs.get_iterator_fn_for_sources(
+  sources, read_counter=si_counter)
 
   # Backwards compatibility for pre BEAM-733 SDKs.
   if isinstance(view_options, tuple):
diff --git a/sdks/python/apache_beam/runners/worker/sideinputs.py 
b/sdks/python/apache_beam/runners/worker/sideinputs.py
index bdf9f4e71f5..b11ab3cfba6 100644
--- a/sdks/python/apache_beam/runners/worker/sideinputs.py
+++ b/sdks/python/apache_beam/runners/worker/sideinputs.py
@@ -24,6 +24,7 @@
 import traceback
 
 from apache_beam.io import iobase
+from apache_beam.runners.worker import 

Jenkins build is back to normal : beam_PostCommit_Python_ValidatesRunner_Dataflow #433

2017-12-11 Thread Apache Jenkins Server
See 




Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #4509

2017-12-11 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PostCommit_Python_ValidatesRunner_Dataflow #432

2017-12-11 Thread Apache Jenkins Server
See 


Changes:

[altay] Adding debug server to sdk worker to get threaddumps

--
[...truncated 1.07 MB...]
"non_parallel_inputs": {}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}, 
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": 
"assert_that/Group/Map(_merge_tagged_vals_under_key).out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s13"
}, 
"serialized_fn": "", 
"user_name": "assert_that/Group/Map(_merge_tagged_vals_under_key)"
  }
}, 
{
  "kind": "ParallelDo", 
  "name": "s15", 
  "properties": {
"display_data": [
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.CallableWrapperDoFn", 
"type": "STRING", 
"value": ""
  }, 
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.ParDo", 
"shortValue": "CallableWrapperDoFn", 
"type": "STRING", 
"value": "apache_beam.transforms.core.CallableWrapperDoFn"
  }
], 
"non_parallel_inputs": {}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}, 
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": "assert_that/Unkey.out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s14"
}, 
"serialized_fn": "", 
"user_name": "assert_that/Unkey"
  }
}, 
{
  "kind": "ParallelDo", 
  "name": "s16", 
  "properties": {
"display_data": [
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.CallableWrapperDoFn", 
"type": "STRING", 
"value": "_equal"
  }, 
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.ParDo", 
"shortValue": "CallableWrapperDoFn", 
"type": "STRING", 
"value": "apache_beam.transforms.core.CallableWrapperDoFn"
  }
], 
"non_parallel_inputs": {}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": 

[jira] [Closed] (BEAM-2370) BigQuery Insert with Partition Decorator throwing error

2017-12-11 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-2370.
--
   Resolution: Duplicate
Fix Version/s: 2.2.0

> BigQuery Insert with Partition Decorator throwing error
> ---
>
> Key: BEAM-2370
> URL: https://issues.apache.org/jira/browse/BEAM-2370
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 2.0.0
> Environment: DirectRunner
>Reporter: Andre
>Assignee: Reuven Lax
> Fix For: 2.2.0
>
>
> Running a DataFlow job with the DirectRunner which is inserting data into a 
> partitioned table using decorators throws the following error multiple times 
> BUT still inserts records into the right partition.
> {code:java|title=Error}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl executeWithRetries
> INFO: Ignore the error and retry the request.
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> {
>   "code" : 400,
>   "errors" : [ {
> "domain" : "global",
> "message" : "Invalid table ID \"mytable_orders$20170516\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> "reason" : "invalid"
>   } ],
>   "message" : "Invalid table ID \"mytable_orders$20170516\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used."
> }
>   at 
> com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
> {code}
> {code:java|title=Code}
> // Write TableRows to BQ
> rows.apply("TransformationStep", ParDo.of(new Outputter()))
>     .apply("WindowDaily", Window.<TableRow>into(CalendarWindows.days(1)))
>     .apply("WriteToBQ", BigQueryIO.writeTableRows()
>         .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
>           private static final long serialVersionUID = 8196602721734820219L;
>           @Override
>           public TableDestination apply(ValueInSingleWindow<TableRow> value) {
>             String dayString = 
>                 DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC)
>                     .print(((IntervalWindow) value.getWindow()).start());
>             TableDestination td = new TableDestination(
>                 "my-project:dataset.mytable_orders$" + dayString, "");
>             return td;
>           }
>         }).withSchema(mySchema)
>         .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
>         .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-1834) Bigquery Write validation doesn't work well with ValueInSingleWindow

2017-12-11 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-1834.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

Support for data-dependent schemas was added quite a while ago via 
BigQueryIO.write().to(DynamicDestinations).
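
For reference, a sketch of that approach (the routing key, table naming, and 
the schemaFor helper are illustrative, not prescribed by the API):

{code}
rows.apply(BigQueryIO.writeTableRows()
    .to(new DynamicDestinations<TableRow, String>() {
      @Override
      public String getDestination(ValueInSingleWindow<TableRow> element) {
        return (String) element.getValue().get("type"); // per-element routing key
      }
      @Override
      public TableDestination getTable(String destination) {
        return new TableDestination("my-project:dataset.table_" + destination, "");
      }
      @Override
      public TableSchema getSchema(String destination) {
        return schemaFor(destination); // hypothetical per-destination schema lookup
      }
    })
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
{code}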

> Bigquery Write validation doesn't work well with ValueInSingleWindow
> 
>
> Key: BEAM-1834
> URL: https://issues.apache.org/jira/browse/BEAM-1834
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Kevin Peterson
> Fix For: 2.2.0
>
>
> I am using the new {{Write to(SerializableFunction<ValueInSingleWindow<TableRow>, String> tableSpecFunction)}} function to write data to different Bigquery 
> tables depending on the values. In my case, the values can have a different 
> schema (it starts as an {{Any}} encoded protobuf, which I parse and expand to 
> a {{TableRow}} object).
> Since the tables have different schemas, the existing implementation of 
> {{withSchema}} doesn't work.
> Some options:
> # Allow {{CreateDisposition.CREATE_NEVER}} in this situation. Failed inserts 
> from a missing table just fail (and eventually pass through via BEAM-190).
> # Add a new {{withSchema(SerializableFunction<ValueInSingleWindow<TableRow>, TableSchema>)}} function.
> I think eventually both of the above should be allowable configurations, but 
> just one will unblock my current error. Happy to implement, given some 
> guidance on design preferences.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[beam] branch master updated: Adding debug server to sdk worker to get threaddumps

2017-12-11 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 2d86d16  Adding debug server to sdk worker to get threaddumps
 new d65dea6  This closes #4178
2d86d16 is described below

commit 2d86d168118249d3c8bdcb0907f0a3b08db2eb2c
Author: Ankur Goenka 
AuthorDate: Wed Nov 22 16:54:30 2017 -0800

Adding debug server to sdk worker to get threaddumps
---
 .../apache_beam/runners/worker/sdk_worker_main.py  | 58 --
 .../runners/worker/sdk_worker_main_test.py | 44 
 2 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/sdks/python/apache_beam/runners/worker/sdk_worker_main.py 
b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py
index 684269e..1db8b29 100644
--- a/sdks/python/apache_beam/runners/worker/sdk_worker_main.py
+++ b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py
@@ -14,13 +14,14 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-
 """SDK Fn Harness entry point."""
 
+import BaseHTTPServer
 import json
 import logging
 import os
 import sys
+import threading
 import traceback
 
 from google.protobuf import text_format
@@ -34,6 +35,51 @@ from apache_beam.runners.worker.sdk_worker import SdkHarness
 # This module is experimental. No backwards-compatibility guarantees.
 
 
+class StatusServer(object):
+
+  @classmethod
+  def get_thread_dump(cls):
+lines = []
+frames = sys._current_frames()  # pylint: disable=protected-access
+
+for t in threading.enumerate():
+  lines.append('--- Thread #%s name: %s ---\n' % (t.ident, t.name))
+  lines.append(''.join(traceback.format_stack(frames[t.ident])))
+
+return lines
+
+  def start(self, status_http_port=0):
+"""Executes the serving loop for the status server.
+
+Args:
+  status_http_port(int): Binding port for the debug server.
+Default is 0 which means any free unsecured port
+"""
+
+class StatusHttpHandler(BaseHTTPServer.BaseHTTPRequestHandler):
+  """HTTP handler for serving stacktraces of all threads."""
+
+  def do_GET(self):  # pylint: disable=invalid-name
+"""Return all thread stacktraces information for GET request."""
+self.send_response(200)
+self.send_header('Content-Type', 'text/plain')
+self.end_headers()
+
+for line in StatusServer.get_thread_dump():
+  self.wfile.write(line)
+
+  def log_message(self, f, *args):
+"""Do not log any messages."""
+pass
+
+self.httpd = httpd = BaseHTTPServer.HTTPServer(
+('localhost', status_http_port), StatusHttpHandler)
+logging.info('Status HTTP server running at %s:%s', httpd.server_name,
+ httpd.server_port)
+
+httpd.serve_forever()
+
+
 def main(unused_argv):
   """Main entry point for SDK Fn Harness."""
   if 'LOGGING_API_SERVICE_DESCRIPTOR' in os.environ:
@@ -49,6 +95,12 @@ def main(unused_argv):
   else:
 fn_log_handler = None
 
+  # Start status HTTP server thread.
+  thread = threading.Thread(target=StatusServer().start)
+  thread.daemon = True
+  thread.setName('status-server-demon')
+  thread.start()
+
   if 'PIPELINE_OPTIONS' in os.environ:
 sdk_pipeline_options = json.loads(os.environ['PIPELINE_OPTIONS'])
   else:
@@ -89,8 +141,8 @@ def main(unused_argv):
 def _load_main_session(semi_persistent_directory):
   """Loads a pickled main session from the path specified."""
   if semi_persistent_directory:
-session_file = os.path.join(
-semi_persistent_directory, 'staged', names.PICKLED_MAIN_SESSION_FILE)
+session_file = os.path.join(semi_persistent_directory, 'staged',
+names.PICKLED_MAIN_SESSION_FILE)
 if os.path.isfile(session_file):
   pickler.load_session(session_file)
 else:
diff --git a/sdks/python/apache_beam/runners/worker/sdk_worker_main_test.py 
b/sdks/python/apache_beam/runners/worker/sdk_worker_main_test.py
new file mode 100644
index 000..9305c99
--- /dev/null
+++ b/sdks/python/apache_beam/runners/worker/sdk_worker_main_test.py
@@ -0,0 +1,44 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 

[beam] branch master updated (41da1ab -> c66b5a9)

2017-12-11 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 41da1ab  Merge pull request #4236
 add 9ffb796  Ensure temp file is closed before returning.
 new c66b5a9  Merge pull request #4242 from arostamianfar/tempdir

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/testing/test_utils.py | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org".


[beam] 01/01: Merge pull request #4242 from arostamianfar/tempdir

2017-12-11 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit c66b5a9ad0a6ba0a6ae758fbf5ebc18599ffa800
Merge: 41da1ab 9ffb796
Author: Ahmet Altay 
AuthorDate: Mon Dec 11 09:38:22 2017 -0800

Merge pull request #4242 from arostamianfar/tempdir

[BEAM-2774] Ensure temp file is closed before returning.

 sdks/python/apache_beam/testing/test_utils.py | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" .


[jira] [Updated] (BEAM-3308) Improve Go exec runtime error handling

2017-12-11 Thread Henning Rohde (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-3308:

Priority: Major  (was: Minor)

> Improve Go exec runtime error handling
> --
>
> Key: BEAM-3308
> URL: https://issues.apache.org/jira/browse/BEAM-3308
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>
> Move the Go exec runtime to unit failures instead of panics. We should 
> perhaps also catch user-code panics to fail the bundle more gracefully.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
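
The "catch user code panics" idea is the standard recovery-wrapper pattern:
run user code under a guard and convert crashes into per-bundle failures. A
rough sketch of the strategy (shown in Python for illustration; the issue
concerns the Go SDK's exec package, where the analogue is defer/recover, and
invoke_user_fn is a hypothetical name):

    def invoke_user_fn(fn, element):
      """Run user code; turn any crash into a bundle failure record."""
      try:
        return fn(element), None
      except Exception as e:  # pylint: disable=broad-except
        # Fail only this bundle instead of crashing the whole harness.
        return None, 'user fn failed on %r: %s' % (element, e)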


[jira] [Created] (BEAM-3323) Create a generator of finite-but-unbounded PCollection's for integration testing

2017-12-11 Thread Eugene Kirpichov (JIRA)
Eugene Kirpichov created BEAM-3323:
--

 Summary: Create a generator of finite-but-unbounded PCollection's 
for integration testing
 Key: BEAM-3323
 URL: https://issues.apache.org/jira/browse/BEAM-3323
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-core
Reporter: Eugene Kirpichov
Assignee: Kenneth Knowles


Several IOs have features that exhibit nontrivial behavior when writing 
unbounded PCollections - e.g. WriteFiles with windowed writes, or BigQueryIO. 
We need to be able to write integration tests for these features.

Currently we have two ways to generate an unbounded PCollection without 
reading from a real-world external streaming system such as Pub/Sub or Kafka:

1) TestStream, which only works in the direct runner - sufficient for some 
tests but not all: definitely not sufficient for large-scale tests or for 
tests that need to interact with a real instance of the external system 
(e.g. BigQueryIO). It is also quite verbose to use.
2) GenerateSequence.from(0) without a .to(), which returns an infinite amount 
of data.

GenerateSequence.from(a).to(b) returns a finite amount of data, but returns 
it as a bounded PCollection, and doesn't report the watermark.

I think the right thing to do here, for now, is to give 
GenerateSequence.from(a).to(b) an option (e.g. ".asUnbounded()") where it 
will return an unbounded PCollection, go through UnboundedSource (or 
potentially via SDF in runners that support it), and track the watermark 
properly (possibly via a configurable watermark fn).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
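
For reference, option (1) is indeed verbose. Roughly, with the Python SDK's
testing module (shown for illustration only; this issue concerns the Java
SDK, and the method names assume apache_beam.testing.test_stream):

    from apache_beam.testing.test_stream import TestStream
    from apache_beam.transforms.window import TimestampedValue

    # Six timestamped elements in two batches, advancing the watermark in
    # between and finally to infinity so windows and triggers can fire.
    unbounded_input = (
        TestStream()
        .add_elements([TimestampedValue(i, i) for i in range(3)])
        .advance_watermark_to(3)
        .add_elements([TimestampedValue(i, i) for i in range(3, 6)])
        .advance_watermark_to_infinity())

The proposed GenerateSequence.from(a).to(b).asUnbounded() would replace this
ceremony with a single transform that also reports the watermark.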


[jira] [Commented] (BEAM-2774) Add I/O source for VCF files (python)

2017-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286152#comment-16286152
 ] 

ASF GitHub Bot commented on BEAM-2774:
--

arostamianfar opened a new pull request #4242: [BEAM-2774] Ensure temp file is 
closed before returning.
URL: https://github.com/apache/beam/pull/4242
 
 
   This works fine on Linux, but fails on Windows, which requires the 
files to be properly closed before they can be reopened.
   
   @aaltay


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add I/O source for VCF files (python)
> -
>
> Key: BEAM-2774
> URL: https://issues.apache.org/jira/browse/BEAM-2774
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Asha Rostamianfar
>Assignee: Miles Saul
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> A new I/O source for reading (and eventually writing) VCF files [1] for 
> Python. The design doc is available at 
> https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit
> [1] http://samtools.github.io/hts-specs/VCFv4.3.pdf



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
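
The pattern behind this fix is to close a named temporary file before handing
its path to other code, since Windows cannot reopen a file that is still held
open. A minimal sketch (the helper name and the delete=False usage are
illustrative, not the exact test_utils.py change):

    import tempfile

    def _create_temp_file(contents):
      # delete=False keeps the file on disk after close(); the with-block
      # closes it before we return, so any OS can reopen it by name.
      with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(contents)
      return f.name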


[jira] [Commented] (BEAM-14) Add declarative DSLs (XML & JSON)

2017-12-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285955#comment-16285955
 ] 

Jean-Baptiste Onofré commented on BEAM-14:
--

After discussing with Saj, we are preparing a PR with an XML DSL. We are doing 
some polishing changes (like moving it into {{extension}}) and legal updates (ASF 
headers in the sources, ...).

> Add declarative DSLs (XML & JSON)
> -
>
> Key: BEAM-14
> URL: https://issues.apache.org/jira/browse/BEAM-14
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Sajeevan Achuthan
>
> Even though users would still be able to use the API directly, it would be great 
> to provide a DSL on top of the API covering not only batch and streaming data 
> processing but also data integration.
> Instead of designing a pipeline as a chain of apply() calls wrapping functions 
> (DoFn), we can provide a fluent DSL allowing users to directly leverage 
> turnkey functions.
> For instance, a user would be able to design a pipeline like:
> {code}
> .from("kafka:localhost:9092?topic=foo").reduce(...).split(...).wiretap(...).map(...).to("jms:queue:foo...");
> {code}
> The DSL will also allow reuse of existing pipelines, for instance:
> {code}
> .from("cxf:...").reduce().pipeline("other").map().to("kafka:localhost:9092?topic=foo=all")
> {code}
> This means we will have to create an IO Sink that can trigger the 
> execution of a target pipeline: from("trigger:other") triggers the 
> pipeline execution when another pipeline design starts with 
> pipeline("other"). We can also imagine mixing the runners: the pipeline() 
> can be on one runner, and the from("trigger:other") can be on another runner. 
> It's not trivial, but it will give strong flexibility and key value for Beam.
> In a second step, we can provide DSLs in different languages (the first one 
> would be Java, but why not provide XML, Akka, or Scala DSLs).
> We can note in the previous examples that the DSL would also provide data 
> integration support to Beam in addition to data processing. Data Integration 
> is an extension of the Beam API to support some Enterprise Integration Patterns 
> (EIPs). As we would need metadata for data integration (even if metadata can 
> also be interesting in a stream/batch data processing pipeline), we can provide 
> a DataxMessage built on top of PCollection. A DataxMessage would contain:
> - structured headers
> - binary payload
> For instance, the headers can contain an Avro schema to describe the payload.
> The headers can also contain useful information coming from the IO Source 
> (for instance the partition/path where the data comes from, ...).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
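
A fluent DSL like the one sketched in this issue is essentially method
chaining over apply(). A rough Python illustration of the pattern (all names
here - FluentDsl, _source_for, _sink_for, and the URI-to-IO mapping - are
hypothetical, not an existing Beam API):

    import apache_beam as beam

    def _source_for(uri):
      # Hypothetical resolver: map a URI scheme such as 'kafka:' or 'cxf:'
      # to a concrete Read transform.
      raise NotImplementedError(uri)

    def _sink_for(uri):
      # Hypothetical resolver for Write transforms.
      raise NotImplementedError(uri)

    class FluentDsl(object):
      """Illustrative fluent wrapper that builds an apply() chain."""

      def __init__(self, pipeline):
        self._pipeline = pipeline
        self._pcoll = None

      def from_(self, uri):
        self._pcoll = self._pipeline | 'from' >> _source_for(uri)
        return self

      def map(self, fn):
        self._pcoll = self._pcoll | beam.Map(fn)
        return self

      def to(self, uri):
        return self._pcoll | 'to' >> _sink_for(uri)

    # Each step returns the wrapper, so the chain reads like the DSL above
    # while expanding to ordinary apply() calls underneath:
    #   FluentDsl(p).from_('kafka:...').map(fn).to('jms:queue:foo')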


[jira] [Assigned] (BEAM-14) Add declarative DSLs (XML & JSON)

2017-12-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré reassigned BEAM-14:


Assignee: Sajeevan Achuthan  (was: Jean-Baptiste Onofré)

> Add declarative DSLs (XML & JSON)
> -
>
> Key: BEAM-14
> URL: https://issues.apache.org/jira/browse/BEAM-14
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Sajeevan Achuthan
>
> Even though users would still be able to use the API directly, it would be great 
> to provide a DSL on top of the API covering not only batch and streaming data 
> processing but also data integration.
> Instead of designing a pipeline as a chain of apply() calls wrapping functions 
> (DoFn), we can provide a fluent DSL allowing users to directly leverage 
> turnkey functions.
> For instance, a user would be able to design a pipeline like:
> {code}
> .from("kafka:localhost:9092?topic=foo").reduce(...).split(...).wiretap(...).map(...).to("jms:queue:foo...");
> {code}
> The DSL will also allow reuse of existing pipelines, for instance:
> {code}
> .from("cxf:...").reduce().pipeline("other").map().to("kafka:localhost:9092?topic=foo=all")
> {code}
> This means we will have to create an IO Sink that can trigger the 
> execution of a target pipeline: from("trigger:other") triggers the 
> pipeline execution when another pipeline design starts with 
> pipeline("other"). We can also imagine mixing the runners: the pipeline() 
> can be on one runner, and the from("trigger:other") can be on another runner. 
> It's not trivial, but it will give strong flexibility and key value for Beam.
> In a second step, we can provide DSLs in different languages (the first one 
> would be Java, but why not provide XML, Akka, or Scala DSLs).
> We can note in the previous examples that the DSL would also provide data 
> integration support to Beam in addition to data processing. Data Integration 
> is an extension of the Beam API to support some Enterprise Integration Patterns 
> (EIPs). As we would need metadata for data integration (even if metadata can 
> also be interesting in a stream/batch data processing pipeline), we can provide 
> a DataxMessage built on top of PCollection. A DataxMessage would contain:
> - structured headers
> - binary payload
> For instance, the headers can contain an Avro schema to describe the payload.
> The headers can also contain useful information coming from the IO Source 
> (for instance the partition/path where the data comes from, ...).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-14) Add declarative DSLs (XML & JSON)

2017-12-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré updated BEAM-14:
-
Component/s: (was: sdk-ideas)
 sdk-java-extensions

> Add declarative DSLs (XML & JSON)
> -
>
> Key: BEAM-14
> URL: https://issues.apache.org/jira/browse/BEAM-14
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> Even though users would still be able to use the API directly, it would be great 
> to provide a DSL on top of the API covering not only batch and streaming data 
> processing but also data integration.
> Instead of designing a pipeline as a chain of apply() calls wrapping functions 
> (DoFn), we can provide a fluent DSL allowing users to directly leverage 
> turnkey functions.
> For instance, a user would be able to design a pipeline like:
> {code}
> .from("kafka:localhost:9092?topic=foo").reduce(...).split(...).wiretap(...).map(...).to("jms:queue:foo...");
> {code}
> The DSL will also allow reuse of existing pipelines, for instance:
> {code}
> .from("cxf:...").reduce().pipeline("other").map().to("kafka:localhost:9092?topic=foo=all")
> {code}
> This means we will have to create an IO Sink that can trigger the 
> execution of a target pipeline: from("trigger:other") triggers the 
> pipeline execution when another pipeline design starts with 
> pipeline("other"). We can also imagine mixing the runners: the pipeline() 
> can be on one runner, and the from("trigger:other") can be on another runner. 
> It's not trivial, but it will give strong flexibility and key value for Beam.
> In a second step, we can provide DSLs in different languages (the first one 
> would be Java, but why not provide XML, Akka, or Scala DSLs).
> We can note in the previous examples that the DSL would also provide data 
> integration support to Beam in addition to data processing. Data Integration 
> is an extension of the Beam API to support some Enterprise Integration Patterns 
> (EIPs). As we would need metadata for data integration (even if metadata can 
> also be interesting in a stream/batch data processing pipeline), we can provide 
> a DataxMessage built on top of PCollection. A DataxMessage would contain:
> - structured headers
> - binary payload
> For instance, the headers can contain an Avro schema to describe the payload.
> The headers can also contain useful information coming from the IO Source 
> (for instance the partition/path where the data comes from, ...).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PerformanceTests_Spark #1106

2017-12-11 Thread Apache Jenkins Server
See 


--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam2 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 41da1ab4ddf530741d2d264136ae97215cc1ae0a (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 41da1ab4ddf530741d2d264136ae97215cc1ae0a
Commit message: "Merge pull request #4236"
 > git rev-list 41da1ab4ddf530741d2d264136ae97215cc1ae0a # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Spark] $ /bin/bash -xe /tmp/jenkins3157745370946528753.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Spark] $ /bin/bash -xe /tmp/jenkins8510390110149555423.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Spark] $ /bin/bash -xe /tmp/jenkins4026105165976048076.sh
+ pip install --user -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied: absl-py in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: setuptools in /usr/lib/python2.7/dist-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 16))
Requirement already satisfied: colorlog[windows]==2.6.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied: futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied: PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied: pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Requirement already satisfied: numpy in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 22))
Requirement already satisfied: functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied: contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Requirement already satisfied: pywinrm in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: six in 
/home/jenkins/.local/lib/python2.7/site-packages (from absl-py->-r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: MarkupSafe in 
/usr/local/lib/python2.7/dist-packages (from jinja2>=2.7->-r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: colorama; extra == "windows" in 
/usr/lib/python2.7/dist-packages (from colorlog[windows]==2.6.0->-r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: xmltodict in 
/home/jenkins/.local/lib/python2.7/site-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: requests-ntlm>=0.3.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: requests>=2.9.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: ntlm-auth>=1.0.2 in 
/home/jenkins/.local/lib/python2.7/site-packages (from 
requests-ntlm>=0.3.0->pywinrm->-r PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: cryptography>=1.3 in 

[beam] branch hdfs deleted (was 162369a)

2017-12-11 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a change to branch hdfs
in repository https://gitbox.apache.org/repos/asf/beam.git.


 was 162369a  This closes #2681

This change permanently discards the following revisions:

 discard 162369a  This closes #2681
 discard fac7b83  Add HadoopResourceId

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org".