[beam] 01/01: Merge pull request #5191 [BEAM-4097] Set environment for Python sdk function specs.

2018-04-23 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 07f2a45c686ef0ae829849a77bd0622be6dd7ec8
Merge: eb5f7eb c51279c
Author: Robert Bradshaw 
AuthorDate: Mon Apr 23 23:58:59 2018 -0700

Merge pull request #5191 [BEAM-4097] Set environment for Python sdk function specs.

 sdks/python/apache_beam/coders/coders.py   |  2 ++
 sdks/python/apache_beam/pipeline.py|  5 ++--
 sdks/python/apache_beam/pvalue.py  |  4 ++-
 .../python/apache_beam/runners/pipeline_context.py | 30 --
 .../runners/portability/universal_local_runner.py  | 19 --
 sdks/python/apache_beam/transforms/core.py |  1 +
 sdks/python/apache_beam/utils/urns.py  |  1 +
 7 files changed, 55 insertions(+), 7 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
rober...@apache.org.


[jira] [Work logged] (BEAM-3925) Allow ValueProvider for KafkaIO

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ https://issues.apache.org/jira/browse/BEAM-3925?focusedWorklogId=94481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94481 ]

ASF GitHub Bot logged work on BEAM-3925:


Author: ASF GitHub Bot
Created on: 24/Apr/18 06:46
Start Date: 24/Apr/18 06:46
Worklog Time Spent: 10m 
  Work Description: pupamanyu commented on a change in pull request #5141: [BEAM-3925] Allow ValueProvider for KafkaIO so that we can create Beam Templates using KafkaIO
URL: https://github.com/apache/beam/pull/5141#discussion_r183619623
 
 

 ##
 File path: sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
 ##
 @@ -359,20 +386,129 @@
      * of how the partitions are distributed among the splits.
      */
     public Read<K, V> withTopicPartitions(List<TopicPartition> topicPartitions) {
-      checkState(getTopics().isEmpty(), "Only topics or topicPartitions can be set, not both");
-      return toBuilder().setTopicPartitions(ImmutableList.copyOf(topicPartitions)).build();
+      return withTopicPartitions(
+          ValueProvider.StaticValueProvider.of(Joiner.on(',').join(topicPartitions)));
+    }
+
+    /**
+     * Sets a list of partitions to read from. Partitions are provided as a comma separated list of
+     * Strings in the format: topic1-partition1,topic1-partition2...
+     * This allows reading only a subset of partitions for one or more topics when (if ever) needed.
+     *
+     * See {@link KafkaUnboundedSource#split(int, PipelineOptions)} for description
+     * of how the partitions are distributed among the splits.
+     */
+    public Read<K, V> withTopicPartitions(String topicPartitions) {
+      return withTopicPartitions(ValueProvider.StaticValueProvider.of(topicPartitions));
+    }
+
+    /**
+     * Like above but with a {@link ValueProvider ValueProvider<String>}.
+     */
+    public Read<K, V> withTopicPartitions(ValueProvider<String> topicPartitions) {
+      checkState(getTopics().get().isEmpty(),
+          "Only topics or topicPartitions can be set, not both");
+      return toBuilder().setTopicPartitions(ValueProvider
+          .NestedValueProvider.of(topicPartitions, new TopicPartitionTranslator())).build();
+    }
+
+    /**
+     * Used to build a {@link ValueProvider} for {@link List List<String>}.
+     */
+    private static class TopicTranslator
+        implements SerializableFunction<String, List<String>> {
+      @Override
+      public List<String> apply(String topics) {
+        return ImmutableList.copyOf(
+            Splitter.on(',').trimResults().omitEmptyStrings().splitToList(topics));
+      }
+    }
+
+    /**
+     * Used to build a {@link ValueProvider} for {@link List List<TopicPartition>}.
+     */
+    private static class TopicPartitionTranslator
+        implements SerializableFunction<String, List<TopicPartition>> {
+      @Override
+      public List<TopicPartition> apply(String topicPartitions) {
+        List<TopicPartition> topicPartitionList = new ArrayList<>();
+        for (String topicPartition :
+            Splitter.on(',').trimResults().omitEmptyStrings().splitToList(topicPartitions)) {
+          topicPartitionList
+              .add(new TopicPartition(Splitter.on('-').splitToList(topicPartition).get(0),
+                  Integer.parseInt(Splitter.on('-').splitToList(topicPartition).get(1))));
+        }
+        return ImmutableList.copyOf(topicPartitionList);
+      }
+    }
+
+    /**
+     * Sets a Kafka {@link Deserializer Deserializer<K>} for interpreting key bytes read.
+     * This uses the {@link String} provided to set the Deserializer.
+     */
+    public Read<K, V> withKeyDeserializer(String keyDeserializer) {
 Review comment:
   This is used outside of KafkaIO; not sure whether renaming it might cause issues.
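The `TopicPartitionTranslator` under review parses a comma-separated `topic-partition` string into partition objects. A minimal Python sketch of the same parsing logic (hypothetical helper, mirroring the Java translator's `Splitter`-based behavior; note that, like the Java code, it takes only the first two `-`-separated fields, so topic names containing `-` would not survive the round trip):

```python
def parse_topic_partitions(spec):
    """Parse 'topic1-0,topic1-1' into (topic, partition) pairs.

    Mirrors TopicPartitionTranslator: split on commas, trim whitespace,
    drop empty entries, then split each entry on '-' taking field 0 as
    the topic name and field 1 as the integer partition number.
    """
    pairs = []
    for entry in spec.split(','):
        entry = entry.strip()
        if not entry:
            continue  # omitEmptyStrings() equivalent
        parts = entry.split('-')
        pairs.append((parts[0], int(parts[1])))
    return pairs
```

For example, `parse_topic_partitions('topic1-0,topic1-1')` yields `[('topic1', 0), ('topic1', 1)]`, which is the shape the `NestedValueProvider` translation produces on the Java side.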


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94481)
Time Spent: 1h 40m  (was: 1.5h)

> Allow ValueProvider for KafkaIO
> ---
>
> Key: BEAM-3925
> URL: https://issues.apache.org/jira/browse/BEAM-3925
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Sameer Abhyankar
>Assignee: Pramod Upamanyu
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add ValueProvider support for the various methods in KafkaIO. This would 
> allow us to use KafkaIO in reusable pipeline templates.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PerformanceTests_Spark #1628

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[apilloud] [SQL] Embed BeamSqlTable in BeamCalciteTable

[owenzhang1990] [BEAM-4129] Run WordCount example on Gearpump runner with Gradle

[sidhom] Fix python lint error

--
[...truncated 95.80 KB...]
'apache-beam-testing:bqjob_r383c741ee672f7a4_0162f64e92b8_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)
Upload complete.Waiting on bqjob_r383c741ee672f7a4_0162f64e92b8_1 ... (0s) 
Current status: RUNNING 
 Waiting on 
bqjob_r383c741ee672f7a4_0162f64e92b8_1 ... (0s) Current status: DONE   
2018-04-24 06:19:28,980 0aee8bbf MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-24 06:19:51,719 0aee8bbf MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-24 06:19:53,961 0aee8bbf MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r1a0016f46c3f58f7_0162f64ef376_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)
Upload complete.Waiting on bqjob_r1a0016f46c3f58f7_0162f64ef376_1 ... (0s) 
Current status: RUNNING 
 Waiting on 
bqjob_r1a0016f46c3f58f7_0162f64ef376_1 ... (0s) Current status: DONE   
2018-04-24 06:19:53,962 0aee8bbf MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-24 06:20:14,662 0aee8bbf MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-24 06:20:16,763 0aee8bbf MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r7deecd960b6c824_0162f64f4cfa_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)
Upload complete.Waiting on bqjob_r7deecd960b6c824_0162f64f4cfa_1 ... (0s) 
Current status: RUNNING 
Waiting on 
bqjob_r7deecd960b6c824_0162f64f4cfa_1 ... (0s) Current status: DONE   
2018-04-24 06:20:16,763 0aee8bbf MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-24 06:20:46,279 0aee8bbf MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-24 06:20:48,419 0aee8bbf MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-be

Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Spark_Gradle #192

2018-04-23 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Compressed_TextIOIT_HDFS #87

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[apilloud] [SQL] Embed BeamSqlTable in BeamCalciteTable

[owenzhang1990] [BEAM-4129] Run WordCount example on Gearpump runner with Gradle

[sidhom] Fix python lint error

--
[...truncated 91.50 KB...]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy62.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.copy(HadoopFileSystem.java:131)
at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:300)
at 
org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:755)
at 
org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:801)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy61.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy62.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.copy(HadoopFileSystem.java:131)
at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:300)
at 
org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:755)
at 
org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:801)
at 
org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn$DoFnInvoker.invokeProcessElement(Unknown
 Source)
at 
org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:177)
at 
org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:141)
at 
com.google.cloud.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:323)
at 
com.google.cloud.dataflow.worker.util.common.worker.

Build failed in Jenkins: beam_PerformanceTests_TextIOIT_HDFS #93

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[apilloud] [SQL] Embed BeamSqlTable in BeamCalciteTable

[owenzhang1990] [BEAM-4129] Run WordCount example on Gearpump runner with Gradle

[sidhom] Fix python lint error

--
[...truncated 128.70 KB...]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy63.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.copy(HadoopFileSystem.java:131)
at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:301)
at 
org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:755)
at 
org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:801)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy62.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy63.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.copy(HadoopFileSystem.java:131)
at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:301)
at 
org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:755)
at 
org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:801)
at 
org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn$DoFnInvoker.invokeProcessElement(Unknown
 Source)
at 
org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:177)
at 
org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:141)
at 
com.google.cloud.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:323)
at 
com.google.cloud.dataflow.worker.util.common.worker.ParDoOpera

Build failed in Jenkins: beam_PerformanceTests_MongoDBIO_IT #88

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[apilloud] [SQL] Embed BeamSqlTable in BeamCalciteTable

[owenzhang1990] [BEAM-4129] Run WordCount example on Gearpump runner with Gradle

[sidhom] Fix python lint error

--
[...truncated 99.62 KB...]
at 
com.mongodb.operation.MixedBulkWriteOperation$Run$2.executeWriteCommandProtocol(MixedBulkWriteOperation.java:455)
at 
com.mongodb.operation.MixedBulkWriteOperation$Run$RunExecutor.execute(MixedBulkWriteOperation.java:646)
at 
com.mongodb.operation.MixedBulkWriteOperation$Run.execute(MixedBulkWriteOperation.java:401)
at 
com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:179)
at 
com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:168)
at 
com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:230)
at 
com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:221)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:74)
at com.mongodb.Mongo.execute(Mongo.java:781)
at com.mongodb.Mongo$2.execute(Mongo.java:764)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:323)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:311)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.flush(MongoDbIO.java:667)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.processElement(MongoDbIO.java:652)
com.mongodb.MongoTimeoutException: Timed out after 3 ms while waiting for a 
server that matches WritableServerSelector. Client view of cluster state is 
{type=UNKNOWN, servers=[{address=104.197.171.217:27017, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception 
opening socket}, caused by {java.net.SocketTimeoutException: connect timed 
out}}]
at 
com.mongodb.connection.BaseCluster.createTimeoutException(BaseCluster.java:369)
at com.mongodb.connection.BaseCluster.selectServer(BaseCluster.java:101)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:75)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:71)
at 
com.mongodb.binding.ClusterBinding.getWriteConnectionSource(ClusterBinding.java:68)
at 
com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:219)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:74)
at com.mongodb.Mongo.execute(Mongo.java:781)
at com.mongodb.Mongo$2.execute(Mongo.java:764)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:323)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:311)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.flush(MongoDbIO.java:667)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.processElement(MongoDbIO.java:652)
com.mongodb.MongoTimeoutException: Timed out after 3 ms while waiting for a 
server that matches WritableServerSelector. Client view of cluster state is 
{type=UNKNOWN, servers=[{address=104.197.171.217:27017, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception 
opening socket}, caused by {java.net.SocketTimeoutException: connect timed 
out}}]
at 
com.mongodb.connection.BaseCluster.createTimeoutException(BaseCluster.java:369)
at com.mongodb.connection.BaseCluster.selectServer(BaseCluster.java:101)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:75)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:71)
at 
com.mongodb.binding.ClusterBinding.getWriteConnectionSource(ClusterBinding.java:68)
at 
com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:219)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:74)
at com.mongodb.Mongo.execute(Mongo.java:781)
at com.mongodb.Mongo$2.execute(Mongo.java:764)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:323)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:311)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.flush(MongoDbIO.java:667)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.processElement(Mo

[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=94470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94470 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 24/Apr/18 06:10
Start Date: 24/Apr/18 06:10
Worklog Time Spent: 10m 
  Work Description: robertwb closed pull request #4387: [BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/sdks/python/apache_beam/metrics/execution.py b/sdks/python/apache_beam/metrics/execution.py
index f6c790de5d4..310faf6c9c8 100644
--- a/sdks/python/apache_beam/metrics/execution.py
+++ b/sdks/python/apache_beam/metrics/execution.py
@@ -127,25 +127,34 @@ def set_metrics_supported(self, supported):
     with self._METRICS_SUPPORTED_LOCK:
       self.METRICS_SUPPORTED = supported
 
-  def current_container(self):
+  def _old_style_container(self):
+    """Gets the current MetricsContainer based on the container stack.
+
+    The container stack is the old method, and will be deprecated. Should
+    rely on StateSampler instead."""
     self.set_container_stack()
     index = len(self.PER_THREAD.container) - 1
     if index < 0:
       return None
     return self.PER_THREAD.container[index]
 
-  def set_current_container(self, container):
-    self.set_container_stack()
-    self.PER_THREAD.container.append(container)
-
-  def unset_current_container(self):
-    self.set_container_stack()
-    self.PER_THREAD.container.pop()
+  def current_container(self):
+    """Returns the current MetricsContainer."""
+    sampler = statesampler.get_current_tracker()
+    if sampler is None:
+      return self._old_style_container()
+    return sampler.current_state().metrics_container
 
 
 MetricsEnvironment = _MetricsEnvironment()
 
 
+def metrics_startup():
+  """Initialize metrics context to run."""
+  global statesampler  # pylint: disable=global-variable-not-assigned
+  from apache_beam.runners.worker import statesampler
+
+
 class MetricsContainer(object):
   """Holds the metrics of a single step and a single bundle."""
   def __init__(self, step_name):
@@ -227,10 +236,12 @@ def __init__(self, container=None):
     self._container = container
 
   def enter(self):
-    self._stack.append(self._container)
+    if self._container:
+      self._stack.append(self._container)
 
   def exit(self):
-    self._stack.pop()
+    if self._container:
+      self._stack.pop()
 
   def __enter__(self):
     self.enter()
diff --git a/sdks/python/apache_beam/metrics/execution_test.py b/sdks/python/apache_beam/metrics/execution_test.py
index 2367e35df4d..37d24f3407b 100644
--- a/sdks/python/apache_beam/metrics/execution_test.py
+++ b/sdks/python/apache_beam/metrics/execution_test.py
@@ -18,11 +18,7 @@
 import unittest
 
 from apache_beam.metrics.cells import CellCommitState
-from apache_beam.metrics.execution import MetricKey
 from apache_beam.metrics.execution import MetricsContainer
-from apache_beam.metrics.execution import MetricsEnvironment
-from apache_beam.metrics.execution import ScopedMetricsContainer
-from apache_beam.metrics.metric import Metrics
 from apache_beam.metrics.metricbase import MetricName
 
 
@@ -33,29 +29,6 @@ def test_create_new_counter(self):
     mc.get_counter(MetricName('namespace', 'name'))
     self.assertTrue(MetricName('namespace', 'name') in mc.counters)
 
-  def test_scoped_container(self):
-    c1 = MetricsContainer('mystep')
-    c2 = MetricsContainer('myinternalstep')
-    with ScopedMetricsContainer(c1):
-      self.assertEqual(c1, MetricsEnvironment.current_container())
-      counter = Metrics.counter('ns', 'name')
-      counter.inc(2)
-
-      with ScopedMetricsContainer(c2):
-        self.assertEqual(c2, MetricsEnvironment.current_container())
-        counter = Metrics.counter('ns', 'name')
-        counter.inc(3)
-        self.assertEqual(
-            list(c2.get_cumulative().counters.items()),
-            [(MetricKey('myinternalstep', MetricName('ns', 'name')), 3)])
-
-      self.assertEqual(c1, MetricsEnvironment.current_container())
-      counter = Metrics.counter('ns', 'name')
-      counter.inc(4)
-      self.assertEqual(
-          list(c1.get_cumulative().counters.items()),
-          [(MetricKey('mystep', MetricName('ns', 'name')), 6)])
-
   def test_add_to_counter(self):
     mc = MetricsContainer('astep')
     counter = mc.get_counter(MetricName('namespace', 'name'))
@@ -118,29 +91,5 @@ def test_get_cumulative_or_updates(self):
      set([v.value for _, v in cumulative.gauges.items()]))
 
 
-class TestMetricsEnvironment(unittest.T
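The change above makes `current_container()` prefer the active StateSampler tracker and fall back to the old thread-local container stack only when no tracker is running. A minimal self-contained sketch of that fallback pattern (hypothetical class and attribute names, not the actual Beam implementation):

```python
import threading


class MetricsEnvironmentSketch:
    """Prefer an active state tracker's metrics container; otherwise fall
    back to a thread-local container stack (the old, deprecated path)."""

    def __init__(self):
        self._per_thread = threading.local()
        # Stand-in for statesampler.get_current_tracker(); returns None
        # when no tracker is active on this thread.
        self._get_tracker = lambda: None

    def _old_style_container(self):
        # Top of the thread-local stack, or None if the stack is empty.
        stack = getattr(self._per_thread, 'container', [])
        return stack[-1] if stack else None

    def current_container(self):
        tracker = self._get_tracker()
        if tracker is None:
            return self._old_style_container()
        return tracker.current_state().metrics_container
```

The design point is that the tracker lookup is cheap and thread-local, so per-element metrics reporting avoids the explicit push/pop bookkeeping that `set_current_container`/`unset_current_container` required.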

[beam] 01/01: Merge pull request #4387 [BEAM-2732] Metrics rely on statesampler state

2018-04-23 Thread robertwb

robertwb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit eb5f7eb581495e34a508114aed762cd80f4ae3b0
Merge: caf9d64 d605391
Author: Robert Bradshaw 
AuthorDate: Mon Apr 23 23:10:14 2018 -0700

Merge pull request #4387 [BEAM-2732] Metrics rely on statesampler state

 sdks/python/apache_beam/metrics/execution.py   | 31 ---
 sdks/python/apache_beam/metrics/execution_test.py  | 51 ---
 sdks/python/apache_beam/metrics/metric_test.py | 60 +++--
 sdks/python/apache_beam/runners/common.py  |  3 +-
 sdks/python/apache_beam/runners/direct/executor.py | 98 ++
 .../runners/direct/transform_evaluator.py  | 53 ++--
 .../apache_beam/runners/worker/bundle_processor.py |  3 +-
 .../apache_beam/runners/worker/operations.py   | 49 ++-
 .../apache_beam/runners/worker/statesampler.py | 32 ++-
 .../runners/worker/statesampler_fast.pyx   | 26 --
 .../runners/worker/statesampler_slow.py| 22 ++---
 sdks/python/apache_beam/transforms/util.py |  1 +
 12 files changed, 226 insertions(+), 203 deletions(-)



[beam] branch master updated (caf9d64 -> eb5f7eb)

2018-04-23 Thread robertwb

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from caf9d64  Merge pull request #5205 Fix python lint error
 add d605391  Python Metrics now rely on StateSampler state.
 new eb5f7eb  Merge pull request #4387 [BEAM-2732] Metrics rely on statesampler state

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/metrics/execution.py   | 31 ---
 sdks/python/apache_beam/metrics/execution_test.py  | 51 ---
 sdks/python/apache_beam/metrics/metric_test.py | 60 +++--
 sdks/python/apache_beam/runners/common.py  |  3 +-
 sdks/python/apache_beam/runners/direct/executor.py | 98 ++
 .../runners/direct/transform_evaluator.py  | 53 ++--
 .../apache_beam/runners/worker/bundle_processor.py |  3 +-
 .../apache_beam/runners/worker/operations.py   | 49 ++-
 .../apache_beam/runners/worker/statesampler.py | 32 ++-
 .../runners/worker/statesampler_fast.pyx   | 26 --
 .../runners/worker/statesampler_slow.py| 22 ++---
 sdks/python/apache_beam/transforms/util.py |  1 +
 12 files changed, 226 insertions(+), 203 deletions(-)



Jenkins build is back to normal : beam_PerformanceTests_XmlIOIT_HDFS #86

2018-04-23 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Compressed_TextIOIT #413

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[apilloud] [SQL] Embed BeamSqlTable in BeamCalciteTable

[owenzhang1990] [BEAM-4129] Run WordCount example on Gearpump runner with Gradle

[sidhom] Fix python lint error

--
[...truncated 83.11 KB...]
[INFO] Excluding com.google.oauth-client:google-oauth-client-java6:jar:1.23.0 
from the shaded jar.
[INFO] Excluding 
org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.5.0-SNAPSHOT from 
the shaded jar.
[INFO] Excluding 
org.apache.beam:beam-sdks-java-extensions-protobuf:jar:2.5.0-SNAPSHOT from the 
shaded jar.
[INFO] Excluding io.grpc:grpc-core:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.errorprone:error_prone_annotations:jar:2.0.15 from 
the shaded jar.
[INFO] Excluding io.grpc:grpc-context:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.instrumentation:instrumentation-api:jar:0.3.0 from 
the shaded jar.
[INFO] Excluding 
com.google.apis:google-api-services-bigquery:jar:v2-rev374-1.23.0 from the 
shaded jar.
[INFO] Excluding com.google.api:gax-grpc:jar:0.20.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.api:api-common:jar:1.0.0-rc2 from the shaded jar.
[INFO] Excluding com.google.auto.value:auto-value:jar:1.5.3 from the shaded jar.
[INFO] Excluding com.google.api:gax:jar:1.3.1 from the shaded jar.
[INFO] Excluding org.threeten:threetenbp:jar:1.3.3 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core-grpc:jar:1.2.0 from the 
shaded jar.
[INFO] Excluding 
com.google.apis:google-api-services-pubsub:jar:v1-rev382-1.23.0 from the shaded 
jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-pubsub-v1:jar:0.1.18 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-pubsub-v1:jar:0.1.18 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-iam-v1:jar:0.1.18 from the 
shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-proto-client:jar:1.4.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-protobuf:jar:1.23.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-jackson:jar:1.23.0 
from the shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-protos:jar:1.3.0 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-common-protos:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding io.grpc:grpc-auth:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-netty:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.netty:netty-codec-http2:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-codec-http:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler-proxy:jar:4.1.8.Final from the shaded 
jar.
[INFO] Excluding io.netty:netty-codec-socks:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-buffer:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-common:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-transport:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-resolver:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-codec:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.grpc:grpc-stub:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core:jar:1.0.2 from the shaded 
jar.
[INFO] Excluding org.json:json:jar:20160810 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-spanner:jar:0.20.0b-beta from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-instance-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-database-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-instance-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-longrunning-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-longrunning-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3 from 
the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-client-core:jar:1.0.0 from 
the shaded jar.
[INFO] Excluding commons-logging:commons-logging:jar:1.2 from the shaded jar.
[INFO] Excluding com.google.auth:google-auth-library-appengine:jar:0.7.0 from 
the shaded jar.
[INFO] Excluding io.opencensus:opencensus-contrib-grpc-util:jar:0.7.0 from the 
shaded jar.
[INFO] Excluding io.opencensus:op

Build failed in Jenkins: beam_PerformanceTests_Python #1186

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[apilloud] [SQL] Embed BeamSqlTable in BeamCalciteTable

[owenzhang1990] [BEAM-4129] Run WordCount example on Gearpump runner with Gradle

[sidhom] Fix python lint error

--
[...truncated 30.28 KB...]
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ 
beam-sdks-java-build-tools ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- build-helper-maven-plugin:3.0.0:regex-properties 
(render-artifact-id) @ beam-sdks-java-build-tools ---
[INFO] 
[INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ 
beam-sdks-java-build-tools ---
[INFO] Building jar: 

[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
beam-sdks-java-build-tools ---
[INFO] Skipping because packaging 'jar' is not pom.
[INFO] 
[INFO] --- maven-jar-plugin:3.0.2:test-jar (default-test-jar) @ 
beam-sdks-java-build-tools ---
[INFO] Building jar: 

[INFO] 
[INFO] --- maven-shade-plugin:3.1.0:shade (bundle-and-repackage) @ 
beam-sdks-java-build-tools ---
[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing 

 with 

[INFO] Replacing original test artifact with shaded test artifact.
[INFO] Replacing 

 with 

[INFO] 
[INFO] --- maven-dependency-plugin:3.0.2:analyze-only (default) @ 
beam-sdks-java-build-tools ---
[INFO] No dependency problems found
[INFO] 
[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ 
beam-sdks-java-build-tools ---
[INFO] Installing 

 to 
/home/jenkins/.m2/repository/org/apache/beam/beam-sdks-java-build-tools/2.5.0-SNAPSHOT/beam-sdks-java-build-tools-2.5.0-SNAPSHOT.jar
[INFO] Installing 

 to 
/home/jenkins/.m2/repository/org/apache/beam/beam-sdks-java-build-tools/2.5.0-SNAPSHOT/beam-sdks-java-build-tools-2.5.0-SNAPSHOT.pom
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Beam :: Parent .. SUCCESS [  9.385 s]
[INFO] Apache Beam :: SDKs :: Java :: Build Tools . FAILURE [  5.965 s]
[INFO] Apache Beam :: Model ... SKIPPED
[INFO] Apache Beam :: Model :: Pipeline ... SKIPPED
[INFO] Apache Beam :: Model :: Job Management . SKIPPED
[INFO] Apache Beam :: Model :: Fn Execution ... SKIPPED
[INFO] Apache Beam :: SDKs  SKIPPED
[INFO] Apache Beam :: SDKs :: Go .. SKIPPED
[INFO] Apache Beam :: SDKs :: Go :: Container . SKIPPED
[INFO] Apache Beam :: SDKs :: Java  SKIPPED
[INFO] Apache Beam :: SDKs :: Java :: Core  SKIPPED
[INFO] Apache Beam :: SDKs :: Java :: Fn Execution  SKIPPED
[INFO] Apache Beam :: SDKs :: Java :: Extensions .. SKIPPED
[INFO] Apache Beam :: SDKs :: Java :: Extensions :: Google Cloud Platform Core 
SKIPPED
[INFO] Apache Beam :: Runners . SKIPPED
[INFO] Apache Beam :: Runners :: Core Construction Java ... SKIPPED
[INFO] Apache Beam :: Runners :: Core Java  SKIPPED
[INFO] Apache Beam :: SDKs :: Java :: Harness . SKIPPED
[INFO] Apache Beam :: SDKs :: Java :: Container ... SKIPPED
[INFO] Apache Beam :: SDKs :: Java :: IO .. SKIPPED
[INFO] Apache Beam :: SDKs :: Java :: IO :: Amazon Web Services SKIPPED
[INFO] Apache Beam :: Runners :: Local Java Core .. SKIPPED
[INFO] Apache Beam :: Runners :: Direct Java .. SKIPPED
[INFO] Apache Beam :: SDKs :: Java :: IO :: AMQP .. SKIPPED
[INFO] Apache Beam :: SDKs :: Java :: IO :: Common  SKIPPED
[INFO] Apache Beam :: SDKs :: Java :: IO :

Build failed in Jenkins: beam_PerformanceTests_HadoopInputFormat #179

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[apilloud] [SQL] Embed BeamSqlTable in BeamCalciteTable

[owenzhang1990] [BEAM-4129] Run WordCount example on Gearpump runner with Gradle

[sidhom] Fix python lint error

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam2 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision caf9d6404d25c3e3040edbd0703ad21066a36295 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f caf9d6404d25c3e3040edbd0703ad21066a36295
Commit message: "Merge pull request #5205 Fix python lint error"
 > git rev-list --no-walk 0f2ba71e1b6db88ed79744e363586a8ff16dcb08 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_HadoopInputFormat] $ /bin/bash -xe 
/tmp/jenkins5794839528639777961.sh
+ gcloud container clusters get-credentials io-datastores --zone=us-central1-a 
--verbosity=debug
DEBUG: Running gcloud.container.clusters.get-credentials with 
Namespace(__calliope_internal_deepest_parser=ArgumentParser(prog='gcloud.container.clusters.get-credentials',
 usage=None, description='See 
https://cloud.google.com/container-engine/docs/kubectl for\nkubectl 
documentation.', version=None, formatter_class=, conflict_handler='error', add_help=False), 
account=None, api_version=None, authority_selector=None, 
authorization_token_file=None, cmd_func=>, 
command_path=['gcloud', 'container', 'clusters', 'get-credentials'], 
configuration=None, credential_file_override=None, document=None, format=None, 
h=None, help=None, http_timeout=None, log_http=None, name='io-datastores', 
project=None, quiet=None, trace_email=None, trace_log=None, trace_token=None, 
user_output_enabled=None, verbosity='debug', version=None, 
zone='us-central1-a').
WARNING: Accessing a Container Engine cluster requires the kubernetes 
commandline
client [kubectl]. To install, run
  $ gcloud components install kubectl

Fetching cluster endpoint and auth data.
DEBUG: Saved kubeconfig to /home/jenkins/.kube/config
kubeconfig entry generated for io-datastores.
INFO: Display format "default".
[beam_PerformanceTests_HadoopInputFormat] $ /bin/bash -xe 
/tmp/jenkins3821885580679605052.sh
+ cp /home/jenkins/.kube/config 

[beam_PerformanceTests_HadoopInputFormat] $ /bin/bash -xe 
/tmp/jenkins6748059615823201583.sh
+ kubectl 
--kubeconfig=
 create namespace hadoopinputformatioit-1524546085809
namespace "hadoopinputformatioit-1524546085809" created
[beam_PerformanceTests_HadoopInputFormat] $ /bin/bash -xe 
/tmp/jenkins613008953493198993.sh
++ kubectl config current-context
+ kubectl 
--kubeconfig=
 config set-context gke_apache-beam-testing_us-central1-a_io-datastores 
--namespace=hadoopinputformatioit-1524546085809
error: open /home/jenkins/.kube/config.lock: file exists
Build step 'Execute shell' marked build as failure
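The `config.lock: file exists` error above occurs when two concurrent Jenkins jobs write the shared `/home/jenkins/.kube/config` at the same time. A minimal sketch of one possible mitigation, assuming a private per-job kubeconfig copy is acceptable (the helper name and paths are illustrative, not part of the build scripts above):

```python
import os
import shutil
import tempfile


def private_kubeconfig(shared=os.path.expanduser('~/.kube/config')):
    """Copy the shared kubeconfig into a job-private temp dir.

    Subsequent kubectl invocations in this job read and write only the
    private copy (via KUBECONFIG), so they never contend for the shared
    file's .lock.
    """
    job_cfg = os.path.join(tempfile.mkdtemp(), 'config')
    if os.path.isfile(shared):
        shutil.copy(shared, job_cfg)  # reuse credentials already fetched
    else:
        open(job_cfg, 'w').close()    # start empty when none exist yet
    os.environ['KUBECONFIG'] = job_cfg
    return job_cfg
```

The Jenkins scripts above already do something similar by hand (`cp /home/jenkins/.kube/config <workspace>` plus `--kubeconfig=`); the race remains only in the initial `gcloud ... get-credentials` step, which still touches the shared file.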


Build failed in Jenkins: beam_PerformanceTests_JDBC #490

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[apilloud] [SQL] Embed BeamSqlTable in BeamCalciteTable

[owenzhang1990] [BEAM-4129] Run WordCount example on Gearpump runner with Gradle

[sidhom] Fix python lint error

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam7 (beam) in workspace 

Cloning the remote Git repository
Cloning repository https://github.com/apache/beam.git
 > git init  # 
 > timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # 
 > timeout=10
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision caf9d6404d25c3e3040edbd0703ad21066a36295 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f caf9d6404d25c3e3040edbd0703ad21066a36295
Commit message: "Merge pull request #5205 Fix python lint error"
 > git rev-list --no-walk 0f2ba71e1b6db88ed79744e363586a8ff16dcb08 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins6949273945394657010.sh
+ gcloud container clusters get-credentials io-datastores --zone=us-central1-a 
--verbosity=debug
DEBUG: Running [gcloud.container.clusters.get-credentials] with arguments: 
[--verbosity: "debug", --zone: "us-central1-a", NAME: "io-datastores"]
Fetching cluster endpoint and auth data.
DEBUG: Saved kubeconfig to /home/jenkins/.kube/config
kubeconfig entry generated for io-datastores.
INFO: Display format "default".
DEBUG: SDK update checks are disabled.
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins4963278871881494230.sh
+ cp /home/jenkins/.kube/config 

[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins2998858072292067768.sh
+ kubectl 
--kubeconfig=
 create namespace jdbcioit-1524546092323
namespace "jdbcioit-1524546092323" created
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins4980222838293174664.sh
++ kubectl config current-context
+ kubectl 
--kubeconfig=
 config set-context gke_apache-beam-testing_us-central1-a_io-datastores 
--namespace=jdbcioit-1524546092323
Context "gke_apache-beam-testing_us-central1-a_io-datastores" modified.
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins4864705631565302316.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins4517336094340283130.sh
+ rm -rf .env
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins4876090899299410933.sh
+ virtualenv .env --system-site-packages
New python executable in .env/bin/python
Installing setuptools, pip...done.
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins7021530130634548446.sh
+ .env/bin/pip install --upgrade setuptools pip
Downloading/unpacking setuptools from 
https://files.pythonhosted.org/packages/20/d7/04a0b689d3035143e2ff288f4b9ee4bf6ed80585cc121c90bfd85a1a8c2e/setuptools-39.0.1-py2.py3-none-any.whl#sha256=8010754433e3211b9cdbbf784b50f30e80bf40fc6b05eb5f865fab83300599b8
Downloading/unpacking pip from 
https://files.pythonhosted.org/packages/0f/74/ecd13431bcc456ed390b44c8a6e917c1820365cbebcb6a8974d1cd045ab4/pip-10.0.1-py2.py3-none-any.whl#sha256=717cdffb2833be8409433a93746744b59505f42146e8d37de6c62b430e25d6d7
Installing collected packages: setuptools, pip
  Found existing installation: setuptools 2.2
Uninstalling setuptools:
  Successfully uninstalled setuptools
  Found existing installation: pip 1.5.4
Uninstalling pip:
  Successfully uninstalled pip
Successfully installed setuptools pip
Cleaning up...
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins3683323884888611

Jenkins build is back to normal : beam_PostCommit_Python_Verify #4773

2018-04-23 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PostCommit_Python_Verify #4772

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[apilloud] [SQL] Embed BeamSqlTable in BeamCalciteTable

[owenzhang1990] [BEAM-4129] Run WordCount example on Gearpump runner with Gradle

--
[...truncated 1.06 MB...]
test_getitem_param_must_be_tuple 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_param_must_have_length_2 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_proxy_to_tuple 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_enforce_list_type_constraint_invalid_composite_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_invalid_simple_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_valid_composite_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_valid_simple_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_getitem_invalid_composite_type_param 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_list_constraint_compatibility 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_list_repr (apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_getitem_proxy_to_union 
(apache_beam.typehints.typehints_test.OptionalHintTestCase) ... ok
test_getitem_sequence_not_allowed 
(apache_beam.typehints.typehints_test.OptionalHintTestCase) ... ok
test_any_return_type_hint 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_must_be_primitive_type_or_type_constraint 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_must_be_single_return_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_no_kwargs_accepted 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_composite_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_simple_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_violation 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_compatibility (apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_getitem_invalid_composite_type_param 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_repr (apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_invalid_elem_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_must_be_set 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_valid_elem_composite_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_valid_elem_simple_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_any_argument_type_hint 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_basic_type_assertion 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_composite_type_assertion 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_invalid_only_positional_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_must_be_primitive_type_or_constraint 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_mix_positional_and_keyword_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_only_positional_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_simple_type_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_functions_as_regular_generator 
(apache_beam.typehints.typehints_test.TestGeneratorWrapper) ... ok
test_compatibility (apache_beam.typehints.typehints_test.TupleHintTestCase) ... 
ok
test_compatibility_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_getitem_invalid_ellipsis_type_param 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_getitem_params_must_be_type_or_constraint 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_raw_tuple (apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_repr (apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_composite_type 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_composite_type_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_simple_type_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_simple_types 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_must_be_tuple 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_must_hav

Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Apex_Gradle #182

2018-04-23 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Flink_Gradle #212

2018-04-23 Thread Apache Jenkins Server
See 




[beam] 01/01: Merge pull request #5205 Fix python lint error

2018-04-23 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit caf9d6404d25c3e3040edbd0703ad21066a36295
Merge: 4cf96e7 628b3ba
Author: Robert Bradshaw 
AuthorDate: Mon Apr 23 20:57:52 2018 -0700

Merge pull request #5205 Fix python lint error

 sdks/python/apache_beam/runners/portability/fn_api_runner.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[jira] [Work logged] (BEAM-4097) Python SDK should set the environment in the job submission protos

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4097?focusedWorklogId=94446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94446
 ]

ASF GitHub Bot logged work on BEAM-4097:


Author: ASF GitHub Bot
Created on: 24/Apr/18 03:56
Start Date: 24/Apr/18 03:56
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #5191: 
[BEAM-4097] Set environment for Python sdk function specs.
URL: https://github.com/apache/beam/pull/5191#discussion_r183599213
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/universal_local_runner.py
 ##
 @@ -83,6 +86,16 @@ def cleanup(self):
   time.sleep(0.1)
 self._subprocess = None
 
+  @staticmethod
+  def default_docker_image():
+if 'USER' in os.environ:
+  # Perhaps also test if this was built?
+  logging.info('Using latest built Python SDK docker image.')
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94446)
Time Spent: 2h 40m  (was: 2.5h)

> Python SDK should set the environment in the job submission protos
> --
>
> Key: BEAM-4097
> URL: https://issues.apache.org/jira/browse/BEAM-4097
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
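For context, a self-contained sketch of the `default_docker_image` helper discussed in the review above. The bintray image name is copied from the quoted diff; returning `None` when `USER` is unset is an assumption, since the quoted hunk is truncated at that point:

```python
import logging
import os


def default_docker_image():
    # Derive a per-user SDK container image name, as in the diff under
    # review. Image path follows the quoted code; None fallback is an
    # assumption for illustration.
    if 'USER' in os.environ:
        logging.info('Using latest built Python SDK docker image.')
        return os.environ['USER'] + '-docker.apache.bintray.io/beam/python:latest'
    logging.warning('Could not find a Python SDK docker image.')
    return None
```

As the reviewer notes below, `USER` is not set in the Jenkins environment, so in CI the image path would need to be passed explicitly rather than derived here.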


[beam] branch master updated (4cf96e7 -> caf9d64)

2018-04-23 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 4cf96e7  Merge pull request #5200: [BEAM-4129] Run WordCount example 
on Gearpump runner with Gradle
 add 628b3ba  Fix python lint error
 new caf9d64  Merge pull request #5205 Fix python lint error

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/runners/portability/fn_api_runner.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



Build failed in Jenkins: beam_PostCommit_Python_Verify #4771

2018-04-23 Thread Apache Jenkins Server
See 


--
[...truncated 1.06 MB...]
test_type_check_violation_valid_simple_type 
(apache_beam.typehints.typehints_test.IterableHintTestCase) ... ok
test_enforce_kv_type_constraint 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_param_must_be_tuple 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_param_must_have_length_2 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_proxy_to_tuple 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_enforce_list_type_constraint_invalid_composite_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_invalid_simple_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_valid_composite_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_valid_simple_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_getitem_invalid_composite_type_param 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_list_constraint_compatibility 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_list_repr (apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_getitem_proxy_to_union 
(apache_beam.typehints.typehints_test.OptionalHintTestCase) ... ok
test_getitem_sequence_not_allowed 
(apache_beam.typehints.typehints_test.OptionalHintTestCase) ... ok
test_any_return_type_hint 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_must_be_primitive_type_or_type_constraint 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_must_be_single_return_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_no_kwargs_accepted 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_composite_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_simple_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_violation 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_compatibility (apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_getitem_invalid_composite_type_param 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_repr (apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_invalid_elem_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_must_be_set 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_valid_elem_composite_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_valid_elem_simple_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_any_argument_type_hint 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_basic_type_assertion 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_composite_type_assertion 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_invalid_only_positional_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_must_be_primitive_type_or_constraint 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_mix_positional_and_keyword_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_only_positional_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_simple_type_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_functions_as_regular_generator 
(apache_beam.typehints.typehints_test.TestGeneratorWrapper) ... ok
test_compatibility (apache_beam.typehints.typehints_test.TupleHintTestCase) ... 
ok
test_compatibility_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_getitem_invalid_ellipsis_type_param 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_getitem_params_must_be_type_or_constraint 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_raw_tuple (apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_repr (apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_composite_type 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_composite_type_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_simple_type_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_simple_types 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_must_be_tuple 
(apache_beam.typehints.typehints_test.TupleH

[jira] [Work logged] (BEAM-4097) Python SDK should set the environment in the job submission protos

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4097?focusedWorklogId=94445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94445
 ]

ASF GitHub Bot logged work on BEAM-4097:


Author: ASF GitHub Bot
Created on: 24/Apr/18 03:55
Start Date: 24/Apr/18 03:55
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #5191: 
[BEAM-4097] Set environment for Python sdk function specs.
URL: https://github.com/apache/beam/pull/5191#discussion_r183599178
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/universal_local_runner.py
 ##
 @@ -83,6 +86,16 @@ def cleanup(self):
   time.sleep(0.1)
 self._subprocess = None
 
+  @staticmethod
+  def default_docker_image():
+if 'USER' in os.environ:
+  # Perhaps also test if this was built?
+  logging.info('Using latest built Python SDK docker image.')
+  return os.environ['USER'] + 
'-docker.apache.bintray.io/beam/python:latest'
+else:
+  logging.warning('Could not find a Python SDK docker image.')
 
 Review comment:
   USER is not set in the Jenkins environment; when we have enough for a full 
end-to-end test we may need to pass the appropriate path manually (which is 
probably better anyway...)




Issue Time Tracking
---

Worklog Id: (was: 94445)
Time Spent: 2.5h  (was: 2h 20m)

> Python SDK should set the environment in the job submission protos
> --
>
> Key: BEAM-4097
> URL: https://issues.apache.org/jira/browse/BEAM-4097
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4129) Update Gearpump runner instructions for running WordCount example

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4129?focusedWorklogId=94442&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94442
 ]

ASF GitHub Bot logged work on BEAM-4129:


Author: ASF GitHub Bot
Created on: 24/Apr/18 03:40
Start Date: 24/Apr/18 03:40
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #5200: [BEAM-4129] Run 
WordCount example on Gearpump runner with Gradle
URL: https://github.com/apache/beam/pull/5200
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/release/src/main/groovy/quickstart-java-gearpump.groovy 
b/release/src/main/groovy/quickstart-java-gearpump.groovy
new file mode 100644
index 000..85f2d23f476
--- /dev/null
+++ b/release/src/main/groovy/quickstart-java-gearpump.groovy
@@ -0,0 +1,43 @@
+#!groovy
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+t = new TestScripts(args)
+
+/*
+ * Run the Gearpump quickstart from 
https://beam.apache.org/get-started/quickstart-java/
+ */
+
+t.describe 'Run Apache Beam Java SDK Quickstart - Gearpump'
+
+t.intent 'Gets the WordCount Example Code'
+QuickstartArchetype.generate(t)
+
+t.intent 'Runs the WordCount Code with Gearpump runner'
+// Run the wordcount example with the gearpump runner
+t.run """mvn compile exec:java -q \
+  -Dexec.mainClass=org.apache.beam.examples.WordCount \
+  -Dexec.args="--inputFile=pom.xml --output=counts \
+  --runner=TestGearpumpRunner" -Pgearpump-runner"""
+
+// Verify text from the pom.xml input file
+String result = t.run "grep Foundation counts*"
+t.see "Foundation: 1", result
+
+// Clean up
+t.done()
diff --git a/runners/gearpump/README.md b/runners/gearpump/README.md
deleted file mode 100644
index d447d1b25a1..000
--- a/runners/gearpump/README.md
+++ /dev/null
@@ -1,61 +0,0 @@
-
-
-## Gearpump Beam Runner
-
-The Gearpump Beam runner allows users to execute pipelines written using the 
Apache Beam programming API with Apache Gearpump (incubating) as an execution 
engine.
-
-##Getting Started
-
-The following shows how to run the WordCount example that is provided with the 
source code on Beam.
-
-###Installing Beam
-
-To get the latest version of Beam with Gearpump-Runner, first clone the Beam 
repository:
-
-```
-git clone https://github.com/apache/beam
-git checkout gearpump-runner
-```
-
-Then run Gradle to build Apache Beam:
-
-```
-./gradlew :beam-runners-gearpump:build
-```
-
-###Running Wordcount Example
-
-Download something to count:
-
-```
-curl http://www.gutenberg.org/cache/epub/1128/pg1128.txt > /tmp/kinglear.txt
-```
-
-> Note: There is an open issue to update this README for Gradle:
-[BEAM-4129](https://issues.apache.org/jira/browse/BEAM-4129).
-
-Run the pipeline, using the Gearpump runner:
-
-```
-cd examples/java
-mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount 
-Dexec.args="--inputFile=/tmp/kinglear.txt --output=/tmp/wordcounts.txt 
--runner=TestGearpumpRunner" -Pgearpump-runner
-```
-
-Once completed, check the output file /tmp/wordcounts.txt-0-of-1
diff --git a/runners/gearpump/build.gradle b/runners/gearpump/build.gradle
index c85d5cf7340..06007ca1534 100644
--- a/runners/gearpump/build.gradle
+++ b/runners/gearpump/build.gradle
@@ -86,3 +86,5 @@ task validatesRunner {
   description "Validates Gearpump runner"
   dependsOn validatesRunnerStreaming
 }
+
+createJavaExamplesArchetypeValidationTask(type: 'Quickstart', runner: 
'Gearpump')
\ No newline at end of file
diff --git 
a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
 
b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
index 4ccedca5bbf..dcd9748311a 100644
--- 
a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
+++ 
b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
@@ -295,6 +295,17 @@
 

[beam] branch master updated (3740868 -> 4cf96e7)

2018-04-23 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 3740868  Merge pull request #5154: [BEAM-4044] [SQL] Make 
BeamCalciteTable self planning
 add 1e99451  [BEAM-4129] Run WordCount example on Gearpump runner with 
Gradle
 new 4cf96e7  Merge pull request #5200: [BEAM-4129] Run WordCount example 
on Gearpump runner with Gradle

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 ...park.groovy => quickstart-java-gearpump.groovy} | 26 -
 runners/gearpump/README.md | 61 --
 runners/gearpump/build.gradle  |  2 +
 .../src/main/resources/archetype-resources/pom.xml | 11 
 4 files changed, 26 insertions(+), 74 deletions(-)
 copy release/src/main/groovy/{quickstart-java-spark.groovy => 
quickstart-java-gearpump.groovy} (62%)
 delete mode 100644 runners/gearpump/README.md

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[jira] [Work logged] (BEAM-4129) Update Gearpump runner instructions for running WordCount example

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4129?focusedWorklogId=94441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94441
 ]

ASF GitHub Bot logged work on BEAM-4129:


Author: ASF GitHub Bot
Created on: 24/Apr/18 03:40
Start Date: 24/Apr/18 03:40
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5200: [BEAM-4129] Run 
WordCount example on Gearpump runner with Gradle
URL: https://github.com/apache/beam/pull/5200#issuecomment-383793747
 
 
   Great. Thanks!




Issue Time Tracking
---

Worklog Id: (was: 94441)
Time Spent: 40m  (was: 0.5h)

> Update Gearpump runner instructions for running WordCount example
> -
>
> Key: BEAM-4129
> URL: https://issues.apache.org/jira/browse/BEAM-4129
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-gearpump
>Reporter: Scott Wegner
>Assignee: Manu Zhang
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With the move to Gradle, the Gearpump README.md needs to be updated with 
> instructions on how to execute WordCount with the new build system.
> [https://github.com/apache/beam/blob/master/runners/gearpump/README.md]
>  
> It may be easiest to generate a quickstart task similar to other runners: 
> https://github.com/apache/beam/blob/3cdecd0e3e28dd39990d7d1dea8931e871a6ba62/runners/flink/build.gradle#L135



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5200: [BEAM-4129] Run WordCount example on Gearpump runner with Gradle

2018-04-23 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 4cf96e77295d8f2a28b2a5ee45c093c22cd3e961
Merge: 3740868 1e99451
Author: Kenn Knowles 
AuthorDate: Mon Apr 23 20:40:55 2018 -0700

Merge pull request #5200: [BEAM-4129] Run WordCount example on Gearpump 
runner with Gradle

 .../main/groovy/quickstart-java-gearpump.groovy| 43 +++
 runners/gearpump/README.md | 61 --
 runners/gearpump/build.gradle  |  2 +
 .../src/main/resources/archetype-resources/pom.xml | 11 
 4 files changed, 56 insertions(+), 61 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[jira] [Work logged] (BEAM-4044) Take advantage of Calcite DDL

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4044?focusedWorklogId=94439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94439
 ]

ASF GitHub Bot logged work on BEAM-4044:


Author: ASF GitHub Bot
Created on: 24/Apr/18 03:35
Start Date: 24/Apr/18 03:35
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #5154: [BEAM-4044] 
[SQL] Make BeamCalciteTable self planning
URL: https://github.com/apache/beam/pull/5154
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlEnv.java
 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlEnv.java
index 7f77668379c..eeaf36e7af3 100644
--- 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlEnv.java
+++ 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlEnv.java
@@ -20,8 +20,8 @@
 import java.io.IOException;
 import java.io.ObjectInputStream;
 import java.io.Serializable;
-import java.util.HashMap;
-import java.util.Map;
+import java.util.Collection;
+import java.util.List;
 import org.apache.beam.sdk.coders.RowCoder;
 import org.apache.beam.sdk.extensions.sql.BeamSql;
 import org.apache.beam.sdk.extensions.sql.BeamSqlCli;
@@ -29,6 +29,8 @@
 import org.apache.beam.sdk.extensions.sql.BeamSqlUdf;
 import org.apache.beam.sdk.extensions.sql.impl.interpreter.operator.UdafImpl;
 import org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSinkRel;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
 import org.apache.beam.sdk.extensions.sql.impl.schema.BaseBeamTable;
 import org.apache.beam.sdk.extensions.sql.impl.schema.BeamPCollectionTable;
 import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
@@ -39,19 +41,22 @@
 import org.apache.beam.sdk.values.PCollectionTuple;
 import org.apache.beam.sdk.values.Row;
 import org.apache.beam.sdk.values.TupleTag;
-import org.apache.calcite.DataContext;
-import org.apache.calcite.config.CalciteConnectionConfig;
-import org.apache.calcite.linq4j.Enumerable;
+import org.apache.calcite.adapter.java.AbstractQueryableTable;
+import org.apache.calcite.jdbc.CalciteSchema;
+import org.apache.calcite.linq4j.QueryProvider;
+import org.apache.calcite.linq4j.Queryable;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptTable;
+import org.apache.calcite.prepare.Prepare;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.TableModify;
 import org.apache.calcite.rel.type.RelDataType;
 import org.apache.calcite.rel.type.RelDataTypeFactory;
-import org.apache.calcite.schema.ScannableTable;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.schema.ModifiableTable;
 import org.apache.calcite.schema.SchemaPlus;
-import org.apache.calcite.schema.Statistic;
-import org.apache.calcite.schema.Statistics;
+import org.apache.calcite.schema.TranslatableTable;
 import org.apache.calcite.schema.impl.ScalarFunctionImpl;
-import org.apache.calcite.sql.SqlCall;
-import org.apache.calcite.sql.SqlNode;
-import org.apache.calcite.tools.Frameworks;
 
 /**
  * {@link BeamSqlEnv} prepares the execution context for {@link BeamSql} and 
{@link BeamSqlCli}.
@@ -60,21 +65,19 @@
  * {@link BeamQueryPlanner} which parse/validate/optimize/translate input SQL 
queries.
  */
 public class BeamSqlEnv implements Serializable {
-  transient SchemaPlus schema;
+  transient CalciteSchema schema;
   transient BeamQueryPlanner planner;
-  transient Map tables;
 
   public BeamSqlEnv() {
-tables = new HashMap<>(16);
-schema = Frameworks.createRootSchema(true);
-planner = new BeamQueryPlanner(this, schema);
+schema = CalciteSchema.createRootSchema(true);
+planner = new BeamQueryPlanner(this, schema.plus());
   }
 
   /**
* Register a UDF function which can be used in SQL expression.
*/
   public void registerUdf(String functionName, Class clazz, String method) {
-schema.add(functionName, ScalarFunctionImpl.create(clazz, method));
+schema.plus().add(functionName, ScalarFunctionImpl.create(clazz, method));
   }
 
   /**
@@ -97,7 +100,7 @@ public void registerUdf(String functionName, 
SerializableFunction sfn) {
* See {@link org.apache.beam.sdk.transforms.Combine.CombineFn} on how to 
implement a UDAF.
*/
   public void registerUdaf(String functionName, Combine.CombineFn combineFn) {
-schema.add(functionName, new UdafImpl(combineFn));
+schema.plus().add(functionName, new UdafImpl(combineFn));
   }
 
   /

[jira] [Work logged] (BEAM-3979) New DoFn should allow injecting of all parameters in ProcessContext

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3979?focusedWorklogId=94428&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94428
 ]

ASF GitHub Bot logged work on BEAM-3979:


Author: ASF GitHub Bot
Created on: 24/Apr/18 02:33
Start Date: 24/Apr/18 02:33
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #4989: 
[BEAM-3979] Start completing the new DoFn vision: plumb context parameters into 
process functions.
URL: https://github.com/apache/beam/pull/4989#discussion_r183590326
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
 ##
 @@ -361,7 +362,18 @@ public void setup() {
 public void processElement(final ProcessContext c) {
   final InputT element = c.element().getKey();
   invoker.invokeSplitRestriction(
-  element, c.element().getValue(), part -> c.output(KV.of(element, 
part)));
+  element, c.element().getValue(), new OutputReceiver() {
+@Override
+public void output(RestrictionT part) {
+  c.output(KV.of(element, part));
+}
+
+@Override
+public void outputWithTimestamp(RestrictionT part, Instant 
timestamp) {
+  c.outputWithTimestamp(KV.of(element, part), timestamp);
 
 Review comment:
   
   
   > **jkff** wrote:
   > As a tie-breaker: the dynamic splitting API does not allow you to assign 
different timestamps to the restrictions, and neither should the static one.
   
   
   Done.




Issue Time Tracking
---

Worklog Id: (was: 94428)
Time Spent: 2h 40m  (was: 2.5h)

> New DoFn should allow injecting of all parameters in ProcessContext
> ---
>
> Key: BEAM-3979
> URL: https://issues.apache.org/jira/browse/BEAM-3979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.4.0
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This was intended in the past, but never completed. Ideally all primitive 
> parameters in ProcessContext should be injectable, and OutputReceiver 
> parameters can be used to collection output. So, we should be able to write a 
> DoFn as follows
> @ProcessElement
> public void process(@Element String word, OutputReceiver receiver) {
>   receiver.output(word.toUpperCase());
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2163) Remove the dependency on examples from ptransform_test

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2163?focusedWorklogId=94427&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94427
 ]

ASF GitHub Bot logged work on BEAM-2163:


Author: ASF GitHub Bot
Created on: 24/Apr/18 02:23
Start Date: 24/Apr/18 02:23
Worklog Time Spent: 10m 
  Work Description: JavierAntonioGonzalezTrejo commented on issue #5199: 
[BEAM-2163] Remove the dependency on examples from ptransform_test
URL: https://github.com/apache/beam/pull/5199#issuecomment-383782431
 
 
   Does it make sense to put the class in iobase.py, or is it preferable to 
create a single file for this class?




Issue Time Tracking
---

Worklog Id: (was: 94427)
Time Spent: 0.5h  (was: 20m)

> Remove the dependency on examples from ptransform_test
> --
>
> Key: BEAM-2163
> URL: https://issues.apache.org/jira/browse/BEAM-2163
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Sourabh Bajaj
>Priority: Major
>  Labels: newbie, starter
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform_test.py#L176
> This validates runner test depends on the Counting source snippet and the 
> source should be copied here instead of this dependency.
> The actual beam code should not depend on the examples package at all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4141) Data channel deadlocks when user function fails

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?focusedWorklogId=94421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94421
 ]

ASF GitHub Bot logged work on BEAM-4141:


Author: ASF GitHub Bot
Created on: 24/Apr/18 01:36
Start Date: 24/Apr/18 01:36
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5184: 
BEAM-4141: Drain source when user function processing fails.
URL: https://github.com/apache/beam/pull/5184#discussion_r183583485
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/harness/datamgr_test.go
 ##
 @@ -0,0 +1,89 @@
+package harness
+
+import (
+   "context"
+   "io"
+   "io/ioutil"
+   "log"
+   "testing"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec"
+   pb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1"
+)
+
+type fakeClient struct {
+   t *testing.T
+   done  chan bool
+   calls int
+}
+
+func (f *fakeClient) Recv() (*pb.Elements, error) {
+   f.calls++
+   data := []byte{1, 2, 3, 4}
+   elemData := pb.Elements_Data{
+   InstructionReference: "inst_ref",
+   Data: data,
+   Target: &pb.Target{
+   PrimitiveTransformReference: "ptr",
+   Name: "instruction_name",
+   },
+   }
+
+   msg := pb.Elements{}
+
+   for i := 0; i < bufElements+1; i++ {
+   msg.Data = append(msg.Data, &elemData)
+   }
+
+   // The first two calls fill up the buffer completely to stimulate the 
deadlock
 
 Review comment:
   
   
   > **wcn3** wrote:
   > Stimulate is more properly correct here, since I'm creating the actual 
condition of the deadlock. It's not a simulation of a deadlock; without the 
fix, it truly deadlocks. :)
   > 
   > If the word is confusing due to similarity, happy to replace with 'to 
force the deadlock'
   
   
   I could have justified either way, so it was largely a typo check. Thanks 
for the clarification!




Issue Time Tracking
---

Worklog Id: (was: 94421)
Time Spent: 1h 50m  (was: 1h 40m)

> Data channel deadlocks when user function fails
> ---
>
> Key: BEAM-4141
> URL: https://issues.apache.org/jira/browse/BEAM-4141
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There is a deadlock condition in the data channel code that occurs when a 
> user function fails while processing an element. The producer for the data 
> channel is continuing to send information across a channel, but the intended 
> consumer has stopped listening. Unfortunately, this channel blocks the entire 
> data channel, blocking data for any other DoFn that might be running, causing 
> the whole worker to deadlock.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
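The failure mode described above — a producer blocked on a full bounded channel after the consumer has given up — and the drain-on-failure fix can be sketched in a few lines of Go. This is an illustrative sketch, not the actual Beam `datamgr` code; `consume` and `failOn` are made-up names.

```go
package main

import (
	"errors"
	"fmt"
)

// consume processes elements from a bounded channel. If the user function
// fails mid-stream, it keeps draining the channel so the producer goroutine
// is never left blocked on a full buffer.
func consume(ch <-chan int, failOn int) (int, error) {
	processed := 0
	for v := range ch {
		if v == failOn {
			// Drain the remaining elements; without this loop the
			// producer deadlocks once the buffer fills.
			for range ch {
			}
			return processed, errors.New("user function failed")
		}
		processed++
	}
	return processed, nil
}

func main() {
	ch := make(chan int, 1) // tiny buffer to make the deadlock easy to hit
	go func() {
		for i := 0; i < 5; i++ {
			ch <- i
		}
		close(ch)
	}()
	n, err := consume(ch, 2)
	fmt.Println(n, err != nil) // prints "2 true"
}
```

The drain loop is the essence of the fix: the failing consumer stops *processing* but never stops *receiving*, so other DoFns sharing the data channel are not starved.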


[jira] [Work logged] (BEAM-4141) Data channel deadlocks when user function fails

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?focusedWorklogId=94417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94417
 ]

ASF GitHub Bot logged work on BEAM-4141:


Author: ASF GitHub Bot
Created on: 24/Apr/18 01:18
Start Date: 24/Apr/18 01:18
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on a change in pull request #5184: 
BEAM-4141: Drain source when user function processing fails.
URL: https://github.com/apache/beam/pull/5184#discussion_r183581455
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/harness/datamgr.go
 ##
 @@ -29,6 +29,15 @@ import (
 )
 
 const chunkSize = int(4e6) // Bytes to put in a single gRPC message. Max is 
slightly higher.
+const bufElements = 20 // Number of chunks buffered per reader.
+
+// This is a reduced version of the full gRPC interface to help with testing.
+// TODO(wcn): need a compile-time assertion to make sure this stays synced 
with what's
 
 Review comment:
   The goal of that compile-time assertion would be to make the breakage more 
obvious because this creates a cleaner breadcrumb. I'd meant to do the TODO, so 
let me either implement it or remove it. Leaving it here isn't an option to 
pursue.
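The compile-time assertion the TODO asks for is the standard Go idiom of assigning a typed nil pointer to a blank package-level variable of the interface type. A minimal sketch with illustrative names (this reduced `dataClient` stands in for the generated gRPC interface, which is not shown in the thread):

```go
package main

import "fmt"

// dataClient is a reduced stand-in for the generated gRPC client interface.
type dataClient interface {
	Recv() (string, error)
	Send(string) error
}

// fakeClient mirrors the test double in datamgr_test.go (names illustrative).
type fakeClient struct{}

func (f *fakeClient) Recv() (string, error) { return "elem", nil }
func (f *fakeClient) Send(string) error     { return nil }

// Compile-time assertion: the build breaks on this line, with a clear
// breadcrumb, if *fakeClient ever stops satisfying dataClient.
var _ dataClient = (*fakeClient)(nil)

func main() {
	var c dataClient = &fakeClient{}
	msg, _ := c.Recv()
	fmt.Println(msg) // prints "elem"
}
```

The assertion costs nothing at runtime (the variable is blank and nil) and keeps the fake from silently drifting out of sync with the real interface.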




Issue Time Tracking
---

Worklog Id: (was: 94417)
Time Spent: 1h 40m  (was: 1.5h)

> Data channel deadlocks when user function fails
> ---
>
> Key: BEAM-4141
> URL: https://issues.apache.org/jira/browse/BEAM-4141
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> There is a deadlock condition in the data channel code that occurs when a 
> user function fails while processing an element. The producer for the data 
> channel is continuing to send information across a channel, but the intended 
> consumer has stopped listening. Unfortunately, this channel blocks the entire 
> data channel, blocking data for any other DoFn that might be running, causing 
> the whole worker to deadlock.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4141) Data channel deadlocks when user function fails

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?focusedWorklogId=94412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94412
 ]

ASF GitHub Bot logged work on BEAM-4141:


Author: ASF GitHub Bot
Created on: 24/Apr/18 01:17
Start Date: 24/Apr/18 01:17
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #5184: 
BEAM-4141: Drain source when user function processing fails.
URL: https://github.com/apache/beam/pull/5184#discussion_r183580220
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/harness/datamgr.go
 ##
 @@ -29,6 +29,15 @@ import (
 )
 
 const chunkSize = int(4e6) // Bytes to put in a single gRPC message. Max is 
slightly higher.
 
 Review comment:
   nit: use const block




Issue Time Tracking
---

Worklog Id: (was: 94412)
Time Spent: 1h  (was: 50m)

> Data channel deadlocks when user function fails
> ---
>
> Key: BEAM-4141
> URL: https://issues.apache.org/jira/browse/BEAM-4141
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> There is a deadlock condition in the data channel code that occurs when a 
> user function fails while processing an element. The producer for the data 
> channel is continuing to send information across a channel, but the intended 
> consumer has stopped listening. Unfortunately, this channel blocks the entire 
> data channel, blocking data for any other DoFn that might be running, causing 
> the whole worker to deadlock.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4141) Data channel deadlocks when user function fails

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?focusedWorklogId=94414&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94414
 ]

ASF GitHub Bot logged work on BEAM-4141:


Author: ASF GitHub Bot
Created on: 24/Apr/18 01:17
Start Date: 24/Apr/18 01:17
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on a change in pull request #5184: 
BEAM-4141: Drain source when user function processing fails.
URL: https://github.com/apache/beam/pull/5184#discussion_r183581292
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/harness/datamgr_test.go
 ##
 @@ -0,0 +1,89 @@
+package harness
+
+import (
+   "context"
+   "io"
+   "io/ioutil"
+   "log"
+   "testing"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec"
+   pb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1"
+)
+
+type fakeClient struct {
+   t *testing.T
+   done  chan bool
+   calls int
+}
+
+func (f *fakeClient) Recv() (*pb.Elements, error) {
+   f.calls++
+   data := []byte{1, 2, 3, 4}
+   elemData := pb.Elements_Data{
+   InstructionReference: "inst_ref",
+   Data: data,
+   Target: &pb.Target{
+   PrimitiveTransformReference: "ptr",
+   Name: "instruction_name",
+   },
+   }
+
+   msg := pb.Elements{}
+
+   for i := 0; i < bufElements+1; i++ {
+   msg.Data = append(msg.Data, &elemData)
+   }
+
+   // The first two calls fill up the buffer completely to stimulate the 
deadlock
 
 Review comment:
   Stimulate is more properly correct here, since I'm creating the actual 
condition of the deadlock. It's not a simulation of a deadlock; without the 
fix, it truly deadlocks. :)
   
   If the word is confusing due to similarity, happy to replace with 'to force 
the deadlock'




Issue Time Tracking
---

Worklog Id: (was: 94414)
Time Spent: 1h 20m  (was: 1h 10m)

> Data channel deadlocks when user function fails
> ---
>
> Key: BEAM-4141
> URL: https://issues.apache.org/jira/browse/BEAM-4141
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> There is a deadlock condition in the data channel code that occurs when a 
> user function fails while processing an element. The producer for the data 
> channel is continuing to send information across a channel, but the intended 
> consumer has stopped listening. Unfortunately, this channel blocks the entire 
> data channel, blocking data for any other DoFn that might be running, causing 
> the whole worker to deadlock.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4141) Data channel deadlocks when user function fails

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?focusedWorklogId=94415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94415
 ]

ASF GitHub Bot logged work on BEAM-4141:


Author: ASF GitHub Bot
Created on: 24/Apr/18 01:17
Start Date: 24/Apr/18 01:17
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on a change in pull request #5184: 
BEAM-4141: Drain source when user function processing fails.
URL: https://github.com/apache/beam/pull/5184#discussion_r183581292
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/harness/datamgr_test.go
 ##
 @@ -0,0 +1,89 @@
+package harness
+
+import (
+   "context"
+   "io"
+   "io/ioutil"
+   "log"
+   "testing"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec"
+   pb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1"
+)
+
+type fakeClient struct {
+   t *testing.T
+   done  chan bool
+   calls int
+}
+
+func (f *fakeClient) Recv() (*pb.Elements, error) {
+   f.calls++
+   data := []byte{1, 2, 3, 4}
+   elemData := pb.Elements_Data{
+   InstructionReference: "inst_ref",
+   Data: data,
+   Target: &pb.Target{
+   PrimitiveTransformReference: "ptr",
+   Name: "instruction_name",
+   },
+   }
+
+   msg := pb.Elements{}
+
+   for i := 0; i < bufElements+1; i++ {
+   msg.Data = append(msg.Data, &elemData)
+   }
+
+   // The first two calls fill up the buffer completely to stimulate the 
deadlock
 
 Review comment:
   Stimulate is more properly correct here, since I'm creating the actual 
condition of the deadlock. It's not a simulation of a deadlock; without the 
fix, it truly deadlocks. :)
   
   If the word is confusing due to similarity, happy to replace with 'to force 
the deadlock'




Issue Time Tracking
---

Worklog Id: (was: 94415)
Time Spent: 1.5h  (was: 1h 20m)

> Data channel deadlocks when user function fails
> ---
>
> Key: BEAM-4141
> URL: https://issues.apache.org/jira/browse/BEAM-4141
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There is a deadlock condition in the data channel code that occurs when a 
> user function fails while processing an element. The producer for the data 
> channel is continuing to send information across a channel, but the intended 
> consumer has stopped listening. Unfortunately, this channel blocks the entire 
> data channel, blocking data for any other DoFn that might be running, causing 
> the whole worker to deadlock.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4141) Data channel deadlocks when user function fails

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?focusedWorklogId=94413&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94413
 ]

ASF GitHub Bot logged work on BEAM-4141:


Author: ASF GitHub Bot
Created on: 24/Apr/18 01:17
Start Date: 24/Apr/18 01:17
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #5184: 
BEAM-4141: Drain source when user function processing fails.
URL: https://github.com/apache/beam/pull/5184#discussion_r183581212
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/harness/datamgr_test.go
 ##
 @@ -0,0 +1,89 @@
+package harness
+
+import (
+   "context"
+   "io"
+   "io/ioutil"
+   "log"
+   "testing"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec"
+   pb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1"
+)
+
+type fakeClient struct {
+   t *testing.T
+   done  chan bool
+   calls int
+}
+
+func (f *fakeClient) Recv() (*pb.Elements, error) {
+   f.calls++
+   data := []byte{1, 2, 3, 4}
+   elemData := pb.Elements_Data{
+   InstructionReference: "inst_ref",
+   Data: data,
+   Target: &pb.Target{
+   PrimitiveTransformReference: "ptr",
+   Name: "instruction_name",
+   },
+   }
+
+   msg := pb.Elements{}
+
+   for i := 0; i < bufElements+1; i++ {
+   msg.Data = append(msg.Data, &elemData)
+   }
+
+   // The first two calls fill up the buffer completely to stimulate the 
deadlock
+   // The third call ends the data stream normally.
+   // Subsequent calls return no data.
+   switch f.calls {
+   case 1:
+   return &msg, nil
+   case 2:
+   return &msg, nil
+   case 3:
+   elemData.Data = []byte{}
+   msg.Data = []*pb.Elements_Data{&elemData}
+   // Broadcasting done here means that this code providing 
messages
+   // has not been blocked by the bug blocking the dataReader
+   // from getting more messages.
+   return &msg, nil
+   default:
+   f.done <- true
+   return nil, io.EOF
+   }
+}
+
+func (f *fakeClient) Send(*pb.Elements) error {
+   return nil
+}
+
+func TestDataChannelTerminateOnClose(t *testing.T) {
+   // The logging of channels closed is quite noisy for this test
+   log.SetOutput(ioutil.Discard)
+   done := make(chan bool, 1)
+   client := &fakeClient{t: t, done: done}
+   c, err := makeDataChannel(context.Background(), nil, client, 
exec.Port{})
+   if err != nil {
+   t.Errorf("Unexpected error in makeDataChannel: %v", err)
+   }
+
+   r, err := c.OpenRead(context.Background(), exec.StreamID{Port: 
exec.Port{URL: ""}, Target: exec.Target{ID: "ptr", Name: "instruction_name"}, 
InstID: "inst_ref"})
+   var read = make([]byte, 4)
+
+   // We don't read up all the buffered data, but immediately close the 
reader.
+   // Previously, since nothing was consuming the incoming gRPC data, the 
whole
+   // data channel would get stuck, and the client.Recv() call was 
eventually
+   // no longer called.
+   _, err = r.Read(read)
+   if err != nil {
+   t.Errorf("Unexpected error from read: %v", err)
+   }
+   r.Close()
+
+   // If done is signaled, that means client.Recv() has been called to 
flush the
+   // channel, meaning consumer code isn't stuck.
+   <-done
 
 Review comment:
   Maybe a timeout to catch that we're stuck and fail the test with an explicit 
error?




Issue Time Tracking
---

Worklog Id: (was: 94413)
Time Spent: 1h 10m  (was: 1h)

> Data channel deadlocks when user function fails
> ---
>
> Key: BEAM-4141
> URL: https://issues.apache.org/jira/browse/BEAM-4141
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> There is a deadlock condition in the data channel code that occurs when a 
> user function fails while processing an element. The producer for the data 
> channel is continuing to send information across a channel, but the intended 
> consumer has stopped listening. Unfortunately, this channel blocks the entire 
> data channel, blocking data for any other DoFn that might be running, causing 
> the whole worker to deadlock.

[jira] [Work logged] (BEAM-4141) Data channel deadlocks when user function fails

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?focusedWorklogId=94411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94411
 ]

ASF GitHub Bot logged work on BEAM-4141:


Author: ASF GitHub Bot
Created on: 24/Apr/18 01:14
Start Date: 24/Apr/18 01:14
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5184: 
BEAM-4141: Drain source when user function processing fails.
URL: https://github.com/apache/beam/pull/5184#discussion_r183581015
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/harness/datamgr.go
 ##
 @@ -29,6 +29,15 @@ import (
 )
 
 const chunkSize = int(4e6) // Bytes to put in a single gRPC message. Max is 
slightly higher.
+const bufElements = 20 // Number of chunks buffered per reader.
+
+// This is a reduced version of the full gRPC interface to help with testing.
+// TODO(wcn): need a compile-time assertion to make sure this stays synced 
with what's
 
 Review comment:
   
   If it weren't indirected by the "check in the generated code" step, I'd say 
that line 107 already does it, since it should be checked by the compiler at 
that point. Where makeDataChannel is called.
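The compile-time check being discussed is the standard Go idiom of assigning a typed nil to a blank interface-typed variable. A self-contained sketch follows; the `dataClient` interface here is invented for illustration and is not the real generated gRPC stream interface.

```go
package main

import "fmt"

// dataClient is a reduced stand-in for the gRPC stream interface.
type dataClient interface {
	Recv() (string, error)
	Send(string) error
}

type fakeClient struct{}

func (fakeClient) Recv() (string, error) { return "msg", nil }
func (fakeClient) Send(string) error     { return nil }

// The assertion itself: if fakeClient ever drifts out of sync with the
// interface, compilation fails right here rather than at a distant call
// site. It costs nothing at runtime.
var _ dataClient = (*fakeClient)(nil)

func main() {
	var c dataClient = fakeClient{}
	m, _ := c.Recv()
	fmt.Println(m)
}
```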




Issue Time Tracking
---

Worklog Id: (was: 94411)
Time Spent: 50m  (was: 40m)

> Data channel deadlocks when user function fails
> ---
>
> Key: BEAM-4141
> URL: https://issues.apache.org/jira/browse/BEAM-4141
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There is a deadlock condition in the data channel code that occurs when a 
> user function fails while processing an element. The producer for the data 
> channel is continuing to send information across a channel, but the intended 
> consumer has stopped listening. Unfortunately, this channel blocks the entire 
> data channel, blocking data for any other DoFn that might be running, causing 
> the whole worker to deadlock.





[jira] [Work logged] (BEAM-4141) Data channel deadlocks when user function fails

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?focusedWorklogId=94410&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94410
 ]

ASF GitHub Bot logged work on BEAM-4141:


Author: ASF GitHub Bot
Created on: 24/Apr/18 01:14
Start Date: 24/Apr/18 01:14
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5184: 
BEAM-4141: Drain source when user function processing fails.
URL: https://github.com/apache/beam/pull/5184#discussion_r183581014
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/harness/datamgr_test.go
 ##
 @@ -0,0 +1,89 @@
+package harness
+
+import (
+   "context"
+   "io"
+   "io/ioutil"
+   "log"
+   "testing"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec"
+   pb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1"
+)
+
+type fakeClient struct {
+   t *testing.T
+   done  chan bool
+   calls int
+}
+
+func (f *fakeClient) Recv() (*pb.Elements, error) {
+   f.calls++
+   data := []byte{1, 2, 3, 4}
+   elemData := pb.Elements_Data{
+   InstructionReference: "inst_ref",
+   Data: data,
+   Target: &pb.Target{
+   PrimitiveTransformReference: "ptr",
+   Name: "instruction_name",
+   },
+   }
+
+   msg := pb.Elements{}
+
+   for i := 0; i < bufElements+1; i++ {
+   msg.Data = append(msg.Data, &elemData)
+   }
+
+   // The first two calls fill up the buffer completely to stimulate the 
deadlock
 
 Review comment:
   
   stimulate or simulate?




Issue Time Tracking
---

Worklog Id: (was: 94410)
Time Spent: 40m  (was: 0.5h)

> Data channel deadlocks when user function fails
> ---
>
> Key: BEAM-4141
> URL: https://issues.apache.org/jira/browse/BEAM-4141
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There is a deadlock condition in the data channel code that occurs when a 
> user function fails while processing an element. The producer for the data 
> channel is continuing to send information across a channel, but the intended 
> consumer has stopped listening. Unfortunately, this channel blocks the entire 
> data channel, blocking data for any other DoFn that might be running, causing 
> the whole worker to deadlock.





[jira] [Work logged] (BEAM-4097) Python SDK should set the environment in the job submission protos

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4097?focusedWorklogId=94407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94407
 ]

ASF GitHub Bot logged work on BEAM-4097:


Author: ASF GitHub Bot
Created on: 24/Apr/18 01:05
Start Date: 24/Apr/18 01:05
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on issue #5191: [BEAM-4097] Set 
environment for Python sdk function specs.
URL: https://github.com/apache/beam/pull/5191#issuecomment-383769765
 
 
   FYI, there was another upstream lint issue that was causing python 
precommits to fail. #5205 should fix that. Hopefully we'll start seeing green 
after the typo fix here...




Issue Time Tracking
---

Worklog Id: (was: 94407)
Time Spent: 2h 10m  (was: 2h)

> Python SDK should set the environment in the job submission protos
> --
>
> Key: BEAM-4097
> URL: https://issues.apache.org/jira/browse/BEAM-4097
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4141) Data channel deadlocks when user function fails

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?focusedWorklogId=94406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94406
 ]

ASF GitHub Bot logged work on BEAM-4141:


Author: ASF GitHub Bot
Created on: 24/Apr/18 01:05
Start Date: 24/Apr/18 01:05
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #5184: 
BEAM-4141: Drain source when user function processing fails.
URL: https://github.com/apache/beam/pull/5184#discussion_r183579925
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/harness/datamgr_test.go
 ##
 @@ -0,0 +1,89 @@
+package harness
 
 Review comment:
   Forgot the apache header. The RAT check fails.




Issue Time Tracking
---

Worklog Id: (was: 94406)
Time Spent: 0.5h  (was: 20m)

> Data channel deadlocks when user function fails
> ---
>
> Key: BEAM-4141
> URL: https://issues.apache.org/jira/browse/BEAM-4141
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There is a deadlock condition in the data channel code that occurs when a 
> user function fails while processing an element. The producer for the data 
> channel is continuing to send information across a channel, but the intended 
> consumer has stopped listening. Unfortunately, this channel blocks the entire 
> data channel, blocking data for any other DoFn that might be running, causing 
> the whole worker to deadlock.





[jira] [Work logged] (BEAM-4097) Python SDK should set the environment in the job submission protos

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4097?focusedWorklogId=94408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94408
 ]

ASF GitHub Bot logged work on BEAM-4097:


Author: ASF GitHub Bot
Created on: 24/Apr/18 01:05
Start Date: 24/Apr/18 01:05
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on issue #5191: [BEAM-4097] Set 
environment for Python sdk function specs.
URL: https://github.com/apache/beam/pull/5191#issuecomment-383769765
 
 
   FYI, there was another upstream lint issue that was causing python 
precommits to fail. #5205 should fix that. Hopefully we'll start seeing green 
after the typo fix here...




Issue Time Tracking
---

Worklog Id: (was: 94408)
Time Spent: 2h 20m  (was: 2h 10m)

> Python SDK should set the environment in the job submission protos
> --
>
> Key: BEAM-4097
> URL: https://issues.apache.org/jira/browse/BEAM-4097
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4141) Data channel deadlocks when user function fails

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?focusedWorklogId=94403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94403
 ]

ASF GitHub Bot logged work on BEAM-4141:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:54
Start Date: 24/Apr/18 00:54
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on issue #5184: BEAM-4141: Drain source 
when user function processing fails.
URL: https://github.com/apache/beam/pull/5184#issuecomment-383768148
 
 
   R: @herohde @lostluck 




Issue Time Tracking
---

Worklog Id: (was: 94403)
Time Spent: 20m  (was: 10m)

> Data channel deadlocks when user function fails
> ---
>
> Key: BEAM-4141
> URL: https://issues.apache.org/jira/browse/BEAM-4141
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a deadlock condition in the data channel code that occurs when a 
> user function fails while processing an element. The producer for the data 
> channel is continuing to send information across a channel, but the intended 
> consumer has stopped listening. Unfortunately, this channel blocks the entire 
> data channel, blocking data for any other DoFn that might be running, causing 
> the whole worker to deadlock.





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94402&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94402
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:53
Start Date: 24/Apr/18 00:53
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on issue #5189: [BEAM-3327] Basic 
Docker environment factory
URL: https://github.com/apache/beam/pull/5189#issuecomment-383767932
 
 
   History should be clean now.




Issue Time Tracking
---

Worklog Id: (was: 94402)
Time Spent: 21h 20m  (was: 21h 10m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 21h 20m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-4071) Portable Runner Job API shim

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4071?focusedWorklogId=94401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94401
 ]

ASF GitHub Bot logged work on BEAM-4071:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:52
Start Date: 24/Apr/18 00:52
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on issue #5150:  [BEAM-4071] Add 
Portable Runner Job API shim
URL: https://github.com/apache/beam/pull/5150#issuecomment-383767860
 
 
   Commits should be clean now.




Issue Time Tracking
---

Worklog Id: (was: 94401)
Time Spent: 17.5h  (was: 17h 20m)

> Portable Runner Job API shim
> 
>
> Key: BEAM-4071
> URL: https://issues.apache.org/jira/browse/BEAM-4071
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> There needs to be a way to execute Java-SDK pipelines against a portable job 
> server. The job server itself is expected to be started up out-of-band. The 
> "PortableRunner" should take an option indicating the Job API endpoint and 
> defer other runner configurations to the backend itself.





Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Apex_Gradle #181

2018-04-23 Thread Apache Jenkins Server
See 


--
[...truncated 14.74 MB...]
INFO: Deploy request: 
[OperatorDeployInfo[id=9,name=PAssert$171/GroupGlobally/GatherAllOutputs/GroupByKey,type=GENERIC,checkpoint={,
 0, 
0},inputs=[OperatorDeployInfo.InputDeployInfo[portName=input,streamId=stream2,sourceNodeId=8,sourcePortName=outputPort,locality=,partitionMask=0,partitionKeys=]],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=output,streamId=stream12,bufferServer=localhost
Apr 24, 2018 12:52:25 AM 
com.datatorrent.bufferserver.server.Server$UnidentifiedClient onMessage
INFO: Received publisher request: PublishRequestTuple{version=1.0, 
identifier=2.output.2, windowId=}
Apr 24, 2018 12:52:25 AM 
com.datatorrent.bufferserver.server.Server$UnidentifiedClient onMessage
INFO: Received subscriber request: SubscribeRequestTuple{version=1.0, 
identifier=tcp://localhost:36073/1.output.1, windowId=, 
type=stream7/2.input, upstreamIdentifier=1.output.1, mask=0, partitions=null, 
bufferSize=1024}
Apr 24, 2018 12:52:25 AM com.datatorrent.stram.engine.StreamingContainer 
processHeartbeatResponse
INFO: Deploy request: 
[OperatorDeployInfo[id=14,name=PAssert$171/GroupGlobally/WindowIntoDummy/Window.Assign,type=GENERIC,checkpoint={,
 0, 
0},inputs=[OperatorDeployInfo.InputDeployInfo[portName=inputPort,streamId=stream8,sourceNodeId=13,sourcePortName=output,locality=,partitionMask=0,partitionKeys=]],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=outputPort,streamId=stream9,bufferServer=localhost
Apr 24, 2018 12:52:25 AM 
com.datatorrent.bufferserver.server.Server$UnidentifiedClient onMessage
INFO: Received publisher request: PublishRequestTuple{version=1.0, 
identifier=9.output.9, windowId=}
Apr 24, 2018 12:52:25 AM 
com.datatorrent.bufferserver.server.Server$UnidentifiedClient onMessage
INFO: Received subscriber request: SubscribeRequestTuple{version=1.0, 
identifier=tcp://localhost:36073/8.outputPort.8, windowId=, 
type=stream2/9.input, upstreamIdentifier=8.outputPort.8, mask=0, 
partitions=null, bufferSize=1024}
Apr 24, 2018 12:52:25 AM 
com.datatorrent.bufferserver.server.Server$UnidentifiedClient onMessage
INFO: Received subscriber request: SubscribeRequestTuple{version=1.0, 
identifier=tcp://localhost:36073/13.output.12, windowId=, 
type=stream8/14.inputPort, upstreamIdentifier=13.output.12, mask=0, 
partitions=null, bufferSize=1024}
Apr 24, 2018 12:52:25 AM 
com.datatorrent.bufferserver.server.Server$UnidentifiedClient onMessage
INFO: Received publisher request: PublishRequestTuple{version=1.0, 
identifier=14.outputPort.14, windowId=}
Apr 24, 2018 12:52:25 AM com.datatorrent.stram.engine.Node emitEndStream
INFO: 1 sending EndOfStream
Apr 24, 2018 12:52:25 AM com.datatorrent.stram.engine.Node emitEndStream
INFO: 13 sending EndOfStream
Apr 24, 2018 12:52:25 AM com.datatorrent.stram.engine.StreamingContainer 
processHeartbeatResponse
INFO: Deploy request: 
[OperatorDeployInfo[id=15,name=PAssert$171/GroupGlobally/FlattenDummyAndContents,type=GENERIC,checkpoint={,
 0, 
0},inputs=[OperatorDeployInfo.InputDeployInfo[portName=data1,streamId=stream0,sourceNodeId=12,sourcePortName=output,locality=,partitionMask=0,partitionKeys=],
 
OperatorDeployInfo.InputDeployInfo[portName=data2,streamId=stream9,sourceNodeId=14,sourcePortName=outputPort,locality=,partitionMask=0,partitionKeys=]],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=out,streamId=stream19,bufferServer=localhost
Apr 24, 2018 12:52:25 AM com.datatorrent.stram.engine.StreamingContainer 
processHeartbeatResponse
INFO: Deploy request: 
[OperatorDeployInfo[id=16,name=PAssert$171/GroupGlobally/GroupDummyAndContents,type=GENERIC,checkpoint={,
 0, 
0},inputs=[OperatorDeployInfo.InputDeployInfo[portName=input,streamId=stream19,sourceNodeId=15,sourcePortName=out,locality=,partitionMask=0,partitionKeys=]],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=output,streamId=stream6,bufferServer=localhost
Apr 24, 2018 12:52:25 AM com.datatorrent.stram.engine.StreamingContainer 
processHeartbeatResponse
INFO: Deploy request: 
[OperatorDeployInfo[id=10,name=PAssert$171/GroupGlobally/GatherAllOutputs/Values/Values/Map/ParMultiDo(Anonymous),type=GENERIC,checkpoint={,
 0, 
0},inputs=[OperatorDeployInfo.InputDeployInfo[portName=input,streamId=stream12,sourceNodeId=9,sourcePortName=output,locality=,partitionMask=0,partitionKeys=]],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=output,streamId=stream14,bufferServer=]]],
 
OperatorDeployInfo[id=12,name=PAssert$171/GroupGlobally/KeyForDummy/AddKeys/Map/ParMultiDo(Anonymous),type=OIO,checkpoint={,
 0, 
0},inputs=[Oper

[jira] [Work logged] (BEAM-4097) Python SDK should set the environment in the job submission protos

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4097?focusedWorklogId=94400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94400
 ]

ASF GitHub Bot logged work on BEAM-4097:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:45
Start Date: 24/Apr/18 00:45
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5191: 
[BEAM-4097] Set environment for Python sdk function specs.
URL: https://github.com/apache/beam/pull/5191#discussion_r183577535
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/universal_local_runner.py
 ##
 @@ -83,6 +86,16 @@ def cleanup(self):
   time.sleep(0.1)
 self._subprocess = None
 
+  @staticmethod
+  def default_docker_image():
+if 'USER' in os.environ:
+  # Perhaps also test if this was built?
+  logging.info('Using latest built Python SDK docker image.')
 
 Review comment:
   'latest locally built'?




Issue Time Tracking
---

Worklog Id: (was: 94400)
Time Spent: 2h  (was: 1h 50m)

> Python SDK should set the environment in the job submission protos
> --
>
> Key: BEAM-4097
> URL: https://issues.apache.org/jira/browse/BEAM-4097
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4097) Python SDK should set the environment in the job submission protos

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4097?focusedWorklogId=94397&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94397
 ]

ASF GitHub Bot logged work on BEAM-4097:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:41
Start Date: 24/Apr/18 00:41
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5191: 
[BEAM-4097] Set environment for Python sdk function specs.
URL: https://github.com/apache/beam/pull/5191#discussion_r183577177
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/universal_local_runner.py
 ##
 @@ -83,6 +86,16 @@ def cleanup(self):
   time.sleep(0.1)
 self._subprocess = None
 
+  @staticmethod
+  def default_docker_image():
+if 'USER' in os.environ:
+  # Perhaps also test if this was built?
+  logging.info('Using latest built Python SDK docker image.')
+  return os.environ['USER'] + 
'-docker.apache.bintray.io/beam/python:latest'
+else:
+  logging.warnign('Could not find a Python SDK docker image.')
 
 Review comment:
   Apparently it's tested somewhere because the presubmits are failing due to 
the `warnign` module not existing.




Issue Time Tracking
---

Worklog Id: (was: 94397)
Time Spent: 1h 50m  (was: 1h 40m)

> Python SDK should set the environment in the job submission protos
> --
>
> Key: BEAM-4097
> URL: https://issues.apache.org/jira/browse/BEAM-4097
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94396
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:39
Start Date: 24/Apr/18 00:39
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183573933
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ControlClientPool.java
 ##
 @@ -17,16 +17,58 @@
  */
 package org.apache.beam.runners.fnexecution.control;
 
-import org.apache.beam.sdk.fn.function.ThrowingConsumer;
-import org.apache.beam.sdk.util.ThrowingSupplier;
+import javax.annotation.concurrent.ThreadSafe;
 
-/** Control client pool that exposes a source and sink of control clients. */
-public interface ControlClientPool {
+/**
+ * A pool of control clients that brokers incoming SDK harness connections (in 
the form of {@link
+ * InstructionRequestHandler InstructionRequestHandlers}).
+ *
+ * Incoming instruction handlers usually come from the control plane gRPC 
service. Typical use:
+ *
+ * 
+ *   // Within owner of the pool, who may or may not own the control plane 
server as well
+ *   ControlClientPool pool = ...
+ *   FnApiControlClientPoolService service =
+ *   FnApiControlClientPoolService.offeringClientsToSink(pool.getSink(), 
headerAccessor)
+ *   // Incoming gRPC control connections will now be added to the client pool.
+ *
+ *   // Within code that interacts with the instruction handler. The get call 
blocks until an
+ *   // incoming client is available:
+ *   ControlClientSource clientSource = ...
+ *   InstructionRequestHandler instructionHandler = clientSource.get("worker-id");
+ * 
+ *
+ * All {@link ControlClientPool} must be thread-safe.
+ */
+@ThreadSafe
+public interface ControlClientPool {
+
+  /** Sink for control clients. */
+  Sink getSink();
 
   /** Source of control clients. */
-  ThrowingSupplier getSource();
+  Source getSource();
 
-  /** Sink for control clients. */
-  ThrowingConsumer getSink();
+  /** A sink for {@link InstructionRequestHandler InstructionRequestHandlers} 
keyed by worker id. */
+  @FunctionalInterface
+  interface Sink {
+
+/**
+ * Puts an {@link InstructionRequestHandler} into a client pool. Worker 
ids must be unique per
+ * pool.
+ */
+void put(String workerId, InstructionRequestHandler instructionHandler) 
throws Exception;
+  }
+
+  /** A source of {@link InstructionRequestHandler 
InstructionRequestHandlers}. */
+  @FunctionalInterface
+  interface Source {
 
+/**
+ * Retrieves the {@link InstructionRequestHandler} for the given worker 
id, blocking until
+ * available. Worker ids must be unique per pool. A given worker id must 
not be requested
 
 Review comment:
   Yes, the call will never return if this worker is never made available or is 
never explicitly failed. I'll add an explicit comment and rename the method.
   
   I ended up calling this `take`.
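The blocking `take` contract described here can be sketched in Go; the actual `ControlClientPool` is Java, so this map-of-one-shot-channels version is only an illustration of the semantics: `put` never blocks, `take` blocks until the worker's handler arrives, and each worker id is used at most once. A plain string stands in for `InstructionRequestHandler`.

```go
package main

import (
	"fmt"
	"sync"
)

type pool struct {
	mu    sync.Mutex
	slots map[string]chan string // one slot per worker id
}

func newPool() *pool { return &pool{slots: make(map[string]chan string)} }

// slot lazily creates the channel for a worker id, whichever of put or
// take arrives first.
func (p *pool) slot(id string) chan string {
	p.mu.Lock()
	defer p.mu.Unlock()
	c, ok := p.slots[id]
	if !ok {
		c = make(chan string, 1) // buffered so put never blocks
		p.slots[id] = c
	}
	return c
}

// put registers a handler; worker ids must be unique per pool.
func (p *pool) put(id, handler string) { p.slot(id) <- handler }

// take blocks until a handler for id arrives. A given id must be
// taken at most once.
func (p *pool) take(id string) string { return <-p.slot(id) }

func main() {
	p := newPool()
	go p.put("worker-1", "handler-A") // control plane delivers asynchronously
	fmt.Println(p.take("worker-1"))
}
```

Note that, exactly as in the review discussion, `take` hangs forever if no handler is ever delivered for that id, which is why a timeout or explicit failure path matters in practice.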




Issue Time Tracking
---

Worklog Id: (was: 94396)
Time Spent: 21h 10m  (was: 21h)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 21h 10m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





Build failed in Jenkins: beam_PostCommit_Python_Verify #4770

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Use Existing Matchers in WatermarkManagerTest

--
[...truncated 1.06 MB...]
test_type_check_violation_valid_simple_type 
(apache_beam.typehints.typehints_test.IterableHintTestCase) ... ok
test_enforce_kv_type_constraint 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_param_must_be_tuple 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_param_must_have_length_2 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_proxy_to_tuple 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_enforce_list_type_constraint_invalid_composite_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_invalid_simple_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_valid_composite_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_valid_simple_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_getitem_invalid_composite_type_param 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_list_constraint_compatibility 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_list_repr (apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_getitem_proxy_to_union 
(apache_beam.typehints.typehints_test.OptionalHintTestCase) ... ok
test_getitem_sequence_not_allowed 
(apache_beam.typehints.typehints_test.OptionalHintTestCase) ... ok
test_any_return_type_hint 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_must_be_primitive_type_or_type_constraint 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_must_be_single_return_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_no_kwargs_accepted 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_composite_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_simple_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_violation 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_compatibility (apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_getitem_invalid_composite_type_param 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_repr (apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_invalid_elem_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_must_be_set 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_valid_elem_composite_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_valid_elem_simple_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_any_argument_type_hint 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_basic_type_assertion 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_composite_type_assertion 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_invalid_only_positional_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_must_be_primitive_type_or_constraint 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_mix_positional_and_keyword_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_only_positional_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_simple_type_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_functions_as_regular_generator 
(apache_beam.typehints.typehints_test.TestGeneratorWrapper) ... ok
test_compatibility (apache_beam.typehints.typehints_test.TupleHintTestCase) ... 
ok
test_compatibility_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_getitem_invalid_ellipsis_type_param 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_getitem_params_must_be_type_or_constraint 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_raw_tuple (apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_repr (apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_composite_type 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_composite_type_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_simple_type_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_simple_types 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... 

Build failed in Jenkins: beam_PerformanceTests_Spark #1627

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[geet.kumar75] BEAM-4038: Support Kafka Headers in KafkaIO

[geet.kumar75] BEAM-4038: Update license information

[geet.kumar75] Update code formatting

[geet.kumar75] Remove custom implementations of KafkaHeader, KafkaHeaders, etc.

[geet.kumar75] Changes based on review comments

[daniel.o.programmer] [BEAM-3513] Removing PrimitiveCombineGroupedValues 
override w/ FnAPI.

[echauchot] Introduce MetricsPusher in runner core to regularly push aggregated

[echauchot] Instanciate MetricsPusher in runner-specific code because we need

[echauchot] Improve MetricsPusher: do not aggregate metrics when not needed, 
improve

[echauchot] Create JsonMetricsSerializer

[echauchot] Stop MetricsPusher thread by observing pipeline state and improve 
the

[echauchot] Make metrics sink configurable through PipelineOptions, pass

[echauchot] Add MetricsPusher tests specific to Spark (because Spark streaming 
tests

[echauchot] Add a MetricksPusher test to runner-core (batch and streaming are 
done

[echauchot] Push metrics at the end of a batch pipeline in flink runner

[echauchot] improve MetricsPusher lifecycle and thread safety

[echauchot] Make MetricsPusher merge a list a MetricsContainerStepMaps because 
there

[echauchot] Fix thread synchronisation and replace usages of instance variable 
by

[echauchot] Clear dummyMetricsSink before test

[echauchot] Push metrics at the end of a batch pipeline in spark runner

[echauchot] Improve MetricsPusher teardown to enable multiple pipelines in a 
single

[echauchot] Manually generate json and remove jackson

[echauchot] Replace use of http client by use of java.net.HttpUrlConnection and 
deal

[echauchot] Remove DEFAULT_PERIOD constant in favor of already existing

[echauchot] Remove unneeded null check, format

[echauchot] convert MetricsSink to an interface with a single writeMetrics 
method

[echauchot] Remove MetricsSerializer base class and inline serialization in

[echauchot] Dynamically create the sinks by reflection

[echauchot] Split DummyMetricsSink into NoOpMetricsSink (default sink) and

[echauchot] Reduce overhead when no metricsSink is provided, do not start 
polling

[echauchot] Make MetricsPusher a regular object instead of a singleton to allow

[echauchot] Explicitely start MetricsPusher from the runners

[echauchot] Separate MetricsHttpSink POC to a new runners-extensions artifact 
and

[echauchot] Fix cycle bug between teardown() and pushmetrics()

[echauchot] Update MetricsPusher and TestMetricsSink to new serializable

[echauchot] Use regular jackson object mapper to serialize metrics now that 
they are

[echauchot] Give MetricsPusher a bit of time to push before assert in test

[echauchot] Make MetricsPusher thread a daemon

[echauchot] Fix build and clean: dependencies, rat, checkstyle, findbugs, remove

[geet.kumar75] Support kafka versions 0.10.1.0 and above

[echauchot] Move build to gradle

[echauchot] MetricsSink no more needs to be generic

[echauchot] SparkRunnerDebugger does not need to export metrics as it does not 
run

[tgroh] Use Existing Matchers in WatermarkManagerTest

[echauchot] Move MetricsHttpSink and related classes to a new sub-module

[kirpichov] Consistently handle EmptyMatchTreatment

[rangadi] Add 10 millis sleep when there are no elements left in a partition.

--
[...truncated 69.74 KB...]
2018-04-24 00:17:50,212 50988614 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-24 00:18:15,592 50988614 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-24 00:18:19,180 50988614 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r7575b9ff5d12d2be_0162f503e5e3_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: Upload complete.Waiting on bqjob_r7575b9ff5d12d2be_0162f503e5e3_1 
... (0s) Current status: RUNNING
  Waiting on 
bqjob_r7575b9ff5d12d2be_0162f503e5e3_1 ... (0s) Current status: DONE   
2018-04-24 00:18:19,180 50988614 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-24 00:18:49,107 50988614 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 
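The failure above comes from `bq load --autodetect` inferring the `timestamp` field as FLOAT while the existing table column is TIMESTAMP. One way to avoid the flapping is to drop `--autodetect` and pass an explicit schema file; this is a sketch, and the fields other than `timestamp` are hypothetical, not the actual `pkb_results` schema:

```json
[
  {"name": "timestamp", "type": "TIMESTAMP", "mode": "NULLABLE"},
  {"name": "value", "type": "FLOAT", "mode": "NULLABLE"},
  {"name": "metric", "type": "STRING", "mode": "NULLABLE"}
]
```

The load would then be invoked as `bq load --source_format=NEWLINE_DELIMITED_JSON --schema=schema.json beam_performance.pkb_results <data file>`, so the column types are pinned regardless of what autodetection would infer from any given batch.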


Jenkins build is back to normal : beam_PerformanceTests_HadoopInputFormat #178

2018-04-23 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-4163) Fix SELECT * for Pojo queries

2018-04-23 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4163:
-

 Summary: Fix SELECT * for Pojo queries
 Key: BEAM-4163
 URL: https://issues.apache.org/jira/browse/BEAM-4163
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Anton Kedin


Rows generated from Pojos are keyed by field index, which means they can 
break if the Pojo's fields are enumerated in a different order. That can make 
the generated Row differ between runner instances, which in turn can cause 
SELECT * to fail.

 

One solution is to make Pojo field ordering deterministic, e.g. by sorting the 
fields before generating field accessors.
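The proposed fix can be sketched as follows, assuming name-based sorting is an acceptable deterministic order; `FieldOrder` is a hypothetical helper for illustration, not Beam's actual schema inference code:

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class FieldOrder {
  /** Example POJO whose declared-field order we must not depend on. */
  static class Sample {
    public int b;
    public int a;
  }

  /**
   * Returns field names in a deterministic (sorted) order.
   * Class#getDeclaredFields makes no ordering guarantee, so sorting by
   * name yields the same Row schema on every runner instance.
   */
  static List<String> deterministicFieldNames(Class<?> pojoClass) {
    return Arrays.stream(pojoClass.getDeclaredFields())
        .map(Field::getName)
        .sorted(Comparator.naturalOrder())
        .collect(Collectors.toList());
  }
}
```

Field accessors generated from this sorted list would then line up index-for-index across JVMs, which is the property SELECT * needs.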

 



--


Jenkins build is back to normal : beam_PerformanceTests_JDBC #489

2018-04-23 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_XmlIOIT_HDFS #85

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[geet.kumar75] BEAM-4038: Support Kafka Headers in KafkaIO

[geet.kumar75] BEAM-4038: Update license information

[geet.kumar75] Update code formatting

[geet.kumar75] Remove custom implementations of KafkaHeader, KafkaHeaders, etc.

[geet.kumar75] Changes based on review comments

[daniel.o.programmer] [BEAM-3513] Removing PrimitiveCombineGroupedValues 
override w/ FnAPI.

[echauchot] Introduce MetricsPusher in runner core to regularly push aggregated

[echauchot] Instanciate MetricsPusher in runner-specific code because we need

[echauchot] Improve MetricsPusher: do not aggregate metrics when not needed, 
improve

[echauchot] Create JsonMetricsSerializer

[echauchot] Stop MetricsPusher thread by observing pipeline state and improve 
the

[echauchot] Make metrics sink configurable through PipelineOptions, pass

[echauchot] Add MetricsPusher tests specific to Spark (because Spark streaming 
tests

[echauchot] Add a MetricksPusher test to runner-core (batch and streaming are 
done

[echauchot] Push metrics at the end of a batch pipeline in flink runner

[echauchot] improve MetricsPusher lifecycle and thread safety

[echauchot] Make MetricsPusher merge a list a MetricsContainerStepMaps because 
there

[echauchot] Fix thread synchronisation and replace usages of instance variable 
by

[echauchot] Clear dummyMetricsSink before test

[echauchot] Push metrics at the end of a batch pipeline in spark runner

[echauchot] Improve MetricsPusher teardown to enable multiple pipelines in a 
single

[echauchot] Manually generate json and remove jackson

[echauchot] Replace use of http client by use of java.net.HttpUrlConnection and 
deal

[echauchot] Remove DEFAULT_PERIOD constant in favor of already existing

[echauchot] Remove unneeded null check, format

[echauchot] convert MetricsSink to an interface with a single writeMetrics 
method

[echauchot] Remove MetricsSerializer base class and inline serialization in

[echauchot] Dynamically create the sinks by reflection

[echauchot] Split DummyMetricsSink into NoOpMetricsSink (default sink) and

[echauchot] Reduce overhead when no metricsSink is provided, do not start 
polling

[echauchot] Make MetricsPusher a regular object instead of a singleton to allow

[echauchot] Explicitely start MetricsPusher from the runners

[echauchot] Separate MetricsHttpSink POC to a new runners-extensions artifact 
and

[echauchot] Fix cycle bug between teardown() and pushmetrics()

[echauchot] Update MetricsPusher and TestMetricsSink to new serializable

[echauchot] Use regular jackson object mapper to serialize metrics now that 
they are

[echauchot] Give MetricsPusher a bit of time to push before assert in test

[echauchot] Make MetricsPusher thread a daemon

[echauchot] Fix build and clean: dependencies, rat, checkstyle, findbugs, remove

[geet.kumar75] Support kafka versions 0.10.1.0 and above

[echauchot] Move build to gradle

[echauchot] MetricsSink no more needs to be generic

[echauchot] SparkRunnerDebugger does not need to export metrics as it does not 
run

[tgroh] Use Existing Matchers in WatermarkManagerTest

[echauchot] Move MetricsHttpSink and related classes to a new sub-module

[kirpichov] Consistently handle EmptyMatchTreatment

[rangadi] Add 10 millis sleep when there are no elements left in a partition.

--
[...truncated 93.52 KB...]
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2447)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:623)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy61.create(Unknown Source)

Build failed in Jenkins: beam_PerformanceTests_TextIOIT_HDFS #92

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[geet.kumar75] BEAM-4038: Support Kafka Headers in KafkaIO

[geet.kumar75] BEAM-4038: Update license information

[geet.kumar75] Update code formatting

[geet.kumar75] Remove custom implementations of KafkaHeader, KafkaHeaders, etc.

[geet.kumar75] Changes based on review comments

[daniel.o.programmer] [BEAM-3513] Removing PrimitiveCombineGroupedValues 
override w/ FnAPI.

[echauchot] Introduce MetricsPusher in runner core to regularly push aggregated

[echauchot] Instanciate MetricsPusher in runner-specific code because we need

[echauchot] Improve MetricsPusher: do not aggregate metrics when not needed, 
improve

[echauchot] Create JsonMetricsSerializer

[echauchot] Stop MetricsPusher thread by observing pipeline state and improve 
the

[echauchot] Make metrics sink configurable through PipelineOptions, pass

[echauchot] Add MetricsPusher tests specific to Spark (because Spark streaming 
tests

[echauchot] Add a MetricksPusher test to runner-core (batch and streaming are 
done

[echauchot] Push metrics at the end of a batch pipeline in flink runner

[echauchot] improve MetricsPusher lifecycle and thread safety

[echauchot] Make MetricsPusher merge a list a MetricsContainerStepMaps because 
there

[echauchot] Fix thread synchronisation and replace usages of instance variable 
by

[echauchot] Clear dummyMetricsSink before test

[echauchot] Push metrics at the end of a batch pipeline in spark runner

[echauchot] Improve MetricsPusher teardown to enable multiple pipelines in a 
single

[echauchot] Manually generate json and remove jackson

[echauchot] Replace use of http client by use of java.net.HttpUrlConnection and 
deal

[echauchot] Remove DEFAULT_PERIOD constant in favor of already existing

[echauchot] Remove unneeded null check, format

[echauchot] convert MetricsSink to an interface with a single writeMetrics 
method

[echauchot] Remove MetricsSerializer base class and inline serialization in

[echauchot] Dynamically create the sinks by reflection

[echauchot] Split DummyMetricsSink into NoOpMetricsSink (default sink) and

[echauchot] Reduce overhead when no metricsSink is provided, do not start 
polling

[echauchot] Make MetricsPusher a regular object instead of a singleton to allow

[echauchot] Explicitely start MetricsPusher from the runners

[echauchot] Separate MetricsHttpSink POC to a new runners-extensions artifact 
and

[echauchot] Fix cycle bug between teardown() and pushmetrics()

[echauchot] Update MetricsPusher and TestMetricsSink to new serializable

[echauchot] Use regular jackson object mapper to serialize metrics now that 
they are

[echauchot] Give MetricsPusher a bit of time to push before assert in test

[echauchot] Make MetricsPusher thread a daemon

[echauchot] Fix build and clean: dependencies, rat, checkstyle, findbugs, remove

[geet.kumar75] Support kafka versions 0.10.1.0 and above

[echauchot] Move build to gradle

[echauchot] MetricsSink no more needs to be generic

[echauchot] SparkRunnerDebugger does not need to export metrics as it does not 
run

[tgroh] Use Existing Matchers in WatermarkManagerTest

[echauchot] Move MetricsHttpSink and related classes to a new sub-module

[kirpichov] Consistently handle EmptyMatchTreatment

[rangadi] Add 10 millis sleep when there are no elements left in a partition.

--
[...truncated 229.48 KB...]
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:109)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:68)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:248)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:235)
at 
org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:923)
at 
org.apache.beam.sdk.io.WriteFiles$WriteUnshardedTempFilesWithSpillingFn.processElement(WriteFiles.java:503)
Caused by: java.net.ConnectException: Connection refused
   

[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94385&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94385
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:13
Start Date: 24/Apr/18 00:13
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183564671
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import com.google.common.collect.ImmutableList;
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.regex.Pattern;
+import java.util.stream.Collectors;
+
+/** A docker command wrapper. Simplifies communications with the Docker 
daemon. */
+class DockerCommand {
 
 Review comment:
   The class is meant to wrap a single Docker executable and be used in a 
non-static manner, so I think the singular makes sense. I can see how the 
plural could work, but it may be confusing in this context. 
(Note: This changed from `DockerWrapper` to `DockerCommand` based on 
https://github.com/apache/beam/pull/5189#discussion_r18319).




Issue Time Tracking
---

Worklog Id: (was: 94385)
Time Spent: 20.5h  (was: 20h 20m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 20.5h
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments



--


[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94387
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:13
Start Date: 24/Apr/18 00:13
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183567148
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.TimeoutException;
+import java.util.function.Supplier;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientPool;
+import 
org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * An {@link EnvironmentFactory} that creates docker containers by shelling 
out to docker. Returned
+ * {@link RemoteEnvironment RemoteEnvironments} own their respective docker 
containers. Not
+ * thread-safe.
+ */
+public class DockerEnvironmentFactory implements EnvironmentFactory {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(DockerEnvironmentFactory.class);
+
+  public static DockerEnvironmentFactory forServices(
+  DockerCommand docker,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  ControlClientPool.Source clientSource,
+  // TODO: Refine this to IdGenerator when we determine where that should 
live.
+  Supplier idGenerator) {
+return new DockerEnvironmentFactory(
+docker,
+controlServiceServer,
+loggingServiceServer,
+retrievalServiceServer,
+provisioningServiceServer,
+idGenerator,
+clientSource);
+  }
+
+  private final DockerCommand docker;
+  private final GrpcFnServer 
controlServiceServer;
+  private final GrpcFnServer loggingServiceServer;
+  private final GrpcFnServer retrievalServiceServer;
+  private final GrpcFnServer 
provisioningServiceServer;
+  private final Supplier idGenerator;
+  private final ControlClientPool.Source clientSource;
+
+  private DockerEnvironmentFactory(
+  DockerCommand docker,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  Supplier idGenerator,
+  ControlClientPool.Source clientSource) {
+this.docker = docker;
+this.controlServiceServer = controlServiceServer;
+this.loggingServiceServer = loggingServiceServer;
+this.retrievalServiceServer = retrievalServiceServer;
+this.provisioningServiceServer = provisioningServiceServer;
+this.idGenerator = idGenerator;
+this.clientSource = clientSource;
+  }
+
+  /** Creates a new, active {@link RemoteEnvironment} backed by a local Docker 
container. */
+  @Override
+  public RemoteEnvironment createEnvironment(Environment environment) throws 
Exception {
+String workerId = idGenerator.get();
+
+// Prepare docker invocation.
+Path workerPersistentDirectory = 
Files.createTempDirectory("worker_persistent_directory");
+Path semiPersistentD

[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94384
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:13
Start Date: 24/Apr/18 00:13
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183564726
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import com.google.common.collect.ImmutableList;
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.regex.Pattern;
+import java.util.stream.Collectors;
+
+/** A docker command wrapper. Simplifies communications with the Docker 
daemon. */
+class DockerCommand {
+  // TODO: Should we require 64-character container ids? Docker technically 
allows abbreviated ids,
+  // but we _should_ always capture full ids.
+  private static final Pattern CONTAINER_ID_PATTERN = 
Pattern.compile("\\p{XDigit}{64}");
+
+  static DockerCommand forCommand(String dockerExecutable, Duration 
commandTimeout) {
 
 Review comment:
   Done.




Issue Time Tracking
---

Worklog Id: (was: 94384)
Time Spent: 20h 20m  (was: 20h 10m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments



--


[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94389
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:13
Start Date: 24/Apr/18 00:13
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183565840
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerContainerEnvironment.java
 ##
 @@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+
+/**
+ * A {@link RemoteEnvironment} that wraps a running Docker container.
+ *
+ * A {@link DockerContainerEnvironment} owns both the underlying docker container that it
+ * communicates with and the {@link InstructionRequestHandler} that it uses to do so.
+ */
+@ThreadSafe
+class DockerContainerEnvironment implements RemoteEnvironment {
+
+  static DockerContainerEnvironment create(
+  DockerCommand docker,
+  Environment environment,
+  String containerId,
+  InstructionRequestHandler instructionHandler) {
+    return new DockerContainerEnvironment(docker, environment, containerId, instructionHandler);
+  }
+
+  private final Object lock = new Object();
+  private final DockerCommand docker;
+  private final Environment environment;
+  private final String containerId;
+  private final InstructionRequestHandler instructionHandler;
+
+  private DockerContainerEnvironment(
+  DockerCommand docker,
+  Environment environment,
+  String containerId,
+  InstructionRequestHandler instructionHandler) {
+    this.docker = docker;
+    this.environment = environment;
+    this.containerId = containerId;
+    this.instructionHandler = instructionHandler;
+  }
+
+  @Override
+  public Environment getEnvironment() {
+    return environment;
+  }
+
+  @Override
+  public InstructionRequestHandler getInstructionRequestHandler() {
+    return instructionHandler;
+  }
+
+  /**
+   * Closes this remote docker environment. The associated {@link InstructionRequestHandler} should
+   * not be used after calling this.
+   */
+  @Override
+  public void close() throws Exception {
+    synchronized (lock) {
 
 Review comment:
   This was a result of 
https://github.com/apache/beam/pull/5189#discussion_r183470809. The net result 
is that it's generally unsafe for multiple threads to close a given docker 
environment. I was hesitant to provide synchronization here to emphasize the 
fact that there should only be a single owner, but @tgroh believed it was 
easier to understand if all methods were thread-safe.
   
   I'll make it idempotent and add an internal and javadoc comment.
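The idempotent close discussed above could look roughly like the following. This is a hypothetical sketch, not the actual PR code: `IdempotentCloseSketch` and its counter stand in for the real environment and the `docker kill` side effect, and an `AtomicBoolean` guard makes only the first invocation do anything.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch: a close() that is safe to call from multiple threads and runs at most once. */
class IdempotentCloseSketch implements AutoCloseable {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  private int killCount = 0; // stands in for the "docker kill" side effect

  @Override
  public void close() {
    // compareAndSet guarantees exactly one caller performs the kill; later calls
    // (which would otherwise make "docker kill" exit non-zero) are no-ops.
    if (closed.compareAndSet(false, true)) {
      killCount++;
    }
  }

  int killCount() {
    return killCount;
  }

  public static void main(String[] args) {
    IdempotentCloseSketch env = new IdempotentCloseSketch();
    env.close();
    env.close(); // second close is a no-op
    System.out.println(env.killCount()); // prints 1
  }
}
```

The same compare-and-set guard also removes the need to hold a lock for the whole close, though coarse synchronization works equally well for a single owner.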




Issue Time Tracking
---

Worklog Id: (was: 94389)
Time Spent: 21h  (was: 20h 50m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 21h
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments




[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94386&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94386
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:13
Start Date: 24/Apr/18 00:13
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183568223
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/DockerCommandTest.java
 ##
 @@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.Collections;
+import org.apache.beam.runners.fnexecution.environment.testing.NeedsDocker;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link DockerCommand}. */
+@Category(NeedsDocker.class)
+@RunWith(JUnit4.class)
+public class DockerCommandTest {
+
+  @Test
+  public void helloWorld() throws Exception {
+    DockerCommand docker = getWrapper();
+    String container = docker.runImage("hello-world", Collections.emptyList());
+    System.out.printf("Started container: %s%n", container);
+  }
+
+  @Test
+  public void killContainer() throws Exception {
+    DockerCommand docker = getWrapper();
+    String container = docker.runImage("debian", Arrays.asList("/bin/bash", "-c", "sleep 60"));
+    docker.killContainer(container);
 
 Review comment:
   Done. I used a manual StopWatch rather than a method-level timeout annotation to make sure we're timing the kill command and not startup time (which may include image download, etc.). I've increased the docker command timeout itself to allow this to run a bit longer. Not sure if it's appropriate to have such a long timeout, though.
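The manual-timing approach described above can be sketched as follows. This is a hypothetical standalone example (class and method names are invented; `slowSetup` stands in for container/image startup and `operationUnderTest` for the kill command): only the operation itself is timed, so startup cost cannot trip the assertion the way a method-level `@Test(timeout=...)` would.

```java
/** Sketch: time only the operation under test, excluding setup, instead of a method timeout. */
class ManualTimingSketch {
  static void slowSetup() throws InterruptedException {
    Thread.sleep(50); // stands in for container/image startup; deliberately excluded from timing
  }

  static void operationUnderTest() {
    // stands in for docker.killContainer(...)
  }

  public static void main(String[] args) throws Exception {
    slowSetup(); // not timed
    long start = System.nanoTime();
    operationUnderTest();
    long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
    if (elapsedMillis > 1000) {
      throw new AssertionError("kill took too long: " + elapsedMillis + "ms");
    }
    System.out.println("ok"); // prints ok
  }
}
```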




Issue Time Tracking
---

Worklog Id: (was: 94386)
Time Spent: 20h 40m  (was: 20.5h)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 20h 40m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94383
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:13
Start Date: 24/Apr/18 00:13
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183566150
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerContainerEnvironment.java
 ##
 @@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+
+/**
+ * A {@link RemoteEnvironment} that wraps a running Docker container.
+ *
+ * A {@link DockerContainerEnvironment} owns both the underlying docker container that it
+ * communicates with and the {@link InstructionRequestHandler} that it uses to do so.
+ */
+@ThreadSafe
+class DockerContainerEnvironment implements RemoteEnvironment {
+
+  static DockerContainerEnvironment create(
+  DockerCommand docker,
+  Environment environment,
+  String containerId,
+  InstructionRequestHandler instructionHandler) {
+    return new DockerContainerEnvironment(docker, environment, containerId, instructionHandler);
+  }
+
+  private final Object lock = new Object();
+  private final DockerCommand docker;
+  private final Environment environment;
+  private final String containerId;
+  private final InstructionRequestHandler instructionHandler;
+
+  private DockerContainerEnvironment(
+  DockerCommand docker,
+  Environment environment,
+  String containerId,
+  InstructionRequestHandler instructionHandler) {
+    this.docker = docker;
+    this.environment = environment;
+    this.containerId = containerId;
+    this.instructionHandler = instructionHandler;
+  }
+
+  @Override
+  public Environment getEnvironment() {
+    return environment;
+  }
+
+  @Override
+  public InstructionRequestHandler getInstructionRequestHandler() {
+    return instructionHandler;
+  }
+
+  /**
+   * Closes this remote docker environment. The associated {@link InstructionRequestHandler} should
+   * not be used after calling this.
+   */
+  @Override
+  public void close() throws Exception {
+    synchronized (lock) {
 
 Review comment:
   Side note: in general, it is safe to call `docker kill` multiple times, but 
subsequent calls will throw exceptions due to non-zero exit codes. I'm changing 
it such that only the first invocation does anything.




Issue Time Tracking
---

Worklog Id: (was: 94383)
Time Spent: 20h 20m  (was: 20h 10m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94388
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 24/Apr/18 00:13
Start Date: 24/Apr/18 00:13
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183571040
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/DockerCommandTest.java
 ##
 @@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.Collections;
+import org.apache.beam.runners.fnexecution.environment.testing.NeedsDocker;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link DockerCommand}. */
+@Category(NeedsDocker.class)
+@RunWith(JUnit4.class)
+public class DockerCommandTest {
+
+  @Test
+  public void helloWorld() throws Exception {
+    DockerCommand docker = getWrapper();
+    String container = docker.runImage("hello-world", Collections.emptyList());
+    System.out.printf("Started container: %s%n", container);
+  }
+
+  @Test
+  public void killContainer() throws Exception {
+    DockerCommand docker = getWrapper();
+    String container = docker.runImage("debian", Arrays.asList("/bin/bash", "-c", "sleep 60"));
+    docker.killContainer(container);
+  }
+
+  private static DockerCommand getWrapper() {
+    return DockerCommand.forCommand("docker", Duration.ofMillis(1));
+  }
 
 Review comment:
   Done.




Issue Time Tracking
---

Worklog Id: (was: 94388)
Time Spent: 20h 50m  (was: 20h 40m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





Jenkins build is back to normal : beam_PerformanceTests_AvroIOIT #414

2018-04-23 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_MongoDBIO_IT #87

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[geet.kumar75] BEAM-4038: Support Kafka Headers in KafkaIO

[geet.kumar75] BEAM-4038: Update license information

[geet.kumar75] Update code formatting

[geet.kumar75] Remove custom implementations of KafkaHeader, KafkaHeaders, etc.

[geet.kumar75] Changes based on review comments

[daniel.o.programmer] [BEAM-3513] Removing PrimitiveCombineGroupedValues 
override w/ FnAPI.

[echauchot] Introduce MetricsPusher in runner core to regularly push aggregated

[echauchot] Instanciate MetricsPusher in runner-specific code because we need

[echauchot] Improve MetricsPusher: do not aggregate metrics when not needed, 
improve

[echauchot] Create JsonMetricsSerializer

[echauchot] Stop MetricsPusher thread by observing pipeline state and improve 
the

[echauchot] Make metrics sink configurable through PipelineOptions, pass

[echauchot] Add MetricsPusher tests specific to Spark (because Spark streaming 
tests

[echauchot] Add a MetricksPusher test to runner-core (batch and streaming are 
done

[echauchot] Push metrics at the end of a batch pipeline in flink runner

[echauchot] improve MetricsPusher lifecycle and thread safety

[echauchot] Make MetricsPusher merge a list a MetricsContainerStepMaps because 
there

[echauchot] Fix thread synchronisation and replace usages of instance variable 
by

[echauchot] Clear dummyMetricsSink before test

[echauchot] Push metrics at the end of a batch pipeline in spark runner

[echauchot] Improve MetricsPusher teardown to enable multiple pipelines in a 
single

[echauchot] Manually generate json and remove jackson

[echauchot] Replace use of http client by use of java.net.HttpUrlConnection and 
deal

[echauchot] Remove DEFAULT_PERIOD constant in favor of already existing

[echauchot] Remove unneeded null check, format

[echauchot] convert MetricsSink to an interface with a single writeMetrics 
method

[echauchot] Remove MetricsSerializer base class and inline serialization in

[echauchot] Dynamically create the sinks by reflection

[echauchot] Split DummyMetricsSink into NoOpMetricsSink (default sink) and

[echauchot] Reduce overhead when no metricsSink is provided, do not start 
polling

[echauchot] Make MetricsPusher a regular object instead of a singleton to allow

[echauchot] Explicitely start MetricsPusher from the runners

[echauchot] Separate MetricsHttpSink POC to a new runners-extensions artifact 
and

[echauchot] Fix cycle bug between teardown() and pushmetrics()

[echauchot] Update MetricsPusher and TestMetricsSink to new serializable

[echauchot] Use regular jackson object mapper to serialize metrics now that 
they are

[echauchot] Give MetricsPusher a bit of time to push before assert in test

[echauchot] Make MetricsPusher thread a daemon

[echauchot] Fix build and clean: dependencies, rat, checkstyle, findbugs, remove

[geet.kumar75] Support kafka versions 0.10.1.0 and above

[echauchot] Move build to gradle

[echauchot] MetricsSink no more needs to be generic

[echauchot] SparkRunnerDebugger does not need to export metrics as it does not 
run

[tgroh] Use Existing Matchers in WatermarkManagerTest

[echauchot] Move MetricsHttpSink and related classes to a new sub-module

[kirpichov] Consistently handle EmptyMatchTreatment

[rangadi] Add 10 millis sleep when there are no elements left in a partition.

--
[...truncated 1.03 MB...]
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:323)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:311)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.flush(MongoDbIO.java:667)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.processElement(MongoDbIO.java:652)
com.mongodb.MongoTimeoutException: Timed out after 3 ms while waiting for a 
server that matches WritableServerSelector. Client view of cluster state is 
{type=UNKNOWN, servers=[{address=104.197.104.146:27017, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception 
opening socket}, caused by {java.net.SocketTimeoutException: connect timed 
out}}]
at 
com.mongodb.connection.BaseCluster.createTimeoutException(BaseCluster.java:369)
at com.mongodb.connection.BaseCluster.selectServer(BaseCluster.java:101)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:75)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:71)
at 
com.mongodb.binding.ClusterBinding.getWriteConnectionSource(ClusterBinding.java:68)
at 
com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:219)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java

[jira] [Updated] (BEAM-4161) Nested Rows flattening doesn't work

2018-04-23 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4161:
--
Issue Type: Bug  (was: New Feature)

> Nested Rows flattening doesn't work
> ---
>
> Key: BEAM-4161
> URL: https://issues.apache.org/jira/browse/BEAM-4161
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Calcite flattens nested rows. It updates the field indices of the flattened 
> row so the fields are referenced correctly in the Rel Nodes. But the fields 
> after the flattened row don't have the indices updated, they have the 
> previous ordinals before the flattening. There is no way to look up the 
> correct index at the point when it reaches Beam SQL Rel Nodes. It will be 
> fixed in Calcite 1.17.
> We need to update the Calcite as soon as it is released and add few 
> integration tests around nested Rows:
>  - basic nesting with fields before and after the row field;
>  - multi-level row nesting;
>  - multiple row fields;
>  
> Calcite JIRA: CALCITE-2220





[jira] [Updated] (BEAM-4162) Wire up PubsubIO+JSON to Beam SQL

2018-04-23 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4162:
--
Description: 
Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to 
Beam SQL.

 

Use publication time as event timestamp

  was:Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up 
to Beam SQL


> Wire up PubsubIO+JSON to Beam SQL
> -
>
> Key: BEAM-4162
> URL: https://issues.apache.org/jira/browse/BEAM-4162
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to 
> Beam SQL.
>  
> Use publication time as event timestamp





Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Spark_Gradle #187

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Use Existing Matchers in WatermarkManagerTest

--
[...truncated 523.19 KB...]
Gradle Test Executor 236 started executing tests.
Gradle Test Executor 236 finished executing tests.
Starting process 'Gradle Test Executor 237'. Working directory: 

 Command: /usr/local/asfpackages/java/jdk1.8.0_152/bin/java 
-Dbeam.spark.test.reuseSparkContext=true 
-DbeamTestPipelineOptions=["--runner=TestSparkRunner","--streaming=false","--enableSparkMetricSinks=false"]
 
-Djava.security.manager=worker.org.gradle.process.internal.worker.child.BootstrapSecurityManager
 -Dorg.gradle.native=false -Dspark.ui.enabled=false 
-Dspark.ui.showConsoleProgress=false -Dfile.encoding=UTF-8 -Duser.country=US 
-Duser.language=en -Duser.variant -ea -cp 
/home/jenkins/.gradle/caches/4.6/workerMain/gradle-worker.jar 
worker.org.gradle.process.internal.worker.GradleWorkerMain 'Gradle Test 
Executor 237'
Successfully started process 'Gradle Test Executor 237'
Gradle Test Executor 237 started executing tests.
Gradle Test Executor 237 finished executing tests.
Starting process 'Gradle Test Executor 238'. Working directory: 

 Command: /usr/local/asfpackages/java/jdk1.8.0_152/bin/java 
-Dbeam.spark.test.reuseSparkContext=true 
-DbeamTestPipelineOptions=["--runner=TestSparkRunner","--streaming=false","--enableSparkMetricSinks=false"]
 
-Djava.security.manager=worker.org.gradle.process.internal.worker.child.BootstrapSecurityManager
 -Dorg.gradle.native=false -Dspark.ui.enabled=false 
-Dspark.ui.showConsoleProgress=false -Dfile.encoding=UTF-8 -Duser.country=US 
-Duser.language=en -Duser.variant -ea -cp 
/home/jenkins/.gradle/caches/4.6/workerMain/gradle-worker.jar 
worker.org.gradle.process.internal.worker.GradleWorkerMain 'Gradle Test 
Executor 238'
Successfully started process 'Gradle Test Executor 238'
Gradle Test Executor 238 started executing tests.
Gradle Test Executor 238 finished executing tests.
Starting process 'Gradle Test Executor 239'. Working directory: 

 Command: /usr/local/asfpackages/java/jdk1.8.0_152/bin/java 
-Dbeam.spark.test.reuseSparkContext=true 
-DbeamTestPipelineOptions=["--runner=TestSparkRunner","--streaming=false","--enableSparkMetricSinks=false"]
 
-Djava.security.manager=worker.org.gradle.process.internal.worker.child.BootstrapSecurityManager
 -Dorg.gradle.native=false -Dspark.ui.enabled=false 
-Dspark.ui.showConsoleProgress=false -Dfile.encoding=UTF-8 -Duser.country=US 
-Duser.language=en -Duser.variant -ea -cp 
/home/jenkins/.gradle/caches/4.6/workerMain/gradle-worker.jar 
worker.org.gradle.process.internal.worker.GradleWorkerMain 'Gradle Test 
Executor 239'
Successfully started process 'Gradle Test Executor 239'
Gradle Test Executor 239 started executing tests.
Gradle Test Executor 239 finished executing tests.
Starting process 'Gradle Test Executor 240'. Working directory: 

 Command: /usr/local/asfpackages/java/jdk1.8.0_152/bin/java 
-Dbeam.spark.test.reuseSparkContext=true 
-DbeamTestPipelineOptions=["--runner=TestSparkRunner","--streaming=false","--enableSparkMetricSinks=false"]
 
-Djava.security.manager=worker.org.gradle.process.internal.worker.child.BootstrapSecurityManager
 -Dorg.gradle.native=false -Dspark.ui.enabled=false 
-Dspark.ui.showConsoleProgress=false -Dfile.encoding=UTF-8 -Duser.country=US 
-Duser.language=en -Duser.variant -ea -cp 
/home/jenkins/.gradle/caches/4.6/workerMain/gradle-worker.jar 
worker.org.gradle.process.internal.worker.GradleWorkerMain 'Gradle Test 
Executor 240'
Successfully started process 'Gradle Test Executor 240'
Gradle Test Executor 240 started executing tests.
Gradle Test Executor 240 finished executing tests.
Starting process 'Gradle Test Executor 241'. Working directory: 

 Command: /usr/local/asfpackages/java/jdk1.8.0_152/bin/java 
-Dbeam.spark.test.reuseSparkContext=true 
-DbeamTestPipelineOptions=["--runner=TestSparkRunner","--streaming=false","--enableSparkMetricSinks=false"]
 
-Djava.security.manager=worker.org.gradle.process.internal.worker.child.BootstrapSecurityManager
 -Dorg.gradle.native=false -Dspark.ui.enabled=false 
-Dspark.ui.showConsoleProgress=false -Dfile.encoding=UTF-8 -Duser.country=US 
-Duser.language=en -Duser.variant -ea -cp 
/home/jenkins/.gradle/caches/4.6/workerMain/gradle-worker.jar 
worker.org.gradle.process.internal.worker.GradleWorkerMain 'Gradle Test 
Executor 241'
Successfully started process 'Gradle Test Executor 241'

Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Flink_Gradle #208

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Use Existing Matchers in WatermarkManagerTest

--
[...truncated 99.43 MB...]
Apr 24, 2018 12:05:19 AM org.apache.flink.runtime.taskmanager.Task run
INFO: Ensuring all FileSystem streams are closed for task 
View.AsSingleton/Combine.GloballyAsSingletonView/View.CreatePCollectionView/Combine.globally(Concatenate)/Combine.perKey(Concatenate)
 -> 
View.AsSingleton/Combine.GloballyAsSingletonView/View.CreatePCollectionView/Combine.globally(Concatenate)/Values/Values/Map/ParMultiDo(Anonymous)
 -> (Map, Map) (1/1) (89d6b16107dcde928321886e1b0e6430) [FINISHED]
Apr 24, 2018 12:05:19 AM grizzled.slf4j.Logger info
INFO: Un-registering task and sending final execution state FINISHED to 
JobManager for task PAssert$291/GroupGlobally/GroupDummyAndContents -> 
PAssert$291/GroupGlobally/Values/Values/Map/ParMultiDo(Anonymous) -> 
PAssert$291/GroupGlobally/ParDo(Concat)/ParMultiDo(Concat) -> 
PAssert$291/GetPane/Map/ParMultiDo(Anonymous) -> 
PAssert$291/RunChecks/ParMultiDo(GroupedValuesChecker) -> 
PAssert$291/VerifyAssertions/ParDo(DefaultConclude)/ParMultiDo(DefaultConclude) 
(79730c6bdd21b511d181618629a87cd8)
Apr 24, 2018 12:05:19 AM org.apache.flink.runtime.taskmanager.Task 
transitionState
INFO: 
Combine.globally(TestCombineFnWithContext)/Combine.perKey(TestCombineFnWithContext)/Combine.GroupedValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
Combine.globally(TestCombineFnWithContext)/Values/Values/Map/ParMultiDo(Anonymous)
 -> PAssert$293/GroupGlobally/Window.Into()/Window.Assign.out -> 
PAssert$293/GroupGlobally/GatherAllOutputs/Reify.Window/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
PAssert$293/GroupGlobally/GatherAllOutputs/WithKeys/AddKeys/Map/ParMultiDo(Anonymous)
 -> PAssert$293/GroupGlobally/GatherAllOutputs/Window.Into()/Window.Assign.out 
-> ToKeyedWorkItem (1/1) (62ef92a0802202a1b4df811e62bf7a9d) switched from 
RUNNING to FINISHED.
Apr 24, 2018 12:05:19 AM org.apache.flink.runtime.taskmanager.Task run
INFO: Freeing task resources for 
Combine.globally(TestCombineFnWithContext)/Combine.perKey(TestCombineFnWithContext)/Combine.GroupedValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
Combine.globally(TestCombineFnWithContext)/Values/Values/Map/ParMultiDo(Anonymous)
 -> PAssert$293/GroupGlobally/Window.Into()/Window.Assign.out -> 
PAssert$293/GroupGlobally/GatherAllOutputs/Reify.Window/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
PAssert$293/GroupGlobally/GatherAllOutputs/WithKeys/AddKeys/Map/ParMultiDo(Anonymous)
 -> PAssert$293/GroupGlobally/GatherAllOutputs/Window.Into()/Window.Assign.out 
-> ToKeyedWorkItem (1/1) (62ef92a0802202a1b4df811e62bf7a9d).
Apr 24, 2018 12:05:19 AM org.apache.flink.runtime.taskmanager.Task run
INFO: Ensuring all FileSystem streams are closed for task 
Combine.globally(TestCombineFnWithContext)/Combine.perKey(TestCombineFnWithContext)/Combine.GroupedValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
Combine.globally(TestCombineFnWithContext)/Values/Values/Map/ParMultiDo(Anonymous)
 -> PAssert$293/GroupGlobally/Window.Into()/Window.Assign.out -> 
PAssert$293/GroupGlobally/GatherAllOutputs/Reify.Window/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
PAssert$293/GroupGlobally/GatherAllOutputs/WithKeys/AddKeys/Map/ParMultiDo(Anonymous)
 -> PAssert$293/GroupGlobally/GatherAllOutputs/Window.Into()/Window.Assign.out 
-> ToKeyedWorkItem (1/1) (62ef92a0802202a1b4df811e62bf7a9d) [FINISHED]
Apr 24, 2018 12:05:19 AM org.apache.flink.runtime.taskmanager.Task 
transitionState
INFO: 
Combine.perKey(TestCombineFnWithContext)/Combine.GroupedValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> PAssert$292/GroupGlobally/Window.Into()/Window.Assign.out -> 
PAssert$292/GroupGlobally/GatherAllOutputs/Reify.Window/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
PAssert$292/GroupGlobally/GatherAllOutputs/WithKeys/AddKeys/Map/ParMultiDo(Anonymous)
 -> PAssert$292/GroupGlobally/GatherAllOutputs/Window.Into()/Window.Assign.out 
-> ToKeyedWorkItem (1/1) (bafcc738b0f5f4ab76efea5f4f998fc4) switched from 
RUNNING to FINISHED.
Apr 24, 2018 12:05:19 AM org.apache.flink.runtime.taskmanager.Task run
INFO: Freeing task resources for 
Combine.perKey(TestCombineFnWithContext)/Combine.GroupedValues/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> PAssert$292/GroupGlobally/Window.Into()/Window.Assign.out -> 
PAssert$292/GroupGlobally/GatherAllOutputs/Reify.Window/ParDo(Anonymous)/ParMultiDo(Anonymous)
 -> 
PAssert$292/GroupGlobally/GatherAllOutputs/WithKeys/AddKeys/Map/ParMultiDo(Anonymous)
 -> PAssert$292/GroupGlobally/GatherAllOutputs/Window.Into()/Window.Assign.out 
-> ToKeyedWorkItem (1/1) (bafcc738b0f5f4ab76efea5f4f998fc4).
Apr 24, 2018 12:05:19 AM org.apache.flink.runtime.taskmanager.Task run
INFO: Ensuring all FileSystem streams are closed for task 
Combine.perKey(TestCombineFnWithContext)/

[jira] [Created] (BEAM-4162) Wire up PubsubIO+JSON to Beam SQL

2018-04-23 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4162:
-

 Summary: Wire up PubsubIO+JSON to Beam SQL
 Key: BEAM-4162
 URL: https://issues.apache.org/jira/browse/BEAM-4162
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin
Assignee: Anton Kedin


Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to 
Beam SQL



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PerformanceTests_Python #1185

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[geet.kumar75] BEAM-4038: Support Kafka Headers in KafkaIO

[geet.kumar75] BEAM-4038: Update license information

[geet.kumar75] Update code formatting

[geet.kumar75] Remove custom implementations of KafkaHeader, KafkaHeaders, etc.

[geet.kumar75] Changes based on review comments

[daniel.o.programmer] [BEAM-3513] Removing PrimitiveCombineGroupedValues 
override w/ FnAPI.

[echauchot] Introduce MetricsPusher in runner core to regularly push aggregated

[echauchot] Instanciate MetricsPusher in runner-specific code because we need

[echauchot] Improve MetricsPusher: do not aggregate metrics when not needed, 
improve

[echauchot] Create JsonMetricsSerializer

[echauchot] Stop MetricsPusher thread by observing pipeline state and improve 
the

[echauchot] Make metrics sink configurable through PipelineOptions, pass

[echauchot] Add MetricsPusher tests specific to Spark (because Spark streaming 
tests

[echauchot] Add a MetricksPusher test to runner-core (batch and streaming are 
done

[echauchot] Push metrics at the end of a batch pipeline in flink runner

[echauchot] improve MetricsPusher lifecycle and thread safety

[echauchot] Make MetricsPusher merge a list a MetricsContainerStepMaps because 
there

[echauchot] Fix thread synchronisation and replace usages of instance variable 
by

[echauchot] Clear dummyMetricsSink before test

[echauchot] Push metrics at the end of a batch pipeline in spark runner

[echauchot] Improve MetricsPusher teardown to enable multiple pipelines in a 
single

[echauchot] Manually generate json and remove jackson

[echauchot] Replace use of http client by use of java.net.HttpUrlConnection and 
deal

[echauchot] Remove DEFAULT_PERIOD constant in favor of already existing

[echauchot] Remove unneeded null check, format

[echauchot] convert MetricsSink to an interface with a single writeMetrics 
method

[echauchot] Remove MetricsSerializer base class and inline serialization in

[echauchot] Dynamically create the sinks by reflection

[echauchot] Split DummyMetricsSink into NoOpMetricsSink (default sink) and

[echauchot] Reduce overhead when no metricsSink is provided, do not start 
polling

[echauchot] Make MetricsPusher a regular object instead of a singleton to allow

[echauchot] Explicitely start MetricsPusher from the runners

[echauchot] Separate MetricsHttpSink POC to a new runners-extensions artifact 
and

[echauchot] Fix cycle bug between teardown() and pushmetrics()

[echauchot] Update MetricsPusher and TestMetricsSink to new serializable

[echauchot] Use regular jackson object mapper to serialize metrics now that 
they are

[echauchot] Give MetricsPusher a bit of time to push before assert in test

[echauchot] Make MetricsPusher thread a daemon

[echauchot] Fix build and clean: dependencies, rat, checkstyle, findbugs, remove

[geet.kumar75] Support kafka versions 0.10.1.0 and above

[echauchot] Move build to gradle

[echauchot] MetricsSink no more needs to be generic

[echauchot] SparkRunnerDebugger does not need to export metrics as it does not 
run

[tgroh] Use Existing Matchers in WatermarkManagerTest

[echauchot] Move MetricsHttpSink and related classes to a new sub-module

[kirpichov] Consistently handle EmptyMatchTreatment

[rangadi] Add 10 millis sleep when there are no elements left in a partition.
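
The MetricsPusher commits listed above describe a recurring pattern: aggregate metrics, push them periodically from a daemon thread to a pluggable sink, and push once more when the pipeline finishes. A hedged sketch of that pattern (all class and method names here are hypothetical, not Beam's actual MetricsPusher/MetricsSink API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetricsPusherSketch {
  // Pluggable sink, as in the "Make metrics sink configurable" commit.
  interface MetricsSink { void writeMetrics(Map<String, Long> metrics); }

  private final Map<String, Long> metrics = new HashMap<>();
  private final MetricsSink sink;

  MetricsPusherSketch(MetricsSink sink) { this.sink = sink; }

  synchronized void record(String name, long delta) {
    metrics.merge(name, delta, Long::sum); // aggregate before pushing
  }

  synchronized void pushOnce() {
    sink.writeMetrics(new HashMap<>(metrics)); // push a snapshot
  }

  // Periodic pushing runs on a daemon thread so it never blocks JVM exit.
  void startPolling(long periodMillis) {
    Thread poller = new Thread(() -> {
      while (true) {
        pushOnce();
        try { Thread.sleep(periodMillis); } catch (InterruptedException e) { return; }
      }
    });
    poller.setDaemon(true);
    poller.start();
  }

  public static void main(String[] args) {
    List<Map<String, Long>> pushed = new ArrayList<>();
    MetricsPusherSketch pusher = new MetricsPusherSketch(pushed::add);
    pusher.record("elementsRead", 10);
    pusher.record("elementsRead", 5);
    pusher.pushOnce(); // final push, as at the end of a batch pipeline
    System.out.println(pushed.get(0).get("elementsRead")); // 15
  }
}
```

The final `pushOnce()` call mirrors the "push metrics at the end of a batch pipeline" commits: a last synchronous push guarantees nothing aggregated after the previous poll is lost.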

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam3 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 0f2ba71e1b6db88ed79744e363586a8ff16dcb08 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0f2ba71e1b6db88ed79744e363586a8ff16dcb08
Commit message: "Merge pull request #5195: Use Existing Matchers in 
WatermarkManagerTest"
 > git rev-list --no-walk 247a62ff1d4368f1e7c2ade6bed5dec71d8d2bcc # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins5809426347260041593.sh
+ rm -rf PerfKitBenchmarker
[beam

[jira] [Comment Edited] (BEAM-4160) Convert JSON objects to Rows

2018-04-23 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449058#comment-16449058
 ] 

Anton Kedin edited comment on BEAM-4160 at 4/23/18 11:58 PM:
-

PR: [https://github.com/apache/beam/pull/5120]

Caveat: Calcite has a bug which prevents querying complex Rows: BEAM-4161. It 
is getting fixed in Calcite 1.17 (we're at 1.16 (latest) at the moment).


was (Author: kedin):
PR: [https://github.com/apache/beam/pull/5120]

Caveat: Calcite has a bug which prevents querying complex Rows: BEAM-4161. It 
is getting fixed in Calcite 1.17 (we're at 1.16 (latest) at the moment).

> Convert JSON objects to Rows
> 
>
> Key: BEAM-4160
> URL: https://issues.apache.org/jira/browse/BEAM-4160
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, sdk-java-core
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Automate conversion of JSON objects to Rows to reduce overhead for querying 
> JSON-based sources





[jira] [Updated] (BEAM-4161) Nested Rows flattening doesn't work

2018-04-23 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4161:
--
Description: 
Calcite flattens nested rows. It updates the field indices of the flattened row 
so the fields are referenced correctly in the Rel Nodes. But the fields after 
the flattened row don't have their indices updated; they keep the ordinals they 
had before the flattening. There is no way to look up the correct index by the 
time the plan reaches the Beam SQL Rel Nodes. This will be fixed in Calcite 1.17.

We need to update Calcite as soon as it is released and add a few integration 
tests around nested Rows:

 - basic nesting with fields before and after the row field;

 - multi-level row nesting;

 - multiple row fields;

 

Calcite JIRA: CALCITE-2220

  was:
Calcite flattens nested rows. It updates the field indices of the flattened row 
so the fields are referenced correctly in the Rel Nodes. But the fields after 
the flattened row don't have the indices updated, they have the previous 
ordinals before the flattening. There is no way to look up the correct index at 
the point when it reaches Beam SQL Rel Nodes. It will be fixed in Calcite 1.17.

We need to update the Calcite as soon as it is released and add few integration 
tests around nested Rows:

 - basic nesting with fields before and after the row field;

 - multi-level row nesting;

 - multiple row fields;


> Nested Rows flattening doesn't work
> ---
>
> Key: BEAM-4161
> URL: https://issues.apache.org/jira/browse/BEAM-4161
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Calcite flattens nested rows. It updates the field indices of the flattened 
> row so the fields are referenced correctly in the Rel Nodes. But the fields 
> after the flattened row don't have their indices updated; they keep the 
> ordinals they had before the flattening. There is no way to look up the 
> correct index by the time the plan reaches the Beam SQL Rel Nodes. This will 
> be fixed in Calcite 1.17.
> We need to update Calcite as soon as it is released and add a few 
> integration tests around nested Rows:
>  - basic nesting with fields before and after the row field;
>  - multi-level row nesting;
>  - multiple row fields;
>  
> Calcite JIRA: CALCITE-2220



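
The index bookkeeping described in this report can be sketched independently of Calcite. A minimal, hypothetical illustration (the class and method are mine, not Calcite's or Beam's): when a nested row of `nestedSize` fields is flattened in place, every field that came after it must shift right by `nestedSize - 1`; CALCITE-2220 is precisely the case where that shift is missed.

```java
public class FlattenIndexDemo {
  // Maps an original field index to its index in the flattened row, given a
  // nested row at position nestedIndex containing nestedSize fields.
  static int flattenedIndex(int originalIndex, int nestedIndex, int nestedSize) {
    if (originalIndex <= nestedIndex) {
      return originalIndex; // fields up to the row start keep their index
    }
    return originalIndex + nestedSize - 1; // fields after the row shift right
  }

  public static void main(String[] args) {
    // Schema (a, row(x, y, z), b) flattens to (a, x, y, z, b):
    // 'b' was at index 2 and must end up at index 4.
    System.out.println(flattenedIndex(2, 1, 3)); // 4
  }
}
```

The bug is equivalent to using `originalIndex` unshifted for fields after the nested row, which would make `b` resolve to `y` in the flattened schema.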


[jira] [Commented] (BEAM-4160) Convert JSON objects to Rows

2018-04-23 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449058#comment-16449058
 ] 

Anton Kedin commented on BEAM-4160:
---

PR: [https://github.com/apache/beam/pull/5120]

Caveat: Calcite has a bug which prevents querying complex Rows: BEAM-4161. It 
is getting fixed in Calcite 1.17 (we're at 1.16 (latest) at the moment).

> Convert JSON objects to Rows
> 
>
> Key: BEAM-4160
> URL: https://issues.apache.org/jira/browse/BEAM-4160
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, sdk-java-core
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Automate conversion of JSON objects to Rows to reduce overhead for querying 
> JSON-based sources



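
As a rough illustration of the conversion BEAM-4160 asks for — deliberately independent of the actual Beam Row/Schema API, which was still evolving at this time — the core step is reordering a parsed JSON object into the schema's declared field order. A hedged, stdlib-only sketch (all names are hypothetical; a `Map` stands in for the JSON parser's output):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class JsonToRowSketch {
  // Produces the ordered value list a Row would carry for the given schema.
  // Missing fields become null; extra JSON fields are ignored.
  static List<Object> toRowValues(List<String> schemaFields, Map<String, Object> json) {
    return schemaFields.stream().map(json::get).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, Object> json = new LinkedHashMap<>();
    json.put("name", "beam");
    json.put("count", 42);
    // Schema order ("count", "name") wins over JSON key order.
    List<Object> row = toRowValues(Arrays.asList("count", "name"), json);
    System.out.println(row); // [42, beam]
  }
}
```

A real implementation would additionally validate types against the schema and recurse into nested objects, which is where the flattening issue tracked in BEAM-4161 comes into play.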


[jira] [Updated] (BEAM-4160) Convert JSON objects to Rows

2018-04-23 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4160:
--
Description: Automate conversion of JSON objects to Rows to reduce overhead 
for querying JSON-based sources

> Convert JSON objects to Rows
> 
>
> Key: BEAM-4160
> URL: https://issues.apache.org/jira/browse/BEAM-4160
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, sdk-java-core
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Automate conversion of JSON objects to Rows to reduce overhead for querying 
> JSON-based sources





[jira] [Created] (BEAM-4161) Nested Rows flattening doesn't work

2018-04-23 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4161:
-

 Summary: Nested Rows flattening doesn't work
 Key: BEAM-4161
 URL: https://issues.apache.org/jira/browse/BEAM-4161
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin
Assignee: Anton Kedin


Calcite flattens nested rows. It updates the field indices of the flattened row 
so the fields are referenced correctly in the Rel Nodes. But the fields after 
the flattened row don't have their indices updated; they keep the ordinals they 
had before the flattening. There is no way to look up the correct index by the 
time the plan reaches the Beam SQL Rel Nodes. This will be fixed in Calcite 1.17.

We need to update Calcite as soon as it is released and add a few integration 
tests around nested Rows:

 - basic nesting with fields before and after the row field;

 - multi-level row nesting;

 - multiple row fields;

 - basic nesting with fields before and after the row field;

 - multi-level row nesting;

 - multiple row fields;





[beam] branch master updated (229e69d -> 0f2ba71)

2018-04-23 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 229e69d  Merge pull request #5185: [BEAM-4086]: KafkaIO tests: Avoid 
busy loop in MockConsumer.poll(), reduce flakes.
 add d9cd312  Use Existing Matchers in WatermarkManagerTest
 add 0f2ba71  Merge pull request #5195: Use Existing Matchers in 
WatermarkManagerTest

No new revisions were added by this update.

Summary of changes:
 .../beam/runners/direct/WatermarkManagerTest.java  | 189 +
 1 file changed, 85 insertions(+), 104 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
tg...@apache.org.


[jira] [Created] (BEAM-4160) Convert JSON objects to Rows

2018-04-23 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4160:
-

 Summary: Convert JSON objects to Rows
 Key: BEAM-4160
 URL: https://issues.apache.org/jira/browse/BEAM-4160
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql, sdk-java-core
Reporter: Anton Kedin
Assignee: Anton Kedin








[jira] [Work logged] (BEAM-4129) Update Gearpump runner instructions for running WordCount example

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4129?focusedWorklogId=94376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94376
 ]

ASF GitHub Bot logged work on BEAM-4129:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:39
Start Date: 23/Apr/18 23:39
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #5200: [BEAM-4129] Run 
WordCount example on Gearpump runner with Gradle
URL: https://github.com/apache/beam/pull/5200#issuecomment-383756174
 
 
   Nice! Thanks for adding this.
   
   LGTM; @kennknowles to merge


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94376)
Time Spent: 0.5h  (was: 20m)

> Update Gearpump runner instructions for running WordCount example
> -
>
> Key: BEAM-4129
> URL: https://issues.apache.org/jira/browse/BEAM-4129
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-gearpump
>Reporter: Scott Wegner
>Assignee: Manu Zhang
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With the move to Gradle, the Gearpump README.md needs to be updated with 
> instructions on how to execute WordCount with the new build system.
> [https://github.com/apache/beam/blob/master/runners/gearpump/README.md]
>  
> It may be easiest to generate a quickstart task similar to other runners: 
> https://github.com/apache/beam/blob/3cdecd0e3e28dd39990d7d1dea8931e871a6ba62/runners/flink/build.gradle#L135





[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=94369&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94369
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:25
Start Date: 23/Apr/18 23:25
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5175: 
[BEAM-4018] Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#discussion_r183566303
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTrackerTest.java
 ##
 @@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static 
org.apache.beam.sdk.transforms.splittabledofn.ByteKeyRangeTracker.next;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import org.apache.beam.sdk.io.range.ByteKey;
+import org.apache.beam.sdk.io.range.ByteKeyRange;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link ByteKeyRangeTrackerTest}. */
+@RunWith(JUnit4.class)
+public class ByteKeyRangeTrackerTest {
+  @Rule public final ExpectedException expected = ExpectedException.none();
+
+  @Test
+  public void testTryClaim() throws Exception {
+ByteKeyRange range = ByteKeyRange.of(ByteKey.of(0x10), ByteKey.of(0xc0));
+ByteKeyRangeTracker tracker = ByteKeyRangeTracker.of(range);
+assertEquals(range, tracker.currentRestriction());
+assertTrue(tracker.tryClaim(ByteKey.of(0x10)));
+assertTrue(tracker.tryClaim(ByteKey.of(0x10, 0x00)));
+assertTrue(tracker.tryClaim(ByteKey.of(0x10, 0x00, 0x00)));
+assertTrue(tracker.tryClaim(ByteKey.of(0x50)));
+assertTrue(tracker.tryClaim(ByteKey.of(0x99)));
+assertFalse(tracker.tryClaim(ByteKey.of(0xc0)));
+  }
+
+  @Test
+  public void testCheckpointUnstarted() throws Exception {
+ByteKeyRangeTracker tracker =
+ByteKeyRangeTracker.of(ByteKeyRange.of(ByteKey.of(0x10), 
ByteKey.of(0xc0)));
+expected.expect(IllegalStateException.class);
+tracker.checkpoint();
+  }
+
+  @Test
+  public void testCheckpointOnlyFailedClaim() throws Exception {
+ByteKeyRangeTracker tracker =
+ByteKeyRangeTracker.of(ByteKeyRange.of(ByteKey.of(0x10), 
ByteKey.of(0xc0)));
+assertFalse(tracker.tryClaim(ByteKey.of(0xd0)));
+expected.expect(IllegalStateException.class);
+tracker.checkpoint();
+  }
+
+  @Test
+  public void testCheckpointJustStarted() throws Exception {
+ByteKeyRangeTracker tracker =
+ByteKeyRangeTracker.of(ByteKeyRange.of(ByteKey.of(0x10), 
ByteKey.of(0xc0)));
+assertTrue(tracker.tryClaim(ByteKey.of(0x10)));
+ByteKeyRange checkpoint = tracker.checkpoint();
+assertEquals(ByteKeyRange.of(ByteKey.of(0x10), ByteKey.of(0x11)), 
tracker.currentRestriction());
+assertEquals(ByteKeyRange.of(ByteKey.of(0x11), ByteKey.of(0xc0)), 
checkpoint);
+  }
+
+  @Test
+  public void testCheckpointRegular() throws Exception {
+ByteKeyRangeTracker tracker =
+ByteKeyRangeTracker.of(ByteKeyRange.of(ByteKey.of(0x10), 
ByteKey.of(0xc0)));
+assertTrue(tracker.tryClaim(ByteKey.of(0x50)));
+assertTrue(tracker.tryClaim(ByteKey.of(0x90)));
+ByteKeyRange checkpoint = tracker.checkpoint();
+assertEquals(ByteKeyRange.of(ByteKey.of(0x10), ByteKey.of(0x91)), 
tracker.currentRestriction());
+assertEquals(ByteKeyRange.of(ByteKey.of(0x91), ByteKey.of(0xc0)), 
checkpoint);
+  }
+
+  @Test
+  public void testCheckpointClaimedLast() throws Exception {
+ByteKeyRangeTracker tracker =
+ByteKeyRangeTracker.of(ByteKeyRange.of(ByteKey.of(0x10), 
ByteKey.of(0xc0)));
+assertTrue(tracker.tryClaim(ByteKey.of(0x50)));
+assertTrue(tracker.tryClaim(ByteKey.of(0x90)));
+assertTrue(tracker.tryClaim(ByteKey.of(0xbf)));
+ByteKeyRange checkpoint = tracker.checkpoint();
+  

[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=94366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94366
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:25
Start Date: 23/Apr/18 23:25
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5175: 
[BEAM-4018] Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#discussion_r183565970
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
 ##
 @@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.MoreObjects;
+import java.util.Arrays;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.io.range.ByteKey;
+import org.apache.beam.sdk.io.range.ByteKeyRange;
+import org.apache.beam.sdk.transforms.DoFn;
+
+/**
+ * A {@link RestrictionTracker} for claiming {@link ByteKey}s in a {@link 
ByteKeyRange} in a
+ * monotonically increasing fashion. The range is a semi-open bounded interval 
[startKey, endKey)
+ * where the limits are both represented by ByteKey.EMPTY.
+ */
+public class ByteKeyRangeTracker extends RestrictionTracker {
+  private ByteKeyRange range;
+  @Nullable private ByteKey lastClaimedKey = null;
+  @Nullable private ByteKey lastAttemptedKey = null;
+
+  private ByteKeyRangeTracker(ByteKeyRange range) {
+this.range = checkNotNull(range);
+  }
+
+  public static ByteKeyRangeTracker of(ByteKeyRange range) {
+return new ByteKeyRangeTracker(ByteKeyRange.of(range.getStartKey(), 
range.getEndKey()));
+  }
+
+  @Override
+  public synchronized ByteKeyRange currentRestriction() {
+return range;
+  }
+
+  @Override
+  public synchronized ByteKeyRange checkpoint() {
+checkState(lastClaimedKey != null, "Can't checkpoint before any key was 
successfully claimed");
+ByteKeyRange res = ByteKeyRange.of(next(lastClaimedKey), 
range.getEndKey());
+this.range = ByteKeyRange.of(range.getStartKey(), next(lastClaimedKey));
+return res;
+  }
+
+  /**
+   * Attempts to claim the given key.
+   *
+   * Must be larger than the last successfully claimed key.
+   *
+   * @return {@code true} if the key was successfully claimed, {@code false} 
if it is outside the
+   * current {@link ByteKeyRange} of this tracker (in that case this 
operation is a no-op).
+   */
+  @Override
+  protected synchronized boolean tryClaimImpl(ByteKey key) {
+checkArgument(
+lastAttemptedKey == null || key.compareTo(lastAttemptedKey) > 0,
+"Trying to claim key %s while last attempted was %s",
+key,
+lastAttemptedKey);
+checkArgument(
+key.compareTo(range.getStartKey()) > -1,
+"Trying to claim key %s before start of the range %s",
+key,
+range);
+lastAttemptedKey = key;
+// No respective checkArgument for i < range.to() - it's ok to try 
claiming keys beyond
+if (!range.getEndKey().isEmpty() && key.compareTo(range.getEndKey()) > -1) 
{
+  return false;
+}
+lastClaimedKey = key;
+return true;
+  }
+
+  /**
+   * Marks that there are no more keys to be claimed in the range.
+   *
+   * E.g., a {@link DoFn} reading a file and claiming the key of each 
record in the file might
+   * call this if it hits EOF - even though the last attempted claim was 
before the end of the
+   * range, there are no more keys to claim.
+   */
+  public synchronized void markDone() {
+lastAttemptedKey = range.getEndKey();
+  }
+
+  @Override
+  public synchronized void checkDone() throws IllegalStateException {
+checkState(
+next(lastAttemptedKey).compareTo(range.getEndKey()) > -1,
+"Last attemp

[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=94368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94368
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:25
Start Date: 23/Apr/18 23:25
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5175: 
[BEAM-4018] Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#discussion_r183565569
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
 ##
 @@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.MoreObjects;
+import java.util.Arrays;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.io.range.ByteKey;
+import org.apache.beam.sdk.io.range.ByteKeyRange;
+import org.apache.beam.sdk.transforms.DoFn;
+
+/**
+ * A {@link RestrictionTracker} for claiming {@link ByteKey}s in a {@link 
ByteKeyRange} in a
+ * monotonically increasing fashion. The range is a semi-open bounded interval 
[startKey, endKey)
+ * where the limits are both represented by ByteKey.EMPTY.
+ */
+public class ByteKeyRangeTracker extends RestrictionTracker {
+  private ByteKeyRange range;
+  @Nullable private ByteKey lastClaimedKey = null;
+  @Nullable private ByteKey lastAttemptedKey = null;
+
+  private ByteKeyRangeTracker(ByteKeyRange range) {
+this.range = checkNotNull(range);
+  }
+
+  public static ByteKeyRangeTracker of(ByteKeyRange range) {
+return new ByteKeyRangeTracker(ByteKeyRange.of(range.getStartKey(), 
range.getEndKey()));
+  }
+
+  @Override
+  public synchronized ByteKeyRange currentRestriction() {
+return range;
+  }
+
+  @Override
+  public synchronized ByteKeyRange checkpoint() {
+checkState(lastClaimedKey != null, "Can't checkpoint before any key was 
successfully claimed");
+ByteKeyRange res = ByteKeyRange.of(next(lastClaimedKey), 
range.getEndKey());
 
 Review comment:
   Extract next(..) into a variable?




Issue Time Tracking
---

Worklog Id: (was: 94368)
Time Spent: 6h 10m  (was: 6h)

> Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
> -
>
> Key: BEAM-4018
> URL: https://issues.apache.org/jira/browse/BEAM-4018
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> We can have a RestrictionTracker for ByteKey ranges as part of the core sdk 
> so it can be reused by future SDF based IOs like Bigtable, HBase among others.



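
The claim/checkpoint protocol under review above can be sketched with a simplified tracker over `long` keys instead of byte keys (an illustrative analogue, not the PR's ByteKeyRangeTracker): claims must be strictly increasing, a claim at or past the end fails and signals the caller to stop, and `checkpoint()` splits the semi-open restriction at the last successful claim.

```java
public class LongRangeTracker {
  private long start, end;  // current restriction: the semi-open range [start, end)
  private Long lastClaimed; // last successfully claimed key, if any

  LongRangeTracker(long start, long end) {
    this.start = start;
    this.end = end;
  }

  boolean tryClaim(long key) {
    if (lastClaimed != null && key <= lastClaimed) {
      throw new IllegalArgumentException("claims must be monotonically increasing");
    }
    if (key >= end) {
      return false; // outside the restriction: a no-op, the caller should stop
    }
    lastClaimed = key;
    return true;
  }

  long[] checkpoint() {
    if (lastClaimed == null) {
      throw new IllegalStateException("can't checkpoint before a successful claim");
    }
    long[] residual = {lastClaimed + 1, end}; // unprocessed remainder
    end = lastClaimed + 1;                    // primary keeps [start, lastClaimed]
    return residual;
  }

  public static void main(String[] args) {
    LongRangeTracker t = new LongRangeTracker(10, 100);
    System.out.println(t.tryClaim(10)); // true
    System.out.println(t.tryClaim(50)); // true
    long[] residual = t.checkpoint();   // primary is now [10, 51)
    System.out.println(residual[0] + ".." + residual[1]); // 51..100
  }
}
```

The `ByteKey` version adds the complication the review comments discuss: with byte keys the successor of the last claimed key must be computed explicitly (there is no `+ 1`), which is what the `next(lastClaimedKey)` helper in the patch is for.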


[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=94367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94367
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:25
Start Date: 23/Apr/18 23:25
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5175: 
[BEAM-4018] Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#discussion_r183565847
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
 ##
 @@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.MoreObjects;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.io.range.ByteKey;
+import org.apache.beam.sdk.io.range.ByteKeyRange;
+import org.apache.beam.sdk.transforms.DoFn;
+
+/**
+ * A {@link RestrictionTracker} for claiming {@link ByteKey}s in a {@link 
ByteKeyRange} in a
+ * monotonically increasing fashion. The range is a semi-open bounded interval 
[startKey, endKey)
+ * where the limits are both represented by ByteKey.EMPTY.
+ */
+public class ByteKeyRangeTracker extends RestrictionTracker {
+  private ByteKeyRange range;
+  @Nullable private ByteKey lastClaimedKey = null;
+  @Nullable private ByteKey lastAttemptedKey = null;
+
+  private ByteKeyRangeTracker(ByteKeyRange range) {
+this.range = checkNotNull(range);
+  }
+
+  /**
+   * Instantiates a new {@link ByteKeyRangeTracker} with the specified range. 
The keys in the range
+   * are left padded to be the same length in bytes.
+   */
+  public static ByteKeyRangeTracker of(ByteKeyRange range) {
+return new ByteKeyRangeTracker(ByteKeyRange.of(range.getStartKey(), 
range.getEndKey()));
+  }
+
+  @Override
+  public synchronized ByteKeyRange currentRestriction() {
+return range;
+  }
+
+  @Override
+  public synchronized ByteKeyRange checkpoint() {
+checkState(lastClaimedKey != null, "Can't checkpoint before any successful 
claim");
+final ByteKey nextKey = next(lastClaimedKey);
+// hack to force the range to be bigger than the range's upper bound because 
ByteKeyRange *start*
+// must always be less than *end*.
+// we don't compute next if the end key is empty because it means it is the 
maximum upper bound []
+ByteKey rangeEndKey =
+nextKey.equals(range.getEndKey()) && !range.getEndKey().isEmpty()
+? next(range.getEndKey())
+: range.getEndKey();
+// Assert that nextKey <= range.getEndKey()
+ByteKeyRange res = ByteKeyRange.of(nextKey, rangeEndKey);
+this.range = ByteKeyRange.of(range.getStartKey(), nextKey);
+return res;
+  }
+
+  /**
+   * Attempts to claim the given key.
+   *
+   * Must be larger than the last successfully claimed key.
+   *
+   * @return {@code true} if the key was successfully claimed, {@code false} 
if it is outside the
+   * current {@link ByteKeyRange} of this tracker (in that case this 
operation is a no-op).
+   */
+  @Override
+  protected synchronized boolean tryClaimImpl(ByteKey key) {
+checkArgument(
+lastAttemptedKey == null || key.compareTo(lastAttemptedKey) > 0,
+"Trying to claim key %s while last attempted was %s",
+key,
+lastAttemptedKey);
+checkArgument(
+key.compareTo(range.getStartKey()) > -1,
 
 Review comment:
   I mean, .compareTo() is not free, and the comparison at line 79 is typically 
unnecessary - it is only necessary for the very first tryClaim() call; for 
subsequent calls, if the comparison at line 74 passed, then this one will pass 
too, by transitivity of comparison.
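
The transitivity point can be sketched outside of Beam with a minimal Python stand-in (byte strings play the role of ByteKey; class and method names here are illustrative, not the actual Beam implementation):

```python
class SketchRangeTracker:
    """Minimal sketch of tryClaim over a half-open key range [start, end).

    Illustrates the review point: the lower-bound check is only needed for
    the very first claim; afterwards, monotonicity of attempted keys plus
    transitivity of comparison makes it redundant.
    """

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.last_attempted = None
        self.last_claimed = None

    def try_claim(self, key):
        if self.last_attempted is not None and key <= self.last_attempted:
            raise ValueError("keys must be claimed in increasing order")
        if self.last_attempted is None and key < self.start:
            # Lower-bound check, needed only on the first attempt: every
            # later key is > last_attempted >= start by transitivity.
            raise ValueError("key below range start")
        self.last_attempted = key
        if key >= self.end:
            return False  # outside the range: claim fails, tracker unchanged
        self.last_claimed = key
        return True
```

Moving the lower-bound check under the `last_attempted is None` branch is exactly the saving the comment describes: one `compareTo` per claim instead of two.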


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Work logged] (BEAM-4097) Python SDK should set the environment in the job submission protos

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4097?focusedWorklogId=94361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94361
 ]

ASF GitHub Bot logged work on BEAM-4097:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:07
Start Date: 23/Apr/18 23:07
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5191: 
[BEAM-4097] Set environment for Python sdk function specs.
URL: https://github.com/apache/beam/pull/5191#discussion_r183563364
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/universal_local_runner.py
 ##
 @@ -83,6 +86,16 @@ def cleanup(self):
   time.sleep(0.1)
 self._subprocess = None
 
+  @staticmethod
+  def default_docker_image():
+if 'USER' in os.environ:
+  # Perhaps also test if this was built?
+  logging.info('Using latest built Python SDK docker image.')
+  return os.environ['USER'] + 
'-docker.apache.bintray.io/beam/python:latest'
+else:
+  logging.warnign('Could not find a Python SDK docker image.')
 
 Review comment:
   `warning`
   (is this codepath tested / does it need to be?)
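
With the typo fixed, the method would read roughly as below (a sketch: the image-name logic is taken verbatim from the quoted diff, but the `return None` fallback is an assumption, since the quoted hunk is truncated after the warning line):

```python
import logging
import os


def default_docker_image():
    # Sketch of the corrected helper; the actual fix from the review is
    # only the typo (logging.warnign -> logging.warning).
    if 'USER' in os.environ:
        # Perhaps also test if this was built?
        logging.info('Using latest built Python SDK docker image.')
        return os.environ['USER'] + '-docker.apache.bintray.io/beam/python:latest'
    else:
        logging.warning('Could not find a Python SDK docker image.')
        return None  # hypothetical fallback; not present in the quoted diff
```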


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94361)
Time Spent: 1h 40m  (was: 1.5h)

> Python SDK should set the environment in the job submission protos
> --
>
> Key: BEAM-4097
> URL: https://issues.apache.org/jira/browse/BEAM-4097
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4159) Add testing for Pubsub attributes

2018-04-23 Thread Udi Meiri (JIRA)
Udi Meiri created BEAM-4159:
---

 Summary: Add testing for Pubsub attributes
 Key: BEAM-4159
 URL: https://issues.apache.org/jira/browse/BEAM-4159
 Project: Beam
  Issue Type: Test
  Components: testing
Reporter: Udi Meiri
Assignee: Jason Kuster


Request is to add an integration test that exercises reading and writing pubsub 
message attributes.

Platform: Java SDK

Stretch goals: ID attribute, timestamp attribute, Python SDK and Go SDK (both 
using the Java runner)

 





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94353
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:03
Start Date: 23/Apr/18 23:03
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183562815
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.TimeoutException;
+import java.util.function.Supplier;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientPool;
+import 
org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * An {@link EnvironmentFactory} that creates docker containers by shelling 
out to docker. Returned
+ * {@link RemoteEnvironment RemoteEnvironments} own their respective docker 
containers. Not
+ * thread-safe.
+ */
+public class DockerEnvironmentFactory implements EnvironmentFactory {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(DockerEnvironmentFactory.class);
+
+  public static DockerEnvironmentFactory forServices(
+  DockerCommand docker,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  ControlClientPool.Source clientSource,
+  // TODO: Refine this to IdGenerator when we determine where that should 
live.
+  Supplier idGenerator) {
+return new DockerEnvironmentFactory(
+docker,
+controlServiceServer,
+loggingServiceServer,
+retrievalServiceServer,
+provisioningServiceServer,
+idGenerator,
+clientSource);
+  }
+
+  private final DockerCommand docker;
+  private final GrpcFnServer 
controlServiceServer;
+  private final GrpcFnServer loggingServiceServer;
+  private final GrpcFnServer retrievalServiceServer;
+  private final GrpcFnServer 
provisioningServiceServer;
+  private final Supplier idGenerator;
+  private final ControlClientPool.Source clientSource;
+
+  private DockerEnvironmentFactory(
+  DockerCommand docker,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  Supplier idGenerator,
+  ControlClientPool.Source clientSource) {
+this.docker = docker;
+this.controlServiceServer = controlServiceServer;
+this.loggingServiceServer = loggingServiceServer;
+this.retrievalServiceServer = retrievalServiceServer;
+this.provisioningServiceServer = provisioningServiceServer;
+this.idGenerator = idGenerator;
+this.clientSource = clientSource;
+  }
+
+  /** Creates a new, active {@link RemoteEnvironment} backed by a local Docker 
container. */
+  @Override
+  public RemoteEnvironment createEnvironment(Environment environment) throws 
Exception {
+String workerId = idGenerator.get();
+
+// Prepare docker invocation.
+Path workerPersistentDirectory = 
Files.createTempDirectory("worker_persistent_directory");
+Path semiPersistentDire

[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94358
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:03
Start Date: 23/Apr/18 23:03
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183562828
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ControlClientPool.java
 ##
 @@ -17,16 +17,58 @@
  */
 package org.apache.beam.runners.fnexecution.control;
 
-import org.apache.beam.sdk.fn.function.ThrowingConsumer;
-import org.apache.beam.sdk.util.ThrowingSupplier;
+import javax.annotation.concurrent.ThreadSafe;
 
-/** Control client pool that exposes a source and sink of control clients. */
-public interface ControlClientPool {
+/**
+ * A pool of control clients that brokers incoming SDK harness connections (in 
the form of {@link
+ * InstructionRequestHandler InstructionRequestHandlers}).
+ *
+ * Incoming instruction handlers usually come from the control plane gRPC 
service. Typical use:
+ *
+ * 
+ *   // Within owner of the pool, who may or may not own the control plane 
server as well
+ *   ControlClientPool pool = ...
+ *   FnApiControlClientPoolService service =
+ *   FnApiControlClientPoolService.offeringClientsToSink(pool.getSink(), 
headerAccessor)
+ *   // Incoming gRPC control connections will now be added to the client pool.
+ *
+ *   // Within code that interacts with the instruction handler. The get call 
blocks until an
+ *   // incoming client is available:
+ *   ControlClientSource clientSource = ... InstructionRequestHandler
+ *   instructionHandler = clientSource.get("worker-id");
+ * 
+ *
+ * All {@link ControlClientPool} must be thread-safe.
+ */
+@ThreadSafe
+public interface ControlClientPool {
+
+  /** Sink for control clients. */
+  Sink getSink();
 
   /** Source of control clients. */
-  ThrowingSupplier getSource();
+  Source getSource();
 
-  /** Sink for control clients. */
-  ThrowingConsumer getSink();
+  /** A sink for {@link InstructionRequestHandler InstructionRequestHandlers} 
keyed by worker id. */
+  @FunctionalInterface
+  interface Sink {
+
+/**
+ * Puts an {@link InstructionRequestHandler} into a client pool. Worker 
ids must be unique per
+ * pool.
+ */
+void put(String workerId, InstructionRequestHandler instructionHandler) 
throws Exception;
+  }
+
+  /** A source of {@link InstructionRequestHandler 
InstructionRequestHandlers}. */
+  @FunctionalInterface
+  interface Source {
 
+/**
+ * Retrieves the {@link InstructionRequestHandler} for the given worker 
id, blocking until
+ * available. Worker ids must be unique per pool. A given worker id must 
not be requested
 
 Review comment:
   
   This gives the impression that if the worker never becomes available, the 
call never returns. Is this the case?
   
   Also, minor note: this semantics (single get() and put() per workerId) makes 
me think that perhaps "offer/remove" would be better terminology than "get/put" 
- or, at least, "remove" would be better than "get".
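
The contract being discussed (blocking get, unique worker ids, at-most-one retrieval per id) can be sketched with a condition variable; this is an illustrative Python model, not the Beam ControlClientPool API:

```python
import threading


class BlockingClientPool:
    """Sketch of the discussed contract: put() registers a handler under a
    unique worker id; get() blocks until that id arrives, then removes it,
    so each id supports at most one successful get (closer to offer/remove
    semantics, per the review)."""

    def __init__(self):
        self._handlers = {}
        self._cond = threading.Condition()

    def put(self, worker_id, handler):
        with self._cond:
            if worker_id in self._handlers:
                raise KeyError("worker id must be unique per pool")
            self._handlers[worker_id] = handler
            self._cond.notify_all()

    def get(self, worker_id, timeout=None):
        # Blocks until the handler shows up; an optional timeout answers the
        # "does the call ever return?" concern by raising instead of hanging.
        with self._cond:
            if not self._cond.wait_for(lambda: worker_id in self._handlers,
                                       timeout=timeout):
                raise TimeoutError(worker_id)
            return self._handlers.pop(worker_id)
```

The `pop` in `get` is what makes the single-retrieval semantics explicit, which is why "remove" reads more accurately than "get" here.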




Issue Time Tracking
---

Worklog Id: (was: 94358)
Time Spent: 20h  (was: 19h 50m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94359
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:03
Start Date: 23/Apr/18 23:03
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183562823
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/DockerCommandTest.java
 ##
 @@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.Collections;
+import org.apache.beam.runners.fnexecution.environment.testing.NeedsDocker;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link DockerCommand}. */
+@Category(NeedsDocker.class)
+@RunWith(JUnit4.class)
+public class DockerCommandTest {
+
+  @Test
+  public void helloWorld() throws Exception {
+DockerCommand docker = getWrapper();
+String container = docker.runImage("hello-world", Collections.emptyList());
+System.out.printf("Started container: %s%n", container);
+  }
+
+  @Test
+  public void killContainer() throws Exception {
+DockerCommand docker = getWrapper();
+String container = docker.runImage("debian", Arrays.asList("/bin/bash", 
"-c", "sleep 60"));
+docker.killContainer(container);
+  }
+
+  private static DockerCommand getWrapper() {
+return DockerCommand.forCommand("docker", Duration.ofMillis(1));
+  }
 
 Review comment:
   
   Would be good to assert also that exceptions actually include the necessary 
output.




Issue Time Tracking
---

Worklog Id: (was: 94359)
Time Spent: 20h 10m  (was: 20h)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 20h 10m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94357&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94357
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:03
Start Date: 23/Apr/18 23:03
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183562825
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/DockerCommandTest.java
 ##
 @@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.Collections;
+import org.apache.beam.runners.fnexecution.environment.testing.NeedsDocker;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link DockerCommand}. */
+@Category(NeedsDocker.class)
+@RunWith(JUnit4.class)
+public class DockerCommandTest {
+
+  @Test
+  public void helloWorld() throws Exception {
+DockerCommand docker = getWrapper();
+String container = docker.runImage("hello-world", Collections.emptyList());
+System.out.printf("Started container: %s%n", container);
+  }
+
+  @Test
+  public void killContainer() throws Exception {
+DockerCommand docker = getWrapper();
+String container = docker.runImage("debian", Arrays.asList("/bin/bash", 
"-c", "sleep 60"));
+docker.killContainer(container);
 
 Review comment:
   
   Assert that this takes less than 60 seconds?
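
One way to express the suggested bound is a generic timing assertion; a plain-Python sketch (the `docker.kill_container` usage line is hypothetical, mirroring the reviewed test):

```python
import time


def assert_completes_within(seconds, fn, *args):
    """Run fn and fail if it exceeds the given bound, so a kill that
    silently waits out a 60s sleep is caught rather than masked."""
    start = time.monotonic()
    result = fn(*args)
    elapsed = time.monotonic() - start
    assert elapsed < seconds, "took %.1fs, expected < %ss" % (elapsed, seconds)
    return result


# Usage against the reviewed scenario would be something like:
# assert_completes_within(60, docker.kill_container, container)
```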




Issue Time Tracking
---

Worklog Id: (was: 94357)
Time Spent: 19h 50m  (was: 19h 40m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94355
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:03
Start Date: 23/Apr/18 23:03
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183562819
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import com.google.common.collect.ImmutableList;
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.regex.Pattern;
+import java.util.stream.Collectors;
+
+/** A docker command wrapper. Simplifies communications with the Docker 
daemon. */
+class DockerCommand {
+  // TODO: Should we require 64-character container ids? Docker technically 
allows abbreviated ids,
+  // but we _should_ always capture full ids.
+  private static final Pattern CONTAINER_ID_PATTERN = 
Pattern.compile("\\p{XDigit}{64}");
+
+  static DockerCommand forCommand(String dockerExecutable, Duration 
commandTimeout) {
 
 Review comment:
   
   An invocation looks like DockerCommand.forCommand(dockerExecutable,...) - a 
bit odd.
   Maybe DockerCommands.forExecutable(...)?




Issue Time Tracking
---

Worklog Id: (was: 94355)
Time Spent: 19h 40m  (was: 19.5h)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 19h 40m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94354
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:03
Start Date: 23/Apr/18 23:03
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183562821
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import com.google.common.collect.ImmutableList;
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.regex.Pattern;
+import java.util.stream.Collectors;
+
+/** A docker command wrapper. Simplifies communications with the Docker 
daemon. */
+class DockerCommand {
 
 Review comment:
   
   DockerCommands perhaps?




Issue Time Tracking
---

Worklog Id: (was: 94354)
Time Spent: 19.5h  (was: 19h 20m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=94356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94356
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 23/Apr/18 23:03
Start Date: 23/Apr/18 23:03
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r183562827
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerContainerEnvironment.java
 ##
 @@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+
+/**
+ * A {@link RemoteEnvironment} that wraps a running Docker container.
+ *
+ * A {@link DockerContainerEnvironment} owns both the underlying docker 
container that it
+ * communicates with and the {@link InstructionRequestHandler} that it uses to 
do so.
+ */
+@ThreadSafe
+class DockerContainerEnvironment implements RemoteEnvironment {
+
+  static DockerContainerEnvironment create(
+  DockerCommand docker,
+  Environment environment,
+  String containerId,
+  InstructionRequestHandler instructionHandler) {
+return new DockerContainerEnvironment(docker, environment, containerId, 
instructionHandler);
+  }
+
+  private final Object lock = new Object();
+  private final DockerCommand docker;
+  private final Environment environment;
+  private final String containerId;
+  private final InstructionRequestHandler instructionHandler;
+
+  private DockerContainerEnvironment(
+  DockerCommand docker,
+  Environment environment,
+  String containerId,
+  InstructionRequestHandler instructionHandler) {
+this.docker = docker;
+this.environment = environment;
+this.containerId = containerId;
+this.instructionHandler = instructionHandler;
+  }
+
+  @Override
+  public Environment getEnvironment() {
+return environment;
+  }
+
+  @Override
+  public InstructionRequestHandler getInstructionRequestHandler() {
+return instructionHandler;
+  }
+
+  /**
+   * Closes this remote docker environment. The associated {@link 
InstructionRequestHandler} should
+   * not be used after calling this.
+   */
+  @Override
+  public void close() throws Exception {
+synchronized (lock) {
 
 Review comment:
   
   Could you add a comment why we add this synchronization? Is it because 
calling instructionHandler.close() from two threads at the same time is unsafe, 
or calling docker.killContainer(containerId) is unsafe, or because it's unsafe 
if two threads both do instructionHandler.close() and then both do 
docker.killContainer()? (but all of this is safe if done from two threads 
sequentially?)
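
One common rationale such a comment could document is making close() idempotent under concurrency; a minimal Python sketch of that pattern (illustrative only, not the actual DockerContainerEnvironment):

```python
import threading


class ClosableEnvironment:
    """Sketch of one reason for the lock in close(): ensure kill/cleanup
    runs exactly once even if close() is called concurrently or twice."""

    def __init__(self, kill_container):
        self._lock = threading.Lock()
        self._closed = False
        self._kill_container = kill_container

    def close(self):
        with self._lock:
            if self._closed:
                return  # any later caller becomes a no-op
            self._closed = True
            self._kill_container()
```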




Issue Time Tracking
---

Worklog Id: (was: 94356)
Time Spent: 19h 50m  (was: 19h 40m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3981) Futurize and fix python 2 compatibility for coders package

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3981?focusedWorklogId=94350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94350
 ]

ASF GitHub Bot logged work on BEAM-3981:


Author: ASF GitHub Bot
Created on: 23/Apr/18 22:55
Start Date: 23/Apr/18 22:55
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #5053: [BEAM-3981] 
Futurize coders subpackage
URL: https://github.com/apache/beam/pull/5053#issuecomment-383748311
 
 
   Cython issue has a workaround and is no longer a blocker. 
   
   I observed a 15% performance degradation on Dataflow runner with this PR & 
Cython upgrade, however this was a one-off benchmark run and the data is 
inconclusive. I am starting multiple runs to confirm or rule out the regression 
here.




Issue Time Tracking
---

Worklog Id: (was: 94350)
Time Spent: 14h 20m  (was: 14h 10m)

> Futurize and fix python 2 compatibility for coders package
> --
>
> Key: BEAM-3981
> URL: https://issues.apache.org/jira/browse/BEAM-3981
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> Run automatic conversion with futurize tool on coders subpackage and fix 
> python 2 compatibility. This prepares the subpackage for python 3 support.





[jira] [Work logged] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3773?focusedWorklogId=94347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94347
 ]

ASF GitHub Bot logged work on BEAM-3773:


Author: ASF GitHub Bot
Created on: 23/Apr/18 22:38
Start Date: 23/Apr/18 22:38
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #5173: 
[BEAM-3773][SQL] Add EnumerableConverter for JDBC support
URL: https://github.com/apache/beam/pull/5173#discussion_r183558222
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java
 ##
 @@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rel;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.MetricNameFilter;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.metrics.MetricsFilter;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollectionTuple;
+import org.apache.beam.sdk.values.Row;
+import org.apache.calcite.adapter.enumerable.EnumerableRel;
+import org.apache.calcite.adapter.enumerable.EnumerableRelImplementor;
+import org.apache.calcite.adapter.enumerable.PhysType;
+import org.apache.calcite.adapter.enumerable.PhysTypeImpl;
+import org.apache.calcite.linq4j.Enumerable;
+import org.apache.calcite.linq4j.Linq4j;
+import org.apache.calcite.linq4j.tree.BlockBuilder;
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.plan.ConventionTraitDef;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.convert.ConverterImpl;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rel.type.RelDataType;
+
+/**
+ * BeamRelNode to replace a {@code Enumerable} node.
+ */
+public class BeamEnumerableConverter extends ConverterImpl implements 
EnumerableRel {
 
 Review comment:
   An integration test would be good; I'll get one added. I started down the 
unit-test path last week but ended up deleting all the unit-testable code when 
I found a library that already implemented it.




Issue Time Tracking
---

Worklog Id: (was: 94347)
Time Spent: 4h 40m  (was: 4.5h)

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> JDBC allows integration with a lot of third-party tools, e.g 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL




[jira] [Work logged] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3773?focusedWorklogId=94343&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94343
 ]

ASF GitHub Bot logged work on BEAM-3773:


Author: ASF GitHub Bot
Created on: 23/Apr/18 22:35
Start Date: 23/Apr/18 22:35
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #5173: 
[BEAM-3773][SQL] Add EnumerableConverter for JDBC support
URL: https://github.com/apache/beam/pull/5173#discussion_r183557581
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java
 ##
 @@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rel;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.MetricNameFilter;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.metrics.MetricsFilter;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollectionTuple;
+import org.apache.beam.sdk.values.Row;
+import org.apache.calcite.adapter.enumerable.EnumerableRel;
+import org.apache.calcite.adapter.enumerable.EnumerableRelImplementor;
+import org.apache.calcite.adapter.enumerable.PhysType;
+import org.apache.calcite.adapter.enumerable.PhysTypeImpl;
+import org.apache.calcite.linq4j.Enumerable;
+import org.apache.calcite.linq4j.Linq4j;
+import org.apache.calcite.linq4j.tree.BlockBuilder;
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.plan.ConventionTraitDef;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.convert.ConverterImpl;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rel.type.RelDataType;
+
+/**
+ * BeamRelNode to replace a {@code Enumerable} node.
+ */
+public class BeamEnumerableConverter extends ConverterImpl implements 
EnumerableRel {
+
+  public BeamEnumerableConverter(RelOptCluster cluster, RelTraitSet traits, 
RelNode input) {
+super(cluster, ConventionTraitDef.INSTANCE, traits, input);
+  }
+
+  @Override
+  public RelNode copy(RelTraitSet traitSet, List<RelNode> inputs) {
+return new BeamEnumerableConverter(getCluster(), traitSet, sole(inputs));
+  }
+
+  @Override
+  public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery 
mq) {
+// This should always be a last resort.
+return planner.getCostFactory().makeHugeCost();
+  }
+
+  @Override
+  public Result implement(EnumerableRelImplementor implementor, Prefer prefer) 
{
+final BlockBuilder list = new BlockBuilder();
+final RelDataType rowType = getRowType();
+final PhysType physType =
+PhysTypeImpl.of(implementor.getTypeFactory(), rowType, 
prefer.preferArray());
+final Expression node = implementor.stash((BeamRelNode) getInput(), 
BeamRelNode.class);
+list.add(Expressions.call(BeamEnumerableConverter.class, "toEnumerable", 
node));
+return implementor.result(physType, list.toBlock());
+  }
+
+  public static Enumerable toEnumerable(BeamRelNode node) {
+PipelineOptions options = PipelineOptionsFactory.create();
+if (node instanceof BeamIOSinkRel) {
+  return count(options, node);
+}
+return collect(options, node);
+  }
+
+  private static PipelineResult run(PipelineOptions options, BeamRelNode node,
+

[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=94334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94334
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 23/Apr/18 22:26
Start Date: 23/Apr/18 22:26
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #5175: [BEAM-4018] Add a 
ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#issuecomment-383742742
 
 
   Changed `ByteKeyRange` to accept degenerate ranges as suggested. Way way 
nicer without the hack.
   This one should be the one, PTAL again @jkff 




Issue Time Tracking
---

Worklog Id: (was: 94334)
Time Spent: 5h 50m  (was: 5h 40m)

> Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
> -
>
> Key: BEAM-4018
> URL: https://issues.apache.org/jira/browse/BEAM-4018
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> We can have a RestrictionTracker for ByteKey ranges as part of the core sdk 
> so it can be reused by future SDF based IOs like Bigtable, HBase among others.
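
The claim protocol such a tracker enforces can be sketched with a simplified offset-based analogue (hypothetical and much reduced from the real ByteKeyRangeTracker, which works on byte-string keys rather than longs):

```java
// Toy analogue (not Beam's ByteKeyRangeTracker): a restriction tracker
// over a half-open [start, end) offset range. tryClaim() succeeds only
// while the claimed position lies inside the unprocessed remainder.
public class OffsetRangeTracker {
  private final long start;
  private final long end;
  private long lastClaimed = -1;

  public OffsetRangeTracker(long start, long end) {
    this.start = start;
    this.end = end;
  }

  /** Claims the given position; returns false once the range is exhausted. */
  public synchronized boolean tryClaim(long position) {
    if (position <= lastClaimed) {
      throw new IllegalArgumentException("positions must be claimed in order");
    }
    if (position >= end) {
      return false; // past the end: the caller must stop processing
    }
    lastClaimed = position;
    return true;
  }

  /** Fraction of the range already claimed, for progress reporting. */
  public synchronized double fractionClaimed() {
    if (lastClaimed < start) {
      return 0.0;
    }
    return (double) (lastClaimed + 1 - start) / (end - start);
  }
}
```

A splittable DoFn would call tryClaim before processing each key and stop cleanly as soon as it returns false, which is what makes checkpointing and dynamic splitting safe.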





Build failed in Jenkins: beam_PostCommit_Python_Verify #4769

2018-04-23 Thread Apache Jenkins Server
See 


Changes:

[rangadi] Add 10 millis sleep when there are no elements left in a partition.

--
[...truncated 1.06 MB...]
test_type_check_violation_valid_simple_type 
(apache_beam.typehints.typehints_test.IterableHintTestCase) ... ok
test_enforce_kv_type_constraint 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_param_must_be_tuple 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_param_must_have_length_2 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_proxy_to_tuple 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_enforce_list_type_constraint_invalid_composite_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_invalid_simple_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_valid_composite_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_enforce_list_type_constraint_valid_simple_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_getitem_invalid_composite_type_param 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_list_constraint_compatibility 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_list_repr (apache_beam.typehints.typehints_test.ListHintTestCase) ... ok
test_getitem_proxy_to_union 
(apache_beam.typehints.typehints_test.OptionalHintTestCase) ... ok
test_getitem_sequence_not_allowed 
(apache_beam.typehints.typehints_test.OptionalHintTestCase) ... ok
test_any_return_type_hint 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_must_be_primitive_type_or_type_constraint 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_must_be_single_return_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_no_kwargs_accepted 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_composite_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_simple_type 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_type_check_violation 
(apache_beam.typehints.typehints_test.ReturnsDecoratorTestCase) ... ok
test_compatibility (apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_getitem_invalid_composite_type_param 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_repr (apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_invalid_elem_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_must_be_set 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_valid_elem_composite_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_type_check_valid_elem_simple_type 
(apache_beam.typehints.typehints_test.SetHintTestCase) ... ok
test_any_argument_type_hint 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_basic_type_assertion 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_composite_type_assertion 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_invalid_only_positional_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_must_be_primitive_type_or_constraint 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_mix_positional_and_keyword_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_only_positional_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_valid_simple_type_arguments 
(apache_beam.typehints.typehints_test.TakesDecoratorTestCase) ... ok
test_functions_as_regular_generator 
(apache_beam.typehints.typehints_test.TestGeneratorWrapper) ... ok
test_compatibility (apache_beam.typehints.typehints_test.TupleHintTestCase) ... 
ok
test_compatibility_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_getitem_invalid_ellipsis_type_param 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_getitem_params_must_be_type_or_constraint 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_raw_tuple (apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_repr (apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_composite_type 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_composite_type_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_simple_type_arbitrary_length 
(apache_beam.typehints.typehints_test.TupleHintTestCase) ... ok
test_type_check_invalid_simple_types 
(apache_beam.typehints.typehints_test

[jira] [Work logged] (BEAM-3979) New DoFn should allow injecting of all parameters in ProcessContext

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3979?focusedWorklogId=94328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94328
 ]

ASF GitHub Bot logged work on BEAM-3979:


Author: ASF GitHub Bot
Created on: 23/Apr/18 22:01
Start Date: 23/Apr/18 22:01
Worklog Time Spent: 10m 
  Work Description: jkff commented on issue #4989: [BEAM-3979] Start 
completing the new DoFn vision: plumb context parameters into process functions.
URL: https://github.com/apache/beam/pull/4989#issuecomment-383737244
 
 
   Other than this, LGTM.




Issue Time Tracking
---

Worklog Id: (was: 94328)
Time Spent: 2.5h  (was: 2h 20m)

> New DoFn should allow injecting of all parameters in ProcessContext
> ---
>
> Key: BEAM-3979
> URL: https://issues.apache.org/jira/browse/BEAM-3979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.4.0
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This was intended in the past, but never completed. Ideally all primitive 
> parameters in ProcessContext should be injectable, and OutputReceiver 
> parameters can be used to collection output. So, we should be able to write a 
> DoFn as follows
> @ProcessElement
> public void process(@Element String word, OutputReceiver receiver) {
>   receiver.output(word.toUpperCase());
> }
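
The parameter-injection idea described above can be illustrated with a toy reflective dispatcher. This is only a sketch of the mechanism with made-up names (Beam's real implementation uses generated DoFnInvoker code, not per-call reflection): each parameter is filled in based on its annotation or type before the method is invoked.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

// Toy sketch of annotation-driven parameter injection (hypothetical names,
// not Beam's actual DoFnInvoker machinery).
public class InjectDemo {
  @Retention(RetentionPolicy.RUNTIME)
  public @interface Element {}

  public interface OutputReceiver<T> {
    void output(T value);
  }

  // A "DoFn" in the new style: parameters declare what should be injected.
  public static class UpperCaseFn {
    public void process(@Element String word, OutputReceiver<String> receiver) {
      receiver.output(word.toUpperCase());
    }
  }

  // Fill each parameter by inspecting its annotation or type, then invoke.
  public static List<String> invokeProcess(Object fn, String element) throws Exception {
    List<String> outputs = new ArrayList<>();
    OutputReceiver<String> receiver = outputs::add;
    for (Method m : fn.getClass().getMethods()) {
      if (!m.getName().equals("process")) {
        continue;
      }
      Parameter[] params = m.getParameters();
      Object[] args = new Object[params.length];
      for (int i = 0; i < params.length; i++) {
        if (params[i].isAnnotationPresent(Element.class)) {
          args[i] = element;            // inject the current element
        } else if (OutputReceiver.class.isAssignableFrom(params[i].getType())) {
          args[i] = receiver;           // inject the output collector
        }
      }
      m.invoke(fn, args);
    }
    return outputs;
  }
}
```

Invoking invokeProcess(new UpperCaseFn(), "word") delivers the element and an output collector to the matching parameters, the way @Element and OutputReceiver arguments are matched in the new DoFn style.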





[jira] [Work logged] (BEAM-3979) New DoFn should allow injecting of all parameters in ProcessContext

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3979?focusedWorklogId=94326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94326
 ]

ASF GitHub Bot logged work on BEAM-3979:


Author: ASF GitHub Bot
Created on: 23/Apr/18 22:01
Start Date: 23/Apr/18 22:01
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4989: 
[BEAM-3979] Start completing the new DoFn vision: plumb context parameters into 
process functions.
URL: https://github.com/apache/beam/pull/4989#discussion_r183550937
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
 ##
 @@ -361,7 +362,18 @@ public void setup() {
 public void processElement(final ProcessContext c) {
   final InputT element = c.element().getKey();
   invoker.invokeSplitRestriction(
-  element, c.element().getValue(), part -> c.output(KV.of(element, 
part)));
+  element, c.element().getValue(), new OutputReceiver<RestrictionT>() {
+@Override
+public void output(RestrictionT part) {
+  c.output(KV.of(element, part));
+}
+
+@Override
+public void outputWithTimestamp(RestrictionT part, Instant 
timestamp) {
+  c.outputWithTimestamp(KV.of(element, part), timestamp);
 
 Review comment:
   As a tie-breaker: the dynamic splitting API does not allow you to assign 
different timestamps to the restrictions, and neither should the static one.




Issue Time Tracking
---

Worklog Id: (was: 94326)
Time Spent: 2h 10m  (was: 2h)

> New DoFn should allow injecting of all parameters in ProcessContext
> ---
>
> Key: BEAM-3979
> URL: https://issues.apache.org/jira/browse/BEAM-3979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.4.0
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This was intended in the past, but never completed. Ideally all primitive 
> parameters in ProcessContext should be injectable, and OutputReceiver 
> parameters can be used to collection output. So, we should be able to write a 
> DoFn as follows
> @ProcessElement
> public void process(@Element String word, OutputReceiver receiver) {
>   receiver.output(word.toUpperCase());
> }





[jira] [Work logged] (BEAM-3979) New DoFn should allow injecting of all parameters in ProcessContext

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3979?focusedWorklogId=94327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94327
 ]

ASF GitHub Bot logged work on BEAM-3979:


Author: ASF GitHub Bot
Created on: 23/Apr/18 22:01
Start Date: 23/Apr/18 22:01
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4989: 
[BEAM-3979] Start completing the new DoFn vision: plumb context parameters into 
process functions.
URL: https://github.com/apache/beam/pull/4989#discussion_r183551078
 
 

 ##
 File path: 
examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java
 ##
 @@ -106,7 +106,7 @@
 }
 
 @ProcessElement
-public void processElement(ProcessContext c) {
+public void processElement(@Element String element, OutputReceiver<String> receiver) {
 
 Review comment:
   Alrighty then.




Issue Time Tracking
---

Worklog Id: (was: 94327)
Time Spent: 2h 20m  (was: 2h 10m)

> New DoFn should allow injecting of all parameters in ProcessContext
> ---
>
> Key: BEAM-3979
> URL: https://issues.apache.org/jira/browse/BEAM-3979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.4.0
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This was intended in the past, but never completed. Ideally all primitive 
> parameters in ProcessContext should be injectable, and OutputReceiver 
> parameters can be used to collection output. So, we should be able to write a 
> DoFn as follows
> @ProcessElement
> public void process(@Element String word, OutputReceiver receiver) {
>   receiver.output(word.toUpperCase());
> }





  1   2   3   >