[jira] [Resolved] (BEAM-918) Let users set STORAGE_LEVEL via SparkPipelineOptions.

2016-12-01 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-918.
---
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Let users set STORAGE_LEVEL via SparkPipelineOptions.
> -
>
> Key: BEAM-918
> URL: https://issues.apache.org/jira/browse/BEAM-918
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Jean-Baptiste Onofré
> Fix For: 0.4.0-incubating
>
>
> Spark provides different "STORAGE_LEVEL"s for caching RDDs (disk, memory, 
> ser/de, etc.).
> The runner decides on caching when necessary, for example when an RDD is 
> accessed repeatedly.
>  
> For batch, we can let users set their preferred STORAGE_LEVEL via 
> SparkPipelineOptions.
> Note: for streaming we force a "MEMORY_ONLY", since (among other things) the 
> runner heavily relies on stateful operations, and it's natural for streaming 
> to happen in memory.
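The decision rule described above (honor the user's option for batch, force MEMORY_ONLY for streaming) can be sketched as follows. This is illustrative Python, not the runner's actual Java code; the function and the set of level names are assumptions for the sketch.

```python
# Hypothetical sketch of the storage-level decision described in BEAM-918.
# A few of Spark's storage-level names, for validation purposes only.
VALID_LEVELS = {"MEMORY_ONLY", "MEMORY_AND_DISK", "DISK_ONLY", "MEMORY_ONLY_SER"}


def effective_storage_level(option_value, streaming):
    """Pick the storage level the runner would use when caching a dataset.

    Streaming is forced to MEMORY_ONLY, as the issue notes; batch honors
    the user-supplied pipeline option.
    """
    if streaming:
        return "MEMORY_ONLY"
    if option_value not in VALID_LEVELS:
        raise ValueError("unknown storage level: %s" % option_value)
    return option_value


assert effective_storage_level("DISK_ONLY", streaming=False) == "DISK_ONLY"
assert effective_storage_level("DISK_ONLY", streaming=True) == "MEMORY_ONLY"
```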



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #1370: [BEAM-918] Allow users to define the stor...

2016-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1370


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] incubator-beam git commit: [BEAM-918] This closes #1370

2016-12-01 Thread jbonofre
[BEAM-918] This closes #1370


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/0c875ba7
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/0c875ba7
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/0c875ba7

Branch: refs/heads/master
Commit: 0c875ba704c2501c3215ffd588d02d2e4117ded2
Parents: 711c680 d99829d
Author: Jean-Baptiste Onofré 
Authored: Thu Dec 1 11:43:36 2016 +0100
Committer: Jean-Baptiste Onofré 
Committed: Thu Dec 1 11:43:36 2016 +0100

--
 .../runners/spark/SparkPipelineOptions.java |  5 ++
 .../spark/translation/BoundedDataset.java   |  5 +-
 .../beam/runners/spark/translation/Dataset.java |  2 +-
 .../spark/translation/EvaluationContext.java| 10 +++-
 .../translation/StorageLevelPTransform.java | 43 +++
 .../spark/translation/TransformTranslator.java  | 27 ++
 .../translation/streaming/UnboundedDataset.java | 13 -
 .../spark/translation/StorageLevelTest.java | 56 
 8 files changed, 155 insertions(+), 6 deletions(-)
--




[1/2] incubator-beam git commit: [BEAM-918] Allow users to define the storage level via pipeline options

2016-12-01 Thread jbonofre
Repository: incubator-beam
Updated Branches:
  refs/heads/master 711c68092 -> 0c875ba70


[BEAM-918] Allow users to define the storage level via pipeline options


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/d99829dd
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/d99829dd
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/d99829dd

Branch: refs/heads/master
Commit: d99829dd99db4090ceb7e5eefce50ee513c5458e
Parents: 711c680
Author: Jean-Baptiste Onofré 
Authored: Thu Nov 17 12:38:00 2016 +0100
Committer: Jean-Baptiste Onofré 
Committed: Thu Dec 1 11:38:25 2016 +0100

--
 .../runners/spark/SparkPipelineOptions.java |  5 ++
 .../spark/translation/BoundedDataset.java   |  5 +-
 .../beam/runners/spark/translation/Dataset.java |  2 +-
 .../spark/translation/EvaluationContext.java| 10 +++-
 .../translation/StorageLevelPTransform.java | 43 +++
 .../spark/translation/TransformTranslator.java  | 27 ++
 .../translation/streaming/UnboundedDataset.java | 13 -
 .../spark/translation/StorageLevelTest.java | 56 
 8 files changed, 155 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/d99829dd/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java
index 0fd790e..3f8b379 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java
@@ -44,6 +44,11 @@ public interface SparkPipelineOptions
   Long getBatchIntervalMillis();
   void setBatchIntervalMillis(Long batchInterval);
 
+  @Description("Batch default storage level")
+  @Default.String("MEMORY_ONLY")
+  String getStorageLevel();
+  void setStorageLevel(String storageLevel);
+
   @Description("Minimum time to spend on read, for each micro-batch.")
   @Default.Long(200)
   Long getMinReadTimeMillis();

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/d99829dd/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
index 774efb9..1cfb0e0 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
@@ -32,6 +32,7 @@ import org.apache.beam.sdk.values.PCollection;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaRDDLike;
 import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.storage.StorageLevel;
 
 /**
  * Holds an RDD or values for deferred conversion to an RDD if needed. 
PCollections are sometimes
@@ -97,8 +98,8 @@ public class BoundedDataset implements Dataset {
   }
 
   @Override
-  public void cache() {
-    rdd.cache();
+  public void cache(String storageLevel) {
+    rdd.persist(StorageLevel.fromString(storageLevel));
   }
 
   @Override

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/d99829dd/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/Dataset.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/Dataset.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/Dataset.java
index 36b03fe..b5d550e 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/Dataset.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/Dataset.java
@@ -26,7 +26,7 @@ import java.io.Serializable;
  */
 public interface Dataset extends Serializable {
 
-  void cache();
+  void cache(String storageLevel);
 
   void action();
 

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/d99829dd/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/EvaluationContext.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/EvaluationContext.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/EvaluationContext.java
index 

[jira] [Commented] (BEAM-918) Let users set STORAGE_LEVEL via SparkPipelineOptions.

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711985#comment-15711985
 ] 

ASF GitHub Bot commented on BEAM-918:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1370


> Let users set STORAGE_LEVEL via SparkPipelineOptions.
> -
>
> Key: BEAM-918
> URL: https://issues.apache.org/jira/browse/BEAM-918
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Jean-Baptiste Onofré
>
> Spark provides different "STORAGE_LEVEL"s for caching RDDs (disk, memory, 
> ser/de, etc.).
> The runner decides on caching when necessary, for example when an RDD is 
> accessed repeatedly.
>  
> For batch, we can let users set their preferred STORAGE_LEVEL via 
> SparkPipelineOptions.
> Note: for streaming we force a "MEMORY_ONLY", since (among other things) the 
> runner heavily relies on stateful operations, and it's natural for streaming 
> to happen in memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam-site pull request #98: [BEAM-825] Fill in the documentation/r...

2016-12-01 Thread sandeepdeshmukh
GitHub user sandeepdeshmukh opened a pull request:

https://github.com/apache/incubator-beam-site/pull/98

[BEAM-825] Fill in the documentation/runners/apex portion of the website

@tweise: Could you please review?



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandeepdeshmukh/incubator-beam-site 
apex-runner-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam-site/pull/98.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #98


commit d8ecc4e30b04d152213e0ebfe0834bb3ed5699e1
Author: Sandeep Deshmukh 
Date:   2016-12-01T13:49:17Z

[BEAM-825] Fill in the documentation/runners/apex portion of the website




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Spark #281

2016-12-01 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1065) FileBasedSource: replace SeekableByteChannel with open(spec, startingPosition)

2016-12-01 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712452#comment-15712452
 ] 

Daniel Halperin commented on BEAM-1065:
---

The ability to open at a non-zero location has previously been thought to be 
linked to seekability. Do we think there are channels that can be opened at 
non-zero offsets but are not seekable?

> FileBasedSource: replace SeekableByteChannel with open(spec, startingPosition)
> --
>
> Key: BEAM-1065
> URL: https://issues.apache.org/jira/browse/BEAM-1065
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
>
> FileBasedReader should be able to open the file with the 
> Source.getStartOffset(), and then read forward to find the first input 
> element.
> The benefits are:
> 1. It is easier to implement a ReadableByteChannel.
> 2. Dynamic splitting won't require file systems to support seeking.
> 3. Doesn't need to seek to the position twice, which is what the current 
> API does.
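Benefit 2 can be illustrated with a small sketch: a forward-only stream can be "opened at" an offset by reading and discarding bytes, with no seek() required. This is an illustrative assumption, not Beam's actual `open(spec, startingPosition)` API.

```python
import io


def open_at(raw_stream, starting_position, chunk=8192):
    """Advance a forward-only stream to starting_position without seek().

    Works on any readable stream, seekable or not, by consuming and
    discarding bytes up to the requested offset.
    """
    remaining = starting_position
    while remaining > 0:
        data = raw_stream.read(min(chunk, remaining))
        if not data:
            raise EOFError("stream ended before the starting position")
        remaining -= len(data)
    return raw_stream


# BytesIO stands in for a channel; seek() is never called on it.
stream = open_at(io.BytesIO(b"abcdefghij"), 4)
assert stream.read() == b"efghij"
```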



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[2/2] incubator-beam git commit: Closes #1468

2016-12-01 Thread robertwb
Closes #1468


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/739a4319
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/739a4319
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/739a4319

Branch: refs/heads/python-sdk
Commit: 739a431975120fe267e6b81635ce6e2356bd2895
Parents: aa9071d b4c2f62
Author: Robert Bradshaw 
Authored: Thu Dec 1 09:07:28 2016 -0800
Committer: Robert Bradshaw 
Committed: Thu Dec 1 09:07:28 2016 -0800

--
 sdks/python/apache_beam/io/bigquery.py  | 37 
 sdks/python/apache_beam/io/bigquery_test.py | 22 ++
 2 files changed, 59 insertions(+)
--




[1/2] incubator-beam git commit: Parse table schema from JSON

2016-12-01 Thread robertwb
Repository: incubator-beam
Updated Branches:
  refs/heads/python-sdk aa9071d56 -> 739a43197


Parse table schema from JSON


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/b4c2f62b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/b4c2f62b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/b4c2f62b

Branch: refs/heads/python-sdk
Commit: b4c2f62be8a809b666089e7b2fe5dada9cbd6c16
Parents: aa9071d
Author: Sourabh Bajaj 
Authored: Wed Nov 30 13:48:28 2016 -0800
Committer: Robert Bradshaw 
Committed: Thu Dec 1 09:07:27 2016 -0800

--
 sdks/python/apache_beam/io/bigquery.py  | 37 
 sdks/python/apache_beam/io/bigquery_test.py | 22 ++
 2 files changed, 59 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b4c2f62b/sdks/python/apache_beam/io/bigquery.py
--
diff --git a/sdks/python/apache_beam/io/bigquery.py 
b/sdks/python/apache_beam/io/bigquery.py
index 8d7892a..0885e3a 100644
--- a/sdks/python/apache_beam/io/bigquery.py
+++ b/sdks/python/apache_beam/io/bigquery.py
@@ -200,6 +200,43 @@ class TableRowJsonCoder(coders.Coder):
         f=[bigquery.TableCell(v=to_json_value(e)) for e in od.itervalues()])
 
 
+def parse_table_schema_from_json(schema_string):
+  """Parse the Table Schema provided as string.
+
+  Args:
+    schema_string: String serialized table schema, should be a valid JSON.
+
+  Returns:
+    A TableSchema of the BigQuery export from either the Query or the Table.
+  """
+  json_schema = json.loads(schema_string)
+
+  def _parse_schema_field(field):
+    """Parse a single schema field from dictionary.
+
+    Args:
+      field: Dictionary object containing serialized schema.
+
+    Returns:
+      A TableFieldSchema for a single column in BigQuery.
+    """
+    schema = bigquery.TableFieldSchema()
+    schema.name = field['name']
+    schema.type = field['type']
+    if 'mode' in field:
+      schema.mode = field['mode']
+    else:
+      schema.mode = 'NULLABLE'
+    if 'description' in field:
+      schema.description = field['description']
+    if 'fields' in field:
+      schema.fields = [_parse_schema_field(x) for x in field['fields']]
+    return schema
+
+  fields = [_parse_schema_field(f) for f in json_schema['fields']]
+  return bigquery.TableSchema(fields=fields)
+
+
 class BigQueryDisposition(object):
   """Class holding standard strings used for create and write dispositions."""
 

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b4c2f62b/sdks/python/apache_beam/io/bigquery_test.py
--
diff --git a/sdks/python/apache_beam/io/bigquery_test.py 
b/sdks/python/apache_beam/io/bigquery_test.py
index b0c3bbe..e263e13 100644
--- a/sdks/python/apache_beam/io/bigquery_test.py
+++ b/sdks/python/apache_beam/io/bigquery_test.py
@@ -32,6 +32,7 @@ from apache_beam.internal.clients import bigquery
 from apache_beam.internal.json_value import to_json_value
 from apache_beam.io.bigquery import RowAsDictJsonCoder
 from apache_beam.io.bigquery import TableRowJsonCoder
+from apache_beam.io.bigquery import parse_table_schema_from_json
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 from apache_beam.utils.options import PipelineOptions
@@ -113,6 +114,27 @@ class TestTableRowJsonCoder(unittest.TestCase):
     self.json_compliance_exception(float('-inf'))
 
 
+class TestTableSchemaParser(unittest.TestCase):
+  def test_parse_table_schema_from_json(self):
+    string_field = bigquery.TableFieldSchema(
+        name='s', type='STRING', mode='NULLABLE', description='s description')
+    number_field = bigquery.TableFieldSchema(
+        name='n', type='INTEGER', mode='REQUIRED', description='n description')
+    record_field = bigquery.TableFieldSchema(
+        name='r', type='RECORD', mode='REQUIRED', description='r description',
+        fields=[string_field, number_field])
+    expected_schema = bigquery.TableSchema(fields=[record_field])
+    json_str = json.dumps({'fields': [
+        {'name': 'r', 'type': 'RECORD', 'mode': 'REQUIRED',
+         'description': 'r description', 'fields': [
+             {'name': 's', 'type': 'STRING', 'mode': 'NULLABLE',
+              'description': 's description'},
+             {'name': 'n', 'type': 'INTEGER', 'mode': 'REQUIRED',
+              'description': 'n description'}]}]})
+    self.assertEqual(parse_table_schema_from_json(json_str),
+                     expected_schema)
+
+
 class TestBigQuerySource(unittest.TestCase):
 
   def test_display_data_item_on_validate_true(self):
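The recursion in the committed parser can be shown stand-alone. The sketch below re-implements it with plain dicts instead of the `bigquery` message classes (an assumption for illustration; the actual patch builds `bigquery.TableFieldSchema` objects), keeping the same defaulting and RECORD-nesting behavior.

```python
import json


def parse_schema(schema_string):
    """Stripped-down analogue of parse_table_schema_from_json using dicts."""
    def parse_field(field):
        parsed = {
            'name': field['name'],
            'type': field['type'],
            # The patch defaults a missing mode to NULLABLE.
            'mode': field.get('mode', 'NULLABLE'),
        }
        if 'description' in field:
            parsed['description'] = field['description']
        if 'fields' in field:  # RECORD columns nest recursively.
            parsed['fields'] = [parse_field(f) for f in field['fields']]
        return parsed

    return [parse_field(f) for f in json.loads(schema_string)['fields']]


fields = parse_schema('{"fields": [{"name": "r", "type": "RECORD", '
                      '"fields": [{"name": "s", "type": "STRING"}]}]}')
assert fields[0]['fields'][0]['mode'] == 'NULLABLE'
```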



[GitHub] incubator-beam pull request #1468: [BEAM] Add table schema from json parser

2016-12-01 Thread sb2nov
Github user sb2nov closed the pull request at:

https://github.com/apache/incubator-beam/pull/1468


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-beam pull request #1467: Improve size estimation speed for file sa...

2016-12-01 Thread sb2nov
Github user sb2nov closed the pull request at:

https://github.com/apache/incubator-beam/pull/1467


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-1066) Add test coverage for ReleaseInfo

2016-12-01 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1066:
-

 Summary: Add test coverage for ReleaseInfo
 Key: BEAM-1066
 URL: https://issues.apache.org/jira/browse/BEAM-1066
 Project: Beam
  Issue Type: Test
  Components: testing
Affects Versions: Not applicable
Reporter: Daniel Halperin
Assignee: Daniel Halperin
Priority: Trivial
 Fix For: Not applicable


We use the version string for a lot of things, so we should test it to 
prevent regressions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1066) Add test coverage for ReleaseInfo

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712521#comment-15712521
 ] 

ASF GitHub Bot commented on BEAM-1066:
--

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/1480

[BEAM-1066] Add a test of ReleaseInfo

R: @lukecwik OR @kennknowles 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam release-version

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1480.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1480


commit 59e56e42fd29af8c2ac59cc69a92fa7c3ff8743d
Author: Dan Halperin 
Date:   2016-12-01T17:15:28Z

Add a test of ReleaseInfo




> Add test coverage for ReleaseInfo
> -
>
> Key: BEAM-1066
> URL: https://issues.apache.org/jira/browse/BEAM-1066
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Trivial
> Fix For: Not applicable
>
>
> We use the version string for a lot of things, so we should test it to 
> prevent regressions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1068) Service Account Credentials File Specified via Pipeline Option Ignored

2016-12-01 Thread Stephen Reichling (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Reichling updated BEAM-1068:

Environment: 
CentOS Linux release 7.1.1503 (Core)
Python 2.7.5

> Service Account Credentials File Specified via Pipeline Option Ignored
> --
>
> Key: BEAM-1068
> URL: https://issues.apache.org/jira/browse/BEAM-1068
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
> Environment: CentOS Linux release 7.1.1503 (Core)
> Python 2.7.5
>Reporter: Stephen Reichling
>Assignee: Frances Perry
>Priority: Minor
>
> When writing a pipeline that authenticates with Google Dataflow APIs using a 
> service account, specifying the path to that service account's credentials 
> file in the {{PipelineOptions}} object passed in to the pipeline does not 
> work, it only works when passed as a command-line flag.
> For example, if I write code like so:
> {code}
> pipelineOptions = options.PipelineOptions()
> gcOptions = pipelineOptions.view_as(options.GoogleCloudOptions)
> gcOptions.service_account_name = 'My Service Account Name'
> gcOptions.service_account_key_file = '/some/path/keyfile.p12'
> pipeline = beam.Pipeline(options=pipelineOptions)
> # ... add stages to the pipeline
> pipeline.run()
> {code}
> and execute it like so:
> {{python ./my_pipeline.py}}
> ...the service account I specify will not be used.
> Only if I were to execute the code like so:
> {{python ./my_pipeline.py --service_account_name 'My Service Account Name' 
> --service_account_key_file /some/path/keyfile.p12}}
> ...does it actually use the service account.
> The problem appears to be rooted in {{auth.py}}, which reconstructs the 
> {{PipelineOptions}} object directly from {{sys.argv}} rather than using the 
> instance passed in to the pipeline: 
> https://github.com/apache/incubator-beam/blob/9ded359daefc6040d61a1f33c77563474fcb09b6/sdks/python/apache_beam/internal/auth.py#L129-L130
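The failure mode described above can be reproduced in miniature: a helper that re-parses the command line (as the linked `auth.py` does with `sys.argv`) never sees attributes set programmatically on the options object. The names below are illustrative stand-ins, not Beam's actual classes.

```python
import argparse


def parse_options(argv):
    """Stand-in for PipelineOptions: parse flags from an argv list."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--service_account_name', default=None)
    return parser.parse_args(argv)


# What the user does: set the option in code on the parsed object.
opts = parse_options([])
opts.service_account_name = 'My Service Account Name'

# What the buggy helper does: rebuild options from the (empty) command
# line, discarding everything set programmatically above.
rebuilt = parse_options([])

assert opts.service_account_name == 'My Service Account Name'
assert rebuilt.service_account_name is None
```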



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1046) Travis (Linux) failing for python sdk

2016-12-01 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712738#comment-15712738
 ] 

Pablo Estrada commented on BEAM-1046:
-

Fixing this with PR #1456. Should be merged soon.

> Travis (Linux) failing for python sdk
> -
>
> Key: BEAM-1046
> URL: https://issues.apache.org/jira/browse/BEAM-1046
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>
> All PRs are failing with the same error:
> An example: https://travis-ci.org/apache/incubator-beam/builds/178435675
> $ if [ "$TEST_PYTHON" ] && ! pip list | grep tox; then travis_retry pip 
> install tox --user `whoami`; fi
> You are using pip version 6.0.8, however version 9.0.1 is available.
> You should consider upgrading via the 'pip install --upgrade pip' command.
> Collecting tox
>   Downloading tox-2.5.0-py2.py3-none-any.whl (42kB)
> 100% || 45kB 6.3MB/s 
> Collecting travis
>   Could not find any downloads that satisfy the requirement travis
>   No distributions at all found for travis
> The command "pip install tox --user travis" failed. Retrying, 2 of 3.
> You are using pip version 6.0.8, however version 9.0.1 is available.
> You should consider upgrading via the 'pip install --upgrade pip' command.
> Collecting tox
>   Using cached tox-2.5.0-py2.py3-none-any.whl
> Collecting travis
>   Could not find any downloads that satisfy the requirement travis
>   No distributions at all found for travis
> The command "pip install tox --user travis" failed. Retrying, 3 of 3.
> You are using pip version 6.0.8, however version 9.0.1 is available.
> You should consider upgrading via the 'pip install --upgrade pip' command.
> Collecting tox
>   Using cached tox-2.5.0-py2.py3-none-any.whl
> Collecting travis
>   Could not find any downloads that satisfy the requirement travis
>   No distributions at all found for travis
> The command "pip install tox --user travis" failed 3 times.
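One plausible reading of the log above: pip's `--user` is a boolean flag, so in `pip install tox --user \`whoami\`` the `whoami` output ("travis" on Travis CI) is parsed as a second requirement name, which pip then fails to download. The argparse model below illustrates that parse; it is a sketch of the behavior, not pip's actual option-handling code.

```python
import argparse

# Model pip's `install` subcommand: positional requirements plus a
# boolean --user flag that takes no value.
parser = argparse.ArgumentParser()
parser.add_argument('packages', nargs='+')
parser.add_argument('--user', action='store_true')

# Equivalent to the failing invocation `pip install tox --user travis`.
args = parser.parse_args(['--user', 'tox', 'travis'])
assert args.user is True
assert args.packages == ['tox', 'travis']  # "travis" became a package name
```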



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1070) Service Account Based Authentication Broken

2016-12-01 Thread Stephen Reichling (JIRA)
Stephen Reichling created BEAM-1070:
---

 Summary: Service Account Based Authentication Broken
 Key: BEAM-1070
 URL: https://issues.apache.org/jira/browse/BEAM-1070
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
 Environment: CentOS Linux release 7.1.1503 (Core) 
Python 2.7.5
Reporter: Stephen Reichling
Assignee: Frances Perry
Priority: Critical


{{sdks/python/apache_beam/internal/auth.py}} calls into the 
{{oauth2client.service_account.ServiceAccountCredentials.from_p12_keyfile}} 
method with invalid and incorrectly-ordered parameters. Compare the [function 
signature of 
ServiceAccountCredentials.from_p12_keyfile|https://github.com/google/oauth2client/blob/ae73312942d3cf0e98f097dfbb40f136c2a7c463/oauth2client/service_account.py#L300-L303]
 with [how it is 
invoked|https://github.com/apache/incubator-beam/blob/9ded359daefc6040d61a1f33c77563474fcb09b6/sdks/python/apache_beam/internal/auth.py#L150-L154].
 This causes a runtime error when one attempts to use a service account to 
authenticate with the Google Dataflow APIs.

The specific problems are:
 - the {{client_scopes}} variable (a list) is passed as a positional parameter 
where the function signature expects the {{private_key_password}} parameter (a 
string).
 - a keyword parameter, {{user_agent}}, is passed but no such parameter is 
defined in the function signature.
 - no value is provided for {{private_key_password}}. All p12 key files for 
service accounts issued by Google Cloud have the password {{notasecret}} as 
documented 
[here|https://support.google.com/cloud/answer/6158849?hl=en#serviceaccounts], 
so it's currently not possible to use a Google-issued p12 key file with this 
implementation. 
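The misbinding can be shown with a stub that mirrors the parameter order of the linked `from_p12_keyfile` signature (`service_account_email, filename, private_key_password=None, scopes=''`). The stub is illustrative only, not the real oauth2client library.

```python
def from_p12_keyfile(service_account_email, filename,
                     private_key_password=None, scopes=''):
    """Stub mirroring the documented oauth2client parameter order."""
    return {'email': service_account_email, 'filename': filename,
            'password': private_key_password, 'scopes': scopes}


client_scopes = ['https://www.googleapis.com/auth/cloud-platform']

# The buggy call shape: the scopes list lands in private_key_password,
# and scopes is never set. (Passing user_agent= would raise TypeError,
# since no such parameter exists.)
bad = from_p12_keyfile('acct@example.com', 'keyfile.p12', client_scopes)
assert bad['password'] == client_scopes  # scopes bound to the wrong slot
assert bad['scopes'] == ''

# A correct call for a Google-issued p12 key, which uses 'notasecret'.
good = from_p12_keyfile('acct@example.com', 'keyfile.p12',
                        private_key_password='notasecret',
                        scopes=client_scopes)
assert good['password'] == 'notasecret'
assert good['scopes'] == client_scopes
```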



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1067) apex.examples.WordCountTest.testWordCountExample may be flaky

2016-12-01 Thread Stas Levin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stas Levin updated BEAM-1067:
-
Description: 
Seems that 
{{org.apache.beam.runners.apex.examples.WordCountTest.testWordCountExample}} is 
flaky.

For example, 
[this|https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/5408/org.apache.beam$beam-runners-apex/testReport/org.apache.beam.runners.apex.examples/WordCountTest/testWordCountExample/
 ] run failed although no changes were made in {{runner-apex}}.

  was:
Seems that 
{{org.apache.beam.runners.apex.examples.WordCountTest.testWordCountExample}} is 
flaky.

For example, 
[this|https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/5408/org.apache.beam$beam-runners-apex/testReport/org.apache.beam.runners.apex.examples/WordCountTest/testWordCountExample/
 ] run failed although no changes were made in {{runners-spark}}.


> apex.examples.WordCountTest.testWordCountExample may be flaky
> -
>
> Key: BEAM-1067
> URL: https://issues.apache.org/jira/browse/BEAM-1067
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Stas Levin
>
> Seems that 
> {{org.apache.beam.runners.apex.examples.WordCountTest.testWordCountExample}} 
> is flaky.
> For example, 
> [this|https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/5408/org.apache.beam$beam-runners-apex/testReport/org.apache.beam.runners.apex.examples/WordCountTest/testWordCountExample/
>  ] run failed although no changes were made in {{runner-apex}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1067) apex.examples.WordCountTest.testWordCountExample may be flaky

2016-12-01 Thread Stas Levin (JIRA)
Stas Levin created BEAM-1067:


 Summary: apex.examples.WordCountTest.testWordCountExample may be 
flaky
 Key: BEAM-1067
 URL: https://issues.apache.org/jira/browse/BEAM-1067
 Project: Beam
  Issue Type: Bug
  Components: runner-apex
Reporter: Stas Levin


Seems that 
{{org.apache.beam.runners.apex.examples.WordCountTest.testWordCountExample}} is 
flaky.

For example, 
[this|https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/5408/org.apache.beam$beam-runners-apex/testReport/org.apache.beam.runners.apex.examples/WordCountTest/testWordCountExample/
 ] run failed although no changes were made in {{runners-spark}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1042) Clean build fails on Windows in Starter archetype

2016-12-01 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712829#comment-15712829
 ] 

Thomas Weise commented on BEAM-1042:


[~jbonofre] I created this after offline discussion with [~kenn] just so we 
have a record. I'm not working on it. If this is already covered by other 
work, please link the JIRAs.

> Clean build fails on Windows in Starter archetype
> -
>
> Key: BEAM-1042
> URL: https://issues.apache.org/jira/browse/BEAM-1042
> Project: Beam
>  Issue Type: Bug
>Reporter: Thomas Weise
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #1482: travis.yml: disable skipping things that ...

2016-12-01 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/1482

travis.yml: disable skipping things that no longer run

R: @kennknowles 

https://travis-ci.org/dhalperi/incubator-beam/builds/180475125 is the link 
on my repo, with (unsurprisingly) 1 network-related flake

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam travis

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1482.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1482


commit bd9b295c27cab04a0619696b22dec82c7203516e
Author: Dan Halperin 
Date:   2016-12-01T18:04:38Z

travis.yml: disable skipping things that no longer run




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-1069) Add CountingInput Transform to python sdk

2016-12-01 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1069:
---

 Summary: Add CountingInput Transform to python sdk
 Key: BEAM-1069
 URL: https://issues.apache.org/jira/browse/BEAM-1069
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Frances Perry
Priority: Minor


Similar to java sdk,  
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CountingInput.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1024) upgrade to protobuf-3.1.0

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1024:
--
Component/s: sdk-java-gcp

> upgrade to protobuf-3.1.0
> -
>
> Key: BEAM-1024
> URL: https://issues.apache.org/jira/browse/BEAM-1024
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-java-gcp
>Reporter: Rafael Fernandez
>Priority: Minor
>
> The SDK currently uses protobuf 3.0.0-beta-1. There are critical improvements 
> to the library since then (such as JsonFormat.parser().ignoringUnknownFields()).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1024) upgrade to protobuf-3.1.0

2016-12-01 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713164#comment-15713164
 ] 

Daniel Halperin commented on BEAM-1024:
---

Lowering to Minor priority -- in general, this issue is not something Beam can 
accomplish on its own, but once our dependencies, or Google as a whole, 
converge on a single, stable, standard version, we can revisit.

> upgrade to protobuf-3.1.0
> -
>
> Key: BEAM-1024
> URL: https://issues.apache.org/jira/browse/BEAM-1024
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-java-gcp
>Reporter: Rafael Fernandez
>Priority: Minor
>
> The SDK currently uses protobuf 3.0.0-beta-1. There are critical improvements 
> to the library since (such as JsonFormat.parser().ignoringUnknownFields()).





[jira] [Updated] (BEAM-824) Misleading error message when sdk_location is missing in python

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-824:
-
Component/s: sdk-py

> Misleading error message when sdk_location is missing in python
> ---
>
> Key: BEAM-824
> URL: https://issues.apache.org/jira/browse/BEAM-824
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Pablo Estrada
>
> When trying to submit jobs to the Cloud Dataflow service using the Python 
> SDK, the sdk_location should be provided or the service errors out saying 
> that package google-cloud-dataflow is missing.
> We might want to prompt users to add the sdk_location parameter.





[jira] [Updated] (BEAM-804) Python Pipeline Option save_main_session non-functional

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-804:
-
Component/s: sdk-py

> Python Pipeline Option save_main_session non-functional
> ---
>
> Key: BEAM-804
> URL: https://issues.apache.org/jira/browse/BEAM-804
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
> Environment: OSX El Capitan, google-cloud-dataflow==0.4.3, python 
> 2.7.12
>Reporter: Zoran Bujila
>Priority: Critical
>
> When trying to use the option --save_main_session a pickling error occurs.
> pickle.PicklingError: Can't pickle <class 
> 'apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum'>:
>  it's not found as 
> apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum
> This prevents the use of this option which is desirable as there is an 
> expensive object that needs to be created on each worker in my pipeline and I 
> would like to have this object created only once per worker. It is not 
> practical to have it inline with the ParDo function unless I make the batch 
> size sent to the ParDo quite large. Doing this seems to lead to idle workers 
> and I would ideally want to bring the batch size way down.
> The "Affects Version" option above doesn't have a 0.4.3 version in the drop 
> down so I did not populate it. However, this was a problem with 0.4.1 and has 
> not been corrected with 0.4.3.
> I don't see where I can attach a file, so here is the entire error.
> 2016-10-24 10:00:16,071 The 
> oauth2client.contrib.multistore_file module has been deprecated and will be 
> removed in the next release of oauth2client. Please migrate to 
> multiprocess_file_storage.
> 2016-10-24 10:00:16,127 __init__ Direct usage of TextFileSink is 
> deprecated. Please use 'textio.WriteToText()' instead of directly 
> instantiating a TextFileSink object.
> Traceback (most recent call last):
>   File "00test.py", line 41, in 
> p.run()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 
> 159, in run
> return self.runner.run(self)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/dataflow_runner.py",
>  line 172, in run
> self.dataflow_client.create_job(self.job))
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/utils/retry.py", 
> line 160, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/internal/apiclient.py", 
> line 375, in create_job
> job.options, file_copy=self._gcs_file_copy)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/utils/dependency.py", 
> line 325, in stage_job_resources
> pickler.dump_session(pickled_session_file)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/internal/pickler.py", 
> line 204, in dump_session
> return dill.dump_session(file_path)
>   File "/usr/local/lib/python2.7/site-packages/dill/dill.py", line 333, in 
> dump_session
> pickler.dump(main)
>   File 
> "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py",
>  line 224, in dump
> self.save(obj)
>   File 
> "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py",
>  line 286, in save
> f(self, obj) # Call unbound method with explicit self
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/internal/pickler.py", 
> line 123, in save_module
> return old_save_module(pickler, obj)
>   File "/usr/local/lib/python2.7/site-packages/dill/dill.py", line 1168, in 
> save_module
> state=_main_dict)
>   File 
> "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py",
>  line 425, in save_reduce
> save(state)
>   File 
> "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py",
>  line 286, in save
> f(self, obj) # Call unbound method with explicit self
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/internal/pickler.py", 
> line 159, in new_save_module_dict
> return old_save_module_dict(pickler, obj)
>   File "/usr/local/lib/python2.7/site-packages/dill/dill.py", line 835, in 
> save_module_dict
> StockPickler.save_dict(pickler, obj)
>   File 
> "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py",
>  line 655, in save_dict
> self._batch_setitems(obj.iteritems())
>   File 
> "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py",
>  line 687, in _batch_setitems
> save(v)
>   File 
> "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py",
>  line 331, in save
> 
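The error class above can be reproduced without Beam: pickle serializes classes by reference (module path plus name), so a class that is not importable at its recorded path, such as one created dynamically inside a function, fails on dump with exactly this kind of PicklingError. A minimal stdlib-only sketch (the names here are illustrative, not Beam's):

```python
import pickle

def make_enum_class():
    # A class created at runtime inside a function is not importable at a
    # module-level path, so pickle's by-reference lookup fails on dump.
    class TypeValueValuesEnum:  # name chosen to echo the traceback above
        pass
    return TypeValueValuesEnum

DynamicEnum = make_enum_class()

try:
    pickle.dumps(DynamicEnum)
    pickling_failed = False
except (pickle.PicklingError, AttributeError):
    pickling_failed = True  # "Can't pickle <class ...>: it's not found as ..."
```

The same lookup failure hits save_main_session when the main session contains references to such dynamically generated classes.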

[jira] [Resolved] (BEAM-860) Move Apache RAT license check out of release profile

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-860.
--
   Resolution: Not A Problem
Fix Version/s: Not applicable

> Move Apache RAT license check out of release profile
> 
>
> Key: BEAM-860
> URL: https://issues.apache.org/jira/browse/BEAM-860
> Project: Beam
>  Issue Type: Improvement
>Reporter: Mitch Shanklin
>Assignee: Mitch Shanklin
>Priority: Minor
> Fix For: Not applicable
>
>
> Currently Apache RAT only checks licenses as a part of the release profile. 
> Since the contributor's guide advises users to run 'mvn clean verify' without 
> the release profile locally, this means that missing licenses on files are 
> caught by Jenkins when they could be caught locally, saving a cycle.
> Since RAT runs quickly, there doesn't seem to be a great reason to keep that 
> in the release profile. Seems similar to checkstyle in many respects, which 
> is not part of the release profile.
> See https://github.com/apache/incubator-beam/pull/1199#issuecomment-256802048 
> for discussion of this issue.





[jira] [Updated] (BEAM-884) Add Display Data to the Python SDK's PipelineOptions, Avro io and other transforms

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-884:
-
Component/s: sdk-py

> Add Display Data to the Python SDK's PipelineOptions, Avro io and other 
> transforms
> --
>
> Key: BEAM-884
> URL: https://issues.apache.org/jira/browse/BEAM-884
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Pablo Estrada
>






[jira] [Updated] (BEAM-877) Allow disabling flattening of results when using BigQuery source

2016-12-01 Thread Sourabh Bajaj (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Bajaj updated BEAM-877:
---
Fix Version/s: Not applicable

> Allow disabling flattening of results when using BigQuery source
> 
>
> Key: BEAM-877
> URL: https://issues.apache.org/jira/browse/BEAM-877
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Sourabh Bajaj
> Fix For: Not applicable
>
>
> Java SDK supports disabling results flattening when creating a BQ source 
> using a query.
> https://github.com/apache/incubator-beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L477
> Python SDK should be updated to support this.
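For background on what flattening does (illustrative plain Python, not the Beam or BigQuery API; `flatten_repeated_field` is a hypothetical helper): each element of a repeated field is expanded into its own output row, which loses the nested structure some pipelines rely on, hence the request to allow disabling it.

```python
def flatten_repeated_field(rows, field):
    """Sketch of BigQuery-style result flattening: every element of a
    repeated field becomes its own output row."""
    out = []
    for row in rows:
        for value in (row.get(field) or [None]):
            flat = dict(row)
            flat[field] = value
            out.append(flat)
    return out

rows = [{"id": 1, "tags": ["a", "b"]}]
flatten_repeated_field(rows, "tags")
# -> [{"id": 1, "tags": "a"}, {"id": 1, "tags": "b"}]
```

Disabling flattening would hand the nested row through unchanged instead.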





[jira] [Updated] (BEAM-614) File compression/decompression should support auto detection

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-614:
-
Component/s: sdk-py

> File compression/decompression should support auto detection
> 
>
> Key: BEAM-614
> URL: https://issues.apache.org/jira/browse/BEAM-614
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Slaven Bilac
>






[jira] [Updated] (BEAM-765) TriggerStateMachine does not need to be separate from OnceTriggerStateMachine

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-765:
-
Component/s: runner-core

> TriggerStateMachine does not need to be separate from OnceTriggerStateMachine
> -
>
> Key: BEAM-765
> URL: https://issues.apache.org/jira/browse/BEAM-765
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Trivial
>
> In runners-core, the state machine implementation of triggers does not need 
> to have the fine-grained type-like enforcement of whether a trigger is a 
> {{OnceTrigger}} or not. It may simplify the code.





[jira] [Updated] (BEAM-792) In WC Walkthrough: Document how logging works on Apache Spark

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-792:
-
Component/s: website

> In WC Walkthrough: Document how logging works on Apache Spark
> -
>
> Key: BEAM-792
> URL: https://issues.apache.org/jira/browse/BEAM-792
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Hadar Hod
>






[jira] [Updated] (BEAM-444) Promote isBlockOnRun() to PipelineOptions.

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-444:
-
Component/s: sdk-java-core

> Promote isBlockOnRun() to PipelineOptions.
> --
>
> Key: BEAM-444
> URL: https://issues.apache.org/jira/browse/BEAM-444
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
>
> Currently, blockOnRun is implemented in different ways by runners.
> DirectRunner does blockOnRun based on DirectOptions.isBlockOnRun.
> Dataflow has a separate BlockingDataflowRunner.
> Flink and Spark runners might or might not block, depending on their 
> implementation of run().
> I think DirectRunner's approach is the right way to go, and the isBlockOnRun 
> option needs to be promoted to the general PipelineOptions.





[jira] [Updated] (BEAM-599) Return KafkaIO getWatermark log in debug mode

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-599:
-
Component/s: sdk-java-extensions

> Return KafkaIO getWatermark log in debug mode
> -
>
> Key: BEAM-599
> URL: https://issues.apache.org/jira/browse/BEAM-599
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Priority: Minor
>
> https://issues.apache.org/jira/browse/BEAM-574 removed the getWatermark log 
> line from KafkaIO.
> PR: https://github.com/apache/incubator-beam/pull/859
> I actually found this log line useful; instead of removing it completely, can 
> we restore it but change the log level to 'debug'?





[jira] [Updated] (BEAM-990) KafkaIO does not commit offsets to Kafka

2016-12-01 Thread Alban Perillat-Merceroz (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alban Perillat-Merceroz updated BEAM-990:
-
Description: 
I use KafkaIO as a source, and I would like consumed offsets to be stored in 
Kafka (in the {{__consumer_offsets}} topic).

I'm configuring the Kafka reader with 
{code:java}
.updateConsumerProperties(ImmutableMap.of(
  ConsumerConfig.GROUP_ID_CONFIG, "my-group",
  ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, java.lang.Boolean.TRUE,
  ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "10" // doesn't work with default value either (5000ms)
))
{code}

But the offsets are not stored in Kafka (nothing in {{__consumer_offsets}}, 
next job will restart at latest offset).

I can't find in the code where the offsets are supposed to be committed.

I tried to add a manual commit in the {{consumerPollLoop()}} method, and it 
works, offsets are committed:

{code:java}
private void consumerPollLoop() {
    // Read in a loop and enqueue the batch of records, if any, to availableRecordsQueue
    while (!closed.get()) {
        try {
            ConsumerRecords<byte[], byte[]> records =
                consumer.poll(KAFKA_POLL_TIMEOUT.getMillis());
            if (!records.isEmpty() && !closed.get()) {
                availableRecordsQueue.put(records); // blocks until dequeued.
                // Manual commit
                consumer.commitSync();
            }
        } catch (InterruptedException e) {
            LOG.warn("{}: consumer thread is interrupted", this, e); // not expected
            break;
        } catch (WakeupException e) {
            break;
        }
    }

    LOG.info("{}: Returning from consumer pool loop", this);
}
{code}

Is this a bug in KafkaIO or am I misconfiguring something?

Disclaimer: I'm currently using KafkaIO in Dataflow, using the backport in 
Dataflow SDK 
(https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/contrib/kafka/src/main/java/com/google/cloud/dataflow/contrib/kafka/KafkaIO.java),
 but I'm confident the code is similar for this case.

Edit: I found the correct method where KafkaIO is supposed to commit at the end 
of a batch. I'm currently testing it and will be able to open a pull request 
soon:

{code:java}
// KafkaCheckpointMark.java

/**
 * Optional consumer that will be used to commit offsets into Kafka when
 * finalizeCheckpoint() is called.
 */
@Nullable
private final Consumer<byte[], byte[]> consumer;

public KafkaCheckpointMark(List<PartitionMark> partitions,
        @Nullable Consumer<byte[], byte[]> consumer) {
    this.partitions = partitions;
    this.consumer = consumer;
}

/**
 * Commit synchronously into Kafka offsets that have been passed downstream.
 */
@Override
public void finalizeCheckpoint() throws IOException {
    if (consumer == null) {
        LOG.warn("finalizeCheckpoint(): no consumer provided, will not commit anything.");
        return;
    }
    if (partitions.size() == 0) {
        LOG.info("finalizeCheckpoint(): nothing to commit to Kafka.");
        return;
    }

    final Map<TopicPartition, OffsetAndMetadata> offsets = newHashMap();
    String committedOffsets = "";
    for (PartitionMark partition : partitions) {
        TopicPartition topicPartition = partition.getTopicPartition();
        offsets.put(topicPartition, new OffsetAndMetadata(partition.offset));
        committedOffsets += topicPartition.topic() + "-" + topicPartition.partition()
            + ":" + partition.offset + ",";
    }

    final String printableOffsets = committedOffsets.substring(0,
        committedOffsets.length() - 1);
    try {
        consumer.commitSync(offsets);
        LOG.info("finalizeCheckpoint(): committed Kafka offsets {}", printableOffsets);
    } catch (Exception e) {
        LOG.error("finalizeCheckpoint(): {} when trying to commit Kafka offsets [{}]",
            e.getClass().getSimpleName(),
            printableOffsets);
    }
}
{code}

  was:
I use KafkaIO as a source, and I would like consumed offsets to be stored in 
Kafka (in the {{__consumer_offsets}} topic).

I'm configuring the Kafka reader with 
{code:java}
.updateConsumerProperties(ImmutableMap.of(
  ConsumerConfig.GROUP_ID_CONFIG, "my-group",
  ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, java.lang.Boolean.TRUE,
  ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "10" // doesn't work with default value either (5000ms)
))
{code}

But the offsets are not stored in Kafka (nothing in {{__consumer_offsets}}, 
next job will restart at latest offset).

I can't find in the code where the offsets are supposed to be committed.

I tried to add a 

[jira] [Closed] (BEAM-952) Dynamically load and register IOChannelFactory

2016-12-01 Thread Pei He (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pei He closed BEAM-952.
---
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Dynamically load and register IOChannelFactory
> --
>
> Key: BEAM-952
> URL: https://issues.apache.org/jira/browse/BEAM-952
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core, sdk-java-gcp
>Reporter: Pei He
>Assignee: Pei He
> Fix For: 0.4.0-incubating
>
>
> It should follow the pattern of registering PipelineRunner, and use 
> AutoService and ServiceLoader.
> This would make it easy for runners to dynamically load user-provided 
> IOChannelFactories.
> This is preparatory work for the IOChannelFactory redesign.





[jira] [Commented] (BEAM-646) Get runners out of the apply()

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713057#comment-15713057
 ] 

ASF GitHub Bot commented on BEAM-646:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1469


> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.





[jira] [Commented] (BEAM-1072) Dataflow should reject unset versions

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713105#comment-15713105
 ] 

ASF GitHub Bot commented on BEAM-1072:
--

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/1483

[BEAM-1072] DataflowRunner: reject job submission when the version has not 
been properly set

R: @davorbonaci 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam 
dataflow-runner-require-version

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1483


commit 6fd0782fbb74d65ee7a535f721804a38badfa1d9
Author: Dan Halperin 
Date:   2016-12-01T19:21:30Z

DataflowRunner: reject job submission when the version has not been 
properly set




> Dataflow should reject unset versions
> -
>
> Key: BEAM-1072
> URL: https://issues.apache.org/jira/browse/BEAM-1072
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: Not applicable
>
>
> A recent change broke the setting of the SDK version in certain execution 
> modes. If it happens again, it will cause a poor user experience when running 
> in the Dataflow runner.
> We should catch this break in the DataflowRunner and prevent job submission 
> if so.





[GitHub] incubator-beam pull request #1483: [BEAM-1072] DataflowRunner: reject job su...

2016-12-01 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/1483

[BEAM-1072] DataflowRunner: reject job submission when the version has not 
been properly set

R: @davorbonaci 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam 
dataflow-runner-require-version

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1483


commit 6fd0782fbb74d65ee7a535f721804a38badfa1d9
Author: Dan Halperin 
Date:   2016-12-01T19:21:30Z

DataflowRunner: reject job submission when the version has not been 
properly set






[GitHub] incubator-beam pull request #1484: Migrate TransformTreeNode to an Inner Cla...

2016-12-01 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/1484

Migrate TransformTreeNode to an Inner Class

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
transform_hierarchy_maintenance_internally

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1484.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1484


commit 1fab0da2bdfb7675fc3aa5f0faf81e2dcfe867fb
Author: Thomas Groh 
Date:   2016-12-01T21:19:14Z

Migrate TransformTreeNode to an Inner Class

TransformTreeNode requires access to the hierarchy it is contained
within, and generally cannot be separated from TransformHierarchy. It is
primarily an implementation detail of TransformHierarchy, so can be
relocated to within it.

commit 3d44a4a0709fbe1ebb25d0dea7f927415fa370a1
Author: Thomas Groh 
Date:   2016-12-01T21:22:11Z

Reduce the visibility of TransformHierarchy Node Mutators

These mutators should not be accessible when visiting the nodes.






[jira] [Updated] (BEAM-1071) Support pre-existing tables with streaming BigQueryIO

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1071:
--
Component/s: sdk-java-gcp

> Support pre-existing tables with streaming BigQueryIO
> -
>
> Key: BEAM-1071
> URL: https://issues.apache.org/jira/browse/BEAM-1071
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Sam McVeety
>Priority: Minor
>
> Specifically, with a tableRef function, CREATE_NEVER should be allowed for 
> pre-existing tables.





[jira] [Updated] (BEAM-1040) Hadoop InputFormat - IO Transform for reads

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1040:
--
Component/s: sdk-java-extensions

> Hadoop InputFormat - IO Transform for reads
> ---
>
> Key: BEAM-1040
> URL: https://issues.apache.org/jira/browse/BEAM-1040
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
>
> We should build a IO read transform that will read data from services 
> supporting the Hadoop InputFormat  interface [1]
> This will make it easy to connect to a variety of databases while still 
> providing some aspects of scalability since the InputFormat interface 
> provides for parallel reading. 
> [1] 
> https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/mapred/InputFormat.html
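The parallel-reading claim rests on InputFormat's contract: getSplits() partitions the input, and each split is then read by an independent RecordReader. A plain-Python sketch of that split computation (`compute_splits` is a hypothetical helper, not the Hadoop or Beam API):

```python
def compute_splits(num_records, desired_splits):
    """Partition [0, num_records) into contiguous (start, end) ranges,
    mirroring what InputFormat.getSplits() does for a data source, so
    each range can be read by a separate worker in parallel."""
    splits = []
    base, extra = divmod(num_records, desired_splits)
    start = 0
    for i in range(desired_splits):
        size = base + (1 if i < extra else 0)
        if size == 0:  # fewer records than requested splits
            break
        splits.append((start, start + size))
        start += size
    return splits

compute_splits(10, 3)
# -> [(0, 4), (4, 7), (7, 10)]
```

Any store exposing such a split/reader pair can then be consumed in parallel by the proposed transform.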





[jira] [Updated] (BEAM-1027) Hosting data stores to enable IO Transform testing

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1027:
--
Component/s: testing

> Hosting data stores to enable IO Transform testing
> --
>
> Key: BEAM-1027
> URL: https://issues.apache.org/jira/browse/BEAM-1027
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
>
> Currently we have a good set of unit tests for our IO Transforms - those
> tend to run against in-memory versions of the data stores. However, we'd
> like to further increase our test coverage to include running them against
> real instances of the data stores that the IO Transforms work against (e.g.
> cassandra, mongodb, kafka, etc…), which means we'll need to have real
> instances of various data stores.
> Additionally, if we want to do performance regression detection, it's
> important to have instances of the services that behave realistically,
> which isn't true of in-memory or dev versions of the services.
> My proposed solution is in 
> https://lists.apache.org/thread.html/367fd9669411f21c9ec1f2d27df60464f49d5ce81e6bd16de401d035@%3Cdev.beam.apache.org%3E
>  
> - it still needs further discussion, and (assuming we agree on the general 
> idea), the beam community needs to decide which cluster management software 
> we want to use.





[jira] [Updated] (BEAM-1042) Clean build fails on Windows in Starter archetype

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1042:
--
Component/s: sdk-java-core

> Clean build fails on Windows in Starter archetype
> -
>
> Key: BEAM-1042
> URL: https://issues.apache.org/jira/browse/BEAM-1042
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Weise
>






[jira] [Updated] (BEAM-985) Retry decorator maintains state as uses an iterator

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-985:
-
Component/s: sdk-py

> Retry decorator maintains state as uses an iterator
> ---
>
> Key: BEAM-985
> URL: https://issues.apache.org/jira/browse/BEAM-985
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Sourabh Bajaj
> Fix For: 0.4.0-incubating
>
>
> https://github.com/apache/incubator-beam/commit/57c30c752a524a40c7074ea69541964c77f22748
> shows two unittests that fail due to state shared by the retry decorator 
> iterator.
> We have two options in fixing this:
> 1) Fix the retry decorator
> 2) Use an external package such as https://pypi.python.org/pypi/retrying
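A stdlib-only sketch of the failure mode being described (the decorator below is hypothetical, not Beam's actual retry module): if the decorator builds its delay iterator once, at decoration time, every call to the wrapped function shares, and permanently drains, that iterator.

```python
def retry_on_value_error(delays):
    """Hypothetical retry decorator illustrating the bug: the delay
    iterator is built once, so calls share mutable retry state."""
    delay_iter = iter(delays)  # BUG: shared across all calls

    def decorator(fn):
        def wrapper(*args, **kwargs):
            for _delay in delay_iter:  # each attempt consumes shared state
                try:
                    return fn(*args, **kwargs)
                except ValueError:
                    continue
            raise RuntimeError("retries exhausted")
        return wrapper
    return decorator

attempts = {"n": 0}

@retry_on_value_error([0, 0, 0])
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ValueError("transient")
    return "ok"

first = flaky()  # succeeds on the 3rd attempt, draining the shared iterator
# A second flaky() call now raises RuntimeError immediately, even though
# the function itself would succeed: the iterator is already exhausted.
```

Option 1 from the issue amounts to moving `iter(delays)` inside `wrapper` so each call gets fresh retry state; option 2 replaces the homegrown decorator with an external package.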





[jira] [Updated] (BEAM-987) TestStream.advanceWatermarkToInfinity should perhaps also advance processing time

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-987:
-
Component/s: testing
 runner-direct

> TestStream.advanceWatermarkToInfinity should perhaps also advance processing 
> time
> -
>
> Key: BEAM-987
> URL: https://issues.apache.org/jira/browse/BEAM-987
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct, testing
>Reporter: Eugene Kirpichov
>Assignee: Thomas Groh
>
> I ran into this when writing a test for Splittable DoFn whose input was a 
> TestStream. I constructed a TestStream that didn't call 
> advanceProcessingTime, and as a result, the SDF's timers didn't fire and the 
> test got stuck.
> I think the meaning of "advanceWatermarkToInfinity" is "don't add any more 
> elements to the stream and see what happens eventually", and "eventually" 
> includes "eventually in processing time domain", not just in event-time 
> domain (watermark).





[jira] [Updated] (BEAM-964) Investigate exporting BQ as Avro instead of Json for dataflow runner

2016-12-01 Thread Sourabh Bajaj (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Bajaj updated BEAM-964:
---
Fix Version/s: Not applicable
  Component/s: (was: sdk-java-gcp)
   sdk-py

> Investigate exporting BQ as Avro instead of Json for dataflow runner
> --
>
> Key: BEAM-964
> URL: https://issues.apache.org/jira/browse/BEAM-964
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Sourabh Bajaj
>Priority: Minor
> Fix For: Not applicable
>
>






[jira] [Updated] (BEAM-850) Add Display Data to the Python Transforms, Sinks, Sources, etc.

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-850:
-
Component/s: sdk-py

> Add Display Data to the Python Transforms, Sinks, Sources, etc.
> ---
>
> Key: BEAM-850
> URL: https://issues.apache.org/jira/browse/BEAM-850
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Pablo Estrada
>






[jira] [Closed] (BEAM-754) WordCountIT Flake -- Incorrect checksum

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-754.

   Resolution: Fixed
Fix Version/s: Not applicable

> WordCountIT Flake -- Incorrect checksum
> ---
>
> Key: BEAM-754
> URL: https://issues.apache.org/jira/browse/BEAM-754
> Project: Beam
>  Issue Type: Bug
>Reporter: Jason Kuster
>Assignee: Mark Liu
>Priority: Minor
> Fix For: Not applicable
>
>
> WordCountIT flaked in Jenkins PostCommit -- 
> https://builds.apache.org/job/beam_PostCommit_MavenVerify/org.apache.beam$beam-examples-java/1532/testReport/junit/org.apache.beam.examples/WordCountIT/
>  -- due to an incorrect checksum. Can we add some additional debug output so 
> we can understand what happened in these cases?





[jira] [Updated] (BEAM-766) Trigger AST should use @AutoValue

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-766:
-
Component/s: sdk-java-core

> Trigger AST should use @AutoValue
> -
>
> Key: BEAM-766
> URL: https://issues.apache.org/jira/browse/BEAM-766
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Trivial
>  Labels: easy, starter
>
> Triggers are already enriched POJOs that could use {{@AutoValue}}. Certainly 
> once they are converted to a simple AST this is the way to go.





[jira] [Updated] (BEAM-750) Remove StateTag

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-750:
-
Component/s: sdk-java-core

> Remove StateTag
> ---
>
> Key: BEAM-750
> URL: https://issues.apache.org/jira/browse/BEAM-750
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>
> https://github.com/apache/incubator-beam/pull/1044 made StateTag in some 
> cases ignore the state kind (user/system). This is currently harmless, 
> because there is no user-facing API for defining state anyway, so conflicts 
> between user/system state can't happen.
> [~kenn] says this should be solved by removing StateTag entirely (along with 
> state kinds).





[jira] [Updated] (BEAM-985) Retry decorator maintains state as uses an iterator

2016-12-01 Thread Sourabh Bajaj (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Bajaj updated BEAM-985:
---
Fix Version/s: (was: 0.4.0-incubating)
   Not applicable

> Retry decorator maintains state as uses an iterator
> ---
>
> Key: BEAM-985
> URL: https://issues.apache.org/jira/browse/BEAM-985
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Sourabh Bajaj
> Fix For: Not applicable
>
>
> https://github.com/apache/incubator-beam/commit/57c30c752a524a40c7074ea69541964c77f22748
> shows two unittests that fail due to state shared by the retry decorator 
> iterator.
> We have two options in fixing this:
> 1) Fix the retry decorator
> 2) Use an external package such as https://pypi.python.org/pypi/retrying
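The failure mode described above can be distilled into a minimal sketch (hypothetical decorator, not Beam's actual `retry` module): if the retry budget is an iterator created once at decoration time, its state leaks across calls; creating it per invocation fixes the bug.

```python
def retry_broken(max_retries):
    """Buggy pattern: the retry budget iterator is created once, at
    decoration time, so its state is shared across every call."""
    budget = iter(range(max_retries))  # BUG: shared across calls
    def deco(fn):
        def wrapper(*args, **kwargs):
            while True:
                try:
                    return fn(*args, **kwargs)
                except ValueError as exc:
                    try:
                        next(budget)  # consume one retry (sleep elided)
                    except StopIteration:
                        raise exc     # budget exhausted
        return wrapper
    return deco

def retry_fixed(max_retries):
    """Fix: build a fresh iterator per invocation, so each call gets
    the full retry budget."""
    def deco(fn):
        def wrapper(*args, **kwargs):
            budget = iter(range(max_retries))  # per-call state
            while True:
                try:
                    return fn(*args, **kwargs)
                except ValueError as exc:
                    try:
                        next(budget)
                    except StopIteration:
                        raise exc
        return wrapper
    return deco
```

With `retry_broken`, a first successful-after-retries call drains the shared budget, so a second call fails immediately; `retry_fixed` succeeds on both.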





[jira] [Resolved] (BEAM-803) Maven configuration that easily launches examples IT tests on one specific runner

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-803.
--
   Resolution: Fixed
Fix Version/s: Not applicable

This is now done via {{-P}}. See the examples/java/pom.xml for more 
info.

> Maven configuration that easily launches examples IT tests on one specific 
> runner
> -
>
> Key: BEAM-803
> URL: https://issues.apache.org/jira/browse/BEAM-803
> Project: Beam
>  Issue Type: Wish
>Reporter: Kenneth Knowles
>Assignee: Jason Kuster
>Priority: Minor
> Fix For: Not applicable
>
>
> Today, there is {{-Pjenkins-precommit}} that activates separate executions 
> for each of the runners, but no easy way to invoke just one of those 
> executions that I can discern.
> The most promising command that I can come up with to run, for example, the 
> Flink wordcount integration test, is {{mvn 
> failsafe:integration-test@flink-runner-integration-tests -Pjenkins-precommit 
> -pl examples/java/}} but this fails due to runner registrar issues. Ideally, 
> this would be a fail-proof one-liner.
> Any tips?





[jira] [Updated] (BEAM-683) Make BZIP compressed files splittable

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-683:
-
Component/s: sdk-py
 sdk-java-core

> Make BZIP compressed files splittable 
> --
>
> Key: BEAM-683
> URL: https://issues.apache.org/jira/browse/BEAM-683
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-java-core, sdk-py
>Reporter: Tim Sears
>Priority: Minor
>   Original Estimate: 10h
>  Remaining Estimate: 10h
>
> Bzip2 compresses data as a sequence of blocks, so dynamic splitting should be 
> possible. To do this: seek to a location in the bzip2 file, then keep scanning 
> until you find the 6-byte block-start sequence 0x314159265359 (the first 12 
> decimal digits of pi). A bzip2 decompressor can then be run from that 
> point onwards.
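The scanning step above can be sketched as a byte-level search (illustrative only; the function name is hypothetical, and in a real .bz2 stream blocks are bit-aligned rather than byte-aligned, so a production splitter must scan at bit granularity and verify candidates by attempting decompression):

```python
def find_bzip2_block_magic(data, start=0):
    """Return the byte offset of the next bzip2 block-start magic at or
    after `start`, or -1 if absent.

    Byte-aligned approximation only: real bzip2 block boundaries can fall
    at arbitrary bit offsets within a byte.
    """
    BLOCK_MAGIC = bytes.fromhex("314159265359")  # digits of pi, 3.14159265359
    return data.find(BLOCK_MAGIC, start)
```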





[jira] [Updated] (BEAM-494) All E2E Tests Run Against All Runners In Postcommit

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-494:
-
Component/s: testing

> All E2E Tests Run Against All Runners In Postcommit
> ---
>
> Key: BEAM-494
> URL: https://issues.apache.org/jira/browse/BEAM-494
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>






[jira] [Updated] (BEAM-511) Fill in the contribute/technical-vision section of the website

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-511:
-
Component/s: website

> Fill in the contribute/technical-vision section of the website
> --
>
> Key: BEAM-511
> URL: https://issues.apache.org/jira/browse/BEAM-511
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Frances Perry
>
> As per 
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit





[jira] [Updated] (BEAM-521) Execute some bounded source reads via composite transform

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-521:
-
Component/s: runner-core

> Execute some bounded source reads via composite transform
> -
>
> Key: BEAM-521
> URL: https://issues.apache.org/jira/browse/BEAM-521
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Eugene Kirpichov
>
> The BoundedSource API is intended for cases where the source can provide 
> meaningful progress, dynamic splitting, and size estimation. E.g. it's a good 
> fit for processing a moderate number of large files, or a key-value table.
> However, existing runners have scalability limitations on how many bundles a 
> BoundedSource can split into, and this leads to it being a very poor fit for 
> the case of processing many small files: the source ends up splitting into 
> too large a number of bundles (at least one per file), overwhelming the runner.
> This is a frequent use case, and the power of BoundedSource API is not needed 
> in this case: small files don't need to be dynamically split, progress 
> estimation is not needed, and size estimation is a "nice-to-have" but not 
> entirely necessary.
> In this case, it'd be better to execute the read not as a raw 
> Read.from(BoundedSource) executed natively by the runner, but as a 
> ParDo(splitIntoBundles) + fusion break + ParDo(read each bundle). That way 
> the bundles end up as a simple PCollection with no scalability limitations, 
> and most likely much smaller per-bundle overhead.
> Implementation options:
> - The BoundedSource API could provide a hint method telling Read.from() to 
> expand in this way
> - Individual connectors, such as TextIO.Read, could switch between expanding 
> into Read.from() or into this composite transform depending on parameters 
> (e.g. TextIO.Read.withCompressionType(GZ) would always expand into the 
> composite transform, because for compressed files BoundedSource API is 
> unnecessary)
> - Something else?
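The proposed expansion can be pictured with a plain-Python sketch (function names are hypothetical; in an actual pipeline the two stages would be ParDos separated by a fusion break such as a reshuffle, and the "filesystem" would be real file I/O):

```python
import fnmatch

# Stage 1: a cheap ParDo that expands the file pattern into per-file
# "bundles" -- no BoundedSource splitting involved.
def split_into_bundles(file_pattern, filesystem):
    return sorted(f for f in filesystem if fnmatch.fnmatch(f, file_pattern))

# (In the real pipeline, a fusion break -- e.g. a reshuffle -- would sit
# here so the runner can rebalance bundles across workers.)

# Stage 2: another ParDo that reads one bundle at a time.
def read_bundle(path, filesystem):
    return filesystem[path].splitlines()

def composite_read(file_pattern, filesystem):
    """Expand + read, mimicking ParDo(splitIntoBundles) -> ParDo(read)."""
    out = []
    for bundle in split_into_bundles(file_pattern, filesystem):
        out.extend(read_bundle(bundle, filesystem))
    return out
```

Because the bundles are an ordinary collection of paths rather than source splits, there is no per-runner cap on how many of them a read can produce.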





[jira] [Assigned] (BEAM-719) Run WindowedWordCount Integration Test in Spark

2016-12-01 Thread Mark Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu reassigned BEAM-719:
-

Assignee: Mark Liu  (was: Amit Sela)

> Run WindowedWordCount Integration Test in Spark
> ---
>
> Key: BEAM-719
> URL: https://issues.apache.org/jira/browse/BEAM-719
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> The purpose of running WindowedWordCountIT in Spark is to have a streaming 
> test pipeline running in Jenkins pre-commit using TestSparkRunner.
> More discussion happened here:
> https://github.com/apache/incubator-beam/pull/1045#issuecomment-251531770





[jira] [Closed] (BEAM-894) Using @Teardown to remove temp files from failed bundles in Write.WriteBundles

2016-12-01 Thread Pei He (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pei He closed BEAM-894.
---
   Resolution: Won't Fix
Fix Version/s: Not applicable

> Using @Teardown to remove temp files from failed bundles in Write.WriteBundles
> --
>
> Key: BEAM-894
> URL: https://issues.apache.org/jira/browse/BEAM-894
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
> Fix For: Not applicable
>
>
> FileBasedSink leaves temp files behind for failed bundles, and it forces 
> finalize() to depend on pattern matching.
> However, pattern matching is not always reliable on eventually consistent 
> file systems, such as GCS.
> Given we now have DoFn.TearDown, we can improve FileBasedSink (and in general 
> Write transform) to remove temp files/resources early when DoFn bundles fail.





[jira] [Closed] (BEAM-895) Transport.newStorageClient requires credentials

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-895.

Resolution: Fixed

> Transport.newStorageClient requires credentials
> ---
>
> Key: BEAM-895
> URL: https://issues.apache.org/jira/browse/BEAM-895
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Frances Perry
>Assignee: Pei He
> Fix For: 0.4.0-incubating
>
>
> Transport.newStorageClient requires credentials, even if those aren't needed.
> Impact: Examples use publicly accessible files on Google Cloud Storage; 
> however, reading those still requires the user to authenticate with Google 
> Cloud Storage.
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Unable to get application default 
> credentials. Please see 
> https://developers.google.com/accounts/docs/application-default-credentials 
> for details on how to specify credentials. This version of the SDK is 
> dependent on the gcloud core component version 2015.02.05 or newer to be able 
> to get credentials from the currently authorized user via gcloud auth.
>   at 
> org.apache.beam.sdk.util.Credentials.getCredential(Credentials.java:123)
>   at 
> org.apache.beam.sdk.util.GcpCredentialFactory.getCredential(GcpCredentialFactory.java:43)
>   at 
> org.apache.beam.sdk.options.GcpOptions$GcpUserCredentialsFactory.create(GcpOptions.java:264)
>   at 
> org.apache.beam.sdk.options.GcpOptions$GcpUserCredentialsFactory.create(GcpOptions.java:254)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:549)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:490)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:152)
>   at com.sun.proxy.$Proxy52.getGcpCredential(Unknown Source)
>   at 
> org.apache.beam.sdk.util.Transport.newStorageClient(Transport.java:148)
>   at 
> org.apache.beam.sdk.util.GcsUtil$GcsUtilFactory.create(GcsUtil.java:96)
>   at 
> org.apache.beam.sdk.util.GcsUtil$GcsUtilFactory.create(GcsUtil.java:84)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:549)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:490)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:152)
>   at com.sun.proxy.$Proxy52.getGcsUtil(Unknown Source)
>   at 
> org.apache.beam.sdk.util.GcsIOChannelFactory.match(GcsIOChannelFactory.java:43)
>   at org.apache.beam.sdk.io.TextIO$Read$Bound.apply(TextIO.java:283)
>   at org.apache.beam.sdk.io.TextIO$Read$Bound.apply(TextIO.java:195)
>   at 
> org.apache.beam.sdk.runners.PipelineRunner.apply(PipelineRunner.java:76)
>   at 
> org.apache.beam.runners.direct.DirectRunner.apply(DirectRunner.java:226)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:400)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:323)
>   at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:58)
>   at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:173)
>   at org.apache.beam.examples.WordCount.main(WordCount.java:195)
>   ... 6 more
> Caused by: java.io.IOException: The Application Default Credentials are not 
> available. They are available if running on Google App Engine, Google Compute 
> Engine, or Google Cloud Shell. Otherwise, the environment variable 
> GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining 
> the credentials. See 
> https://developers.google.com/accounts/docs/application-default-credentials 
> for more information.
>   at 
> com.google.api.client.googleapis.auth.oauth2.DefaultCredentialProvider.getDefaultCredential(DefaultCredentialProvider.java:98)
>   at 
> com.google.api.client.googleapis.auth.oauth2.GoogleCredential.getApplicationDefault(GoogleCredential.java:213)
>   at 
> com.google.api.client.googleapis.auth.oauth2.GoogleCredential.getApplicationDefault(GoogleCredential.java:191)
>   at 
> org.apache.beam.sdk.util.Credentials.getCredential(Credentials.java:121)
>   ... 30 more




[jira] [Created] (BEAM-1071) Support pre-existing tables with streaming BigQueryIO

2016-12-01 Thread Sam McVeety (JIRA)
Sam McVeety created BEAM-1071:
-

 Summary: Support pre-existing tables with streaming BigQueryIO
 Key: BEAM-1071
 URL: https://issues.apache.org/jira/browse/BEAM-1071
 Project: Beam
  Issue Type: Improvement
Reporter: Sam McVeety
Priority: Minor


Specifically, with a tableRef function, CREATE_NEVER should be allowed for 
pre-existing tables.





[2/3] incubator-beam git commit: Revert "Move resource filtering later to avoid spurious rebuilds"

2016-12-01 Thread dhalperi
Revert "Move resource filtering later to avoid spurious rebuilds"

This reverts commit 2422365719c71cade97e1e74f1fb7f42b264244f.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/b36048bd
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/b36048bd
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/b36048bd

Branch: refs/heads/master
Commit: b36048bd0e558fea281a1ec42aa8435db09dbe64
Parents: 1094fa6
Author: Dan Halperin 
Authored: Thu Dec 1 10:22:15 2016 -0800
Committer: Dan Halperin 
Committed: Thu Dec 1 13:10:56 2016 -0800

--
 sdks/java/core/pom.xml | 29 +++--
 1 file changed, 7 insertions(+), 22 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b36048bd/sdks/java/core/pom.xml
--
diff --git a/sdks/java/core/pom.xml b/sdks/java/core/pom.xml
index f842be7..ad84846 100644
--- a/sdks/java/core/pom.xml
+++ b/sdks/java/core/pom.xml
@@ -40,6 +40,13 @@
   
 
   
+
+  
+src/main/resources
+true
+  
+
+
 
   
 
@@ -74,28 +81,6 @@
 
   
 org.apache.maven.plugins
-maven-resources-plugin
-
-  
-resources
-compile
-
-  resources
-
-
-  
-
-  src/main/resources
-  true
-
-  
-
-  
-
-  
-
-  
-org.apache.maven.plugins
 maven-jar-plugin
   
 



[jira] [Updated] (BEAM-1024) upgrade to protobuf-3.1.0

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1024:
--
Priority: Minor  (was: Major)

> upgrade to protobuf-3.1.0
> -
>
> Key: BEAM-1024
> URL: https://issues.apache.org/jira/browse/BEAM-1024
> Project: Beam
>  Issue Type: Wish
>Reporter: Rafael Fernandez
>Priority: Minor
>
> The SDK currently uses protobuf 3.0.0-beta-1. There have been critical 
> improvements to the library since then (such as 
> JsonFormat.parser().ignoringUnknownFields()).





[jira] [Updated] (BEAM-1064) Convert Jenkins jobs to DSL

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1064:
--
Component/s: testing

> Convert Jenkins jobs to DSL
> ---
>
> Key: BEAM-1064
> URL: https://issues.apache.org/jira/browse/BEAM-1064
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>  Labels: jenkins
>
> Move Jenkins jobs to DSL. PR is here:
> https://github.com/apache/incubator-beam/pull/1390





[jira] [Updated] (BEAM-1060) Make DoFnTester use new DoFn

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1060:
--
Component/s: sdk-java-core

> Make DoFnTester use new DoFn
> 
>
> Key: BEAM-1060
> URL: https://issues.apache.org/jira/browse/BEAM-1060
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>






[jira] [Updated] (BEAM-1055) Display Data keys on Python are inconsistent

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1055:
--
Component/s: sdk-py

> Display Data keys on Python are inconsistent
> 
>
> Key: BEAM-1055
> URL: https://issues.apache.org/jira/browse/BEAM-1055
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Pablo Estrada
>
> Some are in camelCase, some are in snake_case.





[GitHub] incubator-beam pull request #1395: Fixing error with PipelineOptions Display...

2016-12-01 Thread pabloem
Github user pabloem closed the pull request at:

https://github.com/apache/incubator-beam/pull/1395


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (BEAM-989) GcsUtil batch remove() and copy() need tests.

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-989:
-
Component/s: sdk-java-gcp

> GcsUtil batch remove() and copy() need tests.
> -
>
> Key: BEAM-989
> URL: https://issues.apache.org/jira/browse/BEAM-989
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-gcp
>Reporter: Pei He
>Assignee: Pei He
>
> remove(), copy(), and their callbacks are not tested with executeBatches().





[jira] [Updated] (BEAM-994) Support for S3 file source and sink

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-994:
-
Component/s: sdk-java-extensions

> Support for S3 file source and sink
> ---
>
> Key: BEAM-994
> URL: https://issues.apache.org/jira/browse/BEAM-994
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Sourabh Bajaj
>Priority: Minor
>
> http://stackoverflow.com/questions/40624544/what-is-best-practice-of-the-the-case-of-writing-text-output-into-s3-bucket
>  is one of the examples of the need for such a feature. 





[jira] [Updated] (BEAM-990) KafkaIO does not commit offsets to Kafka

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-990:
-
Component/s: sdk-java-extensions

> KafkaIO does not commit offsets to Kafka
> 
>
> Key: BEAM-990
> URL: https://issues.apache.org/jira/browse/BEAM-990
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Alban Perillat-Merceroz
>  Labels: KafkaIO
>
> I use KafkaIO as a source, and I would like consumed offsets to be stored in 
> Kafka (in the {{__consumer_offsets}} topic).
> I'm configuring the Kafka reader with 
> {code:java}
> .updateConsumerProperties(ImmutableMap.of(
>     ConsumerConfig.GROUP_ID_CONFIG, "my-group",
>     ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, java.lang.Boolean.TRUE,
>     ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "10" // doesn't work with default value either (5000ms)
> ))
> {code}
> But the offsets are not stored in Kafka (nothing in {{__consumer_offsets}}, 
> next job will restart at latest offset).
> I can't find in the code where the offsets are supposed to be committed.
> I tried to add a manual commit in the {{consumerPollLoop()}} method, and it 
> works, offsets are committed:
> {code:java}
> private void consumerPollLoop() {
>   // Read in a loop and enqueue the batch of records, if any, to availableRecordsQueue
>   while (!closed.get()) {
>     try {
>       ConsumerRecords records = consumer.poll(KAFKA_POLL_TIMEOUT.getMillis());
>       if (!records.isEmpty() && !closed.get()) {
>         availableRecordsQueue.put(records); // blocks until dequeued.
>         // Manual commit
>         consumer.commitSync();
>       }
>     } catch (InterruptedException e) {
>       LOG.warn("{}: consumer thread is interrupted", this, e); // not expected
>       break;
>     } catch (WakeupException e) {
>       break;
>     }
>   }
>   LOG.info("{}: Returning from consumer pool loop", this);
> }
> {code}
> Is this a bug in KafkaIO or am I misconfiguring something?
> Disclaimer: I'm currently using KafkaIO in Dataflow, using the backport in 
> Dataflow SDK 
> (https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/contrib/kafka/src/main/java/com/google/cloud/dataflow/contrib/kafka/KafkaIO.java),
>  but I'm confident the code is similar for this case.





[jira] [Updated] (BEAM-988) Support for testing how soon output is emitted

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-988:
-
Component/s: testing
 runner-direct

> Support for testing how soon output is emitted
> --
>
> Key: BEAM-988
> URL: https://issues.apache.org/jira/browse/BEAM-988
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct, testing
>Reporter: Eugene Kirpichov
>Assignee: Thomas Groh
>
> I ran into this issue when testing Splittable DoFn. My intention is, it 
> should behave exactly like a DoFn - i.e. emit output immediately when it 
> receives input, regardless of the windowing/triggering strategy of the input 
> (even though SDF has a GBK internally).
> However, currently the SDK doesn't have facilities for testing that. 
> TestStream allows controlling the timing of the input, but there's nothing to 
> capture timing of the output. Moreover, timing of the output is unspecified 
> by the model because triggers technically only enable firing, but do not 
> force it (i.e. they are a lower bound on when output will be emitted).
> I'm not sure what's the best way to address this. E.g., perhaps, PaneInfo 
> could include a field "since when was this pane enabled to fire" (regardless 
> of when it really fired)?





[jira] [Updated] (BEAM-964) Investing exporting BQ as Avro instead of Json for dataflow runner

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-964:
-
Component/s: sdk-java-gcp

> Investing exporting BQ as Avro instead of Json for dataflow runner
> --
>
> Key: BEAM-964
> URL: https://issues.apache.org/jira/browse/BEAM-964
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Sourabh Bajaj
>Assignee: Sourabh Bajaj
>Priority: Minor
>






[jira] [Updated] (BEAM-852) Validate sources when they are created

2016-12-01 Thread Sourabh Bajaj (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Bajaj updated BEAM-852:
---
Fix Version/s: Not applicable

> Validate sources when they are created
> --
>
> Key: BEAM-852
> URL: https://issues.apache.org/jira/browse/BEAM-852
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Sourabh Bajaj
> Fix For: Not applicable
>
>
> We currently do not validate some sources at creation time, for example text 
> and Avro. Validating sources early will improve the user experience, since it 
> will help catch issues early. For example, we can fail before submitting a job 
> to a runner.
> It should also be possible to disable validation to support environments 
> where users do not have access to the input at job submission. Java SDK 
> already follows a similar model.
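A minimal sketch of what create-time validation with an opt-out could look like (hypothetical class and parameter names, not Beam's actual source API):

```python
import glob

class ValidatingTextSource(object):
    """Fail fast at construction if the pattern matches nothing, unless the
    caller opts out -- e.g. when the input is not visible at job-submission
    time."""

    def __init__(self, file_pattern, validate=True):
        self.file_pattern = file_pattern
        if validate and not glob.glob(file_pattern):
            raise IOError("No files found matching pattern: %r" % file_pattern)
```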





[jira] [Closed] (BEAM-985) Retry decorator maintains state as uses an iterator

2016-12-01 Thread Sourabh Bajaj (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Bajaj closed BEAM-985.
--
Resolution: Fixed

> Retry decorator maintains state as uses an iterator
> ---
>
> Key: BEAM-985
> URL: https://issues.apache.org/jira/browse/BEAM-985
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Sourabh Bajaj
> Fix For: Not applicable
>
>
> https://github.com/apache/incubator-beam/commit/57c30c752a524a40c7074ea69541964c77f22748
> shows two unittests that fail due to state shared by the retry decorator 
> iterator.
> We have two options in fixing this:
> 1) Fix the retry decorator
> 2) Use an external package such as https://pypi.python.org/pypi/retrying





[jira] [Closed] (BEAM-444) Promote isBlockOnRun() to PipelineOptions.

2016-12-01 Thread Pei He (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pei He closed BEAM-444.
---
   Resolution: Won't Fix
Fix Version/s: Not applicable

Obsolete: users can now control blocking runs via PipelineResult.

> Promote isBlockOnRun() to PipelineOptions.
> --
>
> Key: BEAM-444
> URL: https://issues.apache.org/jira/browse/BEAM-444
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
> Fix For: Not applicable
>
>
> Currently, blockOnRun is implemented in different ways by runners.
> DirectRunner implements blockOnRun based on DirectOptions.isBlockOnRun.
> Dataflow has a separate BlockingDataflowRunner.
> Flink and Spark runners might or might not block, depending on their 
> implementation of run().
> I think DirectRunner's approach is the right way to go, and isBlockOnRun 
> options need to be promoted to the general PipelineOptions.





[jira] [Updated] (BEAM-443) PipelineResult needs waitUntilFinish() and cancel()

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-443:
-
Component/s: sdk-java-core

> PipelineResult needs waitUntilFinish() and cancel()
> ---
>
> Key: BEAM-443
> URL: https://issues.apache.org/jira/browse/BEAM-443
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
>
> waitToFinish() and cancel() are two most common operations for users to 
> interact with a started pipeline.
> Right now, they are only available in DataflowPipelineJob. But, it is better 
> to move them to the common interface, so people can start implement them in 
> other runners, and runner agnostic code can interact with PipelineResult 
> better.





[jira] [Updated] (BEAM-719) Run WindowedWordCount Integration Test in Spark

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-719:
-
Component/s: testing

> Run WindowedWordCount Integration Test in Spark
> ---
>
> Key: BEAM-719
> URL: https://issues.apache.org/jira/browse/BEAM-719
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Mark Liu
>Assignee: Amit Sela
>
> The purpose of running WindowedWordCountIT in Spark is to have a streaming 
> test pipeline running in Jenkins pre-commit using TestSparkRunner.
> More discussion happened here:
> https://github.com/apache/incubator-beam/pull/1045#issuecomment-251531770





[jira] [Updated] (BEAM-580) Add a Datastore delete example

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-580:
-
Component/s: sdk-java-gcp

> Add a Datastore delete example
> --
>
> Key: BEAM-580
> URL: https://issues.apache.org/jira/browse/BEAM-580
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>






[jira] [Closed] (BEAM-962) Fix games example pipeline options default values conflicts.

2016-12-01 Thread Pei He (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pei He closed BEAM-962.
---
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Fix games example pipeline options default values conflicts.
> 
>
> Key: BEAM-962
> URL: https://issues.apache.org/jira/browse/BEAM-962
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Reporter: Pei He
>Assignee: Pei He
> Fix For: 0.4.0-incubating
>
>






[1/2] incubator-beam git commit: Improve Splittable DoFn

2016-12-01 Thread tgroh
Repository: incubator-beam
Updated Branches:
  refs/heads/master fd4b631f1 -> 24fab9f53


Improve Splittable DoFn

Makes Splittable DoFn behave more like a real DoFn:
- Adds support for side inputs and outputs to SDF
- Teaches `ProcessFn` to work with exploded windows inside the
  `KeyedWorkItem`. It works with them by un-exploding the windows
  in the `Iterable` into a
  single `WindowedValue`, since the values and timestamps are
  guaranteed to be the same.

Makes SplittableParDo.ProcessFn not use the (now unavailable)
OldDoFn state and timers API:
- Makes `ProcessFn` be a primitive transform with its own
  `ParDoEvaluator`. As a nice side effect, this enables the runner to
  provide additional hooks into it - e.g. for giving the runner access
  to the restriction tracker (in later PRs)
- For consistency, moves declaration of `GBKIntoKeyedWorkItems`
  primitive transform into `SplittableParDo`, alongside the
  `SplittableProcessElements` transform
- Preserves compressed representation of `WindowedValue`'s in
  `PushbackSideInputDoFnRunner`
- Uses OutputWindowedValue in SplittableParDo.ProcessFn

Proper lifecycle management for wrapped fn.

- Caches underlying fn using DoFnLifecycleManager, so its
  @Setup and @Teardown methods are called.
- Calls @StartBundle and @FinishBundle methods on the underlying
  fn explicitly. Output from them is prohibited, since an SDF
  is only allowed to output after a successful RestrictionTracker.tryClaim.
  It's possible that an SDF should not be allowed to have
  StartBundle/FinishBundle methods at all, but I'm not sure.
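The "output only after a successful RestrictionTracker.tryClaim" discipline mentioned above can be illustrated with a toy tracker. This class and its method names are a standalone sketch, not Beam's RestrictionTracker API.

```java
// Toy offset-range tracker demonstrating the tryClaim discipline: an element
// may only be output after the corresponding position was successfully claimed.
class ToyOffsetTracker {
  private final long endExclusive;
  private long nextToClaim;

  ToyOffsetTracker(long start, long endExclusive) {
    this.nextToClaim = start;
    this.endExclusive = endExclusive;
  }

  // Returns true if the offset was claimed; false means processing must stop
  // (the remainder of the range belongs to a future restriction).
  boolean tryClaim(long offset) {
    if (offset >= endExclusive) return false;
    nextToClaim = offset + 1;
    return true;
  }

  long next() { return nextToClaim; }
}
```

A processing loop would claim each position before emitting, and stop as soon as tryClaim returns false.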


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/87ff5ac3
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/87ff5ac3
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/87ff5ac3

Branch: refs/heads/master
Commit: 87ff5ac36bb9cc62fa4864ffa7b5a5e495b9a4a1
Parents: fd4b631
Author: Eugene Kirpichov 
Authored: Wed Oct 26 16:05:01 2016 -0700
Committer: Thomas Groh 
Committed: Thu Dec 1 14:15:55 2016 -0800

--
 .../core/ElementAndRestrictionCoder.java|   8 +
 .../runners/core/GBKIntoKeyedWorkItems.java |  55 ---
 .../beam/runners/core/SplittableParDo.java  | 378 +++
 .../beam/runners/core/SplittableParDoTest.java  | 134 +--
 ...ectGBKIntoKeyedWorkItemsOverrideFactory.java |  41 +-
 .../beam/runners/direct/DirectGroupByKey.java   |   2 +-
 .../beam/runners/direct/DirectRunner.java   |   8 +-
 .../runners/direct/DoFnLifecycleManager.java|   4 +-
 .../beam/runners/direct/ParDoEvaluator.java |  26 +-
 .../runners/direct/ParDoEvaluatorFactory.java   |  63 +++-
 .../direct/ParDoMultiOverrideFactory.java   |   2 +-
 ...littableProcessElementsEvaluatorFactory.java | 144 +++
 .../direct/TransformEvaluatorRegistry.java  |   5 +
 .../beam/runners/direct/SplittableDoFnTest.java | 194 +-
 .../org/apache/beam/sdk/transforms/DoFn.java|  12 +
 .../apache/beam/sdk/transforms/DoFnTester.java  |  51 ++-
 .../sdk/util/state/TimerInternalsFactory.java   |  36 ++
 17 files changed, 905 insertions(+), 258 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/87ff5ac3/runners/core-java/src/main/java/org/apache/beam/runners/core/ElementAndRestrictionCoder.java
--
diff --git 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/ElementAndRestrictionCoder.java
 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/ElementAndRestrictionCoder.java
index 6dec8e2..64c1e14 100644
--- 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/ElementAndRestrictionCoder.java
+++ 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/ElementAndRestrictionCoder.java
@@ -64,4 +64,12 @@ public class ElementAndRestrictionCoder
 RestrictionT value = restrictionCoder.decode(inStream, context);
 return ElementAndRestriction.of(key, value);
   }
+
+  public Coder getElementCoder() {
+return elementCoder;
+  }
+
+  public Coder getRestrictionCoder() {
+return restrictionCoder;
+  }
 }

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/87ff5ac3/runners/core-java/src/main/java/org/apache/beam/runners/core/GBKIntoKeyedWorkItems.java
--
diff --git 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/GBKIntoKeyedWorkItems.java
 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/GBKIntoKeyedWorkItems.java
deleted file mode 100644
index 304e349..000
--- 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/GBKIntoKeyedWorkItems.java
+++ /dev/null
@@ -1,55 +0,0 @@

[GitHub] incubator-beam pull request #1261: [BEAM-801, BEAM-787] Improvements to Spli...

2016-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1261


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-801) SplittableParDo must be a pseudo-primitive, not ParDo(OldDoFn)

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713249#comment-15713249
 ] 

ASF GitHub Bot commented on BEAM-801:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1261


> SplittableParDo must be a pseudo-primitive, not ParDo(OldDoFn)
> --
>
> Key: BEAM-801
> URL: https://issues.apache.org/jira/browse/BEAM-801
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-direct
>Reporter: Kenneth Knowles
>Assignee: Eugene Kirpichov
>
> Today, runners-core contains an expansion of {{ParDo(DoFn)}} that implements 
> it via unsupported features of {{ParDo(OldDoFn)}}. This expansion is used by 
> the DirectRunner.
> The right approach to provide a ready implementation in runners-core is the 
> one taken by {{GroupAlsoByWindowViaWindowSetDoFn}} where the unsupported 
> features are provided to the constructor of the {{DoFn}}, rather than 
> expected to be passed through the {{ProcessContext}}.
> These features are not guaranteed to be part of the public state & timers API 
> of {{DoFn}}, particularly in the case of watermark holds, so it is not 
> prudent to plan on waiting for the new state & timers API and porting over.
> These issues create real blockers by causing dependency cycles between 
> implementing {{DoFn}} fully (requires no use of unsupported features), 
> implementing new state and timers (requires new {{DoFn}}), and keeping the 
> hack until new {{DoFn}} has state and timers (requires use of unsupported 
> features).
> To break the cycle, the tests that rely on unsupported features are being 
> disabled. They can be re-enabled either by following the design suggested 
> above (probably the most robust approach) or by waiting and porting to new 
> features as they are available.





[2/2] incubator-beam git commit: This closes #1261

2016-12-01 Thread tgroh
This closes #1261


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/24fab9f5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/24fab9f5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/24fab9f5

Branch: refs/heads/master
Commit: 24fab9f53a8b3a7ef5fb35195dbe9417bbcc4101
Parents: fd4b631 87ff5ac
Author: Thomas Groh 
Authored: Thu Dec 1 14:16:58 2016 -0800
Committer: Thomas Groh 
Committed: Thu Dec 1 14:16:58 2016 -0800

--
 .../core/ElementAndRestrictionCoder.java|   8 +
 .../runners/core/GBKIntoKeyedWorkItems.java |  55 ---
 .../beam/runners/core/SplittableParDo.java  | 378 +++
 .../beam/runners/core/SplittableParDoTest.java  | 134 +--
 ...ectGBKIntoKeyedWorkItemsOverrideFactory.java |  41 +-
 .../beam/runners/direct/DirectGroupByKey.java   |   2 +-
 .../beam/runners/direct/DirectRunner.java   |   8 +-
 .../runners/direct/DoFnLifecycleManager.java|   4 +-
 .../beam/runners/direct/ParDoEvaluator.java |  26 +-
 .../runners/direct/ParDoEvaluatorFactory.java   |  63 +++-
 .../direct/ParDoMultiOverrideFactory.java   |   2 +-
 ...littableProcessElementsEvaluatorFactory.java | 144 +++
 .../direct/TransformEvaluatorRegistry.java  |   5 +
 .../beam/runners/direct/SplittableDoFnTest.java | 194 +-
 .../org/apache/beam/sdk/transforms/DoFn.java|  12 +
 .../apache/beam/sdk/transforms/DoFnTester.java  |  51 ++-
 .../sdk/util/state/TimerInternalsFactory.java   |  36 ++
 17 files changed, 905 insertions(+), 258 deletions(-)
--




[1/2] incubator-beam git commit: Move TransformHierarchy Maintenance into it

2016-12-01 Thread tgroh
Repository: incubator-beam
Updated Branches:
  refs/heads/master 0c875ba70 -> 48130f718


Move TransformHierarchy Maintenance into it

This reduces the complexity of Pipeline.applyInternal by limiting its
responsibilities to passing a node into the Transform Hierarchy,
enforcing name uniqueness, and causing the runner to expand the
PTransform. The remaining logic is moved to the appropriate application sites.
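The name-uniqueness enforcement described above amounts to rejecting a full transform name that has already been registered. This sketch is illustrative; the class and method names are not the actual TransformHierarchy code.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of name-uniqueness enforcement for transform applications.
class UniqueNames {
  private final Set<String> used = new HashSet<>();

  // Registers a full transform name, rejecting duplicates the way the
  // hierarchy does when a node is pushed.
  void register(String fullName) {
    if (!used.add(fullName)) {
      throw new IllegalStateException("Transform name already used: " + fullName);
    }
  }
}
```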


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/ab1f1ad0
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/ab1f1ad0
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/ab1f1ad0

Branch: refs/heads/master
Commit: ab1f1ad012bc559cdb099319a516e4437eed2825
Parents: 0c875ba
Author: Thomas Groh 
Authored: Tue Nov 29 14:29:47 2016 -0800
Committer: Thomas Groh 
Committed: Thu Dec 1 12:55:25 2016 -0800

--
 .../direct/KeyedPValueTrackingVisitor.java  |   2 +-
 .../DataflowPipelineTranslatorTest.java |   2 +-
 .../main/java/org/apache/beam/sdk/Pipeline.java | 117 +++-
 .../beam/sdk/runners/TransformHierarchy.java| 126 -
 .../beam/sdk/runners/TransformTreeNode.java | 165 +
 .../sdk/runners/TransformHierarchyTest.java | 180 ++-
 .../beam/sdk/runners/TransformTreeTest.java |   4 +-
 7 files changed, 340 insertions(+), 256 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/ab1f1ad0/runners/direct-java/src/main/java/org/apache/beam/runners/direct/KeyedPValueTrackingVisitor.java
--
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/KeyedPValueTrackingVisitor.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/KeyedPValueTrackingVisitor.java
index 7c4376a..47b0857 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/KeyedPValueTrackingVisitor.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/KeyedPValueTrackingVisitor.java
@@ -74,7 +74,7 @@ class KeyedPValueTrackingVisitor implements PipelineVisitor {
 if (node.isRootNode()) {
   finalized = true;
 } else if (producesKeyedOutputs.contains(node.getTransform().getClass())) {
-  keyedValues.addAll(node.getExpandedOutputs());
+  keyedValues.addAll(node.getOutput().expand());
 }
   }
 

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/ab1f1ad0/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
index c925454..95c7132 100644
--- 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
+++ 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
@@ -669,7 +669,7 @@ public class DataflowPipelineTranslatorTest implements 
Serializable {
 PCollection input = p.begin()
 .apply(Create.of(1, 2, 3));
 
-thrown.expect(IllegalStateException.class);
+thrown.expect(IllegalArgumentException.class);
 input.apply(new PartiallyBoundOutputCreator());
 
 Assert.fail("Failure expected from use of partially bound output");

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/ab1f1ad0/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java
index 9edf496..c8a4439 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java
@@ -17,10 +17,11 @@
  */
 package org.apache.beam.sdk;
 
+import static com.google.common.base.Preconditions.checkState;
+
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.HashSet;
-import java.util.List;
 import java.util.Map;
 import java.util.Set;
 import org.apache.beam.sdk.coders.CoderRegistry;
@@ -31,7 +32,6 @@ import org.apache.beam.sdk.runners.PipelineRunner;
 import org.apache.beam.sdk.runners.TransformHierarchy;
 import org.apache.beam.sdk.runners.TransformTreeNode;
 import org.apache.beam.sdk.transforms.Aggregator;
-import org.apache.beam.sdk.transforms.AppliedPTransform;
 import 

[GitHub] incubator-beam pull request #1469: [BEAM-646] Move TransformHierarchy Mainte...

2016-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1469




[GitHub] incubator-beam pull request #1480: [BEAM-1066] Add a test of ReleaseInfo

2016-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1480




[3/3] incubator-beam git commit: Closes #1480

2016-12-01 Thread dhalperi
Closes #1480


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/fd4b631f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/fd4b631f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/fd4b631f

Branch: refs/heads/master
Commit: fd4b631f1b4aa1538b779c4de591bd9b18526cd6
Parents: 48130f7 b36048b
Author: Dan Halperin 
Authored: Thu Dec 1 13:10:56 2016 -0800
Committer: Dan Halperin 
Committed: Thu Dec 1 13:10:56 2016 -0800

--
 sdks/java/core/pom.xml  | 29 +++--
 .../apache/beam/sdk/util/ReleaseInfoTest.java   | 45 
 2 files changed, 52 insertions(+), 22 deletions(-)
--




[jira] [Commented] (BEAM-1066) Add test coverage for ReleaseInfo

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713097#comment-15713097
 ] 

ASF GitHub Bot commented on BEAM-1066:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1480


> Add test coverage for ReleaseInfo
> -
>
> Key: BEAM-1066
> URL: https://issues.apache.org/jira/browse/BEAM-1066
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Trivial
> Fix For: Not applicable
>
>
> We use the version string for a lot of things, so we should test it to prevent 
> regressions.





[1/3] incubator-beam git commit: Add a test of ReleaseInfo

2016-12-01 Thread dhalperi
Repository: incubator-beam
Updated Branches:
  refs/heads/master 48130f718 -> fd4b631f1


Add a test of ReleaseInfo


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/1094fa6a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/1094fa6a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/1094fa6a

Branch: refs/heads/master
Commit: 1094fa6ac32046b4c092294b3cee046c91aea5a1
Parents: 48130f7
Author: Dan Halperin 
Authored: Thu Dec 1 09:15:28 2016 -0800
Committer: Dan Halperin 
Committed: Thu Dec 1 13:10:55 2016 -0800

--
 .../apache/beam/sdk/util/ReleaseInfoTest.java   | 45 
 1 file changed, 45 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/1094fa6a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ReleaseInfoTest.java
--
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ReleaseInfoTest.java 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ReleaseInfoTest.java
new file mode 100644
index 000..fabb7e2
--- /dev/null
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ReleaseInfoTest.java
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import static org.hamcrest.Matchers.containsString;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.assertTrue;
+
+import org.junit.Test;
+
+/**
+ * Tests for {@link ReleaseInfo}.
+ */
+public class ReleaseInfoTest {
+
+  @Test
+  public void getReleaseInfo() throws Exception {
+ReleaseInfo info = ReleaseInfo.getReleaseInfo();
+
+// Validate name
+assertThat(info.getName(), containsString("Beam"));
+
+// Validate semantic version
+String version = info.getVersion();
+String pattern = "\\d+\\.\\d+\\.\\d+.*";
+assertTrue(
+String.format("%s does not match pattern %s", version, pattern),
+version.matches(pattern));
+  }
+}



[jira] [Updated] (BEAM-1023) Add test coverage for BigQueryIO.Write in streaming mode

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1023:
--
Component/s: sdk-java-gcp

> Add test coverage for BigQueryIO.Write in streaming mode
> 
>
> Key: BEAM-1023
> URL: https://issues.apache.org/jira/browse/BEAM-1023
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>






[jira] [Updated] (BEAM-1019) Shade bytebuddy dependency

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1019:
--
Component/s: sdk-java-core

> Shade bytebuddy dependency
> --
>
> Key: BEAM-1019
> URL: https://issues.apache.org/jira/browse/BEAM-1019
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
>
> We encountered backward incompatible changes in bytebuddy during upgrading to 
> Mockito 2.0.
> Shading bytebuddy helps to address them and future issues.





[jira] [Updated] (BEAM-1021) DatastoreIO for python

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1021:
--
Component/s: sdk-py

> DatastoreIO for python
> --
>
> Key: BEAM-1021
> URL: https://issues.apache.org/jira/browse/BEAM-1021
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>






[jira] [Resolved] (BEAM-905) Archetype pom needs to generalize dependencies

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-905.
--
   Resolution: Fixed
 Assignee: Daniel Halperin  (was: Pei He)
Fix Version/s: Not applicable

> Archetype pom needs to generalize dependencies
> --
>
> Key: BEAM-905
> URL: https://issues.apache.org/jira/browse/BEAM-905
> Project: Beam
>  Issue Type: Bug
>Affects Versions: 0.4.0-incubating
> Environment: Currently the archetype pom includes the direct runner 
> and the Dataflow one, but not the others. It should do the same magic as the 
> main examples.
>Reporter: Frances Perry
>Assignee: Daniel Halperin
> Fix For: Not applicable
>
>






[jira] [Updated] (BEAM-537) Create Apache Bigtop Beam packages

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-537:
-
Component/s: sdk-java-extensions

> Create Apache Bigtop Beam packages
> --
>
> Key: BEAM-537
> URL: https://issues.apache.org/jira/browse/BEAM-537
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-java-extensions
>Reporter: Ismaël Mejía
>Priority: Minor
>
> Considering that Apache Bigtop is a well-tested and widely used framework for creating 
> Big Data distributions (e.g. Cloudera, EMR, etc.), it is probably a good idea 
> to provide ready-to-go Beam packages. This is probably more of 
> an issue for Bigtop, but I created this issue for reference and to document 
> any fix or PR towards this goal.





[jira] [Updated] (BEAM-546) Aggregator Verifier and Integrate to E2E Test Framework

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-546:
-
Component/s: testing

> Aggregator Verifier and Integrate to E2E Test Framework
> ---
>
> Key: BEAM-546
> URL: https://issues.apache.org/jira/browse/BEAM-546
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Minor
>
> Create a matcher to verify the value of 
> [Aggregator|https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Aggregator.java],
>  which is 
> [AggregatorValues|https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/AggregatorValues.java],
>  against given expected values.
> For example:
> {code}
> MyFnWithAggregator fn = new MyFnWithAggregator();
> Pipeline p = ... (setup the pipeline, use fn) ...
> p.run();
> assertThat(pResult, isAggregatorValue(values, fn.aggregator));
> {code}
> It should also be integrated into the E2E pipeline tests and used by setting 
> onSuccessMatcher.





[jira] [Closed] (BEAM-443) PipelineResult needs waitUntilFinish() and cancel()

2016-12-01 Thread Pei He (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pei He closed BEAM-443.
---
   Resolution: Fixed
Fix Version/s: 0.2.0-incubating

> PipelineResult needs waitUntilFinish() and cancel()
> ---
>
> Key: BEAM-443
> URL: https://issues.apache.org/jira/browse/BEAM-443
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
> Fix For: 0.2.0-incubating
>
>
> waitUntilFinish() and cancel() are the two most common operations for users to 
> interact with a started pipeline.
> Right now, they are only available in DataflowPipelineJob, but it is better 
> to move them to the common interface so that people can start implementing them in 
> other runners, and runner-agnostic code can interact with PipelineResult 
> better.





[GitHub] incubator-beam-site pull request #99: Simplify Flink Runner instructions for...

2016-12-01 Thread aljoscha
GitHub user aljoscha opened a pull request:

https://github.com/apache/incubator-beam-site/pull/99

Simplify Flink Runner instructions for running on cluster

R: @davorbonaci I think this should be simple/straightforward enough now 
😃 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/incubator-beam-site 
simplify-flink-quickstart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam-site/pull/99.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #99


commit 997e045b7e0feee0f3623a8e6c2cc690347a77ef
Author: Aljoscha Krettek 
Date:   2016-12-01T20:51:54Z

Simplify Flink Runner instructions for running on cluster






[2/2] incubator-beam git commit: This closes #1469

2016-12-01 Thread tgroh
This closes #1469


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/48130f71
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/48130f71
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/48130f71

Branch: refs/heads/master
Commit: 48130f718d019d6928c464e6f7ad90cd510b62d2
Parents: 0c875ba ab1f1ad
Author: Thomas Groh 
Authored: Thu Dec 1 12:55:26 2016 -0800
Committer: Thomas Groh 
Committed: Thu Dec 1 12:55:26 2016 -0800

--
 .../direct/KeyedPValueTrackingVisitor.java  |   2 +-
 .../DataflowPipelineTranslatorTest.java |   2 +-
 .../main/java/org/apache/beam/sdk/Pipeline.java | 117 +++-
 .../beam/sdk/runners/TransformHierarchy.java| 126 -
 .../beam/sdk/runners/TransformTreeNode.java | 165 +
 .../sdk/runners/TransformHierarchyTest.java | 180 ++-
 .../beam/sdk/runners/TransformTreeTest.java |   4 +-
 7 files changed, 340 insertions(+), 256 deletions(-)
--




[jira] [Closed] (BEAM-1066) Add test coverage for ReleaseInfo

2016-12-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-1066.
-
Resolution: Fixed

> Add test coverage for ReleaseInfo
> -
>
> Key: BEAM-1066
> URL: https://issues.apache.org/jira/browse/BEAM-1066
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Trivial
> Fix For: Not applicable
>
>
> We use the version string for a lot of things, so we should test it to prevent 
> regressions.





[jira] [Created] (BEAM-1072) Dataflow should reject unset versions

2016-12-01 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1072:
-

 Summary: Dataflow should reject unset versions
 Key: BEAM-1072
 URL: https://issues.apache.org/jira/browse/BEAM-1072
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Affects Versions: Not applicable
Reporter: Daniel Halperin
Assignee: Daniel Halperin
Priority: Minor
 Fix For: Not applicable


A recent change broke the setting of the SDK version in certain execution 
modes. If it happens again, it will cause a poor user experience when running 
on the Dataflow runner.

We should catch this breakage in the DataflowRunner and prevent job submission 
in that case.
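The proposed guard could be as simple as rejecting submission when the version property was never substituted at build time. The placeholder check and names below are a hedged sketch, not the DataflowRunner's actual code.

```java
// Hypothetical pre-submission check: fail fast when the SDK version is
// missing or still an unsubstituted build-time placeholder.
class VersionGuard {
  static void checkVersionSet(String version) {
    if (version == null || version.isEmpty() || version.contains("${")) {
      throw new IllegalArgumentException(
          "Refusing to submit job: SDK version is unset (" + version + ")");
    }
  }
}
```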




