[jira] [Work logged] (BEAM-3761) Fix Python 3 cmp function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3761?focusedWorklogId=118585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118585
 ]

ASF GitHub Bot logged work on BEAM-3761:


Author: ASF GitHub Bot
Created on: 03/Jul/18 05:57
Start Date: 03/Jul/18 05:57
Worklog Time Spent: 10m 
  Work Description: superbobry commented on issue #5843: [BEAM-3761] Define 
cmp() in Python 3
URL: https://github.com/apache/beam/pull/5843#issuecomment-402021833
 
 
   > There are no Jenkins build issues using my approach ;-)
   
   No, but your approach introduces boilerplate. I think this is worth fixing 
given that a) there seem to be no issue with importing from `past` on Cython, 
and b) a `past` import is no less safe than try-except `NameError`. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118585)
Time Spent: 12h 20m  (was: 12h 10m)

> Fix Python 3 cmp function
> -
>
> Key: BEAM-3761
> URL: https://issues.apache.org/jira/browse/BEAM-3761
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: holdenk
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> Various functions don't exist in Python 3 that did in python 2. This Jira is 
> to fix the use of cmp (which often will involve rewriting __cmp__ as well).
>  
> Note: there are existing PRs for basestring and unicode ( 
> [https://github.com/apache/beam/pull/4697|https://github.com/apache/beam/pull/4697,]
>  , [https://github.com/apache/beam/pull/4730] )
>  
> Note once all of the missing names/functions are fixed we can enable F821 in 
> falke8 python 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam-site] 01/01: Prepare repository for deployment.

2018-07-02 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit b42fda3e3237a0dae6ddb39b7d34fb2960cc8461
Author: Mergebot 
AuthorDate: Mon Jul 2 21:33:03 2018 -0700

Prepare repository for deployment.
---
 content/documentation/dsls/sql/create-table/index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/documentation/dsls/sql/create-table/index.html 
b/content/documentation/dsls/sql/create-table/index.html
index c7c1e02..a8aa955 100644
--- a/content/documentation/dsls/sql/create-table/index.html
+++ b/content/documentation/dsls/sql/create-table/index.html
@@ -373,8 +373,8 @@ LOCATION '[PROJECT_ID]:[DATASET].[TABLE]'
 
 Read Mode
 
-Not supported. BigQueryI/O is currently limited to write access only in Beam
-SQL.
+Beam SQL supports reading columns with simple types (simpleType) and arrays of simple
+types (ARRAYsimpleType).
 
 Write Mode
 



[beam-site] branch asf-site updated (e4990b6 -> b42fda3)

2018-07-02 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


from e4990b6  Prepare repository for deployment.
 add 2edac53  Update doc
 add 9597d44  This closes #488
 new b42fda3  Prepare repository for deployment.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/documentation/dsls/sql/create-table/index.html | 4 ++--
 src/documentation/dsls/sql/create-table.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)



[beam-site] branch mergebot updated (a1702d1 -> 9597d44)

2018-07-02 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


from a1702d1  This closes #476
 add e4990b6  Prepare repository for deployment.
 new 2edac53  Update doc
 new 9597d44  This closes #488

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../19/beam-2.3.0.html => 06/26/beam-2.5.0.html}   | 134 --
 content/blog/index.html|  30 
 content/feed.xml   | 156 +
 content/index.html |  10 +-
 src/documentation/dsls/sql/create-table.md |   4 +-
 5 files changed, 200 insertions(+), 134 deletions(-)
 copy content/blog/2018/{02/19/beam-2.3.0.html => 06/26/beam-2.5.0.html} (68%)



[beam-site] 01/02: Update doc

2018-07-02 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 2edac53fc1d36e3225adcdc31378db44d30ec683
Author: amaliujia 
AuthorDate: Mon Jul 2 15:17:40 2018 -0700

Update doc
---
 src/documentation/dsls/sql/create-table.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/documentation/dsls/sql/create-table.md 
b/src/documentation/dsls/sql/create-table.md
index 0b30327..cfa1d2d 100644
--- a/src/documentation/dsls/sql/create-table.md
+++ b/src/documentation/dsls/sql/create-table.md
@@ -95,8 +95,8 @@ LOCATION '[PROJECT_ID]:[DATASET].[TABLE]'
 
 ### Read Mode
 
-Not supported. BigQueryI/O is currently limited to write access only in Beam
-SQL.
+Beam SQL supports reading columns with simple types (`simpleType`) and arrays 
of simple
+types (`ARRAY`).
 
 ### Write Mode
 



[beam-site] 02/02: This closes #488

2018-07-02 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 9597d44fdcd19028125727c01a8ec79b49a3ea96
Merge: e4990b6 2edac53
Author: Mergebot 
AuthorDate: Mon Jul 2 21:30:07 2018 -0700

This closes #488

 src/documentation/dsls/sql/create-table.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[jira] [Work logged] (BEAM-4713) DECIMAL support in RowJsonDeserializer

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4713?focusedWorklogId=118574=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118574
 ]

ASF GitHub Bot logged work on BEAM-4713:


Author: ASF GitHub Bot
Created on: 03/Jul/18 04:27
Start Date: 03/Jul/18 04:27
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #5867: [BEAM-4713] 
DECIMAL support in RowJsonDeserializer
URL: https://github.com/apache/beam/pull/5867
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonDeserializer.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonDeserializer.java
index dfe5a8a8708..33c6b6ec5c6 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonDeserializer.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonDeserializer.java
@@ -20,6 +20,7 @@
 import static java.util.stream.Collectors.toList;
 import static org.apache.beam.sdk.schemas.Schema.TypeName.BOOLEAN;
 import static org.apache.beam.sdk.schemas.Schema.TypeName.BYTE;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.DECIMAL;
 import static org.apache.beam.sdk.schemas.Schema.TypeName.DOUBLE;
 import static org.apache.beam.sdk.schemas.Schema.TypeName.FLOAT;
 import static org.apache.beam.sdk.schemas.Schema.TypeName.INT16;
@@ -28,6 +29,7 @@
 import static org.apache.beam.sdk.schemas.Schema.TypeName.STRING;
 import static 
org.apache.beam.sdk.util.RowJsonValueExtractors.booleanValueExtractor;
 import static 
org.apache.beam.sdk.util.RowJsonValueExtractors.byteValueExtractor;
+import static 
org.apache.beam.sdk.util.RowJsonValueExtractors.decimalValueExtractor;
 import static 
org.apache.beam.sdk.util.RowJsonValueExtractors.doubleValueExtractor;
 import static 
org.apache.beam.sdk.util.RowJsonValueExtractors.floatValueExtractor;
 import static 
org.apache.beam.sdk.util.RowJsonValueExtractors.intValueExtractor;
@@ -83,6 +85,7 @@
   .put(DOUBLE, doubleValueExtractor())
   .put(BOOLEAN, booleanValueExtractor())
   .put(STRING, stringValueExtractor())
+  .put(DECIMAL, decimalValueExtractor())
   .build();
 
   private Schema schema;
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValidation.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValidation.java
index 063ccae431e..064bb491178 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValidation.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValidation.java
@@ -19,6 +19,7 @@
 
 import static org.apache.beam.sdk.schemas.Schema.TypeName.BOOLEAN;
 import static org.apache.beam.sdk.schemas.Schema.TypeName.BYTE;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.DECIMAL;
 import static org.apache.beam.sdk.schemas.Schema.TypeName.DOUBLE;
 import static org.apache.beam.sdk.schemas.Schema.TypeName.FLOAT;
 import static org.apache.beam.sdk.schemas.Schema.TypeName.INT16;
@@ -37,7 +38,7 @@
 class RowJsonValidation {
 
   private static final ImmutableSet SUPPORTED_TYPES =
-  ImmutableSet.of(BYTE, INT16, INT32, INT64, FLOAT, DOUBLE, BOOLEAN, 
STRING);
+  ImmutableSet.of(BYTE, INT16, INT32, INT64, FLOAT, DOUBLE, BOOLEAN, 
STRING, DECIMAL);
 
   static void verifyFieldTypeSupported(Schema.Field field) {
 Schema.FieldType fieldType = field.getType();
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValueExtractors.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValueExtractors.java
index 2b52efc9318..4db0823ed17 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValueExtractors.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJsonValueExtractors.java
@@ -165,6 +165,18 @@
 .build();
   }
 
+  /**
+   * Extracts BigDecimal from the JsonNode if it is within bounds.
+   *
+   * Throws {@link UnsupportedRowJsonException} if value is out of bounds.
+   */
+  static ValueExtractor decimalValueExtractor() {
+return ValidatingValueExtractor.builder()
+.setExtractor(JsonNode::decimalValue)
+.setValidator(jsonNode -> jsonNode.isNumber())
+.build();
+  }
+
   @AutoValue
   public abstract static class ValidatingValueExtractor implements 
ValueExtractor {
 
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/RowJsonDeserializerTest.java
 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/util/RowJsonDeserializerTest.java
index f18787a1f7f..2bbabc397c0 100644
--- 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/util/RowJsonDeserializerTest.java
+++ 

[beam] 01/01: Merge pull request #5867: [BEAM-4713] DECIMAL support in RowJsonDeserializer

2018-07-02 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 020c64440fc9f01d97356618e11e8ab6ddfcf97b
Merge: d707a05 58854ec
Author: Kenn Knowles 
AuthorDate: Mon Jul 2 21:27:39 2018 -0700

Merge pull request #5867: [BEAM-4713] DECIMAL support in RowJsonDeserializer

 .../org/apache/beam/sdk/util/RowJsonDeserializer.java|  3 +++
 .../java/org/apache/beam/sdk/util/RowJsonValidation.java |  3 ++-
 .../org/apache/beam/sdk/util/RowJsonValueExtractors.java | 12 
 .../apache/beam/sdk/util/RowJsonDeserializerTest.java| 16 ++--
 4 files changed, 31 insertions(+), 3 deletions(-)



[jira] [Work logged] (BEAM-4713) DECIMAL support in RowJsonDeserializer

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4713?focusedWorklogId=118573=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118573
 ]

ASF GitHub Bot logged work on BEAM-4713:


Author: ASF GitHub Bot
Created on: 03/Jul/18 04:27
Start Date: 03/Jul/18 04:27
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5867: [BEAM-4713] 
DECIMAL support in RowJsonDeserializer
URL: https://github.com/apache/beam/pull/5867#issuecomment-402009699
 
 
   LGTM. Useful!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118573)
Time Spent: 50m  (was: 40m)

> DECIMAL support in RowJsonDeserializer
> --
>
> Key: BEAM-4713
> URL: https://issues.apache.org/jira/browse/BEAM-4713
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (d707a05 -> 020c644)

2018-07-02 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from d707a05  Merge pull request #5872 from charlesccychen/fix-todo-style
 add 58854ec  support decimal type in reading from PubSub JSON
 new 020c644  Merge pull request #5867: [BEAM-4713] DECIMAL support in 
RowJsonDeserializer

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../org/apache/beam/sdk/util/RowJsonDeserializer.java|  3 +++
 .../java/org/apache/beam/sdk/util/RowJsonValidation.java |  3 ++-
 .../org/apache/beam/sdk/util/RowJsonValueExtractors.java | 12 
 .../apache/beam/sdk/util/RowJsonDeserializerTest.java| 16 ++--
 4 files changed, 31 insertions(+), 3 deletions(-)



[jira] [Created] (BEAM-4723) Enhance Datetime*Expression Datetime Type

2018-07-02 Thread Kai Jiang (JIRA)
Kai Jiang created BEAM-4723:
---

 Summary: Enhance Datetime*Expression Datetime Type
 Key: BEAM-4723
 URL: https://issues.apache.org/jira/browse/BEAM-4723
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Kai Jiang
Assignee: Kai Jiang


Datetime*Expression only supports timestamp type for first operand now. We 
should let it accept all Datetime_Types



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle #627

2018-07-02 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-4713) DECIMAL support in RowJsonDeserializer

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4713?focusedWorklogId=118560=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118560
 ]

ASF GitHub Bot logged work on BEAM-4713:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:42
Start Date: 03/Jul/18 01:42
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #5867: [BEAM-4713] DECIMAL 
support in RowJsonDeserializer
URL: https://github.com/apache/beam/pull/5867#issuecomment-401987194
 
 
   @kennknowles updated `RowJsonDeserializerTest`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118560)
Time Spent: 40m  (was: 0.5h)

> DECIMAL support in RowJsonDeserializer
> --
>
> Key: BEAM-4713
> URL: https://issues.apache.org/jira/browse/BEAM-4713
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118552=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118552
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199661824
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -59,13 +59,12 @@
 class RunnerIOOperation(operations.Operation):
   """Common baseclass for runner harness IO operations."""
 
-  def __init__(self, operation_name, step_name, consumers, counter_factory,
+  def __init__(self, name_context, step_name, consumers, counter_factory,
state_sampler, windowed_coder, target, data_channel):
 super(RunnerIOOperation, self).__init__(
-operation_name, None, counter_factory, state_sampler)
+name_context, None, counter_factory, state_sampler)
 self.windowed_coder = windowed_coder
 self.windowed_coder_impl = windowed_coder.get_impl()
-self.step_name = step_name
 
 Review comment:
   Is this deletion intentional?  If so, can you add a comment / JIRA reference 
to clean up step_name in the arguments?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118552)
Time Spent: 16.5h  (was: 16h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118554=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118554
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199662355
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/statesampler.py
 ##
 @@ -62,12 +64,22 @@ def __init__(self, prefix, counter_factory,
sampling_period_ms=DEFAULT_SAMPLING_PERIOD_MS):
 self.states_by_name = {}
 self._prefix = prefix
-self._counter_factory = counter_factory
+self._counter_factory = counter_factory or CounterFactory()
 
 Review comment:
   It looks like the second branch is only used by tests.  Can we have the 
tests pass empty `CounterFactory()`s instead of adding this optional behavior 
in the actual code?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118554)
Time Spent: 16h 40m  (was: 16.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118551=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118551
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199661924
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/logger.py
 ##
 @@ -49,7 +48,7 @@ def get_data(self):
 per_thread_worker_data = _PerThreadWorkerData()
 
 
-class PerThreadLoggingContext(LoggingContext):
+class PerThreadLoggingContext(object):
 
 Review comment:
   Are we able to get rid of this class entirely?  It looks like you removed 
the only usage in operations.py.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118551)
Time Spent: 16.5h  (was: 16h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118553=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118553
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199662052
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/logger.py
 ##
 @@ -34,7 +34,6 @@ class _PerThreadWorkerData(threading.local):
 
   def __init__(self):
 super(_PerThreadWorkerData, self).__init__()
-# TODO(robertwb): Consider starting with an initial (ignored) ~20 elements
 
 Review comment:
   Accidental deletion?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118553)
Time Spent: 16h 40m  (was: 16.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118549=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118549
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199661623
 
 

 ##
 File path: sdks/python/apache_beam/runners/common.py
 ##
 @@ -539,19 +529,14 @@ def __init__(self,
   windowing: windowing properties of the output PCollection(s)
   tagged_receivers: a dict of tag name to Receiver objects
   step_name: the name of this step
-  logging_context: a LoggingContext object
+  logging_context: DEPRECATED
 
 Review comment:
   Can you add a JIRA to remove this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118549)
Time Spent: 16h 20m  (was: 16h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118550
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199255236
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operation_specs.py
 ##
 @@ -376,9 +376,9 @@ def __init__(self, operations, stage_name,
step_names=None,
original_names=None,
name_contexts=None):
+
 self.operations = operations
 self.stage_name = stage_name
-# TODO(BEAM-4028): Remove arguments other than name_contexts.
 
 Review comment:
   Is this obsolete?  The Jira is still open.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118550)
Time Spent: 16.5h  (was: 16h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4006) Futurize and fix python 2 compatibility for transforms subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4006?focusedWorklogId=118545=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118545
 ]

ASF GitHub Bot logged work on BEAM-4006:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:21
Start Date: 03/Jul/18 01:21
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5729: [BEAM-4006] Futurize transforms subpackage
URL: https://github.com/apache/beam/pull/5729#discussion_r199660651
 
 

 ##
 File path: sdks/python/apache_beam/transforms/sideinputs_test.py
 ##
 @@ -194,7 +197,7 @@ def match(actual):
 [[actual_elem, actual_list, actual_dict]] = actual
 equal_to([expected_elem])([actual_elem])
 equal_to(expected_list)(actual_list)
-equal_to(expected_pairs)(actual_dict.iteritems())
+equal_to(expected_pairs)(iteritems(actual_dict))
 
 Review comment:
   Can we just use `.items()`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118545)
Time Spent: 1h  (was: 50m)

> Futurize and fix python 2 compatibility for transforms subpackage
> -
>
> Key: BEAM-4006
> URL: https://issues.apache.org/jira/browse/BEAM-4006
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4006) Futurize and fix python 2 compatibility for transforms subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4006?focusedWorklogId=118537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118537
 ]

ASF GitHub Bot logged work on BEAM-4006:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:21
Start Date: 03/Jul/18 01:21
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5729: [BEAM-4006] Futurize transforms subpackage
URL: https://github.com/apache/beam/pull/5729#discussion_r199660514
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform.py
 ##
 @@ -622,7 +626,7 @@ def __init__(self, fn, *args, **kwargs):
 super(PTransformWithSideInputs, self).__init__()
 
 if (any([isinstance(v, pvalue.PCollection) for v in args]) or
-any([isinstance(v, pvalue.PCollection) for v in kwargs.itervalues()])):
+any([isinstance(v, pvalue.PCollection) for v in itervalues(kwargs)])):
 
 Review comment:
   Can we just do `.values()`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118537)
Time Spent: 20m  (was: 10m)

> Futurize and fix python 2 compatibility for transforms subpackage
> -
>
> Key: BEAM-4006
> URL: https://issues.apache.org/jira/browse/BEAM-4006
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4006) Futurize and fix python 2 compatibility for transforms subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4006?focusedWorklogId=118544=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118544
 ]

ASF GitHub Bot logged work on BEAM-4006:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:21
Start Date: 03/Jul/18 01:21
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5729: [BEAM-4006] Futurize transforms subpackage
URL: https://github.com/apache/beam/pull/5729#discussion_r199661209
 
 

 ##
 File path: sdks/python/apache_beam/transforms/window.py
 ##
 @@ -245,10 +274,17 @@ def __init__(self, value, timestamp):
 self.value = value
 self.timestamp = Timestamp.of(timestamp)
 
-  def __cmp__(self, other):
+  def __eq__(self, other):
+return (type(self) == type(other)) and (self.value == other.value) and \
 
 Review comment:
   Please avoid backslash continuation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118544)
Time Spent: 50m  (was: 40m)

> Futurize and fix python 2 compatibility for transforms subpackage
> -
>
> Key: BEAM-4006
> URL: https://issues.apache.org/jira/browse/BEAM-4006
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4006) Futurize and fix python 2 compatibility for transforms subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4006?focusedWorklogId=118538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118538
 ]

ASF GitHub Bot logged work on BEAM-4006:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:21
Start Date: 03/Jul/18 01:21
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5729: [BEAM-4006] Futurize transforms subpackage
URL: https://github.com/apache/beam/pull/5729#discussion_r199659883
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -286,7 +290,7 @@ def match(actual):
 def matcher():
   def match(actual):
 equal_to([1])([len(actual)])
-equal_to(pairs)(actual[0].iteritems())
+equal_to(pairs)(iteritems(actual[0]))
 
 Review comment:
   We can just use `.items()` here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118538)
Time Spent: 20m  (was: 10m)

> Futurize and fix python 2 compatibility for transforms subpackage
> -
>
> Key: BEAM-4006
> URL: https://issues.apache.org/jira/browse/BEAM-4006
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4006) Futurize and fix python 2 compatibility for transforms subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4006?focusedWorklogId=118542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118542
 ]

ASF GitHub Bot logged work on BEAM-4006:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:21
Start Date: 03/Jul/18 01:21
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5729: [BEAM-4006] Futurize transforms subpackage
URL: https://github.com/apache/beam/pull/5729#discussion_r199660012
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -1855,7 +1865,7 @@ def __init__(self, value):
   value: An object of values for the PCollection
 """
 super(Create, self).__init__()
-if isinstance(value, string_types):
+if isinstance(value, (unicode, str)):
 
 Review comment:
   Should we add `bytes` here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118542)
Time Spent: 40m  (was: 0.5h)

> Futurize and fix python 2 compatibility for transforms subpackage
> -
>
> Key: BEAM-4006
> URL: https://issues.apache.org/jira/browse/BEAM-4006
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4006) Futurize and fix python 2 compatibility for transforms subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4006?focusedWorklogId=118539=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118539
 ]

ASF GitHub Bot logged work on BEAM-4006:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:21
Start Date: 03/Jul/18 01:21
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5729: [BEAM-4006] Futurize transforms subpackage
URL: https://github.com/apache/beam/pull/5729#discussion_r199659779
 
 

 ##
 File path: sdks/python/apache_beam/pipeline_test.py
 ##
 @@ -520,11 +520,11 @@ def test_dir(self):
 options = Breakfast()
 self.assertEquals(
 set(['from_dictionary', 'get_all_options', 'slices', 'style',
- 'view_as', 'display_data']),
+ 'view_as', 'display_data', 'next']),
 
 Review comment:
   This is fine, but was there a particular reason it was added?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118539)
Time Spent: 0.5h  (was: 20m)

> Futurize and fix python 2 compatibility for transforms subpackage
> -
>
> Key: BEAM-4006
> URL: https://issues.apache.org/jira/browse/BEAM-4006
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4006) Futurize and fix python 2 compatibility for transforms subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4006?focusedWorklogId=118546=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118546
 ]

ASF GitHub Bot logged work on BEAM-4006:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:21
Start Date: 03/Jul/18 01:21
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5729: [BEAM-4006] Futurize transforms subpackage
URL: https://github.com/apache/beam/pull/5729#discussion_r199660156
 
 

 ##
 File path: sdks/python/apache_beam/transforms/create_source.py
 ##
 @@ -57,15 +64,15 @@ def split(self, desired_bundle_size, start_position=None,
 start_position = 0
   if stop_position is None:
 stop_position = len(self._serialized_values)
-  avg_size_per_value = self._total_size / len(self._serialized_values)
+  avg_size_per_value = self._total_size // len(self._serialized_values)
   num_values_per_split = max(
-  int(desired_bundle_size / avg_size_per_value), 1)
+  int(desired_bundle_size // avg_size_per_value), 1)
 
 Review comment:
   We still need to coerce it into an `int`; e.g. `4 // 7.0` has value `0.0` 
but type `float`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118546)
Time Spent: 1h  (was: 50m)

> Futurize and fix python 2 compatibility for transforms subpackage
> -
>
> Key: BEAM-4006
> URL: https://issues.apache.org/jira/browse/BEAM-4006
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4006) Futurize and fix python 2 compatibility for transforms subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4006?focusedWorklogId=118541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118541
 ]

ASF GitHub Bot logged work on BEAM-4006:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:21
Start Date: 03/Jul/18 01:21
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5729: [BEAM-4006] Futurize transforms subpackage
URL: https://github.com/apache/beam/pull/5729#discussion_r199660657
 
 

 ##
 File path: sdks/python/apache_beam/transforms/sideinputs_test.py
 ##
 @@ -284,8 +287,8 @@ def  matcher(expected_elem, expected_kvs):
   def match(actual):
 [[actual_elem, actual_dict1, actual_dict2]] = actual
 equal_to([expected_elem])([actual_elem])
-equal_to(expected_kvs)(actual_dict1.iteritems())
-equal_to(expected_kvs)(actual_dict2.iteritems())
+equal_to(expected_kvs)(iteritems(actual_dict1))
 
 Review comment:
   Can we just use `.items()`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118541)
Time Spent: 40m  (was: 0.5h)

> Futurize and fix python 2 compatibility for transforms subpackage
> -
>
> Key: BEAM-4006
> URL: https://issues.apache.org/jira/browse/BEAM-4006
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4006) Futurize and fix python 2 compatibility for transforms subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4006?focusedWorklogId=118540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118540
 ]

ASF GitHub Bot logged work on BEAM-4006:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:21
Start Date: 03/Jul/18 01:21
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5729: [BEAM-4006] Futurize transforms subpackage
URL: https://github.com/apache/beam/pull/5729#discussion_r199660344
 
 

 ##
 File path: sdks/python/apache_beam/transforms/display.py
 ##
 @@ -141,7 +147,7 @@ def create_from_options(cls, pipeline_options):
 
 items = {k: (v if DisplayDataItem._get_value_type(v) is not None
  else str(v))
- for k, v in pipeline_options.display_data().items()}
+ for k, v in iteritems(pipeline_options.display_data())}
 
 Review comment:
   Does this actually need to change?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118540)
Time Spent: 40m  (was: 0.5h)

> Futurize and fix python 2 compatibility for transforms subpackage
> -
>
> Key: BEAM-4006
> URL: https://issues.apache.org/jira/browse/BEAM-4006
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4006) Futurize and fix python 2 compatibility for transforms subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4006?focusedWorklogId=118543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118543
 ]

ASF GitHub Bot logged work on BEAM-4006:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:21
Start Date: 03/Jul/18 01:21
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5729: [BEAM-4006] Futurize transforms subpackage
URL: https://github.com/apache/beam/pull/5729#discussion_r199660969
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger_test.py
 ##
 @@ -421,12 +426,12 @@ def format_result(k_v):
 | beam.GroupByKey()
 | beam.Map(format_result))
   assert_that(result, equal_to(
-  {
+  iteritems({
   'A-5': {1, 2, 3, 4, 5},
   # A-10, A-11 never emitted due to AfterCount(3) never firing.
   'B-4': {6, 7, 8, 9},
   'B-3': {10, 15, 16},
-  }.iteritems()))
+  })))
 
 Review comment:
   How about just `.items()`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118543)

> Futurize and fix python 2 compatibility for transforms subpackage
> -
>
> Key: BEAM-4006
> URL: https://issues.apache.org/jira/browse/BEAM-4006
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118535=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118535
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659550
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -28,12 +31,13 @@
 import tempfile
 import time
 from datetime import datetime
-from StringIO import StringIO
+from io import BytesIO
 
 Review comment:
   In some of your other pending changes you use `io.BytesIO`.  Can we choose 
one style and be consistent?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118535)
Time Spent: 5h 10m  (was: 5h)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118533=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118533
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659232
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -374,7 +376,7 @@ def test_progress_metrics(self):
 res.wait_until_finish()
 try:
   self.assertEqual(2, len(res._metrics_by_stage))
-  pregbk_metrics, postgbk_metrics = res._metrics_by_stage.values()
+  pregbk_metrics, postgbk_metrics = list(res._metrics_by_stage.values())
 
 Review comment:
   Do we need the extra `list()` here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118533)
Time Spent: 5h  (was: 4h 50m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118526=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118526
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659055
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/data_plane_test.py
 ##
 @@ -28,12 +28,15 @@
 from concurrent import futures
 
 import grpc
-import six
+from future import standard_library
+from future.utils import raise_
 
 from apache_beam.portability.api import beam_fn_api_pb2
 from apache_beam.portability.api import beam_fn_api_pb2_grpc
 from apache_beam.runners.worker import data_plane
 
+standard_library.install_aliases()
 
 Review comment:
   Why is this needed in this module?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118526)
Time Spent: 4h 10m  (was: 4h)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118522
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199658941
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##
 @@ -23,21 +23,27 @@
 import abc
 import contextlib
 import logging
-import Queue as queue
+import queue
 import sys
 import threading
 import traceback
+from builtins import object
+from builtins import range
 from concurrent import futures
 
 import grpc
-import six
+from future import standard_library
+from future.utils import raise_
+from future.utils import with_metaclass
 
 from apache_beam.portability.api import beam_fn_api_pb2
 from apache_beam.portability.api import beam_fn_api_pb2_grpc
 from apache_beam.runners.worker import bundle_processor
 from apache_beam.runners.worker import data_plane
 from apache_beam.runners.worker.worker_id_interceptor import 
WorkerIdInterceptor
 
+standard_library.install_aliases()
 
 Review comment:
   Do we want to call this before we do the import of "queue" above?  It looks 
like this only works because of an "accident" in that this module is imported 
after someone else already called `install_aliases()`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118522)
Time Spent: 3.5h  (was: 3h 20m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118532=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118532
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659615
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -52,6 +58,8 @@
 from apache_beam.utils import proto_utils
 from apache_beam.utils.plugin import BeamPlugin
 
+standard_library.install_aliases()
 
 Review comment:
   Why do we need this in this file?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118532)
Time Spent: 4h 50m  (was: 4h 40m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118530
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659286
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -52,6 +56,7 @@
 from apache_beam.transforms.window import GlobalWindows
 from apache_beam.utils import proto_utils
 
+standard_library.install_aliases()
 
 Review comment:
   Do we want to call this before we do the import of "queue" above?  It looks 
like this only works because of an "accident" in that this module is imported 
after someone else already called `install_aliases()`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118530)
Time Spent: 4h 40m  (was: 4.5h)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118521=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118521
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659014
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/logger_test.py
 ##
 @@ -17,14 +17,21 @@
 
 """Tests for worker logging utilities."""
 
+from __future__ import absolute_import
+
 import json
 import logging
 import sys
 import threading
 import unittest
+from builtins import object
+
+from future import standard_library
 
 from apache_beam.runners.worker import logger
 
+standard_library.install_aliases()
 
 Review comment:
   Why is this needed in this module?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118521)
Time Spent: 3h 20m  (was: 3h 10m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118520
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199658900
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py
 ##
 @@ -33,6 +37,8 @@
 from apache_beam.runners.worker.log_handler import FnApiLogRecordHandler
 from apache_beam.runners.worker.sdk_worker import SdkHarness
 
+standard_library.install_aliases()
 
 Review comment:
   Do we want to call this before we do the import of "http.server" above?  It 
looks like this only works because of an "accident" in that this module is 
imported after someone else already called `install_aliases()`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118520)
Time Spent: 3h 10m  (was: 3h)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118531
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659221
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -398,7 +400,7 @@ def test_progress_metrics(self):
 
   # The actual stage name ends up being something like 'm_out/lamdbda...'
   m_out, = [
-  metrics for name, metrics in postgbk_metrics.ptransforms.items()
+  metrics for name, metrics in 
list(postgbk_metrics.ptransforms.items())
 
 Review comment:
   Do we need the extra `list()` here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118531)
Time Spent: 4h 50m  (was: 4h 40m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118523
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659024
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/log_handler.py
 ##
 @@ -16,17 +16,23 @@
 #
 """Beam fn API log handler."""
 
+from __future__ import absolute_import
+
 import logging
 import math
-import Queue as queue
+import queue
 import threading
+from builtins import range
 
 import grpc
+from future import standard_library
 
 from apache_beam.portability.api import beam_fn_api_pb2
 from apache_beam.portability.api import beam_fn_api_pb2_grpc
 from apache_beam.runners.worker.worker_id_interceptor import 
WorkerIdInterceptor
 
+standard_library.install_aliases()
 
 Review comment:
   Do we want to call this before we do the import of "queue" above?  It looks 
like this only works because of an "accident" in that this module is imported 
after someone else already called `install_aliases()`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118523)
Time Spent: 3h 40m  (was: 3.5h)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118519
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199658762
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/worker_id_interceptor.py
 ##
 @@ -39,8 +39,8 @@ class 
WorkerIdInterceptor(grpc.StreamStreamClientInterceptor):
   # and throw exception in worker_id_interceptor.py after we have rolled out
   # the corresponding container changes.
   # Unique worker Id for this worker.
-  _worker_id = os.environ['WORKER_ID'] if os.environ.has_key(
-  'WORKER_ID') else str(uuid.uuid4())
+  _worker_id = os.environ['WORKER_ID'] if 'WORKER_ID' in os.environ else \
+  str(uuid.uuid4())
 
 Review comment:
   +1.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118519)
Time Spent: 3h  (was: 2h 50m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118527
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659153
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/local_job_service.py
 ##
 @@ -14,18 +14,22 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+from __future__ import absolute_import
+
 import functools
 import logging
 import os
-import Queue as queue
+import queue as queue
 
 Review comment:
   "as queue" is redundant.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118527)
Time Spent: 4h 20m  (was: 4h 10m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118525
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659068
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/data_plane.py
 ##
 @@ -24,18 +24,24 @@
 import abc
 import collections
 import logging
-import Queue as queue
+import queue
 import sys
 import threading
+from builtins import object
+from builtins import range
 
 import grpc
-import six
+from future import standard_library
+from future.utils import raise_
+from future.utils import with_metaclass
 
 from apache_beam.coders import coder_impl
 from apache_beam.portability.api import beam_fn_api_pb2
 from apache_beam.portability.api import beam_fn_api_pb2_grpc
 from apache_beam.runners.worker.worker_id_interceptor import 
WorkerIdInterceptor
 
+standard_library.install_aliases()
 
 Review comment:
   Do we want to call this before we do the import of "queue" above?  It looks 
like this only works because of an "accident" in that this module is imported 
after someone else already called `install_aliases()`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118525)
Time Spent: 4h  (was: 3h 50m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118529
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659136
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/local_job_service.py
 ##
 @@ -34,6 +38,8 @@
 from apache_beam.portability.api import endpoints_pb2
 from apache_beam.runners.portability import fn_api_runner
 
+standard_library.install_aliases()
 
 Review comment:
   Do we want to call this before we do the import of "queue" above?  It looks 
like this only works because of an "accident" in that this module is imported 
after someone else already called `install_aliases()`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118529)
Time Spent: 4h 40m  (was: 4.5h)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118528=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118528
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659640
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -28,12 +31,13 @@
 import tempfile
 import time
 from datetime import datetime
-from StringIO import StringIO
+from io import BytesIO
 
 import pkg_resources
 from apitools.base.py import encoding
 from apitools.base.py import exceptions
-import six
+
+from future import standard_library
 
 Review comment:
   Do we want to call this before we do the import of "io" above?  It looks 
like this only works because of an "accident" in that this module is imported 
after someone else already called `install_aliases()`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118528)
Time Spent: 4.5h  (was: 4h 20m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118536
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659436
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers.py
 ##
 @@ -49,7 +51,7 @@ def _matches(self, item):
 if self.origin != IGNORED and item.origin != self.origin:
   return False
 if self.context != IGNORED:
-  for key, name in self.context.iteritems():
+  for key, name in iteritems(self.context):
 
 Review comment:
   Why `iteritems` here vs `.items()`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118536)
Time Spent: 5h 20m  (was: 5h 10m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118524
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659114
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -103,7 +107,7 @@ def __init__(self, operation_name, step_name, consumers, 
counter_factory,
 # We must do this manually as we don't have a spec or spec.output_coders.
 self.receivers = [
 operations.ConsumerSet(self.counter_factory, self.step_name, 0,
-   next(consumers.itervalues()),
+   next(itervalues(consumers)),
 
 Review comment:
   Is this preferred over `consumers.values()`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118524)
Time Spent: 3h 50m  (was: 3h 40m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118534=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118534
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:05
Start Date: 03/Jul/18 01:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5373: [BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199659367
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/executor.py
 ##
 @@ -22,19 +22,24 @@
 import collections
 import itertools
 import logging
-import Queue
+import queue
 import sys
 import threading
 import traceback
+from builtins import object
+from builtins import range
 from weakref import WeakValueDictionary
 
-import six
+from future import standard_library
+from future.utils import raise_
 
 from apache_beam.metrics.execution import MetricsContainer
 from apache_beam.runners.worker import statesampler
 from apache_beam.transforms import sideinputs
 from apache_beam.utils import counters
 
+standard_library.install_aliases()
 
 Review comment:
   Do we want to call this before we do the import of "queue" above?  It looks 
like this only works because of an "accident" in that this module is imported 
after someone else already called `install_aliases()`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118534)
Time Spent: 5h 10m  (was: 5h)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4007) Futurize and fix python 2 compatibility for typehints subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4007?focusedWorklogId=118515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118515
 ]

ASF GitHub Bot logged work on BEAM-4007:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:51
Start Date: 03/Jul/18 00:51
Worklog Time Spent: 10m 
  Work Description: charlesccychen closed pull request #5872: [BEAM-4007] 
Fix TODO style in typehints.py
URL: https://github.com/apache/beam/pull/5872
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/typehints/typehints.py 
b/sdks/python/apache_beam/typehints/typehints.py
index 5da65fa352d..eb9254a5e6c 100644
--- a/sdks/python/apache_beam/typehints/typehints.py
+++ b/sdks/python/apache_beam/typehints/typehints.py
@@ -416,7 +416,7 @@ def __repr__(self):
 return 'Any'
 
   def __hash__(self):
-# TODO(BEAM - 3730)
+# TODO(BEAM-3730): Fix typehints.TypeVariable issues with __hash__.
 return hash(id(self))
 
   def type_check(self, instance):
@@ -432,7 +432,7 @@ def __eq__(self, other):
 return type(self) == type(other) and self.name == other.name
 
   def __hash__(self):
-# TODO(BEAM - 3730)
+# TODO(BEAM-3730): Fix typehints.TypeVariable issues with __hash__.
 return hash(id(self))
 
   def __repr__(self):


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118515)
Time Spent: 4h 50m  (was: 4h 40m)

> Futurize and fix python 2 compatibility for typehints subpackage
> 
>
> Key: BEAM-4007
> URL: https://issues.apache.org/jira/browse/BEAM-4007
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (552be73 -> d707a05)

2018-07-02 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 552be73  Merge pull request #5840: [BEAM-4451] Schemas default 
provider and serviceloader
 add 6dcd7e9  [BEAM-4007] Fix TODO style in typehints.py
 new d707a05  Merge pull request #5872 from charlesccychen/fix-todo-style

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/typehints/typehints.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[beam] 01/01: Merge pull request #5872 from charlesccychen/fix-todo-style

2018-07-02 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit d707a050642c763b013beddb6766d65ee578f3fd
Merge: 552be73 6dcd7e9
Author: Charles Chen 
AuthorDate: Mon Jul 2 17:51:29 2018 -0700

Merge pull request #5872 from charlesccychen/fix-todo-style

[BEAM-4007] Fix TODO style in typehints.py

 sdks/python/apache_beam/typehints/typehints.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118514
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:46
Start Date: 03/Jul/18 00:46
Worklog Time Spent: 10m 
  Work Description: cclauss commented on a change in pull request #5373: 
[BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199657561
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/worker_id_interceptor.py
 ##
 @@ -39,8 +39,8 @@ class 
WorkerIdInterceptor(grpc.StreamStreamClientInterceptor):
   # and throw exception in worker_id_interceptor.py after we have rolled out
   # the corresponding container changes.
   # Unique worker Id for this worker.
-  _worker_id = os.environ['WORKER_ID'] if os.environ.has_key(
-  'WORKER_ID') else str(uuid.uuid4())
+  _worker_id = os.environ['WORKER_ID'] if 'WORKER_ID' in os.environ else \
+  str(uuid.uuid4())
 
 Review comment:
   OUCH...  Backshash is a bad idea (see PEP8).  One space character to the 
right of the backslach and the script breaks on a change that is not visible to 
the reader.  What about:
   * ___worker_id = os.environ.get('WORKER_ID', str(uuid.uuid4()))__ instead?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118514)
Time Spent: 2h 50m  (was: 2h 40m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #1013

2018-07-02 Thread Apache Jenkins Server
See 


Changes:

[robbe.sneyders] Futurize typehints subpackage

[robbe.sneyders] Address PR comments

[ekirpichov] Makes FileIO.match watermark advance even without new files

[daniel.o.programmer] [BEAM-3708] Adding grouping table to Precombine step.

[daniel.o.programmer] [BEAM-3708] Simplifying precombine grouping tables.

[kenn] sqlline dep to 1.4.0

[kenn] Remove more extraneous printlns from build

[kenn] Remove aliased tables in Nexmark SQL query 5

[kenn] fixup! Remove aliased tables in Nexmark SQL query 5

[kenn] Simplify/fix windowing check in BeamSortRel

[Scott Wegner] Increase concurrency in ValidateRunner execution

[lcwik] [BEAM-3971, BEAM-4284] Remove fromProto for Pipeline and PTransform

[kenn] Instantiate $SUM0 as a SUM operator

[kenn] Instantiate "+" as DATETIME_PLUS

--
[...truncated 19.43 MB...]
INFO: 2018-07-03T00:41:08.842Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/To wait view 
0/Sample.Any/Combine.globally(SampleAny)/Combine.perKey(SampleAny)/GroupByKey/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Wait.OnSignal/To wait 
view 
0/Sample.Any/Combine.globally(SampleAny)/Combine.perKey(SampleAny)/GroupByKey/Reify
Jul 03, 2018 12:41:10 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:41:08.880Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Keys sample as view/GBKaSVForSize/Write into 
SpannerIO.Write/Write mutations to Cloud Spanner/Keys sample as 
view/ParMultiDo(ToIsmRecordForMapLike)
Jul 03, 2018 12:41:10 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:41:08.920Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/To wait view 
0/View.AsList/ParDo(ToIsmRecordForGlobalWindow) into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/To wait view 
0/Sample.Any/Flatten.Iterables/FlattenIterables/FlatMap
Jul 03, 2018 12:41:10 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:41:08.948Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Keys sample as view/GBKaSVForKeys/Write into 
SpannerIO.Write/Write mutations to Cloud Spanner/Keys sample as 
view/ParMultiDo(ToIsmRecordForMapLike)
Jul 03, 2018 12:41:10 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:41:08.980Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/To wait view 
0/Sample.Any/Flatten.Iterables/FlattenIterables/FlatMap into 
SpannerIO.Write/Write mutations to Cloud Spanner/Wait.OnSignal/To wait view 
0/Sample.Any/Combine.globally(SampleAny)/Values/Values/Map
Jul 03, 2018 12:41:10 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:41:09.020Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/To wait view 
0/Sample.Any/Combine.globally(SampleAny)/Values/Values/Map into 
SpannerIO.Write/Write mutations to Cloud Spanner/Wait.OnSignal/To wait view 
0/Sample.Any/Combine.globally(SampleAny)/Combine.perKey(SampleAny)/Combine.GroupedValues/Extract
Jul 03, 2018 12:41:10 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:41:09.060Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Keys sample as view/ParDo(ToIsmMetadataRecordForKey) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Keys sample as 
view/GBKaSVForKeys/Read
Jul 03, 2018 12:41:10 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:41:09.101Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/To wait view 
0/Sample.Any/Combine.globally(SampleAny)/Combine.perKey(SampleAny)/Combine.GroupedValues/Extract
 into SpannerIO.Write/Write mutations to Cloud Spanner/Wait.OnSignal/To wait 
view 
0/Sample.Any/Combine.globally(SampleAny)/Combine.perKey(SampleAny)/Combine.GroupedValues
Jul 03, 2018 12:41:10 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:41:09.137Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/To wait view 
0/Sample.Any/Combine.globally(SampleAny)/WithKeys/AddKeys/Map into 
SpannerIO.Write/Write mutations to Cloud Spanner/Wait.OnSignal/To wait view 
0/ParDo(CollectWindows)
Jul 03, 2018 12:41:10 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:41:09.180Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/To wait view 
0/Sample.Any/Combine.globally(SampleAny)/Combine.perKey(SampleAny)/Combine.GroupedValues
 into 

[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=118513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118513
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:44
Start Date: 03/Jul/18 00:44
Worklog Time Spent: 10m 
  Work Description: cclauss commented on a change in pull request #5373: 
[BEAM-4003] Futurize runners subpackage
URL: https://github.com/apache/beam/pull/5373#discussion_r199657561
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/worker_id_interceptor.py
 ##
 @@ -39,8 +39,8 @@ class 
WorkerIdInterceptor(grpc.StreamStreamClientInterceptor):
   # and throw exception in worker_id_interceptor.py after we have rolled out
   # the corresponding container changes.
   # Unique worker Id for this worker.
-  _worker_id = os.environ['WORKER_ID'] if os.environ.has_key(
-  'WORKER_ID') else str(uuid.uuid4())
+  _worker_id = os.environ['WORKER_ID'] if 'WORKER_ID' in os.environ else \
+  str(uuid.uuid4())
 
 Review comment:
   OUCH...  Backshash is a bad idea (see PEP8).  One space character to the 
right of the backslach and the script breaks on a change that is not visible to 
the reader.  What about ___worker_id = os.environ.get('WORKER_ID', 
str(uuid.uuid4()))__ instead?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118513)
Time Spent: 2h 40m  (was: 2.5h)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4453) Provide automatic schema registration for POJOs

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4453?focusedWorklogId=118511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118511
 ]

ASF GitHub Bot logged work on BEAM-4453:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:42
Start Date: 03/Jul/18 00:42
Worklog Time Spent: 10m 
  Work Description: reuvenlax opened a new pull request #5873: [BEAM-4453] 
Add schema support for Java POJOs and Java Beans
URL: https://github.com/apache/beam/pull/5873
 
 
   Adds automatic schema support for these types of classes. ByteBuddy is use 
to generate automatic connectors, so that Row objects can transparently 
delegate to the underlying storage in user objects.
   
   While this PR is large, splitting it up didn't seem to help much. Also, much 
of the line count is in unit tests.
   
   R: @apilloud


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118511)
Time Spent: 10m
Remaining Estimate: 0h

> Provide automatic schema registration for POJOs
> ---
>
> Key: BEAM-4453
> URL: https://issues.apache.org/jira/browse/BEAM-4453
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3761) Fix Python 3 cmp function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3761?focusedWorklogId=118510=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118510
 ]

ASF GitHub Bot logged work on BEAM-3761:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:40
Start Date: 03/Jul/18 00:40
Worklog Time Spent: 10m 
  Work Description: holdenk commented on issue #5843: [BEAM-3761] Define 
cmp() in Python 3
URL: https://github.com/apache/beam/pull/5843#issuecomment-401978352
 
 
   Ah ok, sorry for my confusion then :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118510)
Time Spent: 12h 10m  (was: 12h)

> Fix Python 3 cmp function
> -
>
> Key: BEAM-3761
> URL: https://issues.apache.org/jira/browse/BEAM-3761
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: holdenk
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Various functions don't exist in Python 3 that did in python 2. This Jira is 
> to fix the use of cmp (which often will involve rewriting __cmp__ as well).
>  
> Note: there are existing PRs for basestring and unicode ( 
> [https://github.com/apache/beam/pull/4697|https://github.com/apache/beam/pull/4697,]
>  , [https://github.com/apache/beam/pull/4730] )
>  
> Note once all of the missing names/functions are fixed we can enable F821 in 
> falke8 python 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4451) SchemaRegistry should support a ServiceLoader interface

2018-07-02 Thread Reuven Lax (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuven Lax resolved BEAM-4451.
--
   Resolution: Fixed
Fix Version/s: 2.6.0

> SchemaRegistry should support a ServiceLoader interface
> ---
>
> Key: BEAM-4451
> URL: https://issues.apache.org/jira/browse/BEAM-4451
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This will allow JARs to register schemas only when they are linked in.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PerformanceTests_JDBC #798

2018-07-02 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PreCommit_Java_Cron #58

2018-07-02 Thread Apache Jenkins Server
See 


Changes:

[robbe.sneyders] Futurize typehints subpackage

[robbe.sneyders] Address PR comments

[ekirpichov] Makes FileIO.match watermark advance even without new files

[daniel.o.programmer] [BEAM-3708] Adding grouping table to Precombine step.

[daniel.o.programmer] [BEAM-3708] Simplifying precombine grouping tables.

[kenn] sqlline dep to 1.4.0

[kenn] Remove more extraneous printlns from build

[kenn] Remove aliased tables in Nexmark SQL query 5

[kenn] fixup! Remove aliased tables in Nexmark SQL query 5

[kenn] Simplify/fix windowing check in BeamSortRel

[Scott Wegner] Increase concurrency in ValidateRunner execution

[lcwik] [BEAM-3971, BEAM-4284] Remove fromProto for Pipeline and PTransform

[kenn] Instantiate $SUM0 as a SUM operator

[kenn] Instantiate "+" as DATETIME_PLUS

--
[...truncated 16.72 MB...]
INFO: 2018-07-03T00:19:11.947Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
Jul 03, 2018 12:19:16 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:19:12.193Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
Jul 03, 2018 12:19:16 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:19:12.223Z: Unzipping flatten s13 for input 
s12.org.apache.beam.sdk.values.PCollection.:349#1d275f544daf228c
Jul 03, 2018 12:19:16 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:19:12.246Z: Fusing unzipped copy of 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Add void 
key/AddKeys/Map, through flatten 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/Flatten.PCollections,
 into producer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/DropShardNum
Jul 03, 2018 12:19:16 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:19:12.272Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/ExpandIterable
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/GroupByKey/GroupByWindow
Jul 03, 2018 12:19:16 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:19:12.306Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Pair
 with random key
Jul 03, 2018 12:19:16 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:19:12.335Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Write
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Reify
Jul 03, 2018 12:19:16 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:19:12.370Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/GroupByWindow
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Read
Jul 03, 2018 12:19:16 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:19:12.406Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Reify
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign
Jul 03, 2018 12:19:16 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:19:12.435Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Drop 
key/Values/Map into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/ExpandIterable
Jul 03, 2018 12:19:16 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:19:12.467Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Gather 
bundles into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Drop 
key/Values/Map
Jul 03, 2018 12:19:16 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-03T00:19:12.494Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Pair
 with random key into 

[jira] [Work logged] (BEAM-4432) Performance tests need a way to generate Synthetic data

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4432?focusedWorklogId=118509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118509
 ]

ASF GitHub Bot logged work on BEAM-4432:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:22
Start Date: 03/Jul/18 00:22
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5519: [BEAM-4432] Adding 
Sources to produce Synthetic output for Batch pipelines
URL: https://github.com/apache/beam/pull/5519#issuecomment-401975854
 
 
   Thanks Ismael! for some reason, I hadn't seen your other ocmments. Github 
hid that file from me. I've addressed them. Only thing remaining is setting up 
`SyntheticIO.read`. I'll work on that tomorrow.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118509)
Time Spent: 5h 50m  (was: 5h 40m)

> Performance tests need a way to generate Synthetic data
> ---
>
> Key: BEAM-4432
> URL: https://issues.apache.org/jira/browse/BEAM-4432
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> GenerateSequence fal.lls short in this regard, as we may want to generate 
> data in custom distributions, or with specific repeatability requirements / 
> and hardcoded delays for autoscaling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4007) Futurize and fix python 2 compatibility for typehints subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4007?focusedWorklogId=118508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118508
 ]

ASF GitHub Bot logged work on BEAM-4007:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:17
Start Date: 03/Jul/18 00:17
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on issue #5872: [BEAM-4007] Fix 
TODO style in typehints.py
URL: https://github.com/apache/beam/pull/5872#issuecomment-401975081
 
 
   LGTM, thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118508)
Time Spent: 4h 40m  (was: 4.5h)

> Futurize and fix python 2 compatibility for typehints subpackage
> 
>
> Key: BEAM-4007
> URL: https://issues.apache.org/jira/browse/BEAM-4007
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4451) SchemaRegistry should support a ServiceLoader interface

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4451?focusedWorklogId=118506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118506
 ]

ASF GitHub Bot logged work on BEAM-4451:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:12
Start Date: 03/Jul/18 00:12
Worklog Time Spent: 10m 
  Work Description: reuvenlax closed pull request #5840: [BEAM-4451] 
Schemas default provider and serviceloader
URL: https://github.com/apache/beam/pull/5840
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/DefaultSchema.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/DefaultSchema.java
new file mode 100644
index 000..c4e3269ab3b
--- /dev/null
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/DefaultSchema.java
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.sdk.schemas;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Maps;
+import java.lang.annotation.Documented;
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+import java.lang.reflect.InvocationTargetException;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.CheckForNull;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.sdk.values.TypeDescriptor;
+
+/**
+ * The {@link DefaultSchema} annotation specifies a {@link SchemaProvider} 
class to handle obtaining
+ * a schema and row for the specified class.
+ *
+ * For example, if your class is JavaBean, the JavaBeanSchema provider 
class knows how to vend
+ * schemas for this class. You can annotate it as follows:
+ *
+ * 
+ *   {@literal @}DefaultSchema(JavaBeanSchema.class)
+ *   class MyClass {
+ * public String getFoo();
+ * void setFoo(String foo);
+ *   
+ *   }
+ * 
+ */
+@Documented
+@Retention(RetentionPolicy.RUNTIME)
+@Target(ElementType.TYPE)
+@SuppressWarnings("rawtypes")
+@Experimental(Kind.SCHEMAS)
+public @interface DefaultSchema {
+  @CheckForNull
+  Class value();
+
+  /**
+   * {@link SchemaProvider} for default schemas. Looks up the provider 
annotated for a type, and
+   * delegates to that provider.
+   */
+  class DefaultSchemaProvider extends SchemaProvider {
+final Map cachedProviders = 
Maps.newConcurrentMap();
+
+@Nullable
+private SchemaProvider getSchemaProvider(TypeDescriptor typeDescriptor) 
{
+  return cachedProviders.computeIfAbsent(
+  typeDescriptor,
+  type -> {
+Class clazz = type.getRawType();
+DefaultSchema annotation = 
clazz.getAnnotation(DefaultSchema.class);
+if (annotation == null) {
+  return null;
+}
+Class providerClass = annotation.value();
+checkArgument(
+providerClass != null,
+"Type "
++ type
++ " has a @DefaultSchemaProvider annotation "
++ " with a null argument.");
+
+try {
+  return providerClass.getDeclaredConstructor().newInstance();
+} catch (NoSuchMethodException
+| InstantiationException
+| IllegalAccessException
+| InvocationTargetException e) {
+  throw new IllegalStateException(
+  "Failed to create SchemaProvider "
+  + providerClass.getSimpleName()
+  + " which was"
+

[beam] branch master updated (a56ce43 -> 552be73)

2018-07-02 Thread reuvenlax
This is an automated email from the ASF dual-hosted git repository.

reuvenlax pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from a56ce43  Merge pull request #5337 from RobbeSneyders/typehints
 add 2ed7f4e  Add SchemaProviderRegistrar class.
 add a8ad643  Add DefaultSchema and fix failures turned up by plugins.
 add 6892a32  Address review commemnts.
 new 552be73  Merge pull request #5840: [BEAM-4451] Schemas default 
provider and serviceloader

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../org/apache/beam/sdk/schemas/DefaultSchema.java | 144 +
 .../apache/beam/sdk/schemas/SchemaProvider.java|   4 +
 .../SchemaProviderRegistrar.java}  |  22 ++--
 .../apache/beam/sdk/schemas/SchemaRegistry.java|  35 -
 .../beam/sdk/schemas/SchemaRegistryTest.java   |  82 
 5 files changed, 274 insertions(+), 13 deletions(-)
 create mode 100644 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/DefaultSchema.java
 copy 
sdks/java/core/src/main/java/org/apache/beam/sdk/{coders/CoderProviderRegistrar.java
 => schemas/SchemaProviderRegistrar.java} (65%)



[beam] 01/01: Merge pull request #5840: [BEAM-4451] Schemas default provider and serviceloader

2018-07-02 Thread reuvenlax
This is an automated email from the ASF dual-hosted git repository.

reuvenlax pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 552be73d3ebde253323d41d0f9e891cdf55cc1e1
Merge: a56ce43 6892a32
Author: reuvenlax 
AuthorDate: Mon Jul 2 17:12:24 2018 -0700

Merge pull request #5840: [BEAM-4451] Schemas default provider and 
serviceloader

 .../org/apache/beam/sdk/schemas/DefaultSchema.java | 144 +
 .../apache/beam/sdk/schemas/SchemaProvider.java|   4 +
 .../beam/sdk/schemas/SchemaProviderRegistrar.java  |  42 ++
 .../apache/beam/sdk/schemas/SchemaRegistry.java|  35 -
 .../beam/sdk/schemas/SchemaRegistryTest.java   |  82 
 5 files changed, 305 insertions(+), 2 deletions(-)



[jira] [Work logged] (BEAM-4000) Futurize and fix python 2 compatibility for io subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4000?focusedWorklogId=118502=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118502
 ]

ASF GitHub Bot logged work on BEAM-4000:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:05
Start Date: 03/Jul/18 00:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5715: [BEAM-4000] Futurize io subpackage
URL: https://github.com/apache/beam/pull/5715#discussion_r198317502
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystem_test.py
 ##
 @@ -17,19 +17,28 @@
 #
 
 """Unit tests for filesystem module."""
+from __future__ import absolute_import
+from __future__ import division
+
 import bz2
 import gzip
 import logging
 import os
 import tempfile
 import unittest
-from StringIO import StringIO
+from builtins import range
+from io import BytesIO
+
+from future import standard_library
+from future.utils import iteritems
 
 from apache_beam.io.filesystem import CompressedFile
 from apache_beam.io.filesystem import CompressionTypes
 from apache_beam.io.filesystem import FileMetadata
 from apache_beam.io.filesystem import FileSystem
 
+standard_library.install_aliases()
 
 Review comment:
   Should we do this before `import io`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118502)

> Futurize and fix python 2 compatibility for io subpackage
> -
>
> Key: BEAM-4000
> URL: https://issues.apache.org/jira/browse/BEAM-4000
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4000) Futurize and fix python 2 compatibility for io subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4000?focusedWorklogId=118503=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118503
 ]

ASF GitHub Bot logged work on BEAM-4000:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:05
Start Date: 03/Jul/18 00:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5715: [BEAM-4000] Futurize io subpackage
URL: https://github.com/apache/beam/pull/5715#discussion_r198316763
 
 

 ##
 File path: sdks/python/apache_beam/io/filebasedsource_test.py
 ##
 @@ -44,6 +48,8 @@
 from apache_beam.transforms.display import DisplayData
 from apache_beam.transforms.display_test import DisplayDataItemMatcher
 
+standard_library.install_aliases()
 
 Review comment:
   Should we do this before `import io` on line 21?  This only works now 
because by the arbitrary current import order, some other module has called 
`install_aliases()` before we get to line 21.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118503)
Time Spent: 2h  (was: 1h 50m)

> Futurize and fix python 2 compatibility for io subpackage
> -
>
> Key: BEAM-4000
> URL: https://issues.apache.org/jira/browse/BEAM-4000
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4000) Futurize and fix python 2 compatibility for io subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4000?focusedWorklogId=118499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118499
 ]

ASF GitHub Bot logged work on BEAM-4000:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:05
Start Date: 03/Jul/18 00:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5715: [BEAM-4000] Futurize io subpackage
URL: https://github.com/apache/beam/pull/5715#discussion_r198317488
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystem.py
 ##
 @@ -22,30 +22,42 @@
 """
 
 from __future__ import absolute_import
+from __future__ import division
 
 import abc
 import bz2
-import cStringIO
 import fnmatch
 import logging
 import os
 import posixpath
 import re
 import time
 import zlib
+from builtins import object
+from builtins import zip
+from io import BytesIO
 
-from six import integer_types
-from six import string_types
+from future import standard_library
+from future.utils import with_metaclass
 
 from apache_beam.utils.plugin import BeamPlugin
 
+standard_library.install_aliases()
 
 Review comment:
   Should we do this before `import io`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118499)
Time Spent: 1h 50m  (was: 1h 40m)

> Futurize and fix python 2 compatibility for io subpackage
> -
>
> Key: BEAM-4000
> URL: https://issues.apache.org/jira/browse/BEAM-4000
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4000) Futurize and fix python 2 compatibility for io subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4000?focusedWorklogId=118501=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118501
 ]

ASF GitHub Bot logged work on BEAM-4000:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:05
Start Date: 03/Jul/18 00:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5715: [BEAM-4000] Futurize io subpackage
URL: https://github.com/apache/beam/pull/5715#discussion_r199652683
 
 

 ##
 File path: sdks/python/apache_beam/io/filebasedsink.py
 ##
 @@ -307,7 +314,7 @@ def _rename_batch(batch):
   return exceptions
 
   exception_batches = util.run_using_threadpool(
-  _rename_batch, zip(source_file_batch, destination_file_batch),
+  _rename_batch, list(zip(source_file_batch, destination_file_batch)),
 
 Review comment:
   Reading `util.run_using_threadpool`, it looks like the code calls 
`len(inputs)`, which may not be compatible with all iterables.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118501)

> Futurize and fix python 2 compatibility for io subpackage
> -
>
> Key: BEAM-4000
> URL: https://issues.apache.org/jira/browse/BEAM-4000
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4000) Futurize and fix python 2 compatibility for io subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4000?focusedWorklogId=118504=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118504
 ]

ASF GitHub Bot logged work on BEAM-4000:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:05
Start Date: 03/Jul/18 00:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5715: [BEAM-4000] Futurize io subpackage
URL: https://github.com/apache/beam/pull/5715#discussion_r198317522
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -40,6 +43,9 @@
 from apache_beam.io.filesystemio import UploaderStream
 from apache_beam.utils import retry
 
+standard_library.install_aliases()
 
 Review comment:
   Should we do this before `import io`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118504)
Time Spent: 2h  (was: 1h 50m)

> Futurize and fix python 2 compatibility for io subpackage
> -
>
> Key: BEAM-4000
> URL: https://issues.apache.org/jira/browse/BEAM-4000
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4000) Futurize and fix python 2 compatibility for io subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4000?focusedWorklogId=118498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118498
 ]

ASF GitHub Bot logged work on BEAM-4000:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:05
Start Date: 03/Jul/18 00:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5715: [BEAM-4000] Futurize io subpackage
URL: https://github.com/apache/beam/pull/5715#discussion_r198317709
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -483,6 +489,7 @@ def size(self):
 return self._size
 
   def get_range(self, start, end):
+self._download_stream.seek(0)
 
 Review comment:
   This is fine, but I'm curious why you had to make this change--did you 
encounter an error before this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118498)

> Futurize and fix python 2 compatibility for io subpackage
> -
>
> Key: BEAM-4000
> URL: https://issues.apache.org/jira/browse/BEAM-4000
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4000) Futurize and fix python 2 compatibility for io subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4000?focusedWorklogId=118497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118497
 ]

ASF GitHub Bot logged work on BEAM-4000:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:05
Start Date: 03/Jul/18 00:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5715: [BEAM-4000] Futurize io subpackage
URL: https://github.com/apache/beam/pull/5715#discussion_r198317551
 
 

 ##
 File path: sdks/python/apache_beam/io/tfrecordio_test.py
 ##
 @@ -42,6 +46,8 @@
 from apache_beam.testing.util import assert_that
 from apache_beam.testing.util import equal_to
 
+standard_library.install_aliases()
 
 Review comment:
   Should we do this before `import io`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118497)
Time Spent: 1h 40m  (was: 1.5h)

> Futurize and fix python 2 compatibility for io subpackage
> -
>
> Key: BEAM-4000
> URL: https://issues.apache.org/jira/browse/BEAM-4000
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4000) Futurize and fix python 2 compatibility for io subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4000?focusedWorklogId=118505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118505
 ]

ASF GitHub Bot logged work on BEAM-4000:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:05
Start Date: 03/Jul/18 00:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5715: [BEAM-4000] Futurize io subpackage
URL: https://github.com/apache/beam/pull/5715#discussion_r199652508
 
 

 ##
 File path: sdks/python/apache_beam/io/avroio.py
 ##
 @@ -60,6 +63,8 @@
 from apache_beam.io.iobase import Read
 from apache_beam.transforms import PTransform
 
+standard_library.install_aliases()
 
 Review comment:
   This is needed for `import io`, but is misplaced.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118505)
Time Spent: 2h  (was: 1h 50m)

> Futurize and fix python 2 compatibility for io subpackage
> -
>
> Key: BEAM-4000
> URL: https://issues.apache.org/jira/browse/BEAM-4000
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4000) Futurize and fix python 2 compatibility for io subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4000?focusedWorklogId=118500=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118500
 ]

ASF GitHub Bot logged work on BEAM-4000:


Author: ASF GitHub Bot
Created on: 03/Jul/18 00:05
Start Date: 03/Jul/18 00:05
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5715: [BEAM-4000] Futurize io subpackage
URL: https://github.com/apache/beam/pull/5715#discussion_r198316034
 
 

 ##
 File path: sdks/python/apache_beam/io/avroio.py
 ##
 @@ -60,6 +63,8 @@
 from apache_beam.io.iobase import Read
 from apache_beam.transforms import PTransform
 
+standard_library.install_aliases()
 
 Review comment:
   Should we do this before `import io` on line 46?  This only works now 
because by the arbitrary current import order, some other module has called 
`install_aliases()` before we get to line 46.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118500)
Time Spent: 1h 50m  (was: 1h 40m)

> Futurize and fix python 2 compatibility for io subpackage
> -
>
> Key: BEAM-4000
> URL: https://issues.apache.org/jira/browse/BEAM-4000
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3761) Fix Python 3 cmp function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3761?focusedWorklogId=118496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118496
 ]

ASF GitHub Bot logged work on BEAM-3761:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:58
Start Date: 02/Jul/18 23:58
Worklog Time Spent: 10m 
  Work Description: cclauss commented on issue #5843: [BEAM-3761] Define 
cmp() in Python 3
URL: https://github.com/apache/beam/pull/5843#issuecomment-401972217
 
 
   Yes.  The commits were squashed into a single commit.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118496)
Time Spent: 12h  (was: 11h 50m)

> Fix Python 3 cmp function
> -
>
> Key: BEAM-3761
> URL: https://issues.apache.org/jira/browse/BEAM-3761
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: holdenk
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> Various functions don't exist in Python 3 that did in python 2. This Jira is 
> to fix the use of cmp (which often will involve rewriting __cmp__ as well).
>  
> Note: there are existing PRs for basestring and unicode ( 
> [https://github.com/apache/beam/pull/4697|https://github.com/apache/beam/pull/4697,]
>  , [https://github.com/apache/beam/pull/4730] )
>  
> Note once all of the missing names/functions are fixed we can enable F821 in 
> falke8 python 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4007) Futurize and fix python 2 compatibility for typehints subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4007?focusedWorklogId=118495=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118495
 ]

ASF GitHub Bot logged work on BEAM-4007:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:57
Start Date: 02/Jul/18 23:57
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5872: [BEAM-4007] 
Fix TODO style in typehints.py
URL: https://github.com/apache/beam/pull/5872#issuecomment-401972046
 
 
   R: @RobbeSneyders


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118495)
Time Spent: 4.5h  (was: 4h 20m)

> Futurize and fix python 2 compatibility for typehints subpackage
> 
>
> Key: BEAM-4007
> URL: https://issues.apache.org/jira/browse/BEAM-4007
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4007) Futurize and fix python 2 compatibility for typehints subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4007?focusedWorklogId=118494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118494
 ]

ASF GitHub Bot logged work on BEAM-4007:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:57
Start Date: 02/Jul/18 23:57
Worklog Time Spent: 10m 
  Work Description: charlesccychen opened a new pull request #5872: 
[BEAM-4007] Fix TODO style in typehints.py
URL: https://github.com/apache/beam/pull/5872
 
 
   This change fixes the TODO style for a few lines introduced by 
https://github.com/apache/beam/pull/5337.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118494)
Time Spent: 4h 20m  (was: 4h 10m)

> Futurize and fix python 2 compatibility for typehints subpackage
> 
>
> Key: BEAM-4007
> URL: https://issues.apache.org/jira/browse/BEAM-4007
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4372) Need an undeprecated Reshuffle transform

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reassigned BEAM-4372:
--

Assignee: Yueyang Qiu  (was: Eugene Kirpichov)

> Need an undeprecated Reshuffle transform
> 
>
> Key: BEAM-4372
> URL: https://issues.apache.org/jira/browse/BEAM-4372
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Raghu Angadi
>Assignee: Yueyang Qiu
>Priority: Major
>
> {{Reshuffle}} transform is a convenient wrapper around GBK. It preserves user 
> windowing correctly which is not always obvious for users. It usage has grown 
> over many months since it was deprecated as it makes it easier for potential 
> incorrect/unportable use of GroupByKey semantics. See [dev 
> thread|https://lists.apache.org/thread.html/820064a81c86a6d44f21f0d6c68ea3f46cec151e5e1a0b52eeed3fbf@%3Cdev.beam.apache.org%3E]
>  for more discussion.
>  
> There is a broad agreement we do need a transform that is meant for its 
> common used purpose in user pipeline : to control parallelism (e.g. to 
> control connections to a DB). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3194) Support annotating that a DoFn requires stable / deterministic input for replay/retry

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reassigned BEAM-3194:
--

Assignee: Yueyang Qiu  (was: Eugene Kirpichov)

> Support annotating that a DoFn requires stable / deterministic input for 
> replay/retry
> -
>
> Key: BEAM-3194
> URL: https://issues.apache.org/jira/browse/BEAM-3194
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Yueyang Qiu
>Priority: Major
>
> See the thread: 
> https://lists.apache.org/thread.html/5fd81ce371aeaf642665348f8e6940e308e04275dd7072f380f9f945@%3Cdev.beam.apache.org%3E
> We need this in order to have truly cross-runner end-to-end exactly once via 
> replay + idempotence.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4007) Futurize and fix python 2 compatibility for typehints subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4007?focusedWorklogId=118492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118492
 ]

ASF GitHub Bot logged work on BEAM-4007:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:55
Start Date: 02/Jul/18 23:55
Worklog Time Spent: 10m 
  Work Description: charlesccychen closed pull request #5337: [BEAM-4007] 
Futurize typehints subpackage
URL: https://github.com/apache/beam/pull/5337
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/typehints/__init__.py 
b/sdks/python/apache_beam/typehints/__init__.py
index e89afa1285a..23d0b40d07f 100644
--- a/sdks/python/apache_beam/typehints/__init__.py
+++ b/sdks/python/apache_beam/typehints/__init__.py
@@ -17,6 +17,8 @@
 
 """A package defining the syntax and decorator semantics for type-hints."""
 
+from __future__ import absolute_import
+
 # pylint: disable=wildcard-import
 from apache_beam.typehints.typehints import *
 from apache_beam.typehints.decorators import *
diff --git a/sdks/python/apache_beam/typehints/decorators.py 
b/sdks/python/apache_beam/typehints/decorators.py
index 88160c04660..6604cf12081 100644
--- a/sdks/python/apache_beam/typehints/decorators.py
+++ b/sdks/python/apache_beam/typehints/decorators.py
@@ -83,8 +83,13 @@ def foo((a, b)):
 defined, or before importing a module containing type-hinted functions.
 """
 
+from __future__ import absolute_import
+
 import inspect
 import types
+from builtins import next
+from builtins import object
+from builtins import zip
 
 from apache_beam.typehints import native_type_compatibility
 from apache_beam.typehints import typehints
@@ -175,7 +180,7 @@ def with_defaults(self, hints):
 return IOTypeHints(self.input_types or hints.input_types,
self.output_types or hints.output_types)
 
-  def __nonzero__(self):
+  def __bool__(self):
 return bool(self.input_types or self.output_types)
 
   def __repr__(self):
@@ -404,7 +409,7 @@ def with_output_types(*return_type_hint, **kwargs):
 from apache_beam.typehints import with_output_types
 from apache_beam.typehints import Set
 
-class Coordinate:
+class Coordinate(object):
   def __init__(self, x, y):
 self.x = x
 self.y = y
diff --git a/sdks/python/apache_beam/typehints/native_type_compatibility.py 
b/sdks/python/apache_beam/typehints/native_type_compatibility.py
index 0be931e8fe2..87fb2c80816 100644
--- a/sdks/python/apache_beam/typehints/native_type_compatibility.py
+++ b/sdks/python/apache_beam/typehints/native_type_compatibility.py
@@ -17,8 +17,12 @@
 
 """Module to convert Python's native typing types to Beam types."""
 
+from __future__ import absolute_import
+
 import collections
 import typing
+from builtins import next
+from builtins import range
 
 from apache_beam.typehints import typehints
 
diff --git 
a/sdks/python/apache_beam/typehints/native_type_compatibility_test.py 
b/sdks/python/apache_beam/typehints/native_type_compatibility_test.py
index 4171507f345..2abde69a95f 100644
--- a/sdks/python/apache_beam/typehints/native_type_compatibility_test.py
+++ b/sdks/python/apache_beam/typehints/native_type_compatibility_test.py
@@ -17,6 +17,8 @@
 
 """Test for Beam type compatibility library."""
 
+from __future__ import absolute_import
+
 import typing
 import unittest
 
diff --git a/sdks/python/apache_beam/typehints/opcodes.py 
b/sdks/python/apache_beam/typehints/opcodes.py
index 252bcf50e35..a9874cfbda3 100644
--- a/sdks/python/apache_beam/typehints/opcodes.py
+++ b/sdks/python/apache_beam/typehints/opcodes.py
@@ -33,8 +33,6 @@
 import types
 from functools import reduce
 
-import six
-
 from . import typehints
 from .trivial_inference import BoundMethod
 from .trivial_inference import Const
@@ -47,6 +45,11 @@
 from .typehints import Tuple
 from .typehints import Union
 
+try:# Python 2
+  unicode   # pylint: disable=unicode-builtin
+except NameError:   # Python 3
+  unicode = str
+
 
 def pop_one(state, unused_arg):
   del state.stack[-1:]
@@ -152,7 +155,7 @@ def binary_true_divide(state, unused_arg):
 def binary_subscr(state, unused_arg):
   index = state.stack.pop()
   base = state.stack.pop()
-  if base in (str, six.text_type):
+  if base in (str, unicode):
 out = base
   elif (isinstance(index, Const) and isinstance(index.value, int)
 and isinstance(base, typehints.TupleHint.TupleConstraint)):
diff --git a/sdks/python/apache_beam/typehints/trivial_inference.py 
b/sdks/python/apache_beam/typehints/trivial_inference.py
index 92770fba10a..d8181ad7c78 100644
--- a/sdks/python/apache_beam/typehints/trivial_inference.py
+++ 

[jira] [Work logged] (BEAM-4007) Futurize and fix python 2 compatibility for typehints subpackage

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4007?focusedWorklogId=118491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118491
 ]

ASF GitHub Bot logged work on BEAM-4007:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:55
Start Date: 02/Jul/18 23:55
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5337: [BEAM-4007] 
Futurize typehints subpackage
URL: https://github.com/apache/beam/pull/5337#issuecomment-401971768
 
 
   Thanks, this LGTM.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118491)
Time Spent: 4h  (was: 3h 50m)

> Futurize and fix python 2 compatibility for typehints subpackage
> 
>
> Key: BEAM-4007
> URL: https://issues.apache.org/jira/browse/BEAM-4007
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5337 from RobbeSneyders/typehints

2018-07-02 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit a56ce43109c97c739fa08adca45528c41e3c925c
Merge: f53bc09 7b79c0d
Author: Charles Chen 
AuthorDate: Mon Jul 2 16:55:52 2018 -0700

Merge pull request #5337 from RobbeSneyders/typehints

[BEAM-4007] Futurize typehints subpackage

 sdks/python/apache_beam/typehints/__init__.py  |  2 ++
 sdks/python/apache_beam/typehints/decorators.py|  9 +++--
 .../typehints/native_type_compatibility.py |  4 
 .../typehints/native_type_compatibility_test.py|  2 ++
 sdks/python/apache_beam/typehints/opcodes.py   |  9 ++---
 .../apache_beam/typehints/trivial_inference.py | 13 -
 .../typehints/trivial_inference_test.py|  3 +++
 sdks/python/apache_beam/typehints/typecheck.py | 21 +
 .../apache_beam/typehints/typed_pipeline_test.py   |  3 +++
 sdks/python/apache_beam/typehints/typehints.py | 22 +-
 .../python/apache_beam/typehints/typehints_test.py |  5 +
 sdks/python/tox.ini|  1 +
 12 files changed, 75 insertions(+), 19 deletions(-)



[beam] branch master updated (f53bc09 -> a56ce43)

2018-07-02 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from f53bc09  Merge pull request #5856: Remove aliased tables in Nexmark 
SQL query 5
 add ffe4eab  Futurize typehints subpackage
 add 7b79c0d  Address PR comments
 new a56ce43  Merge pull request #5337 from RobbeSneyders/typehints

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/typehints/__init__.py  |  2 ++
 sdks/python/apache_beam/typehints/decorators.py|  9 +++--
 .../typehints/native_type_compatibility.py |  4 
 .../typehints/native_type_compatibility_test.py|  2 ++
 sdks/python/apache_beam/typehints/opcodes.py   |  9 ++---
 .../apache_beam/typehints/trivial_inference.py | 13 -
 .../typehints/trivial_inference_test.py|  3 +++
 sdks/python/apache_beam/typehints/typecheck.py | 21 +
 .../apache_beam/typehints/typed_pipeline_test.py   |  3 +++
 sdks/python/apache_beam/typehints/typehints.py | 22 +-
 .../python/apache_beam/typehints/typehints_test.py |  5 +
 sdks/python/tox.ini|  1 +
 12 files changed, 75 insertions(+), 19 deletions(-)



[jira] [Commented] (BEAM-3545) Fn API metrics in Go SDK harness

2018-07-02 Thread Robert Burke (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530591#comment-16530591
 ] 

Robert Burke commented on BEAM-3545:


I have a suspicion that I have not implemented gauges correctly. I could not 
find a semantics doc. I'm not certain of this, but I suspect we should be 
propagating the maximum value + associated timestamp, rather than the "last 
seen" value + associated timestamp which is what it is now. It shouldn't affect 
the user side at all though.

I have some performance improvements for the metrics in the works, which should 
be sufficient to close this, I just need to write some benchmarks to discourage 
regression.

 

> Fn API metrics in Go SDK harness
> 
>
> Key: BEAM-3545
> URL: https://issues.apache.org/jira/browse/BEAM-3545
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Kenneth Knowles
>Assignee: Robert Burke
>Priority: Major
>  Labels: portability
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3761) Fix Python 3 cmp function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3761?focusedWorklogId=118490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118490
 ]

ASF GitHub Bot logged work on BEAM-3761:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:46
Start Date: 02/Jul/18 23:46
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5843: [BEAM-3761] Define 
cmp() in Python 3
URL: https://github.com/apache/beam/pull/5843#issuecomment-401970262
 
 
   The build ran once more after d9d8689 was pushed. See: 
https://builds.apache.org/job/beam_PreCommit_Python_Commit/199/


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118490)
Time Spent: 11h 50m  (was: 11h 40m)

> Fix Python 3 cmp function
> -
>
> Key: BEAM-3761
> URL: https://issues.apache.org/jira/browse/BEAM-3761
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: holdenk
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Various functions don't exist in Python 3 that did in python 2. This Jira is 
> to fix the use of cmp (which often will involve rewriting __cmp__ as well).
>  
> Note: there are existing PRs for basestring and unicode ( 
> [https://github.com/apache/beam/pull/4697|https://github.com/apache/beam/pull/4697,]
>  , [https://github.com/apache/beam/pull/4730] )
>  
> Note once all of the missing names/functions are fixed we can enable F821 in 
> falke8 python 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3761) Fix Python 3 cmp function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3761?focusedWorklogId=118489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118489
 ]

ASF GitHub Bot logged work on BEAM-3761:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:43
Start Date: 02/Jul/18 23:43
Worklog Time Spent: 10m 
  Work Description: holdenk commented on issue #5843: [BEAM-3761] Define 
cmp() in Python 3
URL: https://github.com/apache/beam/pull/5843#issuecomment-401969902
 
 
   Isn't https://builds.apache.org/job/beam_PreCommit_Python_Commit/166/console 
from your approach or am I looking at the wrong build logs?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118489)
Time Spent: 11h 40m  (was: 11.5h)

> Fix Python 3 cmp function
> -
>
> Key: BEAM-3761
> URL: https://issues.apache.org/jira/browse/BEAM-3761
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: holdenk
>Priority: Major
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> Various functions don't exist in Python 3 that did in python 2. This Jira is 
> to fix the use of cmp (which often will involve rewriting __cmp__ as well).
>  
> Note: there are existing PRs for basestring and unicode ( 
> [https://github.com/apache/beam/pull/4697|https://github.com/apache/beam/pull/4697,]
>  , [https://github.com/apache/beam/pull/4730] )
>  
> Note once all of the missing names/functions are fixed we can enable F821 in 
> falke8 python 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3761) Fix Python 3 cmp function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3761?focusedWorklogId=118488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118488
 ]

ASF GitHub Bot logged work on BEAM-3761:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:35
Start Date: 02/Jul/18 23:35
Worklog Time Spent: 10m 
  Work Description: cclauss edited a comment on issue #5843: [BEAM-3761] 
Define cmp() in Python 3
URL: https://github.com/apache/beam/pull/5843#issuecomment-401968502
 
 
   There are no Jenkins build issues using my approach  ;-)
   
   Besides, I think you already put futurize in all appropriate places in 
tox.ini


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118488)
Time Spent: 11.5h  (was: 11h 20m)

> Fix Python 3 cmp function
> -
>
> Key: BEAM-3761
> URL: https://issues.apache.org/jira/browse/BEAM-3761
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: holdenk
>Priority: Major
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> Various functions don't exist in Python 3 that did in python 2. This Jira is 
> to fix the use of cmp (which often will involve rewriting __cmp__ as well).
>  
> Note: there are existing PRs for basestring and unicode ( 
> [https://github.com/apache/beam/pull/4697|https://github.com/apache/beam/pull/4697,]
>  , [https://github.com/apache/beam/pull/4730] )
>  
> Note once all of the missing names/functions are fixed we can enable F821 in 
> falke8 python 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3761) Fix Python 3 cmp function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3761?focusedWorklogId=118487=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118487
 ]

ASF GitHub Bot logged work on BEAM-3761:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:34
Start Date: 02/Jul/18 23:34
Worklog Time Spent: 10m 
  Work Description: cclauss commented on issue #5843: [BEAM-3761] Define 
cmp() in Python 3
URL: https://github.com/apache/beam/pull/5843#issuecomment-401968502
 
 
   There are no Jenkins build issues using my approach  ;-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118487)
Time Spent: 11h 20m  (was: 11h 10m)

> Fix Python 3 cmp function
> -
>
> Key: BEAM-3761
> URL: https://issues.apache.org/jira/browse/BEAM-3761
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: holdenk
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Various functions don't exist in Python 3 that did in python 2. This Jira is 
> to fix the use of cmp (which often will involve rewriting __cmp__ as well).
>  
> Note: there are existing PRs for basestring and unicode ( 
> [https://github.com/apache/beam/pull/4697|https://github.com/apache/beam/pull/4697,]
>  , [https://github.com/apache/beam/pull/4730] )
>  
> Note once all of the missing names/functions are fixed we can enable F821 in 
> falke8 python 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5856: Remove aliased tables in Nexmark SQL query 5

2018-07-02 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit f53bc0997c47708955a8f7ce0ac342294e9039ac
Merge: b6d4eeb 4c3e288
Author: Kenn Knowles 
AuthorDate: Mon Jul 2 16:28:28 2018 -0700

Merge pull request #5856: Remove aliased tables in Nexmark SQL query 5

 .../beam/sdk/nexmark/queries/sql/SqlQuery5.java| 49 +-
 1 file changed, 30 insertions(+), 19 deletions(-)



[beam] branch master updated (b6d4eeb -> f53bc09)

2018-07-02 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from b6d4eeb  Merge pull request #5864: [BEAM-4547] Instantiate $SUM0 as a 
SUM operator
 add cd2e72c  Remove aliased tables in Nexmark SQL query 5
 add 4c3e288  fixup! Remove aliased tables in Nexmark SQL query 5
 new f53bc09  Merge pull request #5856: Remove aliased tables in Nexmark 
SQL query 5

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../beam/sdk/nexmark/queries/sql/SqlQuery5.java| 49 +-
 1 file changed, 30 insertions(+), 19 deletions(-)



[jira] [Assigned] (BEAM-4288) SplittableDoFn: splitAtFraction() API for Python

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reassigned BEAM-4288:
--

Assignee: Chamikara Jayalath  (was: Eugene Kirpichov)

> SplittableDoFn: splitAtFraction() API for Python
> 
>
> Key: BEAM-4288
> URL: https://issues.apache.org/jira/browse/BEAM-4288
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Chamikara Jayalath
>Priority: Major
>
> SDF currently only has a checkpoint() API. This Jira is about adding the 
> splitAtFraction() API and its support in runners that support the respective 
> feature for sources.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3761) Fix Python 3 cmp function

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3761?focusedWorklogId=118486=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118486
 ]

ASF GitHub Bot logged work on BEAM-3761:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:25
Start Date: 02/Jul/18 23:25
Worklog Time Spent: 10m 
  Work Description: holdenk commented on issue #5843: [BEAM-3761] Define 
cmp() in Python 3
URL: https://github.com/apache/beam/pull/5843#issuecomment-401966969
 
 
   No, I'm not suggesting you change your approach I'm talking about addressing 
the jenkins build issues?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118486)
Time Spent: 11h 10m  (was: 11h)

> Fix Python 3 cmp function
> -
>
> Key: BEAM-3761
> URL: https://issues.apache.org/jira/browse/BEAM-3761
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: holdenk
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Various functions don't exist in Python 3 that did in python 2. This Jira is 
> to fix the use of cmp (which often will involve rewriting __cmp__ as well).
>  
> Note: there are existing PRs for basestring and unicode ( 
> [https://github.com/apache/beam/pull/4697|https://github.com/apache/beam/pull/4697,]
>  , [https://github.com/apache/beam/pull/4730] )
>  
> Note once all of the missing names/functions are fixed we can enable F821 in 
> falke8 python 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4204) Python: PortableRunner - p.run() via given JobService

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reassigned BEAM-4204:
--

Assignee: Ankur Goenka  (was: Eugene Kirpichov)

> Python: PortableRunner - p.run() via given JobService
> -
>
> Key: BEAM-4204
> URL: https://issues.apache.org/jira/browse/BEAM-4204
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Like BEAM-4071 but for Python. Is this fully encompassed by 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/universal_local_runner.py]
>  ? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-4204) Python: PortableRunner - p.run() via given JobService

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-4204.
--
   Resolution: Fixed
Fix Version/s: (was: Not applicable)
   2.5.0

> Python: PortableRunner - p.run() via given JobService
> -
>
> Key: BEAM-4204
> URL: https://issues.apache.org/jira/browse/BEAM-4204
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Like BEAM-4071 but for Python. Is this fully encompassed by 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/universal_local_runner.py]
>  ? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-3268) getPerDestinationOutputFilenames() is getting processed before write is finished on dataflow runner

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-3268.
--
   Resolution: Fixed
Fix Version/s: 2.5.0

> getPerDestinationOutputFilenames() is getting processed before write is 
> finished on dataflow runner
> ---
>
> Key: BEAM-3268
> URL: https://issues.apache.org/jira/browse/BEAM-3268
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.3.0
>Reporter: Kamil Szewczyk
>Assignee: Eugene Kirpichov
>Priority: Major
> Fix For: 2.5.0
>
> Attachments: comparison.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> While running filebased-io-test we found dataflow-runnner misbehaving. We run 
> tests using single pipeline and without using Reshuffling between writing and 
> reading dataflow jobs are unsuccessful because the runner tries to access the 
> files that were not created yet. 
> On the picture the difference between execution of writting is presented. On 
> the left there is working example with Reshuffling added and on the right 
> without it.
> !comparison.png|thumbnail!
> Steps to reproduce: substitute your-bucket-name wit your valid bucket.
> {code:java}
> mvn -e -Pio-it verify -pl sdks/java/io/file-based-io-tests 
> -DintegrationTestPipelineOptions='["--runner=dataflow", 
> "--filenamePrefix=gs://your-bucket-name/TEXTIO_IT"]' -Pdataflow-runner
> {code}
> Then look on the cloud console and job should fail.
> Now add Reshuffling to 
> sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
>  as in the example.
> {code:java}
> .getPerDestinationOutputFilenames().apply(Values.create())
> .apply(Reshuffle.viaRandomKey());
> PCollection consolidatedHashcode = testFilenames
> {code}
> and trigger previously used maven command to see it working in the console 
> right now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-4166) FnApiDoFnRunner doesn't invoke setup/teardown

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-4166.
--
   Resolution: Fixed
Fix Version/s: 2.5.0

> FnApiDoFnRunner doesn't invoke setup/teardown
> -
>
> Key: BEAM-4166
> URL: https://issues.apache.org/jira/browse/BEAM-4166
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> And we apparently lack test coverage for that - one would think that 
> ValidatesRunner tests would check lifecycle.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4064) ClassCastExeption when reading Avro files using specific records with org.apache.avro.util.Utf8 fields

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reassigned BEAM-4064:
--

Assignee: Chamikara Jayalath  (was: Eugene Kirpichov)

> ClassCastExeption when reading Avro files using specific records with 
> org.apache.avro.util.Utf8 fields
> --
>
> Key: BEAM-4064
> URL: https://issues.apache.org/jira/browse/BEAM-4064
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-avro
>Affects Versions: 2.4.0
>Reporter: Przemyslaw Dubaniewicz
>Assignee: Chamikara Jayalath
>Priority: Major
>
> Reading Avro files using Avro-generated classes with 
> org.apache.avro.util.Utf8 fields fails with an exception:
> {code:java}
> Exception in thread "main" java.lang.ClassCastException: java.lang.String 
> cannot be cast to org.apache.avro.util.Utf8
>     at com.example.avro.AvroRecord.put(AvroRecord.java:129)
>     at org.apache.avro.generic.GenericData.setField(GenericData.java:690)
>     at org.apache.avro.reflect.ReflectData.setField(ReflectData.java:135)
>     at org.apache.avro.reflect.ReflectData.setField(ReflectData.java:128)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:119)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:310)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
>     at 
> org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:577)
>     at 
> org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:223)
>     at 
> org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473)
>     at 
> org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:468)
>     at 
> org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:261)
>     at 
> org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:141)
>     at 
> org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:161)
>     at 
> org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:125)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4722) Dataflow post-commits failing due to insufficient INSTANCE_TEMPLATES quota

2018-07-02 Thread Rafael Fernandez (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rafael Fernandez resolved BEAM-4722.

   Resolution: Fixed
Fix Version/s: Not applicable

3x'd instance templates.

> Dataflow post-commits failing due to insufficient INSTANCE_TEMPLATES quota
> --
>
> Key: BEAM-4722
> URL: https://issues.apache.org/jira/browse/BEAM-4722
> Project: Beam
>  Issue Type: Improvement
>  Components: gcp-quota, runner-dataflow, testing
>Reporter: Scott Wegner
>Assignee: Rafael Fernandez
>Priority: Major
> Fix For: Not applicable
>
>
> See: https://github.com/apache/beam/pull/5861
> We recently increased the parallelism of Dataflow ValidatesRunner tests. 
> However, when we ran 3 concurrent builds we saw them all fail with 
> insufficient INSTANCE_TEMPLATES quota errors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2810) Consider a faster Avro library in Python

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2810?focusedWorklogId=118485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118485
 ]

ASF GitHub Bot logged work on BEAM-2810:


Author: ASF GitHub Bot
Created on: 02/Jul/18 23:23
Start Date: 02/Jul/18 23:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5496: [BEAM-2810] 
use fastavro in Avro IO
URL: https://github.com/apache/beam/pull/5496#issuecomment-401966713
 
 
   This change fails the isort lint test:
   
   ```
   Running isort for module apache_beam  gen_protos.py  setup.py  
test_config.py:
   ERROR: /home/git/beam/sdks/python/apache_beam/io/avroio.py Imports are 
incorrectly sorted.
   --- /home/git/beam/sdks/python/apache_beam/io/avroio.py:before   
2018-07-02 12:57:38.472207
   +++ /home/git/beam/sdks/python/apache_beam/io/avroio.py:after
2018-07-02 16:20:40.310009
   @@ -51,8 +51,6 @@
from avro import io as avroio
from avro import datafile
from avro import schema
   -from fastavro.read import block_reader
   -from fastavro.write import Writer

import apache_beam as beam
from apache_beam.io import filebasedsink
   @@ -61,6 +59,8 @@
from apache_beam.io.filesystem import CompressionTypes
from apache_beam.io.iobase import Read
from apache_beam.transforms import PTransform
   +from fastavro.read import block_reader
   +from fastavro.write import Writer

__all__ = ['ReadFromAvro', 'ReadAllFromAvro', 'WriteToAvro']

   Command exited with non-zero status 1
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 118485)
Time Spent: 5h 40m  (was: 5.5h)

> Consider a faster Avro library in Python
> 
>
> Key: BEAM-2810
> URL: https://issues.apache.org/jira/browse/BEAM-2810
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Ryan Williams
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> https://stackoverflow.com/questions/45870789/bottleneck-on-data-source
> Seems like this job is reading Avro files (exported by BigQuery) at about 2 
> MB/s.
> We use the standard Python "avro" library which is apparently known to be 
> very slow (10x+ slower than Java) 
> http://apache-avro.679487.n3.nabble.com/Avro-decode-very-slow-in-Python-td4034422.html,
>  and there are alternatives e.g. https://pypi.python.org/pypi/fastavro/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3874) Switch AvroIO sink default codec to Snappy

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reassigned BEAM-3874:
--

Assignee: (was: Eugene Kirpichov)

> Switch AvroIO sink default codec to Snappy
> --
>
> Key: BEAM-3874
> URL: https://issues.apache.org/jira/browse/BEAM-3874
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-avro
>Reporter: Marian Dvorsky
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> AvroIO currently uses CodecFactory.deflateCodec(6) as the default codec for 
> writes.
> That compresses well, but is quite expensive.
> Snappy codec offers sparser, but much faster compression, and is typically a 
> better CPU/storage tradeoff except for very long lived files. 
> We should consider switching the default to Snappy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3945) TFRecord Performance Tests doesn't work on hdfs

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reassigned BEAM-3945:
--

Assignee: Udi Meiri  (was: Eugene Kirpichov)

> TFRecord Performance Tests doesn't work on hdfs
> ---
>
> Key: BEAM-3945
> URL: https://issues.apache.org/jira/browse/BEAM-3945
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-text, testing
>Reporter: Kamil Szewczyk
>Assignee: Udi Meiri
>Priority: Minor
>
> TFRecord have issue reading files from hdfs using filename pattern 
> _"hdfs://...*"_
> {code:java}
> TFRecordIO.read().from(filenamePattern).withCompression(AUTO){code}
> [link to 
> github|https://github.com/apache/beam/blob/36257aba9054e664ebaafccfefb78bf54a162618/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/tfrecord/TFRecordIOIT.java#L113]
>  this is a blocker for running full set of filebased io tests on hdfs.
> Steps to reproduce:
>  1. Create remote hadoop environment. This step asume you have local kubectl 
> tool configured to use your GCP project.
> {code}
> pushd .test-infra/kubernetes/hadoop/SmallITCluster/ && /bin/bash 
> ./setup-all.sh && popd
> {code}
> 2. Update /etc/hosts file with the provided output from sctipt.
> 3. Confirm that it works and hadoop web interface is accessible on 
> {color:#FF}http://hadoop-x:50070{color} where x is added in step2 
> sequence from your /etc/hosts entry. Please also substitute x in further 
> usages of this.
>  4. Tell runner to use root as hadoop user.
> {code}
> export HADOOP_USER_NAME=root
> {code}
> 5. Run TFRecord tests on this environment using DirectRunner:
> {code}
> mvn -e -Pio-it verify -pl sdks/java/io/file-based-io-tests/ 
> -Dit.test=org.apache.beam.sdk.io.tfrecord.TFRecordIOIT -Dfilesystem=hdfs 
> -DintegrationTestPipelineOptions='["--filenamePrefix=hdfs://hadoop-x:9000/TFRecord",
>  "--hdfsConfiguration=[{\"fs.defaultFS\" : \"hdfs://hadoop-x:9000\", 
> \"dfs.replication\": 1, \"dfs.client.use.datanode.hostname\":\"true\"}]" ]' 
> -DforceDirectRunner=true 
> {code}
> The error message is:
> {code:bash}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 78.055 s <<< FAILURE! - in org.apache.beam.sdk.io.tfrecord.TFRecordIOIT
> [ERROR] writeThenReadAll(org.apache.beam.sdk.io.tfrecord.TFRecordIOIT)  Time 
> elapsed: 78.055 s  <<< ERROR!
> java.lang.IllegalStateException: Invalid data
> at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
> at 
> org.apache.beam.sdk.io.TFRecordIO$TFRecordCodec.read(TFRecordIO.java:642)
> at 
> org.apache.beam.sdk.io.TFRecordIO$TFRecordSource$TFRecordReader.readNextRecord(TFRecordIO.java:526)
> at 
> org.apache.beam.sdk.io.CompressedSource$CompressedReader.readNextRecord(CompressedSource.java:426)
> at 
> org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473)
> at 
> org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:267)
> at 
> org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:148)
> at 
> org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:161)
> at 
> org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:125)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> This results were also observed when running tests on jenkins. [Link to 
> jenkins 
> build|https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_TFRecordIOIT_HDFS/3/console]
> When you open http://hadoop-x:50070/explorer.html#/ you will see TFRecord 
> files that were created during write phase. Unable to be processed in reading 
> phase.
> {color:red}Important note{color}: if I copy files made by writing pipeline 
> from hdfs directory to local directory and run reading pipeline over them, 
> everything is working fine, so only reading from hdfs is a problem.
> You can wipe out hdfs environment by runnning:
> {code}
> pushd .test-infra/kubernetes/hadoop/SmallITCluster/ && /bin/bash 
> ./teardown-all.sh && popd
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-3907) Clarify how watermark is estimated for watchForNewFiles() transforms

2018-07-02 Thread Eugene Kirpichov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-3907.
--
   Resolution: Fixed
Fix Version/s: 2.6.0

This will not need clarification since I fixed a bug in watermark advancement 
in watchForNewFiles(). It will need clarification if somebody adds custom 
timestamps/watermarks.

> Clarify how watermark is estimated for watchForNewFiles() transforms
> 
>
> Key: BEAM-3907
> URL: https://issues.apache.org/jira/browse/BEAM-3907
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Chamikara Jayalath
>Assignee: Eugene Kirpichov
>Priority: Major
> Fix For: 2.6.0
>
>
> For example 
> [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L488]
> It's not clear how the watermark will be estimated/incremented when using 
> these transforms. 
> Other source implementations seems to be describing this. For example: 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L89



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   3   4   >