[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121228=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121228
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 05:43
Start Date: 10/Jul/18 05:43
Worklog Time Spent: 10m 
  Work Description: cclauss commented on issue #5869: [BEAM-1251] Replace 
NameError-driven dispatch with ``past``
URL: https://github.com/apache/beam/pull/5869#issuecomment-403707090
 
 
   Please resolve conflict.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121228)
Time Spent: 17.5h  (was: 17h 20m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4749) fastavro breaks macos tests

2018-07-09 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-4749:
-

 Summary: fastavro breaks macos tests
 Key: BEAM-4749
 URL: https://issues.apache.org/jira/browse/BEAM-4749
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Ahmet Altay
Assignee: Chamikara Jayalath


Recent addition of the fastavro dependency is breaking python linter in macos. 
At least for some cases, because it requires a compiler.

Could we optionally depend on fastavro, and fallback to regular avro package?

 

Log:

*> Task :beam-sdks-python:lintPy27*

ERROR: invocation failed (exit code 1), logfile: 
/Users/relax/beam-gradle/beam/sdks/python/target/.tox/py27-lint/log/py27-lint-2.log

ERROR: actionid: py27-lint

msg: installpkg

cmdargs: 
['/Users/relax/beam-gradle/beam/sdks/python/target/.tox/py27-lint/bin/python', 
'/Users/relax/beam-gradle/beam/sdks/python/target/.tox/py27-lint/bin/pip', 
'install', 
'/Users/relax/beam-gradle/beam/sdks/python/target/.tox/dist/apache-beam-2.6.0.dev0.zip[test]']

 

Processing ./target/.tox/dist/apache-beam-2.6.0.dev0.zip

Collecting avro<2.0.0,>=1.8.1 (from apache-beam==2.6.0.dev0)

Collecting crcmod<2.0,>=1.7 (from apache-beam==2.6.0.dev0)

Collecting dill==0.2.6 (from apache-beam==2.6.0.dev0)

Collecting fastavro==0.19.7 (from apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/a7/0a/b08ba5cef63c675e8442c2bf1cbcef90c8b9f824be2202d492f0cedb0913/fastavro-0.19.7.tar.gz]

Collecting grpcio<2,>=1.8 (from apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/66/89/4a90caabd51c17686cbb48a9bbe8c592c4be929c0d2542d2ffde76b0d671/grpcio-1.13.0-cp27-cp27m-macosx_10_12_x86_64.whl]

Collecting hdfs<3.0.0,>=2.1.0 (from apache-beam==2.6.0.dev0)

Collecting httplib2<=0.11.3,>=0.8 (from apache-beam==2.6.0.dev0)

Collecting mock<3.0.0,>=1.0.1 (from apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/e6/35/f187bdf23be87092bd0f1200d43d23076cee4d0dec109f195173fd3ebc79/mock-2.0.0-py2.py3-none-any.whl]

Collecting oauth2client<5,>=2.0.1 (from apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/82/d8/3eab58811282ac7271a081ba5c0d4b875ce786ca68ce43e2a62ade32e9a8/oauth2client-4.1.2-py2.py3-none-any.whl]

Collecting protobuf<4,>=3.5.0.post1 (from apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/4f/56/a21f2d077ceae7fd521c0ed31fb8bb1c7f13ffbb09bf7dd27de6cf6bad08/protobuf-3.6.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl]

Collecting pytz<=2018.4,>=2018.3 (from apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/dc/83/15f7833b70d3e067ca91467ca245bae0f6fe56ddc7451aa0dc5606b120f2/pytz-2018.4-py2.py3-none-any.whl]

Collecting pyyaml<4.0.0,>=3.12 (from apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/4a/85/db5a2df477072b2902b0eb892feb37d88ac635d36245a72a6a69b23b383a/PyYAML-3.12.tar.gz]

Collecting pyvcf<0.7.0,>=0.6.8 (from apache-beam==2.6.0.dev0)

Requirement already satisfied: six<1.12,>=1.9 in 
./target/.tox/py27-lint/lib/python2.7/site-packages (from 
apache-beam==2.6.0.dev0) (1.11.0)

Collecting typing<3.7.0,>=3.6.0 (from apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/0d/4d/4e5985d075d241d686a1663fa1f88b61d544658d08c1375c7c6aac32afc3/typing-3.6.4-py2-none-any.whl]

Collecting futures<4.0.0,>=3.1.1 (from apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/2d/99/b2c4e9d5a30f6471e410a146232b4118e697fa3ffc06d6a65efde84debd0/futures-3.2.0-py2-none-any.whl]

Requirement already satisfied: future<1.0.0,>=0.16.0 in 
./target/.tox/py27-lint/lib/python2.7/site-packages (from 
apache-beam==2.6.0.dev0) (0.16.0)

Collecting nose>=1.3.7 (from apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/99/4f/13fb671119e65c4dce97c60e67d3fd9e6f7f809f2b307e2611f4701205cb/nose-1.3.7-py2-none-any.whl]

Collecting pyhamcrest<2.0,>=1.9 (from apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/9a/d5/d37fd731b7d0e91afcc84577edeccf4638b4f9b82f5ffe2f8b62e2ddc609/PyHamcrest-1.9.0-py2.py3-none-any.whl]

Requirement already satisfied: setuptools>=18.0 in 
./target/.tox/py27-lint/lib/python2.7/site-packages (from 
fastavro==0.19.7->apache-beam==2.6.0.dev0) (39.2.0)

Requirement already satisfied: enum34>=1.0.4 in 
./target/.tox/py27-lint/lib/python2.7/site-packages (from 
grpcio<2,>=1.8->apache-beam==2.6.0.dev0) (1.1.6)

Collecting docopt (from hdfs<3.0.0,>=2.1.0->apache-beam==2.6.0.dev0)

Collecting requests>=2.7.0 (from hdfs<3.0.0,>=2.1.0->apache-beam==2.6.0.dev0)

  Using cached 
[https://files.pythonhosted.org/packages/65/47/7e02164a2a3db50ed6d8a6ab1d6d60b69c4c3fdf57a284257925dfc12bda/requests-2.19.1-py2.py3-none-any.whl]

Collecting 

[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121205=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121205
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:31
Start Date: 10/Jul/18 04:31
Worklog Time Spent: 10m 
  Work Description: aaltay edited a comment on issue #5869: [BEAM-1251] 
Replace NameError-driven dispatch with ``past``
URL: https://github.com/apache/beam/pull/5869#issuecomment-403695936
 
 
   Thank you @superbobry.
   
   Could you coordinate your changes with rest of the people working on python 
3 changes on the mailing list. There was a recent reviewed proposal for python 
3 conversion and some of the bits you are removing were added as part of it. 
You can find the proposal on the Beam web site: 
https://beam.apache.org/contribute/#python-3-support
   
   @tvalentyn could also help with coordination.
   
   (I noticed that you are already started coordinating on the mailing list. 
That is great, please share major planned changes, such as the use of `past` in 
the mailing list.)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121205)
Time Spent: 17h 20m  (was: 17h 10m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121204=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121204
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:23
Start Date: 10/Jul/18 04:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #5869: [BEAM-1251] Replace 
NameError-driven dispatch with ``past``
URL: https://github.com/apache/beam/pull/5869#issuecomment-403695936
 
 
   Thank you @superbobry.
   
   Could you coordinate your changes with rest of the people working on python 
3 changes on the mailing list. There was a recent reviewed proposal for python 
3 conversion and some of the bits you are removing were added as part of it. 
You can find the proposal on the Beam web site: 
https://beam.apache.org/contribute/#python-3-support
   
   @tvalentyn could also help with coordination.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121204)
Time Spent: 17h 10m  (was: 17h)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121203
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:19
Start Date: 10/Jul/18 04:19
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5887: [BEAM-1251] 
Upgrade from buffer to memoryview (again)
URL: https://github.com/apache/beam/pull/5887#issuecomment-403695410
 
 
   @angoenka @cclauss The postcommit issue was fixed and merged in 
https://github.com/apache/beam/pull/5908.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121203)
Time Spent: 17h  (was: 16h 50m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4391) Example of distributed optimization

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=121198=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121198
 ]

ASF GitHub Bot logged work on BEAM-4391:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:10
Start Date: 10/Jul/18 04:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5736: 
[BEAM-4391] Example of distributed optimization
URL: https://github.com/apache/beam/pull/5736#discussion_r201211626
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/complete/distribopt/distribopt/distribopt.py
 ##
 @@ -0,0 +1,349 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Example illustrating the use of Apache Beam for distributing optimization 
tasks.
+Running this example requires NumPy and SciPy
+"""
+import argparse
+import logging
+import string
+import uuid
+from collections import defaultdict
+
+import numpy as np
+from scipy.optimize import minimize
+
+import apache_beam as beam
+from apache_beam import pvalue
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+
+
+class Simulator(object):
+  """
+  Greenhouse simulation
+
+  Disclaimer: this code is an example and does not correspond to any real 
greenhouse simulation.
+  """
+
+  def __init__(self, quantities):
+super(Simulator, self).__init__()
+self.quantities = np.atleast_1d(quantities)
+
+self.A = np.array([[3.0, 10, 30],
+   [0.1, 10, 35],
+   [3.0, 10, 30],
+   [0.1, 10, 35]])
+
+self.P = 1e-4 * np.array([[3689, 1170, 2673],
+  [4699, 4387, 7470],
+  [1091, 8732, 5547],
+  [381, 5743, 8828]])
+
+a0 = np.array([1.0, 1.2, 3.0, 3.2])
+coeff = np.sum(np.cos(np.dot(np.atleast_2d(a0).T, self.quantities[None, 
:])), axis=1)
+self.alpha = coeff / np.sum(coeff)
+
+  def simulate(self, xc):
+# Map the input parameter to a cost for each crop.
+f = -np.sum(self.alpha * np.exp(-np.sum(self.A * np.square(xc - self.P), 
axis=1)))
+return np.square(f) * np.log(self.quantities)
+
+
+class CreateGrid(beam.PTransform):
+  """
+  A transform for generating the mapping grid.
 
 Review comment:
   This could be a single line pydoc.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121198)
Time Spent: 2h  (was: 1h 50m)

> Example of distributed optimization
> ---
>
> Key: BEAM-4391
> URL: https://issues.apache.org/jira/browse/BEAM-4391
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-python
>Reporter: Joachim van der Herten
>Assignee: Joachim van der Herten
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently, we are writing a blogpost on using the Beam Python SDK for solving 
> distributed optimization tasks. It will include an example of a optimization 
> problem with both discrete and continuous parameters, which is then solved 
> using Apache Beam. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4391) Example of distributed optimization

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=121192=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121192
 ]

ASF GitHub Bot logged work on BEAM-4391:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:10
Start Date: 10/Jul/18 04:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5736: 
[BEAM-4391] Example of distributed optimization
URL: https://github.com/apache/beam/pull/5736#discussion_r201211562
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/complete/distribopt/distribopt/distribopt.py
 ##
 @@ -0,0 +1,349 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Example illustrating the use of Apache Beam for distributing optimization 
tasks.
+Running this example requires NumPy and SciPy
 
 Review comment:
   Could you explain more about the example? What concepts are being 
illustrated? How to run the example?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121192)
Time Spent: 1h 20m  (was: 1h 10m)

> Example of distributed optimization
> ---
>
> Key: BEAM-4391
> URL: https://issues.apache.org/jira/browse/BEAM-4391
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-python
>Reporter: Joachim van der Herten
>Assignee: Joachim van der Herten
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we are writing a blogpost on using the Beam Python SDK for solving 
> distributed optimization tasks. It will include an example of a optimization 
> problem with both discrete and continuous parameters, which is then solved 
> using Apache Beam. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4391) Example of distributed optimization

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=121200=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121200
 ]

ASF GitHub Bot logged work on BEAM-4391:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:10
Start Date: 10/Jul/18 04:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5736: 
[BEAM-4391] Example of distributed optimization
URL: https://github.com/apache/beam/pull/5736#discussion_r201212061
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/complete/distribopt/distribopt/distribopt.py
 ##
 @@ -0,0 +1,349 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Example illustrating the use of Apache Beam for distributing optimization 
tasks.
+Running this example requires NumPy and SciPy
+"""
+import argparse
+import logging
+import string
+import uuid
+from collections import defaultdict
+
+import numpy as np
+from scipy.optimize import minimize
+
+import apache_beam as beam
+from apache_beam import pvalue
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+
+
+class Simulator(object):
+  """
+  Greenhouse simulation
+
+  Disclaimer: this code is an example and does not correspond to any real 
greenhouse simulation.
+  """
+
+  def __init__(self, quantities):
+super(Simulator, self).__init__()
+self.quantities = np.atleast_1d(quantities)
+
+self.A = np.array([[3.0, 10, 30],
+   [0.1, 10, 35],
+   [3.0, 10, 30],
+   [0.1, 10, 35]])
+
+self.P = 1e-4 * np.array([[3689, 1170, 2673],
+  [4699, 4387, 7470],
+  [1091, 8732, 5547],
+  [381, 5743, 8828]])
+
+a0 = np.array([1.0, 1.2, 3.0, 3.2])
+coeff = np.sum(np.cos(np.dot(np.atleast_2d(a0).T, self.quantities[None, 
:])), axis=1)
+self.alpha = coeff / np.sum(coeff)
+
+  def simulate(self, xc):
+# Map the input parameter to a cost for each crop.
+f = -np.sum(self.alpha * np.exp(-np.sum(self.A * np.square(xc - self.P), 
axis=1)))
+return np.square(f) * np.log(self.quantities)
+
+
+class CreateGrid(beam.PTransform):
+  """
+  A transform for generating the mapping grid.
+  """
+
+  class PreGenerateMappings(beam.DoFn):
+"""
+ParDo implementation which splits of 2 records and generated a sub grid.
+
+This facilitates parallellization of the grid generation.
+Emits both the PCollection representing the subgrid, as well as the list
+of remaining records. Both serve as an input to GenerateMappings
+"""
+
+def process(self, element):
+  records = list(element[1])
+  # Split of 2 crops and pre-generate all combinations to facilitate 
parallellism
+  # No point splitting of a crop which can only be created in 1 greenhouse,
+  # split of crops with highest number of options.
+  best_split = np.argsort([-len(rec['transport_costs']) for rec in 
records])[:2]
+  rec1 = records[best_split[0]]
+  rec2 = records[best_split[1]]
+
+  # Generate & emit all combinations
+  for a in rec1['transport_costs']:
+if a[1]:
+  for b in rec2['transport_costs']:
+if b[1]:
+  combination = [(rec1['crop'], a[0]), (rec2['crop'], b[0])]
+  yield pvalue.TaggedOutput('splitted', combination)
+
+  # Pass on remaining records
+  remaining = [rec for i, rec in enumerate(records) if i not in best_split]
+  yield pvalue.TaggedOutput('combine', remaining)
+
+  class GenerateMappings(beam.DoFn):
+"""
+ParDo implementation to generate all possible assignments of crops to 
greenhouses.
+
+Input: dict with crop, quantity and transport_costs keys, e.g.,
+{
+'crop': 'OP009',
+'quantity': 102,
+'transport_costs': [('A', None), ('B', 3), ('C', 8)]
+}
+
+Output: tuple (mapping_identifier, {crop -> greenhouse})
+"""
+
+@staticmethod
+def _coordinates_to_greenhouse(coordinates, greenhouses, crops):
+  # Map the grid 

[jira] [Work logged] (BEAM-4391) Example of distributed optimization

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=121197=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121197
 ]

ASF GitHub Bot logged work on BEAM-4391:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:10
Start Date: 10/Jul/18 04:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5736: 
[BEAM-4391] Example of distributed optimization
URL: https://github.com/apache/beam/pull/5736#discussion_r201212282
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/complete/distribopt/distribopt/distribopt.py
 ##
 @@ -0,0 +1,349 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Example illustrating the use of Apache Beam for distributing optimization 
tasks.
+Running this example requires NumPy and SciPy
+"""
+import argparse
+import logging
+import string
+import uuid
+from collections import defaultdict
+
+import numpy as np
+from scipy.optimize import minimize
+
+import apache_beam as beam
+from apache_beam import pvalue
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+
+
+class Simulator(object):
+  """
+  Greenhouse simulation
+
+  Disclaimer: this code is an example and does not correspond to any real 
greenhouse simulation.
+  """
+
+  def __init__(self, quantities):
+super(Simulator, self).__init__()
+self.quantities = np.atleast_1d(quantities)
+
+self.A = np.array([[3.0, 10, 30],
+   [0.1, 10, 35],
+   [3.0, 10, 30],
+   [0.1, 10, 35]])
+
+self.P = 1e-4 * np.array([[3689, 1170, 2673],
+  [4699, 4387, 7470],
+  [1091, 8732, 5547],
+  [381, 5743, 8828]])
+
+a0 = np.array([1.0, 1.2, 3.0, 3.2])
+coeff = np.sum(np.cos(np.dot(np.atleast_2d(a0).T, self.quantities[None, 
:])), axis=1)
+self.alpha = coeff / np.sum(coeff)
+
+  def simulate(self, xc):
+# Map the input parameter to a cost for each crop.
+f = -np.sum(self.alpha * np.exp(-np.sum(self.A * np.square(xc - self.P), 
axis=1)))
+return np.square(f) * np.log(self.quantities)
+
+
+class CreateGrid(beam.PTransform):
+  """
+  A transform for generating the mapping grid.
+  """
+
+  class PreGenerateMappings(beam.DoFn):
+"""
+ParDo implementation which splits of 2 records and generated a sub grid.
+
+This facilitates parallellization of the grid generation.
+Emits both the PCollection representing the subgrid, as well as the list
+of remaining records. Both serve as an input to GenerateMappings
+"""
+
+def process(self, element):
+  records = list(element[1])
+  # Split of 2 crops and pre-generate all combinations to facilitate 
parallellism
+  # No point splitting of a crop which can only be created in 1 greenhouse,
+  # split of crops with highest number of options.
+  best_split = np.argsort([-len(rec['transport_costs']) for rec in 
records])[:2]
+  rec1 = records[best_split[0]]
+  rec2 = records[best_split[1]]
+
+  # Generate & emit all combinations
+  for a in rec1['transport_costs']:
+if a[1]:
+  for b in rec2['transport_costs']:
+if b[1]:
+  combination = [(rec1['crop'], a[0]), (rec2['crop'], b[0])]
+  yield pvalue.TaggedOutput('splitted', combination)
+
+  # Pass on remaining records
+  remaining = [rec for i, rec in enumerate(records) if i not in best_split]
+  yield pvalue.TaggedOutput('combine', remaining)
+
+  class GenerateMappings(beam.DoFn):
+"""
+ParDo implementation to generate all possible assignments of crops to 
greenhouses.
+
+Input: dict with crop, quantity and transport_costs keys, e.g.,
+{
+'crop': 'OP009',
+'quantity': 102,
+'transport_costs': [('A', None), ('B', 3), ('C', 8)]
+}
+
+Output: tuple (mapping_identifier, {crop -> greenhouse})
+"""
+
+@staticmethod
+def _coordinates_to_greenhouse(coordinates, greenhouses, crops):
+  # Map the grid 

[jira] [Work logged] (BEAM-4391) Example of distributed optimization

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=121194=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121194
 ]

ASF GitHub Bot logged work on BEAM-4391:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:10
Start Date: 10/Jul/18 04:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5736: 
[BEAM-4391] Example of distributed optimization
URL: https://github.com/apache/beam/pull/5736#discussion_r201212407
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/complete/distribopt/distribopt/distribopt.py
 ##
 @@ -0,0 +1,349 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Example illustrating the use of Apache Beam for distributing optimization 
tasks.
+Running this example requires NumPy and SciPy
+"""
+import argparse
+import logging
+import string
+import uuid
+from collections import defaultdict
+
+import numpy as np
+from scipy.optimize import minimize
+
+import apache_beam as beam
+from apache_beam import pvalue
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+
+
+class Simulator(object):
+  """
+  Greenhouse simulation
+
+  Disclaimer: this code is an example and does not correspond to any real 
greenhouse simulation.
+  """
+
+  def __init__(self, quantities):
+super(Simulator, self).__init__()
+self.quantities = np.atleast_1d(quantities)
+
+self.A = np.array([[3.0, 10, 30],
+   [0.1, 10, 35],
+   [3.0, 10, 30],
+   [0.1, 10, 35]])
+
+self.P = 1e-4 * np.array([[3689, 1170, 2673],
+  [4699, 4387, 7470],
+  [1091, 8732, 5547],
+  [381, 5743, 8828]])
+
+a0 = np.array([1.0, 1.2, 3.0, 3.2])
+coeff = np.sum(np.cos(np.dot(np.atleast_2d(a0).T, self.quantities[None, 
:])), axis=1)
+self.alpha = coeff / np.sum(coeff)
+
+  def simulate(self, xc):
+# Map the input parameter to a cost for each crop.
+f = -np.sum(self.alpha * np.exp(-np.sum(self.A * np.square(xc - self.P), 
axis=1)))
+return np.square(f) * np.log(self.quantities)
+
+
+class CreateGrid(beam.PTransform):
+  """
+  A transform for generating the mapping grid.
+  """
+
+  class PreGenerateMappings(beam.DoFn):
+"""
+ParDo implementation which splits of 2 records and generated a sub grid.
+
+This facilitates parallellization of the grid generation.
+Emits both the PCollection representing the subgrid, as well as the list
+of remaining records. Both serve as an input to GenerateMappings
+"""
+
+def process(self, element):
+  records = list(element[1])
+  # Split of 2 crops and pre-generate all combinations to facilitate 
parallellism
+  # No point splitting of a crop which can only be created in 1 greenhouse,
+  # split of crops with highest number of options.
+  best_split = np.argsort([-len(rec['transport_costs']) for rec in 
records])[:2]
+  rec1 = records[best_split[0]]
+  rec2 = records[best_split[1]]
+
+  # Generate & emit all combinations
+  for a in rec1['transport_costs']:
+if a[1]:
+  for b in rec2['transport_costs']:
+if b[1]:
+  combination = [(rec1['crop'], a[0]), (rec2['crop'], b[0])]
+  yield pvalue.TaggedOutput('splitted', combination)
+
+  # Pass on remaining records
+  remaining = [rec for i, rec in enumerate(records) if i not in best_split]
+  yield pvalue.TaggedOutput('combine', remaining)
+
+  class GenerateMappings(beam.DoFn):
+"""
+ParDo implementation to generate all possible assignments of crops to 
greenhouses.
+
+Input: dict with crop, quantity and transport_costs keys, e.g.,
+{
+'crop': 'OP009',
+'quantity': 102,
+'transport_costs': [('A', None), ('B', 3), ('C', 8)]
+}
+
+Output: tuple (mapping_identifier, {crop -> greenhouse})
+"""
+
+@staticmethod
+def _coordinates_to_greenhouse(coordinates, greenhouses, crops):
+  # Map the grid 

[jira] [Work logged] (BEAM-4391) Example of distributed optimization

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=121191=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121191
 ]

ASF GitHub Bot logged work on BEAM-4391:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:10
Start Date: 10/Jul/18 04:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5736: 
[BEAM-4391] Example of distributed optimization
URL: https://github.com/apache/beam/pull/5736#discussion_r201211395
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/complete/distribopt/distribopt/distribopt.py
 ##
 @@ -0,0 +1,350 @@
+#
 
 Review comment:
   Both numpy and scipy are installed 
(https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies#version-250_1).
 As @davidcavazos suggest I will suggest simplifying and removing the setup.py 
if possible.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121191)
Time Spent: 1h 10m  (was: 1h)

> Example of distributed optimization
> ---
>
> Key: BEAM-4391
> URL: https://issues.apache.org/jira/browse/BEAM-4391
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-python
>Reporter: Joachim van der Herten
>Assignee: Joachim van der Herten
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, we are writing a blogpost on using the Beam Python SDK for solving 
> distributed optimization tasks. It will include an example of a optimization 
> problem with both discrete and continuous parameters, which is then solved 
> using Apache Beam. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4391) Example of distributed optimization

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=121193=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121193
 ]

ASF GitHub Bot logged work on BEAM-4391:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:10
Start Date: 10/Jul/18 04:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5736: 
[BEAM-4391] Example of distributed optimization
URL: https://github.com/apache/beam/pull/5736#discussion_r201211811
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/complete/distribopt/distribopt/distribopt.py
 ##
 @@ -0,0 +1,349 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Example illustrating the use of Apache Beam for distributing optimization 
tasks.
+Running this example requires NumPy and SciPy
+"""
+import argparse
+import logging
+import string
+import uuid
+from collections import defaultdict
+
+import numpy as np
+from scipy.optimize import minimize
+
+import apache_beam as beam
+from apache_beam import pvalue
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+
+
+class Simulator(object):
+  """
+  Greenhouse simulation
+
+  Disclaimer: this code is an example and does not correspond to any real 
greenhouse simulation.
+  """
+
+  def __init__(self, quantities):
+super(Simulator, self).__init__()
+self.quantities = np.atleast_1d(quantities)
+
+self.A = np.array([[3.0, 10, 30],
+   [0.1, 10, 35],
+   [3.0, 10, 30],
+   [0.1, 10, 35]])
+
+self.P = 1e-4 * np.array([[3689, 1170, 2673],
+  [4699, 4387, 7470],
+  [1091, 8732, 5547],
+  [381, 5743, 8828]])
+
+a0 = np.array([1.0, 1.2, 3.0, 3.2])
+coeff = np.sum(np.cos(np.dot(np.atleast_2d(a0).T, self.quantities[None, 
:])), axis=1)
+self.alpha = coeff / np.sum(coeff)
+
+  def simulate(self, xc):
+# Map the input parameter to a cost for each crop.
+f = -np.sum(self.alpha * np.exp(-np.sum(self.A * np.square(xc - self.P), 
axis=1)))
+return np.square(f) * np.log(self.quantities)
+
+
+class CreateGrid(beam.PTransform):
+  """
+  A transform for generating the mapping grid.
+  """
+
+  class PreGenerateMappings(beam.DoFn):
+"""
+ParDo implementation which splits of 2 records and generated a sub grid.
+
+This facilitates parallellization of the grid generation.
+Emits both the PCollection representing the subgrid, as well as the list
+of remaining records. Both serve as an input to GenerateMappings
+"""
+
+def process(self, element):
+  records = list(element[1])
+  # Split of 2 crops and pre-generate all combinations to facilitate 
parallellism
+  # No point splitting of a crop which can only be created in 1 greenhouse,
+  # split of crops with highest number of options.
+  best_split = np.argsort([-len(rec['transport_costs']) for rec in 
records])[:2]
+  rec1 = records[best_split[0]]
+  rec2 = records[best_split[1]]
+
+  # Generate & emit all combinations
+  for a in rec1['transport_costs']:
+if a[1]:
+  for b in rec2['transport_costs']:
+if b[1]:
+  combination = [(rec1['crop'], a[0]), (rec2['crop'], b[0])]
+  yield pvalue.TaggedOutput('splitted', combination)
+
+  # Pass on remaining records
+  remaining = [rec for i, rec in enumerate(records) if i not in best_split]
+  yield pvalue.TaggedOutput('combine', remaining)
+
+  class GenerateMappings(beam.DoFn):
+"""
+ParDo implementation to generate all possible assignments of crops to 
greenhouses.
 
 Review comment:
   You can start pydoc comments in the first line with """. (For examples see: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/tfidf.py#L52)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL 

[jira] [Work logged] (BEAM-4391) Example of distributed optimization

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=121199=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121199
 ]

ASF GitHub Bot logged work on BEAM-4391:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:10
Start Date: 10/Jul/18 04:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5736: 
[BEAM-4391] Example of distributed optimization
URL: https://github.com/apache/beam/pull/5736#discussion_r201211731
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/complete/distribopt/distribopt/distribopt.py
 ##
 @@ -0,0 +1,349 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Example illustrating the use of Apache Beam for distributing optimization 
tasks.
+Running this example requires NumPy and SciPy
+"""
+import argparse
+import logging
+import string
+import uuid
+from collections import defaultdict
+
+import numpy as np
+from scipy.optimize import minimize
+
+import apache_beam as beam
+from apache_beam import pvalue
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+
+
+class Simulator(object):
+  """
+  Greenhouse simulation
+
+  Disclaimer: this code is an example and does not correspond to any real 
greenhouse simulation.
+  """
+
+  def __init__(self, quantities):
+super(Simulator, self).__init__()
+self.quantities = np.atleast_1d(quantities)
+
+self.A = np.array([[3.0, 10, 30],
+   [0.1, 10, 35],
+   [3.0, 10, 30],
+   [0.1, 10, 35]])
+
+self.P = 1e-4 * np.array([[3689, 1170, 2673],
+  [4699, 4387, 7470],
+  [1091, 8732, 5547],
+  [381, 5743, 8828]])
+
+a0 = np.array([1.0, 1.2, 3.0, 3.2])
+coeff = np.sum(np.cos(np.dot(np.atleast_2d(a0).T, self.quantities[None, 
:])), axis=1)
+self.alpha = coeff / np.sum(coeff)
+
+  def simulate(self, xc):
+# Map the input parameter to a cost for each crop.
+f = -np.sum(self.alpha * np.exp(-np.sum(self.A * np.square(xc - self.P), 
axis=1)))
+return np.square(f) * np.log(self.quantities)
+
+
+class CreateGrid(beam.PTransform):
+  """
+  A transform for generating the mapping grid.
+  """
+
+  class PreGenerateMappings(beam.DoFn):
+"""
+ParDo implementation which splits of 2 records and generated a sub grid.
+
+This facilitates parallellization of the grid generation.
+Emits both the PCollection representing the subgrid, as well as the list
+of remaining records. Both serve as an input to GenerateMappings
+"""
+
+def process(self, element):
+  records = list(element[1])
+  # Split of 2 crops and pre-generate all combinations to facilitate 
parallellism
 
 Review comment:
   I do not understand this comment.
   
   Could you change comments to reflect both the nature of the example and also 
the Beam concepts it is showing.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121199)

> Example of distributed optimization
> ---
>
> Key: BEAM-4391
> URL: https://issues.apache.org/jira/browse/BEAM-4391
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-python
>Reporter: Joachim van der Herten
>Assignee: Joachim van der Herten
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently, we are writing a blogpost on using the Beam Python SDK for solving 
> distributed optimization tasks. It will include an example of a optimization 
> problem with both discrete and continuous parameters, which is then solved 
> using Apache Beam. 



--
This message 

[jira] [Work logged] (BEAM-4391) Example of distributed optimization

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=121195=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121195
 ]

ASF GitHub Bot logged work on BEAM-4391:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:10
Start Date: 10/Jul/18 04:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5736: 
[BEAM-4391] Example of distributed optimization
URL: https://github.com/apache/beam/pull/5736#discussion_r201211520
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/complete/distribopt/distribopt/distribopt.py
 ##
 @@ -0,0 +1,349 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Example illustrating the use of Apache Beam for distributing optimization 
tasks.
+Running this example requires NumPy and SciPy
+"""
+import argparse
+import logging
+import string
+import uuid
+from collections import defaultdict
+
+import numpy as np
+from scipy.optimize import minimize
+
+import apache_beam as beam
+from apache_beam import pvalue
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+
+
+class Simulator(object):
+  """
+  Greenhouse simulation
+
+  Disclaimer: this code is an example and does not correspond to any real 
greenhouse simulation.
 
 Review comment:
   There is no need for this disclaimer. It is clearly in an example directory.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121195)
Time Spent: 1h 50m  (was: 1h 40m)

> Example of distributed optimization
> ---
>
> Key: BEAM-4391
> URL: https://issues.apache.org/jira/browse/BEAM-4391
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-python
>Reporter: Joachim van der Herten
>Assignee: Joachim van der Herten
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently, we are writing a blogpost on using the Beam Python SDK for solving 
> distributed optimization tasks. It will include an example of a optimization 
> problem with both discrete and continuous parameters, which is then solved 
> using Apache Beam. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4391) Example of distributed optimization

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=121196=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121196
 ]

ASF GitHub Bot logged work on BEAM-4391:


Author: ASF GitHub Bot
Created on: 10/Jul/18 04:10
Start Date: 10/Jul/18 04:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5736: 
[BEAM-4391] Example of distributed optimization
URL: https://github.com/apache/beam/pull/5736#discussion_r201212249
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/complete/distribopt/distribopt/distribopt.py
 ##
 @@ -0,0 +1,349 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Example illustrating the use of Apache Beam for distributing optimization 
tasks.
+Running this example requires NumPy and SciPy
+"""
+import argparse
+import logging
+import string
+import uuid
+from collections import defaultdict
+
+import numpy as np
+from scipy.optimize import minimize
+
+import apache_beam as beam
+from apache_beam import pvalue
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+
+
+class Simulator(object):
+  """
+  Greenhouse simulation
+
+  Disclaimer: this code is an example and does not correspond to any real 
greenhouse simulation.
+  """
+
+  def __init__(self, quantities):
+super(Simulator, self).__init__()
+self.quantities = np.atleast_1d(quantities)
+
+self.A = np.array([[3.0, 10, 30],
+   [0.1, 10, 35],
+   [3.0, 10, 30],
+   [0.1, 10, 35]])
+
+self.P = 1e-4 * np.array([[3689, 1170, 2673],
+  [4699, 4387, 7470],
+  [1091, 8732, 5547],
+  [381, 5743, 8828]])
+
+a0 = np.array([1.0, 1.2, 3.0, 3.2])
+coeff = np.sum(np.cos(np.dot(np.atleast_2d(a0).T, self.quantities[None, 
:])), axis=1)
+self.alpha = coeff / np.sum(coeff)
+
+  def simulate(self, xc):
+# Map the input parameter to a cost for each crop.
+f = -np.sum(self.alpha * np.exp(-np.sum(self.A * np.square(xc - self.P), 
axis=1)))
+return np.square(f) * np.log(self.quantities)
+
+
+class CreateGrid(beam.PTransform):
+  """
+  A transform for generating the mapping grid.
+  """
+
+  class PreGenerateMappings(beam.DoFn):
+"""
+ParDo implementation which splits of 2 records and generated a sub grid.
+
+This facilitates parallellization of the grid generation.
+Emits both the PCollection representing the subgrid, as well as the list
+of remaining records. Both serve as an input to GenerateMappings
+"""
+
+def process(self, element):
+  records = list(element[1])
+  # Split of 2 crops and pre-generate all combinations to facilitate 
parallellism
+  # No point splitting of a crop which can only be created in 1 greenhouse,
+  # split of crops with highest number of options.
+  best_split = np.argsort([-len(rec['transport_costs']) for rec in 
records])[:2]
+  rec1 = records[best_split[0]]
+  rec2 = records[best_split[1]]
+
+  # Generate & emit all combinations
+  for a in rec1['transport_costs']:
+if a[1]:
+  for b in rec2['transport_costs']:
+if b[1]:
+  combination = [(rec1['crop'], a[0]), (rec2['crop'], b[0])]
+  yield pvalue.TaggedOutput('splitted', combination)
+
+  # Pass on remaining records
+  remaining = [rec for i, rec in enumerate(records) if i not in best_split]
+  yield pvalue.TaggedOutput('combine', remaining)
+
+  class GenerateMappings(beam.DoFn):
+"""
+ParDo implementation to generate all possible assignments of crops to 
greenhouses.
+
+Input: dict with crop, quantity and transport_costs keys, e.g.,
+{
+'crop': 'OP009',
+'quantity': 102,
+'transport_costs': [('A', None), ('B', 3), ('C', 8)]
+}
+
+Output: tuple (mapping_identifier, {crop -> greenhouse})
+"""
+
+@staticmethod
+def _coordinates_to_greenhouse(coordinates, greenhouses, crops):
+  # Map the grid 

[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121189=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121189
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 03:55
Start Date: 10/Jul/18 03:55
Worklog Time Spent: 10m 
  Work Description: charlesccychen closed pull request #5911: [BEAM-1251] 
Upgrade snappy and use a memoryview
URL: https://github.com/apache/beam/pull/5911
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/io/avroio.py 
b/sdks/python/apache_beam/io/avroio.py
index 9b86b58982b..f90dc3c6833 100644
--- a/sdks/python/apache_beam/io/avroio.py
+++ b/sdks/python/apache_beam/io/avroio.py
@@ -341,8 +341,8 @@ def _decompress_bytes(data, codec):
 
   # Compressed data includes a 4-byte CRC32 checksum which we verify.
   # We take care to avoid extra copies of data while slicing large objects
-  # by use of a buffer.
-  result = snappy.decompress(buffer(data)[:-4])
+  # by use of a memoryview.
+  result = snappy.decompress(memoryview(data)[:-4])
   avroio.BinaryDecoder(io.BytesIO(data[-4:])).check_crc32(result)
   return result
 else:
diff --git a/sdks/python/apache_beam/io/tfrecordio.py 
b/sdks/python/apache_beam/io/tfrecordio.py
index 989247a96ee..2ef7c5b4c72 100644
--- a/sdks/python/apache_beam/io/tfrecordio.py
+++ b/sdks/python/apache_beam/io/tfrecordio.py
@@ -43,7 +43,7 @@ def _default_crc32c_fn(value):
   if not _default_crc32c_fn.fn:
 try:
   import snappy  # pylint: disable=import-error
-  _default_crc32c_fn.fn = snappy._crc32c  # pylint: 
disable=protected-access
+  _default_crc32c_fn.fn = snappy._snappy._crc32c  # pylint: 
disable=protected-access
 except ImportError:
   logging.warning('Couldn\'t find python-snappy so the implementation of '
   '_TFRecordUtil._masked_crc32c is not as fast as it could 
'
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/names.py 
b/sdks/python/apache_beam/runners/dataflow/internal/names.py
index c31e43f78a5..fb4643fe0a1 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/names.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/names.py
@@ -42,7 +42,7 @@
 
 # Update this version to the next version whenever there is a change that will
 # require changes to legacy Dataflow worker execution environment.
-BEAM_CONTAINER_VERSION = 'beam-master-20180619'
+BEAM_CONTAINER_VERSION = 'beam-master-20180709'
 # Update this version to the next version whenever there is a change that
 # requires changes to SDK harness container or SDK harness launcher.
 BEAM_FNAPI_CONTAINER_VERSION = 'beam-master-20180619'
diff --git a/sdks/python/container/Dockerfile b/sdks/python/container/Dockerfile
index 90348c6e231..afb6b43f938 100644
--- a/sdks/python/container/Dockerfile
+++ b/sdks/python/container/Dockerfile
@@ -70,7 +70,7 @@ RUN \
 # Optional packages
 pip install "cython == 0.28.1" && \
 pip install "guppy == 0.1.10" && \
-pip install "python-snappy == 0.5.1" && \
+pip install "python-snappy == 0.5.3" && \
 # These are additional packages likely to be used by customers.
 pip install "numpy == 1.13.3" --no-binary=:all: && \
 pip install "pandas == 0.18.1" && \


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121189)
Time Spent: 16h 50m  (was: 16h 40m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5911 from charlesccychen/upgrade-snappy

2018-07-09 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 6f6401d7b30f9095ba9bd581301be0dbae3e24fe
Merge: deae145 58f0255
Author: Charles Chen 
AuthorDate: Mon Jul 9 20:55:25 2018 -0700

Merge pull request #5911 from charlesccychen/upgrade-snappy

[BEAM-1251] Upgrade snappy and use a memoryview

 sdks/python/apache_beam/io/avroio.py   | 4 ++--
 sdks/python/apache_beam/io/tfrecordio.py   | 2 +-
 sdks/python/apache_beam/runners/dataflow/internal/names.py | 2 +-
 sdks/python/container/Dockerfile   | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)



[beam] branch master updated (deae145 -> 6f6401d)

2018-07-09 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from deae145  Merge pull request #5908 from charlesccychen/revert-5887
 add 58f0255  [BEAM-1251] Upgrade snappy and use a memoryview
 new 6f6401d  Merge pull request #5911 from charlesccychen/upgrade-snappy

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/io/avroio.py   | 4 ++--
 sdks/python/apache_beam/io/tfrecordio.py   | 2 +-
 sdks/python/apache_beam/runners/dataflow/internal/names.py | 2 +-
 sdks/python/container/Dockerfile   | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)



[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121176=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121176
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 02:57
Start Date: 10/Jul/18 02:57
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5911: [BEAM-1251] 
Upgrade snappy and use a memoryview
URL: https://github.com/apache/beam/pull/5911#issuecomment-403683836
 
 
   R: @aaltay 
   CC: @cclauss 
   
   All tests pass with this change.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121176)
Time Spent: 16h 40m  (was: 16.5h)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle #655

2018-07-09 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121165=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121165
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 01:52
Start Date: 10/Jul/18 01:52
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5911: [BEAM-1251] 
Upgrade snappy and use a memoryview
URL: https://github.com/apache/beam/pull/5911#issuecomment-403673794
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121165)
Time Spent: 16.5h  (was: 16h 20m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121164=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121164
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 01:52
Start Date: 10/Jul/18 01:52
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5911: [BEAM-1251] 
Upgrade snappy and use a memoryview
URL: https://github.com/apache/beam/pull/5911#issuecomment-403673739
 
 
   Run Python Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121164)
Time Spent: 16h 20m  (was: 16h 10m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4432) Performance tests need a way to generate Synthetic data

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4432?focusedWorklogId=121158=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121158
 ]

ASF GitHub Bot logged work on BEAM-4432:


Author: ASF GitHub Bot
Created on: 10/Jul/18 01:16
Start Date: 10/Jul/18 01:16
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5519: [BEAM-4432] Adding 
Sources to produce Synthetic output for Batch pipelines
URL: https://github.com/apache/beam/pull/5519#issuecomment-403668051
 
 
   Rebased.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121158)
Time Spent: 7h 40m  (was: 7.5h)

> Performance tests need a way to generate Synthetic data
> ---
>
> Key: BEAM-4432
> URL: https://issues.apache.org/jira/browse/BEAM-4432
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> GenerateSequence falls short in this regard, as we may want to generate 
> data in custom distributions, or with specific repeatability requirements 
> and hardcoded delays for autoscaling.
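
For illustration, what such a synthetic source boils down to is deterministic records of a configurable size plus an optional per-element delay; a sketch follows. The names here (`SyntheticRecordFn`, `value_size`, `delay_per_element_sec`) are hypothetical and are not the API added by the pull request referenced in this worklog.

import hashlib
import logging
import time

import apache_beam as beam


class SyntheticRecordFn(beam.DoFn):
  """Hypothetical sketch: emits deterministic pseudo-random records."""

  def __init__(self, value_size=100, delay_per_element_sec=0.0):
    super(SyntheticRecordFn, self).__init__()
    self.value_size = value_size
    self.delay_per_element_sec = delay_per_element_sec

  def process(self, seed):
    if self.delay_per_element_sec:
      time.sleep(self.delay_per_element_sec)  # simulate per-element cost
    # Derive the payload from the seed so reruns produce identical data.
    digest = hashlib.sha256(str(seed).encode('utf-8')).digest()
    value = (digest * (self.value_size // len(digest) + 1))[:self.value_size]
    yield (seed, value)


with beam.Pipeline() as p:
  _ = (p
       | beam.Create(list(range(10)))
       | beam.ParDo(SyntheticRecordFn(value_size=64, delay_per_element_sec=0.01))
       | beam.Map(lambda kv: logging.info('seed=%s size=%d', kv[0], len(kv[1]))))
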



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121154=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121154
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 00:52
Start Date: 10/Jul/18 00:52
Worklog Time Spent: 10m 
  Work Description: charlesccychen opened a new pull request #5911: 
[BEAM-1251] Upgrade snappy and use a memoryview
URL: https://github.com/apache/beam/pull/5911
 
 
   This change undoes the rollback of #5887 (done in #5908) and incorporates 
appropriate fixes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121154)
Time Spent: 16h 10m  (was: 16h)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PostCommit_Java_GradleBuild #1041

2018-07-09 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121152=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121152
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 00:45
Start Date: 10/Jul/18 00:45
Worklog Time Spent: 10m 
  Work Description: charlesccychen closed pull request #5908: [BEAM-1251] 
Revert #5887 to unbreak Python PostCommit
URL: https://github.com/apache/beam/pull/5908
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/io/avroio.py b/sdks/python/apache_beam/io/avroio.py
index f90dc3c6833..9b86b58982b 100644
--- a/sdks/python/apache_beam/io/avroio.py
+++ b/sdks/python/apache_beam/io/avroio.py
@@ -341,8 +341,8 @@ def _decompress_bytes(data, codec):
 
   # Compressed data includes a 4-byte CRC32 checksum which we verify.
   # We take care to avoid extra copies of data while slicing large objects
-  # by use of a memoryview.
-  result = snappy.decompress(memoryview(data)[:-4])
+  # by use of a buffer.
+  result = snappy.decompress(buffer(data)[:-4])
   avroio.BinaryDecoder(io.BytesIO(data[-4:])).check_crc32(result)
   return result
 else:
diff --git a/sdks/python/container/Dockerfile b/sdks/python/container/Dockerfile
index afb6b43f938..90348c6e231 100644
--- a/sdks/python/container/Dockerfile
+++ b/sdks/python/container/Dockerfile
@@ -70,7 +70,7 @@ RUN \
 # Optional packages
 pip install "cython == 0.28.1" && \
 pip install "guppy == 0.1.10" && \
-pip install "python-snappy == 0.5.3" && \
+pip install "python-snappy == 0.5.1" && \
 # These are additional packages likely to be used by customers.
 pip install "numpy == 1.13.3" --no-binary=:all: && \
 pip install "pandas == 0.18.1" && \


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121152)
Time Spent: 16h  (was: 15h 50m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 16h
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121151=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121151
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 00:45
Start Date: 10/Jul/18 00:45
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5908: [BEAM-1251] 
Revert #5887 to unbreak Python PostCommit
URL: https://github.com/apache/beam/pull/5908#issuecomment-403663314
 
 
   Tests pass; merging.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121151)
Time Spent: 15h 50m  (was: 15h 40m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5908 from charlesccychen/revert-5887

2018-07-09 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit deae1456655e306607d0841faa98b7a980d9311c
Merge: af3225d f6200d0
Author: Charles Chen 
AuthorDate: Mon Jul 9 17:45:31 2018 -0700

Merge pull request #5908 from charlesccychen/revert-5887

[BEAM-1251] Revert #5887 to unbreak Python PostCommit

 sdks/python/apache_beam/io/avroio.py | 4 ++--
 sdks/python/container/Dockerfile | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)



[beam] branch master updated (af3225d -> deae145)

2018-07-09 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from af3225d  [BEAM-4126] Delete Maven build files.
 add f6200d0  [BEAM-1251] Revert #5887 to unbreak Python PostCommit
 new deae145  Merge pull request #5908 from charlesccychen/revert-5887

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/io/avroio.py | 4 ++--
 sdks/python/container/Dockerfile | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)



Build failed in Jenkins: beam_PostCommit_Python_Verify #5532

2018-07-09 Thread Apache Jenkins Server
See 


Changes:

[boyuanz] Add per_element_output_counter to python SDK

[boyuanz] Change experimental flag name to 'per_element_instrumentation'

[boyuanz] Added cython annotation to _OutputProcessor since cython annotation is

[boyuanz] Change reading 'outputs_per_element_counter' experimental flag into 
new

[relax] Add schema inference from POJO.

[relax] Add BYTES type to schema, and POJO inference.

[relax] Add automatic getter/setter inference from POJOS.* Recursive POJOs

[relax] Handle setter for ByteBuffer fields. Setter will be called with a byte[]

[relax] Split Row into separate subclass for RowWithStorage and RowWithGetters.

[relax] Introduce FieldValueGetterFactory and FieldValueSetterFactory.

[relax] Add support for nested, array, and map types.

[relax] Refactor getter-based SchemaProvider into a common base class.

[relax] Refactor getter/setter conversion utilities into a library. This cleans

[relax] Refactor Schema inference from POJOs so much of the code can be reused

[relax] Add encoding byte BYTES type.

[relax] Make FieldValueGetter and FieldValueSetter generic types again.

[relax] Start adding JavaBean getters and setters.

[relax] Make sure that field order is stable, and make tests robust to

[relax] Simplify convertArray.

[thw] Use the beam:option:value:v1 as the portable pipeline options

[thw] [BEAM-4733] Pass pipeline options from Python portable runner to job

[thw] Fix Flink portable streaming translation executable stage output

[ccy] [BEAM-4003] Fix missing iteritems import

[relax] Address code-review comments.

[relax] Address code-review comments.

[github] Avoid silent error when calling unimplemented function.

[github] Avoid silent error when calling unimplemented function.

[relax] Add better validation to isSetter isGetter

[ekirpichov] Infer boundedness for SDF application at application time

[ekirpichov] [BEAM-4745] Revert "[BEAM-4016] Invoke Setup and TearDown on

[markliu] Remove :buildSrc from settings.gradle since it's not a subproject

[Pablo] Logging relies on StateSampler for context

[ccy] [BEAM-4594] Beam Python state and timers user-facing API

[ccy] [BEAM-4593] Remove refcounts from the Python SDK

--
[...truncated 1.31 MB...]
test_match_type_variables 
(apache_beam.typehints.typehints_test.DictHintTestCase) ... ok
test_repr (apache_beam.typehints.typehints_test.DictHintTestCase) ... ok
test_type_check_invalid_key_type 
(apache_beam.typehints.typehints_test.DictHintTestCase) ... ok
test_type_check_invalid_value_type 
(apache_beam.typehints.typehints_test.DictHintTestCase) ... ok
test_type_check_valid_composite_type 
(apache_beam.typehints.typehints_test.DictHintTestCase) ... ok
test_type_check_valid_simple_type 
(apache_beam.typehints.typehints_test.DictHintTestCase) ... ok
test_type_checks_not_dict 
(apache_beam.typehints.typehints_test.DictHintTestCase) ... ok
test_value_type_must_be_valid_composite_param 
(apache_beam.typehints.typehints_test.DictHintTestCase) ... ok
test_compatibility (apache_beam.typehints.typehints_test.GeneratorHintTestCase) 
... ok
test_generator_argument_hint_invalid_yield_type 
(apache_beam.typehints.typehints_test.GeneratorHintTestCase) ... ok
test_generator_return_hint_invalid_yield_type 
(apache_beam.typehints.typehints_test.GeneratorHintTestCase) ... ok
test_repr (apache_beam.typehints.typehints_test.GeneratorHintTestCase) ... ok
test_compatibility (apache_beam.typehints.typehints_test.IterableHintTestCase) 
... ok
test_getitem_invalid_composite_type_param 
(apache_beam.typehints.typehints_test.IterableHintTestCase) ... ok
test_repr (apache_beam.typehints.typehints_test.IterableHintTestCase) ... ok
test_tuple_compatibility 
(apache_beam.typehints.typehints_test.IterableHintTestCase) ... ok
test_type_check_must_be_iterable 
(apache_beam.typehints.typehints_test.IterableHintTestCase) ... ok
test_type_check_violation_invalid_composite_type 
(apache_beam.typehints.typehints_test.IterableHintTestCase) ... ok
test_type_check_violation_invalid_simple_type 
(apache_beam.typehints.typehints_test.IterableHintTestCase) ... ok
test_type_check_violation_valid_composite_type 
(apache_beam.typehints.typehints_test.IterableHintTestCase) ... ok
test_type_check_violation_valid_simple_type 
(apache_beam.typehints.typehints_test.IterableHintTestCase) ... ok
test_enforce_kv_type_constraint 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_param_must_be_tuple 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_param_must_have_length_2 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_getitem_proxy_to_tuple 
(apache_beam.typehints.typehints_test.KVHintTestCase) ... ok
test_enforce_list_type_constraint_invalid_composite_type 
(apache_beam.typehints.typehints_test.ListHintTestCase) ... ok

Build failed in Jenkins: beam_PreCommit_Java_Cron #86

2018-07-09 Thread Apache Jenkins Server
See 


Changes:

[boyuanz] Add per_element_output_counter to python SDK

[boyuanz] Change experimental flag name to 'per_element_instrumentation'

[boyuanz] Added cython annotation to _OutputProcessor since cython annotation is

[boyuanz] Change reading 'outputs_per_element_counter' experimental flag into 
new

[relax] Add schema inference from POJO.

[relax] Add BYTES type to schema, and POJO inference.

[relax] Add automatic getter/setter inference from POJOS.* Recursive POJOs

[relax] Handle setter for ByteBuffer fields. Setter will be called with a byte[]

[relax] Split Row into separate subclass for RowWithStorage and RowWithGetters.

[relax] Introduce FieldValueGetterFactory and FieldValueSetterFactory.

[relax] Add support for nested, array, and map types.

[relax] Refactor getter-based SchemaProvider into a common base class.

[relax] Refactor getter/setter conversion utilities into a library. This cleans

[relax] Refactor Schema inference from POJOs so much of the code can be reused

[relax] Add encoding byte BYTES type.

[relax] Make FieldValueGetter and FieldValueSetter generic types again.

[relax] Start adding JavaBean getters and setters.

[relax] Make sure that field order is stable, and make tests robust to

[relax] Simplify convertArray.

[thw] Use the beam:option:value:v1 as the portable pipeline options

[thw] [BEAM-4733] Pass pipeline options from Python portable runner to job

[thw] Fix Flink portable streaming translation executable stage output

[ccy] [BEAM-4003] Fix missing iteritems import

[relax] Address code-review comments.

[relax] Address code-review comments.

[github] Avoid silent error when calling unimplemented function.

[github] Avoid silent error when calling unimplemented function.

[relax] Add better validation to isSetter isGetter

[ekirpichov] Infer boundedness for SDF application at application time

[ekirpichov] [BEAM-4745] Revert "[BEAM-4016] Invoke Setup and TearDown on

[markliu] Remove :buildSrc from settings.gradle since it's not a subproject

[Pablo] Logging relies on StateSampler for context

[ccy] [BEAM-4594] Beam Python state and timers user-facing API

[ccy] [BEAM-4593] Remove refcounts from the Python SDK

--
[...truncated 17.05 MB...]
INFO: 2018-07-10T00:18:42.768Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Pair
 with random key
Jul 10, 2018 12:18:45 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-10T00:18:42.800Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Write
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Reify
Jul 10, 2018 12:18:45 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-10T00:18:42.830Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/GroupByWindow
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Read
Jul 10, 2018 12:18:45 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-10T00:18:42.859Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Reify
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign
Jul 10, 2018 12:18:45 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-10T00:18:42.888Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Drop 
key/Values/Map into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/ExpandIterable
Jul 10, 2018 12:18:45 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-10T00:18:42.914Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Gather 
bundles into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Drop 
key/Values/Map
Jul 10, 2018 12:18:45 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-10T00:18:42.936Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Pair
 with random key into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Gather 
bundles
Jul 10, 2018 12:18:45 AM 

[jira] [Work logged] (BEAM-1909) BigQuery read transform fails for DirectRunner when querying non-US regions

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1909?focusedWorklogId=121130=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121130
 ]

ASF GitHub Bot logged work on BEAM-1909:


Author: ASF GitHub Bot
Created on: 10/Jul/18 00:18
Start Date: 10/Jul/18 00:18
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #5435: [BEAM-1909] Fix 
BigQuery read transform fails for DirectRunner when querying non-US regions
URL: https://github.com/apache/beam/pull/5435#issuecomment-403659463
 
 
   R: @udim 
   Can you please take a look. 
   Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121130)
Time Spent: 2h  (was: 1h 50m)

> BigQuery read transform fails for DirectRunner when querying non-US regions
> ---
>
> Key: BEAM-1909
> URL: https://issues.apache.org/jira/browse/BEAM-1909
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Jayalath
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> See: 
> http://stackoverflow.com/questions/42135002/google-dataflow-cannot-read-and-write-in-different-locations-python-sdk-v0-5-5/42144748?noredirect=1#comment73621983_42144748
> This should be fixed by creating the temp dataset and table in the correct 
> region.
> cc: [~sb2nov]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121129=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121129
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 00:13
Start Date: 10/Jul/18 00:13
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #5908: [BEAM-1251] Revert 
#5887 to unbreak Python PostCommit
URL: https://github.com/apache/beam/pull/5908#issuecomment-403658724
 
 
   Run Python Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121129)
Time Spent: 15h 40m  (was: 15.5h)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=121127=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121127
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 10/Jul/18 00:05
Start Date: 10/Jul/18 00:05
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #5868: 
[BEAM-4718]Run gradle build before publish
URL: https://github.com/apache/beam/pull/5868#discussion_r201184243
 
 

 ##
 File path: .test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy
 ##
 @@ -36,13 +36,29 @@ job('beam_Release_Gradle_NightlySnapshot') {
   'd...@beam.apache.org')
 
 
-  // Allows triggering this build against pull requests.
+  // Allows triggering this publish command against pull requests.
 
 Review comment:
   I found we use this trigger phrase to trigger a snapshot build in release 
process: 
https://beam.apache.org/contribute/release-guide/#start-a-snapshot-build


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121127)
Time Spent: 2h 10m  (was: 2h)

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4126) Delete maven build files

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4126?focusedWorklogId=121124=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121124
 ]

ASF GitHub Bot logged work on BEAM-4126:


Author: ASF GitHub Bot
Created on: 10/Jul/18 00:03
Start Date: 10/Jul/18 00:03
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #5571: [BEAM-4126] Delete 
Maven build files.
URL: https://github.com/apache/beam/pull/5571#issuecomment-403656920
 
 
   If you're talking about beam-examples-java, beam-runners-direct-java, and 
beam-sdks-java-core: those `dependencies` were there because the archetype 
testing/generation needed an explicit hint that these modules must be built 
before the archetypes can be generated. Those dependencies have been 
moved to the equivalent build.gradle files (e.g. 
https://github.com/apache/beam/blob/8873d414b0b4f4cbae4f424af663d33bc92e7b6f/sdks/java/maven-archetypes/starter/build.gradle#L27)
 to ensure the appropriate ordering.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121124)
Time Spent: 1h 40m  (was: 1.5h)

> Delete maven build files
> 
>
> Key: BEAM-4126
> URL: https://issues.apache.org/jira/browse/BEAM-4126
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Once we are fully-migrated to Gradle, we should proactively remove Maven 
> build files (pom.xml etc) so that it's clear they no longer need to be 
> maintained.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: [BEAM-4126] Delete Maven build files.

2018-07-09 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit af3225d4fba5e5a8bc7501514e916874e9f5453c
Merge: 8873d41 4f3858d
Author: Lukasz Cwik 
AuthorDate: Mon Jul 9 17:03:22 2018 -0700

[BEAM-4126] Delete Maven build files.

 examples/java/pom.xml  |  563 -
 examples/pom.xml   |   48 -
 model/fn-execution/pom.xml |  104 -
 model/job-management/pom.xml   |  122 --
 model/pipeline/pom.xml |   79 -
 model/pom.xml  |   40 -
 pom.xml| 2315 
 runners/apex/pom.xml   |  325 ---
 runners/core-construction-java/pom.xml |  199 --
 runners/core-java/pom.xml  |  161 --
 runners/direct-java/pom.xml|  392 
 runners/extensions-java/metrics/pom.xml|   75 -
 runners/extensions-java/pom.xml|   38 -
 runners/flink/pom.xml  |  443 
 runners/gcp/gcemd/pom.xml  |  157 --
 runners/gcp/gcsproxy/pom.xml   |  157 --
 runners/gcp/pom.xml|   38 -
 runners/gearpump/pom.xml   |  255 ---
 runners/google-cloud-dataflow-java/pom.xml |  531 -
 runners/java-fn-execution/pom.xml  |  213 --
 runners/local-java/pom.xml |  102 -
 runners/pom.xml|   74 -
 runners/reference/java/pom.xml |  116 -
 runners/reference/pom.xml  |   39 -
 runners/samza/pom.xml  |  365 ---
 runners/spark/pom.xml  |  470 
 sdks/go/container/pom.xml  |  157 --
 sdks/go/pom.xml|  175 --
 sdks/java/build-tools/pom.xml  |   61 -
 sdks/java/container/pom.xml|  187 --
 sdks/java/core/pom.xml |  350 ---
 .../extensions/google-cloud-platform-core/pom.xml  |  215 --
 sdks/java/extensions/jackson/pom.xml   |   76 -
 sdks/java/extensions/join-library/pom.xml  |   69 -
 sdks/java/extensions/pom.xml   |   44 -
 sdks/java/extensions/protobuf/pom.xml  |  145 --
 sdks/java/extensions/sketching/pom.xml |  114 -
 sdks/java/extensions/sorter/pom.xml|   87 -
 sdks/java/extensions/sql/pom.xml   |  453 
 sdks/java/fn-execution/pom.xml |  134 --
 sdks/java/harness/pom.xml  |  290 ---
 sdks/java/io/amazon-web-services/pom.xml   |  188 --
 sdks/java/io/amqp/pom.xml  |  128 --
 sdks/java/io/cassandra/pom.xml |  114 -
 sdks/java/io/common/pom.xml|   59 -
 .../elasticsearch-tests-2/pom.xml  |   60 -
 .../elasticsearch-tests-5/pom.xml  |   94 -
 .../elasticsearch-tests-common/pom.xml |   77 -
 sdks/java/io/elasticsearch-tests/pom.xml   |  150 --
 sdks/java/io/elasticsearch/pom.xml |  112 -
 sdks/java/io/file-based-io-tests/pom.xml   |  451 
 sdks/java/io/google-cloud-platform/pom.xml |  426 
 sdks/java/io/hadoop-common/pom.xml |   94 -
 sdks/java/io/hadoop-file-system/pom.xml|  151 --
 sdks/java/io/hadoop-input-format/pom.xml   |  455 
 sdks/java/io/hbase/pom.xml |  195 --
 sdks/java/io/hcatalog/pom.xml  |  327 ---
 sdks/java/io/jdbc/pom.xml  |  367 
 sdks/java/io/jms/pom.xml   |  120 -
 sdks/java/io/kafka/pom.xml |  150 --
 sdks/java/io/kinesis/pom.xml   |  176 --
 sdks/java/io/mongodb/pom.xml   |  318 ---
 sdks/java/io/mqtt/pom.xml  |  127 --
 sdks/java/io/parquet/pom.xml   |  140 --
 sdks/java/io/pom.xml   |  143 --
 sdks/java/io/redis/pom.xml |   95 -
 sdks/java/io/solr/pom.xml  |  168 --
 sdks/java/io/tika/pom.xml  |  102 -
 sdks/java/io/xml/pom.xml   |  157 --
 sdks/java/javadoc/pom.xml  |  395 
 sdks/java/maven-archetypes/examples/pom.xml|   22 +-
 sdks/java/maven-archetypes/pom.xml |  115 -
 sdks/java/maven-archetypes/starter/pom.xml |   28 +-
 sdks/java/nexmark/pom.xml  |  337 ---
 sdks/java/pom.xml 

[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121125=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121125
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 10/Jul/18 00:03
Start Date: 10/Jul/18 00:03
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5887: [BEAM-1251] 
Upgrade from buffer to memoryview (again)
URL: https://github.com/apache/beam/pull/5887#issuecomment-403656957
 
 
   @angoenka Yes


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121125)
Time Spent: 15.5h  (was: 15h 20m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (8873d41 -> af3225d)

2018-07-09 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 8873d41  Merge pull request #5904: Infer boundedness for SDF 
application at application time
 add 4f3858d  [BEAM-4126] Delete Maven build files.
 new af3225d  [BEAM-4126] Delete Maven build files.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 examples/java/pom.xml  |  563 -
 examples/pom.xml   |   48 -
 model/fn-execution/pom.xml |  104 -
 model/job-management/pom.xml   |  122 --
 model/pipeline/pom.xml |   79 -
 model/pom.xml  |   40 -
 pom.xml| 2315 
 runners/apex/pom.xml   |  325 ---
 runners/core-construction-java/pom.xml |  199 --
 runners/core-java/pom.xml  |  161 --
 runners/direct-java/pom.xml|  392 
 runners/extensions-java/metrics/pom.xml|   75 -
 runners/extensions-java/pom.xml|   38 -
 runners/flink/pom.xml  |  443 
 runners/gcp/gcemd/pom.xml  |  157 --
 runners/gcp/gcsproxy/pom.xml   |  157 --
 runners/gcp/pom.xml|   38 -
 runners/gearpump/pom.xml   |  255 ---
 runners/google-cloud-dataflow-java/pom.xml |  531 -
 runners/java-fn-execution/pom.xml  |  213 --
 runners/local-java/pom.xml |  102 -
 runners/pom.xml|   74 -
 runners/reference/java/pom.xml |  116 -
 runners/reference/pom.xml  |   39 -
 runners/samza/pom.xml  |  365 ---
 runners/spark/pom.xml  |  470 
 sdks/go/container/pom.xml  |  157 --
 sdks/go/pom.xml|  175 --
 sdks/java/build-tools/pom.xml  |   61 -
 sdks/java/container/pom.xml|  187 --
 sdks/java/core/pom.xml |  350 ---
 .../extensions/google-cloud-platform-core/pom.xml  |  215 --
 sdks/java/extensions/jackson/pom.xml   |   76 -
 sdks/java/extensions/join-library/pom.xml  |   69 -
 sdks/java/extensions/pom.xml   |   44 -
 sdks/java/extensions/protobuf/pom.xml  |  145 --
 sdks/java/extensions/sketching/pom.xml |  114 -
 sdks/java/extensions/sorter/pom.xml|   87 -
 sdks/java/extensions/sql/pom.xml   |  453 
 sdks/java/fn-execution/pom.xml |  134 --
 sdks/java/harness/pom.xml  |  290 ---
 sdks/java/io/amazon-web-services/pom.xml   |  188 --
 sdks/java/io/amqp/pom.xml  |  128 --
 sdks/java/io/cassandra/pom.xml |  114 -
 sdks/java/io/common/pom.xml|   59 -
 .../elasticsearch-tests-2/pom.xml  |   60 -
 .../elasticsearch-tests-5/pom.xml  |   94 -
 .../elasticsearch-tests-common/pom.xml |   77 -
 sdks/java/io/elasticsearch-tests/pom.xml   |  150 --
 sdks/java/io/elasticsearch/pom.xml |  112 -
 sdks/java/io/file-based-io-tests/pom.xml   |  451 
 sdks/java/io/google-cloud-platform/pom.xml |  426 
 sdks/java/io/hadoop-common/pom.xml |   94 -
 sdks/java/io/hadoop-file-system/pom.xml|  151 --
 sdks/java/io/hadoop-input-format/pom.xml   |  455 
 sdks/java/io/hbase/pom.xml |  195 --
 sdks/java/io/hcatalog/pom.xml  |  327 ---
 sdks/java/io/jdbc/pom.xml  |  367 
 sdks/java/io/jms/pom.xml   |  120 -
 sdks/java/io/kafka/pom.xml |  150 --
 sdks/java/io/kinesis/pom.xml   |  176 --
 sdks/java/io/mongodb/pom.xml   |  318 ---
 sdks/java/io/mqtt/pom.xml  |  127 --
 sdks/java/io/parquet/pom.xml   |  140 --
 sdks/java/io/pom.xml   |  143 --
 sdks/java/io/redis/pom.xml |   95 -
 sdks/java/io/solr/pom.xml  |  168 --
 sdks/java/io/tika/pom.xml  |  102 -
 sdks/java/io/xml/pom.xml   |  157 --
 sdks/java/javadoc/pom.xml  |  395 

[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121119=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121119
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 09/Jul/18 23:56
Start Date: 09/Jul/18 23:56
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5908: [BEAM-1251] 
Revert #5887 to unbreak Python PostCommit
URL: https://github.com/apache/beam/pull/5908#issuecomment-403655681
 
 
   R: @angoenka 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121119)
Time Spent: 15h 10m  (was: 15h)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121120=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121120
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 09/Jul/18 23:56
Start Date: 09/Jul/18 23:56
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5908: [BEAM-1251] 
Revert #5887 to unbreak Python PostCommit
URL: https://github.com/apache/beam/pull/5908#issuecomment-403655697
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121120)
Time Spent: 15h 20m  (was: 15h 10m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121118=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121118
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 09/Jul/18 23:55
Start Date: 09/Jul/18 23:55
Worklog Time Spent: 10m 
  Work Description: charlesccychen opened a new pull request #5908: 
[BEAM-1251] Revert #5887 to unbreak Python PostCommit
URL: https://github.com/apache/beam/pull/5908
 
 
   This change reverts #5887 to fix issues with the Python PostCommit.  I will 
take care of undoing this rollback after an appropriate fix.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121118)
Time Spent: 15h  (was: 14h 50m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121112
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 09/Jul/18 23:48
Start Date: 09/Jul/18 23:48
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #5887: [BEAM-1251] Upgrade 
from buffer to memoryview (again)
URL: https://github.com/apache/beam/pull/5887#issuecomment-403654529
 
 
   Is this the only commit that needs to be reverted?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121112)
Time Spent: 14h 50m  (was: 14h 40m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=12=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-12
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 09/Jul/18 23:47
Start Date: 09/Jul/18 23:47
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5887: [BEAM-1251] 
Upgrade from buffer to memoryview (again)
URL: https://github.com/apache/beam/pull/5887#issuecomment-403654392
 
 
   We should roll back until we figure out a proper fix.  Unfortunately, we 
can't just add a python-snappy pin to the setup file, since it doesn't compile 
on all platforms.
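
For reference, the usual way to make such a dependency opt-in rather than a hard requirement is an `extras_require` entry, so it is only installed where it actually compiles; this is a generic setuptools sketch with assumed names, not a quote of Beam's setup.py, and code using the package still needs a try/except ImportError guard like the tfrecordio.py change earlier in the thread.

# Illustrative setup.py fragment; the extra name 'snappy' and the version
# bound are assumptions, not Beam's actual configuration.
from setuptools import find_packages, setup

setup(
    name='example-package',
    version='0.1.0',
    packages=find_packages(),
    extras_require={
        # Opt-in install: pip install "example-package[snappy]"
        'snappy': ['python-snappy>=0.5.3'],
    },
)
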


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 12)
Time Spent: 14h 40m  (was: 14.5h)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4126) Delete maven build files

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4126?focusedWorklogId=121110=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121110
 ]

ASF GitHub Bot logged work on BEAM-4126:


Author: ASF GitHub Bot
Created on: 09/Jul/18 23:45
Start Date: 09/Jul/18 23:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5571: [BEAM-4126] Delete 
Maven build files.
URL: https://github.com/apache/beam/pull/5571#issuecomment-403654053
 
 
   Q: How come it's okay to remove dependencies from the poms within 
`maven-archetypes`? Does mvn infer the dependencies when creating the 
archetypes?
   
   Everything else LGTM. Thanks Luke : )


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121110)
Time Spent: 1.5h  (was: 1h 20m)

> Delete maven build files
> 
>
> Key: BEAM-4126
> URL: https://issues.apache.org/jira/browse/BEAM-4126
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Once we are fully-migrated to Gradle, we should proactively remove Maven 
> build files (pom.xml etc) so that it's clear they no longer need to be 
> maintained.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=121108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121108
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 09/Jul/18 23:43
Start Date: 09/Jul/18 23:43
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #5887: [BEAM-1251] Upgrade 
from buffer to memoryview (again)
URL: https://github.com/apache/beam/pull/5887#issuecomment-403653679
 
 
   The build seems to be broken after this change. Can you please take a look?
   https://scans.gradle.com/s/ek5enzlgtrm3c/console-log#L3941
   https://builds.apache.org/job/beam_PostCommit_Python_Verify/5531/consoleFull
   
   Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121108)
Time Spent: 14.5h  (was: 14h 20m)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Trivial
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3954) Get Jenkins agents dockerized

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3954?focusedWorklogId=121103=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121103
 ]

ASF GitHub Bot logged work on BEAM-3954:


Author: ASF GitHub Bot
Created on: 09/Jul/18 23:27
Start Date: 09/Jul/18 23:27
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #4960: [BEAM-3954] try 
jenkins-pipeline with docker images - Test, Do Not Merge
URL: https://github.com/apache/beam/pull/4960#issuecomment-403651016
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121103)
Time Spent: 50m  (was: 40m)

> Get Jenkins agents dockerized 
> --
>
> Key: BEAM-3954
> URL: https://issues.apache.org/jira/browse/BEAM-3954
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4126) Delete maven build files

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4126?focusedWorklogId=121100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121100
 ]

ASF GitHub Bot logged work on BEAM-4126:


Author: ASF GitHub Bot
Created on: 09/Jul/18 23:21
Start Date: 09/Jul/18 23:21
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #5571: [BEAM-4126] Delete 
Maven build files.
URL: https://github.com/apache/beam/pull/5571#issuecomment-403650013
 
 
   Google has migrated away from needing the pom.xml for the build.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121100)
Time Spent: 1h 20m  (was: 1h 10m)

> Delete maven build files
> 
>
> Key: BEAM-4126
> URL: https://issues.apache.org/jira/browse/BEAM-4126
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Once we are fully-migrated to Gradle, we should proactively remove Maven 
> build files (pom.xml etc) so that it's clear they no longer need to be 
> maintained.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4126) Delete maven build files

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4126?focusedWorklogId=121098=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121098
 ]

ASF GitHub Bot logged work on BEAM-4126:


Author: ASF GitHub Bot
Created on: 09/Jul/18 23:17
Start Date: 09/Jul/18 23:17
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #5571: [BEAM-4126] Delete 
Maven build files.
URL: https://github.com/apache/beam/pull/5571#issuecomment-403649445
 
 
   R: @markflyhigh 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121098)
Time Spent: 1h 10m  (was: 1h)

> Delete maven build files
> 
>
> Key: BEAM-4126
> URL: https://issues.apache.org/jira/browse/BEAM-4126
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Once we are fully-migrated to Gradle, we should proactively remove Maven 
> build files (pom.xml etc) so that it's clear they no longer need to be 
> maintained.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4531) [Go SDK] Allow Dynamic Structural DoFns

2018-07-09 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-4531:
--

Assignee: Robert Burke

> [Go SDK] Allow Dynamic Structural DoFns
> ---
>
> Key: BEAM-4531
> URL: https://issues.apache.org/jira/browse/BEAM-4531
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
>
> Presently, the Go SDK permits 3 kinds of functions for use in ParDos.
>  # Native functions
>  # Structural DoFns with correctly named methods that reflect the bundle 
> lifecycle or combine lifecycle (Setup, StartBundle, FinishBundle, 
> ProcessElement, Teardown, etc.). These also permit limited Stateful ParDos, 
> such as caching expensive network responses for re-use during processing.
>  # Dynamic functions (DynFns), which permit the use of closed-over global 
> state and dynamic input types generated at pipeline runtime.
> There is presently no way to generate a stateful function that relies on 
> closed-over global state and needs to be aware of the bundle lifecycle in the 
> worker harness. In short, there is no way to create a Dynamic Structural DoFn.
> To implement this, in particular, 
> [graph.DynFn](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L70)
> will need to be modified so that multiple methods can be returned, likely as 
> methods on a struct, so that 
> [graph.NewFn](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L81)
>  can populate the 
> [graph.Fn.methods](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L40)
>  map correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-4512) Move DataflowRunner off of Maven build files

2018-07-09 Thread Mark Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537751#comment-16537751
 ] 

Mark Liu edited comment on BEAM-4512 at 7/9/18 10:57 PM:
-

The work inside Google that moves the project build dependencies from Maven 
pom files to Gradle is done and is no longer a blocker for deleting the poms 
from Beam.


was (Author: markflyhigh):
The work inside Google that moves the project build dependencies from Maven 
pom files to Gradle is done and is no longer a blocker for deleting the poms 
from Beam.

> Move DataflowRunner off of Maven build files
> 
>
> Key: BEAM-4512
> URL: https://issues.apache.org/jira/browse/BEAM-4512
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Chamikara Jayalath
>Assignee: Mark Liu
>Priority: Major
>
> Currently DataflowRunner (internally at Google) depends on Beam's Maven build 
> files. We have to move some internal build targets to use Gradle so that 
> Maven files can be deleted from Beam.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4512) Move DataflowRunner off of Maven build files

2018-07-09 Thread Mark Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537751#comment-16537751
 ] 

Mark Liu commented on BEAM-4512:


The work inside Google that moves the project build dependencies from Maven 
pom files to Gradle is done and is no longer a blocker for deleting the poms 
from Beam.

> Move DataflowRunner off of Maven build files
> 
>
> Key: BEAM-4512
> URL: https://issues.apache.org/jira/browse/BEAM-4512
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Chamikara Jayalath
>Assignee: Mark Liu
>Priority: Major
>
> Currently DataflowRunner (internally at Google) depends on Beam's Maven build 
> files. We have to move some internal build targets to use Gradle so that 
> Maven files can be deleted from Beam.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4748) Flaky post-commit test org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactServicesTest.putArtifactsMultipleFilesConcurrentlyTest

2018-07-09 Thread Mikhail Gryzykhin (JIRA)
Mikhail Gryzykhin created BEAM-4748:
---

 Summary: Flaky post-commit test 
org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactServicesTest.putArtifactsMultipleFilesConcurrentlyTest
 Key: BEAM-4748
 URL: https://issues.apache.org/jira/browse/BEAM-4748
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Mikhail Gryzykhin


Test flaked on the following job:

https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_GradleBuild/1040/testReport/junit/org.apache.beam.runners.fnexecution.artifact/BeamFileSystemArtifactServicesTest/putArtifactsMultipleFilesConcurrentlyTest/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (ec26b27 -> 8873d41)

2018-07-09 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from ec26b27  Merge pull request #5778 from charlesccychen/delete-refcounts
 add e710f60  Infer boundedness for SDF application at application time
 new 8873d41  Merge pull request #5904: Infer boundedness for SDF 
application at application time

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../java/org/apache/beam/sdk/transforms/ParDo.java |  2 +-
 .../java/org/apache/beam/sdk/transforms/Watch.java |  2 ++
 .../java/org/apache/beam/sdk/io/FileIOTest.java|  2 ++
 .../beam/sdk/transforms/SplittableDoFnTest.java| 41 ++
 4 files changed, 46 insertions(+), 1 deletion(-)



[beam] 01/01: Merge pull request #5904: Infer boundedness for SDF application at application time

2018-07-09 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 8873d414b0b4f4cbae4f424af663d33bc92e7b6f
Merge: ec26b27 e710f60
Author: Ismaël Mejía 
AuthorDate: Mon Jul 9 23:41:24 2018 +0200

Merge pull request #5904: Infer boundedness for SDF application at 
application time

 .../java/org/apache/beam/sdk/transforms/ParDo.java |  2 +-
 .../java/org/apache/beam/sdk/transforms/Watch.java |  2 ++
 .../java/org/apache/beam/sdk/io/FileIOTest.java|  2 ++
 .../beam/sdk/transforms/SplittableDoFnTest.java| 41 ++
 4 files changed, 46 insertions(+), 1 deletion(-)




[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=121034&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121034
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 09/Jul/18 21:29
Start Date: 09/Jul/18 21:29
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #5868: 
[BEAM-4718]Run gradle build before publish
URL: https://github.com/apache/beam/pull/5868#discussion_r201153357
 
 

 ##
 File path: .test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy
 ##
 @@ -36,13 +36,29 @@ job('beam_Release_Gradle_NightlySnapshot') {
   'd...@beam.apache.org')
 
 
-  // Allows triggering this build against pull requests.
+  // Allows triggering this publish command against pull requests.
   common_job_properties.enablePhraseTriggeringFromPullRequest(
   delegate,
   './gradlew publish',
   'Run Gradle Publish')
 
+  // Allows triggering this build command against pull requests.
+  common_job_properties.enablePhraseTriggeringFromPullRequest(
+  delegate,
+  './gradlew clean build',
+  'Run Gradle Build')
+
   steps {
+gradle {
+  rootBuildScriptDir(common_job_properties.checkoutDir)
+  tasks('clean')
+}
+gradle {
+  rootBuildScriptDir(common_job_properties.checkoutDir)
+  tasks('build')
+  common_job_properties.setGradleSwitches(delegate)
+  switches('--no-parallel')
 
 Review comment:
   Discussed and solved offline with Luke.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121034)
Time Spent: 2h  (was: 1h 50m)

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4593) Remove refcounts from the Python SDK

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4593?focusedWorklogId=121029&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121029
 ]

ASF GitHub Bot logged work on BEAM-4593:


Author: ASF GitHub Bot
Created on: 09/Jul/18 21:19
Start Date: 09/Jul/18 21:19
Worklog Time Spent: 10m 
  Work Description: charlesccychen closed pull request #5778: [BEAM-4593] 
Remove refcounts from the Python SDK
URL: https://github.com/apache/beam/pull/5778
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/pipeline.py 
b/sdks/python/apache_beam/pipeline.py
index 6e9fccef3b5..5a4c1dc9228 100644
--- a/sdks/python/apache_beam/pipeline.py
+++ b/sdks/python/apache_beam/pipeline.py
@@ -47,7 +47,6 @@
 from __future__ import absolute_import
 
 import abc
-import collections
 import logging
 import os
 import re
@@ -529,7 +528,6 @@ def apply(self, transform, pvalueish=None, label=None):
'output type-hint was found for the '
'PTransform %s' % ptransform_name)
 
-current.update_input_refcounts()
 self.transforms_stack.pop()
 return pvalueish_result
 
@@ -699,30 +697,10 @@ def __init__(self, parent, transform, full_label, inputs):
 self.outputs = {}
 self.parts = []
 
-# Per tag refcount dictionary for PValues for which this node is a
-# root producer.
-self.refcounts = collections.defaultdict(int)
-
   def __repr__(self):
 return "%s(%s, %s)" % (self.__class__.__name__, self.full_label,
type(self.transform).__name__)
 
-  def update_input_refcounts(self):
-"""Increment refcounts for all transforms providing inputs."""
-
-def real_producer(pv):
-  real = pv.producer
-  while real.parts:
-real = real.parts[-1]
-  return real
-
-if not self.is_composite():
-  for main_input in self.inputs:
-if not isinstance(main_input, pvalue.PBegin):
-  real_producer(main_input).refcounts[main_input.tag] += 1
-  for side_input in self.side_inputs:
-real_producer(side_input.pvalue).refcounts[side_input.pvalue.tag] += 1
-
   def replace_output(self, output, tag=None):
 """Replaces the output defined by the given tag with the given output.
 
@@ -885,7 +863,6 @@ def is_side_input(tag):
   pc = context.pcollections.get_by_id(pcoll_id)
   pc.producer = result
   pc.tag = None if tag == 'None' else tag
-result.update_input_refcounts()
 return result
 
 
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py 
b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index 5028cb89a89..098d47e880a 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -265,7 +265,6 @@ def visit_transform(self, transform_node):
   new_side_input.pvalue.producer = map_to_void_key
   map_to_void_key.add_output(new_side_input.pvalue)
   parent.add_part(map_to_void_key)
-  transform_node.update_input_refcounts()
 elif access_pattern == common_urns.side_inputs.MULTIMAP.urn:
   # Ensure the input coder is a KV coder and patch up the
   # access pattern to appease Dataflow.
@@ -596,7 +595,6 @@ def run_ParDo(self, transform_node):
 
 # Attach side inputs.
 si_dict = {}
-# We must call self._cache.get_pvalue exactly once due to refcounting.
 si_labels = {}
 full_label_counts = defaultdict(int)
 lookup_label = lambda side_pval: si_labels[side_pval]


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121029)
Time Spent: 20m  (was: 10m)

> Remove refcounts from the Python SDK
> 
>
> Key: BEAM-4593
> URL: https://issues.apache.org/jira/browse/BEAM-4593
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should remove PValue refcounting logic from the Python SDK.  This is 
> vestigial logic from the first DirectRunner implementation, and is not used 
> anymore.



--
This message was sent by 

[beam] 01/01: Merge pull request #5778 from charlesccychen/delete-refcounts

2018-07-09 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit ec26b27638e257c46dc8ef53dd80da3a5a9f9d80
Merge: a1f6db0 238515c
Author: Charles Chen 
AuthorDate: Mon Jul 9 14:19:30 2018 -0700

Merge pull request #5778 from charlesccychen/delete-refcounts

[BEAM-4593] Remove refcounts from the Python SDK

 sdks/python/apache_beam/pipeline.py| 23 --
 .../runners/dataflow/dataflow_runner.py|  2 --
 2 files changed, 25 deletions(-)



[beam] branch master updated (a1f6db0 -> ec26b27)

2018-07-09 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from a1f6db0  Merge pull request #5691 from 
charlesccychen/user-statetimer-interfaces
 add 238515c  [BEAM-4593] Remove refcounts from the Python SDK
 new ec26b27  Merge pull request #5778 from charlesccychen/delete-refcounts

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/pipeline.py| 23 --
 .../runners/dataflow/dataflow_runner.py|  2 --
 2 files changed, 25 deletions(-)



[beam] branch master updated (3757b5f -> a1f6db0)

2018-07-09 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 3757b5f  Merge pull request #5900 from charlesccychen/fix-iteritems
 add 5daf675  [BEAM-4594] Beam Python state and timers user-facing API
 new a1f6db0  Merge pull request #5691 from 
charlesccychen/user-statetimer-interfaces

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/runners/common.py  |  20 +-
 sdks/python/apache_beam/runners/common_test.py |   6 +-
 sdks/python/apache_beam/transforms/core.py |  59 -
 sdks/python/apache_beam/transforms/userstate.py| 162 
 .../apache_beam/transforms/userstate_test.py   | 273 +
 sdks/python/scripts/generate_pydoc.sh  |   6 +-
 6 files changed, 511 insertions(+), 15 deletions(-)
 create mode 100644 sdks/python/apache_beam/transforms/userstate.py
 create mode 100644 sdks/python/apache_beam/transforms/userstate_test.py



[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=121019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121019
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 09/Jul/18 21:12
Start Date: 09/Jul/18 21:12
Worklog Time Spent: 10m 
  Work Description: charlesccychen closed pull request #5691: [BEAM-4594] 
Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/common.py 
b/sdks/python/apache_beam/runners/common.py
index 88745c778e3..aa435594ed7 100644
--- a/sdks/python/apache_beam/runners/common.py
+++ b/sdks/python/apache_beam/runners/common.py
@@ -41,6 +41,7 @@
 from apache_beam.transforms import DoFn
 from apache_beam.transforms import core
 from apache_beam.transforms.core import RestrictionProvider
+from apache_beam.transforms.userstate import UserStateUtils
 from apache_beam.transforms.window import GlobalWindow
 from apache_beam.transforms.window import TimestampedValue
 from apache_beam.transforms.window import WindowFn
@@ -208,18 +209,29 @@ def _validate(self):
 self._validate_process()
 self._validate_bundle_method(self.start_bundle_method)
 self._validate_bundle_method(self.finish_bundle_method)
+self._validate_stateful_dofn()
 
   def _validate_process(self):
 """Validate that none of the DoFnParameters are repeated in the function
 """
-for param in core.DoFn.DoFnParams:
-  assert self.process_method.defaults.count(param) <= 1
+param_ids = [d.param_id for d in self.process_method.defaults
+ if isinstance(d, core._DoFnParam)]
+if len(param_ids) != len(set(param_ids)):
+  raise ValueError(
+  'DoFn %r has duplicate process method parameters: %s.' % (
+  self.do_fn, param_ids))
 
   def _validate_bundle_method(self, method_wrapper):
 """Validate that none of the DoFnParameters are used in the function
 """
-for param in core.DoFn.DoFnParams:
-  assert param not in method_wrapper.defaults
+for param in core.DoFn.DoFnProcessParams:
+  if param in method_wrapper.defaults:
+raise ValueError(
+'DoFn.process() method-only parameter %s cannot be used in %s.' %
+(param, method_wrapper))
+
+  def _validate_stateful_dofn(self):
+UserStateUtils.validate_stateful_dofn(self.do_fn)
 
   def is_splittable_dofn(self):
 return any([isinstance(default, RestrictionProvider) for default in
diff --git a/sdks/python/apache_beam/runners/common_test.py 
b/sdks/python/apache_beam/runners/common_test.py
index d4848e48abe..18e2c455f1f 100644
--- a/sdks/python/apache_beam/runners/common_test.py
+++ b/sdks/python/apache_beam/runners/common_test.py
@@ -30,7 +30,7 @@ class MyDoFn(DoFn):
   def process(self, element, w1=DoFn.WindowParam, w2=DoFn.WindowParam):
 pass
 
-with self.assertRaises(AssertionError):
+with self.assertRaises(ValueError):
   DoFnSignature(MyDoFn())
 
   def test_dofn_validate_start_bundle_error(self):
@@ -41,7 +41,7 @@ def process(self, element):
   def start_bundle(self, w1=DoFn.WindowParam):
 pass
 
-with self.assertRaises(AssertionError):
+with self.assertRaises(ValueError):
   DoFnSignature(MyDoFn())
 
   def test_dofn_validate_finish_bundle_error(self):
@@ -52,7 +52,7 @@ def process(self, element):
   def finish_bundle(self, w1=DoFn.WindowParam):
 pass
 
-with self.assertRaises(AssertionError):
+with self.assertRaises(ValueError):
   DoFnSignature(MyDoFn())
 
 
diff --git a/sdks/python/apache_beam/transforms/core.py 
b/sdks/python/apache_beam/transforms/core.py
index 506eadb92ec..bbd78342a7f 100644
--- a/sdks/python/apache_beam/transforms/core.py
+++ b/sdks/python/apache_beam/transforms/core.py
@@ -43,6 +43,8 @@
 from apache_beam.transforms.display import HasDisplayData
 from apache_beam.transforms.ptransform import PTransform
 from apache_beam.transforms.ptransform import PTransformWithSideInputs
+from apache_beam.transforms.userstate import StateSpec
+from apache_beam.transforms.userstate import TimerSpec
 from apache_beam.transforms.window import GlobalWindows
 from apache_beam.transforms.window import TimestampCombiner
 from apache_beam.transforms.window import TimestampedValue
@@ -278,6 +280,41 @@ def get_function_arguments(obj, func):
   return inspect.getargspec(f)
 
 
+class _DoFnParam(object):
+  """DoFn parameter."""
+
+  def __init__(self, param_id):
+self.param_id = param_id
+
+  def __eq__(self, other):
+if type(self) == type(other):
+  return 

[beam] 01/01: Merge pull request #5691 from charlesccychen/user-statetimer-interfaces

2018-07-09 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit a1f6db032d2b69202042af4adb8ace552003a178
Merge: 3757b5f 5daf675
Author: Charles Chen 
AuthorDate: Mon Jul 9 14:12:53 2018 -0700

Merge pull request #5691 from charlesccychen/user-statetimer-interfaces

[BEAM-4594] Beam Python state and timers user-facing API

 sdks/python/apache_beam/runners/common.py  |  20 +-
 sdks/python/apache_beam/runners/common_test.py |   6 +-
 sdks/python/apache_beam/transforms/core.py |  59 -
 sdks/python/apache_beam/transforms/userstate.py| 162 
 .../apache_beam/transforms/userstate_test.py   | 273 +
 sdks/python/scripts/generate_pydoc.sh  |   6 +-
 6 files changed, 511 insertions(+), 15 deletions(-)



[jira] [Work logged] (BEAM-4648) Remove the unused Python RPC DirectRunner

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4648?focusedWorklogId=121018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121018
 ]

ASF GitHub Bot logged work on BEAM-4648:


Author: ASF GitHub Bot
Created on: 09/Jul/18 21:11
Start Date: 09/Jul/18 21:11
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5777: [BEAM-4648] 
Remove experimental Python RPC DirectRunner
URL: https://github.com/apache/beam/pull/5777#issuecomment-403621767
 
 
   Thanks, rebased.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121018)
Time Spent: 40m  (was: 0.5h)

> Remove the unused Python RPC DirectRunner
> -
>
> Key: BEAM-4648
> URL: https://issues.apache.org/jira/browse/BEAM-4648
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should remove the unused Python RPC DirectRunner here: 
> https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/experimental



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4648) Remove the unused Python RPC DirectRunner

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4648?focusedWorklogId=121012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121012
 ]

ASF GitHub Bot logged work on BEAM-4648:


Author: ASF GitHub Bot
Created on: 09/Jul/18 20:45
Start Date: 09/Jul/18 20:45
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #5777: [BEAM-4648] Remove 
experimental Python RPC DirectRunner
URL: https://github.com/apache/beam/pull/5777#issuecomment-403614389
 
 
   Looks like there's merge conflicts.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121012)
Time Spent: 0.5h  (was: 20m)

> Remove the unused Python RPC DirectRunner
> -
>
> Key: BEAM-4648
> URL: https://issues.apache.org/jira/browse/BEAM-4648
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We should remove the unused Python RPC DirectRunner here: 
> https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/experimental



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=121013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121013
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 09/Jul/18 20:45
Start Date: 09/Jul/18 20:45
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5691: [BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r201141205
 
 

 ##
 File path: sdks/python/apache_beam/io/avroio.py
 ##
 @@ -61,6 +59,8 @@
 from apache_beam.io.filesystem import CompressionTypes
 from apache_beam.io.iobase import Read
 from apache_beam.transforms import PTransform
+from fastavro.read import block_reader
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121013)
Time Spent: 5h 10m  (was: 5h)

> Implement Beam Python User State and Timer API
> --
>
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Labels: portability
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the implementation of the Beam Python User State and Timer 
> API, described here: [https://s.apache.org/beam-python-user-state-and-timers].
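
For orientation, a minimal sketch of the kind of stateful DoFn this user-facing 
API is meant to enable. It is illustrative only, assembled from the names added 
in apache_beam.transforms.userstate (BagStateSpec, TimerSpec, on_timer) and the 
DoFn.StateParam/DoFn.TimerParam defaults described in the design doc; exact 
signatures and runner support may differ from what finally lands.

  # Sketch only: buffer integer values per key and flush them via a
  # watermark timer. Assumes the userstate names from the design doc.
  import apache_beam as beam
  from apache_beam.coders import VarIntCoder
  from apache_beam.transforms.timeutil import TimeDomain
  from apache_beam.transforms.userstate import BagStateSpec
  from apache_beam.transforms.userstate import TimerSpec
  from apache_beam.transforms.userstate import on_timer


  class BufferingDoFn(beam.DoFn):
    BUFFER = BagStateSpec('buffer', VarIntCoder())
    FLUSH = TimerSpec('flush', TimeDomain.WATERMARK)

    def process(self,
                element,
                timestamp=beam.DoFn.TimestampParam,
                buffer=beam.DoFn.StateParam(BUFFER),
                flush=beam.DoFn.TimerParam(FLUSH)):
      _, value = element         # elements are (key, value) pairs
      buffer.add(value)          # append to the per-key bag state
      flush.set(timestamp + 60)  # fire 60s past this element's timestamp

    @on_timer(FLUSH)
    def flush_buffer(self, buffer=beam.DoFn.StateParam(BUFFER)):
      yield sum(buffer.read())   # emit the buffered sum for the key
      buffer.clear()

Note that the PR above appears to add only the user-facing pieces (the specs, 
the DoFn parameter defaults, and signature validation); runner execution 
support is tracked separately under this issue.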



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=121007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121007
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 09/Jul/18 20:35
Start Date: 09/Jul/18 20:35
Worklog Time Spent: 10m 
  Work Description: charlesccychen closed pull request #5900: [BEAM-4003] 
Fix missing iteritems import
URL: https://github.com/apache/beam/pull/5900
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py 
b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index b90844e44dd..5028cb89a89 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -32,6 +32,7 @@
 
 from future.moves.urllib.parse import quote
 from future.moves.urllib.parse import unquote
+from future.utils import iteritems
 
 import apache_beam as beam
 from apache_beam import coders
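
For context, a minimal illustration (not part of the patch above) of the helper 
being imported: future.utils.iteritems provides Python 2/3-compatible iteration 
over a dict's items, dispatching to dict.iteritems() on Python 2 and 
dict.items() on Python 3.

  # Illustrative only: iterate over a dict without materializing a list on
  # Python 2 and without an AttributeError on Python 3.
  from __future__ import print_function

  from future.utils import iteritems

  step_labels = {'read': 's1', 'count': 's2'}
  for name, step_id in iteritems(step_labels):
    print(name, step_id)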


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121007)
Time Spent: 9h 40m  (was: 9.5h)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5900 from charlesccychen/fix-iteritems

2018-07-09 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 3757b5f69dd5a3aec89a86fe80b88cebef7c
Merge: c473fda 8016f6c
Author: Charles Chen 
AuthorDate: Mon Jul 9 13:35:43 2018 -0700

Merge pull request #5900 from charlesccychen/fix-iteritems

[BEAM-4003] Fix missing iteritems import

 sdks/python/apache_beam/runners/dataflow/dataflow_runner.py | 1 +
 1 file changed, 1 insertion(+)



[beam] branch master updated (c473fda -> 3757b5f)

2018-07-09 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from c473fda  Merge pull request #5905: [BEAM-4745] Revert "[BEAM-4016] 
Invoke Setup and TearDown on SplitRestrictionFn and PairWithRestrictionFn"
 add 8016f6c  [BEAM-4003] Fix missing iteritems import
 new 3757b5f  Merge pull request #5900 from charlesccychen/fix-iteritems

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/runners/dataflow/dataflow_runner.py | 1 +
 1 file changed, 1 insertion(+)



[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=121000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121000
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 09/Jul/18 20:16
Start Date: 09/Jul/18 20:16
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #5868: [BEAM-4718]Run 
gradle build before publish
URL: https://github.com/apache/beam/pull/5868#issuecomment-403606030
 
 
   Deleted build & publish trigger phrases in commit 
https://github.com/apache/beam/pull/5868/commits/7d34ba0c16e6ea8d64e3e5833c0bb7341d12531b
 @alanmyrvold 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 121000)
Time Spent: 1h 50m  (was: 1h 40m)

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4594) Implement Beam Python User State and Timer API

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4594?focusedWorklogId=120999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120999
 ]

ASF GitHub Bot logged work on BEAM-4594:


Author: ASF GitHub Bot
Created on: 09/Jul/18 20:15
Start Date: 09/Jul/18 20:15
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #5691: 
[BEAM-4594] Beam Python state and timers user-facing API
URL: https://github.com/apache/beam/pull/5691#discussion_r201127178
 
 

 ##
 File path: sdks/python/apache_beam/io/avroio.py
 ##
 @@ -61,6 +59,8 @@
 from apache_beam.io.filesystem import CompressionTypes
 from apache_beam.io.iobase import Read
 from apache_beam.transforms import PTransform
+from fastavro.read import block_reader
 
 Review comment:
   This import reordering seems wrong.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120999)
Time Spent: 5h  (was: 4h 50m)

> Implement Beam Python User State and Timer API
> --
>
> Key: BEAM-4594
> URL: https://issues.apache.org/jira/browse/BEAM-4594
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Labels: portability
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> This issue tracks the implementation of the Beam Python User State and Timer 
> API, described here: [https://s.apache.org/beam-python-user-state-and-timers].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (BEAM-4016) @SplitRestriction should execute after @Setup on SplittableDoFn

2018-07-09 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reopened BEAM-4016:


Reopened after an issue was found in Dataflow. It needs to be addressed there 
first; for reference, see BEAM-4745.

> @SplitRestriction should execute after @Setup on SplittableDoFn
> ---
>
> Key: BEAM-4016
> URL: https://issues.apache.org/jira/browse/BEAM-4016
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.4.0, 2.5.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
> Fix For: 2.6.0
>
> Attachments: sdf-splitrestriction-lifeycle-test.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The method annotated with @SplitRestriction is the method where we can define 
> the RestrictionTrackers (splits) in advance in an SDF. It makes sense to 
> execute this after the @Setup method, given that connections are usually 
> established at Setup and can be used to query the different data stores about 
> the partitioning strategy. I added a test for this in the 
> SplittableDoFnTest.SDFWithLifecycle test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4745) SDF tests broken by innocent change due to Dataflow worker dependencies

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4745?focusedWorklogId=120996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120996
 ]

ASF GitHub Bot logged work on BEAM-4745:


Author: ASF GitHub Bot
Created on: 09/Jul/18 20:12
Start Date: 09/Jul/18 20:12
Worklog Time Spent: 10m 
  Work Description: iemejia closed pull request #5905: [BEAM-4745] Revert 
"[BEAM-4016] Invoke Setup and TearDown on SplitRestrictionFn and 
PairWithRestrictionFn"
URL: https://github.com/apache/beam/pull/5905
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
index 8d8da216afa..b581eecf414 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
@@ -415,7 +415,6 @@ public String apply(T input) {
 @Setup
 public void setup() {
   invoker = DoFnInvokers.invokerFor(fn);
-  invoker.invokeSetup();
 }
 
 @ProcessElement
@@ -423,12 +422,6 @@ public void processElement(ProcessContext context) {
   context.output(
   KV.of(context.element(), 
invoker.invokeGetInitialRestriction(context.element())));
 }
-
-@Teardown
-public void tearDown() {
-  invoker.invokeTeardown();
-  invoker = null;
-}
   }
 
   /** Splits the restriction using the given {@link SplitRestriction} method. 
*/
@@ -446,7 +439,6 @@ public void tearDown() {
 @Setup
 public void setup() {
   invoker = DoFnInvokers.invokerFor(splittableFn);
-  invoker.invokeSetup();
 }
 
 @ProcessElement
@@ -467,11 +459,5 @@ public void outputWithTimestamp(RestrictionT part, Instant 
timestamp) {
 }
   });
 }
-
-@Teardown
-public void tearDown() {
-  invoker.invokeTeardown();
-  invoker = null;
-}
   }
 }
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java
 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java
index fe33b1ab5b8..b7f0c10d046 100644
--- 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java
+++ 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java
@@ -513,19 +513,18 @@ public void testLateData() throws Exception {
 
 private State state = State.BEFORE_SETUP;
 
+@ProcessElement
+public void processElement(ProcessContext c, OffsetRangeTracker tracker) {
+  assertEquals(State.INSIDE_BUNDLE, state);
+  assertTrue(tracker.tryClaim(0L));
+  c.output(c.element());
+}
+
 @GetInitialRestriction
 public OffsetRange getInitialRestriction(String value) {
-  assertEquals(State.OUTSIDE_BUNDLE, state);
   return new OffsetRange(0, 1);
 }
 
-@SplitRestriction
-public void splitRestriction(
-String value, OffsetRange range, OutputReceiver<OffsetRange> receiver) 
{
-  assertEquals(State.OUTSIDE_BUNDLE, state);
-  receiver.output(range);
-}
-
 @Setup
 public void setUp() {
   assertEquals(State.BEFORE_SETUP, state);
@@ -538,13 +537,6 @@ public void startBundle() {
   state = State.INSIDE_BUNDLE;
 }
 
-@ProcessElement
-public void processElement(ProcessContext c, OffsetRangeTracker tracker) {
-  assertEquals(State.INSIDE_BUNDLE, state);
-  assertTrue(tracker.tryClaim(0L));
-  c.output(c.element());
-}
-
 @FinishBundle
 public void finishBundle() {
   assertEquals(State.INSIDE_BUNDLE, state);
@@ -561,9 +553,12 @@ public void tearDown() {
   @Test
   @Category({ValidatesRunner.class, UsesSplittableParDo.class})
   public void testLifecycleMethods() throws Exception {
+
 PCollection<String> res =
 p.apply(Create.of("a", "b", "c")).apply(ParDo.of(new 
SDFWithLifecycle()));
+
 PAssert.that(res).containsInAnyOrder("a", "b", "c");
+
 p.run();
   }
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120996)
Time Spent: 20m  (was: 10m)

> SDF tests broken by innocent change due to Dataflow worker dependencies
> 

[beam] 01/01: Merge pull request #5905: [BEAM-4745] Revert "[BEAM-4016] Invoke Setup and TearDown on SplitRestrictionFn and PairWithRestrictionFn"

2018-07-09 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit c473fda82bcabf55b9409d90e3448326e70ad104
Merge: cc4346c b4cf9c1
Author: Ismaël Mejía 
AuthorDate: Mon Jul 9 22:12:35 2018 +0200

Merge pull request #5905: [BEAM-4745] Revert "[BEAM-4016] Invoke Setup and 
TearDown on SplitRestrictionFn and PairWithRestrictionFn"

 .../runners/core/construction/SplittableParDo.java | 14 
 .../beam/sdk/transforms/SplittableDoFnTest.java| 25 +-
 2 files changed, 10 insertions(+), 29 deletions(-)



[beam] branch master updated (cc4346c -> c473fda)

2018-07-09 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from cc4346c  Remove :buildSrc from settings.gradle since it's not a 
subproject
 add b4cf9c1  [BEAM-4745] Revert "[BEAM-4016] Invoke Setup and TearDown on 
SplitRestrictionFn and PairWithRestrictionFn"
 new c473fda  Merge pull request #5905: [BEAM-4745] Revert "[BEAM-4016] 
Invoke Setup and TearDown on SplitRestrictionFn and PairWithRestrictionFn"

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../runners/core/construction/SplittableParDo.java | 14 
 .../beam/sdk/transforms/SplittableDoFnTest.java| 25 +-
 2 files changed, 10 insertions(+), 29 deletions(-)



Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle #654

2018-07-09 Thread Apache Jenkins Server
See 


--
[...truncated 17.49 MB...]
INFO: Adding 
View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly
 as step s7
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
View.AsSingleton/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow)
 as step s8
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
View.AsSingleton/Combine.GloballyAsSingletonView/CreateDataflowView as step s9
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding Create123/Read(CreateSource) as step s10
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding OutputSideInputs as step s11
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/Window.Into()/Window.Assign as step 
s12
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
PAssert$33/GroupGlobally/GatherAllOutputs/Reify.Window/ParDo(Anonymous) as step 
s13
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/GatherAllOutputs/WithKeys/AddKeys/Map 
as step s14
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
PAssert$33/GroupGlobally/GatherAllOutputs/Window.Into()/Window.Assign as step 
s15
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/GatherAllOutputs/GroupByKey as step 
s16
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/GatherAllOutputs/Values/Values/Map as 
step s17
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/RewindowActuals/Window.Assign as step 
s18
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/KeyForDummy/AddKeys/Map as step s19
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
PAssert$33/GroupGlobally/RemoveActualsTriggering/Flatten.PCollections as step 
s20
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/Create.Values/Read(CreateSource) as 
step s21
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/WindowIntoDummy/Window.Assign as step 
s22
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
PAssert$33/GroupGlobally/RemoveDummyTriggering/Flatten.PCollections as step s23
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/FlattenDummyAndContents as step s24
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/NeverTrigger/Flatten.PCollections as 
step s25
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/GroupDummyAndContents as step s26
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/Values/Values/Map as step s27
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GroupGlobally/ParDo(Concat) as step s28
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/GetPane/Map as step s29
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/RunChecks as step s30
Jul 09, 2018 8:04:16 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding PAssert$33/VerifyAssertions/ParDo(DefaultConclude) as step s31
Jul 09, 2018 8:04:16 PM 

[beam] branch master updated (f063b15 -> cc4346c)

2018-07-09 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from f063b15  Logging relies on StateSampler for context
 add 10d0f58  Remove :buildSrc from settings.gradle since it's not a 
subproject
 new cc4346c  Remove :buildSrc from settings.gradle since it's not a 
subproject

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 settings.gradle | 1 -
 1 file changed, 1 deletion(-)



[beam] 01/01: Remove :buildSrc from settings.gradle since it's not a subproject

2018-07-09 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit cc4346ca9583ed2c9752fda8325b89d46e669a18
Merge: f063b15 10d0f58
Author: Lukasz Cwik 
AuthorDate: Mon Jul 9 13:05:03 2018 -0700

Remove :buildSrc from settings.gradle since it's not a subproject

 settings.gradle | 1 -
 1 file changed, 1 deletion(-)



[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120990&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120990
 ]

ASF GitHub Bot logged work on BEAM-4742:


Author: ASF GitHub Bot
Created on: 09/Jul/18 20:01
Start Date: 09/Jul/18 20:01
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on issue #5903: [BEAM-4742] 
mkdirs if they don't exist in localfilesystem
URL: https://github.com/apache/beam/pull/5903#issuecomment-403602098
 
 
   OK, I'm reorienting this PR around 
[BEAM-4747](https://issues.apache.org/jira/browse/BEAM-4747) and will respond 
to your comments above shortly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120990)
Time Spent: 2h 10m  (was: 2h)

> Allow custom docker-image in portable wordcount example
> ---
>
> Key: BEAM-4742
> URL: https://issues.apache.org/jira/browse/BEAM-4742
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
> Fix For: 2.5.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> I hit a couple snags [running the portable wordcount 
> example|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/build.gradle#L200-L214]:
>  * -[the default docker image is hard-coded to a bintray 
> URL|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/apache_beam/runners/portability/portable_runner.py#L60-L68],
>  but I published my image to Docker Hub- I missed that [there's already a 
> pipeline option for 
> this|https://github.com/apache/beam/pull/5902#discussion_r201071859]! Thanks 
> [~lcwik]
>  * the default output path is in a temporary directory that doesn't exist at 
> the time of the {{open}} call, so I got {{IOError: [Errno 2] No such file or 
> directory}} 
> I'll send a PR with fixes to each of these shortly.
> I've also not found where to observe output from successfully running the 
> example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4747) Python LocalFileSystem directory-creation semantics

2018-07-09 Thread Ryan Williams (JIRA)
Ryan Williams created BEAM-4747:
---

 Summary: Python LocalFileSystem directory-creation semantics
 Key: BEAM-4747
 URL: https://issues.apache.org/jira/browse/BEAM-4747
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Affects Versions: 2.5.0
Reporter: Ryan Williams
Assignee: Ryan Williams


Coming out of the discussion on 
[BEAM-4742|https://issues.apache.org/jira/browse/BEAM-4742] / 
[#5903|https://github.com/apache/beam/pull/5903] is the question of whether 
{{LocalFileSystem.open,create,copy,rename}} should create 
intermediate (destination) directories, or fail with {{IOError}}s (as the 
stdlib {{os}} module generally will).

If the semantics of {{LocalFileSystem}} should mimic those of distributed 
filesystems (in the spirit of [recent discussion about {{DirectRunner}} being 
more like a local simulation of a distributed runner than a production-grade 
local runner|https://www.mail-archive.com/dev@beam.apache.org/msg08410.html]), 
then this makes sense, and it sounds like [~lcwik] and [~angoenka] are in favor 
of this interpretation.

I'll repurpose [#5903|https://github.com/apache/beam/pull/5903] to this end 
unless I hear otherwise.
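
To make the proposed semantics concrete, here is a minimal sketch of a 
create-style operation that makes intermediate directories on demand instead of 
raising IOError. It is an assumption about the intended behaviour, not the 
actual LocalFileSystem code, and the helper name is hypothetical; it is written 
in the Python 2-compatible style the SDK used at the time.

  # Sketch of the proposed behaviour: create parent directories on demand
  # before opening the destination, mirroring what distributed filesystems
  # do implicitly. `create_with_parents` is a hypothetical helper.
  import errno
  import os


  def create_with_parents(path):
    """Open `path` for writing, creating intermediate directories as needed."""
    parent = os.path.dirname(path)
    if parent:
      try:
        os.makedirs(parent)          # no exist_ok keyword on Python 2
      except OSError as e:
        if e.errno != errno.EEXIST:  # tolerate a directory that already exists
          raise
    return open(path, 'wb')


  # Succeeds even though /tmp/beam-demo/out may not exist yet.
  with create_with_parents('/tmp/beam-demo/out/part-00000') as f:
    f.write(b'hello')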



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=120986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120986
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 09/Jul/18 19:52
Start Date: 09/Jul/18 19:52
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #5900: [BEAM-4003] Fix 
missing iteritems import
URL: https://github.com/apache/beam/pull/5900#issuecomment-403599535
 
 
   LGTM; Postcommit failure seems to be another issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120986)
Time Spent: 9.5h  (was: 9h 20m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4742) Allow custom docker-image in portable wordcount example

2018-07-09 Thread Ryan Williams (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Williams resolved BEAM-4742.
-
   Resolution: Not A Problem
Fix Version/s: 2.5.0

> Allow custom docker-image in portable wordcount example
> ---
>
> Key: BEAM-4742
> URL: https://issues.apache.org/jira/browse/BEAM-4742
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
> Fix For: 2.5.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> I hit a couple snags [running the portable wordcount 
> example|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/build.gradle#L200-L214]:
>  * -[the default docker image is hard-coded to a bintray 
> URL|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/apache_beam/runners/portability/portable_runner.py#L60-L68],
>  but I published my image to Docker Hub- I missed that [there's already a 
> pipeline option for 
> this|https://github.com/apache/beam/pull/5902#discussion_r201071859]! Thanks 
> [~lcwik]
>  * the default output path is in a temporary directory that doesn't exist at 
> the time of the {{open}} call, so I got {{IOError: [Errno 2] No such file or 
> directory}} 
> I'll send a PR with fixes to each of these shortly.
> I've also not found where to observe output from successfully running the 
> example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4742) Allow custom docker-image in portable wordcount example

2018-07-09 Thread Ryan Williams (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537466#comment-16537466
 ] 

Ryan Williams commented on BEAM-4742:
-

OK, these were both basically mistakes on my part: 
{{FileBasedSink.initialize_write}} [creates the directories containing the 
output 
path|https://github.com/apache/beam/blob/a8eaa1b3ec0544de8b56dd504bc249d4c1a2017f/sdks/python/apache_beam/io/filebasedsink.py#L162],
 so [~angoenka]'s guess as to why I might have seen the {{IOError}} above is that 
I had a typo in the output path and was trying to create a top-level directory 
I didn't have permissions for (e.g. {{/tmpz}}).

On the point about observing the output of a portable wordcount example, the 
output gets written to the filesystem inside the docker container; {{docker 
ps}} will display the ID of the container, {{docker exec -it <container id> 
bash}} will attach a shell to it, and then {{ls}} etc. will allow inspecting 
the output files.

[The metrics normally collected/logged when running wordcount in 
{{DirectRunner}}|https://github.com/apache/beam/blob/f063b157eea480d079c4e966e528eef050a0c192/sdks/python/apache_beam/examples/wordcount.py#L121-L134]
 are not collected and/or not output in {{PortableRunner}} at the moment.

> Allow custom docker-image in portable wordcount example
> ---
>
> Key: BEAM-4742
> URL: https://issues.apache.org/jira/browse/BEAM-4742
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> I hit a couple snags [running the portable wordcount 
> example|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/build.gradle#L200-L214]:
>  * -[the default docker image is hard-coded to a bintray 
> URL|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/apache_beam/runners/portability/portable_runner.py#L60-L68],
>  but I published my image to Docker Hub- I missed that [there's already a 
> pipeline option for 
> this|https://github.com/apache/beam/pull/5902#discussion_r201071859]! Thanks 
> [~lcwik]
>  * the default output path is in a temporary directory that doesn't exist at 
> the time of the {{open}} call, so I got {{IOError: [Errno 2] No such file or 
> directory}} 
> I'll send a PR with fixes to each of these shortly.
> I've also not found where to observe output from successfully running the 
> example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=120985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120985
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/Jul/18 19:47
Start Date: 09/Jul/18 19:47
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-403598107
 
 
   Merged. Thanks @charlesccychen for reviewing the large-ish change : )


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120985)
Time Spent: 19h 10m  (was: 19h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 19h 10m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=120984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120984
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/Jul/18 19:46
Start Date: 09/Jul/18 19:46
Worklog Time Spent: 10m 
  Work Description: pabloem closed pull request #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/common.pxd 
b/sdks/python/apache_beam/runners/common.pxd
index 5c5eba22732..4bb226492ba 100644
--- a/sdks/python/apache_beam/runners/common.pxd
+++ b/sdks/python/apache_beam/runners/common.pxd
@@ -81,9 +81,7 @@ cdef class PerWindowInvoker(DoFnInvoker):
 
 cdef class DoFnRunner(Receiver):
   cdef DoFnContext context
-  cdef LoggingContext logging_context
   cdef object step_name
-  cdef ScopedMetricsContainer scoped_metrics_container
   cdef list side_inputs
   cdef DoFnInvoker do_fn_invoker
 
@@ -112,15 +110,5 @@ cdef class DoFnContext(object):
   cpdef set_element(self, WindowedValue windowed_value)
 
 
-cdef class LoggingContext(object):
-  # TODO(robertwb): Optimize "with [cdef class]"
-  cpdef enter(self)
-  cpdef exit(self)
-
-
-cdef class _LoggingContextAdapter(LoggingContext):
-  cdef object underlying
-
-
 cdef class _ReceiverAdapter(Receiver):
   cdef object underlying
diff --git a/sdks/python/apache_beam/runners/common.py 
b/sdks/python/apache_beam/runners/common.py
index d5f35de988f..88745c778e3 100644
--- a/sdks/python/apache_beam/runners/common.py
+++ b/sdks/python/apache_beam/runners/common.py
@@ -119,16 +119,6 @@ def logging_name(self):
 return self.user_name
 
 
-class LoggingContext(object):
-  """For internal use only; no backwards-compatibility guarantees."""
-
-  def enter(self):
-pass
-
-  def exit(self):
-pass
-
-
 class Receiver(object):
   """For internal use only; no backwards-compatibility guarantees.
 
@@ -551,20 +541,15 @@ def __init__(self,
   windowing: windowing properties of the output PCollection(s)
   tagged_receivers: a dict of tag name to Receiver objects
   step_name: the name of this step
-  logging_context: a LoggingContext object
+  logging_context: DEPRECATED [BEAM-4728]
   state: handle for accessing DoFn state
-  scoped_metrics_container: Context switcher for metrics container
+  scoped_metrics_container: DEPRECATED
   operation_name: The system name assigned by the runner for this 
operation.
 """
 # Need to support multiple iterations.
 side_inputs = list(side_inputs)
 
-from apache_beam.metrics.execution import ScopedMetricsContainer
-
-self.scoped_metrics_container = (
-scoped_metrics_container or ScopedMetricsContainer())
 self.step_name = step_name
-self.logging_context = logging_context or LoggingContext()
 self.context = DoFnContext(step_name, state=state)
 
 do_fn_signature = DoFnSignature(fn)
@@ -595,26 +580,16 @@ def receive(self, windowed_value):
 
   def process(self, windowed_value):
 try:
-  self.logging_context.enter()
-  self.scoped_metrics_container.enter()
   self.do_fn_invoker.invoke_process(windowed_value)
 except BaseException as exn:
   self._reraise_augmented(exn)
-finally:
-  self.scoped_metrics_container.exit()
-  self.logging_context.exit()
 
   def _invoke_bundle_method(self, bundle_method):
 try:
-  self.logging_context.enter()
-  self.scoped_metrics_container.enter()
   self.context.set_element(None)
   bundle_method()
 except BaseException as exn:
   self._reraise_augmented(exn)
-finally:
-  self.scoped_metrics_container.exit()
-  self.logging_context.exit()
 
   def start(self):
 self._invoke_bundle_method(self.do_fn_invoker.invoke_start_bundle)
diff --git a/sdks/python/apache_beam/runners/worker/bundle_processor.py 
b/sdks/python/apache_beam/runners/worker/bundle_processor.py
index 4193ea2debb..958731d0ce4 100644
--- a/sdks/python/apache_beam/runners/worker/bundle_processor.py
+++ b/sdks/python/apache_beam/runners/worker/bundle_processor.py
@@ -63,13 +63,12 @@
 class RunnerIOOperation(operations.Operation):
   """Common baseclass for runner harness IO operations."""
 
-  def __init__(self, operation_name, step_name, consumers, counter_factory,
+  def __init__(self, name_context, step_name, consumers, counter_factory,
state_sampler, windowed_coder, target, data_channel):
 super(RunnerIOOperation, self).__init__(
-operation_name, None, counter_factory, 

[beam] branch master updated: Logging relies on StateSampler for context

2018-07-09 Thread pabloem
This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new f063b15  Logging relies on StateSampler for context
f063b15 is described below

commit f063b157eea480d079c4e966e528eef050a0c192
Author: Pablo 
AuthorDate: Mon May 14 11:29:21 2018 -0700

Logging relies on StateSampler for context
---
 sdks/python/apache_beam/runners/common.pxd | 12 ---
 sdks/python/apache_beam/runners/common.py  | 29 ++--
 .../apache_beam/runners/worker/bundle_processor.py | 11 +++---
 sdks/python/apache_beam/runners/worker/logger.pxd  | 25 --
 sdks/python/apache_beam/runners/worker/logger.py   | 17 ++
 .../apache_beam/runners/worker/logger_test.py  | 39 ++
 .../apache_beam/runners/worker/operation_specs.py  |  2 +-
 .../apache_beam/runners/worker/operations.py   | 18 +++---
 .../apache_beam/runners/worker/statesampler.py | 31 +++--
 .../runners/worker/statesampler_fast.pxd   |  4 ++-
 .../runners/worker/statesampler_fast.pyx   | 21 +---
 .../runners/worker/statesampler_slow.py| 20 +++
 12 files changed, 109 insertions(+), 120 deletions(-)

diff --git a/sdks/python/apache_beam/runners/common.pxd 
b/sdks/python/apache_beam/runners/common.pxd
index 5c5eba2..4bb2264 100644
--- a/sdks/python/apache_beam/runners/common.pxd
+++ b/sdks/python/apache_beam/runners/common.pxd
@@ -81,9 +81,7 @@ cdef class PerWindowInvoker(DoFnInvoker):
 
 cdef class DoFnRunner(Receiver):
   cdef DoFnContext context
-  cdef LoggingContext logging_context
   cdef object step_name
-  cdef ScopedMetricsContainer scoped_metrics_container
   cdef list side_inputs
   cdef DoFnInvoker do_fn_invoker
 
@@ -112,15 +110,5 @@ cdef class DoFnContext(object):
   cpdef set_element(self, WindowedValue windowed_value)
 
 
-cdef class LoggingContext(object):
-  # TODO(robertwb): Optimize "with [cdef class]"
-  cpdef enter(self)
-  cpdef exit(self)
-
-
-cdef class _LoggingContextAdapter(LoggingContext):
-  cdef object underlying
-
-
 cdef class _ReceiverAdapter(Receiver):
   cdef object underlying
diff --git a/sdks/python/apache_beam/runners/common.py 
b/sdks/python/apache_beam/runners/common.py
index d5f35de..88745c7 100644
--- a/sdks/python/apache_beam/runners/common.py
+++ b/sdks/python/apache_beam/runners/common.py
@@ -119,16 +119,6 @@ class DataflowNameContext(NameContext):
 return self.user_name
 
 
-class LoggingContext(object):
-  """For internal use only; no backwards-compatibility guarantees."""
-
-  def enter(self):
-pass
-
-  def exit(self):
-pass
-
-
 class Receiver(object):
   """For internal use only; no backwards-compatibility guarantees.
 
@@ -551,20 +541,15 @@ class DoFnRunner(Receiver):
   windowing: windowing properties of the output PCollection(s)
   tagged_receivers: a dict of tag name to Receiver objects
   step_name: the name of this step
-  logging_context: a LoggingContext object
+  logging_context: DEPRECATED [BEAM-4728]
   state: handle for accessing DoFn state
-  scoped_metrics_container: Context switcher for metrics container
+  scoped_metrics_container: DEPRECATED
   operation_name: The system name assigned by the runner for this 
operation.
 """
 # Need to support multiple iterations.
 side_inputs = list(side_inputs)
 
-from apache_beam.metrics.execution import ScopedMetricsContainer
-
-self.scoped_metrics_container = (
-scoped_metrics_container or ScopedMetricsContainer())
 self.step_name = step_name
-self.logging_context = logging_context or LoggingContext()
 self.context = DoFnContext(step_name, state=state)
 
 do_fn_signature = DoFnSignature(fn)
@@ -595,26 +580,16 @@ class DoFnRunner(Receiver):
 
   def process(self, windowed_value):
 try:
-  self.logging_context.enter()
-  self.scoped_metrics_container.enter()
   self.do_fn_invoker.invoke_process(windowed_value)
 except BaseException as exn:
   self._reraise_augmented(exn)
-finally:
-  self.scoped_metrics_container.exit()
-  self.logging_context.exit()
 
   def _invoke_bundle_method(self, bundle_method):
 try:
-  self.logging_context.enter()
-  self.scoped_metrics_container.enter()
   self.context.set_element(None)
   bundle_method()
 except BaseException as exn:
   self._reraise_augmented(exn)
-finally:
-  self.scoped_metrics_container.exit()
-  self.logging_context.exit()
 
   def start(self):
 self._invoke_bundle_method(self.do_fn_invoker.invoke_start_bundle)
diff --git a/sdks/python/apache_beam/runners/worker/bundle_processor.py 
b/sdks/python/apache_beam/runners/worker/bundle_processor.py
index 4193ea2..958731d 100644
--- a/sdks/python/apache_beam/runners/worker/bundle_processor.py
+++ 

[jira] [Work logged] (BEAM-4733) Python portable runner to pass pipeline options to job service

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4733?focusedWorklogId=120982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120982
 ]

ASF GitHub Bot logged work on BEAM-4733:


Author: ASF GitHub Bot
Created on: 09/Jul/18 19:42
Start Date: 09/Jul/18 19:42
Worklog Time Spent: 10m 
  Work Description: tweise closed pull request #5888: [BEAM-4733] Pass 
pipeline options from Python portable runner to job server.
URL: https://github.com/apache/beam/pull/5888
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslation.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslation.java
index 4cdca616308..efe61e7d389 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslation.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslation.java
@@ -18,14 +18,24 @@
 
 package org.apache.beam.runners.core.construction;
 
+import com.fasterxml.jackson.core.TreeNode;
 import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.base.CaseFormat;
+import com.google.common.collect.ImmutableMap;
 import com.google.protobuf.Struct;
 import com.google.protobuf.util.JsonFormat;
 import java.io.IOException;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.Map;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.util.common.ReflectHelpers;
 
-/** Utilities for going to/from Runner API pipeline options. */
+/**
+ * Utilities for going to/from Runner API pipeline options.
+ *
+ * TODO: Make this the default to/from translation for PipelineOptions.
+ */
 public class PipelineOptionsTranslation {
   private static final ObjectMapper MAPPER =
   new ObjectMapper()
@@ -34,10 +44,26 @@
   /** Converts the provided {@link PipelineOptions} to a {@link Struct}. */
   public static Struct toProto(PipelineOptions options) {
 Struct.Builder builder = Struct.newBuilder();
+
 try {
+  // TODO: Officially define URNs for options and their scheme.
+  TreeNode treeNode = MAPPER.valueToTree(options);
+  TreeNode rootOptions = treeNode.get("options");
+  Iterator<String> optionsKeys = rootOptions.fieldNames();
+  Map<String, TreeNode> optionsUsingUrns = new HashMap<>();
+  while (optionsKeys.hasNext()) {
+String optionKey = optionsKeys.next();
+TreeNode optionValue = rootOptions.get(optionKey);
+optionsUsingUrns.put(
+"beam:option:"
++ CaseFormat.LOWER_CAMEL.to(CaseFormat.LOWER_UNDERSCORE, 
optionKey)
++ ":v1",
+optionValue);
+  }
+
   // The JSON format of a Protobuf Struct is the JSON object that is 
equivalent to that struct
   // (with values encoded in a standard json-codeable manner). See Beam PR 
3719 for more.
-  JsonFormat.parser().merge(MAPPER.writeValueAsString(options), builder);
+  JsonFormat.parser().merge(MAPPER.writeValueAsString(optionsUsingUrns), 
builder);
   return builder.build();
 } catch (IOException e) {
   throw new RuntimeException(e);
@@ -46,6 +72,20 @@ public static Struct toProto(PipelineOptions options) {
 
   /** Converts the provided {@link Struct} into {@link PipelineOptions}. */
   public static PipelineOptions fromProto(Struct protoOptions) throws 
IOException {
-return MAPPER.readValue(JsonFormat.printer().print(protoOptions), 
PipelineOptions.class);
+Map<String, TreeNode> mapWithoutUrns = new HashMap<>();
+TreeNode rootOptions = 
MAPPER.readTree(JsonFormat.printer().print(protoOptions));
+Iterator<String> optionsKeys = rootOptions.fieldNames();
+while (optionsKeys.hasNext()) {
+  String optionKey = optionsKeys.next();
+  TreeNode optionValue = rootOptions.get(optionKey);
+  mapWithoutUrns.put(
+  CaseFormat.LOWER_UNDERSCORE.to(
+  CaseFormat.LOWER_CAMEL,
+  optionKey.substring("beam:option:".length(), optionKey.length() 
- ":v1".length())),
+  optionValue);
+}
+return MAPPER.readValue(
+MAPPER.writeValueAsString(ImmutableMap.of("options", mapWithoutUrns)),
+PipelineOptions.class);
   }
 }
diff --git 
a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslationTest.java
 
b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineOptionsTranslationTest.java
index adf4759733c..0faa1db2984 100644
--- 
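
To summarize the translation introduced in the PipelineOptionsTranslation diff 
above: option keys are mapped between lowerCamel names and 
beam:option:<lower_underscore>:v1 URNs. A minimal Python sketch of that key 
mapping (the regex-based case conversion below is illustrative, standing in for 
Guava's CaseFormat):

```python
import re

URN_PREFIX = 'beam:option:'
URN_SUFFIX = ':v1'

def option_key_to_urn(camel_key):
  # lowerCamel -> lower_underscore, wrapped in the URN scheme from the diff.
  snake = re.sub(r'([A-Z])', r'_\1', camel_key).lower()
  return URN_PREFIX + snake + URN_SUFFIX

def urn_to_option_key(urn):
  # Strip the URN scheme and convert back to lowerCamel.
  snake = urn[len(URN_PREFIX):-len(URN_SUFFIX)]
  parts = snake.split('_')
  return parts[0] + ''.join(part.title() for part in parts[1:])

assert option_key_to_urn('jobName') == 'beam:option:job_name:v1'
assert urn_to_option_key('beam:option:job_name:v1') == 'jobName'
```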

[beam] 01/01: Merge pull request #5888: [BEAM-4733] Pass pipeline options from Python portable runner to job server.

2018-07-09 Thread thw
This is an automated email from the ASF dual-hosted git repository.

thw pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 930cc53c91e5f65c4ed5008da5493d8e0d787364
Merge: a8eaa1b fe9c3c5
Author: Thomas Weise 
AuthorDate: Mon Jul 9 21:42:55 2018 +0200

Merge pull request #5888: [BEAM-4733] Pass pipeline options from Python 
portable runner to job server.

 .../construction/PipelineOptionsTranslation.java   | 46 --
 .../PipelineOptionsTranslationTest.java|  4 +-
 .../FlinkStreamingPortablePipelineTranslator.java  |  7 +++-
 .../streaming/ExecutableStageDoFnOperator.java | 18 ++---
 .../streaming/ExecutableStageDoFnOperatorTest.java | 20 +-
 .../runners/portability/portable_runner.py |  9 -
 6 files changed, 80 insertions(+), 24 deletions(-)

diff --cc 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
index b24c7aa,cf8e734..cc0ed88
--- 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
+++ 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
@@@ -20,9 -20,9 +20,8 @@@ package org.apache.beam.runners.flink.t
  import static org.apache.flink.util.Preconditions.checkState;
  
  import java.util.Collection;
- import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
 -import java.util.logging.Logger;
  import javax.annotation.concurrent.GuardedBy;
  import org.apache.beam.model.pipeline.v1.RunnerApi;
  import org.apache.beam.runners.core.DoFnRunner;



[beam] branch master updated (a8eaa1b -> 930cc53)

2018-07-09 Thread thw
This is an automated email from the ASF dual-hosted git repository.

thw pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from a8eaa1b  Merge pull request #4741 from boyuanzz/output_counter
 add aedba8d  Use the beam:option:value:v1 as the portable pipeline options 
representation
 add 4226aa2  [BEAM-4733] Pass pipeline options from Python portable runner 
to job server.
 add fe9c3c5  Fix Flink portable streaming translation executable stage 
output mapping.
 new 930cc53  Merge pull request #5888: [BEAM-4733] Pass pipeline options 
from Python portable runner to job server.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../construction/PipelineOptionsTranslation.java   | 46 --
 .../PipelineOptionsTranslationTest.java|  4 +-
 .../FlinkStreamingPortablePipelineTranslator.java  |  7 +++-
 .../streaming/ExecutableStageDoFnOperator.java | 18 ++---
 .../streaming/ExecutableStageDoFnOperatorTest.java | 20 +-
 .../runners/portability/portable_runner.py |  9 -
 6 files changed, 80 insertions(+), 24 deletions(-)



[jira] [Work logged] (BEAM-4003) Futurize and fix python 2 compatibility for runners subpackage

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4003?focusedWorklogId=120973=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120973
 ]

ASF GitHub Bot logged work on BEAM-4003:


Author: ASF GitHub Bot
Created on: 09/Jul/18 19:17
Start Date: 09/Jul/18 19:17
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5900: [BEAM-4003] 
Fix missing iteritems import
URL: https://github.com/apache/beam/pull/5900#issuecomment-403590405
 
 
   R: @tvalentyn 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120973)
Time Spent: 9h 20m  (was: 9h 10m)

> Futurize and fix python 2 compatibility for runners subpackage
> --
>
> Key: BEAM-4003
> URL: https://issues.apache.org/jira/browse/BEAM-4003
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4729) Conditionally propagate local GCS credentials to locally spawned docker images.

2018-07-09 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537427#comment-16537427
 ] 

Luke Cwik commented on BEAM-4729:
-

Unless we want to design and build/integrate a KMS solution for Apache Beam, we 
should follow the same pattern that the AWS module does and allow users to 
specify the credentials provider: 
https://github.com/apache/beam/blob/a8eaa1b3ec0544de8b56dd504bc249d4c1a2017f/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/AwsModule.java#L79

We currently have a really outdated *CredentialFactory* class: 
[https://github.com/apache/beam/blob/451af5133bc0a6416afa7b1844833c153f510181/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/auth/CredentialFactory.java]

We should consider replacing this with the CredentialsProvider implementation 
that is part of GAX 
[http://googleapis.github.io/gax-java/1.28.0/apidocs/com/google/api/gax/core/CredentialsProvider.html]
 instead of rolling our own.

Regardless of which credentials provider we use, we'll need to create one that 
can serialize the credentials, likely following one of the simple credentials 
provider classes such as *AWSStaticCredentialsProvider* or something similar.

 

> Conditionally propagate local GCS credentials to locally spawned docker 
> images.
> ---
>
> Key: BEAM-4729
> URL: https://issues.apache.org/jira/browse/BEAM-4729
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-harness
>Reporter: Robert Bradshaw
>Assignee: Luke Cwik
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-4729) Conditionally propagate local GCS credentials to locally spawned docker images.

2018-07-09 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537427#comment-16537427
 ] 

Luke Cwik edited comment on BEAM-4729 at 7/9/18 7:15 PM:
-

Unless we want to design and build/integrate a KMS solution for Apache Beam, we 
should follow the same pattern that the AWS module does and allow users to 
specify the credentials provider: 
[https://github.com/apache/beam/blob/a8eaa1b3ec0544de8b56dd504bc249d4c1a2017f/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/AwsModule.java#L79]

We currently have a really outdated *CredentialFactory* class: 
[https://github.com/apache/beam/blob/451af5133bc0a6416afa7b1844833c153f510181/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/auth/CredentialFactory.java]

We should consider replacing this with the *CredentialsProvider* implementation 
that is part of GAX 
[http://googleapis.github.io/gax-java/1.28.0/apidocs/com/google/api/gax/core/CredentialsProvider.html]
 instead of rolling our own.

Regardless of which credentials provider we use, we'll need to create one that 
can serialize the credentials, likely following one of the simple credentials 
provider classes such as *AWSStaticCredentialsProvider* or something similar.

 


was (Author: lcwik):
Unless we want to design and build/integrate a KMS solution for Apache Beam, we 
should follow the same pattern that the AWS module does and allow users to 
specify the credentials provider: 
https://github.com/apache/beam/blob/a8eaa1b3ec0544de8b56dd504bc249d4c1a2017f/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/AwsModule.java#L79

We currently have a really outdated *CredentialFactory* class: 
[https://github.com/apache/beam/blob/451af5133bc0a6416afa7b1844833c153f510181/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/auth/CredentialFactory.java]

We should consider replacing this with the CredentialsProvider implementation 
that is part of GAX 
[http://googleapis.github.io/gax-java/1.28.0/apidocs/com/google/api/gax/core/CredentialsProvider.html]
 instead of rolling our own.

Regardless of which credentials provider we use, we'll need to create one that 
can serialize the credentials, likely following one of the simple credentials 
provider classes such as *AWSStaticCredentialsProvider* or something similar.

 

> Conditionally propagate local GCS credentials to locally spawned docker 
> images.
> ---
>
> Key: BEAM-4729
> URL: https://issues.apache.org/jira/browse/BEAM-4729
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-harness
>Reporter: Robert Bradshaw
>Assignee: Luke Cwik
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=120969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120969
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 09/Jul/18 19:11
Start Date: 09/Jul/18 19:11
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #5868: 
[BEAM-4718]Run gradle build before publish
URL: https://github.com/apache/beam/pull/5868#discussion_r201115343
 
 

 ##
 File path: .test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy
 ##
 @@ -36,13 +36,29 @@ job('beam_Release_Gradle_NightlySnapshot') {
   'd...@beam.apache.org')
 
 
-  // Allows triggering this build against pull requests.
+  // Allows triggering this publish command against pull requests.
   common_job_properties.enablePhraseTriggeringFromPullRequest(
   delegate,
   './gradlew publish',
   'Run Gradle Publish')
 
+  // Allows triggering this build command against pull requests.
 
 Review comment:
   Sorry for the misunderstanding. Yes this phrase triggers 'publish'. I think 
it's better to delete these phrases.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120969)
Time Spent: 1h 40m  (was: 1.5h)

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=120967=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120967
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 09/Jul/18 19:05
Start Date: 09/Jul/18 19:05
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #5868: 
[BEAM-4718]Run gradle build before publish
URL: https://github.com/apache/beam/pull/5868#discussion_r201113802
 
 

 ##
 File path: .test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy
 ##
 @@ -36,13 +36,29 @@ job('beam_Release_Gradle_NightlySnapshot') {
   'd...@beam.apache.org')
 
 
-  // Allows triggering this build against pull requests.
+  // Allows triggering this publish command against pull requests.
   common_job_properties.enablePhraseTriggeringFromPullRequest(
   delegate,
   './gradlew publish',
   'Run Gradle Publish')
 
+  // Allows triggering this build command against pull requests.
+  common_job_properties.enablePhraseTriggeringFromPullRequest(
+  delegate,
+  './gradlew clean build',
+  'Run Gradle Build')
+
   steps {
+gradle {
+  rootBuildScriptDir(common_job_properties.checkoutDir)
+  tasks('clean')
+}
+gradle {
+  rootBuildScriptDir(common_job_properties.checkoutDir)
+  tasks('build')
+  common_job_properties.setGradleSwitches(delegate)
+  switches('--no-parallel')
 
 Review comment:
   I ran into some errors that only occur with 'build --no-parallel', and I'm 
trying to keep this build the same as what we do in the release. The only 
concern with '--no-parallel' is that it may take more time, but since it runs 
nightly, that's probably not an issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120967)
Time Spent: 1.5h  (was: 1h 20m)

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=120965=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120965
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 09/Jul/18 19:00
Start Date: 09/Jul/18 19:00
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #5868: 
[BEAM-4718]Run gradle build before publish
URL: https://github.com/apache/beam/pull/5868#discussion_r201112484
 
 

 ##
 File path: .test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy
 ##
 @@ -36,13 +36,29 @@ job('beam_Release_Gradle_NightlySnapshot') {
   'd...@beam.apache.org')
 
 
-  // Allows triggering this build against pull requests.
+  // Allows triggering this publish command against pull requests.
 
 Review comment:
   Agree.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120965)
Time Spent: 1h 20m  (was: 1h 10m)

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=120964=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120964
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 09/Jul/18 19:00
Start Date: 09/Jul/18 19:00
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #5868: 
[BEAM-4718]Run gradle build before publish
URL: https://github.com/apache/beam/pull/5868#discussion_r201112466
 
 

 ##
 File path: .test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy
 ##
 @@ -36,13 +36,29 @@ job('beam_Release_Gradle_NightlySnapshot') {
   'd...@beam.apache.org')
 
 
-  // Allows triggering this build against pull requests.
+  // Allows triggering this publish command against pull requests.
   common_job_properties.enablePhraseTriggeringFromPullRequest(
   delegate,
   './gradlew publish',
   'Run Gradle Publish')
 
+  // Allows triggering this build command against pull requests.
 
 Review comment:
   I think 'Run Gradle Build' just triggers './gradlew clean build', which 
excludes 'publish'


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120964)
Time Spent: 1h 10m  (was: 1h)

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=120962=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120962
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/Jul/18 18:54
Start Date: 09/Jul/18 18:54
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-403583667
 
 
   Run Python Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120962)
Time Spent: 18h 50m  (was: 18h 40m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=120961=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120961
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/Jul/18 18:53
Start Date: 09/Jul/18 18:53
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-403583607
 
 
   Squashed commits and resolved conflicts. Rerunning tests.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120961)
Time Spent: 18h 40m  (was: 18.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 18h 40m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=120960=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120960
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 09/Jul/18 18:48
Start Date: 09/Jul/18 18:48
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on a change in pull request 
#5868: [BEAM-4718]Run gradle build before publish
URL: https://github.com/apache/beam/pull/5868#discussion_r201109056
 
 

 ##
 File path: .test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy
 ##
 @@ -36,13 +36,29 @@ job('beam_Release_Gradle_NightlySnapshot') {
   'd...@beam.apache.org')
 
 
-  // Allows triggering this build against pull requests.
+  // Allows triggering this publish command against pull requests.
 
 Review comment:
   Publishing the snapshot should not be triggerable by a trigger phrase, since 
it affects the state of anyone relying on the snapshot. Delete this trigger?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120960)
Time Spent: 1h  (was: 50m)

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4718) ./gradlew build should run nightly before ./gradlew publish

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4718?focusedWorklogId=120959=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120959
 ]

ASF GitHub Bot logged work on BEAM-4718:


Author: ASF GitHub Bot
Created on: 09/Jul/18 18:47
Start Date: 09/Jul/18 18:47
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on a change in pull request 
#5868: [BEAM-4718]Run gradle build before publish
URL: https://github.com/apache/beam/pull/5868#discussion_r201108767
 
 

 ##
 File path: .test-infra/jenkins/job_Release_Gradle_NightlySnapshot.groovy
 ##
 @@ -36,13 +36,29 @@ job('beam_Release_Gradle_NightlySnapshot') {
   'd...@beam.apache.org')
 
 
-  // Allows triggering this build against pull requests.
+  // Allows triggering this publish command against pull requests.
   common_job_properties.enablePhraseTriggeringFromPullRequest(
   delegate,
   './gradlew publish',
   'Run Gradle Publish')
 
+  // Allows triggering this build command against pull requests.
 
 Review comment:
   This "Run Gradle Build" trigger seems misleading, since it will also 
publish. Maybe delete it, or set up a separate jenkins job to just build with 
that trigger phrase.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120959)
Time Spent: 50m  (was: 40m)

> ./gradlew build should run nightly before ./gradlew publish
> ---
>
> Key: BEAM-4718
> URL: https://issues.apache.org/jira/browse/BEAM-4718
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #1040

2018-07-09 Thread Apache Jenkins Server
See 


--
[...truncated 19.42 MB...]
INFO: 2018-07-09T18:40:27.889Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.069Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.098Z: Elided trivial flatten 
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.122Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.152Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.182Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.205Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.237Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Read information schema
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.259Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.283Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.313Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.343Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
Jul 09, 2018 6:40:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-07-09T18:40:28.374Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 

[jira] [Work logged] (BEAM-4742) Allow custom docker-image in portable wordcount example

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4742?focusedWorklogId=120958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120958
 ]

ASF GitHub Bot logged work on BEAM-4742:


Author: ASF GitHub Bot
Created on: 09/Jul/18 18:46
Start Date: 09/Jul/18 18:46
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on issue #5903: [BEAM-4742] 
mkdirs if they don't exist in localfilesystem
URL: https://github.com/apache/beam/pull/5903#issuecomment-403581322
 
 
   Good point, probably worth fixing in `rename`/`copy` as well.
   
   Confusingly, I'm now seeing the portable wordcount example not fail without 
this change… let me try to get a definite answer there, likely file a 
different JIRA to link this against, and address your comments.
   
   If I confirm that this isn't actually an issue in the portable wordcount 
example, and you then don't think this change is worth making at all, let me 
know, thanks!
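   
   For context, a minimal sketch of the kind of change being discussed here 
(illustrative only, not the actual `localfilesystem.py` code): create a missing 
destination directory before a rename.
   
```python
import os
import shutil

def rename_with_mkdirs(src, dest):
  # Create the destination's parent directory if it is missing, then move
  # the file; the same mkdirs-before-use idea as for open().
  dest_dir = os.path.dirname(dest)
  if dest_dir and not os.path.isdir(dest_dir):
    os.makedirs(dest_dir)
  shutil.move(src, dest)
```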


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 120958)
Time Spent: 2h  (was: 1h 50m)

> Allow custom docker-image in portable wordcount example
> ---
>
> Key: BEAM-4742
> URL: https://issues.apache.org/jira/browse/BEAM-4742
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> I hit a couple snags [running the portable wordcount 
> example|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/build.gradle#L200-L214]:
>  * -[the default docker image is hard-coded to a bintray 
> URL|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/apache_beam/runners/portability/portable_runner.py#L60-L68],
>  but I published my image to Docker Hub- I missed that [there's already a 
> pipeline option for 
> this|https://github.com/apache/beam/pull/5902#discussion_r201071859]! Thanks 
> [~lcwik]
>  * the default output path is in a temporary directory that doesn't exist at 
> the time of the {{open}} call, so I got {{IOError: [Errno 2] No such file or 
> directory}} 
> I'll send a PR with fixes to each of these shortly.
> I've also not found where to observe output from successfully running the 
> example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

