[jira] [Work logged] (BEAM-8617) Tear down the DoFns upon the control service termination in Python SDK harness

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8617?focusedWorklogId=347972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347972
 ]

ASF GitHub Bot logged work on BEAM-8617:


Author: ASF GitHub Bot
Created on: 22/Nov/19 07:00
Start Date: 22/Nov/19 07:00
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10073: [BEAM-8617] 
Tear down the DoFns upon the control service termination in Python SDK harness.
URL: https://github.com/apache/beam/pull/10073#issuecomment-557416225
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347972)
Time Spent: 40m  (was: 0.5h)

> Tear down the DoFns upon the control service termination in Python SDK harness
> --
>
> Key: BEAM-8617
> URL: https://issues.apache.org/jira/browse/BEAM-8617
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Per the mailing list discussion [1], the teardown of DoFns should 
> be supported in the portability framework. It happens in two places:
> 1) Upon the control service termination
> 2) Periodically, for unused DoFns
> The aim of this JIRA is to add support for tearing down the DoFns upon the control 
> service termination in the Python SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E
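The first case (teardown upon control service termination) could be sketched roughly as follows. This is only an illustration of the idea in plain Python: {{ProcessorCache}}, {{RecordingFn}} and {{run_control_loop}} are hypothetical names invented here and are not the SDK harness classes or the code in the pull request.

```python
class ProcessorCache:
    """Hypothetical registry of live DoFn-like objects (not a Beam class)."""

    def __init__(self):
        self._active = []

    def register(self, processor):
        self._active.append(processor)
        return processor

    def shutdown(self):
        """Call teardown() on every registered object and collect failures."""
        errors = []
        while self._active:
            processor = self._active.pop()
            try:
                processor.teardown()
            except Exception as exc:  # keep tearing down the remaining objects
                errors.append(exc)
        return errors


class RecordingFn:
    """Hypothetical stand-in for a DoFn that remembers whether it was torn down."""

    def __init__(self):
        self.torn_down = False

    def teardown(self):
        self.torn_down = True


def run_control_loop(requests, handle, cache):
    """Handle requests until the stream ends, then tear everything down."""
    try:
        for request in requests:
            handle(request)
    finally:
        # Runs both on normal stream termination and when handle() raises.
        cache.shutdown()
```

The {{finally}} clause is the point of the sketch: teardown is attempted whether the control stream ends normally or with an error.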



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8787) Python setup issues

2019-11-21 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 5:12 AM:
-

h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils package does not support 
Python3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick your preferred install prefix
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the {{bin}} directory under the install prefix to {{PATH}}. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the {{distutils.sysconfig}} module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
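The same check can be scripted so that it covers both modules mentioned in this comment. The following is a minimal sketch; the helper name {{missing_modules}} is made up for illustration, and it should be run with the interpreter being diagnosed (for example {{python3.6}}).

```python
import importlib.util


def missing_modules(names):
    """Return the names that the import system of this interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised when a parent package (e.g. distutils) is itself absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == '__main__':
    # An empty list means both modules can be found.
    print(missing_modules(['distutils.sysconfig', '_bz2']))
```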
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds:

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}


h2. testPy36Gcp failure

The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} task was failing:
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053

It seemed that the {{_bz2}} module was missing from Python3.6. I followed 
[Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558].


h2. testPy35Cython failure


{noformat}
./gradlew :sdks:python:test-suites:tox:py35:testPy35Cython
...
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall 
-Wstrict-prototypes -g 
-fdebug-prefix-map=/build/python3.5-ta1Uke/python3.5-3.5.4=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m 
-I/usr/local/google/home/suztomo/beam4/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython/py35-cython/include/python3.5m
 -c apache_beam/coders/stream.c -o 
build/temp.linux-x86_64-3.5/apache_beam/coders/stream.o
apache_beam/coders/stream.c:17:10: fatal error: Python.h: No such file or 
directory
 #include "Python.h"
  ^~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
...
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'.

{noformat}

I ran {{sudo apt-get install python3-dev}}, but it did not fix the failure.
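To see whether a given interpreter has its C headers installed, one can ask that interpreter where it expects {{Python.h}}. This is a small diagnostic sketch with made-up helper names, not part of the Beam build. It has to be run with the interpreter that the failing tox environment uses (Python 3.5 here); a possible but unconfirmed explanation for the failure is that {{python3-dev}} provides headers for a different Python 3 minor version.

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects to find Python.h."""
    include_dir = sysconfig.get_paths()['include']
    return os.path.join(include_dir, 'Python.h')


def has_python_header():
    """Return True if Python.h exists for the running interpreter."""
    return os.path.isfile(python_header_path())


if __name__ == '__main__':
    print(python_header_path(), has_python_header())
```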




[jira] [Comment Edited] (BEAM-8787) Python setup issues

2019-11-21 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 5:10 AM:
-

h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils package does not support 
Python3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick your preferred install prefix
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the {{bin}} directory under the install prefix to {{PATH}}. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the {{distutils.sysconfig}} module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds:

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}


h2. testPy36Gcp failure

The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} task was failing:
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053

It seemed that the {{_bz2}} module was missing from Python3.6. I followed 
[Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558].


h2. testPy35Cython failure


{noformat}
./gradlew :sdks:python:test-suites:tox:py35:testPy35Cython
...
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall 
-Wstrict-prototypes -g 
-fdebug-prefix-map=/build/python3.5-ta1Uke/python3.5-3.5.4=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m 
-I/usr/local/google/home/suztomo/beam4/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython/py35-cython/include/python3.5m
 -c apache_beam/coders/stream.c -o 
build/temp.linux-x86_64-3.5/apache_beam/coders/stream.o
apache_beam/coders/stream.c:17:10: fatal error: Python.h: No such file or 
directory
 #include "Python.h"
  ^~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
...
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'.

{noformat}

I ran {{sudo apt-get install python3-dev}}.




[jira] [Comment Edited] (BEAM-8787) Python setup issues

2019-11-21 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 5:09 AM:
-

h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils package does not support 
Python3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick your preferred install prefix
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the {{bin}} directory under the install prefix to {{PATH}}. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the {{distutils.sysconfig}} module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds:

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}


h2. testPy36Gcp failure

The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} task was failing:
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053

It seemed that the {{_bz2}} module was missing from Python3.6. I followed 
[Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558].


h2. testPy35Cython failure


{noformat}
./gradlew :sdks:python:test-suites:tox:py35:testPy35Cython
...
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall 
-Wstrict-prototypes -g 
-fdebug-prefix-map=/build/python3.5-ta1Uke/python3.5-3.5.4=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m 
-I/usr/local/google/home/suztomo/beam4/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython/py35-cython/include/python3.5m
 -c apache_beam/coders/stream.c -o 
build/temp.linux-x86_64-3.5/apache_beam/coders/stream.o
apache_beam/coders/stream.c:17:10: fatal error: Python.h: No such file or 
directory
 #include "Python.h"
  ^~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
...
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'.

{noformat}





[jira] [Comment Edited] (BEAM-8787) Python setup issues

2019-11-21 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 5:06 AM:
-

h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils package does not support 
Python3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick your preferred install prefix
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the {{bin}} directory under the install prefix to {{PATH}}. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the {{distutils.sysconfig}} module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds:

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}


h2. testPy36Gcp failure

The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} task was failing:
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053

It seemed that the {{_bz2}} module was missing from Python3.6. I followed 
[Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558].


:sdks:python:test-suites:tox:py35:testPy35Cython


{noformat}
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'.

{noformat}







> Python setup issues
> ---
>
> Key: BEAM-8787
> URL: https://issues.apache.org/jira/browse/BEAM-8787
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.16.0
> Environment: debian x86 (gLinux)
>Reporter: Elliotte Rusty Harold
>Priority: Major
>
> This could be an issue with incomplete or inaccurate contributing docs. tldr; 

[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=347918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347918
 ]

ASF GitHub Bot logged work on BEAM-8746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 05:00
Start Date: 22/Nov/19 05:00
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10161: [BEAM-8746] 
Make local job service accessible from external machines
URL: https://github.com/apache/beam/pull/10161#discussion_r349433786
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/local_job_service.py
 ##
 @@ -95,7 +95,7 @@ def create_beam_job(self, preparation_id, job_name, pipeline, options):
 
   def start_grpc_server(self, port=0):
     self._server = grpc.server(UnboundedThreadPoolExecutor())
-    port = self._server.add_insecure_port('localhost:%d' % port)
+    port = self._server.add_insecure_port('[::]:%d' % port)
 
 Review comment:
   ok, I've been playing around with this, and the main impediment to keeping 
this design simple is that there are two separate hostnames required, one for 
opening the port for the server, and one which is delivered to the client for 
reconnecting to the staging service.
   
   I think it'd be nice to prevent people from having to figure out all this 
stuff again, because it's pretty frustrating to get it right, so here's my best 
effort at a compromise between making this configurable and making it "just 
work".
   
   ```python
     def get_hostname(self):
       """Return the host name at which this server will be accessible.
   
       In particular, this is provided to the client as the
       artifact staging endpoint.
       """
       return 'localhost'
   
     def start_grpc_server(self, port=0):
       self._server = grpc.server(UnboundedThreadPoolExecutor())
       hostname = self.get_hostname()
       # either open this up to the world, or lock it down to localhost
       if os.environ.get('DOCKER_MAC_CONTAINER') == '1' or hostname != 'localhost':
         service_address = '[::]'
       else:
         service_address = 'localhost'
       port = self._server.add_insecure_port('%s:%d' % (service_address, port))
       beam_job_api_pb2_grpc.add_JobServiceServicer_to_server(self, self._server)
       beam_artifact_api_pb2_grpc.add_ArtifactStagingServiceServicer_to_server(
           self._artifact_service, self._server)
       self._artifact_staging_endpoint = endpoints_pb2.ApiServiceDescriptor(
           url='%s:%d' % (hostname, port))
       self._server.start()
       _LOGGER.info('Grpc server started at %s on port %d' % (hostname, port))
       return port
   ```
   
   What do you think?  The other option is that I just copy all of 
`start_grpc_server` into my sub-class.  It's not the end of the world if that's 
the decision we come to. 
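   The address selection in the proposal can be isolated from gRPC and exercised on its own. The sketch below is only an illustration under that assumption: the function names {{choose_bind_address}} and {{endpoints}} are hypothetical, and passing the environment as a parameter is a choice made here for testability, not something from the pull request.

```python
import os


def choose_bind_address(advertised_hostname, environ=None):
    """Pick the address to listen on, given the host name handed to clients."""
    environ = os.environ if environ is None else environ
    in_docker_for_mac = environ.get('DOCKER_MAC_CONTAINER') == '1'
    if in_docker_for_mac or advertised_hostname != 'localhost':
        # Listen on all interfaces so other machines or containers can connect.
        return '[::]'
    # Otherwise stay locked down to the loopback interface.
    return 'localhost'


def endpoints(advertised_hostname, port, environ=None):
    """Return (address to bind, address to advertise to clients)."""
    bind = '%s:%d' % (choose_bind_address(advertised_hostname, environ), port)
    advertised = '%s:%d' % (advertised_hostname, port)
    return bind, advertised
```

   Keeping the two addresses separate mirrors the point made above: the server may listen on all interfaces while clients are still told to reconnect to a specific host name.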
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 347918)
Time Spent: 1.5h  (was: 1h 20m)

> Allow the local job service to work from inside docker
> --
>
> Key: BEAM-8746
> URL: https://issues.apache.org/jira/browse/BEAM-8746
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently the connection is refused.  It's a simple fix. 





[jira] [Commented] (BEAM-8747) Remove Unused non-vendored Guava compile dependencies

2019-11-21 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979820#comment-16979820
 ] 

Kenneth Knowles commented on BEAM-8747:
---

Nice work! Fully qualified names definitely circumvent my {{grep}} analysis.
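A scan that catches both {{import}} statements and fully qualified uses might look like the sketch below. It is an illustration only: the function name is made up, it works on source text passed in as a string, it assumes vendored Guava lives under a longer package prefix (so {{com.google.common}} preceded by a dot is skipped), and it would also match references inside comments or string literals.

```python
import re

# Match com.google.common.* only when it is not the tail of a longer package name.
_GUAVA_REF = re.compile(r'(?<![\w.])com\.google\.common\.[\w.]+')
_IMPORT_LINE = re.compile(r'^\s*import\s+(static\s+)?')


def find_guava_references(java_source):
    """Return (line_number, kind, reference) for each non-vendored Guava reference."""
    results = []
    for number, line in enumerate(java_source.splitlines(), start=1):
        kind = 'import' if _IMPORT_LINE.match(line) else 'fully-qualified'
        for match in _GUAVA_REF.finditer(line):
            results.append((number, kind, match.group(0)))
    return results
```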

> Remove Unused non-vendored Guava compile dependencies
> -
>
> Key: BEAM-8747
> URL: https://issues.apache.org/jira/browse/BEAM-8747
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: Guava used as fully-qualified class name.png
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> [~kenn] says:
> BeamModulePlugin just contains lists of versions to ease coordination across 
> Beam modules, but mostly does not create dependencies. Most of Beam's modules 
> only depend on a few things there. For example Guava is not a core 
> dependency, but here is where it is actually depended upon:
> $ find . -name build.gradle | xargs grep library.java.guava
> ./sdks/java/core/build.gradle:  shadowTest library.java.guava_testlib
> ./sdks/java/extensions/sql/jdbc/build.gradle:  compile library.java.guava
> ./sdks/java/io/google-cloud-platform/build.gradle:  compile library.java.guava
> ./sdks/java/io/kinesis/build.gradle:  testCompile library.java.guava_testlib
> These results appear to be misleading. Grepping for 'import 
> com.google.common', I see this as the actual state of things:
>  - GCP connector does not appear to actually depend on Guava in compile scope
>  - The Beam SQL JDBC driver does not appear to actually depend on Guava in 
> compile scope
>  - The Dataflow Java worker does depend on Guava at compile scope but has 
> incorrect dependencies (and it probably shouldn't)
>  - KinesisIO does depend on Guava at compile scope but has incorrect 
> dependencies (Kinesis libs have Guava on API surface so it is OK here, but 
> should be correctly declared)
>  - ZetaSQL translator does depend on Guava at compile scope but has incorrect 
> dependencies (ZetaSQL has it on API surface so it is OK here, but should be 
> correctly declared)
> We used to have an analysis that prevented this class of error.
> Once the errors are fixed, the guava_version is simply a version that we have 
> discovered that seems to work for both Kinesis and ZetaSQL, libraries we do 
> not control. Kinesis producer is built against 18.0. Kinesis client against 
> 26.0-jre. ZetaSQL against 26.0-android.
> (or maybe I messed up in my analysis)
> Kenn





[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=347897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347897
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 22/Nov/19 03:17
Start Date: 22/Nov/19 03:17
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #10175: 
[BEAM-7390] Add code snippet for Max
URL: https://github.com/apache/beam/pull/10175#discussion_r349418508
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/aggregation/max.py
 ##
 @@ -0,0 +1,60 @@
+# coding=utf-8
 
 Review comment:
   Yes, these are for the [transform 
catalog](https://beam.apache.org/documentation/transforms/python/overview/), so 
even if it is very similar content, it is useful to see examples for every 
single transform and how to use them in different circumstances. We already 
have the [element-wise 
transforms](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise)
 completed with their docs, code samples and interactive notebooks. 
([example](https://beam.apache.org/documentation/transforms/python/elementwise/map/))
   
   We are organizing them as one file per transform, for consistency with how 
the docs are organized. I was going to add reviewers after tests passed since 
Ahmet is OOO
 



Issue Time Tracking
---

Worklog Id: (was: 347897)
Time Spent: 4h 20m  (was: 4h 10m)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog





[jira] [Comment Edited] (BEAM-8787) Python setup issues

2019-11-21 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 3:11 AM:
-

h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils package does not support 
Python3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick your preferred install prefix
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the {{bin}} directory under the install prefix to {{PATH}}. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the {{distutils.sysconfig}} module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 
setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}


h2. testPy36Gcp failure

The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} was failing:
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053

It seemed that the _bz2 library was missing for Python3.6. I followed 
[Stackoverflow: No module named '_bz2' in 
python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558].
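
A quick way to confirm that an interpreter has both modules discussed here is to try importing them. This is only a minimal sketch and not part of the Beam build; {{missing_modules}} is a hypothetical helper name. Note that {{distutils}} was removed from the standard library in Python 3.12, so the check is meaningful only for older interpreters such as the 3.6 build above.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported by this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # ModuleNotFoundError is a subclass of ImportError
            missing.append(name)
    return missing


# The two modules this comment ran into; run with the interpreter under test,
# e.g. `python3.6 check_modules.py` (the file name is an assumption).
required = ["distutils.sysconfig", "bz2"]
print("missing:", missing_modules(required))
```

An empty list means the interpreter was built with both modules available.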




was (Author: suztomo):
h1. Problem

The problem for my environment was that Python3.6 was missing the required module 
{{distutils.sysconfig}} and the latest python3-distutils does not support 
Python3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 
https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your 
preference
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 
setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}

Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails.
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053

It seemed that the _bz2 library was missing for Python3.6. I followed 
[Stackoverflow: No module named '_bz2' in 
python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558]

apt-get install libbz2-dev


> Python setup issues
> ---
>
> Key: BEAM-8787
> URL: https://issues.apache.org/jira/browse/BEAM-8787
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.16.0
> Environment: debian x86 (gLinux)
>Reporter: Elliotte Rusty Harold
>Priority: Major
>
> This could be an issue with incomplete or inaccurate contributing docs. tldr; 
> `./gradlew check` fails on Debian after initial checkout.
> The docs say that one should first run:
> sudo apt-get install \
> openjdk-8-jdk \
> python-setuptools \
> python-pip \
> virtualenv
> but even after running this pieces are missing. I'm still debugging exactly 
> what's missing but the sy

[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=347895&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347895
 ]

ASF GitHub Bot logged work on BEAM-7594:


Author: ASF GitHub Bot
Created on: 22/Nov/19 03:10
Start Date: 22/Nov/19 03:10
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10194: [BEAM-7594] 
Fix flaky filename generation
URL: https://github.com/apache/beam/pull/10194#discussion_r349417305
 
 

 ##
 File path: sdks/python/apache_beam/io/textio_test.py
 ##
 @@ -101,17 +100,19 @@ def write_data(
 return f.name, [line.decode('utf-8') for line in all_data]
 
 
-def write_pattern(lines_per_file, no_data=False):
+def write_pattern(lines_per_file, no_data=False, return_filenames=False):
 
 Review comment:
   It might be nice to change all of the uses of the function, instead of 
conditionally changing the return type. But I won't block the PR on this : ) 
LGTM
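
The suggestion in this review comment (one return shape for every caller, rather than a {{return_filenames}} flag) might look like the sketch below. It is not the code in {{textio_test.py}}: the body of {{write_pattern}} is not shown in the diff, so the file naming and line contents here are assumptions made up for illustration.

```python
import os
import tempfile


def write_pattern(lines_per_file):
    """Hypothetical helper: write one temp file per entry of `lines_per_file`.

    Always returns (glob_pattern, all_lines, filenames), so the return type
    does not depend on a flag.
    """
    temp_dir = tempfile.mkdtemp()
    all_lines = []
    filenames = []
    start = 0
    for index, count in enumerate(lines_per_file):
        lines = ["line %d" % n for n in range(start, start + count)]
        path = os.path.join(temp_dir, "part-%05d.txt" % index)
        with open(path, "w") as f:
            f.write("\n".join(lines))
        all_lines.extend(lines)
        filenames.append(path)
        start += count
    return os.path.join(temp_dir, "part-*"), all_lines, filenames


# Callers that do not need the file names simply ignore the third value.
pattern, expected_lines, _ = write_pattern([3, 2])
# Callers that do need them unpack all three.
pattern, expected_lines, filenames = write_pattern([3, 2])
```

The trade-off is that every existing call site has to be updated once, which is the cost the reviewer chose not to block the PR on.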
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347895)
Time Spent: 40m  (was: 0.5h)

> test_read_from_text_with_file_name_file_pattern is flaky
> 
>
> Key: BEAM-7594
> URL: https://issues.apache.org/jira/browse/BEAM-7594
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Critical
>  Labels: currently-failing, flake
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> cc: [~lcaggio] [~chamikara]
> {noformat}
> 22:05:08 
> ==
> 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern 
> (apache_beam.io.textio_test.TextSourceTest)
> 22:05:08 
> --
> 22:05:08 Traceback (most recent call last):
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py",
>  line 517, in test_read_from_text_with_file_name_file_pattern
> 22:05:08 pipeline.run()
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 22:05:08 else test_runner_api))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 22:05:08 self._options).run(False)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 22:05:08 return self.runner.run_pipeline(self, self._options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 22:05:08 return runner.run_pipeline(pipeline, options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 294, in run_pipeline
> 22:05:08 default_environment=self._default_environment))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 301, in run_via_runner_api
> 22:05:08 return self.run_stages(stage_context, stages)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 383, in run_stages
> 22:05:08 stage_context.safe_coders)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 655, in _run_stage
> 22:05:08 result, splits = bundle_manager.process_bundle(data_input, 
> data_output)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam

[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=347896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347896
 ]

ASF GitHub Bot logged work on BEAM-7594:


Author: ASF GitHub Bot
Created on: 22/Nov/19 03:10
Start Date: 22/Nov/19 03:10
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10194: [BEAM-7594] Fix 
flaky filename generation
URL: https://github.com/apache/beam/pull/10194#issuecomment-557368681
 
 
   LGTM except for the one comment.
 



Issue Time Tracking
---

Worklog Id: (was: 347896)
Time Spent: 50m  (was: 40m)

> test_read_from_text_with_file_name_file_pattern is flaky
> 
>
> Key: BEAM-7594
> URL: https://issues.apache.org/jira/browse/BEAM-7594
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Critical
>  Labels: currently-failing, flake
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> cc: [~lcaggio] [~chamikara]
> {noformat}
> 22:05:08 
> ==
> 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern 
> (apache_beam.io.textio_test.TextSourceTest)
> 22:05:08 
> --
> 22:05:08 Traceback (most recent call last):
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py",
>  line 517, in test_read_from_text_with_file_name_file_pattern
> 22:05:08 pipeline.run()
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 22:05:08 else test_runner_api))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 22:05:08 self._options).run(False)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 22:05:08 return self.runner.run_pipeline(self, self._options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 22:05:08 return runner.run_pipeline(pipeline, options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 294, in run_pipeline
> 22:05:08 default_environment=self._default_environment))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 301, in run_via_runner_api
> 22:05:08 return self.run_stages(stage_context, stages)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 383, in run_stages
> 22:05:08 stage_context.safe_coders)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 655, in _run_stage
> 22:05:08 result, splits = bundle_manager.process_bundle(data_input, 
> data_output)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1471, in process_bundle
> 22:05:08 result_future = 
> self._controller.control_handler.push(process_bundle_req)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",

[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347894
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 03:07
Start Date: 22/Nov/19 03:07
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10070: 
[BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r349416916
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -423,6 +425,70 @@ def test_reshuffle_streaming_global_window(self):
 label='after reshuffle')
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_reshuffle_preserves_timestamps(self):
+pipeline = TestPipeline()
+
+# Create a PCollection and assign each element with a different timestamp.
+before_reshuffle = (pipeline
+| "Four elements" >> beam.Create([
+{'name': 'foo', 'timestamp': MIN_TIMESTAMP},
+{'name': 'foo', 'timestamp': 0},
+{'name': 'bar', 'timestamp': 33},
+{'name': 'bar', 'timestamp': MAX_TIMESTAMP},
+])
+| "With timestamp" >> beam.Map(
+lambda element: beam.window.TimestampedValue(
+element, element['timestamp'])))
+
+# For each element in a PCollection, gets the current timestamp of the
+# element and reassigns the timestamp to the element.
+class AddTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+yield beam.window.TimestampedValue(element, timestamp)
+
+# Reshuffle the PCollection above and assign the timestamp of an element to
+# that element again.
+after_reshuffle = (before_reshuffle
+   | "Reshuffle" >> beam.Reshuffle()
+   | "With timestamps again" >> beam.ParDo(AddTimestamp()))
+
+# Given an element, emits a string which contains the timestamp and the name
+# field of the element.
+class FormatWithTimestamp(beam.DoFn):
 
 Review comment:
   Done. Thank you! It's good to know the code can be written in this way.
 



Issue Time Tracking
---

Worklog Id: (was: 347894)
Time Spent: 18h 20m  (was: 18h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 18h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Updated] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-21 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8651:
--
Description: 
Several Beam users reported an intermittent error which happens during 
unpickling in StockUnpickler.find_class. A similar error happens consistently 
when users' pipelines have instances of super() in their main module and use 
--save_main_session, see: 
[BEAM-6158|https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945].
 

In this case the error happens only sometimes, and super() calls don't play a 
role.  

So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
pipelines only, which use portable SDK worker.

Typical stack trace:

{noformat}
  File "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1148, in _create_pardo_operation
    dofn_data = pickler.loads(serialized_fn)
  File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, in loads
    return dill.loads(s)
  File "python3.5/site-packages/dill/_dill.py", line 317, in loads
    return load(file, ignore)
  File "python3.5/site-packages/dill/_dill.py", line 305, in load
    obj = pik.load()
  File "python3.5/site-packages/dill/_dill.py", line 474, in find_class
    return StockUnpickler.find_class(self, module, name)
AttributeError: Can't get attribute 'ClassName' on 
{noformat}

According to Guenther from [1]:
{quote}
This looks exactly like a race condition that we've encountered on Python
3.7.1: There's a bug in some older 3.7.x releases that breaks the
thread-safety of the unpickler, as concurrent unpickle threads can access a
module before it has been fully imported. See
https://bugs.python.org/issue34572 for more information.

The traceback shows a Python 3.6 venv so this could be a different issue
(the unpickle bug was introduced in version 3.7). If it's the same bug then
upgrading to Python 3.7.3 or higher should fix that issue. One potential
workaround is to ensure that all of the modules get imported during the
initialization of the sdk_worker, as this bug only affects imports done by
the unpickler.
{quote}

Opening this for visibility. Current open questions are:

1. Find a minimal example to reproduce this issue.
2. Figure out whether users are still affected by this issue on Python 3.7.3.
3. Communicate a workaround to 3.5 and 3.6 users affected by this.

[1] 
https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E
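
The workaround mentioned in the quote (importing every needed module before any unpickling thread starts) could look roughly like the sketch below. The {{preimport}} helper and the module list are hypothetical and not part of Beam; where such a call would belong in the SDK worker's startup is not specified in this issue. The sketch only illustrates the ordering: import on one thread first, then unpickle concurrently.

```python
import importlib
import pickle
import threading
from collections import OrderedDict


def preimport(module_names):
    """Import each named module on the calling thread and return them by name."""
    return {name: importlib.import_module(name) for name in module_names}


def unpickle_concurrently(payload, num_threads=4):
    """Unpickle the same payload from several threads and collect the results."""
    results = []
    lock = threading.Lock()

    def worker():
        obj = pickle.loads(payload)
        with lock:
            results.append(obj)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical module list; a real pipeline would list the modules that its
# pickled DoFns refer to.
preimport(["collections", "json"])
payload = pickle.dumps(OrderedDict(a=1))
results = unpickle_concurrently(payload)
```

Because the imports finish before the threads start, the unpickler never triggers a first-time import itself, which is the situation the quoted bug report describes.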



  was:
Several Beam users [1,2] reported an error which happens on Python 3 in 
StockUnpickler.find_class.

So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
pipelines only, which use portable SDK worker.

Typical stack trace:

{noformat}
  File "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1148, in _create_pardo_operation
    dofn_data = pickler.loads(serialized_fn)
  File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, in loads
    return dill.loads(s)
  File "python3.5/site-packages/dill/_dill.py", line 317, in loads
    return load(file, ignore)
  File "python3.5/site-packages/dill/_dill.py", line 305, in load
    obj = pik.load()
  File "python3.5/site-packages/dill/_dill.py", line 474, in find_class
    return StockUnpickler.find_class(self, module, name)
AttributeError: Can't get attribute 'ClassName' on 
{noformat}

According to Guenther from [1]:
{quote}
This looks exactly like a race condition that we've encountered on Python
3.7.1: There's a bug in some older 3.7.x releases that breaks the
thread-safety of the unpickler, as concurrent unpickle threads can access a
module before it has been fully imported. See
https://bugs.python.org/issue34572 for more information.

The traceback shows a Python 3.6 venv so this could be a different issue
(the unpickle bug was introduced in version 3.7). If it's the same bug then
upgrading to 

[jira] [Commented] (BEAM-8803) Default behaviour for Python BQ Streaming inserts sink should be to retry always

2019-11-21 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979799#comment-16979799
 ] 

Pablo Estrada commented on BEAM-8803:
-

Hm, this is very troublesome. I'm writing a fix to make RETRY_ALWAYS the default 
behavior, throw an error when there are errors inserting rows, and produce 
a dead-letter PCollection for other scenarios.

> Default behaviour for Python BQ Streaming inserts sink should be to retry 
> always
> 
>
> Key: BEAM-8803
> URL: https://issues.apache.org/jira/browse/BEAM-8803
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=347893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347893
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 22/Nov/19 03:01
Start Date: 22/Nov/19 03:01
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-557366844
 
 
   Run XVR_Flink PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347893)
Time Spent: 50m  (was: 40m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite





[jira] [Updated] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2019-11-21 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-7198:
--
Labels: easy-fix starter  (was: new)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Minor
>  Labels: easy-fix, starter
>
> The name of ToStringCoder class [1] is confusing, since the output of 
> encode() on Python3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]
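
The alias approach described in this issue can be sketched as follows. The class body is a simplified stand-in written for illustration, not the coder in {{coders.py}}; only the last line ({{ToStringCoder = ToBytesCoder}}) comes from the issue text.

```python
class ToBytesCoder:
    """Simplified stand-in for the renamed coder; encode() always returns bytes."""

    def encode(self, value):
        # Pass bytes through unchanged; encode everything else via its str form.
        if isinstance(value, bytes):
            return value
        return str(value).encode("utf-8")


# Old name kept as an alias so existing references keep working.
ToStringCoder = ToBytesCoder
```

With the alias in place, code that still refers to {{ToStringCoder}} gets the same class object, so {{isinstance}} checks and pickled references by the new name behave identically.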





[jira] [Comment Edited] (BEAM-8787) Python setup issues

2019-11-21 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 2:56 AM:
-

h1. Problem

The problem for my environment was that Python3.6 was missing the required module 
{{distutils.sysconfig}} and the latest python3-distutils does not support 
Python3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 
https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your 
preference
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 
setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}

Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails.
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053

It seemed that the _bz2 library was missing for Python3.6. I followed 
[Stackoverflow: No module named '_bz2' in 
python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558]

apt-get install libbz2-dev



was (Author: suztomo):
h1. Problem

The problem for my environment was that Python3.6 was missing the required module 
{{distutils.sysconfig}} and the latest python3-distutils does not support 
Python3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 
https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your 
preference
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 
setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}

Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails.
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053
 

> Python setup issues
> ---
>
> Key: BEAM-8787
> URL: https://issues.apache.org/jira/browse/BEAM-8787
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.16.0
> Environment: debian x86 (gLinux)
>Reporter: Elliotte Rusty Harold
>Priority: Major
>
> This could be an issue with incomplete or inaccurate contributing docs. tldr; 
> `./gradlew check` fails on Debian after initial checkout.
> The docs say that one should first run:
> sudo apt-get install \
> openjdk-8-jdk \
> python-setuptools \
> python-pip \
> virtualenv
> but even after running this pieces are missing. I'm still debugging exactly 
> what's missing but the symptoms look like this:
> > Task :sdks:python:test-suites:tox:py35:setupVirtualenv FAILED
> The path python3.5 (from --python=python3.5) does not exist
> > Task :sdks:python:test-suites:tox:py36:setupVirtualenv FAILED
> [ant:fmpp] Traceback (most recent call last):
> [ant:fmpp]   File "/usr/lib/python3/dist-packages/virtualenv.py", line

[jira] [Updated] (BEAM-8803) Default behaviour for Python BQ Streaming inserts sink should be to retry always

2019-11-21 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-8803:

Status: Open  (was: Triage Needed)

> Default behaviour for Python BQ Streaming inserts sink should be to retry 
> always
> 
>
> Key: BEAM-8803
> URL: https://issues.apache.org/jira/browse/BEAM-8803
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8803) Default behaviour for Python BQ Streaming inserts sink should be to retry always

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8803?focusedWorklogId=347891&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347891
 ]

ASF GitHub Bot logged work on BEAM-8803:


Author: ASF GitHub Bot
Created on: 22/Nov/19 02:55
Start Date: 22/Nov/19 02:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10195: [BEAM-8803] 
BigQuery Streaming Inserts are always retried by default.
URL: https://github.com/apache/beam/pull/10195
 
 
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Commented] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2019-11-21 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979792#comment-16979792
 ] 

Valentyn Tymofieiev commented on BEAM-7198:
---

I'll unassign this issue for now, but feel free to pick it up later if you have 
time.

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Francesco Perera
>Priority: Minor
>  Labels: new
>
> The name of ToStringCoder class [1] is confusing, since the output of 
> encode() on Python3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]
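
The alias approach described above can be sketched as follows. This is a minimal illustration, not the actual Beam source: the class body here is a hypothetical stand-in, and only the names ToStringCoder and ToBytesCoder come from the issue.

```python
class ToBytesCoder:
    """Hypothetical stand-in for the renamed coder; not Beam's real class."""

    def encode(self, value):
        # Assumed behavior for illustration: bytes pass through unchanged,
        # anything else is stringified and encoded as UTF-8.
        if isinstance(value, bytes):
            return value
        return str(value).encode("utf-8")


# Backwards-compatible alias: old references resolve to the new class.
ToStringCoder = ToBytesCoder

encoded = ToStringCoder().encode(42)
```

Because the alias is a plain assignment, both names refer to the same class object, so isinstance checks and pickled references keep working under either name.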



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2019-11-21 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-7198:
-

Assignee: (was: Francesco Perera)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Minor
>  Labels: new
>
> The name of ToStringCoder class [1] is confusing, since the output of 
> encode() on Python3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2019-11-21 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-7198:
--
Labels: new  (was: )

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Francesco Perera
>Priority: Minor
>  Labels: new
>
> The name of ToStringCoder class [1] is confusing, since the output of 
> encode() on Python3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347888&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347888
 ]

ASF GitHub Bot logged work on BEAM-7948:


Author: ASF GitHub Bot
Created on: 22/Nov/19 02:49
Start Date: 22/Nov/19 02:49
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9949: [BEAM-7948] 
Add time-based cache threshold support in the Java data s…
URL: https://github.com/apache/beam/pull/9949#issuecomment-557364094
 
 
   Thanks for the review @lukecwik 👍 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347888)
Time Spent: 3.5h  (was: 3h 20m)

> Add time-based cache threshold support in the Java data service
> ---
>
> Key: BEAM-7948
> URL: https://issues.apache.org/jira/browse/BEAM-7948
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently, only a size-based cache threshold is supported in the data service. 
> It should also support a time-based cache threshold. This is very important, 
> especially for streaming jobs, which are sensitive to delay.
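
A rough sketch of the idea, combining a size threshold with a time threshold. The issue targets the Java data service; this Python version is only an illustration, and the class name BufferingSink and all of its parameters are hypothetical.

```python
import time


class BufferingSink:
    """Hypothetical buffer that flushes on a size OR a time threshold."""

    def __init__(self, flush, size_limit_bytes, time_limit_secs,
                 clock=time.monotonic):
        self._flush = flush              # callback receiving the joined bytes
        self._size_limit = size_limit_bytes
        self._time_limit = time_limit_secs
        self._clock = clock              # injectable for testing
        self._chunks = []
        self._size = 0
        self._last_flush = clock()

    def add(self, chunk):
        self._chunks.append(chunk)
        self._size += len(chunk)
        too_big = self._size >= self._size_limit
        too_old = self._clock() - self._last_flush >= self._time_limit
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self._chunks:
            self._flush(b"".join(self._chunks))
        self._chunks = []
        self._size = 0
        self._last_flush = self._clock()
```

Note that this sketch only checks the time threshold when new data arrives. A real implementation would presumably also need a background timer so that buffered data is flushed even when the stream goes quiet.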



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8787) Python setup issues

2019-11-21 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 2:38 AM:
-

h1. Problem

In my environment, the problem was that Python 3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils does not support 
Python 3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick your preferred prefix
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python 3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
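
The same check can be scripted rather than run interactively. A minimal sketch using only the standard library; the helper name has_module is hypothetical.

```python
import importlib.util


def has_module(name):
    """Return True if `name` can be located by the import system."""
    try:
        # find_spec imports parent packages but not the module itself.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (here: distutils) is itself missing.
        return False


available = has_module("distutils.sysconfig")
```

Running this with each interpreter that the build uses (python3.5, python3.6, and so on) shows which of them would hit the ModuleNotFoundError from virtualenv.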
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}

Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails.
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053
 


was (Author: suztomo):
h1. Problem

In my environment, the problem was that Python 3.6 was missing a required module, and 
the latest python3-distutils does not support Python 3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick your preferred prefix
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python 3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}

Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails.
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053
 

> Python setup issues
> ---
>
> Key: BEAM-8787
> URL: https://issues.apache.org/jira/browse/BEAM-8787
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.16.0
> Environment: debian x86 (gLinux)
>Reporter: Elliotte Rusty Harold
>Priority: Major
>
> This could be an issue with incomplete or inaccurate contributing docs. tldr; 
> `./gradlew check` fails on Debian after initial checkout.
> The docs say that one should first run:
> sudo apt-get install \
> openjdk-8-jdk \
> python-setuptools \
> python-pip \
> virtualenv
> but even after running this, pieces are missing. I'm still debugging exactly 
> what's missing but the symptoms look like this:
> > Task :sdks:python:test-suites:tox:py35:setupVirtualenv FAILED
> The path python3.5 (from --python=python3.5) does not exist
> > Task :sdks:python:test-suites:tox:py36:setupVirtualenv FAILED
> [ant:fmpp] Traceback (most recent call last):
> [ant:fmpp]   File "/usr/lib/python3/dist-packages/virtualenv.py", line 25, in 
> 
> [ant:fmpp] import distutils.sysconfig
> [ant:fmpp] ModuleNotFoundError: No module named 'distutils.sysconfig'
> ...
> FAILURE: Build completed with 2 failures.
> 1: Task failed with an exception.
> ---
> * What went wrong:
> Execution fail

[jira] [Commented] (BEAM-8787) Python setup issues

2019-11-21 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki commented on BEAM-8787:
---

h1. Problem

In my environment, the problem was that Python 3.6 was missing a required module, and 
the latest python3-distutils does not support Python 3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick your preferred prefix
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python 3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}

Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails.
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053
 

> Python setup issues
> ---
>
> Key: BEAM-8787
> URL: https://issues.apache.org/jira/browse/BEAM-8787
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.16.0
> Environment: debian x86 (gLinux)
>Reporter: Elliotte Rusty Harold
>Priority: Major
>
> This could be an issue with incomplete or inaccurate contributing docs. tldr; 
> `./gradlew check` fails on Debian after initial checkout.
> The docs say that one should first run:
> sudo apt-get install \
> openjdk-8-jdk \
> python-setuptools \
> python-pip \
> virtualenv
> but even after running this, pieces are missing. I'm still debugging exactly 
> what's missing but the symptoms look like this:
> > Task :sdks:python:test-suites:tox:py35:setupVirtualenv FAILED
> The path python3.5 (from --python=python3.5) does not exist
> > Task :sdks:python:test-suites:tox:py36:setupVirtualenv FAILED
> [ant:fmpp] Traceback (most recent call last):
> [ant:fmpp]   File "/usr/lib/python3/dist-packages/virtualenv.py", line 25, in 
> 
> [ant:fmpp] import distutils.sysconfig
> [ant:fmpp] ModuleNotFoundError: No module named 'distutils.sysconfig'
> ...
> FAILURE: Build completed with 2 failures.
> 1: Task failed with an exception.
> ---
> * What went wrong:
> Execution failed for task ':sdks:python:test-suites:tox:py35:setupVirtualenv'.
> > Process 'command 'virtualenv'' finished with non-zero exit value 3
> Indeed there is no Python 3.5 on this system:
> gnome-user-share       python2.6
> gnome-vfs-2.0          python2.7
> gnupg                  python3
> gnupg2                 python3.6
> gold-ld                python3.7
> goobuntu-config-tools  python3.8
> But nowhere in the setup docs do we say that Python 3.5 is required to build 
> this. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-1251) Python 3 Support

2019-11-21 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974802#comment-16974802
 ] 

Valentyn Tymofieiev edited comment on BEAM-1251 at 11/22/19 2:11 AM:
-

We are investigating an issue, BEAM-8651, where Python 3 pipelines _sometimes_ 
fail with pickling errors in StockUnpickler.find_class(). Posting it here for 
visibility since it seems to be common in certain execution scenarios.

Note that this is different from 
[BEAM-6158|https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945],
 which causes _consistent_ failures in StockUnpickler.find_class() when 
--save_main_session is used and the main module has super() calls.


was (Author: tvalentyn):
We are investigating an issue, BEAM-8651, where Python 3 pipelines sometimes fail 
with pickling errors in StockUnpickler.find_class(). Posting it here for 
visibility since it seems to be common in certain execution scenarios.

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.11.0
>
>  Time Spent: 30h 10m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As far as I can see, 
> several packages that google datalab depends on do not support python3 yet. 
> This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-11-21 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-5878.
---
Resolution: Fixed

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
> Fix For: 2.18.0
>
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python 3.0, inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new function 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add Py3-only unit tests that cover DoFns with keyword-only 
> arguments once Beam Python 3 tests are in good shape.
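
For illustration, here is how inspect.getfullargspec() reports a keyword-only argument. The function process below is a hypothetical example, not Beam code.

```python
import inspect


def process(element, *, separator=","):
    """Toy function with a keyword-only argument (PEP 3102)."""
    return element.split(separator)


spec = inspect.getfullargspec(process)
positional = spec.args             # regular positional-or-keyword parameters
keyword_only = spec.kwonlyargs     # parameters declared after the bare '*'
defaults = spec.kwonlydefaults     # mapping of keyword-only names to defaults
```

Code that only reads spec.args would miss separator entirely, which is the kind of gap the type-hints machinery would need to account for.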



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-11-21 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-5878:
--
Fix Version/s: 2.18.0

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
> Fix For: 2.18.0
>
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python 3.0, inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new function 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add Py3-only unit tests that cover DoFns with keyword-only 
> arguments once Beam Python 3 tests are in good shape.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super' .

2019-11-21 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919945#comment-16919945
 ] 

Valentyn Tymofieiev edited comment on BEAM-6158 at 11/22/19 2:01 AM:
-

The error happens when the main pipeline module has class methods that refer to 
superclass methods using super(). A reference to super in the method code 
creates a cyclical reference inside the object, which dill currently handles 
by pickling objects by reference. This approach does not work for restoring a 
pickled main session, since object classes need to be defined at the moment 
of unpickling. This issue will be addressed after 
[https://github.com/uqfoundation/dill/issues/300] is fixed or we start using 
CloudPickle as a pickler, which is being investigated in BEAM-8123.

In the meantime, the following workarounds are available:
 - [restructure the pipeline|https://stackoverflow.com/a/58845832] so that the 
pipeline code does not depend on the entities defined in the main module, and 
don't pass --save_main_session.
 - don't use super() in the main module.
 - refer to superclass methods in the main module via 
SuperClassName.method(self, ...). This is NOT an equivalent replacement, but 
may work in simple class hierarchies. 
[Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43].
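
A minimal sketch of the last workaround, using plain classes rather than Beam types. The names Base and Child are hypothetical.

```python
class Base:
    def __init__(self, name):
        self.name = name


class Child(Base):
    def __init__(self, name, extra):
        # Explicit superclass call instead of super().__init__(name).
        # Not equivalent under multiple inheritance, since it bypasses
        # the method resolution order.
        Base.__init__(self, name)
        self.extra = extra


child = Child("words", 3)
```

With a single base class, as here, the explicit call behaves the same as super(). With cooperative multiple inheritance it can skip classes in the MRO, which is why the comment calls it a non-equivalent replacement.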


was (Author: tvalentyn):
The error happens when the main pipeline module has class methods that refer to 
superclass methods using super(). A reference to super in the method code 
creates a cyclical reference inside the object, which dill currently handles 
by pickling objects by reference. This approach does not work for restoring a 
pickled main session, since object classes need to be defined at the moment 
of unpickling. This issue will be addressed after 
[https://github.com/uqfoundation/dill/issues/300] is fixed or we start using 
CloudPickle as a pickler, which is being investigated in BEAM-8123.

In the meantime, the following workarounds are available:
 - don't use super() in the main module.
 - [restructure the pipeline|https://stackoverflow.com/a/58845832] so that the 
pipeline code does not depend on the entities defined in the main module, and 
don't pass --save_main_session.
 - refer to superclass methods in the main module via 
SuperClassName.method(self, ...). This is NOT an equivalent replacement, but 
may work in simple class hierarchies. 
[Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43].

> Using --save_main_session fails on Python 3 when main module has invocations 
> of superclass method using 'super' .
> -
>
> Key: BEAM-6158
> URL: https://issues.apache.org/jira/browse/BEAM-6158
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Mark Liu
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> A typical manifestation of this failure, which can be observed on several 
> Beam examples:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 164, in 
> run()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 158, in run 
> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>  
> self.run().wait_until_finish()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1338, in wait_until_finish   
> (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:   
>  
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/sit

[jira] [Comment Edited] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super' .

2019-11-21 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919945#comment-16919945
 ] 

Valentyn Tymofieiev edited comment on BEAM-6158 at 11/22/19 1:50 AM:
-

The error happens when the main pipeline module has class methods that refer to 
superclass methods using super(). A reference to super in the method code 
creates a cyclical reference inside the object, which dill currently handles 
by pickling objects by reference. This approach does not work for restoring a 
pickled main session, since object classes need to be defined at the moment 
of unpickling. This issue will be addressed after 
[https://github.com/uqfoundation/dill/issues/300] is fixed or we start using 
CloudPickle as a pickler, which is being investigated in BEAM-8123.

In the meantime, the following workarounds are available:
 - don't use super() in the main module.
 - [restructure the pipeline|https://stackoverflow.com/a/58845832] so that the 
pipeline code does not depend on the entities defined in the main module, and 
don't pass --save_main_session.
 - refer to superclass methods in the main module via 
SuperClassName.method(self, ...). This is NOT an equivalent replacement, but 
may work in simple class hierarchies. 
[Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43].


was (Author: tvalentyn):
The error happens when the main pipeline module has class methods that refer to 
superclass methods using super(). A reference to super in the method code 
creates a cyclical reference inside the object, which dill currently handles 
by pickling objects by reference. This approach does not work for restoring a 
pickled main session, since object classes need to be defined at the moment 
of unpickling. This issue will be addressed after 
[https://github.com/uqfoundation/dill/issues/300] is fixed or we start using 
CloudPickle as a pickler, which is being investigated in BEAM-8123.

In the meantime, the following workarounds are available:
 - don't use super() in the main module.
 - restructure the pipeline so that the pipeline code does not depend on the 
entities defined in the main module, and don't pass --save_main_session.
 - refer to superclass methods in the main module via 
SuperClassName.method(self, ...). This is NOT an equivalent replacement, but 
may work in simple class hierarchies. 
[Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43].

> Using --save_main_session fails on Python 3 when main module has invocations 
> of superclass method using 'super' .
> -
>
> Key: BEAM-6158
> URL: https://issues.apache.org/jira/browse/BEAM-6158
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Mark Liu
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> A typical manifestation of this failure, which can be observed on several 
> Beam examples:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 164, in 
> run()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 158, in run 
> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>  
> self.run().wait_until_finish()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1338, in wait_until_finish   
> (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:   
>  
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.p

[jira] [Assigned] (BEAM-7952) Make the input queue of the input buffer in Python SDK Harness size limited.

2019-11-21 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned BEAM-7952:
-

Assignee: sunjincheng

> Make the input queue of the input buffer in Python SDK Harness size limited.
> 
>
> Key: BEAM-7952
> URL: https://issues.apache.org/jira/browse/BEAM-7952
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>
> In the Python SDK harness, the input queue of the input buffer is neither 
> size limited nor configurable. This may become a problem if the data 
> production rate exceeds the data consumption rate.
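
To illustrate the concern in the description above, here is a minimal sketch of a size-limited queue applying back-pressure to a fast producer. It uses only the Python standard library and is not the SDK harness code; the function name `run_bounded_pipeline` and the limit of 4 are assumptions made for the example.

```python
import queue
import threading


def run_bounded_pipeline(num_items, max_size):
    """Pushes num_items through a bounded queue; returns (consumed, peak_depth)."""
    # put() blocks once max_size items are waiting, so memory use stays bounded.
    buffer = queue.Queue(maxsize=max_size)
    consumed = []
    peak_depth = [0]
    done = object()  # sentinel marking the end of the input

    def producer():
        for i in range(num_items):
            buffer.put(i)  # back-pressure: waits while the queue is full
            peak_depth[0] = max(peak_depth[0], buffer.qsize())
        buffer.put(done)

    def consumer():
        while True:
            item = buffer.get()
            if item is done:
                return
            consumed.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed, peak_depth[0]


# max_size=4 is a hypothetical value; the issue asks for it to be configurable.
consumed, peak = run_bounded_pipeline(num_items=100, max_size=4)
```

With an unbounded `queue.Queue()` the producer would never block, and the queue could grow without limit whenever the consumer falls behind.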





[jira] [Commented] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-21 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979765#comment-16979765
 ] 

sunjincheng commented on BEAM-8733:
---

I want to confirm a few things with you before making changes, as I'm still not 
quite familiar with Beam. As I understand it, the registration in the 
Java SDK harness is also asynchronous 
(https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/BeamFnControlClient.java#L138).
Have I missed something? (I am not arguing, I just want to have the correct 
understanding.) :)
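
For readers following the thread, here is a rough sketch of how an asynchronous registration could produce a KeyError like the one in this issue, and one possible way to guard a lookup against it. This is not the code in sdk_worker.py; the class `DescriptorRegistry` and its methods are hypothetical names used only for illustration.

```python
import threading


class DescriptorRegistry(object):
    """Hypothetical registry whose entries may arrive after the first lookup."""

    def __init__(self):
        self._fns = {}
        self._cond = threading.Condition()

    def register(self, descriptor_id, descriptor):
        with self._cond:
            self._fns[descriptor_id] = descriptor
            self._cond.notify_all()  # wake any lookups waiting for this id

    def get_nowait(self, descriptor_id):
        # Plain dict lookup: raises KeyError if registration has not happened yet.
        return self._fns[descriptor_id]

    def get(self, descriptor_id, timeout=5.0):
        # Waits for the registration instead of failing immediately.
        with self._cond:
            found = self._cond.wait_for(
                lambda: descriptor_id in self._fns, timeout)
            if not found:
                raise KeyError(descriptor_id)
            return self._fns[descriptor_id]


registry = DescriptorRegistry()

# A lookup that runs before the registration fails with KeyError.
try:
    registry.get_nowait('-47')
    raced = False
except KeyError:
    raced = True

# The registration arrives slightly later, on another thread.
threading.Timer(0.05, registry.register, args=('-47', 'bundle descriptor')).start()
descriptor = registry.get('-47')  # blocks until the registration is visible
```

Whether waiting is the right behaviour for the harness is exactly the open question here; the sketch only shows the shape of the race.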

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>
> The issue was reported by [~chamikara]; the error message is as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004





[jira] [Assigned] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-21 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng reassigned BEAM-8733:
-

Assignee: sunjincheng

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>
> The issue was reported by [~chamikara]; the error message is as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004





[jira] [Comment Edited] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super' .

2019-11-21 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919945#comment-16919945
 ] 

Valentyn Tymofieiev edited comment on BEAM-6158 at 11/22/19 1:40 AM:
-

The error happens when the main pipeline module has class methods that refer to 
superclass methods using super(). A reference to super in the method code 
creates a cyclical reference inside the object, which dill currently handles 
by pickling objects by reference. This approach does not work for restoring a 
pickled main session, since object classes need to be defined at the moment 
of unpickling. This issue will be addressed after 
[https://github.com/uqfoundation/dill/issues/300] is fixed or we start using 
CloudPickle as a pickler, which is being investigated in BEAM-8123.

In the meantime, the following workarounds are available:
 - don't use super() in the main module.
 - restructure the pipeline so that the pipeline code does not depend on the 
entities defined in the main module, and don't pass --save_main_session.
 - refer to superclass methods in the main module via 
SuperClassName.method(self, ...). This is NOT an equivalent replacement, but 
may work in simple class hierarchies. 
[Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43].
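
The last workaround, and the reason it is not an equivalent replacement, can be sketched in plain Python. The class names below are made up for illustration and do not come from Beam; the example only shows how an explicit SuperClassName.method(self, ...) call differs from super() once a hierarchy has more than one path to a base class.

```python
class A(object):
    def who(self):
        return ['A']


class C(A):
    def who(self):
        return ['C'] + super().who()


# Variant 1: cooperative super() calls follow the method resolution order.
class B(A):
    def who(self):
        return ['B'] + super().who()


class D(B, C):
    def who(self):
        return ['D'] + super().who()


# Variant 2: the workaround, naming the superclass explicitly.
class B2(A):
    def who(self):
        return ['B'] + A.who(self)


class D2(B2, C):
    def who(self):
        return ['D'] + B2.who(self)


with_super = D().who()      # ['D', 'B', 'C', 'A']
with_explicit = D2().who()  # ['D', 'B', 'A'], C is skipped
```

In a single-inheritance chain both variants return the same result, which is why the workaround may be acceptable in simple class hierarchies.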


was (Author: tvalentyn):
The error is happens when main pipeline module has class methods that refer to 
superclass methods using super(). A reference to super in the method code 
creates a cyclical reference inside the object, which dill  currently handles 
via pickling objects by reference. Such approach does not work for restoring a 
pickled  a main session, since object classes need to be defined at the moment 
of unpickling . This issue will be addressed after  
https://github.com/uqfoundation/dill/issues/300. is fixed or we start using 
CloudPickle as a pickler, which is investigated in BEAM-8123. 

In the meantime following workarounds are available:
- don't use super() in the main module.
- refer to superclass methods via SuperClassName.method(self, ...). This is NOT 
an equivalent replacement, but may work in simple class hierarchies. 

> Using --save_main_session fails on Python 3 when main module has invocations 
> of superclass method using 'super' .
> -
>
> Key: BEAM-6158
> URL: https://issues.apache.org/jira/browse/BEAM-6158
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Mark Liu
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> A typical manifestation of this failure, which can be observed on several 
> Beam examples:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 164, in 
> run()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 158, in run 
> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>  
> self.run().wait_until_finish()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1338, in wait_until_finish   
> (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:   
>  
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run
> self._load_main_session(self.local_staging_directory)
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session
>
> pickler.load_session(session_file)
> 

[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=347872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347872
 ]

ASF GitHub Bot logged work on BEAM-7594:


Author: ASF GitHub Bot
Created on: 22/Nov/19 01:35
Start Date: 22/Nov/19 01:35
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10194: [BEAM-7594] Fix flaky 
filename generation
URL: https://github.com/apache/beam/pull/10194#issuecomment-557348453
 
 
   CC: @tvalentyn 
 



Issue Time Tracking
---

Worklog Id: (was: 347872)
Time Spent: 0.5h  (was: 20m)

> test_read_from_text_with_file_name_file_pattern is flaky
> 
>
> Key: BEAM-7594
> URL: https://issues.apache.org/jira/browse/BEAM-7594
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Critical
>  Labels: currently-failing, flake
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> cc: [~lcaggio] [~chamikara]
> {noformat}
> 22:05:08 
> ==
> 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern 
> (apache_beam.io.textio_test.TextSourceTest)
> 22:05:08 
> --
> 22:05:08 Traceback (most recent call last):
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py",
>  line 517, in test_read_from_text_with_file_name_file_pattern
> 22:05:08 pipeline.run()
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 22:05:08 else test_runner_api))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 22:05:08 self._options).run(False)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 22:05:08 return self.runner.run_pipeline(self, self._options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 22:05:08 return runner.run_pipeline(pipeline, options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 294, in run_pipeline
> 22:05:08 default_environment=self._default_environment))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 301, in run_via_runner_api
> 22:05:08 return self.run_stages(stage_context, stages)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 383, in run_stages
> 22:05:08 stage_context.safe_coders)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 655, in _run_stage
> 22:05:08 result, splits = bundle_manager.process_bundle(data_input, 
> data_output)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1471, in process_bundle
> 22:05:08 result_future = 
> self._controller.control_handler.push(process_bundle_req)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 990, in pu

[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=347871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347871
 ]

ASF GitHub Bot logged work on BEAM-7594:


Author: ASF GitHub Bot
Created on: 22/Nov/19 01:34
Start Date: 22/Nov/19 01:34
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10194: [BEAM-7594] Fix flaky 
filename generation
URL: https://github.com/apache/beam/pull/10194#issuecomment-557348291
 
 
   R: @pabloem  
 



Issue Time Tracking
---

Worklog Id: (was: 347871)
Time Spent: 20m  (was: 10m)

> test_read_from_text_with_file_name_file_pattern is flaky
> 
>
> Key: BEAM-7594
> URL: https://issues.apache.org/jira/browse/BEAM-7594
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Critical
>  Labels: currently-failing, flake
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> cc: [~lcaggio] [~chamikara]
> {noformat}
> 22:05:08 
> ==
> 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern 
> (apache_beam.io.textio_test.TextSourceTest)
> 22:05:08 
> --
> 22:05:08 Traceback (most recent call last):
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py",
>  line 517, in test_read_from_text_with_file_name_file_pattern
> 22:05:08 pipeline.run()
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 22:05:08 else test_runner_api))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 22:05:08 self._options).run(False)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 22:05:08 return self.runner.run_pipeline(self, self._options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 22:05:08 return runner.run_pipeline(pipeline, options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 294, in run_pipeline
> 22:05:08 default_environment=self._default_environment))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 301, in run_via_runner_api
> 22:05:08 return self.run_stages(stage_context, stages)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 383, in run_stages
> 22:05:08 stage_context.safe_coders)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 655, in _run_stage
> 22:05:08 result, splits = bundle_manager.process_bundle(data_input, 
> data_output)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1471, in process_bundle
> 22:05:08 result_future = 
> self._controller.control_handler.push(process_bundle_req)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 990, in push
>

[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347869
 ]

ASF GitHub Bot logged work on BEAM-8651:


Author: ASF GitHub Bot
Created on: 22/Nov/19 01:31
Start Date: 22/Nov/19 01:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10185: [BEAM-8651] 
Cherrypick PR #10167 to the release branch. 
URL: https://github.com/apache/beam/pull/10185#issuecomment-557347567
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347869)
Time Spent: 2h 50m  (was: 2h 40m)

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: beam8651.py
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
> File 
> "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", 
> line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)  
>  
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, 
> in loads
>     return dill.loads(s)  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>  
>     return load(file, ignore) 
>  
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load 
>  
>     obj = pik.load()  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class   
>  
>     return StockUnpickler.find_class(self, module, name)  
>  
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate workarounds for 3.5 and 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E
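
The workaround suggested in the quoted message (import all needed modules during worker initialization, before any unpickling threads start) can be sketched as follows. This is not Beam code: the helper `preload_modules` and the module list are assumptions for the example, and in practice the list would name the modules that define the pickled classes.

```python
import importlib
import pickle
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical list of modules that pickled objects will refer to.
MODULES_TO_PRELOAD = ['decimal', 'fractions']


def preload_modules(module_names):
    """Imports each module eagerly, in the calling thread only."""
    for name in module_names:
        importlib.import_module(name)


# Step 1: import everything up front, before worker threads exist.
preload_modules(MODULES_TO_PRELOAD)

# Step 2: concurrent unpickling now finds fully initialized modules in
# sys.modules, so the unpickler itself does not trigger any import.
payload = pickle.dumps(sys.modules['decimal'].Decimal('1.5'))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(pickle.loads, [payload] * 8))
```

This only sidesteps imports performed by the unpickler; it does not address other possible causes of the AttributeError discussed in this issue.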





[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=347870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347870
 ]

ASF GitHub Bot logged work on BEAM-7594:


Author: ASF GitHub Bot
Created on: 22/Nov/19 01:31
Start Date: 22/Nov/19 01:31
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10194: [BEAM-7594] Fix 
flaky filename generation
URL: https://github.com/apache/beam/pull/10194
 
 
   See [this 
comment](https://issues.apache.org/jira/browse/BEAM-7594?focusedCommentId=16979737&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16979737)
 for description of flake.
   
   
   

[jira] [Assigned] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-11-21 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri reassigned BEAM-7594:
---

Assignee: Udi Meiri

> test_read_from_text_with_file_name_file_pattern is flaky
> 
>
> Key: BEAM-7594
> URL: https://issues.apache.org/jira/browse/BEAM-7594
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Critical
>  Labels: currently-failing, flake
> Fix For: Not applicable
>
>
> cc: [~lcaggio] [~chamikara]
> {noformat}
> 22:05:08 
> ==
> 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern 
> (apache_beam.io.textio_test.TextSourceTest)
> 22:05:08 
> --
> 22:05:08 Traceback (most recent call last):
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py",
>  line 517, in test_read_from_text_with_file_name_file_pattern
> 22:05:08 pipeline.run()
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 22:05:08 else test_runner_api))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 22:05:08 self._options).run(False)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 22:05:08 return self.runner.run_pipeline(self, self._options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 22:05:08 return runner.run_pipeline(pipeline, options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 294, in run_pipeline
> 22:05:08 default_environment=self._default_environment))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 301, in run_via_runner_api
> 22:05:08 return self.run_stages(stage_context, stages)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 383, in run_stages
> 22:05:08 stage_context.safe_coders)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 655, in _run_stage
> 22:05:08 result, splits = bundle_manager.process_bundle(data_input, 
> data_output)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1471, in process_bundle
> 22:05:08 result_future = 
> self._controller.control_handler.push(process_bundle_req)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 990, in push
> 22:05:08 response = self.worker.do_instruction(request)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 342, in do_instruction
> 22:05:08 request.instruction_id)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 368, in process_bundle
> 22:05:08 bundle_processor.process_bundle(instruction_id))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 593, in process_bundle
> 22:05:08 dat

[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=347844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347844
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:57
Start Date: 22/Nov/19 00:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10035: [BEAM-8581] and 
[BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#issuecomment-557339905
 
 
   Run Python Precommit
 



Issue Time Tracking
---

Worklog Id: (was: 347844)
Time Spent: 4.5h  (was: 4h 20m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.
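A minimal sketch of the fix described above, not the actual patch. The names `Window` and `ontime_hold` are hypothetical, and timestamps are plain integers. It only illustrates placing a hold at end of window - 1 when the trigger reports an ontime pane.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Window:
    """Hypothetical interval window [start, end) with integer timestamps."""
    start: int
    end: int


def ontime_hold(window: Window, has_ontime_pane: bool) -> Optional[int]:
    """Return the watermark hold for the window, or None if no hold is needed."""
    if has_ontime_pane:
        # Hold the watermark just before the end of the window so the
        # ontime pane is not treated as late.
        return window.end - 1
    return None
```

With the hold in place, the watermark cannot pass the end of the window before the ontime pane fires.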





[jira] [Comment Edited] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-11-21 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979737#comment-16979737
 ] 

Udi Meiri edited comment on BEAM-7594 at 11/22/19 12:52 AM:


Another failure. This time I noticed that there were 2 failures in 2 different tox 
environments (py35 and py36):
{code}
10:52:42   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/util.py",
 line 144, in _equal
10:52:42 'Failed assert: %r == %r' % (expected, actual))
10:52:42 apache_beam.testing.util.BeamAssertException: Failed assert: 
[('/tmp/20191121183751hhm9i89k', 'line0'), ('/tmp/20191121183751hhm9i89k', 
'line1'), ('/tmp/20191121183751hhm9i89k', 'line2'), 
('/tmp/20191121183751hhm9i89k', 'line3'), ('/tmp/20191121183751hhm9i89k', 
'line4'), ('/tmp/20191121183751l7ue7cnm', 'line0'), 
('/tmp/20191121183751l7ue7cnm', 'line1'), ('/tmp/20191121183751l7ue7cnm', 
'line2'), ('/tmp/20191121183751l7ue7cnm', 'line3'), 
('/tmp/20191121183751l7ue7cnm', 'line4')] == [('/tmp/20191121183751l7ue7cnm', 
'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), 
('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 
'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4'), 
('/tmp/201911211837512aj0mm6o', 'line0'), ('/tmp/201911211837512aj0mm6o', 
'line1'), ('/tmp/201911211837512aj0mm6o', 'line2'), 
('/tmp/201911211837512aj0mm6o', 'line3'), ('/tmp/201911211837512aj0mm6o', 
'line4'), ('/tmp/20191121183751ahzccono', 'line0'), 
('/tmp/20191121183751ahzccono', 'line1'), ('/tmp/20191121183751ahzccono', 
'line2'), ('/tmp/20191121183751ahzccono', 'line3'), 
('/tmp/20191121183751ahzccono', 'line4'), ('/tmp/20191121183751hhm9i89k', 
'line0'), ('/tmp/20191121183751hhm9i89k', 'line1'), 
('/tmp/20191121183751hhm9i89k', 'line2'), ('/tmp/20191121183751hhm9i89k', 
'line3'), ('/tmp/20191121183751hhm9i89k', 'line4')] [while running 
'assert_that/Match']
{code}
{code}
10:52:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/testing/util.py",
 line 144, in _equal
10:52:18 'Failed assert: %r == %r' % (expected, actual))
10:52:18 apache_beam.testing.util.BeamAssertException: Failed assert: 
[('/tmp/20191121183751ahzccono', 'line0'), ('/tmp/20191121183751ahzccono', 
'line1'), ('/tmp/20191121183751ahzccono', 'line2'), 
('/tmp/20191121183751ahzccono', 'line3'), ('/tmp/20191121183751ahzccono', 
'line4'), ('/tmp/201911211837512aj0mm6o', 'line0'), 
('/tmp/201911211837512aj0mm6o', 'line1'), ('/tmp/201911211837512aj0mm6o', 
'line2'), ('/tmp/201911211837512aj0mm6o', 'line3'), 
('/tmp/201911211837512aj0mm6o', 'line4')] == [('/tmp/20191121183751l7ue7cnm', 
'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), 
('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 
'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4'), 
('/tmp/201911211837512aj0mm6o', 'line0'), ('/tmp/201911211837512aj0mm6o', 
'line1'), ('/tmp/201911211837512aj0mm6o', 'line2'), 
('/tmp/201911211837512aj0mm6o', 'line3'), ('/tmp/201911211837512aj0mm6o', 
'line4'), ('/tmp/20191121183751ahzccono', 'line0'), 
('/tmp/20191121183751ahzccono', 'line1'), ('/tmp/20191121183751ahzccono', 
'line2'), ('/tmp/20191121183751ahzccono', 'line3'), 
('/tmp/20191121183751ahzccono', 'line4'), ('/tmp/20191121183751hhm9i89k', 
'line0'), ('/tmp/20191121183751hhm9i89k', 'line1'), 
('/tmp/20191121183751hhm9i89k', 'line2'), ('/tmp/20191121183751hhm9i89k', 
'line3'), ('/tmp/20191121183751hhm9i89k', 'line4')] [while running 
'assert_that/Match']
{code}

All the filenames above share the prefix: '/tmp/20191121183751'. The code for 
this prefix:
{code}
prefix = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
{code}
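The traces above suggest that the two tox environments picked up each other's files. A small sketch of how that could happen, assuming both runs write into a shared directory and read back with a glob on the prefix. The helper names and the fixed timestamp are illustrative only, and the file suffixes are taken from the failure output.

```python
import datetime
import glob
import os
import tempfile


def timestamp_prefix(now: datetime.datetime) -> str:
    """Same format string as the test: the resolution is one second."""
    return now.strftime("%Y%m%d%H%M%S")


def write_files(directory: str, prefix: str, suffixes: list) -> None:
    """Create one small file per suffix, named prefix + suffix."""
    for suffix in suffixes:
        with open(os.path.join(directory, prefix + suffix), "w") as f:
            f.write("line0\n")


def demo_collision() -> int:
    """Two simulated test runs in the same second share one glob pattern."""
    now = datetime.datetime(2019, 11, 21, 18, 37, 51)
    with tempfile.TemporaryDirectory() as shared_tmp:
        # Run A (e.g. py35) and run B (e.g. py36) compute the same prefix,
        # because they start within the same wall-clock second.
        prefix_a = timestamp_prefix(now)
        prefix_b = timestamp_prefix(now + datetime.timedelta(milliseconds=400))
        write_files(shared_tmp, prefix_a, ["hhm9i89k", "l7ue7cnm"])
        write_files(shared_tmp, prefix_b, ["ahzccono", "2aj0mm6o"])
        # Run A expects to read back only its own two files.
        return len(glob.glob(os.path.join(shared_tmp, prefix_a + "*")))
```

Here `demo_collision()` counts four matching files where run A wrote only two, which mirrors the doubled actual output in the assertion failures.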



[jira] [Commented] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-11-21 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979737#comment-16979737
 ] 

Udi Meiri commented on BEAM-7594:
-

Another failure. This time I noticed that there were 2 failures in 2 different tox 
environments (py35 and py36):
{code}
10:52:42   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/util.py",
 line 144, in _equal
10:52:42 'Failed assert: %r == %r' % (expected, actual))
10:52:42 apache_beam.testing.util.BeamAssertException: Failed assert: 
[('/tmp/20191121183751hhm9i89k', 'line0'), ('/tmp/20191121183751hhm9i89k', 
'line1'), ('/tmp/20191121183751hhm9i89k', 'line2'), 
('/tmp/20191121183751hhm9i89k', 'line3'), ('/tmp/20191121183751hhm9i89k', 
'line4'), ('/tmp/20191121183751l7ue7cnm', 'line0'), 
('/tmp/20191121183751l7ue7cnm', 'line1'), ('/tmp/20191121183751l7ue7cnm', 
'line2'), ('/tmp/20191121183751l7ue7cnm', 'line3'), 
('/tmp/20191121183751l7ue7cnm', 'line4')] == [('/tmp/20191121183751l7ue7cnm', 
'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), 
('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 
'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4'), 
('/tmp/201911211837512aj0mm6o', 'line0'), ('/tmp/201911211837512aj0mm6o', 
'line1'), ('/tmp/201911211837512aj0mm6o', 'line2'), 
('/tmp/201911211837512aj0mm6o', 'line3'), ('/tmp/201911211837512aj0mm6o', 
'line4'), ('/tmp/20191121183751ahzccono', 'line0'), 
('/tmp/20191121183751ahzccono', 'line1'), ('/tmp/20191121183751ahzccono', 
'line2'), ('/tmp/20191121183751ahzccono', 'line3'), 
('/tmp/20191121183751ahzccono', 'line4'), ('/tmp/20191121183751hhm9i89k', 
'line0'), ('/tmp/20191121183751hhm9i89k', 'line1'), 
('/tmp/20191121183751hhm9i89k', 'line2'), ('/tmp/20191121183751hhm9i89k', 
'line3'), ('/tmp/20191121183751hhm9i89k', 'line4')] [while running 
'assert_that/Match']
{code}
{code}
10:52:18   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/testing/util.py",
 line 144, in _equal
10:52:18 'Failed assert: %r == %r' % (expected, actual))
10:52:18 apache_beam.testing.util.BeamAssertException: Failed assert: 
[('/tmp/20191121183751ahzccono', 'line0'), ('/tmp/20191121183751ahzccono', 
'line1'), ('/tmp/20191121183751ahzccono', 'line2'), 
('/tmp/20191121183751ahzccono', 'line3'), ('/tmp/20191121183751ahzccono', 
'line4'), ('/tmp/201911211837512aj0mm6o', 'line0'), 
('/tmp/201911211837512aj0mm6o', 'line1'), ('/tmp/201911211837512aj0mm6o', 
'line2'), ('/tmp/201911211837512aj0mm6o', 'line3'), 
('/tmp/201911211837512aj0mm6o', 'line4')] == [('/tmp/20191121183751l7ue7cnm', 
'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), 
('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 
'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4'), 
('/tmp/201911211837512aj0mm6o', 'line0'), ('/tmp/201911211837512aj0mm6o', 
'line1'), ('/tmp/201911211837512aj0mm6o', 'line2'), 
('/tmp/201911211837512aj0mm6o', 'line3'), ('/tmp/201911211837512aj0mm6o', 
'line4'), ('/tmp/20191121183751ahzccono', 'line0'), 
('/tmp/20191121183751ahzccono', 'line1'), ('/tmp/20191121183751ahzccono', 
'line2'), ('/tmp/20191121183751ahzccono', 'line3'), 
('/tmp/20191121183751ahzccono', 'line4'), ('/tmp/20191121183751hhm9i89k', 
'line0'), ('/tmp/20191121183751hhm9i89k', 'line1'), 
('/tmp/20191121183751hhm9i89k', 'line2'), ('/tmp/20191121183751hhm9i89k', 
'line3'), ('/tmp/20191121183751hhm9i89k', 'line4')] [while running 
'assert_that/Match']
{code}

Notice that all the filenames share the prefix '/tmp/20191121183751'. The code 
for this prefix:
{code}
prefix = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
{code}
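One possible way to avoid sharing a prefix between concurrent runs, offered only as a sketch and not as the fix adopted for this issue: give each run its own directory via `tempfile.mkdtemp`. The helper name `make_isolated_pattern` and the file names are hypothetical.

```python
import glob
import os
import tempfile


def make_isolated_pattern(num_files: int = 2) -> str:
    """Create files in a fresh directory and return a glob pattern for them."""
    # mkdtemp creates a new, uniquely named directory for each caller,
    # so two runs starting in the same second cannot share a pattern.
    directory = tempfile.mkdtemp()
    for i in range(num_files):
        with open(os.path.join(directory, "input%d.txt" % i), "w") as f:
            f.write("line0\n")
    return os.path.join(directory, "*")
```

The caller remains responsible for deleting the directory after the test.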

> test_read_from_text_with_file_name_file_pattern is flaky
> 
>
> Key: BEAM-7594
> URL: https://issues.apache.org/jira/browse/BEAM-7594
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Priority: Critical
>  Labels: currently-failing, flake
> Fix For: Not applicable
>
>
> cc: [~lcaggio] [~chamikara]
> {noformat}
> 22:05:08 
> ==
> 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern 
> (apache_beam.io.textio_test.TextSourceTest)
> 22:05:08 
> --
> 22:05:08 Traceback (most recent call last):
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py",
>  line 517, in test_read_from_text_with_file_name_file_pattern
> 22:05:08 pipeline.run()
> 22:05:08  

[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347837
 ]

ASF GitHub Bot logged work on BEAM-7278:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:40
Start Date: 22/Nov/19 00:40
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10184: [BEAM-7278, 
BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading 
dependencies.
URL: https://github.com/apache/beam/pull/10184
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 347837)
Time Spent: 3h 10m  (was: 3h)

> Upgrade some Beam dependencies
> --
>
> Key: BEAM-7278
> URL: https://issues.apache.org/jira/browse/BEAM-7278
> Project: Beam
>  Issue Type: Task
>  Components: dependencies
>Reporter: Etienne Chauchot
>Assignee: Mujuzi Moses
>Priority: Critical
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Some dependencies need to be upgraded.





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347835
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:37
Start Date: 22/Nov/19 00:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10126: [BEAM-8619] 
Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#discussion_r349383237
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java
 ##
 @@ -363,16 +392,43 @@ private BundleProcessor 
createBundleProcessor(BeamFnApi.InstructionRequest reque
   tearDownFunctions::add,
   splitListener);
 }
-return BundleProcessor.create(
-startFunctionRegistry,
-finishFunctionRegistry,
-tearDownFunctions,
-allResiduals,
-pCollectionConsumerRegistry,
-metricsContainerRegistry,
-stateTracker,
-beamFnStateClient,
-queueingClient);
+return bundleProcessor;
+  }
+
+  /** A cache for {@link BundleProcessor}s. */
+  private static class BundleProcessorCache {
+
+private final Map<String, ConcurrentLinkedQueue<BundleProcessor>> cachedBundleProcessors;
+
+BundleProcessorCache() {
+  this.cachedBundleProcessors = Maps.newConcurrentMap();
+}
+
+/**
+ * Get a {@link BundleProcessor} from the cache if it's available. 
Otherwise, create one using
+ * the specified bundleProcessorSupplier.
+ */
+BundleProcessor get(
+String bundleDescriptorId, Supplier<BundleProcessor> bundleProcessorSupplier) {
+  ConcurrentLinkedQueue<BundleProcessor> bundleProcessors =
+  cachedBundleProcessors.computeIfAbsent(
+  bundleDescriptorId, descriptorId -> new ConcurrentLinkedQueue<>());
+  BundleProcessor bundleProcessor = bundleProcessors.poll();
+  if (bundleProcessor != null) {
+return bundleProcessor;
+  }
+
+  return bundleProcessorSupplier.get();
+}
+
+/**
+ * Add a {@link BundleProcessor} to cache. The {@link BundleProcessor} 
will be reset before
+ * added to the cache.
 
 Review comment:
   `added to the cache.` -> `being added to the cache.`
 



Issue Time Tracking
---

Worklog Id: (was: 347835)
Time Spent: 1.5h  (was: 1h 20m)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Per the mailing list discussion [1], the teardown of DoFns should be 
> supported in the portability framework. It happens in two places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon the 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347833
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:37
Start Date: 22/Nov/19 00:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10126: [BEAM-8619] 
Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#discussion_r349380364
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/CounterCell.java
 ##
 @@ -51,6 +51,12 @@ public CounterCell(MetricName name) {
 this.name = name;
   }
 
+  @Override
+  public void reset() {
+dirty.afterModification();
 
 Review comment:
   I don't believe reset() should make this dirty, since reset() should return this 
to the state it would be in if it were newly constructed.
 



Issue Time Tracking
---

Worklog Id: (was: 347833)
Time Spent: 1h 20m  (was: 1h 10m)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Per the mailing list discussion [1], the teardown of DoFns should be 
> supported in the portability framework. It happens in two places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon the 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347836
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:37
Start Date: 22/Nov/19 00:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10126: [BEAM-8619] 
Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#discussion_r349385020
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java
 ##
 @@ -429,6 +434,22 @@ void release(String bundleDescriptorId, BundleProcessor 
bundleProcessor) {
   bundleProcessor.reset();
   cachedBundleProcessors.get(bundleDescriptorId).add(bundleProcessor);
 }
+
+/** Shutdown all the cached {@link BundleProcessor}s, running the 
tearDown() functions. */
+void shutdown() throws Exception {
+  for (ConcurrentLinkedQueue<BundleProcessor> bundleProcessors :
+  cachedBundleProcessors.values()) {
+for (BundleProcessor bundleProcessor : bundleProcessors) {
+  // Need to reverse this since we want to call teardown in 
topological order.
 
 Review comment:
   It is not necessary to reverse the list since teardown can't produce output 
and should not meaningfully impact other instances. It's the same with 
startInstance, where the order doesn't matter.
 



Issue Time Tracking
---

Worklog Id: (was: 347836)
Time Spent: 1.5h  (was: 1h 20m)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Per the mailing list discussion [1], the teardown of DoFns should be 
> supported in the portability framework. It happens in two places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon the 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347834&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347834
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:37
Start Date: 22/Nov/19 00:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10126: [BEAM-8619] 
Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#discussion_r349381333
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/GaugeCell.java
 ##
 @@ -50,6 +50,12 @@ public GaugeCell(MetricName name) {
 this.name = name;
   }
 
+  @Override
+  public void reset() {
+dirty.afterModification();
 
 Review comment:
   I don't believe reset() should make this dirty, since reset() should return this 
to the state it would be in if it were newly constructed.
 



Issue Time Tracking
---

Worklog Id: (was: 347834)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Per the mailing list discussion [1], the teardown of DoFns should be 
> supported in the portability framework. It happens in two places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon the 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347832
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:37
Start Date: 22/Nov/19 00:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10126: [BEAM-8619] 
Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#discussion_r349380027
 
 

 ##
 File path: 
sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
 ##
 @@ -226,6 +226,7 @@ public void testUsingUserState() throws Exception {
 consumers,
 startFunctionRegistry,
 finishFunctionRegistry,
+new ArrayList<>()::add,
 
 Review comment:
   after
   ```
   Iterables.getOnlyElement(finishFunctionRegistry.getFunctions()).run();
   assertThat(mainOutputValues, empty());
   ```
   add something like:
   ```
   Iterables.getOnlyElement(tearDownFunctionRegistry.getFunctions()).run();
   assertThat(mainOutputValues, empty());
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 347832)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Per the mailing list discussion [1], the teardown of DoFns should be 
> supported in the portability framework. It happens in two places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon the 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347831
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:37
Start Date: 22/Nov/19 00:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10126: [BEAM-8619] 
Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#discussion_r349380409
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/DistributionCell.java
 ##
 @@ -52,6 +52,12 @@ public DistributionCell(MetricName name) {
 this.name = name;
   }
 
+  @Override
+  public void reset() {
+dirty.afterModification();
 
 Review comment:
   I don't believe reset() should make this dirty, since reset() should return this 
to the state it would be in if it were newly constructed.
 



Issue Time Tracking
---

Worklog Id: (was: 347831)
Time Spent: 1h 10m  (was: 1h)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Per the mailing list discussion [1], the teardown of DoFns should be 
> supported in the portability framework. It happens in two places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon the 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=347824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347824
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:26
Start Date: 22/Nov/19 00:26
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9915: [BEAM-7746] Add 
python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#issuecomment-557332997
 
 
   We're so close.  Just a few more.   Thanks for your hard work getting 
through all of this!
 



Issue Time Tracking
---

Worklog Id: (was: 347824)
Time Spent: 27h 40m  (was: 27.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 27h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347823&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347823
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:19
Start Date: 22/Nov/19 00:19
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10173: [BEAM-8575] 
Added two unit tests in CombineTest class to test AccumulatingCombine
URL: https://github.com/apache/beam/pull/10173#issuecomment-557331371
 
 
   In the Java parity file, quite a few tests are very similar. Some of them 
are called "SimpleCombine", and some of them are called "BasicCombine", and 
some are called "AccumulatingCombine". Their only difference is the CombineFn 
they use.
   
   The Java parity:
   
https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L1448
   
   So here in Python I renamed the tests according to the CombineFn they use:
   test_MeanCombineFn_combine
   test_MeanCombineFn_combine_empty
 



Issue Time Tracking
---

Worklog Id: (was: 347823)
Time Spent: 18h 10m  (was: 18h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 18h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=347818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347818
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:12
Start Date: 22/Nov/19 00:12
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10035: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r349379106
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -965,18 +1000,21 @@ class TriggerDriver(with_metaclass(ABCMeta, object)):
   """Breaks a series of bundle and timer firings into window (pane)s."""
 
   @abstractmethod
-  def process_elements(self, state, windowed_values, output_watermark):
+  def process_elements(self, state, windowed_values, output_watermark,
+   input_watermark=MIN_TIMESTAMP):
 pass
 
   @abstractmethod
-  def process_timer(self, window_id, name, time_domain, timestamp, state):
+  def process_timer(self, window_id, name, time_domain, timestamp, state,
+input_watermark=None):
 pass
 
   def process_entire_key(
-  self, key, windowed_values, output_watermark=MIN_TIMESTAMP):
+  self, key, windowed_values, input_watermark=MIN_TIMESTAMP,
 
 Review comment:
   Have we verified we found all callers of this code? Inserting a (default) 
argument before existing arguments could result in an off-by-one error for 
unmodified callers. 
   
   Also, the ordering should be consistent with `process_elements()`.
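
The off-by-one concern above can be illustrated with a minimal, self-contained sketch. It does not use Beam: `MIN_TIMESTAMP` is a stand-in constant, the three function names are hypothetical, and the second parameter of the "inserted" variant is an assumption, since the diff above is cut off after `input_watermark=MIN_TIMESTAMP,`.

```python
# Stand-in for Beam's MIN_TIMESTAMP; an assumption made for this sketch.
MIN_TIMESTAMP = float("-inf")


def process_entire_key_before(key, windowed_values,
                              output_watermark=MIN_TIMESTAMP):
    """Shape of the signature before the change."""
    return {"output_watermark": output_watermark}


def process_entire_key_inserted(key, windowed_values,
                                input_watermark=MIN_TIMESTAMP,
                                output_watermark=MIN_TIMESTAMP):
    """New default argument placed ahead of the existing one (assumed)."""
    return {"input_watermark": input_watermark,
            "output_watermark": output_watermark}


def process_entire_key_appended(key, windowed_values,
                                output_watermark=MIN_TIMESTAMP,
                                input_watermark=MIN_TIMESTAMP):
    """New default argument appended, in the same order as process_elements()."""
    return {"output_watermark": output_watermark,
            "input_watermark": input_watermark}


# An unmodified caller that passes the watermark positionally.
before = process_entire_key_before("k", [], 42)
inserted = process_entire_key_inserted("k", [], 42)
appended = process_entire_key_appended("k", [], 42)
```

With the inserted variant, the positional `42` silently lands in `input_watermark` and `output_watermark` falls back to its default. Appending the new argument keeps existing positional callers working and matches the argument order of `process_elements()`.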
 



Issue Time Tracking
---

Worklog Id: (was: 347818)
Time Spent: 4h 20m  (was: 4h 10m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=347817&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347817
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:12
Start Date: 22/Nov/19 00:12
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10035: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r349380006
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -1036,14 +1074,17 @@ class BatchGlobalTriggerDriver(TriggerDriver):
   index=0,
   nonspeculative_index=0)
 
-  def process_elements(self, state, windowed_values, unused_output_watermark):
+  def process_elements(self, state, windowed_values,
+   unused_output_watermark=MIN_TIMESTAMP,
 
 Review comment:
   Let's not provide default values here, as it's unclear what they should be. 
 



Issue Time Tracking
---

Worklog Id: (was: 347817)
Time Spent: 4h 20m  (was: 4h 10m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=347816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347816
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:12
Start Date: 22/Nov/19 00:12
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10035: [BEAM-8581] 
and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r349379753
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -965,18 +1000,21 @@ class TriggerDriver(with_metaclass(ABCMeta, object)):
   """Breaks a series of bundle and timer firings into window (pane)s."""
 
   @abstractmethod
-  def process_elements(self, state, windowed_values, output_watermark):
+  def process_elements(self, state, windowed_values, output_watermark,
+   input_watermark=MIN_TIMESTAMP):
 pass
 
   @abstractmethod
-  def process_timer(self, window_id, name, time_domain, timestamp, state):
+  def process_timer(self, window_id, name, time_domain, timestamp, state,
+input_watermark=None):
 pass
 
   def process_entire_key(
-  self, key, windowed_values, output_watermark=MIN_TIMESTAMP):
+  self, key, windowed_values, input_watermark=MIN_TIMESTAMP,
 
 Review comment:
   Is `MIN_TIMESTAMP` the correct default?
 



Issue Time Tracking
---

Worklog Id: (was: 347816)
Time Spent: 4h 20m  (was: 4h 10m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8804) PCollectionList support in cross-language transforms

2019-11-21 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-8804:
--
Description: Currently, the Beam model has no information on the 
order of input/output PCollections from PTransforms. Therefore, a PCollectionList 
needs to be converted to a PCollectionTuple when it crosses a 
cross-language boundary (or even within the same language, whenever it is 
converted between an in-memory object and proto), and it is impossible to recreate 
the PCollectionList from proto with the original order. A possible workaround is 
to use a PCollectionTuple with integer ids (starting from 0, like indexes) 
instead of a PCollectionList. In that case, we should first define precisely how we 
generate proto from a PCollectionList, since each SDK uses a different convention. 
 (was: Currently, Beam model doesn't have any information on the order of 
output PCollections from PTransforms. So, PCollectionList needs to be converted 
to PCollectionTuple when it goes across the cross-language boundary (or even in 
the same language, when it is converted between in-memory object and proto).)

> PCollectionList support in cross-language transforms
> 
>
> Key: BEAM-8804
> URL: https://issues.apache.org/jira/browse/BEAM-8804
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> Currently, the Beam model has no information on the order of 
> input/output PCollections from PTransforms. Therefore, a PCollectionList needs 
> to be converted to a PCollectionTuple when it crosses a cross-language 
> boundary (or even within the same language, whenever it is converted between 
> an in-memory object and proto), and it is impossible to recreate the PCollectionList 
> from proto with the original order. A possible workaround is to use a 
> PCollectionTuple with integer ids (starting from 0, like indexes) instead of a 
> PCollectionList. In that case, we should first define precisely how we generate 
> proto from a PCollectionList, since each SDK uses a different convention.
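
The integer-id workaround described above can be sketched in plain Python. This is only an illustration under stated assumptions: ordinary strings stand in for PCollections, a dict stands in for the tag-to-PCollection map, and `list_to_tagged` and `tagged_to_list` are hypothetical helper names, not part of any Beam SDK.

```python
def list_to_tagged(pcollections):
    """Tag each item with its position as a string id, starting from "0"."""
    return {str(index): pcoll for index, pcoll in enumerate(pcollections)}


def tagged_to_list(tagged):
    """Rebuild the original order by sorting on the numeric value of each id.

    Sorting the ids as strings would be wrong for ten or more entries,
    because "10" sorts before "2" lexicographically.
    """
    return [tagged[key] for key in sorted(tagged, key=int)]


# Twelve placeholder "PCollections", enough to expose the string-sort pitfall.
original = ["pcoll_%d" % i for i in range(12)]
tagged = list_to_tagged(original)
restored = tagged_to_list(tagged)
```

The point of the sketch is that the order survives only if every SDK agrees on the same convention, here decimal ids starting at 0 that are compared numerically.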



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347814
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:07
Start Date: 22/Nov/19 00:07
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10173: 
[BEAM-8575] Added two unit tests in CombineTest class to test 
AccumulatingCombine
URL: https://github.com/apache/beam/pull/10173#discussion_r349379825
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +395,54 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
+  def test_accumulating_combine(self):
 
 Review comment:
   Removed @attr('ValidatesRunner').
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 347814)
Time Spent: 18h  (was: 17h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347813
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:06
Start Date: 22/Nov/19 00:06
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10173: 
[BEAM-8575] Added two unit tests in CombineTest class to test 
AccumulatingCombine
URL: https://github.com/apache/beam/pull/10173#discussion_r349379585
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +395,54 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
+  def test_accumulating_combine(self):
+with TestPipeline() as p:
+  input = (p
+   | beam.Create([('a', 1),
+  ('a', 1),
+  ('a', 4),
+  ('b', 1),
+  ('b', 13)]))
+  # The mean of all values regardless of key.
+  global_mean = (input
+ | beam.Values()
+ | beam.CombineGlobally(combine.MeanCombineFn()))
+
+  # The (key, mean) pairs for all keys.
+  mean_per_key = (input | beam.CombinePerKey(combine.MeanCombineFn()))
+
+  expected_mean_per_key = [('a', 2), ('b', 7)]
+  assert_that(global_mean, equal_to([4]), label='global mean')
+  assert_that(mean_per_key, equal_to(expected_mean_per_key),
+  label='mean per key')
+
+  @attr('ValidatesRunner')
+  def test_accumulating_combine_empty(self):
+# For each element in a PCollection, if it is float('NaN'), then emits
+# a string 'NaN', otherwise emits str(element).
+class FormatNaNDoFn(beam.DoFn):
+  def process(self, element):
+return ([str(element)], ['NaN'])[math.isnan(element)]
+
+with TestPipeline() as p:
+  input = (p | beam.Create([]))
+
+  # Compute the mean of all values in the PCollection,
+  # then format the mean. Since the Pcollection is empty,
+  # the mean is float('NaN'), and is formatted to be a string 'NaN'.
+  global_mean = (input
+ | beam.Values()
+ | beam.CombineGlobally(combine.MeanCombineFn())
+ | beam.ParDo(FormatNaNDoFn()))
 
 Review comment:
   Good idea. Done.
 



Issue Time Tracking
---

Worklog Id: (was: 347813)
Time Spent: 17h 50m  (was: 17h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8805) Remove obsolete worker_threads experiment in tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8805?focusedWorklogId=347812&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347812
 ]

ASF GitHub Bot logged work on BEAM-8805:


Author: ASF GitHub Bot
Created on: 22/Nov/19 00:03
Start Date: 22/Nov/19 00:03
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10193: [BEAM-8805] 
Remove obsolete worker_threads experiment in tests
URL: https://github.com/apache/beam/pull/10193
 
 

[jira] [Created] (BEAM-8805) Remove obsolete worker_threads experiment in tests

2019-11-21 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8805:
-

 Summary: Remove obsolete worker_threads experiment in tests
 Key: BEAM-8805
 URL: https://issues.apache.org/jira/browse/BEAM-8805
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Kyle Weaver
Assignee: Kyle Weaver


As of https://github.com/apache/beam/pull/10123 the worker_threads experiment 
is obsolete and should be removed from our test scripts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8805) Remove obsolete worker_threads experiment in tests

2019-11-21 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8805:
--
Status: Open  (was: Triage Needed)

> Remove obsolete worker_threads experiment in tests
> --
>
> Key: BEAM-8805
> URL: https://issues.apache.org/jira/browse/BEAM-8805
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>
> As of https://github.com/apache/beam/pull/10123 the worker_threads experiment 
> is obsolete and should be removed from our test scripts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-11-21 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri reassigned BEAM-7594:
---

Assignee: (was: Lorenzo Caggioni)

> test_read_from_text_with_file_name_file_pattern is flaky
> 
>
> Key: BEAM-7594
> URL: https://issues.apache.org/jira/browse/BEAM-7594
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Priority: Critical
>  Labels: currently-failing, flake
> Fix For: Not applicable
>
>
> cc: [~lcaggio] [~chamikara]
> {noformat}
> 22:05:08 
> ==
> 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern 
> (apache_beam.io.textio_test.TextSourceTest)
> 22:05:08 
> --
> 22:05:08 Traceback (most recent call last):
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py",
>  line 517, in test_read_from_text_with_file_name_file_pattern
> 22:05:08 pipeline.run()
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 22:05:08 else test_runner_api))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 22:05:08 self._options).run(False)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 22:05:08 return self.runner.run_pipeline(self, self._options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 22:05:08 return runner.run_pipeline(pipeline, options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 294, in run_pipeline
> 22:05:08 default_environment=self._default_environment))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 301, in run_via_runner_api
> 22:05:08 return self.run_stages(stage_context, stages)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 383, in run_stages
> 22:05:08 stage_context.safe_coders)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 655, in _run_stage
> 22:05:08 result, splits = bundle_manager.process_bundle(data_input, 
> data_output)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1471, in process_bundle
> 22:05:08 result_future = 
> self._controller.control_handler.push(process_bundle_req)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 990, in push
> 22:05:08 response = self.worker.do_instruction(request)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 342, in do_instruction
> 22:05:08 request.instruction_id)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 368, in process_bundle
> 22:05:08 bundle_processor.process_bundle(instruction_id))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 593, in process_bundle
> 22:05:08 data.ptransform_id

[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347809&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347809
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:57
Start Date: 21/Nov/19 23:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10070: [BEAM-8575] 
Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r349377287
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -423,6 +425,70 @@ def test_reshuffle_streaming_global_window(self):
 label='after reshuffle')
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_reshuffle_preserves_timestamps(self):
+pipeline = TestPipeline()
+
+# Create a PCollection and assign each element with a different timestamp.
+before_reshuffle = (pipeline
+| "Four elements" >> beam.Create([
+{'name': 'foo', 'timestamp': MIN_TIMESTAMP},
+{'name': 'foo', 'timestamp': 0},
+{'name': 'bar', 'timestamp': 33},
+{'name': 'bar', 'timestamp': MAX_TIMESTAMP},
+])
+| "With timestamp" >> beam.Map(
+lambda element: beam.window.TimestampedValue(
+element, element['timestamp'])))
+
+# For each element in a PCollection, gets the current timestamp of the
+# element and reassigns the timestamp to the element.
+class AddTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+yield beam.window.TimestampedValue(element, timestamp)
+
+# Reshuffle the PCollection above and assign the timestamp of an element to
+# that element again.
+after_reshuffle = (before_reshuffle
+   | "Reshuffle" >> beam.Reshuffle()
+   | "With timestamps again" >> beam.ParDo(AddTimestamp()))
+
+# Given an element, emits a string which contains the timestamp and the name
+# field of the element.
+class FormatWithTimestamp(beam.DoFn):
 
 Review comment:
   You can have a method
   
   ```
   def format_with_timestamp(element, timestamp=beam.DoFn.TimestampParam):
   ...
   ```
   
   rather than a full DoFn.
 



Issue Time Tracking
---

Worklog Id: (was: 347809)
Time Spent: 17h 40m  (was: 17.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=347807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347807
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:55
Start Date: 21/Nov/19 23:55
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and 
[BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#issuecomment-557325606
 
 
   There hasn't been any review in the last 9 days, so I'm asking @pabloem to 
merge.
 



Issue Time Tracking
---

Worklog Id: (was: 347807)
Time Spent: 4h 10m  (was: 4h)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=347805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347805
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:54
Start Date: 21/Nov/19 23:54
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r349376331
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,27 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(use_state_iterables=True))
+
+  def test_state_backed_coder(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
+return (gbk_result[0], sum(value_list))
+
+with self.create_pipeline() as p:
+  # The number of integers could be a knob to test against
+  # different runners' default settings on page size.
+  main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)])
+  | 'GBK' >> beam.GroupByKey()
+  | 'Sum' >> beam.ParDo(MyDoFn()))
 
 Review comment:
   So what you'd want to do is create some class with custom pickling 
(implement `__reduce__`). This pickling would be artificially large, e.g. 
include a dummy `'a' * 1000` value, as many runners trigger on serialized size, 
not element count.
   
   In the constructor, you would increment a class-level variable to indicate 
how many are alive. In the destructor (`__del__`), you would decrement it. An 
error would be thrown if there are too many. 
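
A minimal sketch of the kind of class the review comment describes, under stated assumptions: the name `LiveCounted`, the `MAX_ALIVE` limit of 10, and the unused `padding` constructor argument are hypothetical and not taken from the PR.

```python
class LiveCounted(object):
    """Hypothetical test element that tracks how many instances are alive."""

    MAX_ALIVE = 10  # assumed limit; a real test would tune this per runner
    alive = 0       # class-level count of live instances

    def __init__(self, value, padding=None):
        # Count first, so that __del__ always balances this increment,
        # even when the check below raises.
        LiveCounted.alive += 1
        if LiveCounted.alive > LiveCounted.MAX_ALIVE:
            raise RuntimeError(
                'Too many live instances: %d' % LiveCounted.alive)
        self.value = value
        # `padding` is deliberately ignored; it only inflates the pickle.

    def __del__(self):
        LiveCounted.alive -= 1

    def __reduce__(self):
        # Custom pickling: include a dummy 1000-character string so the
        # serialized form is artificially large.
        return (LiveCounted, (self.value, 'a' * 1000))
```

Pickling an instance with the standard `pickle` module goes through `__reduce__`, so each serialized element carries at least 1000 bytes of padding. If a runner materialized a whole grouped iterable at once instead of paging it, the live count would exceed the limit and the constructor would raise.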
 



Issue Time Tracking
---

Worklog Id: (was: 347805)
Time Spent: 8h  (was: 7h 50m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> When we have a TimestampedValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expected timestamps should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives the following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}
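
The expected values listed in the issue can be reproduced with a small sketch. This is only an illustration of the semantics as the issue states them, not the SDK's implementation: the `combined_timestamp` function and the string combiner names are assumptions, and END_OF_WINDOW is taken to be the window end (10) because that is what the issue lists; the real SDK may represent it differently.

```python
def combined_timestamp(element_timestamps, window_end, combiner):
    """Returns the output timestamp for a combined value in one window.

    `combiner` is one of 'LATEST', 'EARLIEST' or 'END_OF_WINDOW'
    (hypothetical string names used only in this sketch).
    """
    if combiner == 'LATEST':
        return max(element_timestamps)
    if combiner == 'EARLIEST':
        return min(element_timestamps)
    if combiner == 'END_OF_WINDOW':
        return window_end
    raise ValueError('Unknown combiner: %r' % (combiner,))
```

For the two elements in the issue (timestamps 0 and 9 in a fixed window of size 10), this gives 9 for LATEST, 0 for EARLIEST, and 10 for END_OF_WINDOW, matching the expected list above.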





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347806
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:54
Start Date: 21/Nov/19 23:54
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10070: [BEAM-8575] 
Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#issuecomment-557325303
 
 
   Modified code according to the reviewer's second round of feedback, except 
that I still need DoFn to get timestamp.  Waiting for reviewer to resolve 
conversations.
 



Issue Time Tracking
---

Worklog Id: (was: 347806)
Time Spent: 17.5h  (was: 17h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347804
 ]

ASF GitHub Bot logged work on BEAM-7850:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:54
Start Date: 21/Nov/19 23:54
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10183: [BEAM-7850] 
Makes environment ID a top level attribute of PTransform.
URL: https://github.com/apache/beam/pull/10183#discussion_r349376284
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -698,10 +700,10 @@ message StandardCoders {
 // TODO: consider inlining field on PCollection
 message WindowingStrategy {
 
-  // (Required) The SdkFunctionSpec of the UDF that assigns windows,
+  // (Required) The FunctionSpec of the UDF that assigns windows,
   // merges windows, and shifts timestamps before they are
   // combined according to the OutputTime.
-  SdkFunctionSpec window_fn = 1;
+  FunctionSpec window_fn = 1;
 
 Review comment:
   After hearing about some of the difficulties that Cham is running into, I 
would go either way on whether we add an environment id here or remove it 
completely.
 



Issue Time Tracking
---

Worklog Id: (was: 347804)
Time Spent: 1.5h  (was: 1h 20m)

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.





[jira] [Assigned] (BEAM-8804) PCollectionList support in cross-language transforms

2019-11-21 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee reassigned BEAM-8804:
-

Assignee: Heejong Lee

> PCollectionList support in cross-language transforms
> 
>
> Key: BEAM-8804
> URL: https://issues.apache.org/jira/browse/BEAM-8804
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> Currently, the Beam model doesn't have any information on the order of output 
> PCollections from PTransforms. So, PCollectionList needs to be converted to 
> PCollectionTuple when it goes across the cross-language boundary (or even in 
> the same language, when it is converted between in-memory object and proto).
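
One way to picture the conversion described above is the following sketch. It is not Beam code: `list_to_tagged` and `tagged_to_list` are hypothetical helpers, and encoding the position as a string tag is only an assumption about how order could be carried through an unordered, tag-keyed structure.

```python
def list_to_tagged(outputs):
    """Converts an ordered list of outputs to a tag-keyed dict.

    The list position is encoded in the tag ('0', '1', ...), because the
    tag-keyed form itself carries no ordering information.
    """
    return {str(index): output for index, output in enumerate(outputs)}


def tagged_to_list(tagged):
    """Rebuilds the ordered list by sorting tags numerically."""
    return [tagged[tag] for tag in sorted(tagged, key=int)]
```

Sorting with `key=int` rather than lexicographically matters once there are more than ten outputs, since the string '10' sorts before '2'.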





[jira] [Updated] (BEAM-8804) PCollectionList support in cross-language transforms

2019-11-21 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-8804:
--
Status: Open  (was: Triage Needed)

> PCollectionList support in cross-language transforms
> 
>
> Key: BEAM-8804
> URL: https://issues.apache.org/jira/browse/BEAM-8804
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> Currently, the Beam model doesn't have any information on the order of output 
> PCollections from PTransforms. So, PCollectionList needs to be converted to 
> PCollectionTuple when it goes across the cross-language boundary (or even in 
> the same language, when it is converted between in-memory object and proto).





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347800
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:50
Start Date: 21/Nov/19 23:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10159: [BEAM-8575] 
Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#discussion_r349375222
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +398,18 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
+  def test_hot_key_combining_with_accumulation_mode(self):
+with TestPipeline() as p:
+  result = (p
+| beam.Create([1, 2, 3, 4, 5])
+| beam.WindowInto(GlobalWindows(),
+  trigger=Repeatedly(AfterCount(1)),
+  accumulation_mode=
+  AccumulationMode.ACCUMULATING)
+| beam.CombineGlobally(sum).without_defaults().with_fanout(2))
 
 Review comment:
   Why specify without_defaults()?
 



Issue Time Tracking
---

Worklog Id: (was: 347800)
Time Spent: 17h 10m  (was: 17h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347801
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:50
Start Date: 21/Nov/19 23:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10159: [BEAM-8575] 
Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#discussion_r349375288
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +398,18 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
+  def test_hot_key_combining_with_accumulation_mode(self):
+with TestPipeline() as p:
+  result = (p
+| beam.Create([1, 2, 3, 4, 5])
+| beam.WindowInto(GlobalWindows(),
+  trigger=Repeatedly(AfterCount(1)),
 
 Review comment:
   We would need to use a test stream to make this test non-trivial. 
 



Issue Time Tracking
---

Worklog Id: (was: 347801)
Time Spent: 17h 20m  (was: 17h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347799
 ]

ASF GitHub Bot logged work on BEAM-7948:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:46
Start Date: 21/Nov/19 23:46
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9949: [BEAM-7948] Add 
time-based cache threshold support in the Java data s…
URL: https://github.com/apache/beam/pull/9949#issuecomment-557323260
 
 
   Just remove the synchronized from 
`BeamFnDataSizeBasedBufferingOutboundObserver.flush()` and add
   ```
   @Override
   public synchronized void flush() throws IOException {
 super.flush();
   }
   ```
   to the time based outbound observer and we can merge.
 



Issue Time Tracking
---

Worklog Id: (was: 347799)
Time Spent: 3h 20m  (was: 3h 10m)

> Add time-based cache threshold support in the Java data service
> ---
>
> Key: BEAM-7948
> URL: https://issues.apache.org/jira/browse/BEAM-7948
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently only size-based cache threshold is supported in data service. It 
> should also support the time-based cache threshold. This is very important, 
> especially for streaming jobs which are sensitive to the delay.





[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347798
 ]

ASF GitHub Bot logged work on BEAM-7850:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:45
Start Date: 21/Nov/19 23:45
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10183: [BEAM-7850] 
Makes environment ID a top level attribute of PTransform.
URL: https://github.com/apache/beam/pull/10183#discussion_r349374247
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -698,10 +700,10 @@ message StandardCoders {
 // TODO: consider inlining field on PCollection
 message WindowingStrategy {
 
-  // (Required) The SdkFunctionSpec of the UDF that assigns windows,
+  // (Required) The FunctionSpec of the UDF that assigns windows,
   // merges windows, and shifts timestamps before they are
   // combined according to the OutputTime.
-  SdkFunctionSpec window_fn = 1;
+  FunctionSpec window_fn = 1;
 
 Review comment:
   The windowing strategy cannot be coerced into an environment-agnostic one 
the same way a coder can be, as it involves actually executing user code as 
opposed to specifying constraints on its behavior. (In a very real sense, it 
is a cross-language UDF.) 
   
   Having to traverse the graph is ugly, but I suppose possible for now. 
 



Issue Time Tracking
---

Worklog Id: (was: 347798)
Time Spent: 1h 20m  (was: 1h 10m)

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.





[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347797
 ]

ASF GitHub Bot logged work on BEAM-7948:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:45
Start Date: 21/Nov/19 23:45
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9949: [BEAM-7948] Add 
time-based cache threshold support in the Java data s…
URL: https://github.com/apache/beam/pull/9949#issuecomment-557323260
 
 
   Just remove the synchronized from 
BeamFnDataSizeBasedBufferingOutboundObserver.java and add
   ```
   @Override
   public synchronized void flush() throws IOException {
 super.flush();
   }
   ```
   to the time based outbound observer and we can merge.
 



Issue Time Tracking
---

Worklog Id: (was: 347797)
Time Spent: 3h 10m  (was: 3h)

> Add time-based cache threshold support in the Java data service
> ---
>
> Key: BEAM-7948
> URL: https://issues.apache.org/jira/browse/BEAM-7948
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently only size-based cache threshold is supported in data service. It 
> should also support the time-based cache threshold. This is very important, 
> especially for streaming jobs which are sensitive to the delay.





[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347795
 ]

ASF GitHub Bot logged work on BEAM-7948:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:43
Start Date: 21/Nov/19 23:43
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9949: [BEAM-7948] 
Add time-based cache threshold support in the Java data s…
URL: https://github.com/apache/beam/pull/9949#discussion_r349373751
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserver.java
 ##
 @@ -48,25 +46,27 @@
   Coder coder,
   StreamObserver outboundObserver) {
 super(sizeLimit, outputLocation, coder, outboundObserver);
-this.lock = new Object();
+this.flushLock = new Object();
 this.flushFuture =
 Executors.newSingleThreadScheduledExecutor(
 new ThreadFactoryBuilder()
 .setDaemon(true)
 .setNameFormat("DataBufferOutboundFlusher-thread")
 .build())
 .scheduleAtFixedRate(this::periodicFlush, timeLimit, timeLimit, TimeUnit.MILLISECONDS);
 
 Review comment:
   You're right, I missed the fact that there was no periodic schedule that took 
a callable.
 



Issue Time Tracking
---

Worklog Id: (was: 347795)
Time Spent: 3h  (was: 2h 50m)

> Add time-based cache threshold support in the Java data service
> ---
>
> Key: BEAM-7948
> URL: https://issues.apache.org/jira/browse/BEAM-7948
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently only size-based cache threshold is supported in data service. It 
> should also support the time-based cache threshold. This is very important, 
> especially for streaming jobs which are sensitive to the delay.





[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347796
 ]

ASF GitHub Bot logged work on BEAM-7948:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:43
Start Date: 21/Nov/19 23:43
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9949: [BEAM-7948] 
Add time-based cache threshold support in the Java data s…
URL: https://github.com/apache/beam/pull/9949#discussion_r349372594
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataSizeBasedBufferingOutboundObserver.java
 ##
 @@ -85,7 +85,7 @@ public void close() throws Exception {
   }
 
   @Override
-  public void flush() throws IOException {
+  public synchronized void flush() throws IOException {
 
 Review comment:
   We don't want to make this synchronized since this class is not thread safe 
and should not take this perf hit. Your previous approach, defining a 
synchronized override in the subclass that calls super.flush(), is all you need.
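
The pattern in this comment can be illustrated with a rough Python analogue; the PR itself is Java, so this is a sketch, not the PR's code. The class names `SizeBasedBuffer` and `TimeBasedBuffer` and the `sink` callable are hypothetical, and locking `accept` in the subclass is an added assumption that the comment does not state.

```python
import threading


class SizeBasedBuffer(object):
    """Hypothetical size-based buffer. Not thread safe, so it pays no lock cost."""

    def __init__(self, size_limit, sink):
        self._size_limit = size_limit
        self._sink = sink  # callable that receives each flushed batch
        self._buffer = []

    def accept(self, item):
        self._buffer.append(item)
        if len(self._buffer) >= self._size_limit:
            self.flush()

    def flush(self):
        if self._buffer:
            self._sink(list(self._buffer))
            del self._buffer[:]


class TimeBasedBuffer(SizeBasedBuffer):
    """Hypothetical variant whose flush may also be called by a timer thread."""

    def __init__(self, size_limit, sink):
        super(TimeBasedBuffer, self).__init__(size_limit, sink)
        # Reentrant, like a Java monitor: accept() may call flush() while
        # already holding the lock.
        self._flush_lock = threading.RLock()

    def accept(self, item):
        with self._flush_lock:
            super(TimeBasedBuffer, self).accept(item)

    def flush(self):
        # Synchronized override that only delegates to the base class.
        with self._flush_lock:
            super(TimeBasedBuffer, self).flush()
```

Only the subclass that actually shares state with a periodic flusher pays for locking; the size-based base class stays lock free for single-threaded callers.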
 



Issue Time Tracking
---

Worklog Id: (was: 347796)

> Add time-based cache threshold support in the Java data service
> ---
>
> Key: BEAM-7948
> URL: https://issues.apache.org/jira/browse/BEAM-7948
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently only size-based cache threshold is supported in data service. It 
> should also support the time-based cache threshold. This is very important, 
> especially for streaming jobs which are sensitive to the delay.





[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=347788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347788
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:16
Start Date: 21/Nov/19 23:16
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10175: [BEAM-7390] 
Add code snippet for Max
URL: https://github.com/apache/beam/pull/10175#discussion_r349364098
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/aggregation/max.py
 ##
 @@ -0,0 +1,60 @@
+# coding=utf-8
 
 Review comment:
   Does each of these need their own file? 
 



Issue Time Tracking
---

Worklog Id: (was: 347788)
Time Spent: 4h 10m  (was: 4h)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347786
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:12
Start Date: 21/Nov/19 23:12
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10070: 
[BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r349364637
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -423,6 +425,70 @@ def test_reshuffle_streaming_global_window(self):
 label='after reshuffle')
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_reshuffle_preserves_timestamps(self):
+pipeline = TestPipeline()
+
+# Create a PCollection and assign each element with a different timestamp.
+before_reshuffle = (pipeline
+| "Four elements" >> beam.Create([
+{'name': 'foo', 'timestamp': MIN_TIMESTAMP},
+{'name': 'foo', 'timestamp': 0},
+{'name': 'bar', 'timestamp': 33},
+{'name': 'bar', 'timestamp': MAX_TIMESTAMP},
+])
+| "With timestamp" >> beam.Map(
+lambda element: beam.window.TimestampedValue(
+element, element['timestamp'])))
+
+# For each element in a PCollection, gets the current timestamp of the
+# element and reassigns the timestamp to the element.
+class AddTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+yield beam.window.TimestampedValue(element, timestamp)
+
+# Reshuffle the PCollection above and assign the timestamp of an element to
+# that element again.
+after_reshuffle = (before_reshuffle
+   | "Reshuffle" >> beam.Reshuffle()
+   | "With timestamps again" >> beam.ParDo(AddTimestamp()))
+
+# Given an element, emits a string which contains the timestamp and the name
+# field of the element.
+class FormatWithTimestamp(beam.DoFn):
 
 Review comment:
   I need beam.DoFn.TimestampParam to get the timestamp of each element.
 



Issue Time Tracking
---

Worklog Id: (was: 347786)
Time Spent: 17h  (was: 16h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347785
 ]

ASF GitHub Bot logged work on BEAM-7278:


Author: ASF GitHub Bot
Created on: 21/Nov/19 23:04
Start Date: 21/Nov/19 23:04
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10184: [BEAM-7278, 
BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading 
dependencies.
URL: https://github.com/apache/beam/pull/10184#issuecomment-557312272
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347785)
Time Spent: 3h  (was: 2h 50m)

> Upgrade some Beam dependencies
> --
>
> Key: BEAM-7278
> URL: https://issues.apache.org/jira/browse/BEAM-7278
> Project: Beam
>  Issue Type: Task
>  Components: dependencies
>Reporter: Etienne Chauchot
>Assignee: Mujuzi Moses
>Priority: Critical
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Some dependencies need to be upgraded.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347782
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:55
Start Date: 21/Nov/19 22:55
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10070: 
[BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r349359544
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -423,6 +425,70 @@ def test_reshuffle_streaming_global_window(self):
 label='after reshuffle')
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_reshuffle_preserves_timestamps(self):
+    pipeline = TestPipeline()
+
+    # Create a PCollection and assign each element with a different timestamp.
+    before_reshuffle = (pipeline
+                        | "Four elements" >> beam.Create([
+                            {'name': 'foo', 'timestamp': MIN_TIMESTAMP},
+                            {'name': 'foo', 'timestamp': 0},
+                            {'name': 'bar', 'timestamp': 33},
+                            {'name': 'bar', 'timestamp': MAX_TIMESTAMP},
+                        ])
+                        | "With timestamp" >> beam.Map(
+                            lambda element: beam.window.TimestampedValue(
+                                element, element['timestamp'])))
+
+    # For each element in a PCollection, gets the current timestamp of the
+    # element and reassigns the timestamp to the element.
+    class AddTimestamp(beam.DoFn):
+      def process(self, element, timestamp=beam.DoFn.TimestampParam):
+        yield beam.window.TimestampedValue(element, timestamp)
+
+    # Reshuffle the PCollection above and assign the timestamp of an element to
+    # that element again.
+    after_reshuffle = (before_reshuffle
+                       | "Reshuffle" >> beam.Reshuffle()
+                       | "With timestamps again" >> beam.ParDo(AddTimestamp()))
 
 Review comment:
   I agree. 
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347782)
Time Spent: 16h 50m  (was: 16h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347780
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:54
Start Date: 21/Nov/19 22:54
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10070: 
[BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r349359198
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -423,6 +425,70 @@ def test_reshuffle_streaming_global_window(self):
 label='after reshuffle')
 pipeline.run()
 
+  @attr('ValidatesRunner')
+  def test_reshuffle_preserves_timestamps(self):
+    pipeline = TestPipeline()
+
+    # Create a PCollection and assign each element with a different timestamp.
+    before_reshuffle = (pipeline
+                        | "Four elements" >> beam.Create([
+                            {'name': 'foo', 'timestamp': MIN_TIMESTAMP},
+                            {'name': 'foo', 'timestamp': 0},
+                            {'name': 'bar', 'timestamp': 33},
+                            {'name': 'bar', 'timestamp': MAX_TIMESTAMP},
+                        ])
+                        | "With timestamp" >> beam.Map(
+                            lambda element: beam.window.TimestampedValue(
+                                element, element['timestamp'])))
+
+    # For each element in a PCollection, gets the current timestamp of the
+    # element and reassigns the timestamp to the element.
+    class AddTimestamp(beam.DoFn):
+      def process(self, element, timestamp=beam.DoFn.TimestampParam):
+        yield beam.window.TimestampedValue(element, timestamp)
+
+    # Reshuffle the PCollection above and assign the timestamp of an element to
+    # that element again.
+    after_reshuffle = (before_reshuffle
+                       | "Reshuffle" >> beam.Reshuffle()
 
 Review comment:
   I agree.
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347780)
Time Spent: 16h 40m  (was: 16.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347776
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:52
Start Date: 21/Nov/19 22:52
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10070: 
[BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r349358685
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -276,6 +279,13 @@ def process(self, element):
 with self.assertRaisesRegex(ValueError, r'window.*None.*add_timestamps2'):
   pipeline.run()
 
+class AddTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+    yield beam.window.TimestampedValue(element, timestamp)
 
 Review comment:
   I agree. Thank you for pointing it out!
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347776)
Time Spent: 16.5h  (was: 16h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347775
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:52
Start Date: 21/Nov/19 22:52
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10070: 
[BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r349358685
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -276,6 +279,13 @@ def process(self, element):
 with self.assertRaisesRegex(ValueError, r'window.*None.*add_timestamps2'):
   pipeline.run()
 
+class AddTimestamp(beam.DoFn):
+  def process(self, element, timestamp=beam.DoFn.TimestampParam):
+    yield beam.window.TimestampedValue(element, timestamp)
 
 Review comment:
   I agree. Thank you for pointing it out!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347775)
Time Spent: 16h 20m  (was: 16h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=347773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347773
 ]

ASF GitHub Bot logged work on BEAM-3865:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:50
Start Date: 21/Nov/19 22:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10192: [BEAM-3865] 
Stronger trigger tests.
URL: https://github.com/apache/beam/pull/10192#issuecomment-557307971
 
 
   R: @HuangLED 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347773)
Time Spent: 2h 10m  (was: 2h)

> Incorrect timestamp on merging window outputs.
> --
>
> Key: BEAM-3865
> URL: https://issues.apache.org/jira/browse/BEAM-3865
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Looks like we're setting multiple watermark holds with one arbitrarily being 
> held. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=347771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347771
 ]

ASF GitHub Bot logged work on BEAM-8802:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:50
Start Date: 21/Nov/19 22:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10191: [BEAM-8802] Don't 
clear watermark hold when adding elements.
URL: https://github.com/apache/beam/pull/10191#issuecomment-557307841
 
 
   Thanks. I'll wait for all tests to pass before merging. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347771)
Time Spent: 50m  (was: 40m)

> Timestamp combiner not respected across bundles in streaming mode.
> --
>
> Key: BEAM-8802
> URL: https://issues.apache.org/jira/browse/BEAM-8802
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=347770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347770
 ]

ASF GitHub Bot logged work on BEAM-3865:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:49
Start Date: 21/Nov/19 22:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10192: [BEAM-3865] 
Stronger trigger tests.
URL: https://github.com/apache/beam/pull/10192
 
 
   When possible, runs trigger fns over various permutations and bundling of 
the inputs (using different keys), ensuring the results are still correct. 
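
A minimal sketch of the idea described above, not the code in this pull request. The helper names `bundlings` and `check_all_orderings`, and the toy `process_bundles` function that stands in for a real trigger driver, are hypothetical. The sketch enumerates every ordering of the inputs and every way of splitting each ordering into contiguous bundles, then checks that the result is the same in all cases.

```python
import itertools


def bundlings(elements):
  """Yield every split of `elements` into contiguous, non-empty bundles."""
  n = len(elements)
  if n == 0:
    yield []
    return
  # Each of the n - 1 gaps between elements is or is not a bundle boundary.
  for cuts in itertools.product([False, True], repeat=n - 1):
    bundles, current = [], [elements[0]]
    for element, cut in zip(elements[1:], cuts):
      if cut:
        bundles.append(current)
        current = [element]
      else:
        current.append(element)
    bundles.append(current)
    yield bundles


def process_bundles(bundles):
  """Toy stand-in for a trigger driver: sums values per key across bundles."""
  totals = {}
  for bundle in bundles:
    for key, value in bundle:
      totals[key] = totals.get(key, 0) + value
  return totals


def check_all_orderings(elements, expected):
  """Check every (permutation, bundling) pair; return how many were checked."""
  cases = 0
  for ordering in itertools.permutations(elements):
    for bundles in bundlings(list(ordering)):
      assert process_bundles(bundles) == expected, (ordering, bundles)
      cases += 1
  return cases


cases_checked = check_all_orderings(
    [('k', 100), ('k', 400), ('j', 7)], {'k': 500, 'j': 7})
```

With three inputs this covers 6 orderings times 4 bundlings, so the number of cases grows quickly and is only practical for small inputs.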
   
   
   

[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=347764&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347764
 ]

ASF GitHub Bot logged work on BEAM-8802:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:43
Start Date: 21/Nov/19 22:43
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10191: [BEAM-8802] 
Don't clear watermark hold when adding elements.
URL: https://github.com/apache/beam/pull/10191#discussion_r349355630
 
 

 ##
 File path: sdks/python/apache_beam/transforms/timeutil.py
 ##
 @@ -64,19 +64,19 @@ class TimestampCombinerImpl(with_metaclass(ABCMeta, object)):
 
   @abstractmethod
   def assign_output_time(self, window, input_timestamp):
-    pass
+    raise NotImplementedError
 
   @abstractmethod
   def combine(self, output_timestamp, other_output_timestamp):
-    pass
+    raise NotImplementedError
 
   def combine_all(self, merging_timestamps):
     """Apply combine to list of timestamps."""
     combined_output_time = None
     for output_time in merging_timestamps:
       if combined_output_time is None:
         combined_output_time = output_time
-      else:
+      elif output_time is not None:
         combined_output_time = self.combine(
             combined_output_time, output_time)
 
 Review comment:
   What would the "else" case be after this change, and what would it imply 
when it happens?
   
   Should we at least log something when we fall into that situation?
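
Reading the loop in the diff above, the only case left without a branch is an `output_time` of `None` arriving after a combined value already exists; that entry is skipped and the running value is kept. A minimal standalone sketch of that behavior follows. The free function `combine_all` taking `combine` as an argument, and its final `return`, are assumptions made so the snippet runs on its own; this is not the Beam method itself.

```python
def combine_all(combine, merging_timestamps):
  """Fold `combine` over the timestamps, skipping None entries."""
  combined_output_time = None
  for output_time in merging_timestamps:
    if combined_output_time is None:
      combined_output_time = output_time
    elif output_time is not None:
      combined_output_time = combine(combined_output_time, output_time)
    # Implicit else: output_time is None while a combined value already
    # exists, so the running value is kept unchanged.
  return combined_output_time


earliest = combine_all(min, [7, None, 2])
all_missing = combine_all(min, [None, None])
```

If every entry is `None`, the result stays `None`, which is the situation a log statement would most plausibly be meant to flag.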
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347764)
Time Spent: 40m  (was: 0.5h)

> Timestamp combiner not respected across bundles in streaming mode.
> --
>
> Key: BEAM-8802
> URL: https://issues.apache.org/jira/browse/BEAM-8802
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=347763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347763
 ]

ASF GitHub Bot logged work on BEAM-8802:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:43
Start Date: 21/Nov/19 22:43
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10191: [BEAM-8802] 
Don't clear watermark hold when adding elements.
URL: https://github.com/apache/beam/pull/10191#discussion_r349355630
 
 

 ##
 File path: sdks/python/apache_beam/transforms/timeutil.py
 ##
 @@ -64,19 +64,19 @@ class TimestampCombinerImpl(with_metaclass(ABCMeta, object)):
 
   @abstractmethod
   def assign_output_time(self, window, input_timestamp):
-    pass
+    raise NotImplementedError
 
   @abstractmethod
   def combine(self, output_timestamp, other_output_timestamp):
-    pass
+    raise NotImplementedError
 
   def combine_all(self, merging_timestamps):
     """Apply combine to list of timestamps."""
     combined_output_time = None
     for output_time in merging_timestamps:
       if combined_output_time is None:
         combined_output_time = output_time
-      else:
+      elif output_time is not None:
         combined_output_time = self.combine(
             combined_output_time, output_time)
 
 Review comment:
   What would the "else" case be after this change, and what would it imply 
when it happens? Should we at least log something when we fall into that 
situation?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347763)
Time Spent: 0.5h  (was: 20m)

> Timestamp combiner not respected across bundles in streaming mode.
> --
>
> Key: BEAM-8802
> URL: https://issues.apache.org/jira/browse/BEAM-8802
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=347761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347761
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:42
Start Date: 21/Nov/19 22:42
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10081: [BEAM-8645] 
A test case for TimestampCombiner.
URL: https://github.com/apache/beam/pull/10081#discussion_r349355406
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -480,6 +486,74 @@ def test_with_input_types_decorator_violation(self):
 pc = p | Create(l_3_tuple)
 _ = pc | beam.CombineGlobally(self.fn)
 
+#
+# Test cases for streaming.
+#
+@attr('ValidatesRunner')
+class TimestampCombinerTest(unittest.TestCase):
+
+  @unittest.skip('BEAM-8657')
+  def test_combiner_earliest(self):
+    """Test TimestampCombiner with EARLIEST."""
+    options = PipelineOptions(streaming=True)
+    with TestPipeline(options=options) as p:
+      result = (p
+                | TestStream()
+                .add_elements([window.TimestampedValue(('k', 100), 2)])
+                .add_elements([window.TimestampedValue(('k', 400), 7)])
+                .advance_watermark_to_infinity()
+                | beam.WindowInto(
+                    window.FixedWindows(10),
+                    timestamp_combiner=TimestampCombiner.OUTPUT_AT_EARLIEST)
+                | beam.CombinePerKey(sum))
+
+      records = (result
+                 | beam.Map(lambda e, ts=beam.DoFn.TimestampParam: (e, ts)))
+
+      # All the KV pairs are applied GBK using EARLIEST timestamp for the same
+      # key.
+      expected_window_to_elements = {
+          window.IntervalWindow(0, 10): [
+              (('k', 500), Timestamp(2)),
+          ],
+      }
+
+      assert_that(
+          records,
+          equal_to_per_window(expected_window_to_elements),
+          use_global_window=False,
+          label='assert per window')
+
+  def test_combiner_latest(self):
+    """Test TimestampCombiner with LATEST."""
+    options = PipelineOptions(streaming=True)
+    with TestPipeline(options=options) as p:
+      result = (p
+                | TestStream()
+                .add_elements([window.TimestampedValue(('k', 100), 2)])
+                .add_elements([window.TimestampedValue(('k', 400), 7)])
 
 Review comment:
   Turns out this happens to work because 7 is the last element.
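
One way to picture why element order matters here, as a hedged illustration only: `last_seen` below is a hypothetical model of a faulty combiner in which each new element overwrites the held timestamp. It is not taken from the Beam source. With the latest timestamp arriving last, the faulty model and the correct maximum agree, so a test with that ordering cannot tell them apart.

```python
def latest(timestamps):
  """Correct OUTPUT_AT_LATEST behavior: the maximum element timestamp."""
  return max(timestamps)


def last_seen(timestamps):
  """Hypothetical faulty behavior: each new element overwrites the hold."""
  held = None
  for ts in timestamps:
    held = ts
  return held


in_order = [2, 7]     # order used in the test above
reordered = [7, 2]    # same elements, latest timestamp first

# Same answer when the latest timestamp happens to arrive last.
coincidence = last_seen(in_order) == latest(in_order)
# Different answers once the order is swapped.
exposed = last_seen(reordered) != latest(reordered)
```

Adding the elements in the opposite order, or in a separate test, would make the LATEST case sensitive to this kind of fault.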
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347761)
Time Spent: 7h 50m  (was: 7h 40m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> When we have a TimestampedValue on combine:
> {code:java}
> main_stream = (p
>     | 'main TestStream' >> TestStream()
>     .add_elements([window.TimestampedValue(('k', 100), 0)])
>     .add_elements([window.TimestampedValue(('k', 400), 9)])
>     .advance_watermark_to_infinity()
>     | 'main windowInto' >> beam.WindowInto(
>         window.FixedWindows(10),
>         timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)
>     | 'Combine' >> beam.CombinePerKey(sum))
> The expected timestamps should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives the following results:
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}
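
The expected values listed in the issue can be restated as a small plain-Python sketch. The function `expected_timestamp` and the string names for the combiners are hypothetical and exist only for this illustration. The END_OF_WINDOW branch follows the issue text, which lists the window end itself; that is an assumption here, and the SDK may define the end-of-window timestamp differently.

```python
def expected_timestamp(combiner, element_timestamps, window_end):
  """Expected output timestamp per combiner, as listed in the issue."""
  if combiner == 'LATEST':
    return max(element_timestamps)
  if combiner == 'EARLIEST':
    return min(element_timestamps)
  if combiner == 'END_OF_WINDOW':
    # Assumption: follows the issue text, which lists the window end itself.
    return window_end
  raise ValueError('unknown combiner: %s' % combiner)


element_timestamps = [0, 9]   # timestamps of ('k', 100) and ('k', 400)
window_end = 10               # the FixedWindows(10) window covering [0, 10)
expected = {
    name: expected_timestamp(name, element_timestamps, window_end)
    for name in ('LATEST', 'EARLIEST', 'END_OF_WINDOW')
}
```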



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=347757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347757
 ]

ASF GitHub Bot logged work on BEAM-8802:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:40
Start Date: 21/Nov/19 22:40
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10191: [BEAM-8802] 
Don't clear watermark hold when adding elements.
URL: https://github.com/apache/beam/pull/10191
 
 
   This fixes the errors exposed at https://github.com/apache/beam/pull/10081
   
   
   

[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=347758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347758
 ]

ASF GitHub Bot logged work on BEAM-8802:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:40
Start Date: 21/Nov/19 22:40
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10191: [BEAM-8802] Don't 
clear watermark hold when adding elements.
URL: https://github.com/apache/beam/pull/10191#issuecomment-557305006
 
 
   R: @HuangLED
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347758)
Time Spent: 20m  (was: 10m)

> Timestamp combiner not respected across bundles in streaming mode.
> --
>
> Key: BEAM-8802
> URL: https://issues.apache.org/jira/browse/BEAM-8802
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347755
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:39
Start Date: 21/Nov/19 22:39
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10159: [BEAM-8575] 
Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#discussion_r349354363
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +398,18 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
 
 Review comment:
   Why is this validates runner?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347755)
Time Spent: 16h 10m  (was: 16h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
> Issue Type: Test
> Components: sdk-py-core, testing
> Reporter: wendy liu
> Assignee: wendy liu
> Priority: Major
> Time Spent: 16h 10m
> Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347752
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:35
Start Date: 21/Nov/19 22:35
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10190: [BEAM-8575] Added 
two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#issuecomment-557303427
 
 
   The mean combine fn already covers this test case completely. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347752)
Time Spent: 16h  (was: 15h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
> Issue Type: Test
> Components: sdk-py-core, testing
> Reporter: wendy liu
> Assignee: wendy liu
> Priority: Major
> Time Spent: 16h
> Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8804) PCollectionList support in cross-language transforms

2019-11-21 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-8804:
--
Description: Currently, the Beam model has no information about the 
order of output PCollections from PTransforms. So a PCollectionList needs to be 
converted to a PCollectionTuple when it crosses the cross-language boundary 
(or even within the same language, when it is converted between an in-memory 
object and a proto).  (was: Currently, the Beam model has no information about 
the order of output PCollections from PTransforms. So a PCollectionList needs 
to be converted to a PCollectionTuple when it crosses the cross-language 
boundary (or even within the same language, when it is converted between an 
in-memory object and a proto). Maybe we should add more fields to the 
PTransform proto definition to keep the ordering information.)

> PCollectionList support in cross-language transforms
> 
>
> Key: BEAM-8804
> URL: https://issues.apache.org/jira/browse/BEAM-8804
> Project: Beam
> Issue Type: Improvement
> Components: beam-model
> Reporter: Heejong Lee
> Priority: Major
>
> Currently, the Beam model has no information about the order of output 
> PCollections from PTransforms. So a PCollectionList needs to be converted to 
> a PCollectionTuple when it crosses the cross-language boundary (or even 
> within the same language, when it is converted between an in-memory object 
> and a proto).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8739) Consistently use with Pipeline(...) syntax

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8739?focusedWorklogId=347751&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347751
 ]

ASF GitHub Bot logged work on BEAM-8739:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:33
Start Date: 21/Nov/19 22:33
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10149: [BEAM-8739] 
Consistently use with Pipeline(...) syntax
URL: https://github.com/apache/beam/pull/10149#issuecomment-557302872
 
 
   R: @ibzib lint and tests are now all happy.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347751)
Time Spent: 20m  (was: 10m)

> Consistently use with Pipeline(...) syntax
> --
>
> Key: BEAM-8739
> URL: https://issues.apache.org/jira/browse/BEAM-8739
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Robert Bradshaw
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> I've run into a couple of tests that forgot to do p.run(). In addition, I'm 
> seeing new tests written in this old style. We should consistently use the 
> with syntax where possible for our examples and tests. 
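
The failure mode described above can be illustrated with a toy model. This is
a sketch, not Beam's implementation: ToyPipeline and its methods are
hypothetical stand-ins, and the only behavior assumed from Beam is that a
pipeline used as a context manager is run when the with block exits without
an error (in Beam itself the form is `with beam.Pipeline() as p:`).

```python
class ToyPipeline:
    """Hypothetical stand-in for a pipeline object; not Beam's class."""

    def __init__(self):
        self.steps = []
        self.ran = False
        self.result = None

    def apply(self, fn):
        # Record a step; nothing executes until run() is called.
        self.steps.append(fn)
        return self

    def run(self):
        # Execute the recorded steps in order, threading the value through.
        value = None
        for fn in self.steps:
            value = fn(value)
        self.ran = True
        self.result = value
        return value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Run automatically, but only if the block finished without an error.
        if exc_type is None:
            self.run()
        return False  # never swallow exceptions


# Old style: the test builds a pipeline and forgets to call run(),
# so its steps (and any assertions inside them) never execute.
p_old = ToyPipeline()
p_old.apply(lambda _: [1, 2, 3]).apply(sum)

# With-statement style: leaving the block runs the pipeline.
with ToyPipeline() as p_new:
    p_new.apply(lambda _: [1, 2, 3]).apply(sum)

print(p_old.ran, p_new.ran, p_new.result)
```

The point of the with form is that running is tied to leaving the block, so
a test cannot silently skip execution by omitting a call.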



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347746
 ]

ASF GitHub Bot logged work on BEAM-7278:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:25
Start Date: 21/Nov/19 22:25
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10184: [BEAM-7278, 
BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading 
dependencies.
URL: https://github.com/apache/beam/pull/10184#issuecomment-557300482
 
 
   Thank you for pointing out the 404 URL error. Fixing pom.xml.
   Documentation: 
https://github.com/GoogleCloudPlatform/cloud-opensource-java/tree/master/dependencies
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347746)
Time Spent: 2h 50m  (was: 2h 40m)

> Upgrade some Beam dependencies
> --
>
> Key: BEAM-7278
> URL: https://issues.apache.org/jira/browse/BEAM-7278
> Project: Beam
> Issue Type: Task
> Components: dependencies
> Reporter: Etienne Chauchot
> Assignee: Mujuzi Moses
> Priority: Critical
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Some dependencies need to be upgraded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347741
 ]

ASF GitHub Bot logged work on BEAM-7278:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:19
Start Date: 21/Nov/19 22:19
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #10184: 
[BEAM-7278, BEAM-2530] Add support for using a Java linkage testing tool to aid 
upgrading dependencies.
URL: https://github.com/apache/beam/pull/10184#discussion_r349347092
 
 

 ##
 File path: build.gradle
 ##
 @@ -294,3 +294,41 @@ release {
 pushToRemote = ''
   }
 }
+
+// Reports linkage errors across multiple Apache Beam artifact ids.
+//
+// To use (from the root of project):
+//./gradlew -Ppublishing -PjavaLinkageArtifactIds=artifactId1,artifactId2,... :checkJavaLinkage
+//
+// For example:
+//./gradlew -Ppublishing -PjavaLinkageArtifactIds=beam-sdks-java-core,beam-sdks-java-io-jdbc :checkJavaLinkage
+//
+// Note that this task publishes artifacts into your local Maven repository.
+if (project.hasProperty('javaLinkageArtifactIds')) {
 
 Review comment:
   Ah, I misread how you were using this property. But it would seem nice to 
base it on the current project's runtime scope.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347741)
Time Spent: 2h 40m  (was: 2.5h)

> Upgrade some Beam dependencies
> --
>
> Key: BEAM-7278
> URL: https://issues.apache.org/jira/browse/BEAM-7278
> Project: Beam
> Issue Type: Task
> Components: dependencies
> Reporter: Etienne Chauchot
> Assignee: Mujuzi Moses
> Priority: Critical
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> Some dependencies need to be upgraded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347738
 ]

ASF GitHub Bot logged work on BEAM-7278:


Author: ASF GitHub Bot
Created on: 21/Nov/19 22:12
Start Date: 21/Nov/19 22:12
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10184: [BEAM-7278, 
BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading 
dependencies.
URL: https://github.com/apache/beam/pull/10184#issuecomment-557296259
 
 
   Can you also add some links to documentation for the tool? I was just 
looking around for it, following the pointer from the maven central 
coordinates, which points to a real package but the web page registered for 
that package is a 404.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347738)
Time Spent: 2.5h  (was: 2h 20m)

> Upgrade some Beam dependencies
> --
>
> Key: BEAM-7278
> URL: https://issues.apache.org/jira/browse/BEAM-7278
> Project: Beam
> Issue Type: Task
> Components: dependencies
> Reporter: Etienne Chauchot
> Assignee: Mujuzi Moses
> Priority: Critical
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> Some dependencies need to be upgraded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8804) PCollectionList support in cross-language transforms

2019-11-21 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-8804:
-

 Summary: PCollectionList support in cross-language transforms
 Key: BEAM-8804
 URL: https://issues.apache.org/jira/browse/BEAM-8804
 Project: Beam
  Issue Type: Improvement
  Components: beam-model
Reporter: Heejong Lee


Currently, the Beam model has no information about the order of output 
PCollections from PTransforms. So a PCollectionList needs to be converted to 
a PCollectionTuple when it crosses the cross-language boundary (or even 
within the same language, when it is converted between an in-memory object 
and a proto). Maybe we should add more fields to the PTransform proto 
definition to keep the ordering information.
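
To make the ordering problem concrete, here is a small sketch of one way an
ordered list could survive a round trip through an unordered, tag-keyed
mapping. It is only an illustration: using the list index as the tag is an
assumption, not how Beam encodes outputs, and list_to_tagged and
tagged_to_list are hypothetical helper names.

```python
def list_to_tagged(pcolls):
    # Encode position as the tag so the order is recoverable later.
    return {str(i): pc for i, pc in enumerate(pcolls)}


def tagged_to_list(tagged):
    # Sort tags numerically; a plain string sort would put "10" before "2".
    return [tagged[tag] for tag in sorted(tagged, key=int)]


# Twelve placeholder names standing in for PCollections.
names = [f"pc{i}" for i in range(12)]

tagged = list_to_tagged(names)

# Simulate a transport that does not preserve insertion order.
scrambled = dict(reversed(list(tagged.items())))

restored = tagged_to_list(scrambled)
print(restored == names)
```

Without some ordering hint like this (in the tags or in extra proto fields),
the receiving side has no way to rebuild the original list order.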



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8512) Add integration tests for Python "flink_runner.py"

2019-11-21 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver reassigned BEAM-8512:
-

Assignee: Kyle Weaver  (was: Robert Bradshaw)

> Add integration tests for Python "flink_runner.py"
> --
>
> Key: BEAM-8512
> URL: https://issues.apache.org/jira/browse/BEAM-8512
> Project: Beam
> Issue Type: Test
> Components: runner-flink, sdk-py-core
> Reporter: Maximilian Michels
> Assignee: Kyle Weaver
> Priority: Major
> Fix For: Not applicable
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> There are currently no integration tests for the Python FlinkRunner. We need 
> a set of tests similar to {{flink_runner_test.py}}, which currently uses the 
> PortableRunner rather than the FlinkRunner.
> CC [~robertwb] [~ibzib] [~thw]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=347728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347728
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 21/Nov/19 21:53
Start Date: 21/Nov/19 21:53
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10132: [BEAM-8016] Pipeline 
Graph
URL: https://github.com/apache/beam/pull/10132#issuecomment-557289249
 
 
   Thanks Ning!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347728)
Time Spent: 7h 10m  (was: 7h)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
> Issue Type: Improvement
> Components: runner-py-interactive
> Reporter: Ning Kang
> Assignee: Ning Kang
> Priority: Major
> Time Spent: 7h 10m
> Remaining Estimate: 0h
>
> With the work in https://issues.apache.org/jira/browse/BEAM-7760, a Beam 
> pipeline that is converted to DOT and then rendered should mark user-defined 
> variables on edges.
> With the work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph correspond to the output -> input 
> relationships in the user-defined pipeline. Each edge is one output. If 
> multiple downstream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity, where each execution highlights the part of the 
> original pipeline that was actually executed, we'll also provide support in 
> beta.
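
The "one edge diverging into two" rule can be sketched in plain Python that
emits DOT text. This is an illustration under stated assumptions, not the
Interactive Beam renderer: pipeline_to_dot, the (producer, output, consumer)
edge format, and the fork node naming are all hypothetical. It relies only on
standard Graphviz attributes (shape=point, arrowhead=none, label).

```python
def pipeline_to_dot(edges):
    """Render (producer, output_name, consumer) triples as DOT text.

    An output consumed by several transforms is drawn as one labeled edge
    into a small fork point, which then diverges to each consumer.
    """
    # Group consumers by the output they read.
    consumers = {}
    for producer, output, consumer in edges:
        consumers.setdefault((producer, output), []).append(consumer)

    lines = ["digraph pipeline {"]
    for n, ((producer, output), targets) in enumerate(consumers.items()):
        if len(targets) == 1:
            # Single consumer: one ordinary labeled edge.
            lines.append(
                f'  "{producer}" -> "{targets[0]}" [label="{output}"];')
        else:
            # Shared output: one labeled edge to a fork point, then branches.
            fork = f"fork{n}"
            lines.append(f'  "{fork}" [shape=point];')
            lines.append(
                f'  "{producer}" -> "{fork}" '
                f'[label="{output}", arrowhead=none];')
            for target in targets:
                lines.append(f'  "{fork}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("Read", "lines", "Split"),
    ("Split", "words", "Count"),
    ("Split", "words", "Write"),
]
dot = pipeline_to_dot(edges)
print(dot)
```

Here the shared output "words" is labeled once, on the edge into the fork
point, so the graph shows one output rather than two separate edges.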



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

