Build failed in Jenkins: beam_PreCommit_Website_Cron #101

2018-09-25 Thread Apache Jenkins Server
See 


Changes:

[github] [BEAM-5378] Remove go wordcap example Dataflow runner (#6395)

--
[...truncated 24.37 KB...]
Step 5/7 : RUN bundle install --deployment --path $GEM_HOME
 ---> Using cache
 ---> 5ae973675c5d
Step 6/7 : ENV LC_ALL C.UTF-8
 ---> Using cache
 ---> b524c7533c23
Step 7/7 : CMD sleep 3600
 ---> Using cache
 ---> accf59a8a6a8
Successfully built accf59a8a6a8
Successfully tagged beam-website:latest
:beam-website:buildDockerImage (Thread[Task worker for ':' Thread 7,5,main]) 
completed. Took 0.342 secs.
:beam-website:createDockerContainer (Thread[Task worker for ':' Thread 
7,5,main]) started.

> Task :beam-website:createDockerContainer
Caching disabled for task ':beam-website:createDockerContainer': Caching has 
not been enabled for the task
Task ':beam-website:createDockerContainer' is not up-to-date because:
  Task has not declared any outputs despite executing actions.
Starting process 'command '/bin/bash''. Working directory: 
 
Command: /bin/bash -c docker create -v 
:/repo -u 
$(id -u):$(id -g) beam-website
Successfully started process 'command '/bin/bash''
:beam-website:createDockerContainer (Thread[Task worker for ':' Thread 
7,5,main]) completed. Took 0.242 secs.
:beam-website:startDockerContainer (Thread[Task worker for ':' Thread 
7,5,main]) started.

> Task :beam-website:startDockerContainer
Caching disabled for task ':beam-website:startDockerContainer': Caching has not 
been enabled for the task
Task ':beam-website:startDockerContainer' is not up-to-date because:
  Task has not declared any outputs despite executing actions.
Starting process 'command 'docker''. Working directory: 
 
Command: docker start 
d45aa5e3520d69676825dd7272fbe552a9186cd8166889661859af657d648845
Successfully started process 'command 'docker''
d45aa5e3520d69676825dd7272fbe552a9186cd8166889661859af657d648845
:beam-website:startDockerContainer (Thread[Task worker for ':' Thread 
7,5,main]) completed. Took 0.298 secs.
:beam-website:buildWebsite (Thread[Task worker for ':' Thread 7,5,main]) 
started.

> Task :rat
Build cache key for task ':rat' is a5b2aea4cbed3f709d4569fa72b542c4
Caching disabled for task ':rat': Caching has not been enabled for the task
Task ':rat' is not up-to-date because:
  No history is available.
Rat XML report: 

:rat (Thread[Task worker for ':',5,main]) completed. Took 2.988 secs.

> Task :beam-website:buildWebsite
Build cache key for task ':beam-website:buildWebsite' is 
6a565ed0953651f7c69bc5a474a17ce1
Caching disabled for task ':beam-website:buildWebsite': Caching has not been 
enabled for the task
Task ':beam-website:buildWebsite' is not up-to-date because:
  No history is available.
Starting process 'command 'docker''. Working directory: 
 
Command: docker exec 
d45aa5e3520d69676825dd7272fbe552a9186cd8166889661859af657d648845 /bin/bash -c 
cd /repo/build/website &&   bundle exec jekyll build   --config 
/repo/website/_config.yml   --incremental   --source /repo/website/src
  
Successfully started process 'command 'docker''
`/` is not writable.
Bundler will use `/tmp/bundler/home/unknown' as your home directory temporarily.
Configuration file: /repo/website/_config.yml
Source: /repo/website/src
   Destination: content
 Incremental build: enabled
  Generating... 
done in 12.399 seconds.
 Auto-regeneration: disabled. Use --watch to enable.
:beam-website:buildWebsite (Thread[Task worker for ':' Thread 7,5,main]) 
completed. Took 14.032 secs.
:beam-website:testWebsite (Thread[Task worker for ':' Thread 7,5,main]) started.

> Task :beam-website:testWebsite
Caching disabled for task ':beam-website:testWebsite': Caching has not been 
enabled for the task
Task ':beam-website:testWebsite' is not up-to-date because:
  Task has not declared any outputs despite executing actions.
Starting process 'command 'docker''. Working directory: 
 
Command: docker exec 
d45aa5e3520d69676825dd7272fbe552a9186cd8166889661859af657d648845 /bin/bash -c 
cd /repo/build/website &&   bundle exec rake test
Successfully started process 'command 'docker''
`/` is not writable.
Bundler will use `/tmp/bundler/home/unknown' as your home directory temporarily.
Running ["HtmlCheck", "LinkCheck", "ImageCheck", "ScriptCheck"] on 
["./content"] on *.html... 


Checking 651 external links...
Ran on 157 files!


- ./content/contribute/release-guide/index.html
  *  External link 

[jira] [Work logged] (BEAM-5417) FileSystems.match behaviour diff between GCS and local file system

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5417?focusedWorklogId=147726=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147726
 ]

ASF GitHub Bot logged work on BEAM-5417:


Author: ASF GitHub Bot
Created on: 25/Sep/18 18:46
Start Date: 25/Sep/18 18:46
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #6423: [BEAM-5417] Parity 
between GCS and local match
URL: https://github.com/apache/beam/pull/6423#issuecomment-424457700
 
 
   Still reviewing, but FYI I have a related PR out 
https://github.com/apache/beam/pull/6480 that touches `gcsio.py`. I don't 
believe it conflicts with this PR.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147726)
Time Spent: 2h 40m  (was: 2.5h)

> FileSystems.match behaviour diff between GCS and local file system
> --
>
> Key: BEAM-5417
> URL: https://issues.apache.org/jira/browse/BEAM-5417
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Joar Wandborg
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Given the directory structure:
>  
> {noformat}
> .
> ├── filesystem-match-test
> │   ├── a
> │   │   └── file.txt
> │   └── b
> │   └── file.txt
> └── filesystem-match-test.py
> {noformat}
>  
> Where {{filesystem-match-test.py}} contains:
> {code:python}
> from __future__ import print_function
> import os
> import posixpath
> from apache_beam.io.filesystem import MatchResult
> from apache_beam.io.filesystems import FileSystems
> BASES = [
> os.path.join(os.path.dirname(__file__), "./"),
> "gs://my-bucket/test/",
> ]
> pattern = "filesystem-match-test/*/file.txt"
> for base_path in BASES:
> full_pattern = posixpath.join(base_path, pattern)
> print("full_pattern: {}".format(full_pattern))
> match_result = FileSystems.match([full_pattern])[0]  # type: MatchResult
> print("metadata list: {}".format(match_result.metadata_list))
> {code}
> Running {{python filesystem-match-test.py}} does not match any files locally, 
> but does match files on GCS:
> {noformat}
> full_pattern: ./filesystem-match-test/*/file.txt
> metadata list: []
> full_pattern: gs://my-bucket/test/filesystem-match-test/*/file.txt
> metadata list: 
> [FileMetadata(gs://my-bucket/test/filesystem-match-test/a/file.txt, 6), 
> FileMetadata(gs://my-bucket/test/filesystem-match-test/b/file.txt, 6)]
> {noformat}
> The expected result is that a/file.txt and b/file.txt should be matched for 
> both patterns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5417) FileSystems.match behaviour diff between GCS and local file system

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5417?focusedWorklogId=147753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147753
 ]

ASF GitHub Bot logged work on BEAM-5417:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:05
Start Date: 25/Sep/18 20:05
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #6423: 
[BEAM-5417] Parity between GCS and local match
URL: https://github.com/apache/beam/pull/6423#discussion_r220335224
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystem_test.py
 ##
 @@ -126,66 +180,69 @@ def test_match_glob(self):
 ('apple/dish/bat', 13),
 ('apple/dish/cat', 14),
 ('apple/dish/carl', 15),
+('banana/cat', 16),
+('banana/cyrano.md', 17),
+('banana/cyrano.mb', 18),
 ]
-for (object_name, size) in objects:
+bucket_name = 'gcsio-test'
+
+if expected_object_names is all:
+  # A hack around the fact that the parameterized decorator does not have
+  #  access to self.objects
+  expected_object_names = objects
+
+for object_name, size in objects:
   file_name = 'gs://%s/%s' % (bucket_name, object_name)
   self.fs._insert_random_file(file_name, size)
-test_cases = [
-('gs://*', objects),
-('gs://gcsio-test/*', objects),
-('gs://gcsio-test/cow/*', [
-('cow/cat/fish', 2),
-('cow/cat/blubber', 3),
-('cow/dog/blubber', 4),
-]),
-('gs://gcsio-test/cow/ca*', [
-('cow/cat/fish', 2),
-('cow/cat/blubber', 3),
-]),
-('gs://gcsio-test/apple/[df]ish/ca*', [
-('apple/fish/cat', 10),
-('apple/fish/cart', 11),
-('apple/fish/carl', 12),
-('apple/dish/cat', 14),
-('apple/dish/carl', 15),
-]),
-('gs://gcsio-test/apple/fish/car?', [
-('apple/fish/cart', 11),
-('apple/fish/carl', 12),
-]),
-('gs://gcsio-test/apple/fish/b*', [
-('apple/fish/blubber', 6),
-('apple/fish/blowfish', 7),
-('apple/fish/bambi', 8),
-('apple/fish/balloon', 9),
-]),
-('gs://gcsio-test/apple/f*/b*', [
-('apple/fish/blubber', 6),
-('apple/fish/blowfish', 7),
-('apple/fish/bambi', 8),
-('apple/fish/balloon', 9),
-]),
-('gs://gcsio-test/apple/dish/[cb]at', [
-('apple/dish/bat', 13),
-('apple/dish/cat', 14),
-]),
+
+expected_file_names = [('gs://%s/%s' % (bucket_name, object_name), size)
+   for object_name, size in expected_object_names]
+actual_file_names = [
+(file_metadata.path, file_metadata.size_in_bytes)
+for file_metadata in self._flatten_match(self.fs.match([file_pattern]))
 ]
-for file_pattern, expected_object_names in test_cases:
-  expected_file_names = [('gs://%s/%s' % (bucket_name, object_name), size)
- for (object_name, size) in expected_object_names]
-  self.assertEqual(
-  set([(file_metadata.path, file_metadata.size_in_bytes)
-   for file_metadata in
-   self._flatten_match(self.fs.match([file_pattern]))]),
-  set(expected_file_names))
+
+self.maxDiff = None
+self.assertEqual(set(actual_file_names), set(expected_file_names))
 
 # Check if limits are followed correctly
 limit = 3
-for file_pattern, expected_object_names in test_cases:
-  expected_num_items = min(len(expected_object_names), limit)
-  self.assertEqual(
-  len(self._flatten_match(self.fs.match([file_pattern], [limit]))),
-  expected_num_items)
+expected_num_items = min(len(expected_object_names), limit)
+self.assertEqual(
+len(self._flatten_match(self.fs.match([file_pattern], [limit]))),
+expected_num_items)
+
+  @parameterized.expand([
+  param(os_path=posixpath, sep_re='\\/'),
+  param(os_path=ntpath, sep_re=''),
+  ])
+  def test_translate_pattern(self, os_path, sep_re):
+star = r'[^/\\]*'
+double_star = r'.*'
+join = os_path.join
+
+def re_join(a, *p):
+  path = a
+  for b in p:
+if b.startswith(sep_re):
 
 Review comment:
   This branch is not used in any of `pattern__expected`, right?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147753)
Time Spent: 3h  (was: 2h 50m)

> FileSystems.match behaviour diff between GCS and local file 

[jira] [Work logged] (BEAM-5417) FileSystems.match behaviour diff between GCS and local file system

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5417?focusedWorklogId=147754=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147754
 ]

ASF GitHub Bot logged work on BEAM-5417:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:05
Start Date: 25/Sep/18 20:05
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #6423: 
[BEAM-5417] Parity between GCS and local match
URL: https://github.com/apache/beam/pull/6423#discussion_r220280914
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystem.py
 ##
 @@ -531,24 +530,117 @@ def _list(self, dir_or_prefix):
 """
 raise NotImplementedError
 
+  @staticmethod
+  def _split_scheme(url_or_path):
+match = re.match(r'(^[a-z]+)://(.*)', url_or_path)
+if match is not None:
+  return match.groups()
+return None, url_or_path
+
+  @staticmethod
+  def _combine_scheme(scheme, path):
+if scheme is None:
+  return path
+return '{}://{}'.format(scheme, path)
+
   def _url_dirname(self, url_or_path):
 """Like posixpath.dirname, but preserves scheme:// prefix.
 
 Args:
   url_or_path: A string in the form of scheme://some/path OR /some/path.
 """
-match = re.match(r'([a-z]+://)(.*)', url_or_path)
-if match is None:
-  return posixpath.dirname(url_or_path)
-url_prefix, path = match.groups()
-return url_prefix + posixpath.dirname(path)
+scheme, path = self._split_scheme(url_or_path)
+return self._combine_scheme(scheme, posixpath.dirname(path))
+
+  def match_files(self, file_metas, pattern):
+"""Filter :class:`FileMetadata` objects by :data:`pattern`
+
+Args:
+  file_metas (:obj:`list` of :class:`FileMetadata`):
+Files to consider when matching
+  pattern (str): File pattern
+
+See Also:
+  :meth:`translate_pattern`
+
+Returns:
+  Generator of matching :class:`FileMetadata`
+"""
+re_pattern = re.compile(self.translate_pattern(pattern))
+match = re_pattern.match
+for file_metadata in file_metas:
+  is_match = match(file_metadata.path)
+  logger.debug('%r %r', is_match, file_metadata)
 
 Review comment:
   Could you make these debug logs more descriptive so it's easier to tell 
where they're from?
   I'd add the function name.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147754)

> FileSystems.match behaviour diff between GCS and local file system
> --
>
> Key: BEAM-5417
> URL: https://issues.apache.org/jira/browse/BEAM-5417
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Joar Wandborg
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Given the directory structure:
>  
> {noformat}
> .
> ├── filesystem-match-test
> │   ├── a
> │   │   └── file.txt
> │   └── b
> │   └── file.txt
> └── filesystem-match-test.py
> {noformat}
>  
> Where {{filesystem-match-test.py}} contains:
> {code:python}
> from __future__ import print_function
> import os
> import posixpath
> from apache_beam.io.filesystem import MatchResult
> from apache_beam.io.filesystems import FileSystems
> BASES = [
> os.path.join(os.path.dirname(__file__), "./"),
> "gs://my-bucket/test/",
> ]
> pattern = "filesystem-match-test/*/file.txt"
> for base_path in BASES:
> full_pattern = posixpath.join(base_path, pattern)
> print("full_pattern: {}".format(full_pattern))
> match_result = FileSystems.match([full_pattern])[0]  # type: MatchResult
> print("metadata list: {}".format(match_result.metadata_list))
> {code}
> Running {{python filesystem-match-test.py}} does not match any files locally, 
> but does match files on GCS:
> {noformat}
> full_pattern: ./filesystem-match-test/*/file.txt
> metadata list: []
> full_pattern: gs://my-bucket/test/filesystem-match-test/*/file.txt
> metadata list: 
> [FileMetadata(gs://my-bucket/test/filesystem-match-test/a/file.txt, 6), 
> FileMetadata(gs://my-bucket/test/filesystem-match-test/b/file.txt, 6)]
> {noformat}
> The expected result is that a/file.txt and b/file.txt should be matched for 
> both patterns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5417) FileSystems.match behaviour diff between GCS and local file system

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5417?focusedWorklogId=147755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147755
 ]

ASF GitHub Bot logged work on BEAM-5417:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:05
Start Date: 25/Sep/18 20:05
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #6423: 
[BEAM-5417] Parity between GCS and local match
URL: https://github.com/apache/beam/pull/6423#discussion_r220314048
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystem_test.py
 ##
 @@ -109,8 +113,58 @@ def _flatten_match(self, match_results):
 for match_result in match_results
 for file_metadata in match_result.metadata_list]
 
-  def test_match_glob(self):
-bucket_name = 'gcsio-test'
+  @parameterized.expand([
+  ('**/*', all),
 
 Review comment:
   Is there a difference between `**/*` and `**`? Shouldn't they both return 
all?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147755)
Time Spent: 3h 10m  (was: 3h)

> FileSystems.match behaviour diff between GCS and local file system
> --
>
> Key: BEAM-5417
> URL: https://issues.apache.org/jira/browse/BEAM-5417
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Joar Wandborg
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Given the directory structure:
>  
> {noformat}
> .
> ├── filesystem-match-test
> │   ├── a
> │   │   └── file.txt
> │   └── b
> │   └── file.txt
> └── filesystem-match-test.py
> {noformat}
>  
> Where {{filesystem-match-test.py}} contains:
> {code:python}
> from __future__ import print_function
> import os
> import posixpath
> from apache_beam.io.filesystem import MatchResult
> from apache_beam.io.filesystems import FileSystems
> BASES = [
> os.path.join(os.path.dirname(__file__), "./"),
> "gs://my-bucket/test/",
> ]
> pattern = "filesystem-match-test/*/file.txt"
> for base_path in BASES:
> full_pattern = posixpath.join(base_path, pattern)
> print("full_pattern: {}".format(full_pattern))
> match_result = FileSystems.match([full_pattern])[0]  # type: MatchResult
> print("metadata list: {}".format(match_result.metadata_list))
> {code}
> Running {{python filesystem-match-test.py}} does not match any files locally, 
> but does match files on GCS:
> {noformat}
> full_pattern: ./filesystem-match-test/*/file.txt
> metadata list: []
> full_pattern: gs://my-bucket/test/filesystem-match-test/*/file.txt
> metadata list: 
> [FileMetadata(gs://my-bucket/test/filesystem-match-test/a/file.txt, 6), 
> FileMetadata(gs://my-bucket/test/filesystem-match-test/b/file.txt, 6)]
> {noformat}
> The expected result is that a/file.txt and b/file.txt should be matched for 
> both patterns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5504) Pubsub Write by BeamSQL

2018-09-25 Thread Rui Wang (JIRA)
Rui Wang created BEAM-5504:
--

 Summary: Pubsub Write by BeamSQL
 Key: BEAM-5504
 URL: https://issues.apache.org/jira/browse/BEAM-5504
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147745
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 19:52
Start Date: 25/Sep/18 19:52
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on a change in pull request 
#6451: [BEAM-5319] Partially port runners
URL: https://github.com/apache/beam/pull/6451#discussion_r220332168
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/cache_manager.py
 ##
 @@ -30,6 +30,14 @@
 from apache_beam.io import filesystems
 from apache_beam.transforms import combiners
 
+try:# Python 3
 
 Review comment:
   The input of the `unquote()` call here is a `bytes` object, while it is a 
`str` in the dataflow_runner.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147745)
Time Spent: 2h  (was: 1h 50m)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Python_VR_Flink #136

2018-09-25 Thread Apache Jenkins Server
See 


Changes:

[qinyeli] Interactive Beam -- read_cache_ids and write_cache_ids

[qinyeli] Interactive Beam -- renaming variables and functions

[qinyeli] Interactive Beam -- fixing PTransform # display issue

[migryz] Add metrics dashboard deployment script and logic

[migryz] Fix rat issues

--
[...truncated 51.32 MB...]
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
63e76c6458025f10d06dfa273cf959f3.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
3e5f282c6e7f7f63683eb116bc011574.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
7103e521d82b018cb2705ec5a13f365a.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
ff89fd076d7e55e3a9573674f783dcd5.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
7e73927d0cb6e1e82dde8f9fddd0e001.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
ea81f5c1f761ad351ca8cad0adfb6a07.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (10/16) 
(ebc995bff43a14150657f40ab8a2ea39) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
719e29a0e3fdf34cc7364ccd42f2452a.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
ac892e8a5c8a7a07e2cd02faa85cd198.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
475ac8c529755938328dff40c4ba68a7.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (8/16) 
(40eefe1021dce5f16cac583bb2a2883d) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (12/16) 
(c99018637ef4614ea4a1f22fcceb5c08) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (5/16) 
(a0abdc54c5226b8253b84a8e5cffc792) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(16/16) (0d70bcd7b2b0947c527d14e39a3f148c) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (14/16) 
(88f97a2b2081cc781c7e64c440b6084f) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (11/16) 
(1e7b79d6291376187dedad5348d3973f) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (3/16) 
(6e32a8e3d0c0cf344af3064024874b49) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (9/16) 
(db513d0221a33931f205c43bba74185e) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(13/16) (c32a431c18b5bf7afcee551134a11d2b) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - 

[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147787=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147787
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 21:26
Start Date: 25/Sep/18 21:26
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6451: 
[BEAM-5319] Partially port runners
URL: https://github.com/apache/beam/pull/6451#discussion_r220361509
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/cache_manager.py
 ##
 @@ -30,6 +30,14 @@
 from apache_beam.io import filesystems
 from apache_beam.transforms import combiners
 
+try:# Python 3
 
 Review comment:
`urllib.parse.unquote` requires a `str` on Python 3, so this will still not 
work if we give it a `bytes` object?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147787)
Time Spent: 2h 20m  (was: 2h 10m)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5503) Updated BigQueryIO time partitioning doc snippet

2018-09-25 Thread Melissa Pashniak (JIRA)
Melissa Pashniak created BEAM-5503:
--

 Summary: Updated BigQueryIO time partitioning doc snippet
 Key: BEAM-5503
 URL: https://issues.apache.org/jira/browse/BEAM-5503
 Project: Beam
  Issue Type: Bug
  Components: website
Reporter: Melissa Pashniak
Assignee: David Cavazos


Need update to the doc Java snippet for BigQuery and time partitioning for this 
change: [https://github.com/apache/beam/pull/4694]

Snippet: 
/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java
 tag:BigQueryTimePartitioning

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5501) Interactive Beam display issue

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5501?focusedWorklogId=147760=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147760
 ]

ASF GitHub Bot logged work on BEAM-5501:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:23
Start Date: 25/Sep/18 20:23
Worklog Time Spent: 10m 
  Work Description: pabloem closed pull request #6418: [BEAM-5501] 
Interactive Beam -- display issue: number of PTransform executed wrongly 
displayed
URL: https://github.com/apache/beam/pull/6418
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/python/apache_beam/runners/interactive/display/display_manager.py 
b/sdks/python/apache_beam/runners/interactive/display/display_manager.py
index 1c0d902920c..84025f9b220 100644
--- a/sdks/python/apache_beam/runners/interactive/display/display_manager.py
+++ b/sdks/python/apache_beam/runners/interactive/display/display_manager.py
@@ -50,33 +50,28 @@ def _formatter(string, pp, cycle):  # pylint: 
disable=unused-argument
 class DisplayManager(object):
   """Manages displaying pipeline graph and execution status on the frontend."""
 
-  def __init__(self, pipeline_info, pipeline_proto, caches_used, cache_manager,
-   referenced_pcollections, required_transforms,
+  def __init__(self, pipeline_proto, pipeline_analyzer, cache_manager,
pipeline_graph_renderer):
 """Constructor of DisplayManager.
 
 Args:
-  pipeline_info: (interactive_runner.PipelineInfo)
   pipeline_proto: (Pipeline proto)
-  caches_used: (set of str) A set of PCollection IDs of those whose cached
-  results are used in the execution.
+  pipeline_analyzer: (PipelineAnalyzer) the pipeline analyzer that
+  corresponds to this round of execution. This will provide more
+  detailed informations about the pipeline
   cache_manager: (interactive_runner.CacheManager) DisplayManager fetches
   the latest status of pipeline execution by querying cache_manager.
-  referenced_pcollections: (dict from str to PCollection proto) PCollection
-  ID mapped to PCollection referenced during pipeline execution.
-  required_transforms: (dict from str to PTransform proto) mapping from
-  transform ID to transforms that leads to visible results.
   pipeline_graph_renderer: (pipeline_graph_renderer.PipelineGraphRenderer)
   decides how a pipeline graph is rendered.
 """
 # Every parameter except cache_manager is expected to remain constant.
+self._analyzer = pipeline_analyzer
 self._cache_manager = cache_manager
-self._referenced_pcollections = referenced_pcollections
 self._pipeline_graph = interactive_pipeline_graph.InteractivePipelineGraph(
 pipeline_proto,
-required_transforms=required_transforms,
-referenced_pcollections=referenced_pcollections,
-cached_pcollections=caches_used)
+required_transforms=self._analyzer.tl_required_trans_ids(),
+referenced_pcollections=self._analyzer.tl_referenced_pcoll_ids(),
+cached_pcollections=self._analyzer.caches_used())
 self._renderer = pipeline_graph_renderer
 
 # _text_to_print keeps track of information to be displayed.
@@ -84,20 +79,22 @@ def __init__(self, pipeline_info, pipeline_proto, 
caches_used, cache_manager,
 self._text_to_print['summary'] = (
 'Using %s cached PCollections\nExecuting %s of %s '
 'transforms.') % (
-# TODO(qinyeli): required_transforms includes ReadCache and
-# WriteCache fix it.
-len(caches_used), len(required_transforms),
+len(self._analyzer.caches_used()),
+(len(self._analyzer.tl_required_trans_ids())
+ - len(self._analyzer.read_cache_ids())
+ - len(self._analyzer.write_cache_ids())),
 len(pipeline_proto.components.transforms[
 pipeline_proto.root_transform_ids[0]].subtransforms))
-self._text_to_print.update(
-{pcoll_id: "" for pcoll_id in referenced_pcollections})
+self._text_to_print.update({
+pcoll_id: "" for pcoll_id
+in self._analyzer.tl_referenced_pcoll_ids()})
 
 # _pcollection_stats maps pcoll_id to
 # { 'cache_label': cache_label, version': version, 'sample': pcoll_in_list 
}
 self._pcollection_stats = {}
-for pcoll_id in pipeline_info.all_pcollections():
+for pcoll_id in self._analyzer.tl_referenced_pcoll_ids():
   self._pcollection_stats[pcoll_id] = {
-  'cache_label': pipeline_info.cache_label(pcoll_id),
+  'cache_label': self._analyzer.pipeline_info().cache_label(pcoll_id),
   'version': 

[beam] 01/01: Merge pull request #6418 from qinyeli/display_bug

2018-09-25 Thread pabloem
This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 08161984d05836da2dee1166f5d9f35c6b8d6ee8
Merge: 403360c fefab67
Author: Pablo 
AuthorDate: Tue Sep 25 13:23:35 2018 -0700

Merge pull request #6418 from qinyeli/display_bug

[BEAM-5501] Interactive Beam -- display issue: number of PTransform 
executed wrongly displayed

 .../runners/interactive/display/display_manager.py | 39 ++---
 .../display/interactive_pipeline_graph.py  | 13 +++--
 .../runners/interactive/display/pipeline_graph.py  |  2 +-
 .../runners/interactive/interactive_runner.py  | 11 +---
 .../runners/interactive/pipeline_analyzer.py   | 64 --
 .../runners/interactive/pipeline_analyzer_test.py  | 13 +++--
 6 files changed, 84 insertions(+), 58 deletions(-)

diff --cc sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
index b0cf134,1b63975..1ac67ac
--- a/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
+++ b/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
@@@ -116,12 -121,12 +121,12 @@@ class PipelineAnalyzer(object)
  sample=True)
  
  required_transforms['_root'] = beam_runner_api_pb2.PTransform(
 -subtransforms=top_level_required_transforms.keys())
 +subtransforms=list(top_level_required_transforms.keys()))
  
- referenced_pcollection_ids = self._referenced_pcollection_ids(
+ referenced_pcoll_ids = self._referenced_pcoll_ids(
  required_transforms)
  referenced_pcollections = {}
- for pcoll_id in referenced_pcollection_ids:
+ for pcoll_id in referenced_pcoll_ids:
obj = self._context.pcollections.get_by_id(pcoll_id)
proto = self._context.pcollections.get_proto(obj)
referenced_pcollections[pcoll_id] = proto



[beam] branch master updated (403360c -> 0816198)

2018-09-25 Thread pabloem
This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 403360c  [BEAM-1251] Upgrade pylint version for py27-lint3 (#6489)
 add 72c5404  Interactive Beam -- read_cache_ids and write_cache_ids
 add 8c77f88  Interactive Beam -- renaming variables and functions
 add fefab67  Interactive Beam -- fixing PTransform # display issue
 new 0816198  Merge pull request #6418 from qinyeli/display_bug

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../runners/interactive/display/display_manager.py | 39 ++---
 .../display/interactive_pipeline_graph.py  | 13 +++--
 .../runners/interactive/display/pipeline_graph.py  |  2 +-
 .../runners/interactive/interactive_runner.py  | 11 +---
 .../runners/interactive/pipeline_analyzer.py   | 64 --
 .../runners/interactive/pipeline_analyzer_test.py  | 13 +++--
 6 files changed, 84 insertions(+), 58 deletions(-)



[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=147769=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147769
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:33
Start Date: 25/Sep/18 20:33
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6289: [BEAM-5036] 
Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-424491563
 
 
   PTAL @chamikaramj 
   I have added tests and refactored the original implementation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147769)
Time Spent: 4h 20m  (was: 4h 10m)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> moveToOutput() methods in FileBasedSink.WriteOperation implements move by 
> copy+delete. It would be better to use a rename() which can be much more 
> effective for some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=147770=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147770
 ]

ASF GitHub Bot logged work on BEAM-4374:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:33
Start Date: 25/Sep/18 20:33
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6205: [BEAM-4374] 
Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#issuecomment-424491591
 
 
   i think the test failures that you see in py precommit should be solved by 
rebasing


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147770)
Time Spent: 6h 50m  (was: 6h 40m)

> Update existing metrics in the FN API to use new Metric Schema
> --
>
> Key: BEAM-4374
> URL: https://issues.apache.org/jira/browse/BEAM-4374
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Alex Amato
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Update existing metrics to use the new proto and cataloging schema defined in:
> [_https://s.apache.org/beam-fn-api-metrics_]
>  * Check in new protos
>  * Define catalog file for metrics
>  * Port existing metrics to use this new format, based on catalog 
> names+metadata



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=147771=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147771
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:33
Start Date: 25/Sep/18 20:33
Worklog Time Spent: 10m 
  Work Description: timrobertson100 edited a comment on issue #6289: 
[BEAM-5036] Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-424491563
 
 
   PTAL @chamikaramj - and thank you.
   I have added tests and refactored the original implementation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147771)
Time Spent: 4.5h  (was: 4h 20m)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> moveToOutput() methods in FileBasedSink.WriteOperation implements move by 
> copy+delete. It would be better to use a rename() which can be much more 
> effective for some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5506) Update Beam documentation

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5506?focusedWorklogId=147795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147795
 ]

ASF GitHub Bot logged work on BEAM-5506:


Author: ASF GitHub Bot
Created on: 25/Sep/18 21:40
Start Date: 25/Sep/18 21:40
Worklog Time Spent: 10m 
  Work Description: amaliujia opened a new pull request #6491: [BEAM-5506] 
Add reference link in CREATE TABLE exception
URL: https://github.com/apache/beam/pull/6491
 
 
   Add reference link in CREATE TABLE exception.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147795)
Time Spent: 10m
Remaining Estimate: 0h

> Update Beam documentation
> -
>
> Key: BEAM-5506
> URL: https://issues.apache.org/jira/browse/BEAM-5506
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>

[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=147761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147761
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:24
Start Date: 25/Sep/18 20:24
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6490: [BEAM-5240] Add 
metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6490#issuecomment-424488480
 
 
   LGTM. Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147761)
Time Spent: 4h 20m  (was: 4h 10m)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=147762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147762
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:24
Start Date: 25/Sep/18 20:24
Worklog Time Spent: 10m 
  Work Description: pabloem closed pull request #6490: [BEAM-5240] Add 
metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6490
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.test-infra/metrics/README.md b/.test-infra/metrics/README.md
new file mode 100644
index 000..d442e43fe1c
--- /dev/null
+++ b/.test-infra/metrics/README.md
@@ -0,0 +1,123 @@
+
+# BeamMonitoring
+This folder contains files required to spin-up metrics dashboard for Beam.
+
+## Utilized technologies
+* [Grafana](https://grafana.com) as dashboarding engine.
+* PostgreSQL as underlying DB.
+
+Approach utilized is to fetch data from corresponding system: 
Jenkins/Jira/GithubArchives/etc, put it into PostreSQL and fetch it to show in 
Grafana.
+
+## Local setup
+
+Install docker
+* install docker
+* https://docs.docker.com/install/#supported-platforms
+* install docker-compose
+* https://docs.docker.com/compose/install/#install-compose
+
+```sh
+# Remove old docker
+sudo apt-get remove docker docker-engine docker.io
+
+# Install docker
+sudo apt-get update
+sudo apt-get install \
+ apt-transport-https \
+ ca-certificates \
+ curl \
+ gnupg2 \
+ software-properties-common
+curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
+sudo apt-key fingerprint 0EBFCD88
+sudo add-apt-repository \
+   "deb [arch=amd64] https://download.docker.com/linux/debian \
+   $(lsb_release -cs) \
+   stable"
+sudo apt-get update
+sudo apt-get install docker-ce
+
+# Install docker-compose
+sudo curl -L 
https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname
 -s)-$(uname -m) -o /usr/local/bin/docker-compose
+sudo chmod +x /usr/local/bin/docker-compose
+
+# start docker service if it is not running already
+sudo service docker start
+```
+
+## Kubernetes setup
+
+1. Configure gcloud & kubectl
+  * https://cloud.google.com/kubernetes-engine/docs/quickstart
+2. Configure PosgreSQL
+a. 
https://pantheon.corp.google.com/sql/instances?project=apache-beam-testing
+b. Check on this link to configure connection from kubernetes to 
postgresql: https://cloud.google.com/sql/docs/postgres/connect-kubernetes-engine
+3. add secrets for grafana
+a. `kubectl create secret generic grafana-admin-pwd 
--from-literal=grafana_admin_password=`
+4. create persistent volume claims:
+```sh
+kubectl create -f beam-grafana-etcdata-persistentvolumeclaim.yaml
+kubectl create -f beam-grafana-libdata-persistentvolumeclaim.yaml
+kubectl create -f beam-grafana-logdata-persistentvolumeclaim.yaml
+```
+5. Build and publish sync containers
+```sh
+cd sync/jenkins
+docker build -t gcr.io/${PROJECT_ID}/beammetricssyncjenkins:v1 .
+docker push gcr.io/${PROJECT_ID}/beammetricssyncjenkins:v1
+```
+6. Create deployment `kubectl create -f beamgrafana-deploy.yaml`
+
+## Kubernetes update
+https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
+
+1. Build and publish sync containers
+```sh
+cd sync/jenkins
+docker build -t gcr.io/${PROJECT_ID}/beammetricssyncjenkins:v1 .
+docker push -t gcr.io/${PROJECT_ID}/beammetricssyncjenkins:v1
+```
+1. Update image for container `kubectl set image deployment/beamgrafana 
container=`
+
+
+## Useful Kubernetes commands and hints
+```sh
+# Get pods
+kubectl get pods
+
+# Get detailed status
+kubectl describe pod 
+
+# Get logs
+kubectl log  
+
+# Set kubectl logging level: -v [1..10]
+https://github.com/kubernetes/kubernetes/issues/35054
+```
+
+## Useful docker commands and hints
+* Connect from one container to another
+* `curl :`
+* Remove all containers/images/volumes
+```sh
+sudo docker rm $(sudo docker ps -a -q)
+sudo docker rmi $(sudo docker images -q)
+sudo docker volume prune
+```
diff --git 
a/.test-infra/metrics/beam-grafana-etcdata-persistentvolumeclaim.yaml 
b/.test-infra/metrics/beam-grafana-etcdata-persistentvolumeclaim.yaml
new file mode 100644
index 000..1b85f3b85fd
--- /dev/null
+++ b/.test-infra/metrics/beam-grafana-etcdata-persistentvolumeclaim.yaml
@@ -0,0 +1,32 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  

[beam] branch master updated (0816198 -> 81c536c)

2018-09-25 Thread pabloem
This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 0816198  Merge pull request #6418 from qinyeli/display_bug
 add b37e9a8  Add metrics dashboard deployment script and logic
 add 764db0d  Fix rat issues
 add 81c536c  Merge pull request #6490 from Ardagan/AddGrafana

No new revisions were added by this update.

Summary of changes:
 .test-infra/metrics/README.md  | 123 +
 ...beam-grafana-etcdata-persistentvolumeclaim.yaml |  18 +-
 ...eam-grafana-libdata-persistentvolumeclaim.yaml} |  20 +-
 ...beam-grafana-logdata-persistentvolumeclaim.yaml |  18 +-
 ...beam-postgresql-data-persistentvolumeclaim.yaml |  18 +-
 .test-infra/metrics/beamgrafana-deploy.yaml| 115 
 .test-infra/metrics/dashboards/dashboard.json  | 297 +
 .../metrics/docker-compose.yml |  40 ++-
 .../metrics/sync/jenkins/Dockerfile|  13 +-
 .../metrics/sync/jenkins/README.md |   9 +-
 .../sync/jenkins/requirements.txt} |   6 +-
 .test-infra/metrics/sync/jenkins/syncjenkins.py| 210 +++
 build.gradle   |   3 +
 13 files changed, 847 insertions(+), 43 deletions(-)
 create mode 100644 .test-infra/metrics/README.md
 copy runners/samza/src/main/resources/log4j.properties => 
.test-infra/metrics/beam-grafana-etcdata-persistentvolumeclaim.yaml (78%)
 copy 
.test-infra/{kubernetes/cassandra/SmallITCluster/cassandra-service-for-local-dev.yaml
 => metrics/beam-grafana-libdata-persistentvolumeclaim.yaml} (77%)
 copy runners/samza/src/main/resources/log4j.properties => 
.test-infra/metrics/beam-grafana-logdata-persistentvolumeclaim.yaml (78%)
 copy runners/samza/src/main/resources/log4j.properties => 
.test-infra/metrics/beam-postgresql-data-persistentvolumeclaim.yaml (78%)
 create mode 100644 .test-infra/metrics/beamgrafana-deploy.yaml
 create mode 100644 .test-infra/metrics/dashboards/dashboard.json
 copy runners/flink/src/test/resources/log4j-test.properties => 
.test-infra/metrics/docker-compose.yml (52%)
 copy sdks/java/extensions/sql/src/main/resources/saffron.properties => 
.test-infra/metrics/sync/jenkins/Dockerfile (88%)
 copy CONTRIBUTING.md => .test-infra/metrics/sync/jenkins/README.md (74%)
 copy .test-infra/{kubernetes/hadoop/LargeITCluster/setup.sh => 
metrics/sync/jenkins/requirements.txt} (92%)
 mode change 100755 => 100644
 create mode 100644 .test-infra/metrics/sync/jenkins/syncjenkins.py



[beam] 01/01: Merge pull request #6480 from udim/gcsio-parse-path

2018-09-25 Thread pabloem
This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 8d3389df78aa2e0a0de06b7c5743ca3530dec4ac
Merge: 81c536c 89ce32b
Author: Pablo 
AuthorDate: Tue Sep 25 13:25:13 2018 -0700

Merge pull request #6480 from udim/gcsio-parse-path

[BEAM-5486] GCSIO: Allow empty object prefix in list_prefix().

 sdks/python/apache_beam/io/gcp/gcsio.py  | 10 +-
 sdks/python/apache_beam/io/gcp/gcsio_test.py | 27 ++-
 2 files changed, 27 insertions(+), 10 deletions(-)



[jira] [Work logged] (BEAM-5486) Python: Filesystems.match(['gs://bucket/*']) fails

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5486?focusedWorklogId=147764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147764
 ]

ASF GitHub Bot logged work on BEAM-5486:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:25
Start Date: 25/Sep/18 20:25
Worklog Time Spent: 10m 
  Work Description: pabloem closed pull request #6480: [BEAM-5486] GCSIO: 
Allow empty object prefix in list_prefix().
URL: https://github.com/apache/beam/pull/6480
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/io/gcp/gcsio.py 
b/sdks/python/apache_beam/io/gcp/gcsio.py
index 7e8a9f01f76..bd6d96f3dd3 100644
--- a/sdks/python/apache_beam/io/gcp/gcsio.py
+++ b/sdks/python/apache_beam/io/gcp/gcsio.py
@@ -142,10 +142,10 @@ def get_new_http():
timeout=DEFAULT_HTTP_TIMEOUT_SECONDS)
 
 
-def parse_gcs_path(gcs_path):
+def parse_gcs_path(gcs_path, object_optional=False):
   """Return the bucket and object names of the given gs:// path."""
-  match = re.match('^gs://([^/]+)/(.+)$', gcs_path)
-  if match is None:
+  match = re.match('^gs://([^/]+)/(.*)$', gcs_path)
+  if match is None or (match.group(2) == '' and not object_optional):
 raise ValueError('GCS path must be in the form gs:///.')
   return match.group(1), match.group(2)
 
@@ -433,12 +433,12 @@ def list_prefix(self, path):
 """Lists files matching the prefix.
 
 Args:
-  path: GCS file path pattern in the form gs:///.
+  path: GCS file path pattern in the form gs:///[name].
 
 Returns:
   Dictionary of file name -> size.
 """
-bucket, prefix = parse_gcs_path(path)
+bucket, prefix = parse_gcs_path(path, object_optional=True)
 request = storage.StorageObjectsListRequest(bucket=bucket, prefix=prefix)
 file_sizes = {}
 counter = 0
diff --git a/sdks/python/apache_beam/io/gcp/gcsio_test.py 
b/sdks/python/apache_beam/io/gcp/gcsio_test.py
index 1e217a6381f..e0af7c962b5 100644
--- a/sdks/python/apache_beam/io/gcp/gcsio_test.py
+++ b/sdks/python/apache_beam/io/gcp/gcsio_test.py
@@ -219,6 +219,14 @@ def Execute(self, unused_http, **unused_kwargs):  # 
pylint: disable=invalid-name
 @unittest.skipIf(HttpError is None, 'GCP dependencies are not installed')
 class TestGCSPathParser(unittest.TestCase):
 
+  BAD_GCS_PATHS = [
+  'gs://',
+  'gs://bucket',
+  'gs:///name',
+  'gs:///',
+  'gs:/blah/bucket/name',
+  ]
+
   def test_gcs_path(self):
 self.assertEqual(
 gcsio.parse_gcs_path('gs://bucket/name'), ('bucket', 'name'))
@@ -226,12 +234,21 @@ def test_gcs_path(self):
 gcsio.parse_gcs_path('gs://bucket/name/sub'), ('bucket', 'name/sub'))
 
   def test_bad_gcs_path(self):
-self.assertRaises(ValueError, gcsio.parse_gcs_path, 'gs://')
-self.assertRaises(ValueError, gcsio.parse_gcs_path, 'gs://bucket')
+for path in self.BAD_GCS_PATHS:
+  self.assertRaises(ValueError, gcsio.parse_gcs_path, path)
 self.assertRaises(ValueError, gcsio.parse_gcs_path, 'gs://bucket/')
-self.assertRaises(ValueError, gcsio.parse_gcs_path, 'gs:///name')
-self.assertRaises(ValueError, gcsio.parse_gcs_path, 'gs:///')
-self.assertRaises(ValueError, gcsio.parse_gcs_path, 'gs:/blah/bucket/name')
+
+  def test_gcs_path_object_optional(self):
+self.assertEqual(
+gcsio.parse_gcs_path('gs://bucket/name', object_optional=True),
+('bucket', 'name'))
+self.assertEqual(
+gcsio.parse_gcs_path('gs://bucket/', object_optional=True),
+('bucket', ''))
+
+  def test_bad_gcs_path_object_optional(self):
+for path in self.BAD_GCS_PATHS:
+  self.assertRaises(ValueError, gcsio.parse_gcs_path, path, True)
 
 
 @unittest.skipIf(HttpError is None, 'GCP dependencies are not installed')


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147764)
Time Spent: 50m  (was: 40m)

> Python: Filesystems.match(['gs://bucket/*']) fails
> --
>
> Key: BEAM-5486
> URL: https://issues.apache.org/jira/browse/BEAM-5486
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Reported here: 

[jira] [Work logged] (BEAM-5486) Python: Filesystems.match(['gs://bucket/*']) fails

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5486?focusedWorklogId=147763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147763
 ]

ASF GitHub Bot logged work on BEAM-5486:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:25
Start Date: 25/Sep/18 20:25
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6480: [BEAM-5486] GCSIO: 
Allow empty object prefix in list_prefix().
URL: https://github.com/apache/beam/pull/6480#issuecomment-424488788
 
 
   LGTM.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147763)
Time Spent: 40m  (was: 0.5h)

> Python: Filesystems.match(['gs://bucket/*']) fails
> --
>
> Key: BEAM-5486
> URL: https://issues.apache.org/jira/browse/BEAM-5486
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Reported here: https://github.com/apache/beam/pull/5024#issuecomment-406211816



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (81c536c -> 8d3389d)

2018-09-25 Thread pabloem
This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 81c536c  Merge pull request #6490 from Ardagan/AddGrafana
 add 89ce32b  GCSIO: Allow empty object prefix in list_prefix().
 new 8d3389d  Merge pull request #6480 from udim/gcsio-parse-path

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/io/gcp/gcsio.py  | 10 +-
 sdks/python/apache_beam/io/gcp/gcsio_test.py | 27 ++-
 2 files changed, 27 insertions(+), 10 deletions(-)



[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=147784=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147784
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:51
Start Date: 25/Sep/18 20:51
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6433: 
[BEAM-2687] Implement Timers over the Fn API.
URL: https://github.com/apache/beam/pull/6433#discussion_r220311924
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -254,6 +254,7 @@ def __init__(self, name, transforms,
 self.transforms = transforms
 self.downstream_side_inputs = downstream_side_inputs
 self.must_follow = must_follow
+self.timers = []
 
 Review comment:
   Changed the name to be more explicit. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147784)
Time Spent: 5h 20m  (was: 5h 10m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147788
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 21:28
Start Date: 25/Sep/18 21:28
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6451: 
[BEAM-5319] Partially port runners
URL: https://github.com/apache/beam/pull/6451#discussion_r220361888
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/cache_manager.py
 ##
 @@ -30,6 +30,14 @@
 from apache_beam.io import filesystems
 from apache_beam.transforms import combiners
 
+try:# Python 3
 
 Review comment:
   Ah, nevermind, `urllib.parse.unquote_to_bytes` can accept `string` or 
`bytes`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147788)
Time Spent: 2.5h  (was: 2h 20m)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147740
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 19:33
Start Date: 25/Sep/18 19:33
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6451: 
[BEAM-5319] Partially port runners
URL: https://github.com/apache/beam/pull/6451#discussion_r220314013
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_runner_test.py
 ##
 @@ -42,6 +43,8 @@ def printer(elem):
 
 class InteractiveRunnerTest(unittest.TestCase):
 
+  @unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '
 
 Review comment:
   If the issues in interactive runner tests are specific to the interactive 
runner itself and are non-trivial to fix, we can also try to get original 
author involved, and track in a separate Jira.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147740)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147741=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147741
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 19:33
Start Date: 25/Sep/18 19:33
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6451: 
[BEAM-5319] Partially port runners
URL: https://github.com/apache/beam/pull/6451#discussion_r219308266
 
 

 ##
 File path: sdks/python/apache_beam/io/textio.py
 ##
 @@ -147,7 +147,7 @@ def display_data(self):
 
   def read_records(self, file_name, range_tracker):
 start_offset = range_tracker.start_position()
-read_buffer = _TextSource.ReadBuffer('', 0)
+read_buffer = _TextSource.ReadBuffer(b'', 0)
 
 Review comment:
   Should we change line 86 as well?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147741)
Time Spent: 1h 40m  (was: 1.5h)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147738=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147738
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 19:33
Start Date: 25/Sep/18 19:33
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6451: 
[BEAM-5319] Partially port runners
URL: https://github.com/apache/beam/pull/6451#discussion_r220325259
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/stager_test.py
 ##
 @@ -222,6 +223,8 @@ def test_with_requirements_file_and_cache(self):
 self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'abc.txt')))
 self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'def.txt')))
 
+  @unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '
+ 'fixed on Python 3')
 
 Review comment:
   Filed https://issues.apache.org/jira/browse/BEAM-5502 for this one. Please 
add a TODO with the Jira in the comment.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147738)
Time Spent: 1.5h  (was: 1h 20m)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147739=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147739
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 19:33
Start Date: 25/Sep/18 19:33
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6451: 
[BEAM-5319] Partially port runners
URL: https://github.com/apache/beam/pull/6451#discussion_r220325668
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/cache_manager.py
 ##
 @@ -30,6 +30,14 @@
 from apache_beam.io import filesystems
 from apache_beam.transforms import combiners
 
+try:# Python 3
 
 Review comment:
   Is there a reason not to use 
   ```
   from future.moves.urllib.parse import quote
   from future.moves.urllib.parse import unquote
   ```
   for consistency with dataflow_runner.py?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147739)
Time Spent: 1.5h  (was: 1h 20m)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147744=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147744
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 19:48
Start Date: 25/Sep/18 19:48
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on a change in pull request 
#6451: [BEAM-5319] Partially port runners
URL: https://github.com/apache/beam/pull/6451#discussion_r220330992
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_runner_test.py
 ##
 @@ -42,6 +43,8 @@ def printer(elem):
 
 class InteractiveRunnerTest(unittest.TestCase):
 
+  @unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '
 
 Review comment:
   All of the skipped tests except for the one in `stager_test.py` seem to be 
failing due to the same underlying bug, so it should not be specific to the 
interactive runner.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147744)
Time Spent: 1h 50m  (was: 1h 40m)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Python_VR_Flink #135

2018-09-25 Thread Apache Jenkins Server
See 


Changes:

[aaltay] [BEAM-1251] Upgrade pylint version for py27-lint3 (#6489)

--
[...truncated 51.35 MB...]
[ToKeyedWorkItem (16/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (16/16) 
(469ebfde404b75848da18f7d597ac7f0) [FINISHED]
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
44bcbf4a30d36e8dd30123e01ae2fa2e.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
469ebfde404b75848da18f7d597ac7f0.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (4/16) 
(72dbc9d32a0242c800630cf09a434867) switched from RUNNING to FINISHED.
[ToKeyedWorkItem (2/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (2/16) (694fb2c508438c82067f55a11b6be2c1) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (2/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (2/16) 
(694fb2c508438c82067f55a11b6be2c1).
[ToKeyedWorkItem (1/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (1/16) (e941a2a2632b3c4fcf49a00b727616c1) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (1/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (1/16) 
(e941a2a2632b3c4fcf49a00b727616c1).
[ToKeyedWorkItem (2/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (2/16) 
(694fb2c508438c82067f55a11b6be2c1) [FINISHED]
[ToKeyedWorkItem (1/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (1/16) 
(e941a2a2632b3c4fcf49a00b727616c1) [FINISHED]
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
694fb2c508438c82067f55a11b6be2c1.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
e941a2a2632b3c4fcf49a00b727616c1.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (14/16) 
(85948206134f9c1e7fb464dc05719d4c) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (8/16) 
(23378121fe37acbbb3f23e116576b539) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (2/16) 
(515cd9207078b9499bc447014845f3df) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(13/16) (05577b092f3b7f752837bec35890e31b) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (3/16) 
(83b8e9a746abac8f1a01edcf8e47cc6a) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (15/16) 
(e5f8d497dbd32fa9a7f00834ec8c5503) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (4/16) 
(dca4ee6f922834cbc16489fd4960c009) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (9/16) 
(4e39dc9adcd2514f16620b1d785679ff) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (10/16) 
(ecfd174d00f5265fee44c33874de6f6c) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (16/16) 

[jira] [Work logged] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5288?focusedWorklogId=147773=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147773
 ]

ASF GitHub Bot logged work on BEAM-5288:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:42
Start Date: 25/Sep/18 20:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6441: 
[BEAM-5288] Support environment pipeline option in Java and Python.
URL: https://github.com/apache/beam/pull/6441#discussion_r220339562
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
 ##
 @@ -79,18 +99,46 @@ public static Environment createDockerEnvironment(String 
dockerImageUrl) {
 .build();
   }
 
+  private static Environment createProcessEnvironment(String config) {
+try {
+  ProcessPayloadReferenceJSON payloadReferenceJSON =
+  MAPPER.readValue(config, ProcessPayloadReferenceJSON.class);
+  return createProcessEnvironment(
+  payloadReferenceJSON.getOs(),
+  payloadReferenceJSON.getArch(),
+  payloadReferenceJSON.getCommand(),
+  payloadReferenceJSON.getEnv());
+} catch (IOException e) {
+  throw new RuntimeException(
+  String.format("Unable to parse process environment config: %s", 
config), e);
+}
+  }
+
+  private static Environment createInProcessEnvironment(String config) {
+return Environment.newBuilder()
+.setUrn(ENVIRONMENT_EMBEDDED)
+.setPayload(ByteString.copyFromUtf8(MoreObjects.firstNonNull("", 
config)))
 
 Review comment:
   Thanks for spotting it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147773)
Time Spent: 11h 40m  (was: 11.5h)

> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>map env = 4; // environment variables
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5288?focusedWorklogId=147778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147778
 ]

ASF GitHub Bot logged work on BEAM-5288:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:42
Start Date: 25/Sep/18 20:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6441: 
[BEAM-5288] Support environment pipeline option in Java and Python.
URL: https://github.com/apache/beam/pull/6441#discussion_r220339942
 
 

 ##
 File path: 
runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java
 ##
 @@ -117,7 +84,12 @@ public void tearDown() {
 
   @Test
   public void testExecution() throws Exception {
-Pipeline p = Pipeline.create();
+PipelineOptions options = PipelineOptionsFactory.create();
+options.setRunner(CrashingRunner.class);
+options.as(FlinkPipelineOptions.class).setFlinkMaster("[local]");
+options.as(FlinkPipelineOptions.class).setStreaming(isStreaming);
+
options.as(PortablePipelineOptions.class).setDefaultJavaEnvironmentType("EMBEDDED");
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147778)
Time Spent: 12h 10m  (was: 12h)

> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>map env = 4; // environment variables
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5288?focusedWorklogId=147782=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147782
 ]

ASF GitHub Bot logged work on BEAM-5288:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:42
Start Date: 25/Sep/18 20:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6441: 
[BEAM-5288] Support environment pipeline option in Java and Python.
URL: https://github.com/apache/beam/pull/6441#discussion_r220344526
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -655,11 +655,18 @@ def _add_argparse_args(cls, parser):
 help=
 ('Job service endpoint to use. Should be in the form '
  'of address and port, e.g. localhost:3000'))
-parser.add_argument('--harness_docker_image',
-default=None,
-help=
-('Docker image to use for executing Python code '
- 'in the pipeline when running using the Fn API.'))
+parser.add_argument(
+'--python_environment_type', default=None,
+help=('Set the default environment type for running '
+  'user code. Possible options are DOCKER and PROCESS.'))
+parser.add_argument(
+'--python_environment_config', default=None,
+help=('Set environment configuration for running the user code. \n For 
'
+  'DOCKER: Url for the docker image. \n For PROCESS: json of the '
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147782)
Time Spent: 12.5h  (was: 12h 20m)

> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>map env = 4; // environment variables
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5288?focusedWorklogId=147775=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147775
 ]

ASF GitHub Bot logged work on BEAM-5288:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:42
Start Date: 25/Sep/18 20:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6441: 
[BEAM-5288] Support environment pipeline option in Java and Python.
URL: https://github.com/apache/beam/pull/6441#discussion_r220339498
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
 ##
 @@ -63,11 +74,20 @@
 
   private Environments() {}
 
-  public static Environment createOrGetDefaultDockerEnvironment(String url) {
-if (Strings.isNullOrEmpty(url)) {
+  public static Environment createOrGetDefaultEnvironment(String type, String 
config) {
+if (Strings.isNullOrEmpty(type)) {
   return JAVA_SDK_HARNESS_ENVIRONMENT;
 }
-return createDockerEnvironment(url);
+
+switch (type) {
+  case ENVIRONMENT_EMBEDDED:
+return createInProcessEnvironment(config);
 
 Review comment:
   Done!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147775)
Time Spent: 11h 50m  (was: 11h 40m)

> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>map env = 4; // environment variables
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5288?focusedWorklogId=147780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147780
 ]

ASF GitHub Bot logged work on BEAM-5288:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:42
Start Date: 25/Sep/18 20:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6441: 
[BEAM-5288] Support environment pipeline option in Java and Python.
URL: https://github.com/apache/beam/pull/6441#discussion_r220343176
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/LazyJobBundleFactoryTest.java
 ##
 @@ -77,35 +90,141 @@ public void setUpMocks() throws Exception {
 
when(remoteEnvironment.getInstructionRequestHandler()).thenReturn(instructionHandler);
 when(instructionHandler.handle(any()))
 .thenReturn(CompletableFuture.completedFuture(instructionResponse));
+when(dataServer.getApiServiceDescriptor())
+.thenReturn(ApiServiceDescriptor.getDefaultInstance());
+when(stateServer.getApiServiceDescriptor())
+.thenReturn(ApiServiceDescriptor.getDefaultInstance());
   }
 
   @Test
   public void createsCorrectEnvironment() throws Exception {
-try (ProcessJobBundleFactory bundleFactory =
-new ProcessJobBundleFactory(
+try (LazyJobBundleFactory bundleFactory =
+new LazyJobBundleFactory(
 envFactory,
-serverFactory,
 stageIdGenerator,
 controlServer,
 loggingServer,
 retrievalServer,
-provisioningServer)) {
+provisioningServer,
+dataServer,
+stateServer)) {
   bundleFactory.forStage(getExecutableStage(environment));
   verify(envFactory).createEnvironment(environment);
 }
   }
 
+  @Test
+  public void createsMultipleEnvironmentOfSingleType() throws Exception {
+// GrpcFnServer server = mock(GrpcFnServer.class);
+// 
when(server.getApiServiceDescriptor()).thenReturn(ApiServiceDescriptor.getDefaultInstance());
 
 Review comment:
   Thanks for spotting it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147780)

> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>map env = 4; // environment variables
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5288?focusedWorklogId=14=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-14
 ]

ASF GitHub Bot logged work on BEAM-5288:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:42
Start Date: 25/Sep/18 20:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6441: 
[BEAM-5288] Support environment pipeline option in Java and Python.
URL: https://github.com/apache/beam/pull/6441#discussion_r220343862
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java
 ##
 @@ -47,9 +55,21 @@
   void setJobEndpoint(String endpoint);
 
   @Description(
-  "Set the default environment for running user code. "
-  + "Currently only docker image URL are supported.")
-  String getDefaultJavaEnvironmentUrl();
+  "Set the default environment type for running user code. "
+  + "Possible options are DOCKER and PROCESS.")
+  String getDefaultJavaEnvironmentType();
 
-  void setDefaultJavaEnvironmentUrl(String url);
+  void setDefaultJavaEnvironmentType(String envitonmentType);
 
 Review comment:
   I though of adding language specific prefix to signify that the environment 
should be capable or running that language.
   So if user is setting this option then it should be able to run java code 
which is going to be in java.
   Let me know and I can remove it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 14)
Time Spent: 12h 10m  (was: 12h)

> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>map env = 4; // environment variables
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5288?focusedWorklogId=147779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147779
 ]

ASF GitHub Bot logged work on BEAM-5288:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:42
Start Date: 25/Sep/18 20:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6441: 
[BEAM-5288] Support environment pipeline option in Java and Python.
URL: https://github.com/apache/beam/pull/6441#discussion_r220339636
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDefaultExecutableStageContext.java
 ##
 @@ -17,18 +17,34 @@
  */
 package org.apache.beam.runners.flink.translation.functions;
 
+import com.google.common.collect.ImmutableMap;
+import org.apache.beam.model.pipeline.v1.RunnerApi.StandardEnvironments;
+import org.apache.beam.runners.core.construction.BeamUrns;
+import org.apache.beam.runners.core.construction.Environments;
 import org.apache.beam.runners.core.construction.graph.ExecutableStage;
-import org.apache.beam.runners.fnexecution.control.DockerJobBundleFactory;
 import org.apache.beam.runners.fnexecution.control.JobBundleFactory;
+import org.apache.beam.runners.fnexecution.control.LazyJobBundleFactory;
 import org.apache.beam.runners.fnexecution.control.StageBundleFactory;
+import 
org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import 
org.apache.beam.runners.fnexecution.environment.InProcessEnvironmentFactory;
+import 
org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory;
 import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
 
 /** Implementation of a {@link FlinkExecutableStageContext}. */
 class FlinkDefaultExecutableStageContext implements 
FlinkExecutableStageContext, AutoCloseable {
   private final JobBundleFactory jobBundleFactory;
 
   private static FlinkDefaultExecutableStageContext create(JobInfo jobInfo) 
throws Exception {
-JobBundleFactory jobBundleFactory = 
DockerJobBundleFactory.FACTORY.get().create(jobInfo);
+JobBundleFactory jobBundleFactory =
+LazyJobBundleFactory.create(
+jobInfo,
+ImmutableMap.of(
+BeamUrns.getUrn(StandardEnvironments.Environments.DOCKER),
+new DockerEnvironmentFactory.Provider(),
+BeamUrns.getUrn(StandardEnvironments.Environments.PROCESS),
+new ProcessEnvironmentFactory.Provider(),
+Environments.ENVIRONMENT_EMBEDDED, // Non Public urn for 
testing.
+new InProcessEnvironmentFactory.Provider()));
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147779)

> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>map env = 4; // environment variables
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5288?focusedWorklogId=147774=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147774
 ]

ASF GitHub Bot logged work on BEAM-5288:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:42
Start Date: 25/Sep/18 20:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6441: 
[BEAM-5288] Support environment pipeline option in Java and Python.
URL: https://github.com/apache/beam/pull/6441#discussion_r220340644
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/LazyJobBundleFactory.java
 ##
 @@ -53,92 +55,81 @@
 import org.apache.beam.sdk.fn.IdGenerator;
 import org.apache.beam.sdk.fn.IdGenerators;
 import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.fn.function.ThrowingFunction;
 import org.apache.beam.sdk.fn.stream.OutboundObserverFactory;
 import org.apache.beam.sdk.util.WindowedValue;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 /**
- * A base for a {@link JobBundleFactory} for which the implementation can 
specify a custom {@link
+ * A {@link JobBundleFactory} for which the implementation can specify a 
custom {@link
  * EnvironmentFactory} for environment management. Note that returned {@link 
StageBundleFactory
  * stage bundle factories} are not thread-safe. Instead, a new stage factory 
should be created for
- * each client.
+ * each client. {@link LazyJobBundleFactory} initializes the Environment 
lazily when the forStage is
+ * called for a stage. This factory is not capable of handling a mixed types 
of environment.
  */
 @ThreadSafe
-public abstract class JobBundleFactoryBase implements JobBundleFactory {
-  private static final Logger LOG = 
LoggerFactory.getLogger(JobBundleFactoryBase.class);
+public class LazyJobBundleFactory implements JobBundleFactory {
 
 Review comment:
   We already have the interface `JobBundleFactory` in the same package. Will 
rename it to `JobBundleFactoryBase`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147774)
Time Spent: 11h 50m  (was: 11h 40m)

> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>map env = 4; // environment variables
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5288?focusedWorklogId=147776=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147776
 ]

ASF GitHub Bot logged work on BEAM-5288:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:42
Start Date: 25/Sep/18 20:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6441: 
[BEAM-5288] Support environment pipeline option in Java and Python.
URL: https://github.com/apache/beam/pull/6441#discussion_r220339360
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
 ##
 @@ -49,6 +53,13 @@
 
   private static final EnvironmentIdExtractor DEFAULT_SPEC_EXTRACTOR = 
(transform) -> null;
 
+  private static final ObjectMapper MAPPER =
+  new ObjectMapper()
+  
.registerModules(ObjectMapper.findModules(ReflectHelpers.findClassLoader()));
+  public static final String ENVIRONMENT_DOCKER = "DOCKER";
+  public static final String ENVIRONMENT_PROCESS = "PROCESS";
+  public static final String ENVIRONMENT_EMBEDDED = "EMBEDDED"; // Non Public 
urn for testing
 
 Review comment:
   It is used at other places like FlinkDefaultExecutableStageContext so 
changing the visibility will not work.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147776)
Time Spent: 12h  (was: 11h 50m)

> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>map env = 4; // environment variables
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5288?focusedWorklogId=147781=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147781
 ]

ASF GitHub Bot logged work on BEAM-5288:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:42
Start Date: 25/Sep/18 20:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6441: 
[BEAM-5288] Support environment pipeline option in Java and Python.
URL: https://github.com/apache/beam/pull/6441#discussion_r220342692
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/LazyJobBundleFactory.java
 ##
 @@ -53,92 +55,81 @@
 import org.apache.beam.sdk.fn.IdGenerator;
 import org.apache.beam.sdk.fn.IdGenerators;
 import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.fn.function.ThrowingFunction;
 import org.apache.beam.sdk.fn.stream.OutboundObserverFactory;
 import org.apache.beam.sdk.util.WindowedValue;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 /**
- * A base for a {@link JobBundleFactory} for which the implementation can 
specify a custom {@link
+ * A {@link JobBundleFactory} for which the implementation can specify a 
custom {@link
  * EnvironmentFactory} for environment management. Note that returned {@link 
StageBundleFactory
  * stage bundle factories} are not thread-safe. Instead, a new stage factory 
should be created for
- * each client.
+ * each client. {@link LazyJobBundleFactory} initializes the Environment 
lazily when the forStage is
+ * called for a stage. This factory is not capable of handling a mixed types 
of environment.
  */
 @ThreadSafe
-public abstract class JobBundleFactoryBase implements JobBundleFactory {
-  private static final Logger LOG = 
LoggerFactory.getLogger(JobBundleFactoryBase.class);
+public class LazyJobBundleFactory implements JobBundleFactory {
+  private static final Logger LOG = 
LoggerFactory.getLogger(LazyJobBundleFactory.class);
 
   private final IdGenerator stageIdGenerator;
-  private final GrpcFnServer controlServer;
-  private final GrpcFnServer loggingServer;
-  private final GrpcFnServer retrievalServer;
-  private final GrpcFnServer provisioningServer;
-
   private final LoadingCache 
environmentCache;
+  // Using environment as the initialization marker.
+  private Environment environment;
+  private ExecutorService executor;
+  private GrpcFnServer controlServer;
+  private GrpcFnServer loggingServer;
+  private GrpcFnServer retrievalServer;
+  private GrpcFnServer provisioningServer;
+  private GrpcFnServer dataServer;
+  private GrpcFnServer stateServer;
+  private MapControlClientPool clientPool;
+  private EnvironmentFactory environmentFactory;
 
-  JobBundleFactoryBase(JobInfo jobInfo) throws Exception {
-ServerFactory serverFactory = getServerFactory();
-IdGenerator stageIdGenerator = IdGenerators.incrementingLongs();
-ControlClientPool clientPool = MapControlClientPool.create();
+  public static LazyJobBundleFactory create(
+  JobInfo jobInfo, Map 
environmentFactoryProviderMap) {
+return new LazyJobBundleFactory(jobInfo, environmentFactoryProviderMap);
+  }
 
-GrpcFnServer controlServer =
-GrpcFnServer.allocatePortAndCreateFor(
-FnApiControlClientPoolService.offeringClientsToPool(
-clientPool.getSink(), 
GrpcContextHeaderAccessorProvider.getHeaderAccessor()),
-serverFactory);
-GrpcFnServer loggingServer =
-GrpcFnServer.allocatePortAndCreateFor(
-GrpcLoggingService.forWriter(Slf4jLogWriter.getDefault()), 
serverFactory);
-GrpcFnServer retrievalServer =
-GrpcFnServer.allocatePortAndCreateFor(
-BeamFileSystemArtifactRetrievalService.create(), serverFactory);
-GrpcFnServer provisioningServer =
-GrpcFnServer.allocatePortAndCreateFor(
-StaticGrpcProvisionService.create(jobInfo.toProvisionInfo()), 
serverFactory);
-EnvironmentFactory environmentFactory =
-getEnvironmentFactory(
-controlServer,
-loggingServer,
-retrievalServer,
-provisioningServer,
-clientPool.getSource(),
-IdGenerators.incrementingLongs());
+  LazyJobBundleFactory(
+  JobInfo jobInfo, Map 
environmentFactoryMap) {
+IdGenerator stageIdGenerator = IdGenerators.incrementingLongs();
 this.stageIdGenerator = stageIdGenerator;
-this.controlServer = controlServer;
-this.loggingServer = loggingServer;
-this.retrievalServer = retrievalServer;
-this.provisioningServer = provisioningServer;
-this.environmentCache = createEnvironmentCache(environmentFactory, 
serverFactory);
+this.environmentCache =
+createEnvironmentCache(
+(environment) -> {
+  synchronized (this) {
+checkAndInitialize(jobInfo, 

[jira] [Work logged] (BEAM-5288) Modify Environment to support non-dockerized SDK harness deployments

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5288?focusedWorklogId=147772=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147772
 ]

ASF GitHub Bot logged work on BEAM-5288:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:42
Start Date: 25/Sep/18 20:42
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6441: 
[BEAM-5288] Support environment pipeline option in Java and Python.
URL: https://github.com/apache/beam/pull/6441#discussion_r220344370
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java
 ##
 @@ -47,9 +55,21 @@
   void setJobEndpoint(String endpoint);
 
   @Description(
-  "Set the default environment for running user code. "
-  + "Currently only docker image URL are supported.")
-  String getDefaultJavaEnvironmentUrl();
+  "Set the default environment type for running user code. "
+  + "Possible options are DOCKER and PROCESS.")
+  String getDefaultJavaEnvironmentType();
 
-  void setDefaultJavaEnvironmentUrl(String url);
+  void setDefaultJavaEnvironmentType(String envitonmentType);
+
+  @Description(
+  "Set environment configuration for running the user code. \n"
+  + " For DOCKER: Url for the docker image. \n"
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147772)
Time Spent: 11.5h  (was: 11h 20m)

> Modify Environment to support non-dockerized SDK harness deployments 
> -
>
> Key: BEAM-5288
> URL: https://issues.apache.org/jira/browse/BEAM-5288
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Maximilian Michels
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> As of mailing discussions and BEAM-5187, it has become clear that we need to 
> extend the Environment information. In addition to the Docker environment, 
> the extended environment holds deployment options for 1) a process-based 
> environment, 2) an externally managed environment.
> The proto definition, as of now, looks as follows:
> {noformat}
>  message Environment {
>// (Required) The URN of the payload
>string urn = 1;
>// (Optional) The data specifying any parameters to the URN. If
>// the URN does not require any arguments, this may be omitted.
>bytes payload = 2;
>  }
>  message StandardEnvironments {
>enum Environments {
>  DOCKER = 0 [(beam_urn) = "beam:env:docker:v1"];
>  PROCESS = 1 [(beam_urn) = "beam:env:process:v1"];
>  EXTERNAL = 2 [(beam_urn) = "beam:env:external:v1"];
>}
>  }
>  // The payload of a Docker image
>  message DockerPayload {
>string container_image = 1;  // implicitly linux_amd64.
>  }
>  message ProcessPayload {
>string os = 1;  // "linux", "darwin", ..
>string arch = 2;  // "amd64", ..
>string command = 3; // process to execute
>map env = 4; // environment variables
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=147730=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147730
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 25/Sep/18 19:06
Start Date: 25/Sep/18 19:06
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on a change in pull request 
#6489: [BEAM-1251] Upgrade pylint version for py27-lint3
URL: https://github.com/apache/beam/pull/6489#discussion_r220317601
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/evaluation_context.py
 ##
 @@ -81,6 +81,7 @@ def __init__(self, side_inputs):
   self._transform_to_side_inputs[side.pvalue.producer].append(side)
 
   def __repr__(self):
+# pylint: disable=dict-values-not-iterating
 
 Review comment:
   You're right. I've removed the `values()` call.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147730)
Time Spent: 21h 10m  (was: 21h)

> Python 3 Support
> 
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Eyad Sibai
>Assignee: Robbe
>Priority: Major
>  Time Spent: 21h 10m
>  Remaining Estimate: 0h
>
> I have been trying to use google datalab with python3. As I see there are 
> several packages that does not support python3 yet which google datalab 
> depends on. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5485) Fix time unit discrepancies in windowing documentation

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5485?focusedWorklogId=147736=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147736
 ]

ASF GitHub Bot logged work on BEAM-5485:


Author: ASF GitHub Bot
Created on: 25/Sep/18 19:14
Start Date: 25/Sep/18 19:14
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #6476: [BEAM-5485] Fix time 
unit discrepancies in windowing documentation
URL: https://github.com/apache/beam/pull/6476#issuecomment-424466690
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147736)
Time Spent: 1h 40m  (was: 1.5h)

> Fix time unit discrepancies in windowing documentation
> --
>
> Key: BEAM-5485
> URL: https://issues.apache.org/jira/browse/BEAM-5485
> Project: Beam
>  Issue Type: Bug
>  Components: testing, website
>Reporter: Deepyaman Datta
>Assignee: Jason Kuster
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5502) Object stager tests are not hermetic

2018-09-25 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-5502:
-

Assignee: Valentyn Tymofieiev  (was: Ahmet Altay)

> Object stager tests are not hermetic
> 
>
> Key: BEAM-5502
> URL: https://issues.apache.org/jira/browse/BEAM-5502
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>
> As per discussion in https://github.com/apache/beam/pull/6451, 
> test_with_setup_file fails on Python 3, however it does not fail on its own 
> or when only running the runners tests. It only fails when running the full 
> test suite, so it seems like there is a conflict with another test.
> cc [~RobbeSneyders].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5502) Object stager tests are not hermetic

2018-09-25 Thread Valentyn Tymofieiev (JIRA)
Valentyn Tymofieiev created BEAM-5502:
-

 Summary: Object stager tests are not hermetic
 Key: BEAM-5502
 URL: https://issues.apache.org/jira/browse/BEAM-5502
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Valentyn Tymofieiev
Assignee: Ahmet Altay


As per discussion in https://github.com/apache/beam/pull/6451, 
test_with_setup_file fails on Python 3, however it does not fail on its own or 
when only running the runners tests. It only fails when running the full test 
suite, so it seems like there is a conflict with another test.

cc [~RobbeSneyders].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (2e70ec8 -> 403360c)

2018-09-25 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 2e70ec8  Merge pull request #6487 from rodrigob/flatten-patch-1
 add 403360c  [BEAM-1251] Upgrade pylint version for py27-lint3 (#6489)

No new revisions were added by this update.

Summary of changes:
 sdks/python/apache_beam/examples/complete/distribopt.py|  4 ++--
 sdks/python/apache_beam/io/vcfio.py|  1 +
 sdks/python/apache_beam/options/pipeline_options.py|  2 +-
 .../apache_beam/runners/direct/evaluation_context.py   |  2 +-
 .../apache_beam/runners/interactive/cache_manager.py   |  1 +
 .../apache_beam/runners/interactive/pipeline_analyzer.py   |  2 +-
 .../apache_beam/runners/portability/fn_api_runner.py   |  4 ++--
 sdks/python/apache_beam/transforms/trigger.py  |  2 +-
 sdks/python/apache_beam/transforms/trigger_test.py | 14 --
 sdks/python/apache_beam/typehints/trivial_inference.py |  1 +
 sdks/python/apache_beam/utils/counters.py  |  2 +-
 sdks/python/tox.ini|  2 +-
 12 files changed, 21 insertions(+), 16 deletions(-)



[jira] [Work logged] (BEAM-1251) Python 3 Support

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=147747=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147747
 ]

ASF GitHub Bot logged work on BEAM-1251:


Author: ASF GitHub Bot
Created on: 25/Sep/18 19:55
Start Date: 25/Sep/18 19:55
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #6489: [BEAM-1251] Upgrade 
pylint version for py27-lint3
URL: https://github.com/apache/beam/pull/6489
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/examples/complete/distribopt.py 
b/sdks/python/apache_beam/examples/complete/distribopt.py
index f79489516e9..a16a40b6137 100644
--- a/sdks/python/apache_beam/examples/complete/distribopt.py
+++ b/sdks/python/apache_beam/examples/complete/distribopt.py
@@ -211,12 +211,12 @@ def process(self, element, quantities):
 
   # Create (crop, quantity) lists for each greenhouse
   greenhouses = defaultdict(list)
-  for crop, greenhouse in mapping.iteritems():
+  for crop, greenhouse in mapping.items():
 quantity = quantities[crop]
 greenhouses[greenhouse].append((crop, quantity))
 
   # Create input for OptimizeProductParameters
-  for greenhouse, crops in greenhouses.iteritems():
+  for greenhouse, crops in greenhouses.items():
 key = (mapping_identifier, greenhouse)
 yield (key, crops)
 
diff --git a/sdks/python/apache_beam/io/vcfio.py 
b/sdks/python/apache_beam/io/vcfio.py
index d96c8f79d65..abe33fc0dc4 100644
--- a/sdks/python/apache_beam/io/vcfio.py
+++ b/sdks/python/apache_beam/io/vcfio.py
@@ -321,6 +321,7 @@ def _create_generator(self):
 def __iter__(self):
   return self
 
+# pylint: disable=next-method-defined
 def next(self):
   return self.__next__()
 
diff --git a/sdks/python/apache_beam/options/pipeline_options.py 
b/sdks/python/apache_beam/options/pipeline_options.py
index c1bb2ed89e9..7da91a668da 100644
--- a/sdks/python/apache_beam/options/pipeline_options.py
+++ b/sdks/python/apache_beam/options/pipeline_options.py
@@ -240,7 +240,7 @@ def _visible_option_list(self):
   for option in dir(self._visible_options) if option[0] != '_')
 
   def __dir__(self):
-return sorted(dir(type(self)) + list(self.__dict__.keys()) +
+return sorted(dir(type(self)) + list(self.__dict__) +
   self._visible_option_list())
 
   def __getattr__(self, name):
diff --git a/sdks/python/apache_beam/runners/direct/evaluation_context.py 
b/sdks/python/apache_beam/runners/direct/evaluation_context.py
index 01d0631c5ee..24b05b61527 100644
--- a/sdks/python/apache_beam/runners/direct/evaluation_context.py
+++ b/sdks/python/apache_beam/runners/direct/evaluation_context.py
@@ -82,7 +82,7 @@ def __init__(self, side_inputs):
 
   def __repr__(self):
 views_string = (', '.join(str(elm) for elm in self._views.values())
-if self._views.values() else '[]')
+if self._views else '[]')
 return '_SideInputsContainer(_views=%s)' % views_string
 
   def get_value_or_block_until_ready(self, side_input, task, block_until):
diff --git a/sdks/python/apache_beam/runners/interactive/cache_manager.py 
b/sdks/python/apache_beam/runners/interactive/cache_manager.py
index a9dd03d6455..d0c0b75716c 100644
--- a/sdks/python/apache_beam/runners/interactive/cache_manager.py
+++ b/sdks/python/apache_beam/runners/interactive/cache_manager.py
@@ -191,6 +191,7 @@ def expand(self, pcoll):
 
 class SafeFastPrimitivesCoder(coders.Coder):
   """This class add an quote/unquote step to escape special characters."""
+  # pylint: disable=deprecated-urllib-function
 
   def encode(self, value):
 return urllib.quote(coders.coders.FastPrimitivesCoder().encode(value))
diff --git a/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py 
b/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
index 1f597810270..b0cf1342c46 100644
--- a/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
+++ b/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
@@ -116,7 +116,7 @@ def _analyze_pipeline(self):
 sample=True)
 
 required_transforms['_root'] = beam_runner_api_pb2.PTransform(
-subtransforms=top_level_required_transforms.keys())
+subtransforms=list(top_level_required_transforms.keys()))
 
 referenced_pcollection_ids = self._referenced_pcollection_ids(
 required_transforms)
diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner.py 
b/sdks/python/apache_beam/runners/portability/fn_api_runner.py
index 2fc92e887bf..1ea7b42be6a 100644
--- 

[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147746=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147746
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 19:55
Start Date: 25/Sep/18 19:55
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on a change in pull request 
#6451: [BEAM-5319] Partially port runners
URL: https://github.com/apache/beam/pull/6451#discussion_r220332999
 
 

 ##
 File path: sdks/python/apache_beam/io/textio.py
 ##
 @@ -147,7 +147,7 @@ def display_data(self):
 
   def read_records(self, file_name, range_tracker):
 start_offset = range_tracker.start_position()
-read_buffer = _TextSource.ReadBuffer('', 0)
+read_buffer = _TextSource.ReadBuffer(b'', 0)
 
 Review comment:
   You're right. 
   I haven't focused too much on the `io` files yet, since this package still 
needs to be ported. However, since it is related to this change, I will already 
add it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147746)
Time Spent: 2h 10m  (was: 2h)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5417) FileSystems.match behaviour diff between GCS and local file system

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5417?focusedWorklogId=147752=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147752
 ]

ASF GitHub Bot logged work on BEAM-5417:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:05
Start Date: 25/Sep/18 20:05
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #6423: 
[BEAM-5417] Parity between GCS and local match
URL: https://github.com/apache/beam/pull/6423#discussion_r220284399
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystem.py
 ##
 @@ -531,24 +530,117 @@ def _list(self, dir_or_prefix):
 """
 raise NotImplementedError
 
+  @staticmethod
+  def _split_scheme(url_or_path):
+match = re.match(r'(^[a-z]+)://(.*)', url_or_path)
+if match is not None:
+  return match.groups()
+return None, url_or_path
+
+  @staticmethod
+  def _combine_scheme(scheme, path):
+if scheme is None:
+  return path
+return '{}://{}'.format(scheme, path)
+
   def _url_dirname(self, url_or_path):
 """Like posixpath.dirname, but preserves scheme:// prefix.
 
 Args:
   url_or_path: A string in the form of scheme://some/path OR /some/path.
 """
-match = re.match(r'([a-z]+://)(.*)', url_or_path)
-if match is None:
-  return posixpath.dirname(url_or_path)
-url_prefix, path = match.groups()
-return url_prefix + posixpath.dirname(path)
+scheme, path = self._split_scheme(url_or_path)
+return self._combine_scheme(scheme, posixpath.dirname(path))
+
+  def match_files(self, file_metas, pattern):
+"""Filter :class:`FileMetadata` objects by :data:`pattern`
+
+Args:
+  file_metas (:obj:`list` of :class:`FileMetadata`):
+Files to consider when matching
+  pattern (str): File pattern
+
+See Also:
+  :meth:`translate_pattern`
+
+Returns:
+  Generator of matching :class:`FileMetadata`
+"""
+re_pattern = re.compile(self.translate_pattern(pattern))
+match = re_pattern.match
+for file_metadata in file_metas:
+  is_match = match(file_metadata.path)
+  logger.debug('%r %r', is_match, file_metadata)
+  if is_match:
+yield file_metadata
+
+  @staticmethod
+  def translate_pattern(pattern):
+"""
+Translate a :data:`pattern` to a regular expression.
+There is no way to quote meta-characters.
+
+Pattern syntax:
+  The pattern syntax is based on the fnmatch_ syntax, with the following
+  differences:
+
+  -   ``*`` Is equivalent to ``[^/\\]*`` rather than ``.*``.
+  -   ``**`` Is equivalent to ``.*``.
+
+See also:
+  :meth:`match` uses this method
+
+This method is based on `Python 2.7's fnmatch.translate`_.
 
 Review comment:
   CC: @jbonofre 
   Thank you for attributing the code, however I'm not sure what the rules are 
for accepting code from other projects that aren't owned by Apache. 
Specifically, I'm not sure how copyright assignment works.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147752)
Time Spent: 2h 50m  (was: 2h 40m)

> FileSystems.match behaviour diff between GCS and local file system
> --
>
> Key: BEAM-5417
> URL: https://issues.apache.org/jira/browse/BEAM-5417
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Joar Wandborg
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Given the directory structure:
>  
> {noformat}
> .
> ├── filesystem-match-test
> │   ├── a
> │   │   └── file.txt
> │   └── b
> │   └── file.txt
> └── filesystem-match-test.py
> {noformat}
>  
> Where {{filesystem-match-test.py}} contains:
> {code:python}
> from __future__ import print_function
> import os
> import posixpath
> from apache_beam.io.filesystem import MatchResult
> from apache_beam.io.filesystems import FileSystems
> BASES = [
> os.path.join(os.path.dirname(__file__), "./"),
> "gs://my-bucket/test/",
> ]
> pattern = "filesystem-match-test/*/file.txt"
> for base_path in BASES:
> full_pattern = posixpath.join(base_path, pattern)
> print("full_pattern: {}".format(full_pattern))
> match_result = FileSystems.match([full_pattern])[0]  # type: MatchResult
> print("metadata list: {}".format(match_result.metadata_list))
> {code}
> Running {{python filesystem-match-test.py}} does not match any files locally, 
> but 

[jira] [Work logged] (BEAM-5417) FileSystems.match behaviour diff between GCS and local file system

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5417?focusedWorklogId=147751=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147751
 ]

ASF GitHub Bot logged work on BEAM-5417:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:05
Start Date: 25/Sep/18 20:05
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #6423: 
[BEAM-5417] Parity between GCS and local match
URL: https://github.com/apache/beam/pull/6423#discussion_r220306985
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystem.py
 ##
 @@ -531,24 +530,117 @@ def _list(self, dir_or_prefix):
 """
 raise NotImplementedError
 
+  @staticmethod
+  def _split_scheme(url_or_path):
+match = re.match(r'(^[a-z]+)://(.*)', url_or_path)
+if match is not None:
+  return match.groups()
+return None, url_or_path
+
+  @staticmethod
+  def _combine_scheme(scheme, path):
+if scheme is None:
+  return path
+return '{}://{}'.format(scheme, path)
+
   def _url_dirname(self, url_or_path):
 """Like posixpath.dirname, but preserves scheme:// prefix.
 
 Args:
   url_or_path: A string in the form of scheme://some/path OR /some/path.
 """
-match = re.match(r'([a-z]+://)(.*)', url_or_path)
-if match is None:
-  return posixpath.dirname(url_or_path)
-url_prefix, path = match.groups()
-return url_prefix + posixpath.dirname(path)
+scheme, path = self._split_scheme(url_or_path)
 
 Review comment:
   Why was this method split up? To make it more understandable?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147751)
Time Spent: 2h 50m  (was: 2h 40m)

> FileSystems.match behaviour diff between GCS and local file system
> --
>
> Key: BEAM-5417
> URL: https://issues.apache.org/jira/browse/BEAM-5417
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Joar Wandborg
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Given the directory structure:
>  
> {noformat}
> .
> ├── filesystem-match-test
> │   ├── a
> │   │   └── file.txt
> │   └── b
> │   └── file.txt
> └── filesystem-match-test.py
> {noformat}
>  
> Where {{filesystem-match-test.py}} contains:
> {code:python}
> from __future__ import print_function
> import os
> import posixpath
> from apache_beam.io.filesystem import MatchResult
> from apache_beam.io.filesystems import FileSystems
> BASES = [
> os.path.join(os.path.dirname(__file__), "./"),
> "gs://my-bucket/test/",
> ]
> pattern = "filesystem-match-test/*/file.txt"
> for base_path in BASES:
> full_pattern = posixpath.join(base_path, pattern)
> print("full_pattern: {}".format(full_pattern))
> match_result = FileSystems.match([full_pattern])[0]  # type: MatchResult
> print("metadata list: {}".format(match_result.metadata_list))
> {code}
> Running {{python filesystem-match-test.py}} does not match any files locally, 
> but does match files on GCS:
> {noformat}
> full_pattern: ./filesystem-match-test/*/file.txt
> metadata list: []
> full_pattern: gs://my-bucket/test/filesystem-match-test/*/file.txt
> metadata list: 
> [FileMetadata(gs://my-bucket/test/filesystem-match-test/a/file.txt, 6), 
> FileMetadata(gs://my-bucket/test/filesystem-match-test/b/file.txt, 6)]
> {noformat}
> The expected result is that a/file.txt and b/file.txt should be matched for 
> both patterns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5417) FileSystems.match behaviour diff between GCS and local file system

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5417?focusedWorklogId=147756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147756
 ]

ASF GitHub Bot logged work on BEAM-5417:


Author: ASF GitHub Bot
Created on: 25/Sep/18 20:05
Start Date: 25/Sep/18 20:05
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #6423: 
[BEAM-5417] Parity between GCS and local match
URL: https://github.com/apache/beam/pull/6423#discussion_r220315847
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystem_test.py
 ##
 @@ -126,66 +180,69 @@ def test_match_glob(self):
 ('apple/dish/bat', 13),
 ('apple/dish/cat', 14),
 ('apple/dish/carl', 15),
+('banana/cat', 16),
+('banana/cyrano.md', 17),
+('banana/cyrano.mb', 18),
 ]
-for (object_name, size) in objects:
+bucket_name = 'gcsio-test'
+
+if expected_object_names is all:
+  # A hack around the fact that the parameterized decorator does not have
+  #  access to self.objects
+  expected_object_names = objects
+
+for object_name, size in objects:
   file_name = 'gs://%s/%s' % (bucket_name, object_name)
   self.fs._insert_random_file(file_name, size)
-test_cases = [
-('gs://*', objects),
-('gs://gcsio-test/*', objects),
-('gs://gcsio-test/cow/*', [
-('cow/cat/fish', 2),
-('cow/cat/blubber', 3),
-('cow/dog/blubber', 4),
-]),
-('gs://gcsio-test/cow/ca*', [
-('cow/cat/fish', 2),
-('cow/cat/blubber', 3),
-]),
-('gs://gcsio-test/apple/[df]ish/ca*', [
-('apple/fish/cat', 10),
-('apple/fish/cart', 11),
-('apple/fish/carl', 12),
-('apple/dish/cat', 14),
-('apple/dish/carl', 15),
-]),
-('gs://gcsio-test/apple/fish/car?', [
-('apple/fish/cart', 11),
-('apple/fish/carl', 12),
-]),
-('gs://gcsio-test/apple/fish/b*', [
-('apple/fish/blubber', 6),
-('apple/fish/blowfish', 7),
-('apple/fish/bambi', 8),
-('apple/fish/balloon', 9),
-]),
-('gs://gcsio-test/apple/f*/b*', [
-('apple/fish/blubber', 6),
-('apple/fish/blowfish', 7),
-('apple/fish/bambi', 8),
-('apple/fish/balloon', 9),
-]),
-('gs://gcsio-test/apple/dish/[cb]at', [
-('apple/dish/bat', 13),
-('apple/dish/cat', 14),
-]),
+
+expected_file_names = [('gs://%s/%s' % (bucket_name, object_name), size)
+   for object_name, size in expected_object_names]
+actual_file_names = [
+(file_metadata.path, file_metadata.size_in_bytes)
+for file_metadata in self._flatten_match(self.fs.match([file_pattern]))
 ]
-for file_pattern, expected_object_names in test_cases:
-  expected_file_names = [('gs://%s/%s' % (bucket_name, object_name), size)
- for (object_name, size) in expected_object_names]
-  self.assertEqual(
-  set([(file_metadata.path, file_metadata.size_in_bytes)
-   for file_metadata in
-   self._flatten_match(self.fs.match([file_pattern]))]),
-  set(expected_file_names))
+
+self.maxDiff = None
 
 Review comment:
   unused?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147756)
Time Spent: 3h 20m  (was: 3h 10m)

> FileSystems.match behaviour diff between GCS and local file system
> --
>
> Key: BEAM-5417
> URL: https://issues.apache.org/jira/browse/BEAM-5417
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Joar Wandborg
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Given the directory structure:
>  
> {noformat}
> .
> ├── filesystem-match-test
> │   ├── a
> │   │   └── file.txt
> │   └── b
> │   └── file.txt
> └── filesystem-match-test.py
> {noformat}
>  
> Where {{filesystem-match-test.py}} contains:
> {code:python}
> from __future__ import print_function
> import os
> import posixpath
> from apache_beam.io.filesystem import MatchResult
> from apache_beam.io.filesystems import FileSystems
> BASES = [
> os.path.join(os.path.dirname(__file__), "./"),
> 

[jira] [Created] (BEAM-5505) Disable flattening on Row in Apache Calcite

2018-09-25 Thread Rui Wang (JIRA)
Rui Wang created BEAM-5505:
--

 Summary: Disable flattening on Row in Apache Calcite
 Key: BEAM-5505
 URL: https://issues.apache.org/jira/browse/BEAM-5505
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5505) Disable Row flattening in Apache Calcite

2018-09-25 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-5505:
---
Summary: Disable Row flattening in Apache Calcite  (was: Disable flattening 
on Row in Apache Calcite)

> Disable Row flattening in Apache Calcite
> 
>
> Key: BEAM-5505
> URL: https://issues.apache.org/jira/browse/BEAM-5505
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PostCommit_Python_VR_Flink #137

2018-09-25 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-5506) Update Beam documentation

2018-09-25 Thread Rui Wang (JIRA)
Rui Wang created BEAM-5506:
--

 Summary: Update Beam documentation
 Key: BEAM-5506
 URL: https://issues.apache.org/jira/browse/BEAM-5506
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5495) PipelineResources algorithm is not working in most environments

2018-09-25 Thread Romain Manni-Bucau (JIRA)
Romain Manni-Bucau created BEAM-5495:


 Summary: PipelineResources algorithm is not working in most 
environments
 Key: BEAM-5495
 URL: https://issues.apache.org/jira/browse/BEAM-5495
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Romain Manni-Bucau
Assignee: Amit Sela


Issue are:
1. it assumes the classloader is an URLClassLoader (not always true and java >= 
9 breaks that as well for the app loader)
2. it uses loader.getURLs() which leads to including the JRE itself in the 
staged file

Looks like this detect resource algorithm can't work and should be replaced by 
a SPI rather than a built-in and not extensible algorithm. Another valid 
alternative is to just drop that "guess" logic and force the user to set staged 
files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3622) DirectRunner memory issue with Python SDK

2018-09-25 Thread Robert Bradshaw (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw updated BEAM-3622:
--
Component/s: (was: sdk-py-harness)

> DirectRunner memory issue with Python SDK
> -
>
> Key: BEAM-3622
> URL: https://issues.apache.org/jira/browse/BEAM-3622
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: yuri krnr
>Assignee: Charles Chen
>Priority: Major
>
> After running pipeline for a while in a streaming mode (reading from Pub/Sub 
> and writing to BigQuery, Datastore and another Pub/Sub) I noticed drastic 
> memory usage of a process. Using guppy as a profiler I got the following 
> results:
> start
> {noformat}
>  INFO *** MemoryReport Heap:
>  Partition of a set of 240208 objects. Total size = 34988840 bytes.
>  Index  Count   % Size   % Cumulative  % Kind (class / dict of class)
>  0  88289  37  8696984  25   8696984  25 str
>  1  5  22  4897352  14  13594336  39 tuple
>  2   5083   2  2790664   8  16385000  47 dict (no owner)
>  3   1939   1  1749656   5  18134656  52 type
>  4699   0  1723272   5  19857928  57 dict of module
>  5  12337   5  1579136   5  21437064  61 types.CodeType
>  6  12403   5  1488360   4  22925424  66 function
>  7   1939   1  1452616   4  24378040  70 dict of type
>  8677   0   709496   2  25087536  72 dict of 0x1e4d880
>  9  25603  11   614472   2  25702008  73 int
> <1103 more rows. Type e.g. '_.more' to view.>
> {noformat}
> after several hours of running
> {noformat}
> INFO *** MemoryReport Heap:
>  Partition of a set of 1255662 objects. Total size = 315029632 bytes.
>  Index  Count   % Size   % Cumulative  % Kind (class / dict of class)
>  0  95554   8 99755056  32  99755056  32 dict of
>  
> apache_beam.runners.direct.bundle_factory._Bundle
>  1 117943   9 54193192  17 153948248  49 dict (no owner)
>  2 161068  13 27169296   9 181117544  57 unicode
>  3  94571   8 26479880   8 207597424  66 dict of apache_beam.pvalue.PBegin
>  4 126461  10 12715336   4 220312760  70 str
>  5  44374   4 12424720   4 232737480  74 dict of 
> apitools.base.protorpclite.messages.FieldList
>  6  44374   4  6348624   2 239086104  76 
> apitools.base.protorpclite.messages.FieldList
>  7  95556   8  6115584   2 245201688  78 
> apache_beam.runners.direct.bundle_factory._Bundle
>  8  94571   8  6052544   2 251254232  80 apache_beam.pvalue.PBegin
>  9  57371   5  5218424   2 256472656  81 tuple
> <1187 more rows. Type e.g. '_.more' to view.>
> {noformat}
>  
> I see that every bundle still sits in memory and all its data too. why aren't 
> the gc-ed?
> What is the policy for gc for the dataflow processes?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5468) Allow runner to set worker log level in Python SDK harness.

2018-09-25 Thread Robert Bradshaw (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626973#comment-16626973
 ] 

Robert Bradshaw commented on BEAM-5468:
---

Namespaced logging in the Python SDK would be convenient (and could be done 
automatically based on modules, see [https://gist.github.com/bdarnell/3118509]) 
but this is somewhat orthogonal. 

I agree the default should be whatever the runner is using, with an option to 
set it explicitly (analogous to how Java does it), coordinating such that the 
harness never asks for logs it's just going to drop. 

> Allow runner to set worker log level in Python SDK harness.
> ---
>
> Key: BEAM-5468
> URL: https://issues.apache.org/jira/browse/BEAM-5468
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Python_VR_Flink #128

2018-09-25 Thread Apache Jenkins Server
See 


--
[...truncated 51.35 MB...]
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
0ade53610bbc38b58ff4e002436e8cc1.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
32f4f53c704d53efa448fa5e76c8ff1a.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
9b8295ecce304c228b6a3bd459d00f11.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
63c7156301faa15a43637fc395de5785.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (10/16) 
(6e9db9b5d4101bc9495cbb2e39e0eea2) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
8df44491a9ca40dc196a4b8c548d6d82.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
903c5b719c30a0ffb052a3bcacbdefd7.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
854079453bcf0cfc53b536a5147293c3.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
d7e442e822c96e1ce91e97669163c26f.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
0f14bcb1c22be4aa7f5cdc7a3b165925.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (5/16) 
(0c5823957d1b9224f00e04f11cc07a09) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
3ef944309e65714597d18934f61fb089.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
e02c2bb0e2c8391e49dd68ca61f80369.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
200c39066f05ef8156808b84df8ebad8.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (13/16) 
(31f89b0aa77b42049e777b07e23351df) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (9/16) 
(4dc6fb9f68a9174d6238ecdc2ef32d4f) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (1/16) 
(6054e12ec1b7f52652f4f14835db7f08) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (3/16) 
(ccec0615b77e3af45dba4d8849212cc6) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (4/16) 
(4f11e796bbf833c545f5856f3c053323) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(16/16) (0ade53610bbc38b58ff4e002436e8cc1) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (8/16) 

[beam] branch sketch-for-proposal deleted (was afe4c20)

2018-09-25 Thread lgajowy
This is an automated email from the ASF dual-hosted git repository.

lgajowy pushed a change to branch sketch-for-proposal
in repository https://gitbox.apache.org/repos/asf/beam.git.


 was afe4c20  Make ParDoLoadIT invoke ParDo multiple times (parametrized)

This change permanently discards the following revisions:

 discard afe4c20  Make ParDoLoadIT invoke ParDo multiple times (parametrized)
 discard a136ce6  Sketch for the proposal doc



[jira] [Work logged] (BEAM-5355) Create GroupByKey load test for Java SDK

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5355?focusedWorklogId=147521=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147521
 ]

ASF GitHub Bot logged work on BEAM-5355:


Author: ASF GitHub Bot
Created on: 25/Sep/18 11:24
Start Date: 25/Sep/18 11:24
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #6361: [BEAM-5355] 
GroupByKey Load IT
URL: https://github.com/apache/beam/pull/6361#issuecomment-424305485
 
 
   Ok, I will add the metrics. This will require adding a connection to 
BigQuery similar to the one that is in Nexmark tests since PerfKit does not 
support this. I am currently working on this and will create a separate PR for 
this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147521)
Time Spent: 50m  (was: 40m)

> Create GroupByKey load test for Java SDK
> 
>
> Key: BEAM-5355
> URL: https://issues.apache.org/jira/browse/BEAM-5355
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is more thoroughly described in this proposal: 
> [https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing]
>  
> In short: this ticket is about implementing the GroupByKeyLoadIT that uses 
> SyntheticStep and Synthetic source to create load on the pipeline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Python_VR_Flink #138

2018-09-25 Thread Apache Jenkins Server
See 


Changes:

[aaltay] [BEAM-5319] Partially port runners (#6451)

--
[...truncated 51.38 MB...]
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
6c82e60b2f9d2ad96a069839f21b65a9.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
aa4526970903d2c78c65dedbb470c8d7.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (2/16) 
(f5b2449a114ad344d3c06db956183039) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
a04c5283c041fb7a6c5236f3bdc7aea1.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (1/16) 
(3fad66da659aa824ac8f37a6a953f02a) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
2cc57886ffcd17c458577c52ee535d6c.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
23a898993ff9cef43b1ccb4c0d9ae823.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (11/16) 
(139be10d8dff212e5f9281493ee123ae) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
3b3fa7a32bef111a4f52ce487fe0a514.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
00483d5ddc78dbe619e2d33c2e1645eb.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (16/16) 
(087aec3da7745433f6d745a5e45838fa) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
ac1a418688f5ba1c3b58370d8ed0e3c1.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
c423808bff91ced96814ae6d9fed0943.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
6490baa2de32ca3e63dfe10fb8f47887.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
50f2fe6fcf3546fd3ebab1e4d500f8d8.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
111f19dc3db3983d5e532dba6a733633.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (13/16) 
(1411ba333cf4a7dc620d804abb36d9ec) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
a602d65a6e18e32204dc7273a1acda14.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 

[jira] [Work logged] (BEAM-5501) Interactive Beam display issue

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5501?focusedWorklogId=147831=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147831
 ]

ASF GitHub Bot logged work on BEAM-5501:


Author: ASF GitHub Bot
Created on: 25/Sep/18 23:35
Start Date: 25/Sep/18 23:35
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6418: 
[BEAM-5501] Interactive Beam -- display issue: number of PTransform executed 
wrongly displayed
URL: https://github.com/apache/beam/pull/6418#discussion_r220388005
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
 ##
 @@ -138,32 +143,51 @@ def _analyze_pipeline(self):
   self._context.to_runner_api().windowing_strategies)
 
 self._pipeline_proto_to_execute = pipeline_to_execute
-self._top_level_referenced_pcollection_ids = 
top_level_referenced_pcollection_ids # pylint: disable=line-too-long
+self._top_level_referenced_pcoll_ids = top_level_referenced_pcoll_ids
 self._top_level_required_transforms = top_level_required_transforms
 
+  # -- 
#
   # Getters
+  # -- 
#
+
+  def pipeline_info(self):
+"""Return PipelineInfo of the original pipeline.
+"""
+return self._pipeline_info
 
   def pipeline_proto_to_execute(self):
 """Returns Pipeline proto to be executed.
 """
 return self._pipeline_proto_to_execute
 
-  def top_level_referenced_pcollection_ids(self):
-"""Returns an array of top level referenced PCollection IDs.
+  def tl_referenced_pcoll_ids(self):
+"""Returns a set of PCollection IDs referenced by top level PTransforms.
 """
-return self._top_level_referenced_pcollection_ids
+return self._top_level_referenced_pcoll_ids
 
-  def top_level_required_transforms(self):
-"""Returns a dict mapping ID to proto of top level PTransforms.
+  def tl_required_trans_ids(self):
+"""Returns a set of required top level PTransform IDs.
 """
-return self._top_level_required_transforms
+return self._top_level_required_transforms.keys()
 
 Review comment:
   I'm trying to do the change now


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147831)
Time Spent: 1h 10m  (was: 1h)

> Interactive Beam display issue
> --
>
> Key: BEAM-5501
> URL: https://issues.apache.org/jira/browse/BEAM-5501
> Project: Beam
>  Issue Type: Bug
>  Components: runner-ideas
>Reporter: Qinye Li
>Assignee: Kenneth Knowles
>Priority: Trivial
> Attachments: 45650665-e0374200-ba83-11e8-9425-a2b5de9aa455.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The number of PTransform executed is wrongly displayed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=147867=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147867
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:38
Start Date: 26/Sep/18 01:38
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on issue #6431: [BEAM-4496] 
Website merge
URL: https://github.com/apache/beam/pull/6431#issuecomment-424555757
 
 
   Run Website Publish


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147867)
Time Spent: 10h 10m  (was: 10h)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4747) Python LocalFileSystem directory-creation semantics

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4747?focusedWorklogId=147878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147878
 ]

ASF GitHub Bot logged work on BEAM-4747:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:53
Start Date: 26/Sep/18 01:53
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5903: [BEAM-4747] 
mkdirs if they don't exist in localfilesystem
URL: https://github.com/apache/beam/pull/5903#issuecomment-424558094
 
 
   This PR breaks the 2.7.0 RC2 because it doesn't take into account relative 
paths.  I will therefore be reverting this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147878)
Time Spent: 1h  (was: 50m)

> Python LocalFileSystem directory-creation semantics
> ---
>
> Key: BEAM-4747
> URL: https://issues.apache.org/jira/browse/BEAM-4747
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.5.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Coming out of discussion on 
> [BEAM-4742|https://issues.apache.org/jira/browse/BEAM-4742] / 
> [#5903|https://github.com/apache/beam/pull/5903] is a question of whether 
> {{LocalFileSystem.open,create,copy,rename}} should create 
> intermediate (destination) directories, or fail with {{IOError}}'s (as the 
> stdlib {{os}} module generally will).
> If the semantics of {{LocalFileSystem}} should mimic those of distributed 
> filesystems (in the spirit of [recent discussion about {{DirectRunner}} being 
> more like a local simulation of a distributed runner than a production-grade 
> local 
> runner|https://www.mail-archive.com/dev@beam.apache.org/msg08410.html]), then 
> this makes sense, and it sounds like [~lcwik] and [~angoenka] are in favor of 
> this interpretation.
> I'll repurpose [#5903|https://github.com/apache/beam/pull/5903] to this end 
> unless I hear otherwise.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147799=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147799
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 21:57
Start Date: 25/Sep/18 21:57
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #6451: 
[BEAM-5319] Partially port runners
URL: https://github.com/apache/beam/pull/6451#discussion_r220368227
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor_test.py
 ##
 @@ -45,6 +45,10 @@ class 
ConsumerTrackingPipelineVisitorTest(unittest.TestCase):
   def setUp(self):
 self.pipeline = Pipeline(DirectRunner())
 self.visitor = ConsumerTrackingPipelineVisitor()
+try:# Python 2
+  self.assertCountEqual = self.assertItemsEqual
 
 Review comment:
   Would not this raise an AttributeError in any case?
   python 2 does not have the first method, python 3 does not have the second 
one.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147799)
Time Spent: 2h 40m  (was: 2.5h)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147806=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147806
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 22:15
Start Date: 25/Sep/18 22:15
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #6451: [BEAM-5319] Partially 
port runners
URL: https://github.com/apache/beam/pull/6451
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/io/filebasedsource_test.py 
b/sdks/python/apache_beam/io/filebasedsource_test.py
index 71498d062a0..3ed27022300 100644
--- a/sdks/python/apache_beam/io/filebasedsource_test.py
+++ b/sdks/python/apache_beam/io/filebasedsource_test.py
@@ -89,9 +89,9 @@ def write_data(
   all_data = []
   with tempfile.NamedTemporaryFile(
   delete=False, dir=directory, prefix=prefix) as f:
-sep_values = ['\n', '\r\n']
+sep_values = [b'\n', b'\r\n']
 for i in range(num_lines):
-  data = '' if no_data else 'line' + str(i)
+  data = b'' if no_data else b'line' + str(i).encode()
   all_data.append(data)
 
   if eol == EOL.LF:
@@ -101,7 +101,7 @@ def write_data(
   elif eol == EOL.MIXED:
 sep = sep_values[i % len(sep_values)]
   elif eol == EOL.LF_WITH_NOTHING_AT_LAST_LINE:
-sep = '' if i == (num_lines - 1) else sep_values[0]
+sep = b'' if i == (num_lines - 1) else sep_values[0]
   else:
 raise ValueError('Received unknown value %s for eol.' % eol)
 
diff --git a/sdks/python/apache_beam/io/filesystem.py 
b/sdks/python/apache_beam/io/filesystem.py
index d83e8ff574f..ac19c517804 100644
--- a/sdks/python/apache_beam/io/filesystem.py
+++ b/sdks/python/apache_beam/io/filesystem.py
@@ -37,15 +37,12 @@
 from builtins import object
 from builtins import zip
 
-from future import standard_library
 from future.utils import with_metaclass
 from past.builtins import long
 from past.builtins import unicode
 
 from apache_beam.utils.plugin import BeamPlugin
 
-standard_library.install_aliases()
-
 logger = logging.getLogger(__name__)
 
 DEFAULT_READ_BUFFER_SIZE = 16 * 1024 * 1024
diff --git a/sdks/python/apache_beam/io/filesystem_test.py 
b/sdks/python/apache_beam/io/filesystem_test.py
index 9185cf8b398..2954626069c 100644
--- a/sdks/python/apache_beam/io/filesystem_test.py
+++ b/sdks/python/apache_beam/io/filesystem_test.py
@@ -29,7 +29,6 @@
 from builtins import range
 from io import BytesIO
 
-from future import standard_library
 from future.utils import iteritems
 
 from apache_beam.io.filesystem import CompressedFile
@@ -37,8 +36,6 @@
 from apache_beam.io.filesystem import FileMetadata
 from apache_beam.io.filesystem import FileSystem
 
-standard_library.install_aliases()
-
 
 class TestingFileSystem(FileSystem):
 
diff --git a/sdks/python/apache_beam/io/textio.py 
b/sdks/python/apache_beam/io/textio.py
index fe5e237c197..f96a02ec8a2 100644
--- a/sdks/python/apache_beam/io/textio.py
+++ b/sdks/python/apache_beam/io/textio.py
@@ -83,7 +83,7 @@ def position(self, value):
   self._position = value
 
 def reset(self):
-  self.data = ''
+  self.data = b''
   self.position = 0
 
   def __init__(self,
@@ -147,7 +147,7 @@ def display_data(self):
 
   def read_records(self, file_name, range_tracker):
 start_offset = range_tracker.start_position()
-read_buffer = _TextSource.ReadBuffer('', 0)
+read_buffer = _TextSource.ReadBuffer(b'', 0)
 
 next_record_start_position = -1
 
@@ -251,9 +251,9 @@ def _find_separator_bounds(self, file_to_read, read_buffer):
 
   # Using find() here is more efficient than a linear scan of the byte
   # array.
-  next_lf = read_buffer.data.find('\n', current_pos)
+  next_lf = read_buffer.data.find(b'\n', current_pos)
   if next_lf >= 0:
-if next_lf > 0 and read_buffer.data[next_lf - 1] == '\r':
+if next_lf > 0 and read_buffer.data[next_lf - 1] == b'\r':
   # Found a '\r\n'. Accepting that as the next separator.
   return (next_lf - 1, next_lf + 1)
 else:
diff --git a/sdks/python/apache_beam/io/tfrecordio_test.py 
b/sdks/python/apache_beam/io/tfrecordio_test.py
index dfc660d2ec0..3534eaa594a 100644
--- a/sdks/python/apache_beam/io/tfrecordio_test.py
+++ b/sdks/python/apache_beam/io/tfrecordio_test.py
@@ -30,7 +30,6 @@
 from builtins import range
 
 import crcmod
-from future import standard_library
 
 import apache_beam as beam
 from apache_beam import Create
@@ -46,8 +45,6 @@
 from apache_beam.testing.util import assert_that
 from apache_beam.testing.util import equal_to
 
-standard_library.install_aliases()
-
 try:
  

Build failed in Jenkins: beam_PostCommit_Python_Verify #6086

2018-09-25 Thread Apache Jenkins Server
See 


Changes:

[aaltay] [BEAM-5319] Partially port runners (#6451)

--
[...truncated 1.28 MB...]
  File 
"
 line 677, in process
self.do_fn_invoker.invoke_process(windowed_value)
  File 
"
 line 414, in invoke_process
windowed_value, self.process_method(windowed_value.value))
  File 
"
 line 1068, in 
wrapper = lambda x: [fn(x)]
  File 
"
 line 182, in raise_error
raise RuntimeError('x')
RuntimeError: x

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File 
"
 line 131, in _execute
response = task()
  File 
"
 line 166, in 
self._execute(lambda: worker.do_instruction(work), work)
  File 
"
 line 212, in do_instruction
request.instruction_id)
  File 
"
 line 234, in process_bundle
processor.process_bundle(instruction_id)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
  File 
"
 line 349, in process_instruction_id
yield
  File 
"
 line 234, in process_bundle
processor.process_bundle(instruction_id)
  File 
"
 line 387, in process_bundle
input_op.process_encoded(data.data)
  File 
"
 line 123, in process_encoded
self.output(decoded_value)
  File 
"
 line 167, in output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File 
"
 line 87, in receive
cython.cast(Operation, consumer).process(windowed_value)
  File 
"
 line 268, in process
self.output(windowed_value)
  File 
"
 line 167, in output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File 
"
 line 87, in receive
cython.cast(Operation, consumer).process(windowed_value)
  File 
"
 line 414, in process
self.dofn_receiver.receive(o)
  File 
"
 line 673, in receive
self.process(windowed_value)
  File 
"
 line 679, in process
self._reraise_augmented(exn)
  File 
"
 line 677, in process
self.do_fn_invoker.invoke_process(windowed_value)
  File 
"
 line 414, in invoke_process
windowed_value, self.process_method(windowed_value.value))
  File 
"
 line 

[jira] [Work logged] (BEAM-2687) Python SDK support for Stateful Processing

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2687?focusedWorklogId=147824=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147824
 ]

ASF GitHub Bot logged work on BEAM-2687:


Author: ASF GitHub Bot
Created on: 25/Sep/18 23:20
Start Date: 25/Sep/18 23:20
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6433: [BEAM-2687] 
Implement Timers over the Fn API.
URL: https://github.com/apache/beam/pull/6433#issuecomment-424533437
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147824)
Time Spent: 5.5h  (was: 5h 20m)

> Python SDK support for Stateful Processing
> --
>
> Key: BEAM-2687
> URL: https://issues.apache.org/jira/browse/BEAM-2687
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Python SDK should support stateful processing 
> (https://beam.apache.org/blog/2017/02/13/stateful-processing.html)
> In the meantime, runner capability matrix should be updated to show the lack 
> of this feature 
> (https://beam.apache.org/documentation/runners/capability-matrix/)
> Use this as an umbrella issue for all related issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5501) Interactive Beam display issue

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5501?focusedWorklogId=147825=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147825
 ]

ASF GitHub Bot logged work on BEAM-5501:


Author: ASF GitHub Bot
Created on: 25/Sep/18 23:25
Start Date: 25/Sep/18 23:25
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6418: [BEAM-5501] 
Interactive Beam -- display issue: number of PTransform executed wrongly 
displayed
URL: https://github.com/apache/beam/pull/6418#issuecomment-424534346
 
 
   16:01:15 * Module 
apache_beam.runners.interactive.pipeline_analyzer
   16:01:15 W:171,11: dict.keys referenced when not iterating 
(dict-keys-not-iterating)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147825)
Time Spent: 40m  (was: 0.5h)

> Interactive Beam display issue
> --
>
> Key: BEAM-5501
> URL: https://issues.apache.org/jira/browse/BEAM-5501
> Project: Beam
>  Issue Type: Bug
>  Components: runner-ideas
>Reporter: Qinye Li
>Assignee: Kenneth Knowles
>Priority: Trivial
> Attachments: 45650665-e0374200-ba83-11e8-9425-a2b5de9aa455.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The number of PTransform executed is wrongly displayed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5501) Interactive Beam display issue

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5501?focusedWorklogId=147830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147830
 ]

ASF GitHub Bot logged work on BEAM-5501:


Author: ASF GitHub Bot
Created on: 25/Sep/18 23:31
Start Date: 25/Sep/18 23:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6418: 
[BEAM-5501] Interactive Beam -- display issue: number of PTransform executed 
wrongly displayed
URL: https://github.com/apache/beam/pull/6418#discussion_r220387333
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
 ##
 @@ -138,32 +143,51 @@ def _analyze_pipeline(self):
   self._context.to_runner_api().windowing_strategies)
 
 self._pipeline_proto_to_execute = pipeline_to_execute
-self._top_level_referenced_pcollection_ids = 
top_level_referenced_pcollection_ids # pylint: disable=line-too-long
+self._top_level_referenced_pcoll_ids = top_level_referenced_pcoll_ids
 self._top_level_required_transforms = top_level_required_transforms
 
+  # -- 
#
   # Getters
+  # -- 
#
+
+  def pipeline_info(self):
+"""Return PipelineInfo of the original pipeline.
+"""
+return self._pipeline_info
 
   def pipeline_proto_to_execute(self):
 """Returns Pipeline proto to be executed.
 """
 return self._pipeline_proto_to_execute
 
-  def top_level_referenced_pcollection_ids(self):
-"""Returns an array of top level referenced PCollection IDs.
+  def tl_referenced_pcoll_ids(self):
+"""Returns a set of PCollection IDs referenced by top level PTransforms.
 """
-return self._top_level_referenced_pcollection_ids
+return self._top_level_referenced_pcoll_ids
 
-  def top_level_required_transforms(self):
-"""Returns a dict mapping ID to proto of top level PTransforms.
+  def tl_required_trans_ids(self):
+"""Returns a set of required top level PTransform IDs.
 """
-return self._top_level_required_transforms
+return self._top_level_required_transforms.keys()
 
 Review comment:
   Also same change maybe needed in line 124 for Python 3 compatibility.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147830)
Time Spent: 1h  (was: 50m)

> Interactive Beam display issue
> --
>
> Key: BEAM-5501
> URL: https://issues.apache.org/jira/browse/BEAM-5501
> Project: Beam
>  Issue Type: Bug
>  Components: runner-ideas
>Reporter: Qinye Li
>Assignee: Kenneth Knowles
>Priority: Trivial
> Attachments: 45650665-e0374200-ba83-11e8-9425-a2b5de9aa455.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The number of PTransform executed is wrongly displayed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5501) Interactive Beam display issue

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5501?focusedWorklogId=147829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147829
 ]

ASF GitHub Bot logged work on BEAM-5501:


Author: ASF GitHub Bot
Created on: 25/Sep/18 23:30
Start Date: 25/Sep/18 23:30
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6418: 
[BEAM-5501] Interactive Beam -- display issue: number of PTransform executed 
wrongly displayed
URL: https://github.com/apache/beam/pull/6418#discussion_r220387137
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
 ##
 @@ -138,32 +143,51 @@ def _analyze_pipeline(self):
   self._context.to_runner_api().windowing_strategies)
 
 self._pipeline_proto_to_execute = pipeline_to_execute
-self._top_level_referenced_pcollection_ids = 
top_level_referenced_pcollection_ids # pylint: disable=line-too-long
+self._top_level_referenced_pcoll_ids = top_level_referenced_pcoll_ids
 self._top_level_required_transforms = top_level_required_transforms
 
+  # -- 
#
   # Getters
+  # -- 
#
+
+  def pipeline_info(self):
+"""Return PipelineInfo of the original pipeline.
+"""
+return self._pipeline_info
 
   def pipeline_proto_to_execute(self):
 """Returns Pipeline proto to be executed.
 """
 return self._pipeline_proto_to_execute
 
-  def top_level_referenced_pcollection_ids(self):
-"""Returns an array of top level referenced PCollection IDs.
+  def tl_referenced_pcoll_ids(self):
+"""Returns a set of PCollection IDs referenced by top level PTransforms.
 """
-return self._top_level_referenced_pcollection_ids
+return self._top_level_referenced_pcoll_ids
 
-  def top_level_required_transforms(self):
-"""Returns a dict mapping ID to proto of top level PTransforms.
+  def tl_required_trans_ids(self):
+"""Returns a set of required top level PTransform IDs.
 """
-return self._top_level_required_transforms
+return self._top_level_required_transforms.keys()
 
 Review comment:
   We should return list(self._top_level_required_transforms) to avoid the lint 
complaint.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147829)
Time Spent: 50m  (was: 40m)

> Interactive Beam display issue
> --
>
> Key: BEAM-5501
> URL: https://issues.apache.org/jira/browse/BEAM-5501
> Project: Beam
>  Issue Type: Bug
>  Components: runner-ideas
>Reporter: Qinye Li
>Assignee: Kenneth Knowles
>Priority: Trivial
> Attachments: 45650665-e0374200-ba83-11e8-9425-a2b5de9aa455.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The number of PTransform executed is wrongly displayed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5501) Interactive Beam display issue

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5501?focusedWorklogId=147833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147833
 ]

ASF GitHub Bot logged work on BEAM-5501:


Author: ASF GitHub Bot
Created on: 25/Sep/18 23:38
Start Date: 25/Sep/18 23:38
Worklog Time Spent: 10m 
  Work Description: pabloem opened a new pull request #6492: [BEAM-5501] 
Interactive Beam - Fixing Pylint 2-3 compatibility issue
URL: https://github.com/apache/beam/pull/6492
 
 
   r: @tvalentyn 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147833)
Time Spent: 1h 20m  (was: 1h 10m)

> Interactive Beam display issue
> --
>
> Key: BEAM-5501
> URL: https://issues.apache.org/jira/browse/BEAM-5501
> Project: Beam
>  Issue Type: Bug
>  Components: runner-ideas
>Reporter: Qinye Li
>Assignee: Kenneth Knowles
>Priority: Trivial
> Attachments: 45650665-e0374200-ba83-11e8-9425-a2b5de9aa455.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The number of PTransform executed is wrongly displayed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=147835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147835
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 25/Sep/18 23:49
Start Date: 25/Sep/18 23:49
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on a change in pull request 
#6431: [BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#discussion_r219661281
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -104,3 +106,32 @@ task testWebsite(type: Exec) {
 task preCommit {
   dependsOn testWebsite
 }
+
+task mergeWebsite << {
+  /* Disabled for now, for testing
+  exec {
+executable 'git'
+args 'checkout', 'asf-site'
+  }
+  copy {
+from contentBuildDir
+into contentRepoDir
+  }
+  exec {
+executable 'git'
+args 'add', contentRepoDir
+  }
+  exec {
+executable 'git'
+args 'commit', '-m', 'Update website ' + new Date().format('/MM/dd 
HH:mm:ss')
+  }
+  exec {
 
 Review comment:
   Created a gitCommit step to do everything but the push


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147835)
Time Spent: 9.5h  (was: 9h 20m)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5507) Pass pubsubRootUrl option to Dataflow runner.

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5507?focusedWorklogId=147854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147854
 ]

ASF GitHub Bot logged work on BEAM-5507:


Author: ASF GitHub Bot
Created on: 26/Sep/18 00:20
Start Date: 26/Sep/18 00:20
Worklog Time Spent: 10m 
  Work Description: jasonkuster commented on a change in pull request 
#6477: [BEAM-5507] Pass --pubsubRooturl option to Dataflow runner.
URL: https://github.com/apache/beam/pull/6477#discussion_r220394646
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -685,6 +685,12 @@ def _add_argparse_args(cls, parser):
 help='The time to wait (in milliseconds) for test pipeline to finish. '
  'If it is set to None, it will wait indefinitely until the job '
  'is finished.')
+# This option is used by Dataflow Runner's Pub/Sub client. Its camelCase
+# style matches the runner's.
+parser.add_argument(
+'--pubsubRootUrl',
 
 Review comment:
   Awesome, thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147854)
Time Spent: 10m
Remaining Estimate: 0h

> Pass pubsubRootUrl option to Dataflow runner.
> -
>
> Key: BEAM-5507
> URL: https://issues.apache.org/jira/browse/BEAM-5507
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This option will be used for testing only.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5501) Interactive Beam display issue

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5501?focusedWorklogId=147856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147856
 ]

ASF GitHub Bot logged work on BEAM-5501:


Author: ASF GitHub Bot
Created on: 26/Sep/18 00:43
Start Date: 26/Sep/18 00:43
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6492: [BEAM-5501] 
Interactive Beam - Fixing Pylint 2-3 compatibility issue
URL: https://github.com/apache/beam/pull/6492#issuecomment-424547470
 
 
   Thanks, @pabloem .


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147856)
Time Spent: 1.5h  (was: 1h 20m)

> Interactive Beam display issue
> --
>
> Key: BEAM-5501
> URL: https://issues.apache.org/jira/browse/BEAM-5501
> Project: Beam
>  Issue Type: Bug
>  Components: runner-ideas
>Reporter: Qinye Li
>Assignee: Kenneth Knowles
>Priority: Trivial
> Attachments: 45650665-e0374200-ba83-11e8-9425-a2b5de9aa455.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The number of PTransform executed is wrongly displayed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=147877=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147877
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:50
Start Date: 26/Sep/18 01:50
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on issue #6431: [BEAM-4496] 
Website merge
URL: https://github.com/apache/beam/pull/6431#issuecomment-424557723
 
 
   Run Website Publish


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147877)
Time Spent: 11h 10m  (was: 11h)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4747) Python LocalFileSystem directory-creation semantics

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4747?focusedWorklogId=147880=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147880
 ]

ASF GitHub Bot logged work on BEAM-4747:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:57
Start Date: 26/Sep/18 01:57
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6494: Revert 
"[BEAM-4747] mkdirs if they don't exist in localfilesystem (#5903)
URL: https://github.com/apache/beam/pull/6494#issuecomment-424558738
 
 
   R: @aaltay 
   CC: @ryan-williams


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147880)
Time Spent: 1h 20m  (was: 1h 10m)

> Python LocalFileSystem directory-creation semantics
> ---
>
> Key: BEAM-4747
> URL: https://issues.apache.org/jira/browse/BEAM-4747
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.5.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Coming out of discussion on 
> [BEAM-4742|https://issues.apache.org/jira/browse/BEAM-4742] / 
> [#5903|https://github.com/apache/beam/pull/5903] is a question of whether 
> {{LocalFileSystem.open,create,copy,rename}} should create 
> intermediate (destination) directories, or fail with {{IOError}}'s (as the 
> stdlib {{os}} module generally will).
> If the semantics of {{LocalFileSystem}} should mimic those of distributed 
> filesystems (in the spirit of [recent discussion about {{DirectRunner}} being 
> more like a local simulation of a distributed runner than a production-grade 
> local 
> runner|https://www.mail-archive.com/dev@beam.apache.org/msg08410.html]), then 
> this makes sense, and it sounds like [~lcwik] and [~angoenka] are in favor of 
> this interpretation.
> I'll repurpose [#5903|https://github.com/apache/beam/pull/5903] to this end 
> unless I hear otherwise.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4747) Python LocalFileSystem directory-creation semantics

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4747?focusedWorklogId=147881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147881
 ]

ASF GitHub Bot logged work on BEAM-4747:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:57
Start Date: 26/Sep/18 01:57
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6494: Revert 
"[BEAM-4747] mkdirs if they don't exist in localfilesystem (#5903)
URL: https://github.com/apache/beam/pull/6494#issuecomment-424558760
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147881)
Time Spent: 1.5h  (was: 1h 20m)

> Python LocalFileSystem directory-creation semantics
> ---
>
> Key: BEAM-4747
> URL: https://issues.apache.org/jira/browse/BEAM-4747
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.5.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Coming out of discussion on 
> [BEAM-4742|https://issues.apache.org/jira/browse/BEAM-4742] / 
> [#5903|https://github.com/apache/beam/pull/5903] is a question of whether 
> {{LocalFileSystem.open,create,copy,rename}} should create 
> intermediate (destination) directories, or fail with {{IOError}}'s (as the 
> stdlib {{os}} module generally will).
> If the semantics of {{LocalFileSystem}} should mimic those of distributed 
> filesystems (in the spirit of [recent discussion about {{DirectRunner}} being 
> more like a local simulation of a distributed runner than a production-grade 
> local 
> runner|https://www.mail-archive.com/dev@beam.apache.org/msg08410.html]), then 
> this makes sense, and it sounds like [~lcwik] and [~angoenka] are in favor of 
> this interpretation.
> I'll repurpose [#5903|https://github.com/apache/beam/pull/5903] to this end 
> unless I hear otherwise.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5486) Python: Filesystems.match(['gs://bucket/*']) fails

2018-09-25 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri resolved BEAM-5486.
-
   Resolution: Fixed
Fix Version/s: 2.8.0

> Python: Filesystems.match(['gs://bucket/*']) fails
> --
>
> Key: BEAM-5486
> URL: https://issues.apache.org/jira/browse/BEAM-5486
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Reported here: https://github.com/apache/beam/pull/5024#issuecomment-406211816



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5507) Pass pubsubRootUrl option to Dataflow runner.

2018-09-25 Thread Udi Meiri (JIRA)
Udi Meiri created BEAM-5507:
---

 Summary: Pass pubsubRootUrl option to Dataflow runner.
 Key: BEAM-5507
 URL: https://issues.apache.org/jira/browse/BEAM-5507
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Udi Meiri
Assignee: Udi Meiri


This option will be used for testing only.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5507) Pass pubsubRootUrl option to Dataflow runner.

2018-09-25 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628059#comment-16628059
 ] 

Udi Meiri commented on BEAM-5507:
-

https://github.com/apache/beam/pull/6477

> Pass pubsubRootUrl option to Dataflow runner.
> -
>
> Key: BEAM-5507
> URL: https://issues.apache.org/jira/browse/BEAM-5507
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> This option will be used for testing only.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PreCommit_Python_Cron #390

2018-09-25 Thread Apache Jenkins Server
See 


Changes:

[qinyeli] Interactive Beam -- read_cache_ids and write_cache_ids

[qinyeli] Interactive Beam -- renaming variables and functions

[qinyeli] Interactive Beam -- fixing PTransform # display issue

[ehudm] GCSIO: Allow empty object prefix in list_prefix().

[migryz] Add metrics dashboard deployment script and logic

[migryz] Fix rat issues

[aaltay] [BEAM-1251] Upgrade pylint version for py27-lint3 (#6489)

[aaltay] [BEAM-5319] Partially port runners (#6451)

--
[...truncated 1.05 MB...]
  File 
"
 line 677, in process
self.do_fn_invoker.invoke_process(windowed_value)
  File 
"
 line 414, in invoke_process
windowed_value, self.process_method(windowed_value.value))
  File 
"
 line 1068, in 
wrapper = lambda x: [fn(x)]
  File 
"
 line 182, in raise_error
raise RuntimeError('x')
RuntimeError: x

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File 
"
 line 131, in _execute
response = task()
  File 
"
 line 166, in 
self._execute(lambda: worker.do_instruction(work), work)
  File 
"
 line 212, in do_instruction
request.instruction_id)
  File 
"
 line 234, in process_bundle
processor.process_bundle(instruction_id)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
  File 
"
 line 349, in process_instruction_id
yield
  File 
"
 line 234, in process_bundle
processor.process_bundle(instruction_id)
  File 
"
 line 387, in process_bundle
input_op.process_encoded(data.data)
  File 
"
 line 123, in process_encoded
self.output(decoded_value)
  File 
"
 line 167, in output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File 
"
 line 87, in receive
cython.cast(Operation, consumer).process(windowed_value)
  File 
"
 line 268, in process
self.output(windowed_value)
  File 
"
 line 167, in output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File 
"
 line 87, in receive
cython.cast(Operation, consumer).process(windowed_value)
  File 
"
 line 414, in process
self.dofn_receiver.receive(o)
  File 
"
 line 673, in receive
self.process(windowed_value)
  File 
"
 line 679, in process
self._reraise_augmented(exn)
  File 
"
 line 677, in process
self.do_fn_invoker.invoke_process(windowed_value)
  

[jira] [Created] (BEAM-5508) Upgrade gradle-versions-plugin to 0.20.0

2018-09-25 Thread Ted Yu (JIRA)
Ted Yu created BEAM-5508:


 Summary: Upgrade gradle-versions-plugin to 0.20.0
 Key: BEAM-5508
 URL: https://issues.apache.org/jira/browse/BEAM-5508
 Project: Beam
  Issue Type: Task
  Components: build-system
Reporter: Ted Yu
Assignee: Luke Cwik


Currently 0.17.0 is used for gradle-versions-plugin

This task upgrades to 0.20.0 version

https://bintray.com/fooberger/maven/com.github.ben-manes%3Agradle-versions-plugin/0.20.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=147865=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147865
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:35
Start Date: 26/Sep/18 01:35
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on a change in pull request 
#6431: [BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#discussion_r219661177
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -104,3 +106,32 @@ task testWebsite(type: Exec) {
 task preCommit {
   dependsOn testWebsite
 }
+
+task mergeWebsite << {
+  /* Disabled for now, for testing
+  exec {
+executable 'git'
 
 Review comment:
   Added the org.ajoberstar.grgit plugin


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147865)
Time Spent: 9h 50m  (was: 9h 40m)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=147866=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147866
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:36
Start Date: 26/Sep/18 01:36
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on a change in pull request 
#6431: [BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#discussion_r219661201
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -22,6 +22,8 @@ apply plugin: "base"
 def dockerImageTag = 'beam-website'
 def dockerWorkDir = "/repo"
 def buildDir = "$project.rootDir/build/website"
+def contentBuildDir = "$buildDir/content"
+def contentRepoDir = "$project.rootDir/website/content"
 
 Review comment:
   Removing the website/generated-content, both with git remove and gradle 
delete call.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147866)
Time Spent: 10h  (was: 9h 50m)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 10h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=147863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147863
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:34
Start Date: 26/Sep/18 01:34
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on issue #6431: [BEAM-4496] 
Website merge
URL: https://github.com/apache/beam/pull/6431#issuecomment-424555113
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147863)
Time Spent: 9h 40m  (was: 9.5h)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=147869=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147869
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:42
Start Date: 26/Sep/18 01:42
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on issue #6431: [BEAM-4496] 
Website merge
URL: https://github.com/apache/beam/pull/6431#issuecomment-424556506
 
 
   Run Website Publish


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147869)
Time Spent: 10h 20m  (was: 10h 10m)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5509) Python pipeline_options doesn't handle int type

2018-09-25 Thread Thomas Weise (JIRA)
Thomas Weise created BEAM-5509:
--

 Summary: Python pipeline_options doesn't handle int type
 Key: BEAM-5509
 URL: https://issues.apache.org/jira/browse/BEAM-5509
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness
Reporter: Thomas Weise
Assignee: Robert Bradshaw


The int option supplied at the command line is turned into a decimal during 
serialization and then the parser in SDK harness fails to restore it as int.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=147874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147874
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:46
Start Date: 26/Sep/18 01:46
Worklog Time Spent: 10m 
  Work Description: alanmyrvold removed a comment on issue #6431: 
[BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#issuecomment-424556506
 
 
   Run Website Publish


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147874)
Time Spent: 10h 50m  (was: 10h 40m)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=147873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147873
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:46
Start Date: 26/Sep/18 01:46
Worklog Time Spent: 10m 
  Work Description: alanmyrvold removed a comment on issue #6431: 
[BEAM-4496] Website merge
URL: https://github.com/apache/beam/pull/6431#issuecomment-424557082
 
 
   Run Website Publish


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147873)
Time Spent: 10h 40m  (was: 10.5h)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?focusedWorklogId=147872=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147872
 ]

ASF GitHub Bot logged work on BEAM-4496:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:46
Start Date: 26/Sep/18 01:46
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on issue #6431: [BEAM-4496] 
Website merge
URL: https://github.com/apache/beam/pull/6431#issuecomment-424557082
 
 
   Run Website Publish


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147872)
Time Spent: 10.5h  (was: 10h 20m)

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5509) Python pipeline_options doesn't handle int type

2018-09-25 Thread Thomas Weise (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628141#comment-16628141
 ] 

Thomas Weise commented on BEAM-5509:


Pass --parallelism=1 and then job_utils.dict_to_struct(options) yields
{code:java}
fields {
  key: "beam:option:parallelism:v1"
  value {
    number_value: 1.0
  }
}{code}
Parsing in SDK harness will bark at it:
{code:java}
sdk_worker_main.py: error: argument --parallelism: invalid int value: u'1.0'

[grpc-default-worker-ELG-3-3] DEBUG 
org.apache.beam.vendor.grpc.v1.io.grpc.netty.NettyServerHandler - [id: 
0x284d90f2, L:/127.0.0.1:57436 - R:/127.0.0.1:57442] INBOUND DATA: streamId=1 
padding=0 endStream=false length=980 
bytes=0003cf0acc070806120c08ccb8abdd0510e8b5f5e1011a9707507974686f6e2073646b206861726e657373206661696c65643a200a54726163656261636b...

[grpc-default-worker-ELG-3-3] DEBUG 
org.apache.beam.vendor.grpc.v1.io.grpc.netty.NettyServerHandler - [id: 
0x284d90f2, L:/127.0.0.1:57436 - R:/127.0.0.1:57442] INBOUND DATA: streamId=1 
padding=0 endStream=true length=0 bytes=

[grpc-default-executor-0] ERROR sdk_worker_main.main - Python sdk harness 
failed:

Traceback (most recent call last):

  File 
"/Users/tweise/python-ve/beam/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 136, in main

    sdk_pipeline_options.get_all_options(drop_default=True))

  File 
"/Users/tweise/python-ve/beam/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
 line 216, in get_all_options

    known_args, _ = parser.parse_known_args(self._flags)

  File 
"/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py",
 line 1740, in parse_known_args

    self.error(str(err))

  File 
"/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py",
 line 2374, in error

    self.exit(2, _('%s: error: %s\n') % (self.prog, message))

  File 
"/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py",
 line 2362, in exit

    _sys.exit(status)

SystemExit: 2{code}
 

> Python pipeline_options doesn't handle int type
> ---
>
> Key: BEAM-5509
> URL: https://issues.apache.org/jira/browse/BEAM-5509
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Robert Bradshaw
>Priority: Major
>
> The int option supplied at the command line is turned into a decimal during 
> serialization and then the parser in SDK harness fails to restore it as int.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5490) Beam should not irreversibly modify inspect.getargspec

2018-09-25 Thread Ahmet Altay (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-5490:
-

Assignee: Valentyn Tymofieiev

> Beam should not irreversibly modify inspect.getargspec
> --
>
> Key: BEAM-5490
> URL: https://issues.apache.org/jira/browse/BEAM-5490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chuan Yu Foo
>Assignee: Valentyn Tymofieiev
>Priority: Major
>
> In the Python SDK, in {{typehints/decorators.py}}, Beam irreversibly 
> monkey-patches {{inspect.getargspec}} for its own purposes. This is really 
> bad form, since it leads to hard to debug issues in other modules which also 
> use {{inspect.getargspec}} when Beam is imported. 
> Beam should either maintain its own modified copy of the functions it needs, 
> or else it should always reverse the monkey-patching so that it does not 
> affect other modules.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (8d3389d -> 50111d5)

2018-09-25 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 8d3389d  Merge pull request #6480 from udim/gcsio-parse-path
 add 50111d5  [BEAM-5319] Partially port runners (#6451)

No new revisions were added by this update.

Summary of changes:
 sdks/python/apache_beam/io/filebasedsource_test.py |  6 +++---
 sdks/python/apache_beam/io/textio.py   |  8 
 sdks/python/apache_beam/runners/common.py  |  6 ++
 .../consumer_tracking_pipeline_visitor_test.py | 24 +++---
 .../runners/interactive/cache_manager.py   | 12 +--
 .../runners/interactive/interactive_runner_test.py |  5 +
 .../runners/interactive/pipeline_analyzer_test.py  |  7 +++
 .../runners/portability/fn_api_runner.py   |  2 +-
 .../runners/portability/fn_api_runner_test.py  | 22 +---
 .../runners/portability/portable_runner.py |  2 +-
 .../runners/portability/portable_stager.py |  2 +-
 .../runners/portability/portable_stager_test.py|  3 ++-
 .../apache_beam/runners/portability/stager_test.py |  4 
 .../apache_beam/runners/worker/data_plane.py   |  2 +-
 .../apache_beam/runners/worker/data_plane_test.py  | 16 +++
 .../apache_beam/runners/worker/opcounters_test.py  |  3 +++
 .../apache_beam/runners/worker/sdk_worker_main.py  |  2 +-
 sdks/python/apache_beam/transforms/util.py |  2 +-
 sdks/python/tox.ini|  2 +-
 19 files changed, 86 insertions(+), 44 deletions(-)



[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=147805=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147805
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 25/Sep/18 22:14
Start Date: 25/Sep/18 22:14
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #6451: 
[BEAM-5319] Partially port runners
URL: https://github.com/apache/beam/pull/6451#discussion_r220372896
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor_test.py
 ##
 @@ -45,6 +45,10 @@ class 
ConsumerTrackingPipelineVisitorTest(unittest.TestCase):
   def setUp(self):
 self.pipeline = Pipeline(DirectRunner())
 self.visitor = ConsumerTrackingPipelineVisitor()
+try:# Python 2
+  self.assertCountEqual = self.assertItemsEqual
 
 Review comment:
   Nevermind, on py2, this will define assertCountEqual.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147805)
Time Spent: 2h 50m  (was: 2h 40m)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Python_Verify #6087

2018-09-25 Thread Apache Jenkins Server
See 


--
[...truncated 1.28 MB...]
  File 
"
 line 677, in process
self.do_fn_invoker.invoke_process(windowed_value)
  File 
"
 line 414, in invoke_process
windowed_value, self.process_method(windowed_value.value))
  File 
"
 line 1068, in 
wrapper = lambda x: [fn(x)]
  File 
"
 line 182, in raise_error
raise RuntimeError('x')
RuntimeError: x

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File 
"
 line 131, in _execute
response = task()
  File 
"
 line 166, in 
self._execute(lambda: worker.do_instruction(work), work)
  File 
"
 line 212, in do_instruction
request.instruction_id)
  File 
"
 line 234, in process_bundle
processor.process_bundle(instruction_id)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
  File 
"
 line 349, in process_instruction_id
yield
  File 
"
 line 234, in process_bundle
processor.process_bundle(instruction_id)
  File 
"
 line 387, in process_bundle
input_op.process_encoded(data.data)
  File 
"
 line 123, in process_encoded
self.output(decoded_value)
  File 
"
 line 167, in output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File 
"
 line 87, in receive
cython.cast(Operation, consumer).process(windowed_value)
  File 
"
 line 268, in process
self.output(windowed_value)
  File 
"
 line 167, in output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File 
"
 line 87, in receive
cython.cast(Operation, consumer).process(windowed_value)
  File 
"
 line 414, in process
self.dofn_receiver.receive(o)
  File 
"
 line 673, in receive
self.process(windowed_value)
  File 
"
 line 679, in process
self._reraise_augmented(exn)
  File 
"
 line 677, in process
self.do_fn_invoker.invoke_process(windowed_value)
  File 
"
 line 414, in invoke_process
windowed_value, self.process_method(windowed_value.value))
  File 
"
 line 787, in process_outputs
self.main_receivers.receive(windowed_value)
  File 

[jira] [Work logged] (BEAM-5501) Interactive Beam display issue

2018-09-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5501?focusedWorklogId=147859=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147859
 ]

ASF GitHub Bot logged work on BEAM-5501:


Author: ASF GitHub Bot
Created on: 26/Sep/18 01:05
Start Date: 26/Sep/18 01:05
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #6492: [BEAM-5501] 
Interactive Beam - Fixing Pylint 2-3 compatibility issue
URL: https://github.com/apache/beam/pull/6492
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py 
b/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
index 1ac67ac7bac..9f815a36ff4 100644
--- a/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
+++ b/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py
@@ -121,7 +121,7 @@ def _analyze_pipeline(self):
 sample=True)
 
 required_transforms['_root'] = beam_runner_api_pb2.PTransform(
-subtransforms=list(top_level_required_transforms.keys()))
+subtransforms=list(top_level_required_transforms))
 
 referenced_pcoll_ids = self._referenced_pcoll_ids(
 required_transforms)
@@ -168,7 +168,7 @@ def tl_referenced_pcoll_ids(self):
   def tl_required_trans_ids(self):
 """Returns a set of required top level PTransform IDs.
 """
-return self._top_level_required_transforms.keys()
+return list(self._top_level_required_transforms)
 
   def caches_used(self):
 """Returns a set of PCollection IDs to read from cache.


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 147859)
Time Spent: 1h 40m  (was: 1.5h)

> Interactive Beam display issue
> --
>
> Key: BEAM-5501
> URL: https://issues.apache.org/jira/browse/BEAM-5501
> Project: Beam
>  Issue Type: Bug
>  Components: runner-ideas
>Reporter: Qinye Li
>Assignee: Kenneth Knowles
>Priority: Trivial
> Attachments: 45650665-e0374200-ba83-11e8-9425-a2b5de9aa455.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The number of PTransform executed is wrongly displayed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   3   >