[jira] [Created] (BEAM-1185) Remove the word Pipeline from the name of all PipelineRunner implementations

2016-12-19 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-1185:
-

 Summary: Remove the word Pipeline from the name of all 
PipelineRunner implementations
 Key: BEAM-1185
 URL: https://issues.apache.org/jira/browse/BEAM-1185
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay


See: https://issues.apache.org/jira/browse/BEAM-234

Rename all runners to remove the Pipeline word from their name in the Python 
SDK (e.g. DirectPipelineRunner -> DirectRunner).
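
A minimal sketch of the naming rule described above. Only the DirectPipelineRunner -> DirectRunner pair comes from this issue; the helper name strip_pipeline is hypothetical, and it assumes the base class name PipelineRunner itself stays unchanged:

```python
def strip_pipeline(runner_name):
    """Drop the word 'Pipeline' from a runner implementation's class name.

    Hypothetical helper for illustration; not part of the SDK.
    """
    # Assumption: the base class keeps its name; only implementations change.
    if runner_name == 'PipelineRunner':
        return runner_name
    if runner_name.endswith('PipelineRunner'):
        return runner_name[:-len('PipelineRunner')] + 'Runner'
    return runner_name


print(strip_pipeline('DirectPipelineRunner'))
```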



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1183) oauth2 client logger warning

2016-12-19 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-1183:
-

 Summary: oauth2 client logger warning
 Key: BEAM-1183
 URL: https://issues.apache.org/jira/browse/BEAM-1183
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


importing apache_beam results in the following warning:

No handlers could be found for logger "oauth2client.contrib.multistore_file"

This is coming from the oauth2client and could be reproduced with the following:

import oauth2client.contrib.multistore_file # precompiled from 
/Users/emin/anaconda/lib/python2.7/site-packages/oauth2client/contrib/multistore_file.pyc
No handlers could be found for logger "oauth2client.contrib.multistore_file"

Upgrading the oauth2client (once all the dependencies allow that) would solve 
this problem.
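
Until the upgrade is possible, one possible workaround (an assumption, not something this issue proposes) is to attach a no-op handler to the named logger before the import happens. A minimal sketch using only the standard logging module:

```python
import logging

# Logger name copied from the warning text above.
LOGGER_NAME = 'oauth2client.contrib.multistore_file'

# A NullHandler swallows records, so the logging module no longer reports
# that no handlers could be found for this logger.
logging.getLogger(LOGGER_NAME).addHandler(logging.NullHandler())
```

This only hides the message; it does not change what oauth2client logs.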

User reported issue: 
https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/34





[jira] [Closed] (BEAM-1125) Rename PTransform.apply to PTransform.expand

2016-12-15 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay closed BEAM-1125.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Rename PTransform.apply to PTransform.expand
> 
>
> Key: BEAM-1125
> URL: https://issues.apache.org/jira/browse/BEAM-1125
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>  Labels: sdk-consistency
> Fix For: Not applicable
>
>
> For context see:
> [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
> [PR #1538] https://github.com/apache/incubator-beam/pull/1538
> https://lists.apache.org/thread.html/b4d9bcfbfeaa5dbcd5b68fd2344cdffe45587ff88cb714638504e759@%3Cdev.beam.apache.org%3E
> This requires renaming the apply method, updating all custom PTransforms, and 
> runners where transform.apply is called. (Based on the Java PR, this could be 
> easily done with a refactoring tool.)





[jira] [Created] (BEAM-1147) BigQuery tests do not run with DirectPipelineRunner

2016-12-13 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-1147:
-

 Summary: BigQuery tests do not run with DirectPipelineRunner
 Key: BEAM-1147
 URL: https://issues.apache.org/jira/browse/BEAM-1147
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Frances Perry
Priority: Minor


BQ uses a NativeSink, and these are not meant to be pickled. 
DirectPipelineRunner nevertheless tests pickling/unpickling, which causes failures.

It would be best if we could add an authenticated DirectPipelineRunner test to 
catch this in the future.
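
A rough sketch of the failure mode, assuming a hypothetical FakeNativeSink class that refuses to be pickled (this is not the real BigQuery sink or the runner's actual check):

```python
import pickle


class FakeNativeSink(object):
    """Hypothetical stand-in for a sink that is not meant to be pickled."""

    def __reduce__(self):
        raise TypeError('FakeNativeSink is not meant to be pickled')


def survives_pickling(obj):
    """Return True if obj can be pickled and unpickled, False otherwise."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False


print(survives_pickling({'table': 'example'}))
print(survives_pickling(FakeNativeSink()))
```

A runner that round-trips every transform through pickle would fail on the second object, which is the behavior described above.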





[jira] [Assigned] (BEAM-1147) BigQuery tests do not run with DirectPipelineRunner

2016-12-13 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-1147:
-

Assignee: Ahmet Altay  (was: Frances Perry)

> BigQuery tests do not run with DirectPipelineRunner
> ---
>
> Key: BEAM-1147
> URL: https://issues.apache.org/jira/browse/BEAM-1147
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Minor
>
> BQ uses a NativeSink, and these are not meant to be pickled. 
> DirectPipelineRunner nevertheless tests pickling/unpickling, which causes failures.
> It would be best if we could add an authenticated DirectPipelineRunner test to 
> catch this in the future.





[jira] [Updated] (BEAM-1125) Rename PTransform.apply to PTransform.expand

2016-12-09 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-1125:
--
Summary: Rename PTransform.apply to PTransform.expand  (was: Rename one of 
PTransform.apply to PTransform.expand())

> Rename PTransform.apply to PTransform.expand
> 
>
> Key: BEAM-1125
> URL: https://issues.apache.org/jira/browse/BEAM-1125
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>  Labels: sdk-consistency
>
> For context see:
> [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
> [PR #1538] https://github.com/apache/incubator-beam/pull/1538
> https://lists.apache.org/thread.html/b4d9bcfbfeaa5dbcd5b68fd2344cdffe45587ff88cb714638504e759@%3Cdev.beam.apache.org%3E
> This requires renaming the apply method, updating all custom PTransforms, and 
> runners where transform.apply is called. (Based on the Java PR, this could be 
> easily done with a refactoring tool.)





[jira] [Created] (BEAM-1125) Rename one of PTransform.apply to PTransform.expand()

2016-12-09 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-1125:
-

 Summary: Rename one of PTransform.apply to PTransform.expand()
 Key: BEAM-1125
 URL: https://issues.apache.org/jira/browse/BEAM-1125
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Frances Perry


For context see:
[BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
[PR #1538] https://github.com/apache/incubator-beam/pull/1538

https://lists.apache.org/thread.html/b4d9bcfbfeaa5dbcd5b68fd2344cdffe45587ff88cb714638504e759@%3Cdev.beam.apache.org%3E

This requires renaming the apply method, updating all custom PTransforms, and 
runners where transform.apply is called. (Based on the Java PR, this could be 
easily done with a refactoring tool.)
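
A small sketch of what the rename touches, assuming hypothetical stand-ins (PTransformSketch, Double and run_transform are illustrations, not the real apache_beam classes):

```python
class PTransformSketch(object):
    """Hypothetical base class; the method formerly named apply is now expand."""

    def expand(self, pcoll):
        raise NotImplementedError


class Double(PTransformSketch):
    """A custom transform: its override must be renamed as well."""

    def expand(self, pcoll):
        return [x * 2 for x in pcoll]


def run_transform(transform, pcoll):
    # A runner call site: previously transform.apply(pcoll).
    return transform.expand(pcoll)


print(run_transform(Double(), [1, 2, 3]))
```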





[jira] [Assigned] (BEAM-1125) Rename one of PTransform.apply to PTransform.expand()

2016-12-09 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-1125:
-

Assignee: Ahmet Altay  (was: Frances Perry)

> Rename one of PTransform.apply to PTransform.expand()
> -
>
> Key: BEAM-1125
> URL: https://issues.apache.org/jira/browse/BEAM-1125
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>  Labels: sdk-consistency
>
> For context see:
> [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
> [PR #1538] https://github.com/apache/incubator-beam/pull/1538
> https://lists.apache.org/thread.html/b4d9bcfbfeaa5dbcd5b68fd2344cdffe45587ff88cb714638504e759@%3Cdev.beam.apache.org%3E
> This requires renaming the apply method, updating all custom PTransforms, and 
> runners where transform.apply is called. (Based on the Java PR, this could be 
> easily done with a refactoring tool.)





[jira] [Updated] (BEAM-886) Support new DoFn in Python SDK

2016-12-05 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-886:
-
Assignee: Sourabh Bajaj  (was: Ahmet Altay)

> Support new DoFn in Python SDK
> --
>
> Key: BEAM-886
> URL: https://issues.apache.org/jira/browse/BEAM-886
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Sourabh Bajaj
>  Labels: backward-incompatible, sdk-consistency
>
> Figure out what is needed for supporting new DoFns, add support, and remove 
> the old DoFns.
> Related Docs from Java:
> Original Proposal email:
> https://lists.apache.org/thread.html/2abf32d528dbb64b79853552c5d10c217e2194f0685af21aeb4635dd@%3Cdev.beam.apache.org%3E
> Presentation & Doc (with short Python sections):
> https://s.apache.org/presenting-a-new-dofn
> https://s.apache.org/a-new-dofn





[jira] [Created] (BEAM-1088) submit_job_description needs job argument (Fails post commit)

2016-12-05 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-1088:
-

 Summary: submit_job_description needs job argument (Fails post 
commit)
 Key: BEAM-1088
 URL: https://issues.apache.org/jira/browse/BEAM-1088
 Project: Beam
  Issue Type: Bug
Reporter: Ahmet Altay
Assignee: Ahmet Altay


https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/822/consoleFull

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/examples/wordcount.py",
 line 106, in 
run()
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/examples/wordcount.py",
 line 97, in run
result = p.run()
  File "apache_beam/pipeline.py", line 159, in run
return self.runner.run(self)
  File "apache_beam/runners/dataflow_runner.py", line 179, in run
self.dataflow_client.create_job(self.job))
  File "apache_beam/utils/retry.py", line 167, in wrapper
return fun(*args, **kwargs)
  File "apache_beam/internal/apiclient.py", line 415, in create_job
return self.submit_job_description()
  File "apache_beam/internal/apiclient.py", line 433, in submit_job_description
request.job = job.proto
NameError: global name 'job' is not defined
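
The traceback suggests that submit_job_description reads a job variable that it never receives. A minimal sketch of the shape of the fix named in the summary, using hypothetical stand-in classes (ApiClientSketch, RequestSketch and FakeJob are not the real apache_beam.internal.apiclient code):

```python
class RequestSketch(object):
    """Hypothetical request object with a job field."""
    job = None


class FakeJob(object):
    """Hypothetical job wrapper exposing a proto attribute."""
    proto = {'name': 'wordcount'}


class ApiClientSketch(object):
    def create_job(self, job):
        # Pass the job along instead of calling submit_job_description()
        # with no arguments.
        return self.submit_job_description(job)

    def submit_job_description(self, job):
        request = RequestSketch()
        request.job = job.proto  # job is now a parameter, so no NameError
        return request


print(ApiClientSketch().create_job(FakeJob()).job)
```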





[jira] [Updated] (BEAM-1081) annotations should support custom messages and classes

2016-12-02 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-1081:
--
Assignee: (was: Frances Perry)

> annotations should support custom messages and classes
> --
>
> Key: BEAM-1081
> URL: https://issues.apache.org/jira/browse/BEAM-1081
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starter
>
> Update 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py
>  to add 2 new features:
> 1. ability to customize message
> 2. ability to tag classes (not only functions)





[jira] [Created] (BEAM-1081) annotations should support custom messages and classes

2016-12-02 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-1081:
-

 Summary: annotations should support custom messages and classes
 Key: BEAM-1081
 URL: https://issues.apache.org/jira/browse/BEAM-1081
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Frances Perry
Priority: Minor


Update 
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py
 to add 2 new features:

1. ability to customize message
2. ability to tag classes (not only functions)
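
One way the two features could look, as a sketch only. The decorator name deprecated and its message parameter are assumptions made for illustration and are not taken from the current annotations.py:

```python
import functools
import inspect
import warnings


def deprecated(message=None):
    """Hypothetical annotation that accepts a custom message and tags
    either a function or a class."""

    def decorator(obj):
        text = message or '%s is deprecated.' % obj.__name__

        if inspect.isclass(obj):
            # Feature 2: for classes, warn whenever an instance is created.
            original_init = obj.__init__

            def new_init(self, *args, **kwargs):
                warnings.warn(text, DeprecationWarning, stacklevel=2)
                original_init(self, *args, **kwargs)

            obj.__init__ = new_init
            return obj

        @functools.wraps(obj)
        def wrapper(*args, **kwargs):
            # Feature 1: the warning text can be customized per use.
            warnings.warn(text, DeprecationWarning, stacklevel=2)
            return obj(*args, **kwargs)

        return wrapper

    return decorator
```

Usage would be @deprecated(message='use new_fn instead') on a function, or @deprecated() on a class.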





[jira] [Closed] (BEAM-1044) tests run before install fails

2016-12-02 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay closed BEAM-1044.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> tests run before install fails
> --
>
> Key: BEAM-1044
> URL: https://issues.apache.org/jira/browse/BEAM-1044
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Vikas Kedigehalli
> Fix For: Not applicable
>
>
> `python setup.py test` fails for datastore tests when run in a new virtual 
> environment. Running `python setup.py install` fixes the problem but that 
> should not be necessary. Stack for one of the failing tests:
> ==
> ERROR: Failure: ImportError (cannot import name descriptor)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/altay/Desktop/beam/temp/incubator-beam/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/loader.py",
>  line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "/usr/local/google/home/altay/Desktop/beam/temp/incubator-beam/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/importer.py",
>  line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "/usr/local/google/home/altay/Desktop/beam/temp/incubator-beam/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/importer.py",
>  line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File 
> "/usr/local/google/home/altay/Desktop/beam/temp/incubator-beam/sdks/python/apache_beam/io/datastore/v1/query_splitter_test.py",
>  line 25, in 
> from apache_beam.io.datastore.v1 import fake_datastore
>   File 
> "/usr/local/google/home/altay/Desktop/beam/temp/incubator-beam/sdks/python/apache_beam/io/datastore/v1/fake_datastore.py",
>  line 21, in 
> from google.datastore.v1 import datastore_pb2
>   File "build/bdist.linux-x86_64/egg/google/datastore/v1/datastore_pb2.py", 
> line 6, in 
> ImportError: cannot import name descriptor





[jira] [Created] (BEAM-1046) Travis (Linux) failing for python sdk

2016-11-23 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-1046:
-

 Summary: Travis (Linux) failing for python sdk
 Key: BEAM-1046
 URL: https://issues.apache.org/jira/browse/BEAM-1046
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay


All PRs are failing with the same error:

An example: https://travis-ci.org/apache/incubator-beam/builds/178435675

$ if [ "$TEST_PYTHON" ] && ! pip list | grep tox; then travis_retry pip install 
tox --user `whoami`; fi
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting tox
  Downloading tox-2.5.0-py2.py3-none-any.whl (42kB)
100% || 45kB 6.3MB/s 
Collecting travis
  Could not find any downloads that satisfy the requirement travis
  No distributions at all found for travis

The command "pip install tox --user travis" failed. Retrying, 2 of 3.

You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting tox
  Using cached tox-2.5.0-py2.py3-none-any.whl
Collecting travis
  Could not find any downloads that satisfy the requirement travis
  No distributions at all found for travis

The command "pip install tox --user travis" failed. Retrying, 3 of 3.

You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting tox
  Using cached tox-2.5.0-py2.py3-none-any.whl
Collecting travis
  Could not find any downloads that satisfy the requirement travis
  No distributions at all found for travis

The command "pip install tox --user travis" failed 3 times.





[jira] [Created] (BEAM-1044) tests run before install fails

2016-11-23 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-1044:
-

 Summary: tests run before install fails
 Key: BEAM-1044
 URL: https://issues.apache.org/jira/browse/BEAM-1044
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Vikas Kedigehalli


`python setup.py test` fails for datastore tests when run in a new virtual 
environment. Running `python setup.py install` fixes the problem but that 
should not be necessary. Stack for one of the failing tests:

==
ERROR: Failure: ImportError (cannot import name descriptor)
--
Traceback (most recent call last):
  File 
"/usr/local/google/home/altay/Desktop/beam/temp/incubator-beam/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/loader.py",
 line 418, in loadTestsFromName
addr.filename, addr.module)
  File 
"/usr/local/google/home/altay/Desktop/beam/temp/incubator-beam/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/importer.py",
 line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
  File 
"/usr/local/google/home/altay/Desktop/beam/temp/incubator-beam/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/importer.py",
 line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
  File 
"/usr/local/google/home/altay/Desktop/beam/temp/incubator-beam/sdks/python/apache_beam/io/datastore/v1/query_splitter_test.py",
 line 25, in 
from apache_beam.io.datastore.v1 import fake_datastore
  File 
"/usr/local/google/home/altay/Desktop/beam/temp/incubator-beam/sdks/python/apache_beam/io/datastore/v1/fake_datastore.py",
 line 21, in 
from google.datastore.v1 import datastore_pb2
  File "build/bdist.linux-x86_64/egg/google/datastore/v1/datastore_pb2.py", 
line 6, in 
ImportError: cannot import name descriptor







[jira] [Closed] (BEAM-731) Replace DirectRunner with InProcessRunner

2016-11-11 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay closed BEAM-731.

   Resolution: Fixed
Fix Version/s: Not applicable

> Replace DirectRunner with InProcessRunner
> -
>
> Key: BEAM-731
> URL: https://issues.apache.org/jira/browse/BEAM-731
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>  Labels: sdk-consistency
> Fix For: Not applicable
>
>
> Remove the old DirectRunner and replace with the new InProcessRunner.
> There is an overhead for keeping both runners (testing/code maintenance etc.) 
> InProcessRunner has been available for a while, and it is tested enough to 
> be the default runner. 





[jira] [Created] (BEAM-886) Support new DoFn in Python SDK

2016-11-02 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-886:


 Summary: Support new DoFn in Python SDK
 Key: BEAM-886
 URL: https://issues.apache.org/jira/browse/BEAM-886
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay


Figure out what is needed for supporting new DoFns, add support, and remove the 
old DoFns.


Related Docs from Java:

Original Proposal email:
https://lists.apache.org/thread.html/2abf32d528dbb64b79853552c5d10c217e2194f0685af21aeb4635dd@%3Cdev.beam.apache.org%3E

Presentation & Doc (with short Python sections):
https://s.apache.org/presenting-a-new-dofn
https://s.apache.org/a-new-dofn





[jira] [Commented] (BEAM-525) Verify that ParDo with multiple outputs with tags un declared in with_outputs() work

2016-10-26 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610266#comment-15610266
 ] 

Ahmet Altay commented on BEAM-525:
--

Looks like the checks in that TODO are already part of the test. Yes, let's 
close this issue but first remove that TODO from the code.

> Verify that ParDo with multiple outputs with tags un declared in 
> with_outputs() work 
> -
>
> Key: BEAM-525
> URL: https://issues.apache.org/jira/browse/BEAM-525
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Minor
>
> test_undeclared_side_outputs was failing (when last checked) under certain 
> conditions:
> See this TODO:
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/dataflow_test.py#L202
> This is probably not failing any more but it needs to be verified.





[jira] [Created] (BEAM-832) Python 3 support

2016-10-25 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-832:


 Summary: Python 3 support
 Key: BEAM-832
 URL: https://issues.apache.org/jira/browse/BEAM-832
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay
Priority: Minor


Investigate and add Python 3 support for the Python SDK. The resulting SDK needs 
to keep the same support for existing Python 2.7 users.





[jira] [Commented] (BEAM-693) pydoc is not working

2016-10-25 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606307#comment-15606307
 ] 

Ahmet Altay commented on BEAM-693:
--

Thank you for looking at this. We can defer the other options for documentation 
to the https://issues.apache.org/jira/browse/BEAM-817 . Let's fix the 
instruction for pydoc in the readme. 

According to [1], develop has the advantage if you are making frequent code 
changes. Most of our users would not change the code, so install is the better 
option. (This also helps because we recommend install in other parts of the 
readme.)

Readme should be updated to say something along these lines:

"""
Make sure you installed the package first, if not run 'python setup.py install' 
then run pydoc with 'pydoc -p '
...
"""

Marco, would you like to make that change?

[1] 
http://stackoverflow.com/questions/19048732/python-setup-py-develop-vs-install

> pydoc is not working
> 
>
> Key: BEAM-693
> URL: https://issues.apache.org/jira/browse/BEAM-693
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Minor
>
> Repro:
> Start the pydoc server (pydoc -p ) and navigate to the apache_beam root:
> http://localhost:/apache_beam.html
> Following errors are shown instead of the actual documentation:
> problem in apache_beam - : No module named avro
> problem in apache_beam - : cannot import name 
> coders





[jira] [Closed] (BEAM-753) Pin versions of all dependencies

2016-10-25 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay closed BEAM-753.

Resolution: Fixed

> Pin versions of all dependencies
> 
>
> Key: BEAM-753
> URL: https://issues.apache.org/jira/browse/BEAM-753
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
> Fix For: Not applicable
>
>
> ERROR: Failure: ImportError (cannot import name locked_file)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/loader.py",
>  line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/importer.py",
>  line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/importer.py",
>  line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/__init__.py",
>  line 78, in 
> from apache_beam import io
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/__init__.py",
>  line 21, in 
> from apache_beam.io.avroio import *
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/avroio.py",
>  line 29, in 
> from apache_beam.io import filebasedsource
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/filebasedsource.py",
>  line 31, in 
> from apache_beam.io import concat_source
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/concat_source.py",
>  line 24, in 
> from apache_beam.io import iobase
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/iobase.py",
>  line 818, in 
> from apache_beam.runners.dataflow.native_io.iobase import *
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/runners/__init__.py",
>  line 23, in 
> from apache_beam.runners.dataflow_runner import DataflowPipelineRunner
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/runners/dataflow_runner.py",
>  line 43, in 
> from apache_beam.internal.clients import dataflow as dataflow_api
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/internal/clients/dataflow/__init__.py",
>  line 23, in 
> from apitools.base.py import *
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/.tox/py27/local/lib/python2.7/site-packages/apitools/base/py/__init__.py",
>  line 22, in 
> from apitools.base.py.credentials_lib import *
>   File 
> "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/.tox/py27/local/lib/python2.7/site-packages/apitools/base/py/credentials_lib.py",
>  line 50, in 
> from oauth2client import locked_file
> ImportError: cannot import name locked_file





[jira] [Updated] (BEAM-802) Support Dynamic PipelineOptions for python

2016-10-25 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-802:
-
Assignee: María GH

> Support Dynamic PipelineOptions for python
> --
>
> Key: BEAM-802
> URL: https://issues.apache.org/jira/browse/BEAM-802
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: María GH
>Assignee: María GH
>Priority: Minor
>   Original Estimate: 1,680h
>  Remaining Estimate: 1,680h
>
> Goal:  Enable users to run pipelines from templates filled via CL (pipeline 
> options)
> Background: Currently, the Runner creates the JSON pipeline description which 
> can be sent to the worker as is, since everything is already defined there 
> (with links to gs:// for input and binaries). With the parametrized approach, 
> those descriptions are empty and filled by the user or defaulted, so the 
> pipeline needs to be stored somewhere first until the values become available.
> Tasks:
> 1- Create template-style pipeline description (TemplateRunner)
> The graph description is now a template (some parts are not filled) that 
> needs to be saved.
> 2- Define values to inject to the template (ValueProviders API)
> The placeholders can be filled with default values (static) or with dynamic 
> key/value pairs provided at runtime (dynamic)
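
A rough sketch of the static versus dynamic distinction in task 2. The class names StaticValue and RuntimeValue, and the bind method, are hypothetical and only illustrate the idea; they are not a proposed ValueProviders API:

```python
class StaticValue(object):
    """Hypothetical provider whose value is known when the template is built."""

    def __init__(self, value):
        self._value = value

    def is_accessible(self):
        return True

    def get(self):
        return self._value


class RuntimeValue(object):
    """Hypothetical provider that stays a placeholder until runtime
    key/value pairs are supplied."""

    def __init__(self, key, default=None):
        self.key = key
        self.default = default
        self._runtime_options = None

    def bind(self, runtime_options):
        # Called once the user-provided key/value pairs are available.
        self._runtime_options = dict(runtime_options)

    def is_accessible(self):
        return self._runtime_options is not None

    def get(self):
        if self._runtime_options is None:
            raise RuntimeError('%s is not available yet' % self.key)
        return self._runtime_options.get(self.key, self.default)


placeholder = RuntimeValue('input', default='gs://default/input')
placeholder.bind({'input': 'gs://bucket/file'})
print(placeholder.get())
```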



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-817) Fill in the learn/sdks/python portion of the website

2016-10-25 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-817:


 Summary: Fill in the learn/sdks/python portion of the website
 Key: BEAM-817
 URL: https://issues.apache.org/jira/browse/BEAM-817
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Hadar Hod


Should be a landing page for the Python SDK similar to learn/sdks/java.





[jira] [Commented] (BEAM-693) pydoc is not working

2016-10-24 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602948#comment-15602948
 ] 

Ahmet Altay commented on BEAM-693:
--

Hi Marco, I do not know the answer to your question. Would you be interested in 
investigating this?

If this is really a matter of running setup.py in the way you mentioned before 
running pydoc then it should be noted in the README. Another question I have 
is, what is the standard for other python projects using pydoc?

Finally we need to have a test, maybe a post commit shell script that verifies 
that pydoc continues to work. (This could be a follow up issue and does not 
need to be part of this.)

> pydoc is not working
> 
>
> Key: BEAM-693
> URL: https://issues.apache.org/jira/browse/BEAM-693
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Minor
>
> Repro:
> Start the pydoc server (pydoc -p ) and navigate to the apache_beam root:
> http://localhost:/apache_beam.html
> Following errors are shown instead of the actual documentation:
> problem in apache_beam - : No module named avro
> problem in apache_beam - : cannot import name 
> coders





[jira] [Created] (BEAM-789) Review python sdk dependencies

2016-10-20 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-789:


 Summary: Review python sdk dependencies
 Key: BEAM-789
 URL: https://issues.apache.org/jira/browse/BEAM-789
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


Review the existing dependencies for the python sdk. Are they still all 
required? (e.g. protorpc might not be a required dependency any more.)





[jira] [Updated] (BEAM-731) Replace DirectRunner with InProcessRunner

2016-10-20 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-731:
-
Labels: sdk-consistency  (was: )

> Replace DirectRunner with InProcessRunner
> -
>
> Key: BEAM-731
> URL: https://issues.apache.org/jira/browse/BEAM-731
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>  Labels: sdk-consistency
>
> Remove the old DirectRunner and replace with the new InProcessRunner.
> There is an overhead for keeping both runners (testing/code maintenance etc.) 
> InProcessRunner has been available for a while, and it is tested enough to 
> be the default runner. 





[jira] [Created] (BEAM-780) Add support for pipeline metrics

2016-10-19 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-780:


 Summary: Add support for pipeline metrics
 Key: BEAM-780
 URL: https://issues.apache.org/jira/browse/BEAM-780
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Frances Perry


Remove aggregators and replace them with the metrics API.

See: https://issues.apache.org/jira/browse/BEAM-147 for the Java SDK.





[jira] [Created] (BEAM-782) Resolve runners in a case-insensitive manner.

2016-10-19 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-782:


 Summary:  Resolve runners in a case-insensitive manner.
 Key: BEAM-782
 URL: https://issues.apache.org/jira/browse/BEAM-782
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Frances Perry


See:
https://github.com/apache/incubator-beam/pull/1087
https://issues.apache.org/jira/browse/BEAM-770

e.g. the DirectRunner can be specified with (among others) any of
"--runner=direct", "--runner=directrunner", "--runner=DirectRunner",
"--runner=Direct", or "--runner=directRunner"

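The accepted spellings listed above could be handled along these lines. This is a sketch only: the KNOWN_RUNNERS tuple and the resolve_runner name are assumptions for illustration, not the SDK's lookup code:

```python
# Illustrative registry; the real set of runner names is not given here.
KNOWN_RUNNERS = ('DirectRunner', 'DataflowRunner')


def resolve_runner(name):
    """Map a user-supplied --runner value to a canonical runner name,
    ignoring case and an optional 'runner' suffix."""
    wanted = name.lower()
    for runner in KNOWN_RUNNERS:
        full = runner.lower()
        short = full[:-len('runner')]
        if wanted in (full, short):
            return runner
    raise ValueError('Unknown runner: %r' % name)


print(resolve_runner('directRunner'))
```
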




[jira] [Resolved] (BEAM-517) Check versions of pip and cython

2016-10-18 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-517.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Check versions of pip and cython
> 
>
> Key: BEAM-517
> URL: https://issues.apache.org/jira/browse/BEAM-517
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starter
> Fix For: Not applicable
>
>
> The Python SDK depends on pip and Cython; however, it does not check the 
> versions of these.
> Some of the pip flags do not exist in older versions:
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/28#issuecomment-236382953
> (Note: Even though the above issue was reported by a user in a different 
> repo, it is related to the Apache Beam SDK.)
> Similarly with Cython: the SDK supports running with or without Cython, and 
> for that reason Cython is not listed as a requirement in the setup.py file. 
> However, with an old version of Cython the SDK might fail.
> To avoid the above problem, the SDK should check the versions of these 
> packages and show a warning to upgrade.





[jira] [Updated] (BEAM-562) DoFn Reuse: Add new methods to DoFn

2016-10-18 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-562:
-
Labels: sdk-consistency  (was: )

> DoFn Reuse: Add new methods to DoFn
> ---
>
> Key: BEAM-562
> URL: https://issues.apache.org/jira/browse/BEAM-562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>  Labels: sdk-consistency
>
> The Java SDK added setup and teardown methods to DoFns. This makes DoFns 
> reusable and provides performance improvements. The Python SDK should add 
> support for these new DoFn methods:
> Proposal doc: 
> https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f#
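To illustrate why setup and teardown hooks make a DoFn reusable, here is a small Python 3 sketch. `DoFnSketch`, `PrefixFn`, and `run_bundles` are hypothetical stand-ins invented for this example; they are not Beam classes, and the real method names and call order are defined by the proposal doc, not by this code.

```python
class DoFnSketch(object):
    """Stand-in for a DoFn with lifecycle hooks (not the Beam class)."""

    def setup(self):
        pass

    def process(self, element):
        raise NotImplementedError

    def teardown(self):
        pass


class PrefixFn(DoFnSketch):
    def __init__(self):
        self.setup_calls = 0
        self.prefix = None

    def setup(self):
        # Stands in for expensive one-time initialization (e.g. opening a client).
        self.setup_calls += 1
        self.prefix = "item-"

    def process(self, element):
        yield self.prefix + str(element)

    def teardown(self):
        self.prefix = None


def run_bundles(fn, bundles):
    """Call setup once, reuse the same instance for every bundle, then tear down."""
    fn.setup()
    try:
        outputs = []
        for bundle in bundles:
            for element in bundle:
                outputs.extend(fn.process(element))
        return outputs
    finally:
        fn.teardown()
```

The point of the sketch is that the one-time cost in `setup` is paid once per instance rather than once per bundle, which is where the performance gain mentioned in the issue would come from.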





[jira] [Updated] (BEAM-681) DoFns should be serialized at apply time and deserialized when executing

2016-10-18 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-681:
-
Labels: sdk-consistency  (was: )

> DoFns should be serialized at apply time and deserialized when executing
> 
>
> Key: BEAM-681
> URL: https://issues.apache.org/jira/browse/BEAM-681
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Ben Chambers
>Assignee: Frances Perry
>  Labels: sdk-consistency
>
> 1. Serializing DoFns at application time ensures that any modifications of 
> fields within the DoFn after application do not accidentally pollute the 
> execution. This mirrors the approach taken in Java to provide an 
> approximation of lexical closure (e.g., you only need to know the state of 
> the DoFn at the time it was applied, not afterwards, to understand its 
> behavior).
> 2. Based on 1, the DirectRunner should also deserialize DoFns before 
> running them, which should also detect other classes of errors, such as 
> using the pipeline object (which is not pickleable) within the DoFn.
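The effect of serializing at apply time can be sketched in Python 3 with the standard `pickle` module. `AddOffset` and `AppliedStep` are hypothetical classes made up for this illustration; the SDK's own serialization mechanism may differ.

```python
import pickle


class AddOffset(object):
    """A toy function object whose behavior depends on a mutable field."""

    def __init__(self, offset):
        self.offset = offset

    def process(self, element):
        return element + self.offset


class AppliedStep(object):
    """Holds a pickled snapshot of the function object taken at apply time."""

    def __init__(self, fn):
        # Fails here, at apply time, if fn holds something that cannot be pickled.
        self._payload = pickle.dumps(fn)

    def run(self, elements):
        # Execution always starts from the snapshot, never from the live object.
        fn = pickle.loads(self._payload)
        return [fn.process(element) for element in elements]


fn = AddOffset(1)
step = AppliedStep(fn)
fn.offset = 100  # mutation after apply; the snapshot is unaffected
```

Because `run` deserializes the snapshot, the later assignment to `fn.offset` does not leak into execution, and an unpicklable field would surface as an error when the step is constructed instead of at run time.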





[jira] [Updated] (BEAM-759) PipelineResult needs waitToFinish() and cancel()

2016-10-18 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-759:
-
Labels: sdk-consistency  (was: )

> PipelineResult needs waitToFinish() and cancel()
> 
>
> Key: BEAM-759
> URL: https://issues.apache.org/jira/browse/BEAM-759
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: sdk-consistency
>
> The Java SDK added waitToFinish() and cancel() to PipelineResult and, as a 
> result, was able to remove BlockingDataflowRunner.
> (See: https://issues.apache.org/jira/browse/BEAM-443)
> The same changes need to happen in the Python SDK.
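One way to picture a result object that supports both blocking and cancellation is the Python 3 sketch below. `PipelineResultSketch`, its `wait_to_finish` and `cancel` methods, and the state strings are all hypothetical; the issue only names the Java methods, so the Python spelling here is an assumption.

```python
import threading


class PipelineResultSketch(object):
    """Toy result handle: runs `work` in a thread and exposes wait/cancel."""

    def __init__(self, work):
        self.state = "RUNNING"
        self._cancelled = threading.Event()
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, args=(work,))
        self._thread.daemon = True
        self._thread.start()

    def _run(self, work):
        # `work` receives the cancel event and is expected to return once it is set.
        work(self._cancelled)
        with self._lock:
            if self.state == "RUNNING":
                self.state = "DONE"

    def wait_to_finish(self, timeout=None):
        """Block until the work ends or the timeout expires; return the state."""
        self._thread.join(timeout)
        return self.state

    def cancel(self):
        """Request cancellation; has no effect on an already finished run."""
        with self._lock:
            if self.state == "RUNNING":
                self.state = "CANCELLED"
        self._cancelled.set()
        return self.state
```

With a handle like this, a caller that wants blocking behavior simply calls `wait_to_finish()` after starting the pipeline, which is why a separate blocking runner becomes unnecessary.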





[jira] [Updated] (BEAM-753) Pin versions of all dependencies

2016-10-18 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-753:
-
Labels:   (was: sdk-consistency)

> Pin versions of all dependencies
> 
>
> Key: BEAM-753
> URL: https://issues.apache.org/jira/browse/BEAM-753
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
> Fix For: Not applicable
>
>
> ERROR: Failure: ImportError (cannot import name locked_file)





[jira] [Updated] (BEAM-753) Pin versions of all dependencies

2016-10-18 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-753:
-
Labels: sdk-consistency  (was: )

> Pin versions of all dependencies
> 
>
> Key: BEAM-753
> URL: https://issues.apache.org/jira/browse/BEAM-753
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>  Labels: sdk-consistency
> Fix For: Not applicable
>
>
> ERROR: Failure: ImportError (cannot import name locked_file)





[jira] [Created] (BEAM-759) PipelineResult needs waitToFinish() and cancel()

2016-10-17 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-759:


 Summary: PipelineResult needs waitToFinish() and cancel()
 Key: BEAM-759
 URL: https://issues.apache.org/jira/browse/BEAM-759
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


The Java SDK added waitToFinish() and cancel() to PipelineResult and, as a 
result, was able to remove BlockingDataflowRunner.
(See: https://issues.apache.org/jira/browse/BEAM-443)

The same changes need to happen in the Python SDK.





[jira] [Updated] (BEAM-753) Pin versions of all dependencies

2016-10-17 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-753:
-
Summary: Pin versions of all dependencies  (was: Travis failure (cannot 
import name locked_file))

> Pin versions of all dependencies
> 
>
> Key: BEAM-753
> URL: https://issues.apache.org/jira/browse/BEAM-753
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
> Fix For: Not applicable
>
>
> ERROR: Failure: ImportError (cannot import name locked_file)





[jira] [Commented] (BEAM-753) Travis failure (cannot import name locked_file)

2016-10-17 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582670#comment-15582670
 ] 

Ahmet Altay commented on BEAM-753:
--

This bug shows that the Python SDK is exposed to future breakage because it 
depends on the latest versions of its dependencies. This should be fixed by 
pinning the versions of all dependencies.

> Travis failure (cannot import name locked_file)
> ---
>
> Key: BEAM-753
> URL: https://issues.apache.org/jira/browse/BEAM-753
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
> Fix For: Not applicable
>
>
> ERROR: Failure: ImportError (cannot import name locked_file)





[jira] [Commented] (BEAM-753) Travis failure (cannot import name locked_file)

2016-10-14 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576178#comment-15576178
 ] 

Ahmet Altay commented on BEAM-753:
--

A new oauth2client release was published to PyPI this morning, and it broke 
SDK installation and tests. 

The Python SDK setup.py lists this requirement:
oauth2client>=2.0.1

There should be two updates:
1. Short term, change setup.py to fix the break and work with an older version 
of oauth2client; the previous version, 3.0.0, was working fine.
2. Understand the difference and update the code to work with oauth2client 4.0.0.
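The short-term fix in step 1 amounts to adding an upper bound to the requirement. The Python 3 sketch below shows one possible bounded specifier and a small check of what it admits. The `<4.0.0` bound is an assumption drawn from the versions mentioned in this comment, and `version_tuple` and `satisfies` are hypothetical helpers, not part of setup.py or setuptools.

```python
# Hypothetical short-term pin: keep the existing lower bound and exclude the
# 4.x release that broke the build. This string is an assumption for the
# sketch, not the actual change made in setup.py.
REQUIRED_PACKAGES = ["oauth2client>=2.0.1,<4.0.0"]


def version_tuple(text):
    """Convert a dotted numeric version such as '3.0.0' into (3, 0, 0)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower, upper):
    """Return True if lower <= version < upper for dotted numeric versions."""
    return version_tuple(lower) <= version_tuple(version) < version_tuple(upper)
```

Under this bound, 3.0.0 is still accepted while 4.0.0 is rejected, so a fresh install would no longer pick up the breaking release until the code is updated as described in step 2.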

> Travis failure (cannot import name locked_file)
> ---
>
> Key: BEAM-753
> URL: https://issues.apache.org/jira/browse/BEAM-753
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>
> ERROR: Failure: ImportError (cannot import name locked_file)





[jira] [Commented] (BEAM-520) Update Python SDK example tests to use assert_that

2016-10-14 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576148#comment-15576148
 ] 

Ahmet Altay commented on BEAM-520:
--

Thank you for noticing this. I can reproduce the error at head. It is not 
related to your change. I created BEAM-753 
(https://issues.apache.org/jira/browse/BEAM-753) for this and am looking into it.

> Update Python SDK example tests to use assert_that
> --
>
> Key: BEAM-520
> URL: https://issues.apache.org/jira/browse/BEAM-520
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starter
>
> Most of our examples use assert_that to test examples:
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/complete/autocomplete_test.py#L38
> Some of our examples use this pattern:
> 1) Create a path(s)
> 2) Construct fake command line arguments using these paths
> 3) Construct an argparse object to parse these flags
> 4) Do the (often trivial) logic
> 5) Write to a file
> 6) Manually open and read the file
> 7) Compare results. 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo_test.py
> As well as being cumbersome, this obscures the core of what is being 
> illustrated and tested. As many tests as possible should be updated to use 
> assert_that.





[jira] [Created] (BEAM-753) Travis failure (cannot import name locked_file)

2016-10-14 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-753:


 Summary: Travis failure (cannot import name locked_file)
 Key: BEAM-753
 URL: https://issues.apache.org/jira/browse/BEAM-753
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay


ERROR: Failure: ImportError (cannot import name locked_file)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/nose-1.3.7-py2.7.egg/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/__init__.py", line 78, in <module>
    from apache_beam import io
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/__init__.py", line 21, in <module>
    from apache_beam.io.avroio import *
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/avroio.py", line 29, in <module>
    from apache_beam.io import filebasedsource
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/filebasedsource.py", line 31, in <module>
    from apache_beam.io import concat_source
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/concat_source.py", line 24, in <module>
    from apache_beam.io import iobase
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/io/iobase.py", line 818, in <module>
    from apache_beam.runners.dataflow.native_io.iobase import *
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/runners/__init__.py", line 23, in <module>
    from apache_beam.runners.dataflow_runner import DataflowPipelineRunner
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/runners/dataflow_runner.py", line 43, in <module>
    from apache_beam.internal.clients import dataflow as dataflow_api
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/apache_beam/internal/clients/dataflow/__init__.py", line 23, in <module>
    from apitools.base.py import *
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/.tox/py27/local/lib/python2.7/site-packages/apitools/base/py/__init__.py", line 22, in <module>
    from apitools.base.py.credentials_lib import *
  File "/usr/local/google/home/altay/Desktop/beam/test/incubator-beam/sdks/python/.tox/py27/local/lib/python2.7/site-packages/apitools/base/py/credentials_lib.py", line 50, in <module>
    from oauth2client import locked_file
ImportError: cannot import name locked_file






[jira] [Commented] (BEAM-520) Update Python SDK example tests to use assert_that

2016-10-13 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573113#comment-15573113
 ] 

Ahmet Altay commented on BEAM-520:
--

Ven, thank you for your interest.

A partial cleanup was already done in an earlier PR 
(https://github.com/apache/incubator-beam/pull/650); take a look at that. 
Similar changes need to happen in the remaining tests (e.g. 
multiple_output_pardo_test.py). Convert as many tests/examples as possible to 
the assert_that pattern and send a PR.

If you are not familiar with it, you can also look at the Beam contribution 
guide (http://beam.incubator.apache.org/contribute/contribution-guide/) for 
the general workflow of working with Beam.

> Update Python SDK example tests to use assert_that
> --
>
> Key: BEAM-520
> URL: https://issues.apache.org/jira/browse/BEAM-520
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starter
>
> Most of our examples use assert_that to test examples:
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/complete/autocomplete_test.py#L38
> Some of our examples use this pattern:
> 1) Create a path(s)
> 2) Construct fake command line arguments using these paths
> 3) Construct an argparse object to parse these flags
> 4) Do the (often trivial) logic
> 5) Write to a file
> 6) Manually open and read the file
> 7) Compare results. 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo_test.py
> As well as being cumbersome, this obscures the core of what is being 
> illustrated and tested. As many tests as possible should be updated to use 
> assert_that.





[jira] [Updated] (BEAM-520) Update Python SDK example tests to use assert_that

2016-10-13 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-520:
-
Assignee: Frances Perry

> Update Python SDK example tests to use assert_that
> --
>
> Key: BEAM-520
> URL: https://issues.apache.org/jira/browse/BEAM-520
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Frances Perry
>Priority: Minor
>  Labels: starter
>
> Most of our examples use assert_that to test examples:
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/complete/autocomplete_test.py#L38
> Some of our examples use this pattern:
> 1) Create a path(s)
> 2) Construct fake command line arguments using these paths
> 3) Construct an argparse object to parse these flags
> 4) Do the (often trivial) logic
> 5) Write to a file
> 6) Manually open and read the file
> 7) Compare results. 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo_test.py
> As well as being cumbersome, this obscures the core of what is being 
> illustrated and tested. As many tests as possible should be updated to use 
> assert_that.





[jira] [Updated] (BEAM-520) Update Python SDK example tests to use assert_that

2016-10-13 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-520:
-
Assignee: (was: Frances Perry)

> Update Python SDK example tests to use assert_that
> --
>
> Key: BEAM-520
> URL: https://issues.apache.org/jira/browse/BEAM-520
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starter
>
> Most of our examples use assert_that to test examples:
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/complete/autocomplete_test.py#L38
> Some of our examples use this pattern:
> 1) Create a path(s)
> 2) Construct fake command line arguments using these paths
> 3) Construct an argparse object to parse these flags
> 4) Do the (often trivial) logic
> 5) Write to a file
> 6) Manually open and read the file
> 7) Compare results. 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo_test.py
> As well as being cumbersome, this obscures the core of what is being 
> illustrated and tested. As many tests as possible should be updated to use 
> assert_that.





[jira] [Commented] (BEAM-721) Travis CI fails to run Python tox tests on Mac

2016-10-07 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556345#comment-15556345
 ] 

Ahmet Altay commented on BEAM-721:
--

This failure happens in only some of the Travis/Mac runs. In those runs the 
image apparently contains a pre-installed tox, and the path to that binary 
differs from the one the script expects.

> Travis CI fails to run Python tox tests on Mac
> --
>
> Key: BEAM-721
> URL: https://issues.apache.org/jira/browse/BEAM-721
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
> Environment: Mac
>Reporter: Pablo Estrada
>Assignee: Frances Perry
>
> Some Travis CI runs on Mac are failing because the test script cannot find 
> tox.
> See: https://travis-ci.org/apache/incubator-beam/jobs/165306424#L86
> The travis.yml file does attempt to install tox (See: 
> https://github.com/apache/incubator-beam/blob/python-sdk/.travis.yml#L66)
> Looking at the logs, it seems that tox is available in a different directory 
> (/usr/local), and TOX_HOME is set to $HOME/Library/Python/2.7/bin.





[jira] [Created] (BEAM-731) Replace DirectRunner with InProcessRunner

2016-10-07 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-731:


 Summary: Replace DirectRunner with InProcessRunner
 Key: BEAM-731
 URL: https://issues.apache.org/jira/browse/BEAM-731
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay


Remove the old DirectRunner and replace with the new InProcessRunner.

Keeping both runners carries overhead (testing, code maintenance, etc.). 
InProcessRunner has been available for a while and is tested enough to be the 
default runner. 





[jira] [Closed] (BEAM-528) Add @experimental annotations

2016-10-05 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay closed BEAM-528.

   Resolution: Fixed
Fix Version/s: Not applicable

> Add @experimental annotations 
> --
>
> Key: BEAM-528
> URL: https://issues.apache.org/jira/browse/BEAM-528
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: María GH
>Priority: Minor
>  Labels: starter
> Fix For: Not applicable
>
>
> Experimental/deprecation warnings: use the warnings standard module in 
> conjunction with decorators as described here:
> https://docs.python.org/2/library/warnings.html
> Some code sample for a deprecated decorator that is kinda/sorta similar.





[jira] [Created] (BEAM-693) pydoc is not working

2016-09-28 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-693:


 Summary: pydoc is not working
 Key: BEAM-693
 URL: https://issues.apache.org/jira/browse/BEAM-693
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


Repro:
Start the pydoc server (pydoc -p ) and navigate to the apache_beam root:
http://localhost:/apache_beam.html

The following errors are shown instead of the actual documentation:

problem in apache_beam - : No module named avro
problem in apache_beam - : cannot import name 
coders







[jira] [Commented] (BEAM-517) Check versions of pip and cython

2016-09-26 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523923#comment-15523923
 ] 

Ahmet Altay commented on BEAM-517:
--

Adding that snippet makes sense. 

[~robertwb] (cython and Beam developer) could comment on the oldest required 
version of cython. 

> Check versions of pip and cython
> 
>
> Key: BEAM-517
> URL: https://issues.apache.org/jira/browse/BEAM-517
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starter
>
> The Python SDK depends on pip and cython; however, it does not check the 
> versions of these.
> Some of the pip flags do not exist in older versions:
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/28#issuecomment-236382953
> (Note: Even though the above issue was reported by a user in a different 
> repo, it is related to the Apache Beam SDK.)
> Similarly with cython: the SDK supports running with or without Cython. For 
> that reason it is not listed as a requirement in the setup.py file. 
> However, with an old version of cython the SDK might fail.
> To avoid the above problem, check the versions of these packages in the SDK 
> and show a warning to upgrade.





[jira] [Created] (BEAM-588) All runners should support ProfilingOptions

2016-08-25 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-588:


 Summary: All runners should support ProfilingOptions
 Key: BEAM-588
 URL: https://issues.apache.org/jira/browse/BEAM-588
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay
Priority: Minor


https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/options.py#L366

This is useful for profiling pipelines in different environments.





[jira] [Created] (BEAM-563) DoFn Reuse: Update DirectRunner

2016-08-17 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-563:


 Summary: DoFn Reuse: Update DirectRunner
 Key: BEAM-563
 URL: https://issues.apache.org/jira/browse/BEAM-563
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay


https://issues.apache.org/jira/browse/BEAM-562 will add setup and teardown 
methods to DoFns. Update DirectRunner to add support for these new methods.





[jira] [Created] (BEAM-562) DoFn Reuse: Add new methods to DoFn

2016-08-17 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-562:


 Summary: DoFn Reuse: Add new methods to DoFn
 Key: BEAM-562
 URL: https://issues.apache.org/jira/browse/BEAM-562
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay


The Java SDK added setup and teardown methods to DoFns. This makes DoFns 
reusable and provides performance improvements. The Python SDK should add 
support for these new DoFn methods:

Proposal doc: 
https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f#
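
A minimal sketch of the lifecycle this issue asks for, in plain Python. The
class ConnectionFn and the driver run_bundles are hypothetical illustrations,
not the SDK's DoFn base class or any runner code; only the method names setup
and teardown come from the issue text, and their exact semantics are defined
in the proposal doc rather than here.

```python
class ConnectionFn(object):
    """Hypothetical DoFn-like class; not the SDK's DoFn base class."""

    def __init__(self):
        self.setup_calls = 0
        self.resource = None

    def setup(self):
        # One-time, possibly expensive initialization (e.g. opening a client).
        self.setup_calls += 1
        self.resource = {"open": True}

    def process(self, element):
        yield element.upper()

    def teardown(self):
        # Release whatever setup() acquired.
        self.resource = None


def run_bundles(fn, bundles):
    """Process every bundle with one instance, calling setup/teardown once."""
    fn.setup()
    try:
        return [[out for element in bundle for out in fn.process(element)]
                for bundle in bundles]
    finally:
        fn.teardown()
```

The point of the sketch is that the same instance is reused across bundles, so
the cost of setup() is paid once instead of once per bundle.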






[jira] [Updated] (BEAM-523) Minor typo in aggregator_test.py

2016-08-11 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-523:
-
Assignee: (was: Ahmet Altay)

> Minor typo in aggregator_test.py
> 
>
> Key: BEAM-523
> URL: https://issues.apache.org/jira/browse/BEAM-523
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Frank Yellin
>Priority: Trivial
>  Labels: starter
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> aggregators is repeatedly misspelled as aggeregators.





[jira] [Updated] (BEAM-523) Minor typo in aggregator_test.py

2016-08-11 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-523:
-
Assignee: Ahmet Altay

> Minor typo in aggregator_test.py
> 
>
> Key: BEAM-523
> URL: https://issues.apache.org/jira/browse/BEAM-523
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Frank Yellin
>Assignee: Ahmet Altay
>Priority: Trivial
>  Labels: starter
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> aggregators is repeatedly misspelled as aggeregators.





[jira] [Created] (BEAM-539) Error when writing to the root of a GCS location

2016-08-08 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-539:


 Summary: Error when writing to the root of a GCS location
 Key: BEAM-539
 URL: https://issues.apache.org/jira/browse/BEAM-539
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Chamikara Jayalath
Priority: Minor


User issue: 
http://stackoverflow.com/questions/38811152/google-dataflow-python-pipeline-write-failure

Reproduction: use a TextFileSink with the output location set to gs://mybucket 
and the write fails. Change it to gs://mybucket/ and it works.

The final output path is generated here:
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/fileio.py#L495

And this seemingly works in the Java SDK.
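
One possible guard, sketched under the assumption that the only difference
between the failing and the working case is the missing trailing slash after
the bucket name. The function normalize_output_location is hypothetical; it
mirrors the manual workaround described above and is not the fix adopted in
the SDK.

```python
import re

# Matches a location that names only a bucket, e.g. gs://mybucket
_BUCKET_ONLY = re.compile(r"^gs://[^/]+$")


def normalize_output_location(location):
    """Append the trailing slash that a bucket-only location lacks."""
    if _BUCKET_ONLY.match(location):
        return location + "/"
    return location
```

Locations that already contain a slash after the bucket name are returned
unchanged, so the helper is safe to apply to any output location.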

Stack:

  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/iobase.py", line 1058, in finish_bundle
    yield window.TimestampedValue(self.writer.close(), window.MAX_TIMESTAMP)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/fileio.py", line 601, in close
    self.sink.close(self.temp_handle)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/fileio.py", line 687, in close
    file_handle.close()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcsio.py", line 617, in close
    self._flush_write_buffer()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcsio.py", line 647, in _flush_write_buffer
    raise self.upload_thread.last_error  # pylint: disable=raising-bad-type
HttpError: HttpError accessing 
:
 response: <{'status': '404', 'alternate-protocol': '443:quic', 
'content-length': '165', 'vary': 'Origin, X-Origin', 'server': 'UploadServer', 
'x-guploader-uploadid': 
'AEnB2Uq6ZGb_CsrMVxozv6aL48k4OMMiRgYVeVGmJrM-sMQWRGeGMkesOQg5F0W7HZuaqTBog_d4ml-DlIars_ZvJTejdfcbAUr4gswZWVieq82ufc3WR2g',
 'date': 'Mon, 08 Aug 2016 21:29:46 GMT', 'alt-svc': 'quic=":443"; ma=2592000; 
v="36,35,34,33,32,31,30"', 'content-type': 'application/json; charset=UTF-8'}>, 
content <{
 "error": {
  "errors": [
   {
"domain": "global",
"reason": "notFound",
"message": "Not Found"
   }
  ],
  "code": 404,
  "message": "Not Found"
 }
}






[jira] [Updated] (BEAM-536) Aggregator.py. More misleading documentation. More bad documentation

2016-08-05 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-536:
-
Component/s: sdk-py

> Aggregator.py.  More misleading documentation.  More bad documentation
> --
>
> Key: BEAM-536
> URL: https://issues.apache.org/jira/browse/BEAM-536
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Frank Yellin
>Priority: Minor
>
> The last paragraph of the documentation for Aggregator is:
> You can also query the combined value(s) of an aggregator by calling
> aggregated_value() or aggregated_values() on the result object returned after
> running a pipeline.
> There are multiple problems in this one sentence!
> #1) There is no such method aggregated_value() that I can find anywhere.
> #2) DirectRunner implements aggregated_values(), but DirectPipelineRunner 
> does not.  The latter is the far more interesting case.
> #3) When I use a BlockingDirectPipelineRunner and ask for its 
> aggregated_values(), I get an error message indicating that this is not 
> implemented in DirectPipelineRunner.  Very confusing since I never asked for 
> a DirectPipelineRunner.
> It is clear that this is because BlockingDirectPipelineRunner is a method 
> rather than a class.  Is this really the right thing?  Will there be other 
> confusing error messages.
> #4) The documentation for aggregated_values() says "returns a dict of step 
> names to values of the aggregator."  I have no idea what a "step" means in 
> this context.  In practice, it seems to be a single-element dictionary whose 
> key is 'user--' prefixed onto the aggregator name.  





[jira] [Created] (BEAM-531) Add support for getting aggregated values with dataflow runner

2016-08-03 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-531:


 Summary: Add support for getting aggregated values with dataflow 
runner
 Key: BEAM-531
 URL: https://issues.apache.org/jira/browse/BEAM-531
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Charles Chen
Priority: Minor


The SDK for Python cannot extract metrics from the Dataflow service.





[jira] [Created] (BEAM-530) Decide where to place the tests and examples

2016-08-03 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-530:


 Summary: Decide where to place the tests and examples
 Key: BEAM-530
 URL: https://issues.apache.org/jira/browse/BEAM-530
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay
Priority: Minor


Right now they are literally part of the package space.





[jira] [Created] (BEAM-529) Check immutability violations in DirectPipelineRunner

2016-08-03 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-529:


 Summary: Check immutability violations in DirectPipelineRunner
 Key: BEAM-529
 URL: https://issues.apache.org/jira/browse/BEAM-529
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay
Priority: Minor


Users are going to mutate inputs and outputs of DoFns inappropriately. We should 
make their tests fail so that such mistakes are caught. (Similar to the 
DirectPipelineRunner in the Java SDK.)
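
A rough sketch of how such a check could work: snapshot the element before the
user function runs and compare afterwards. The helper call_checked is
hypothetical and uses the standard pickle module as the snapshot mechanism,
which is an assumption; a real runner would more likely compare encoded
elements using its own coders.

```python
import pickle


def call_checked(fn, element):
    """Call fn(element) and fail if fn changed the element in place."""
    snapshot = pickle.dumps(element)
    result = fn(element)
    if pickle.dumps(element) != snapshot:
        raise AssertionError(
            "%s mutated its input element" % getattr(fn, "__name__", repr(fn)))
    return result
```

A function that returns a new list passes the check, while one that appends to
its argument in place raises AssertionError.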





[jira] [Created] (BEAM-528) Add @experimental annotations

2016-08-03 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-528:


 Summary: Add @experimental annotations 
 Key: BEAM-528
 URL: https://issues.apache.org/jira/browse/BEAM-528
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


Experimental/deprecation warnings: use the warnings standard module in 
conjunction with decorators as described here:

https://docs.python.org/2/library/warnings.html

Some code sample for a deprecated decorator that is kinda/sorta similar.
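
As an illustration of the approach, here is a minimal decorator built on the
standard warnings module. The names experimental and new_feature are
hypothetical, and the choice of FutureWarning as the category is an
assumption; the issue does not say which category the SDK should use.

```python
import functools
import warnings


def experimental(func):
    """Mark a callable as experimental by warning each time it is called."""
    message = "%s is experimental and may change without notice." % func.__name__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # stacklevel=2 attributes the warning to the caller, not this wrapper.
        warnings.warn(message, FutureWarning, stacklevel=2)
        return func(*args, **kwargs)

    return wrapper


@experimental
def new_feature(x):
    return x * 2
```

A deprecation decorator would look the same with a different message and
category (for example DeprecationWarning).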





[jira] [Created] (BEAM-527) Pickling error when pickling a nested function

2016-08-03 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-527:


 Summary: Pickling error when pickling a nested function 
 Key: BEAM-527
 URL: https://issues.apache.org/jira/browse/BEAM-527
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


A pickling error occurs when all of the following conditions hold:
- a function is defined inside a transform's apply method
- that function is then used as a MapFn
- that function references an instance variable of the outer transform.

Rewriting the nested function as an unnested DoFn appears to solve the problem.

If the limitations of pickling make it difficult to support nested functions, 
then perhaps there is a way to make it easier for users to detect problems 
caused by nested functions and to recommend appropriate fixes.
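
The general limitation can be shown with the standard library's pickle, which
is used here only as an analogy; the SDK's own pickler may behave differently.
All names below (Scale, NestedStyle, can_pickle) are hypothetical. A function
defined inside a method cannot be pickled by reference, while a top-level
callable class that carries its configuration explicitly can.

```python
import pickle


class Scale(object):
    """Top-level callable that carries its configuration explicitly."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return value * self.factor


class NestedStyle(object):
    """Hypothetical transform-like class that builds a nested function."""

    def __init__(self, factor):
        self.factor = factor

    def make_fn(self):
        def scale(value):
            # Closes over self, so the function exists only in this scope.
            return value * self.factor
        return scale


def can_pickle(obj):
    """Report whether the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
```

A helper like can_pickle could also back an early, readable error message at
pipeline construction time instead of a failure deep inside serialization.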





[jira] [Created] (BEAM-526) Mismatched pipelines give unclear error

2016-08-03 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-526:


 Summary: Mismatched pipelines give unclear error
 Key: BEAM-526
 URL: https://issues.apache.org/jira/browse/BEAM-526
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


Mistakenly mixing two pipelines gives an unclear error. Raising an error is 
correct, but the error message should be improved.

This can be reproduced by trying to flatten two PCollections from different 
pipelines.

Improve the message for this assert:
https://github.com/aaltay/incubator-beam/blob/python-sdk/sdks/python/apache_beam/transforms/util.py#L135
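
A sketch of what a clearer check might look like. FakePipeline,
FakePCollection and check_same_pipeline are hypothetical stand-ins written for
this example; they are not the SDK classes or the assert linked above.

```python
class FakePipeline(object):
    def __init__(self, name):
        self.name = name


class FakePCollection(object):
    def __init__(self, pipeline, label):
        self.pipeline = pipeline
        self.label = label


def check_same_pipeline(pcolls):
    """Raise a descriptive error if the inputs span more than one pipeline."""
    distinct = set(id(pcoll.pipeline) for pcoll in pcolls)
    if len(distinct) > 1:
        details = ", ".join(
            "%s (pipeline %s)" % (pcoll.label, pcoll.pipeline.name)
            for pcoll in pcolls)
        raise ValueError(
            "All inputs must belong to the same pipeline, but got: " + details)
```

Naming each input together with its pipeline tells the user which object came
from the wrong place, which a bare assert does not.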






[jira] [Created] (BEAM-525) Verify that ParDo with multiple outputs with tags undeclared in with_outputs() works

2016-08-03 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-525:


 Summary: Verify that ParDo with multiple outputs with tags 
undeclared in with_outputs() works 
 Key: BEAM-525
 URL: https://issues.apache.org/jira/browse/BEAM-525
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


test_undeclared_side_outputs was failing (when last checked) under certain 
conditions:

See this TODO:
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/dataflow_test.py#L202

This is probably not failing any more but it needs to be verified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-524) Description of "type" argument in Aggregator is incorrect

2016-08-03 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-524:
-
Component/s: sdk-py

> Description of "type" argument in Aggregator is incorrect
> -
>
> Key: BEAM-524
> URL: https://issues.apache.org/jira/browse/BEAM-524
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Frank Yellin
>Priority: Minor
>
> Two problems with documentation for "type" argument.
> Trivial:  Remove "by default".  This phrase implies that there are other 
> alternatives besides what is listed.  There aren't.
> Non trivial.  The documentation says "types appropriate to the combine_fn" 
> are accepted.  I tried 
> Accumulator("foo", max, datetime.datetime)
> This failed even though "datetime.datetime" is a perfectly reasonable type to 
> want to take the max of.  (I wanted to know precisely when the last job 
> finished.)
> Either the documentation needs to be changed to specify that max/min only 
> apply to numeric types, or the code needs to be changed to allow other uses 
> of min and max.





[jira] [Updated] (BEAM-523) Minor typo in aggregator_test.py

2016-08-03 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-523:
-
   Assignee: (was: Frances Perry)
 Labels: starter  (was: )
Component/s: (was: beam-model)
 sdk-py

> Minor typo in aggregator_test.py
> 
>
> Key: BEAM-523
> URL: https://issues.apache.org/jira/browse/BEAM-523
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Frank Yellin
>Priority: Trivial
>  Labels: starter
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> aggregators is repeatedly misspelled as aggeregators.





[jira] [Created] (BEAM-520) Update Python SDK example tests to use assert_that

2016-08-02 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-520:


 Summary: Update Python SDK example tests to use assert_that
 Key: BEAM-520
 URL: https://issues.apache.org/jira/browse/BEAM-520
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


Most of our examples use assert_that to test examples:

https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/complete/autocomplete_test.py#L38

Some of our examples use this pattern:

1) Create a path(s)
2) Construct fake command line arguments using these paths
3) Construct an argparse object to parse these flags
4) Do the (often trivial) logic
5) Write to a file
6) Manually open and read the file
7) Compare results. 

https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/cookbook/multiple_output_pardo_test.py

As well as being cumbersome, this obscures the core of what is being 
illustrated and tested. As many tests as possible should be updated to use 
assert_that.





[jira] [Created] (BEAM-519) fileio.CompressionType requires a __ne__ method

2016-08-02 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-519:


 Summary:  fileio.CompressionType requires a __ne__ method
 Key: BEAM-519
 URL: https://issues.apache.org/jira/browse/BEAM-519
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


This code: 
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/fileio.py#L279

Without a __ne__ method, instances of this class cannot be used reliably in != 
expressions (only ==), because Python 2 does not derive != from __eq__.
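
A minimal sketch of the fix on a stand-in class. CompressionKind is a
hypothetical name used for illustration; it is not the actual class in
fileio.py, whose fields are not shown in this issue.

```python
class CompressionKind(object):
    """Hypothetical small value class; not the real fileio class."""

    def __init__(self, identifier):
        self.identifier = identifier

    def __eq__(self, other):
        return (isinstance(other, CompressionKind)
                and self.identifier == other.identifier)

    def __ne__(self, other):
        # Python 2 does not derive != from __eq__, so define it explicitly.
        return not self == other

    def __hash__(self):
        # Keep instances hashable consistently with __eq__.
        return hash(self.identifier)
```

With __ne__ defined in terms of __eq__, the two operators cannot disagree.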





[jira] [Created] (BEAM-518) More sophisticated assert matchers

2016-08-02 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-518:


 Summary:  More sophisticated assert matchers
 Key: BEAM-518
 URL: https://issues.apache.org/jira/browse/BEAM-518
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


Expand the list of matchers for assert_that.

Example of work from Java: 
https://docs.google.com/document/d/1fZUUbG2LxBtqCVabQshldXIhkMcXepsbv2vuuny8Ix4/edit?pref=2=1#heading=h.lt80jryok8cs
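
To make the idea concrete, here is a sketch of two matcher factories in plain
Python. The names contains_in_any_order and each_satisfies are hypothetical
and are not taken from the SDK or from the linked Java document; the sketch
only assumes that a matcher is a callable that receives the actual elements
and raises AssertionError on a mismatch.

```python
def contains_in_any_order(expected):
    """Return a matcher that accepts exactly `expected`, ignoring order."""
    expected = list(expected)

    def matcher(actual):
        remaining = list(expected)
        for item in actual:
            if item not in remaining:
                raise AssertionError("unexpected element: %r" % (item,))
            remaining.remove(item)
        if remaining:
            raise AssertionError("missing elements: %r" % (remaining,))

    return matcher


def each_satisfies(predicate):
    """Return a matcher that requires `predicate` to hold for every element."""

    def matcher(actual):
        for item in actual:
            if not predicate(item):
                raise AssertionError("element failed predicate: %r" % (item,))

    return matcher
```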





[jira] [Created] (BEAM-517) Check versions of pip and cython

2016-08-02 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-517:


 Summary: Check versions of pip and cython
 Key: BEAM-517
 URL: https://issues.apache.org/jira/browse/BEAM-517
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor


The Python SDK depends on pip and cython; however, it does not check the 
versions of these.

Some of the pip flags do not exist in older versions:

https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/28#issuecomment-236382953

(Note: Even though the above issue was reported by a user in a different repo, 
it is related to the Apache Beam SDK.)

Similarly with cython: the SDK supports running with or without Cython. For 
that reason it is not listed as a requirement in the setup.py file. However, 
with an old version of cython the SDK might fail.

To avoid the above problem, check the versions of these packages in the SDK and 
show a warning to upgrade.
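
A minimal sketch of such a check. The helpers parse_version and warn_if_older
are hypothetical, and the minimum versions a caller would pass in are
assumptions; the issue does not state which pip or cython versions are
required.

```python
import re
import warnings


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def warn_if_older(package, installed, minimum):
    """Warn and return True when `installed` is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        warnings.warn(
            "%s %s is older than the suggested minimum %s; please upgrade."
            % (package, installed, minimum))
        return True
    return False
```

Comparing integer tuples avoids the string-comparison trap where "10.0" sorts
before "9.0". The installed version would come from the package itself, for
example pip.__version__.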





[jira] [Closed] (BEAM-391) Exceptions in gcsio upload thread causes pipeline to stall

2016-07-15 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay closed BEAM-391.

   Resolution: Fixed
Fix Version/s: 0.2.0-incubating

> Exceptions in gcsio upload thread causes pipeline to stall
> --
>
> Key: BEAM-391
> URL: https://issues.apache.org/jira/browse/BEAM-391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
> Fix For: 0.2.0-incubating
>
>
> gcsio got stuck with invalid bucket name
> GcsBufferedWriter._start_upload (gcsio.py) raises an exception if the bucket 
> does not exist. This causes the upload thread to silently fail. It logs the 
> exception, but this does not stop the pipeline or close the receiving end of 
> the multiprocessing.Pipe(). Later, a call to write() blocks at 
> self.conn.send_bytes(). Note that send may block if the buffer is full.
> The upload thread should have a finally clause to close the socket connection 
> or, better, propagate the exception to its parent. This is true for other 
> types of exceptions also.
> Another small issue is in GcsBufferedWriter.close(): it does not set 
> self.close to True.
> reproduction: python -m apache_beam.examples.wordcount --output 
> gs://no-such-thing/
> Prints the exception but goes on forever. Ctrl + C breaks the main thread 
> and shows where it got stuck.
> Similarly reproducible on the service.
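
The suggested fix can be sketched as follows. BufferedUploader is a
hypothetical class written for this illustration, not the real
GcsBufferedWriter, and upload_fn stands in for the actual upload call. The
sketch assumes a one-way multiprocessing.Pipe and POSIX pipe behavior, where
closing the read end lets a pending or later send fail rather than wait.

```python
import multiprocessing
import threading


class BufferedUploader(object):
    """Hypothetical writer that feeds bytes to a background upload thread."""

    def __init__(self, upload_fn):
        # upload_fn receives the read end of the pipe and may raise.
        self._upload_fn = upload_fn
        self.last_error = None
        self.closed = False
        self._recv_conn, self._send_conn = multiprocessing.Pipe(duplex=False)
        self._thread = threading.Thread(target=self._run_upload)
        self._thread.daemon = True
        self._thread.start()

    def _run_upload(self):
        try:
            self._upload_fn(self._recv_conn)
        except Exception as error:
            # Record any failure so the parent thread can re-raise it.
            self.last_error = error
        finally:
            # Always release the read end, whether the upload worked or not.
            self._recv_conn.close()

    def write(self, data):
        if self.last_error is not None:
            raise self.last_error
        self._send_conn.send_bytes(data)

    def close(self):
        self.closed = True
        self._send_conn.close()
        self._thread.join()
        if self.last_error is not None:
            raise self.last_error
```

With this shape, a failed upload surfaces as an exception from write() or
close() in the calling thread instead of a log line followed by a hang.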





[jira] [Commented] (BEAM-391) Exceptions in gcsio upload thread causes pipeline to stall

2016-07-08 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368449#comment-15368449
 ] 

Ahmet Altay commented on BEAM-391:
--

Another type of exception that results in the same behavior:

Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 160, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcsio.py", line 563, in _start_upload
    self.client.objects.Insert(self.insert_request, upload=self.upload)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/clients/storage/storage_v1_client.py", line 970, in Insert
    download=download)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 687, in _RunMethod
    http_request, client=self.client)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/transfer.py", line 838, in InitializeUpload
    retries=self.num_retries)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/http_wrapper.py", line 351, in MakeRequest
    max_retry_wait, total_wait_sec))
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/http_wrapper.py", line 341, in MakeRequest
    check_response_func=check_response_func)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/http_wrapper.py", line 391, in _MakeRequestNoRetry
    redirections=redirections, connection_type=connection_type)
  File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 616, in new_request
    self._refresh(request_orig)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/auth.py", line 90, in _refresh
    token_data = json.loads(urllib2.urlopen(req).read())
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)

The error comes from _refresh() in auth.py. That call may need retries, 
depending on the type of error.
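
A sketch of retrying only for selected error types, with exponential backoff.
The helper call_with_retries and its parameters are hypothetical; this is not
the SDK's retry.py, and which errors from _refresh() are actually safe to
retry is left open, as in the comment above.

```python
import time


def call_with_retries(fn, is_retryable, max_attempts=3, initial_delay=0.1,
                      sleep=time.sleep):
    """Call fn(), retrying with doubling delays while is_retryable(error)."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as error:
            # Give up on the last attempt or on errors that retrying won't fix.
            if attempt == max_attempts or not is_retryable(error):
                raise
            sleep(delay)
            delay *= 2
```

Passing the sleep function in makes the helper easy to test without waiting.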

> Exceptions in gcsio upload thread causes pipeline to stall
> --
>
> Key: BEAM-391
> URL: https://issues.apache.org/jira/browse/BEAM-391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>
> gcsio got stuck with invalid bucket name
> GcsBufferedWriter._start_upload (gcsio.py) raises an exception if the bucket 
> does not exist. This causes the upload thread to silently fail. It logs the 
> exception, but this does not stop the pipeline or close the receiving end of 
> the multiprocessing.Pipe(). Later, a call to write() blocks at 
> self.conn.send_bytes(). Note that send may block if the buffer is full.
> The upload thread should have a finally clause to close the socket connection 
> or, better, propagate the exception to its parent. This is true for other 
> types of exceptions also.
> Another small issue is in GcsBufferedWriter.close(): it does not set 
> self.close to True.
> reproduction: python -m apache_beam.examples.wordcount --output 
> gs://no-such-thing/
> Prints the exception but goes on forever. Ctrl + C breaks the main thread 
> and shows where it got stuck.
> Similarly reproducible on the service.





[jira] [Updated] (BEAM-391) Exceptions in gcsio upload thread causes pipeline to stall

2016-07-07 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-391:
-
Summary: Exceptions in gcsio upload thread causes pipeline to stall  (was: 
Invalid GCS bucket name causes pipeline to stall)

> Exceptions in gcsio upload thread causes pipeline to stall
> --
>
> Key: BEAM-391
> URL: https://issues.apache.org/jira/browse/BEAM-391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>
> gcsio got stuck with invalid bucket name
> GcsBufferedWriter._start_upload (gcsio.py) raises an exception if the bucket 
> does not exist. This causes the upload thread to silently fail. It logs the 
> exception, but this does not stop the pipeline or close the receiving end of 
> the multiprocessing.Pipe(). Later, a call to write() blocks at 
> self.conn.send_bytes(). Note that send may block if the buffer is full.
> The upload thread should have a finally clause to close the socket connection 
> or, better, propagate the exception to its parent. This is true for other 
> types of exceptions also.
> Another small issue is in GcsBufferedWriter.close(): it does not set 
> self.close to True.
> reproduction: python -m apache_beam.examples.wordcount --output 
> gs://no-such-thing/
> Prints the exception but goes on forever. Ctrl + C breaks the main thread 
> and shows where it got stuck.
> Similarly reproducible on the service.





[jira] [Created] (BEAM-428) InProcessRunner - Bundle based local runner

2016-07-06 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-428:


 Summary: InProcessRunner - Bundle based local runner
 Key: BEAM-428
 URL: https://issues.apache.org/jira/browse/BEAM-428
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay


InProcessRunner is a bundle-based drop-in replacement for DirectRunner.

Similar to its Java equivalent, it improves on DirectRunner by executing 
transforms in parallel using bundles, as service-based implementations do. It 
offers better performance and more validation options.

Initially it will be a runner for executing batch jobs only. The target of this 
phase is to develop a drop-in replacement for DirectRunner. Later it will be 
improved by adding streaming execution.





[jira] [Commented] (BEAM-416) Jenkins Python Verify post commit tests are timing out

2016-07-01 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359245#comment-15359245
 ] 

Ahmet Altay commented on BEAM-416:
--

This is probably related:


> Jenkins Python Verify post commit tests are timing out
> --
>
> Key: BEAM-416
> URL: https://issues.apache.org/jira/browse/BEAM-416
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Silviu Calinoiu
> Attachments: job_output (2)
>
>
> beam_PostCommit_PythonVerify is timing out at the e2e testing phase:
> Console output:
> https://builds.apache.org/view/Beam/job/beam_PostCommit_PythonVerify/8/console
> e2e test:
> https://pantheon.corp.google.com/dataflow/job/2016-07-01_08_02_45-15435546446836030984?project=apache-beam-testing
> Workers are failing to find the correct container image (from worker logs):
> Error syncing pod 6d3e3a71409d65aa43494143d705455b, skipping: failed to 
> "StartContainer" for "python" with ImagePullBackOff: "Back-off pulling image 
> \"dataflow.gcr.io/v1beta3/python:latest\""
> It might be related to this commit:
> https://github.com/apache/incubator-beam/commit/0bda677d47d5bd5d9c45b74e00e5c3fd113a4f81





[jira] [Updated] (BEAM-416) Jenkins Python Verify post commit tests are timing out

2016-07-01 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-416:
-
Attachment: job_output (2)

job output from the timing out job

> Jenkins Python Verify post commit tests are timing out
> --
>
> Key: BEAM-416
> URL: https://issues.apache.org/jira/browse/BEAM-416
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Silviu Calinoiu
> Attachments: job_output (2)
>
>
> beam_PostCommit_PythonVerify is timing out at the e2e testing phase:
> Console output:
> https://builds.apache.org/view/Beam/job/beam_PostCommit_PythonVerify/8/console
> e2e test:
> https://pantheon.corp.google.com/dataflow/job/2016-07-01_08_02_45-15435546446836030984?project=apache-beam-testing
> Workers are failing to find the correct container image (from worker logs):
> Error syncing pod 6d3e3a71409d65aa43494143d705455b, skipping: failed to 
> "StartContainer" for "python" with ImagePullBackOff: "Back-off pulling image 
> \"dataflow.gcr.io/v1beta3/python:latest\""
> It might be related to this commit:
> https://github.com/apache/incubator-beam/commit/0bda677d47d5bd5d9c45b74e00e5c3fd113a4f81





[jira] [Updated] (BEAM-403) Support staging SDK packages from PyPI for remote execution

2016-06-30 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-403:
-
Component/s: sdk-py

> Support staging SDK packages from PyPI for remote execution
> ---
>
> Key: BEAM-403
> URL: https://issues.apache.org/jira/browse/BEAM-403
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py
>Reporter: Silviu Calinoiu
>Assignee: Davor Bonaci
>Priority: Minor
>
> Currently the Dataflow runner will pick up the SDK tarball from the old GitHub 
> repo and stage it. We need to pick it up from PyPI (where packages will be 
> released) and stage it. 





[jira] [Updated] (BEAM-391) Invalid GCS bucket name causes pipeline to stall

2016-06-29 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-391:
-
Summary: Invalid GCS bucket name causes pipeline to stall  (was: gcsio got 
stuck with invalid bucket name)

> Invalid GCS bucket name causes pipeline to stall
> 
>
> Key: BEAM-391
> URL: https://issues.apache.org/jira/browse/BEAM-391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>
> gcsio got stuck with invalid bucket name
> GcsBufferedWriter._start_upload (gcsio.py) raises an exception if the bucket 
> does not exist. This causes the upload thread to fail silently. It logs the 
> exception, but this neither stops the pipeline nor closes the receiving end 
> of the multiprocessing.Pipe(). A later call to write() then blocks at 
> self.conn.send_bytes(). Note that send may block if the buffer is full.
> The upload thread should have a finally clause to close the socket 
> connection or, better, propagate the exception to its parent. This also 
> applies to other types of exceptions.
> Another small issue is in GcsBufferedWriter.close(): it does not set 
> self.close to True.
> reproduction: python -m apache_beam.examples.wordcount --output 
> gs://no-such-thing/
> Prints the exception but goes on forever. Ctrl + C breaks the main thread 
> and shows where it got stuck.
> Similarly reproducible on the service.





[jira] [Created] (BEAM-391) gcsio got stuck with invalid bucket name

2016-06-29 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-391:


 Summary: gcsio got stuck with invalid bucket name
 Key: BEAM-391
 URL: https://issues.apache.org/jira/browse/BEAM-391
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay


gcsio got stuck with invalid bucket name

GcsBufferedWriter._start_upload (gcsio.py) raises an exception if the bucket 
does not exist. This causes the upload thread to fail silently. It logs the 
exception, but this neither stops the pipeline nor closes the receiving end of 
the multiprocessing.Pipe(). A later call to write() then blocks at 
self.conn.send_bytes(). Note that send may block if the buffer is full.

The upload thread should have a finally clause to close the socket connection 
or, better, propagate the exception to its parent. This also applies to other 
types of exceptions.

Another small issue is in GcsBufferedWriter.close(): it does not set 
self.close to True.

reproduction: python -m apache_beam.examples.wordcount --output 
gs://no-such-thing/

Prints the exception but goes on forever. Ctrl + C breaks the main thread and 
shows where it got stuck.

Similarly reproducible on the service.
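
The fix suggested above (a finally clause that closes the receiving end, plus 
propagating the exception to the parent) might look roughly like the sketch 
below. This is not the gcsio.py code: SketchBufferedWriter, upload_fn, 
failing_upload and the closed attribute are hypothetical names used only for 
illustration, and upload_fn stands in for the real upload call.

```python
import multiprocessing
import threading


class SketchBufferedWriter(object):
    """Hypothetical writer: a background thread drains a pipe and uploads."""

    def __init__(self, upload_fn):
        # upload_fn is an assumed stand-in for the real upload; it is called
        # with the receiving end of the pipe and reads bytes from it.
        recv_conn, send_conn = multiprocessing.Pipe(duplex=False)
        self._recv_conn = recv_conn
        self.conn = send_conn
        self._upload_fn = upload_fn
        self._error = None
        self.closed = False
        self._thread = threading.Thread(target=self._start_upload)
        self._thread.daemon = True
        self._thread.start()

    def _start_upload(self):
        try:
            self._upload_fn(self._recv_conn)
        except Exception as e:  # keep the failure so the parent can see it
            self._error = e
        finally:
            # Always close the receiving end so the sender is not left
            # feeding a pipe that nobody reads.
            self._recv_conn.close()

    def _raise_if_failed(self):
        if self._error is not None:
            raise self._error

    def write(self, data):
        self._raise_if_failed()
        try:
            self.conn.send_bytes(data)
        except OSError:
            # The receiving end went away mid-send; prefer reporting the
            # upload thread's own error if it recorded one.
            self._thread.join()
            self._raise_if_failed()
            raise

    def close(self):
        self.conn.close()
        self._thread.join()
        self.closed = True
        self._raise_if_failed()


def failing_upload(conn):
    """Simulates an upload that fails immediately, e.g. a missing bucket."""
    raise IOError('bucket does not exist')
```

With this shape, a write() issued after the upload thread has failed raises 
the original exception instead of blocking. How a send that is already blocked 
reacts when the other end is closed depends on the platform, so this sketch 
should be read as an outline of the idea only.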





[jira] [Resolved] (BEAM-348) Clean temp_dir usage in _stage_extra_packages

2016-06-17 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-348.
--
Resolution: Fixed

> Clean temp_dir usage in _stage_extra_packages
> -
>
> Key: BEAM-348
> URL: https://issues.apache.org/jira/browse/BEAM-348
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Minor
>






[jira] [Resolved] (BEAM-353) Correct the licenses

2016-06-16 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-353.
--
Resolution: Fixed

> Correct the licenses
> 
>
> Key: BEAM-353
> URL: https://issues.apache.org/jira/browse/BEAM-353
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>
> Change the licenses to the correct one and add a license to files that are 
> missing one.





[jira] [Created] (BEAM-353) Correct the licenses

2016-06-16 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-353:


 Summary: Correct the licenses
 Key: BEAM-353
 URL: https://issues.apache.org/jira/browse/BEAM-353
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay


Change the licenses to the correct one and add a license to files that are 
missing one.





[jira] [Created] (BEAM-348) Clean temp_dir usage in _stage_extra_packages

2016-06-15 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-348:


 Summary: Clean temp_dir usage in _stage_extra_packages
 Key: BEAM-348
 URL: https://issues.apache.org/jira/browse/BEAM-348
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay
Priority: Minor





