[jira] [Resolved] (BEAM-1410) Reduce sdk-py DirectRunner running time and memory consumption

2017-02-21 Thread Younghee Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Younghee Kwon resolved BEAM-1410.
-
   Resolution: Fixed
Fix Version/s: 0.6.0

> Reduce sdk-py DirectRunner running time and memory consumption
> --
>
> Key: BEAM-1410
> URL: https://issues.apache.org/jira/browse/BEAM-1410
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Younghee Kwon
> Assignee: Ahmet Altay
> Priority: Minor
> Labels: performance, python
> Fix For: 0.6.0
>
>
> Some experimental benchmarks show that DirectRunner's running time and memory 
> consumption can be reduced. 
> I will roll out some CLs to do so.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default

2017-02-15 Thread Younghee Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Younghee Kwon closed BEAM-1496.
---
   Resolution: Not A Problem
Fix Version/s: Not applicable

> pysdk's sideinputs_test requires nose, but not installed by default
> ---
>
> Key: BEAM-1496
> URL: https://issues.apache.org/jira/browse/BEAM-1496
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
> Reporter: Younghee Kwon
> Assignee: Ahmet Altay
> Priority: Minor
> Fix For: Not applicable
>
>
> $ PYTHONPATH= python -m apache_beam.transforms.sideinputs_test
>  
> No handlers could be found for logger "oauth2client.contrib.multistore_file"
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File "/usr/local/google/home/youngheek/work/github/beam3/sdks/python/apache_beam/transforms/sideinputs_test.py", line 23, in <module>
>     from nose.plugins.attrib import attr
> ImportError: No module named nose.plugins.attrib





[jira] [Commented] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default

2017-02-15 Thread Younghee Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869200#comment-15869200
 ] 

Younghee Kwon commented on BEAM-1496:
-

I see; sorry for the noise. I thought it might have broken the automated tests, 
but I confirmed that Travis CI passes.

Closing.

> pysdk's sideinputs_test requires nose, but not installed by default
> ---
>
> Key: BEAM-1496
> URL: https://issues.apache.org/jira/browse/BEAM-1496
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
> Reporter: Younghee Kwon
> Assignee: Ahmet Altay
> Priority: Minor
>
> $ PYTHONPATH= python -m apache_beam.transforms.sideinputs_test
>  
> No handlers could be found for logger "oauth2client.contrib.multistore_file"
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File "/usr/local/google/home/youngheek/work/github/beam3/sdks/python/apache_beam/transforms/sideinputs_test.py", line 23, in <module>
>     from nose.plugins.attrib import attr
> ImportError: No module named nose.plugins.attrib





[jira] [Commented] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default

2017-02-15 Thread Younghee Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868926#comment-15868926
 ] 

Younghee Kwon commented on BEAM-1496:
-

I could add nose to setup.py, but the notice on its site discourages me:
https://nose.readthedocs.io/en/latest/

Note to Users

Nose has been in maintenance mode for the past several years and will likely 
cease without a new person/team to take over maintainership. New projects 
should consider using Nose2, py.test, or just plain unittest/unittest2.
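
One alternative to adding nose as a hard dependency can be sketched as follows. This is only an illustration, not what the Beam test module does: it falls back to a local stand-in for `attr` when nose is missing, so the module still imports under plain unittest. The fallback function, the `slow` tag, and `test_example` are hypothetical names.

```python
try:
    from nose.plugins.attrib import attr
except ImportError:
    # Hypothetical fallback: tag the test like nose's attr would and
    # return it unchanged, so plain unittest can still collect it.
    def attr(*names, **attributes):
        def decorate(test):
            for name in names:
                setattr(test, name, True)
            for key, value in attributes.items():
                setattr(test, key, value)
            return test
        return decorate


@attr('slow')  # 'slow' is an illustrative tag, not one from the Beam tests
def test_example():
    return 'ran'
```

With nose installed the real decorator is used; without it the tags are still set but nothing filters on them.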



> pysdk's sideinputs_test requires nose, but not installed by default
> ---
>
> Key: BEAM-1496
> URL: https://issues.apache.org/jira/browse/BEAM-1496
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
> Reporter: Younghee Kwon
> Assignee: Ahmet Altay
> Priority: Minor
>
> $ PYTHONPATH= python -m apache_beam.transforms.sideinputs_test
>  
> No handlers could be found for logger "oauth2client.contrib.multistore_file"
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File "/usr/local/google/home/youngheek/work/github/beam3/sdks/python/apache_beam/transforms/sideinputs_test.py", line 23, in <module>
>     from nose.plugins.attrib import attr
> ImportError: No module named nose.plugins.attrib





[jira] [Created] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default

2017-02-15 Thread Younghee Kwon (JIRA)
Younghee Kwon created BEAM-1496:
---

Summary: pysdk's sideinputs_test requires nose, but not installed by default
Key: BEAM-1496
URL: https://issues.apache.org/jira/browse/BEAM-1496
Project: Beam
Issue Type: Bug
Components: sdk-py
Reporter: Younghee Kwon
Assignee: Ahmet Altay
Priority: Minor


$ PYTHONPATH= python -m apache_beam.transforms.sideinputs_test  
   
No handlers could be found for logger "oauth2client.contrib.multistore_file"
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/google/home/youngheek/work/github/beam3/sdks/python/apache_beam/transforms/sideinputs_test.py", line 23, in <module>
    from nose.plugins.attrib import attr
ImportError: No module named nose.plugins.attrib






[jira] [Commented] (BEAM-588) All runners should support ProfilingOptions

2017-02-07 Thread Younghee Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856468#comment-15856468
 ] 

Younghee Kwon commented on BEAM-588:


The PR is about to be merged. 

Several things remain for a follow-up PR: 
 1. Integrate the memory reporter into DirectRunner using PipelineOptions.
 2. Add an option to dump the full profile to disk (as opposed to logging only 
the 10 biggest entries, as it does now).
 3. (Optional) Experiment with other profilers for platforms where guppy is 
not available.
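
Items 2 and 3 can be sketched with the standard-library `tracemalloc` module (Python 3 only), used here as a stand-in for guppy. It is not the profiler from the PR, and `report_memory`, `top_n`, and `dump_path` are hypothetical names chosen for illustration. The sketch returns the biggest allocation sites for logging and can optionally write the full snapshot to disk.

```python
import os
import tempfile
import tracemalloc


def report_memory(top_n=10, dump_path=None):
    """Return the top_n largest allocation sites; optionally dump everything."""
    snapshot = tracemalloc.take_snapshot()
    if dump_path is not None:
        # Full profile on disk, reloadable with tracemalloc.Snapshot.load().
        snapshot.dump(dump_path)
    # statistics() sorts from the biggest to the smallest entry.
    return snapshot.statistics('lineno')[:top_n]


tracemalloc.start()
data = [bytes(1000) for _ in range(1000)]  # something worth measuring
dump_file = os.path.join(tempfile.mkdtemp(), 'heap.snapshot')
top = report_memory(top_n=10, dump_path=dump_file)
tracemalloc.stop()

for stat in top:
    print(stat)
```

Wiring such a reporter into DirectRunner through PipelineOptions (item 1) is not shown, since that depends on the runner's internals.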


> All runners should support ProfilingOptions
> ---
>
> Key: BEAM-588
> URL: https://issues.apache.org/jira/browse/BEAM-588
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Ahmet Altay
> Assignee: Ahmet Altay
> Priority: Minor
>
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/options.py#L366
> This is useful for profiling pipelines in different environments.





[jira] [Created] (BEAM-1410) Reduce sdk-py DirectRunner running time and memory consumption

2017-02-06 Thread Younghee Kwon (JIRA)
Younghee Kwon created BEAM-1410:
---

Summary: Reduce sdk-py DirectRunner running time and memory consumption
Key: BEAM-1410
URL: https://issues.apache.org/jira/browse/BEAM-1410
Project: Beam
Issue Type: Improvement
Components: sdk-py
Reporter: Younghee Kwon
Assignee: Ahmet Altay
Priority: Minor


Some experimental benchmarks show that DirectRunner's running time and memory 
consumption can be reduced. 

I will roll out some CLs to do so.





[jira] [Closed] (BEAM-1246) Update README.md to remove incubating notion

2017-01-09 Thread Younghee Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Younghee Kwon closed BEAM-1246.
---
   Resolution: Fixed
Fix Version/s: Not applicable

PR merged.

> Update README.md to remove incubating notion
> 
>
> Key: BEAM-1246
> URL: https://issues.apache.org/jira/browse/BEAM-1246
> Project: Beam
> Issue Type: Task
> Components: sdk-py
> Reporter: Younghee Kwon
> Assignee: Ahmet Altay
> Priority: Trivial
> Labels: documentation
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-1233) Implement TFRecordIO (Reading/writing Tensorflow Standard format)

2017-01-09 Thread Younghee Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Younghee Kwon closed BEAM-1233.
---

> Implement TFRecordIO (Reading/writing Tensorflow Standard format)
> -
>
> Key: BEAM-1233
> URL: https://issues.apache.org/jira/browse/BEAM-1233
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py
> Reporter: Younghee Kwon
> Assignee: Ahmet Altay
> Fix For: Not applicable
>
>
> TensorFlow is an open source machine learning project that is getting a lot of 
> attention these days. Apache Beam can serve as a good preprocessing tool for 
> it; however, TensorFlow supports only a limited number of input file formats: 
> CSV and its own record format (the so-called TFRecord).
> Apache Beam, on the other hand, does not support reading or writing the 
> TFRecord format. Beam would be more useful for this purpose once it supports 
> TFRecordIO natively.





[jira] [Closed] (BEAM-1245) Use @unittest.skip instead of try/except in avroio_test

2017-01-09 Thread Younghee Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Younghee Kwon closed BEAM-1245.
---
   Resolution: Fixed
Fix Version/s: Not applicable

PR 1736 merged. 

> Use @unittest.skip instead of try/except in avroio_test
> ---
>
> Key: BEAM-1245
> URL: https://issues.apache.org/jira/browse/BEAM-1245
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Younghee Kwon
> Assignee: Ahmet Altay
> Priority: Minor
> Fix For: Not applicable
>
>
> As said in the summary.





[jira] [Resolved] (BEAM-1233) Implement TFRecordIO (Reading/writing Tensorflow Standard format)

2017-01-09 Thread Younghee Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Younghee Kwon resolved BEAM-1233.
-
   Resolution: Fixed
Fix Version/s: Not applicable

The PR that adds TFRecordIO has been pushed to the python-sdk branch.

> Implement TFRecordIO (Reading/writing Tensorflow Standard format)
> -
>
> Key: BEAM-1233
> URL: https://issues.apache.org/jira/browse/BEAM-1233
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py
> Reporter: Younghee Kwon
> Assignee: Ahmet Altay
> Fix For: Not applicable
>
>
> TensorFlow is an open source machine learning project that is getting a lot of 
> attention these days. Apache Beam can serve as a good preprocessing tool for 
> it; however, TensorFlow supports only a limited number of input file formats: 
> CSV and its own record format (the so-called TFRecord).
> Apache Beam, on the other hand, does not support reading or writing the 
> TFRecord format. Beam would be more useful for this purpose once it supports 
> TFRecordIO natively.
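
For context, the record framing that a TFRecord reader or writer has to handle can be sketched in pure Python. This is not the code from the PR. It assumes the layout TensorFlow documents: a little-endian uint64 length, a masked CRC-32C of the length, the payload, and a masked CRC-32C of the payload. `write_record` and `read_records` are hypothetical helper names.

```python
import io
import struct


def _build_crc32c_table():
    """Precompute the byte-wise table for CRC-32C (Castagnoli, reflected)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _build_crc32c_table()


def crc32c(data):
    crc = 0xFFFFFFFF
    for byte in bytearray(data):
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data):
    """Rotate the CRC right by 15 bits and add a constant, modulo 2**32."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_record(stream, payload):
    header = struct.pack('<Q', len(payload))
    stream.write(header)
    stream.write(struct.pack('<I', masked_crc32c(header)))
    stream.write(payload)
    stream.write(struct.pack('<I', masked_crc32c(payload)))


def read_records(stream):
    while True:
        header = stream.read(12)
        if not header:
            return  # clean end of stream
        length_bytes = header[:8]
        (length_crc,) = struct.unpack('<I', header[8:])
        if masked_crc32c(length_bytes) != length_crc:
            raise ValueError('corrupt record length')
        (length,) = struct.unpack('<Q', length_bytes)
        payload = stream.read(length)
        (payload_crc,) = struct.unpack('<I', stream.read(4))
        if masked_crc32c(payload) != payload_crc:
            raise ValueError('corrupt record payload')
        yield payload


buffer = io.BytesIO()
for item in (b'first', b'second record'):
    write_record(buffer, item)
buffer.seek(0)
records = list(read_records(buffer))
```

Each record costs 16 bytes of framing on top of its payload. A production reader would also need to handle truncated files, which this sketch does not.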





[jira] [Created] (BEAM-1246) Update README.md to remove incubating notion

2017-01-04 Thread Younghee Kwon (JIRA)
Younghee Kwon created BEAM-1246:
---

Summary: Update README.md to remove incubating notion
Key: BEAM-1246
URL: https://issues.apache.org/jira/browse/BEAM-1246
Project: Beam
Issue Type: Task
Components: sdk-py
Reporter: Younghee Kwon
Assignee: Ahmet Altay
Priority: Trivial








[jira] [Created] (BEAM-1245) Use @unittest.skip instead of try/except in avroio_test

2017-01-04 Thread Younghee Kwon (JIRA)
Younghee Kwon created BEAM-1245:
---

Summary: Use @unittest.skip instead of try/except in avroio_test
Key: BEAM-1245
URL: https://issues.apache.org/jira/browse/BEAM-1245
Project: Beam
Issue Type: Improvement
Components: sdk-py
Reporter: Younghee Kwon
Assignee: Ahmet Altay
Priority: Minor


As said in the summary.
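
The general pattern named in the summary can be sketched as below. The actual contents of avroio_test are not shown in this thread, so the test class and method are hypothetical. The optional import is attempted once, and `unittest.skipIf` reports the tests as skipped rather than wrapping test bodies in try/except.

```python
import unittest

try:
    import avro  # optional dependency; may be absent
except ImportError:
    avro = None


@unittest.skipIf(avro is None, 'avro is not installed')
class OptionalAvroTest(unittest.TestCase):  # hypothetical test case

    def test_module_available(self):
        # Runs only when the import above succeeded.
        self.assertIsNotNone(avro)
```

When the dependency is missing, the runner lists the test as skipped with the given reason, so a missing package is visible in the report instead of being silently swallowed.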


