[jira] [Commented] (BEAM-1233) Implement TFRecordIO (Reading/writing Tensorflow Standard format)

2017-01-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812873#comment-15812873
 ] 

ASF GitHub Bot commented on BEAM-1233:
--

Github user yk5 closed the pull request at:

https://github.com/apache/beam/pull/1749


> Implement TFRecordIO (Reading/writing Tensorflow Standard format)
> -
>
> Key: BEAM-1233
> URL: https://issues.apache.org/jira/browse/BEAM-1233
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Younghee Kwon
>Assignee: Ahmet Altay
>
> TensorFlow is an open source Machine Learning project that is getting a lot 
> of attention these days. Apache Beam would be a good preprocessing tool for 
> it; however, TensorFlow supports only a limited number of input file 
> formats: csv and its own record format (TFRecord). Apache Beam, on the 
> other hand, cannot currently read or write the TFRecord format, so native 
> TFRecordIO support in Beam would be useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1233) Implement TFRecordIO (Reading/writing Tensorflow Standard format)

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806525#comment-15806525
 ] 

ASF GitHub Bot commented on BEAM-1233:
--

GitHub user yk5 opened a pull request:

https://github.com/apache/beam/pull/1749

[BEAM-1233] Create TFRecordIO, providing source/sink for TFRecords, which is
the dedicated record format for Tensorflow.

For more about TFRecords, refer to 
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/python_io.md
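
For readers unfamiliar with the format, here is a minimal sketch of how a single TFRecord is framed on disk. It is not the code from this pull request. It assumes the commonly documented layout (a little-endian uint64 length, a masked CRC32C of the length bytes, the payload, and a masked CRC32C of the payload); the helper names `crc32c`, `masked_crc32c`, `write_record`, and `read_records` are hypothetical and chosen only for illustration.

```python
import io
import struct


def _make_crc32c_table():
    # Lookup table for CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data):
    # Plain table-driven CRC32C over a byte string.
    crc = 0xFFFFFFFF
    for byte in bytearray(data):
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data):
    # Assumed masking step: rotate right by 15 bits, then add a constant.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(stream, payload):
    # Frame one record: length, CRC of length, payload, CRC of payload.
    header = struct.pack('<Q', len(payload))
    stream.write(header)
    stream.write(struct.pack('<I', masked_crc32c(header)))
    stream.write(payload)
    stream.write(struct.pack('<I', masked_crc32c(payload)))


def _read_exact(stream, size):
    data = stream.read(size)
    if len(data) != size:
        raise ValueError('truncated record')
    return data


def read_records(stream):
    # Yield payloads until end of stream, verifying both checksums.
    while True:
        header = stream.read(8)
        if not header:
            return
        if len(header) != 8:
            raise ValueError('truncated length header')
        (length,) = struct.unpack('<Q', header)
        (length_crc,) = struct.unpack('<I', _read_exact(stream, 4))
        if length_crc != masked_crc32c(header):
            raise ValueError('corrupt length header')
        payload = _read_exact(stream, length)
        (payload_crc,) = struct.unpack('<I', _read_exact(stream, 4))
        if payload_crc != masked_crc32c(payload):
            raise ValueError('corrupt payload')
        yield payload
```

Writing a few byte strings to an `io.BytesIO` buffer with `write_record` and passing the rewound buffer to `read_records` should return the same byte strings in order. A Beam source/sink would additionally need to handle file patterns, compression, and sharding, none of which this sketch covers.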



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yk5/beam tfrecord

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/1749.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1749


commit 3bbd2c1c208860c48c7a4c1909e3936a1fab4faa
Author: Younghee Kwon 
Date:   2017-01-07T02:05:56Z

Create TFRecordIO, which provides source/sink for TFRecords, the dedicated 
record format for Tensorflow.

For more about TFRecords, refer to 
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/python_io.md



