[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576760#comment-14576760
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-109912571
  
Thank you for your contribution.


 Add GZip support
 

 Key: FLINK-1981
 URL: https://issues.apache.org/jira/browse/FLINK-1981
 Project: Flink
  Issue Type: New Feature
  Components: Core
Reporter: Sebastian Kruse
Assignee: Sebastian Kruse
Priority: Minor

 GZip, as a commonly used compression format, should be supported in the same
 way as the already supported deflate files. This allows GZip files to be used
 with any subclass of FileInputFormat.
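
GZip and the already supported deflate format are both covered by `java.util.zip`, so supporting them "in the same way" mostly comes down to picking the matching decompressing stream. Below is a minimal JDK-only round-trip sketch. It assumes deflate files are zlib-wrapped, which is the default of `DeflaterOutputStream`/`InflaterInputStream`. The class and method names are hypothetical and not part of Flink:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class CompressionRoundTrip {

    // Compresses text in the zlib-wrapped deflate format (assumed format of ".deflate" files).
    static byte[] deflate(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Compresses text in the GZip format.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Drains a (decompressing) stream and decodes the bytes as UTF-8.
    static String readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String csv = "1,alice\n2,bob\n";
        // Same pattern for both formats: wrap the raw bytes in the matching decompressing stream.
        String viaDeflate = readAll(new InflaterInputStream(new ByteArrayInputStream(deflate(csv))));
        String viaGzip = readAll(new GZIPInputStream(new ByteArrayInputStream(gzip(csv))));
        System.out.println(viaDeflate.equals(csv) && viaGzip.equals(csv)); // prints true
    }
}
```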



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576759#comment-14576759
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/762


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576746#comment-14576746
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-109907044
  
Thanks for the documentation. Could you open a JIRA to account for the 
necessary changes in terms of extensibility?


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577248#comment-14577248
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user sekruse commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-110013765
  
Okay, will do that.


 Add GZip support
 

 Key: FLINK-1981
 URL: https://issues.apache.org/jira/browse/FLINK-1981
 Project: Flink
  Issue Type: New Feature
  Components: Core
Reporter: Sebastian Kruse
Assignee: Sebastian Kruse
Priority: Minor
 Fix For: 0.9


 GZip, as a commonly used compression format, should be supported in the same 
 way as the already supported deflate files. This allows to use GZip files 
 with any subclass of FileInputFormat.



[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572471#comment-14572471
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108812916
  
:+1: This has been requested multiple times now. I would merge your pull 
request. Can you add some documentation?


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572545#comment-14572545
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user sekruse commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108844255
  
Sure, I can do that. Do you mean user documentation or more Javadocs? And if 
the former, where should I preferably put that documentation?


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572550#comment-14572550
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108845395
  
I'm talking about the user documentation. You could mention support for 
gzip and add an example here: 
http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sources


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572553#comment-14572553
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108845535
  
You can modify the documentation in the `docs/apis/programming_guide.md` 
file.


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570882#comment-14570882
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user sekruse commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108443527
  
I replaced the Validate calls with Preconditions.


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569601#comment-14569601
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/762#discussion_r31560285
  
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
 * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
 */
protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
-   // Wrap stream in a extracting (decompressing) stream if file ends with .deflate.
-   if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
-   return new InflaterInputStreamFSInputWrapper(stream);
+   // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
+   InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
+   if (inflaterInputStreamFactory != null) {
+   return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
--- End diff --

So if no inflater input stream is available, it will just fall back to the 
compressed data stream?
Wouldn't it be better to at least log something or fail?


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569625#comment-14569625
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user sekruse commented on a diff in the pull request:

https://github.com/apache/flink/pull/762#discussion_r31562256
  
It might also be the case that the stream was not compressed at all. It 
would of course be nice to react appropriately to a missing codec, but how 
would we know if the current input split belongs to an uncompressed file or a 
compressed file with an unknown codec?
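
The fallback discussed here can be sketched without Flink classes. The decompressor is chosen from the file name alone, so a name without a recognized extension yields the raw stream, whether the bytes are uncompressed or use an unsupported codec. The extensions (`.deflate`, `.gz`, `.gzip`), the case-insensitive match, and the `DecorateSketch.decorate` helper are assumptions for illustration, not the code in this pull request:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

public class DecorateSketch {

    // Hypothetical stand-in for decorateInputStream: picks a decompressor by file extension
    // only and otherwise returns the raw stream unchanged.
    static InputStream decorate(InputStream raw, String fileName) throws IOException {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".deflate")) {
            return new InflaterInputStream(raw);
        }
        if (name.endsWith(".gz") || name.endsWith(".gzip")) {
            return new GZIPInputStream(raw); // parses the GZip header right away
        }
        // No recognized extension: the bytes may be plain text or use an unknown codec.
        // The name alone cannot tell these cases apart, so the stream is passed through.
        return raw;
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream("1,alice\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(decorate(raw, "data.csv") == raw);     // prints true
        System.out.println(decorate(raw, "data.csv.bz2") == raw); // prints true
    }
}
```

Because `GZIPInputStream` parses the GZip header in its constructor, a mislabeled `.gz` file fails early with a `ZipException`. A compressed file without a recognized extension, by contrast, is passed through as-is.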


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569638#comment-14569638
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/762#discussion_r31562955
  
Ah, okay, I see. I didn't read the code closely enough.



[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569589#comment-14569589
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/762#discussion_r31559688
  
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -21,10 +21,16 @@
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
+import java.util.Map;
 import java.util.Set;
 
+import org.apache.commons.lang3.Validate;
--- End diff --

I'm really sorry that you ran into this, but the community recently decided 
to use Guava's Preconditions checks (e.g. checkNotNull(), checkArgument()) 
instead of Commons Lang's Validate.
Can you replace that?



[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569425#comment-14569425
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

GitHub user sekruse opened a pull request:

https://github.com/apache/flink/pull/762

[FLINK-1981] add support for GZIP files

* register decompression algorithms with file extensions for extensibility
* fit deflate decompression into this scheme
* add support for GZIP files
* test support for deflate and GZIP files with the CsvInputFormat
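
The "register decompression algorithms with file extensions" idea can be sketched as a small map from extension to factory, with deflate and GZip registered the same way. This is a hypothetical JDK-only stand-in. The names `DecompressorRegistry`, `DecompressorFactory`, `register` and `lookup` are illustrative and are not the `InflaterInputStreamFactory` API from the patch:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public final class DecompressorRegistry {

    // Hypothetical factory type; may throw because GZIPInputStream reads its header eagerly.
    @FunctionalInterface
    interface DecompressorFactory {
        InputStream create(InputStream compressed) throws IOException;
    }

    private final Map<String, DecompressorFactory> byExtension = new HashMap<>();

    DecompressorRegistry() {
        // Deflate fits the same scheme as GZip: one factory per file extension.
        register("deflate", InflaterInputStream::new);
        register("gz", GZIPInputStream::new);
        register("gzip", GZIPInputStream::new);
    }

    // Supporting another codec only requires another map entry.
    void register(String extension, DecompressorFactory factory) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), factory);
    }

    // Returns the factory for the file's last extension, or null if none is registered.
    DecompressorFactory lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return byExtension.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws IOException {
        DecompressorRegistry registry = new DecompressorRegistry();

        // Produce a small GZip-compressed CSV in memory.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write("1,alice\n".getBytes(StandardCharsets.UTF_8));
        }

        DecompressorFactory factory = registry.lookup("input.csv.gz");
        try (InputStream in = factory.create(new ByteArrayInputStream(bytes.toByteArray()))) {
            System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // prints 1,alice
        }
        System.out.println(registry.lookup("input.csv") == null); // prints true
    }
}
```

A `null` from `lookup` corresponds to the pass-through case discussed in the review.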

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sekruse/flink FLINK-1981

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/762.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #762


commit 6acae7faa4e27837ce3c9272d4310ec6c46895ab
Author: Sebastian Kruse sebastian.kr...@hpi.de
Date:   2015-06-02T16:58:35Z

[FLINK-1981] add support for GZIP files

* register decompression algorithms with file extensions for extensibility
* fit deflate decompression into this scheme
* add support for GZIP files
* test support for deflate and GZIP files with the CsvInputFormat



