[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579550#comment-16579550
 ] 

ASF GitHub Bot commented on FLINK-10109:


aljoscha commented on issue #6532: [FLINK-10109] Add documentation for 
StreamingFileSink
URL: https://github.com/apache/flink/pull/6532#issuecomment-412816175
 
 
   @zentol Ah, I saw that typo fix too late. Will fix now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add documentation for StreamingFileSink
> ---
>
> Key: FLINK-10109
> URL: https://issues.apache.org/jira/browse/FLINK-10109
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming Connectors
>Affects Versions: 1.6.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579544#comment-16579544
 ] 

ASF GitHub Bot commented on FLINK-10109:


aljoscha closed pull request #6532: [FLINK-10109] Add documentation for 
StreamingFileSink
URL: https://github.com/apache/flink/pull/6532
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/dev/connectors/streamfile_sink.md b/docs/dev/connectors/streamfile_sink.md
new file mode 100644
index 000..3db5577f67c
--- /dev/null
+++ b/docs/dev/connectors/streamfile_sink.md
@@ -0,0 +1,123 @@
+---
+title: "Streaming File Sink"
+nav-title: Streaming File Sink
+nav-parent_id: connectors
+nav-pos: 5
+---
+
+
+This connector provides a Sink that writes partitioned files to filesystems
+supported by the Flink `FileSystem` abstraction. Since in streaming the input
+is potentially infinite, the streaming file sink writes data into buckets. The
+bucketing behaviour is configurable but a useful default is time-based
+bucketing where we start writing a new bucket every hour and thus get
+individual files that each contain a part of the infinite output stream.
+
+Within a bucket, we further split the output into smaller part files based on a
+rolling policy. This is useful to prevent individual bucket files from getting
+too big. This is also configurable but the default policy rolls files based on
+file size and a timeout, i.e if no new data was written to a part file. 
+
+The `StreamingFileSink` supports both row-wise encoding formats and
+bulk-encoding formats, such as [Apache Parquet](http://parquet.apache.org).
+
+ Using Row-encoded Output Formats
+
+The only required configuration are the base path where we want to output our
+data and an
+[Encoder]({{ site.baseurl }}/api/java/org/apache/flink/api/common/serialization/Encoder.html)
+that is used for serializing records to the `OutputStream` for each file.
+
+Basic usage thus looks like this:
+
+
+
+
+{% highlight java %}
+import org.apache.flink.api.common.serialization.Encoder;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
+
+DataStream<String> input = ...;
+
+final StreamingFileSink<String> sink = StreamingFileSink
+   .forRowFormat(new Path(outputPath), (Encoder<String>) (element, stream) -> {
+   PrintStream out = new PrintStream(stream);
+   out.println(element.f1);
+   })
+   .build();
+
+input.addSink(sink);
+
+{% endhighlight %}
+
+
+{% highlight scala %}
+import org.apache.flink.api.common.serialization.Encoder
+import org.apache.flink.core.fs.Path
+import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
+
+val input: DataStream[String] = ...
+
+final StreamingFileSink[String] sink = StreamingFileSink
+   .forRowFormat(new Path(outputPath), (element, stream) => {
+   val out = new PrintStream(stream)
+   out.println(element.f1)
+   })
+   .build()
+
+input.addSink(sink)
+
+{% endhighlight %}
+
+
+
+This will create a streaming sink that creates hourly buckets and uses a
+default rolling policy. The default bucket assigner is
+[DateTimeBucketAssigner]({{ site.baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.html)
+and the default rolling policy is
+[DefaultRollingPolicy]({{ site.baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html).
+You can specify a custom
+[BucketAssigner]({{ site.baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/BucketAssigner.html)
+and
+[RollingPolicy]({{ site.baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html)
+on the sink builder. Please check out the JavaDoc for
+[StreamingFileSink]({{ site.baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/StreamingFileSink.html)
+for more configuration options and more documentation about the workings and
+interactions of bucket assigners and rolling policies.
+
+ Using Bulk-encoded Output Formats
+
+In the above example we used an `Encoder` that can encode or serialize each
+record individually. The streaming file sink also supports bulk-encoded output
+formats such as [Apache Parquet](http://parquet.apache.org). To use these,
+instead of `StreamingFileSink.forRowFormat()` you would use
+`StreamingFileSink.forBulkFormat()` and specify a `BulkWriter.Factory`.
+
+[ParquetAvroWriters]({{ site.baseurl }}/api/java/org/apache/flink/formats/parquet/avro/ParquetAvroWriters.html)
+has static methods for creating a 

[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579546#comment-16579546
 ] 

ASF GitHub Bot commented on FLINK-10109:


aljoscha commented on issue #6532: [FLINK-10109] Add documentation for 
StreamingFileSink
URL: https://github.com/apache/flink/pull/6532#issuecomment-412815785
 
 
   Merged, thanks for the reviews!




[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578254#comment-16578254
 ] 

ASF GitHub Bot commented on FLINK-10109:


zentol commented on a change in pull request #6532: [FLINK-10109] Add 
documentation for StreamingFileSink
URL: https://github.com/apache/flink/pull/6532#discussion_r209600326
 
 

 ##
 File path: docs/dev/connectors/streamfile_sink.md
 ##
 @@ -0,0 +1,123 @@
+---
+title: "Streaming File Sink"
+nav-title: Streaming File Sink
+nav-parent_id: connectors
+nav-pos: 5
+---
+
+
+This connector provides a Sink that writes partitioned files to filesystems
+supported by the Flink `FileSystem` abstraction. Since in streaming the input
+is potentially infinite, the streaming file sink writes data into buckets. The
+bucketing behaviour is configurable but a useful default is time-based
+bucketing where we start writing a new bucket every hour and thus get
+individual files that each contain a part of the infinite output stream.
+
+Within a bucket, we further split the output into smaller part files based on a
+rolling policy. This is useful to prevent individual bucket files from getting
+too big. This is also configurable but the default policy rolls files based on
+file size and a timeout, i.e if no new data was written to a part file. 
+
+The `StreamingFileSink` supports both row-wise encoding formats and
+bulk-encoding formats, such as [Apache Parquet](http://parquet.apache.org).
+
+ Using Row-encoded Output Formats
+
+The only required configuration are the base path were we want to output our
 
 Review comment:
   typo: were -> where
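For readers following the quoted context: the time-based bucketing it describes can be illustrated with a minimal plain-Java sketch. This is not Flink code. The class `HourlyBucketSketch`, the method `bucketId`, the `yyyy-MM-dd--HH` pattern, and the fixed UTC zone are all assumptions made for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper, not part of Flink: maps a timestamp to an hourly bucket name. */
final class HourlyBucketSketch {
    // One bucket per hour. Both the pattern and the UTC zone are assumptions of this sketch.
    private static final DateTimeFormatter HOURLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd--HH").withZone(ZoneOffset.UTC);

    /** Returns the bucket name for a record timestamp given in epoch milliseconds. */
    static String bucketId(long epochMillis) {
        return HOURLY.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // Records whose timestamps fall within the same hour map to the same bucket.
        long now = System.currentTimeMillis();
        System.out.println(bucketId(now));
    }
}
```

All records from the same hour share a bucket name, so an unbounded stream ends up as a sequence of bounded, per-hour directories.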




[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576352#comment-16576352
 ] 

ASF GitHub Bot commented on FLINK-10109:


aljoscha commented on issue #6532: [FLINK-10109] Add documentation for 
StreamingFileSink
URL: https://github.com/apache/flink/pull/6532#issuecomment-412101222
 
 
   I think I addressed all comments, PTAL.




[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575909#comment-16575909
 ] 

ASF GitHub Bot commented on FLINK-10109:


kl0u commented on a change in pull request #6532: [FLINK-10109] Add 
documentation for StreamingFileSink
URL: https://github.com/apache/flink/pull/6532#discussion_r209176266
 
 

 ##
 File path: docs/dev/connectors/streamfile_sink.md
 ##
 @@ -0,0 +1,120 @@
+---
+title: "Streaming File Sink"
+nav-title: Streaming File Sink
+nav-parent_id: connectors
+nav-pos: 5
+---
+
+
+This connector provides a Sink that writes partitioned files to filesystems
+supported by the Flink `FileSystem` abstraction. Since in streaming the input
+is potentially infinite, the streaming file sink writes data into buckets. The
+bucketing behaviour is configurable but a useful default is time-based
+bucketing where we start writing a new bucket every hour and thus get
+individual files that each contain a part of the infinite output stream.
+
+Within a bucket, we further split the output into smaller part files based on a
+rolling policy. This is useful to prevent individual bucket files from getting
+too big. This is also configurable but the default policy rolls files based on
+file size and a timeout, i.e if no new data was written to a part file. 
+
+ Usage
+
+The only required configuration are the base path were we want to output our
+data and an
+[Encoder](http://flink.apache.org/docs/latest/api/java/org/apache/flink/api/common/serialization/Encoder.html)
+that is used for serializing records to the `OutputStream` for each file.
+
+Basic usage thus looks like this:
+
+
+
+
+{% highlight java %}
+import org.apache.flink.api.common.serialization.Encoder;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
+
+DataStream<String> input = ...;
+
+final StreamingFileSink<String> sink = StreamingFileSink
+   .forRowFormat(new Path(outputPath), (Encoder<String>) (element, stream) -> {
+   PrintStream out = new PrintStream(stream);
+   out.println(element.f1);
+   })
+   .build();
+
+input.addSink(sink);
+
+{% endhighlight %}
+
+
+{% highlight scala %}
+import org.apache.flink.api.common.serialization.Encoder
+import org.apache.flink.core.fs.Path
+import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
+
+val input: DataStream[String] = ...
+
+final StreamingFileSink[String] sink = StreamingFileSink
+   .forRowFormat(new Path(outputPath), (element, stream) => {
+   val out = new PrintStream(stream)
+   out.println(element.f1)
+   })
+   .build()
+
+input.addSink(sink)
+
+{% endhighlight %}
+
+
+
+This will create a streaming sink that creates hourly buckets and uses a
+default rolling policy. The default bucket assigner is
+[DateTimeBucketAssigner](http://flink.apache.org/docs/latest/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.html)
+and the default rolling policy is
+[DefaultRollingPolicy](http://flink.apache.org/docs/latest/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html).
+You can specify a custom
+[BucketAssigner](http://flink.apache.org/docs/latest/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/BucketAssigner.html)
+and
+[RollingPolicy](http://flink.apache.org/docs/latest/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html)
+on the sink builder. Please check out the JavaDoc for
+[StreamingFileSink](http://flink.apache.org/docs/latest/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/StreamingFileSink.html)
+for more configuration options and more documentation about the workings and
+interactions of bucket assigners and rolling policies.
+
+ Using Bulk-encoded Output Formats
+
+In the above example we used an `Encoder` that can encode or serialize each
+record individually. The streaming file sink also supports bulk-encoded output
+formats such as [Apache Parquet](http://parquet.apache.org). To use these,
+instead of `StreamingFileSink.forRowFormat()` you would use
+`StreamingFileSink.forBulkFormat()` and specify a `BulkWriter.Factory`.
+
+[ParquetAvroWriters](http://flink.apache.org/docs/latest/api/java/org/apache/flink/formats/parquet/avro/ParquetAvroWriters.html)
+has static methods for creating a `BulkWriter.Factory` for various types.
+
+
+  Note: With Bulk Writers, only the
+  OnCheckpointRollingPolicy, which rolls the part file on every
+  checkpoint, is supported.
+
+
 
 Review comment:
   And this:
   
   ```
   IMPORTANT: Bulk-encoding formats can only be combined with the
   `OnCheckpointRollingPolicy`, which rolls the in-progress part file on
   every checkpoint.
   ```
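The difference between the default size/timeout rolling described in the quoted text and rolling on every checkpoint can be sketched in plain Java. This is a simplified model, not Flink's `RollingPolicy` interface. The names `RollingSketch`, `PartFile`, `Policy`, `sizeOrInactivity`, and `onCheckpoint` are hypothetical, as are the two decision hooks.

```java
/** Hypothetical model of rolling decisions; it does not use or mirror the Flink API. */
final class RollingSketch {

    /** The state of the in-progress part file that a policy may inspect. */
    static final class PartFile {
        final long sizeBytes;
        final long lastWriteMillis;

        PartFile(long sizeBytes, long lastWriteMillis) {
            this.sizeBytes = sizeBytes;
            this.lastWriteMillis = lastWriteMillis;
        }
    }

    /** A policy is asked at two points: when a record arrives and when a checkpoint happens. */
    interface Policy {
        boolean rollOnEvent(PartFile file, long nowMillis);

        boolean rollOnCheckpoint(PartFile file);
    }

    /** Rolls when the file is large enough or has not been written to for a while. */
    static Policy sizeOrInactivity(long maxBytes, long inactivityMillis) {
        return new Policy() {
            @Override
            public boolean rollOnEvent(PartFile file, long nowMillis) {
                return file.sizeBytes >= maxBytes
                        || nowMillis - file.lastWriteMillis >= inactivityMillis;
            }

            @Override
            public boolean rollOnCheckpoint(PartFile file) {
                return false;
            }
        };
    }

    /** Rolls on every checkpoint and never in between. */
    static Policy onCheckpoint() {
        return new Policy() {
            @Override
            public boolean rollOnEvent(PartFile file, long nowMillis) {
                return false;
            }

            @Override
            public boolean rollOnCheckpoint(PartFile file) {
                return true;
            }
        };
    }
}
```

In this model a bulk-encoded sink would always be paired with `onCheckpoint()`, which matches the restriction stated in the review comment.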


[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575908#comment-16575908
 ] 

ASF GitHub Bot commented on FLINK-10109:


kl0u commented on a change in pull request #6532: [FLINK-10109] Add 
documentation for StreamingFileSink
URL: https://github.com/apache/flink/pull/6532#discussion_r209175913
 
 

 ##
 File path: docs/dev/connectors/streamfile_sink.md
 ##
 @@ -0,0 +1,120 @@
+---
+title: "Streaming File Sink"
+nav-title: Streaming File Sink
+nav-parent_id: connectors
+nav-pos: 5
+---
+
+
+This connector provides a Sink that writes partitioned files to filesystems
+supported by the Flink `FileSystem` abstraction. Since in streaming the input
+is potentially infinite, the streaming file sink writes data into buckets. The
+bucketing behaviour is configurable but a useful default is time-based
+bucketing where we start writing a new bucket every hour and thus get
+individual files that each contain a part of the infinite output stream.
+
+Within a bucket, we further split the output into smaller part files based on a
+rolling policy. This is useful to prevent individual bucket files from getting
+too big. This is also configurable but the default policy rolls files based on
+file size and a timeout, i.e if no new data was written to a part file. 
+
+ Usage
+
 
 Review comment:
   I would add this:
   
   ```
   
   The new `StreamingFileSink` comes with support for both Row-Wise encoding
   formats and Bulk encoding ones, such as [Apache Parquet](http://parquet.apache.org).
   
    Using Row-encoded Output Formats
   
   ```
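To make the row-wise versus bulk distinction in this suggestion concrete, here is a minimal plain-Java sketch. It does not use Flink's `Encoder` or `BulkWriter` types. The names `EncodingSketch`, `RowEncoder`, `BulkWriterSketch`, `rowWise`, and `bulk`, as well as the toy output layouts, are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contrast of row-wise and bulk encoding; not the Flink API. */
final class EncodingSketch {

    /** Row-wise: every record is serialized to the stream on its own. */
    interface RowEncoder<T> {
        void encode(T element, OutputStream stream) throws IOException;
    }

    /** Bulk: records are collected first and written as one batch when the file is finished. */
    static final class BulkWriterSketch {
        private final List<String> buffer = new ArrayList<>();
        private final OutputStream stream;

        BulkWriterSketch(OutputStream stream) {
            this.stream = stream;
        }

        void addElement(String element) {
            buffer.add(element);
        }

        void finish() throws IOException {
            // Toy batch layout (an assumption): "<count>:<comma-joined records>\n".
            String batch = buffer.size() + ":" + String.join(",", buffer) + "\n";
            stream.write(batch.getBytes(StandardCharsets.UTF_8));
            buffer.clear();
        }
    }

    /** Encodes each record as its own line, one write per record. */
    static String rowWise(List<String> records) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        RowEncoder<String> encoder = (element, stream) ->
                stream.write((element + "\n").getBytes(StandardCharsets.UTF_8));
        try {
            for (String record : records) {
                encoder.encode(record, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Buffers all records and writes them in a single batch. */
    static String bulk(List<String> records) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BulkWriterSketch writer = new BulkWriterSketch(out);
        try {
            for (String record : records) {
                writer.addElement(record);
            }
            writer.finish();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With row-wise encoding the output is usable after any record. With the bulk writer nothing meaningful exists until `finish()` runs, which is one way to see why the two kinds of formats are configured through separate builder methods.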




[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575892#comment-16575892
 ] 

ASF GitHub Bot commented on FLINK-10109:


aljoscha commented on issue #6532: [FLINK-10109] Add documentation for 
StreamingFileSink
URL: https://github.com/apache/flink/pull/6532#issuecomment-412003356
 
 
   I addressed the comments that I currently can address. PTAL  




[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575871#comment-16575871
 ] 

ASF GitHub Bot commented on FLINK-10109:


aljoscha commented on a change in pull request #6532: [FLINK-10109] Add 
documentation for StreamingFileSink
URL: https://github.com/apache/flink/pull/6532#discussion_r209167309
 
 

 ##
 File path: docs/dev/connectors/streamfile_sink.md
 ##
 @@ -0,0 +1,95 @@
+---
+title: "Streaming File Sink"
+nav-title: Streaming File Sink
+nav-parent_id: connectors
+nav-pos: 5
+---
+
+
+This connector provides a Sink that writes partitioned files to filesystems
+supported by the Flink `FileSystem` abstraction. Since in streaming the input
+is potentially infinite, the streaming file sink writes data into buckets. The
+bucketing behaviour is configurable but a useful default is time-based
+bucketing where we start writing a new bucket every hour and thus get
+individual files that each contain a part of the infinite output stream.
+
+Within a bucket, we further split the output into smaller part files based on a
+rolling policy. This is useful to prevent individual bucket files from getting
+too big. This is also configurable but the default policy rolls files based on
+file size and a timeout, i.e if no new data was written to a part file. 
+
+ Usage
+
+The only required configuration are the base path were we want to output our
+data and an
+[Encoder](http://flink.apache.org/docs/latest/api/java/org/apache/flink/api/common/serialization/Encoder.html)
 
 Review comment:
    I first have to figure out how I can get links to specific versions 
because we currently only have `http://flink.apache.org/docs/latest` and 
`http://flink.apache.org/docs/stable` and I don't want to link to the 
`https://ci.apache.org/projects/flink`-type urls.




[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575872#comment-16575872
 ] 

ASF GitHub Bot commented on FLINK-10109:


aljoscha commented on a change in pull request #6532: [FLINK-10109] Add 
documentation for StreamingFileSink
URL: https://github.com/apache/flink/pull/6532#discussion_r209167365
 
 

 ##
 File path: docs/dev/connectors/streamfile_sink.md
 ##
 @@ -0,0 +1,95 @@
+---
+title: "Streaming File Sink"
+nav-title: Streaming File Sink
+nav-parent_id: connectors
+nav-pos: 5
+---
+
+
+This connector provides a Sink that writes partitioned files to filesystems
+supported by the Flink `FileSystem` abstraction. Since in streaming the input
+is potentially infinite, the streaming file sink writes data into buckets. The
+bucketing behaviour is configurable but a useful default is time-based
+bucketing where we start writing a new bucket every hour and thus get
+individual files that each contain a part of the infinite output stream.
+
+Within a bucket, we further split the output into smaller part files based on a
+rolling policy. This is useful to prevent individual bucket files from getting
+too big. This is also configurable but the default policy rolls files based on
+file size and a timeout, i.e if no new data was written to a part file. 
+
+ Usage
+
+The only required configuration are the base path were we want to output our
+data and an
+[Encoder](http://flink.apache.org/docs/latest/api/java/org/apache/flink/api/common/serialization/Encoder.html)
+that is used for serializing records to the `OutputStream` for each file.
+
+Basic usage thus looks like this:
+
+
+
+
+{% highlight java %}
+DataStream input = ...;
+
+final StreamingFileSink sink = StreamingFileSink
 
 Review comment:
    




[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575052#comment-16575052
 ] 

ASF GitHub Bot commented on FLINK-10109:


twalthr commented on a change in pull request #6532: [FLINK-10109] Add 
documentation for StreamingFileSink
URL: https://github.com/apache/flink/pull/6532#discussion_r208985527
 
 

 ##
 File path: docs/dev/connectors/streamfile_sink.md
 ##
 @@ -0,0 +1,95 @@
+---
+title: "Streaming File Sink"
+nav-title: Streaming File Sink
+nav-parent_id: connectors
+nav-pos: 5
+---
+
+
+This connector provides a Sink that writes partitioned files to filesystems
+supported by the Flink `FileSystem` abstraction. Since in streaming the input
+is potentially infinite, the streaming file sink writes data into buckets. The
+bucketing behaviour is configurable but a useful default is time-based
+bucketing where we start writing a new bucket every hour and thus get
+individual files that each contain a part of the infinite output stream.
+
+Within a bucket, we further split the output into smaller part files based on a
+rolling policy. This is useful to prevent individual bucket files from getting
+too big. This is also configurable but the default policy rolls files based on
+file size and a timeout, i.e if no new data was written to a part file. 
+
+ Usage
+
+The only required configuration are the base path were we want to output our
+data and an
+[Encoder](http://flink.apache.org/docs/latest/api/java/org/apache/flink/api/common/serialization/Encoder.html)
 
 Review comment:
   We should link to version-specific class.




[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575054#comment-16575054
 ] 

ASF GitHub Bot commented on FLINK-10109:


twalthr commented on a change in pull request #6532: [FLINK-10109] Add 
documentation for StreamingFileSink
URL: https://github.com/apache/flink/pull/6532#discussion_r208986029
 
 

 ##
 File path: docs/dev/connectors/streamfile_sink.md
 ##
 @@ -0,0 +1,95 @@
+---
+title: "Streaming File Sink"
+nav-title: Streaming File Sink
+nav-parent_id: connectors
+nav-pos: 5
+---
+
+
+This connector provides a Sink that writes partitioned files to filesystems
+supported by the Flink `FileSystem` abstraction. Since in streaming the input
+is potentially infinite, the streaming file sink writes data into buckets. The
+bucketing behaviour is configurable but a useful default is time-based
+bucketing where we start writing a new bucket every hour and thus get
+individual files that each contain a part of the infinite output stream.
+
+Within a bucket, we further split the output into smaller part files based on a
+rolling policy. This is useful to prevent individual bucket files from getting
+too big. This is also configurable but the default policy rolls files based on
+file size and a timeout, i.e if no new data was written to a part file. 
+
+ Usage
+
+The only required configuration are the base path were we want to output our
+data and an
+[Encoder](http://flink.apache.org/docs/latest/api/java/org/apache/flink/api/common/serialization/Encoder.html)
+that is used for serializing records to the `OutputStream` for each file.
+
+Basic usage thus looks like this:
+
+
+
+
+{% highlight java %}
+DataStream input = ...;
+
+final StreamingFileSink sink = StreamingFileSink
 
 Review comment:
   Please add imports to the code examples. I have started doing this recently
to avoid user confusion: where is `StreamingFileSink` located, which `Path`
are we using, etc.




[jira] [Commented] (FLINK-10109) Add documentation for StreamingFileSink

2018-08-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574637#comment-16574637
 ] 

ASF GitHub Bot commented on FLINK-10109:


aljoscha opened a new pull request #6532: [FLINK-10109] Add documentation for 
StreamingFileSink
URL: https://github.com/apache/flink/pull/6532
 
 
   I kept this simple on purpose and mostly defer to the Javadocs, which I
assume will stay more accurate over time and which already have good explanations.

