Checkpointing to S3
Hey,

I am trying to checkpoint my streaming job to S3, but the checkpoints never complete, and I don't get any errors in the logs either.

The state backend apparently connects to S3 properly, as it creates the following file in the given S3 directory:

95560b1acf5307bc3096020071c83230_$folder$ (this is a file, not a folder)

The job id is 95560b1acf5307bc3096020071c83230, but that filename is odd and might be the cause of the problem. It seems that the backend doesn't properly create a folder for the job's checkpoints under the job id.

Does anyone have any idea what might cause this?

Thanks,
Gyula
[jira] [Created] (FLINK-3198) Rename Grouping.getDataSet() method and add JavaDocs
Fabian Hueske created FLINK-3198:

Summary: Rename Grouping.getDataSet() method and add JavaDocs
Key: FLINK-3198
URL: https://issues.apache.org/jira/browse/FLINK-3198
Project: Flink
Issue Type: Improvement
Components: DataSet API
Affects Versions: 0.10.1, 1.0.0
Reporter: Fabian Hueske
Fix For: 1.0.0, 0.10.2

The {{getDataSet()}} method of {{Grouping}} is public and visible to users. It returns the input of the grouping operation, which can cause confusion. If this method is used in a regular DataSet program like this

{code}
DataSet notGrouped = input.groupBy().getDataSet();
DataSet allReduced = notGrouped.reduce();
{code}

the previous {{groupBy()}} call is silently discarded and an AllReduce is applied instead of a grouped Reduce.

Since this method is not meant to be part of the public API, we should help users avoid it. In the current API, we cannot easily change the visibility of the method without package restructuring or adding additional classes (and hence breaking binary compatibility). Instead, I propose to rename the method to something like {{getInputDataSet()}} or {{getGroupingInput()}} and add descriptive JavaDocs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
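The pitfall described in the issue can be sketched with minimal stand-in classes. To keep it self-contained these are NOT Flink's real `DataSet`/`Grouping` implementations (a real Flink program would also pass key fields to `groupBy(...)` and a function to `reduce(...)`); the sketch only models how `getDataSet()` hands back the ungrouped input:

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for a data set of integers (hypothetical, not Flink's DataSet).
class DataSet {
    final List<Integer> values;
    DataSet(List<Integer> values) { this.values = values; }

    Grouping groupBy() { return new Grouping(this); }

    // An all-reduce: folds the whole data set into one value (here: a sum).
    int reduce() { return values.stream().mapToInt(Integer::intValue).sum(); }
}

// Stand-in for Flink's Grouping: getDataSet() returns the *input*,
// silently discarding the grouping information.
class Grouping {
    private final DataSet input;
    Grouping(DataSet input) { this.input = input; }

    DataSet getDataSet() { return input; }
}

public class GetDataSetPitfall {
    public static void main(String[] args) {
        DataSet input = new DataSet(Arrays.asList(1, 2, 3));

        // Looks like the start of a grouped reduce, but getDataSet()
        // returns the raw input, so this becomes an all-reduce.
        DataSet notGrouped = input.groupBy().getDataSet();
        int allReduced = notGrouped.reduce();

        System.out.println(allReduced); // 6
    }
}
```

Renaming the method to something like `getInputDataSet()` would make it obvious at the call site that the grouping is being dropped.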
Re: Checkpointing to S3
OK, I figured out the problem; it was my fault :). The issue was that I was running a short test job, and the sources finished before the checkpoint was triggered. So the folder was created for the job in S3, but since nothing was ever written to it, it shows up as a file in S3.

Maybe it would be good to give the user some info in case the sources have already finished when the checkpoint is triggered.

On the bright side, it seems to work well, also with the savepoints :)

Cheers,
Gyula

On Sat, 2 Jan 2016 at 11:57, Gyula Fóra wrote:

> Hey,
>
> I am trying to checkpoint my streaming job to S3 but it seems that the
> checkpoints never complete but also I don't get any error in the logs.
> [...]
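Gyula's suggestion of telling the user when a checkpoint is triggered after the sources have finished could look roughly like the following. This is a hypothetical sketch, not Flink's actual CheckpointCoordinator; the class, method, and message are invented for illustration:

```java
import java.util.Optional;

public class CheckpointAfterFinish {

    enum SourceState { RUNNING, FINISHED }

    // Returns a user-facing message when the checkpoint cannot complete,
    // instead of letting it hang silently.
    static Optional<String> triggerCheckpoint(SourceState source, long checkpointId) {
        if (source == SourceState.FINISHED) {
            return Optional.of("Checkpoint " + checkpointId
                    + " declined: sources already finished");
        }
        return Optional.empty(); // checkpoint proceeds as usual
    }

    public static void main(String[] args) {
        // A running source: checkpoint goes ahead, no message.
        System.out.println(triggerCheckpoint(SourceState.RUNNING, 1));
        // A finished source: the user gets an explanation.
        System.out.println(triggerCheckpoint(SourceState.FINISHED, 2));
    }
}
```

The key point is surfacing the condition at trigger time, which would have made the "checkpoints never complete, no errors in the logs" symptom immediately diagnosable.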
Streaming Iterations, no headOperator ?
Hi,

I am working on FLINK-1870, and my changes break some unit tests. The problem is in streaming.api.IterateTest. I tracked it down to StreamTask.registerInputOutput(), which calls headOperator.setup(...). My changes depend on this call; however, there is no head operator (i.e., it is null), so the call to setup(...) is not made. Thus, for one operator the StreamingRuntimeContext member variable "runtimeContext" is not initialized (i.e., is null) and the test fails with an NPE.

Can you give a short explanation of these code parts? Under what condition is the head operator missing? How can I ensure that setup() is called for all operators?

You can find my code here:
https://github.com/mjsax/flink/tree/flink-1870-inputChannelIndex

Thanks! And happy new year!

-Matthias
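The failure mode Matthias describes can be reproduced in miniature. The classes below are stand-ins, not Flink's actual StreamTask/operator types; they only model the pattern where setup() is conditional on a head operator existing, so any code that assumes setup() always ran will hit an NPE:

```java
public class HeadOperatorSetup {

    static class RuntimeContext { }

    static class Operator {
        RuntimeContext runtimeContext; // stays null unless setup() is called

        void setup(RuntimeContext ctx) { this.runtimeContext = ctx; }

        void open() {
            if (runtimeContext == null) {
                // The NPE seen in IterateTest, made explicit.
                throw new NullPointerException("runtime context was never set up");
            }
        }
    }

    // Mirrors the pattern in question: setup() is only invoked when a head
    // operator is present, so downstream code must not assume it always ran.
    static void registerInputOutput(Operator headOperator, RuntimeContext ctx) {
        if (headOperator != null) {
            headOperator.setup(ctx);
        }
    }

    public static void main(String[] args) {
        Operator op = new Operator();
        registerInputOutput(op, new RuntimeContext());
        op.open(); // fine: setup() ran

        registerInputOutput(null, new RuntimeContext()); // no-op: no head operator

        Operator skipped = new Operator();
        // skipped.open() would throw here, reproducing the test failure
    }
}
```

So the two fixes to weigh are (a) initializing the context through some other path when there is no head operator, or (b) guarding the dependent code against the null, as open() does above.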