Re: Breaking changes in the Streaming API

2016-02-11 Thread Robert Metzger
Big +1 on reworking the timestamp extractor. Too many users were stumbling
over this. I also misunderstood it when trying it out for the first time.

On Tue, Feb 9, 2016 at 4:54 PM, Stephan Ewen  wrote:

> Hi everyone!
>
> There are two remaining issues right now pending for 1.0 that will cause
> breaking API changes in the Streaming API.
>
>
> 1)
> [FLINK-3379]
>
> The Timestamp Extractor needs to be changed. That really seems necessary,
> based on user feedback: a lot of people mentioned that they are getting
> confused by the TimestampExtractor's mixed system of generating watermarks
> in two different ways.
>
> The issue suggests pulling the two different modes of generating watermarks
> into two separate classes.
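The split the issue suggests might look roughly like this; a minimal Java sketch with illustrative names, not the actual FLINK-3379 API:

```java
// Hypothetical sketch: splitting the mixed TimestampExtractor into two
// single-purpose interfaces, one per watermark-generation mode.
// (Names are illustrative, not the actual FLINK-3379 API.)

// Mode 1: watermarks are emitted periodically, driven by a timer.
interface PeriodicWatermarkAssigner<T> {
    long extractTimestamp(T element, long previousTimestamp);

    // Called at a fixed interval; return null to emit no watermark this time.
    Long getCurrentWatermark();
}

// Mode 2: watermarks are derived from marker elements seen in the stream.
interface PunctuatedWatermarkAssigner<T> {
    long extractTimestamp(T element, long previousTimestamp);

    // Inspect each element; return a watermark timestamp or null.
    Long checkAndGetNextWatermark(T lastElement, long extractedTimestamp);
}
```

With separate types, a user implements exactly one generation mode, instead of one class whose two watermark paths interact in confusing ways.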
>
>
> 2)
>
> [FLINK-3371] makes the "Trigger" an abstract class (currently an interface)
> and moves the "TriggerResult" to a dedicated class. This is necessary to
> avoid breaking changes after the release.
>
> The reason for these changes is "aligned windows", which have one
> Trigger for the entire window across all keys (
> https://issues.apache.org/jira/browse/FLINK-3370)
>
> Most sliding/tumbling time windows, for example, are aligned windows, while
> session and count windows (with a trigger per key) are unaligned. For
> aligned windows, we can implement an optimized representation that uses
> less memory and is more lightweight to checkpoint.
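The aligned/unaligned distinction can be illustrated in a few lines: for aligned windows the boundaries are a pure function of the timestamp, so every key shares them (a simplified sketch, not Flink code):

```java
// Simplified illustration: an aligned window's boundaries depend only on
// the timestamp, so they are identical for every key, and one Trigger can
// serve the whole window. Session windows need per-key history instead.
final class AlignedWindows {
    // Start of the tumbling window of the given size containing 'timestamp'.
    static long windowStart(long timestamp, long windowSize) {
        return timestamp - (timestamp % windowSize);  // same result for every key
    }

    private AlignedWindows() {}
}
```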
>
> Also, the Trigger class may evolve a bit, and with an abstract class we
> can add methods without breaking user-defined Triggers in the future.
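A minimal sketch of why the abstract class helps (illustrative names, not the actual Trigger API): a method added in a later release with a default body leaves existing subclasses compiling.

```java
// Illustrative sketch (not the actual Flink Trigger API): an abstract class
// lets new hooks be added later with a default body, so existing user
// subclasses keep compiling across releases.
enum SketchTriggerResult { CONTINUE, FIRE, PURGE, FIRE_AND_PURGE }

abstract class SketchTrigger<W> {
    // Original contract: subclasses must implement this.
    public abstract SketchTriggerResult onElement(Object element, long timestamp, W window);

    // Hook added in a later release: the default no-op body keeps old
    // subclasses source-compatible.
    public void onCleanup(W window) {
        // no-op by default
    }
}
```

Presumably an abstract class is preferable to Java 8 default interface methods here because Flink still supported Java 7 at the time.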
>
>
> Greetings,
> Stephan
>


[jira] [Created] (FLINK-3391) Typo in SavepointITCase

2016-02-11 Thread Sebastian Klemke (JIRA)
Sebastian Klemke created FLINK-3391:
---

 Summary: Typo in SavepointITCase
 Key: FLINK-3391
 URL: https://issues.apache.org/jira/browse/FLINK-3391
 Project: Flink
  Issue Type: Bug
Reporter: Sebastian Klemke
Priority: Trivial


{code}
--- flink-tests/src/test/java/org/apache/flink/test/checkpointing/SavepointITCase.java	(revision 8ccd7544edb25e82cc8a898809cc7c8bb7893620)
+++ flink-tests/src/test/java/org/apache/flink/test/checkpointing/SavepointITCase.java	(revision )
@@ -441,7 +441,7 @@
 		LOG.info("Created temporary savepoint directory: " + savepointDir + ".");
 
 		config.setString(ConfigConstants.STATE_BACKEND, "filesystem");
-		config.setString(SavepointStoreFactory.SAVEPOINT_BACKEND_KEY, "fileystem");
+		config.setString(SavepointStoreFactory.SAVEPOINT_BACKEND_KEY, "filesystem");
 
 		config.setString(FsStateBackendFactory.CHECKPOINT_DIRECTORY_URI_CONF_KEY,
 			checkpointDir.toURI().toString());

{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3389) Add Pre-defined Options settings for RocksDB State backend

2016-02-11 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-3389:
---

 Summary: Add Pre-defined Options settings for RocksDB State backend
 Key: FLINK-3389
 URL: https://issues.apache.org/jira/browse/FLINK-3389
 Project: Flink
  Issue Type: Improvement
  Components: state backends
Affects Versions: 1.0.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.0.0


The RocksDB state backend's performance can be optimized for certain setups 
(for example, running on spinning disk or SSD) with certain options.

Since these options are hard for users to tune, we should add a set of 
predefined option profiles for common setups.
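Such predefined profiles might be sketched as an enum; the names and values below are hypothetical, not the actual FLINK-3389 settings:

```java
// Hypothetical sketch of pre-defined option profiles (names and values are
// illustrative, not the actual FLINK-3389 settings): each profile bundles
// tuning knobs appropriate for one kind of storage.
enum RocksDBProfile {
    // Conservative defaults, suitable for tests and small state.
    DEFAULT(1),
    // Spinning disks: more background compaction threads help hide seek latency.
    SPINNING_DISK_OPTIMIZED(4),
    // SSDs sustain much more parallel background I/O.
    FLASH_SSD_OPTIMIZED(8);

    final int backgroundThreads;  // illustrative knob

    RocksDBProfile(int backgroundThreads) {
        this.backgroundThreads = backgroundThreads;
    }
}
```

A user would then pick one profile instead of tuning individual RocksDB options by hand.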





[jira] [Created] (FLINK-3390) Savepoint resume is not retried

2016-02-11 Thread Sebastian Klemke (JIRA)
Sebastian Klemke created FLINK-3390:
---

 Summary: Savepoint resume is not retried
 Key: FLINK-3390
 URL: https://issues.apache.org/jira/browse/FLINK-3390
 Project: Flink
  Issue Type: Bug
Reporter: Sebastian Klemke


When restoreState fails for a task node during resume from a savepoint, the job 
is retried, but without retrying the resume from the savepoint state. This leads 
to the job being restarted with empty state.





StateBackend

2016-02-11 Thread Matthias J. Sax
Hi,

In Flink it is possible to have different backends for operator state. I
am wondering what the best approach for different state backends would be.

Let's assume the backend is a database server. The following questions
arise:
  - Should the database server be started manually by the user, or should
Flink start the server automatically when it is used?
(the latter seems to be the approach for RocksDB as an embedded server)
  - Should each job use the same backend server or an individual one (or
maybe a mix of both)?

I personally think that Flink should not start up a backend server, but
assume that it is available when the job is submitted. This also allows the
user to start up multiple instances of the backend server and choose which
one to use for each job individually.

What do you think about it? I ask because of the current PR for Redis as
StateBackend:
https://github.com/apache/flink/pull/1617

There is no embedded mode for Redis as there is for RocksDB.

-Matthias





[jira] [Created] (FLINK-3392) Unprotected access to elements in ClosableBlockingQueue#size()

2016-02-11 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3392:
-

 Summary: Unprotected access to elements in 
ClosableBlockingQueue#size()
 Key: FLINK-3392
 URL: https://issues.apache.org/jira/browse/FLINK-3392
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


Here is related code:
{code}
  public int size() {
return elements.size();
  }
{code}
Access to elements should be protected by a lock.lock() / lock.unlock() pair.
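A minimal sketch of the suggested fix, using a simplified stand-in for ClosableBlockingQueue (not the real class):

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the suggested fix, on a simplified stand-in for Flink's
// ClosableBlockingQueue: size() takes the same lock that guards mutations,
// so it never reads 'elements' concurrently with a writer.
class LockedQueue<E> {
    private final ReentrantLock lock = new ReentrantLock();
    private final ArrayDeque<E> elements = new ArrayDeque<>();

    public void add(E element) {
        lock.lock();
        try {
            elements.addLast(element);
        } finally {
            lock.unlock();
        }
    }

    public int size() {
        lock.lock();  // the fix: guard the read as well
        try {
            return elements.size();
        } finally {
            lock.unlock();
        }
    }
}
```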





[jira] [Created] (FLINK-3386) Kafka consumers should not necessarily fail on expiring data

2016-02-11 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-3386:
-

 Summary: Kafka consumers should not necessarily fail on expiring 
data
 Key: FLINK-3386
 URL: https://issues.apache.org/jira/browse/FLINK-3386
 Project: Flink
  Issue Type: Improvement
  Components: Streaming Connectors
Affects Versions: 1.0.0
Reporter: Gyula Fora


Currently, if the data in a Kafka topic expires while we are reading from it, 
this causes an unrecoverable failure, as subsequent retries will also fail on 
the invalid offsets.

While this might be the desired behaviour under some circumstances, in most 
cases it would probably be better to automatically jump to the earliest valid 
offset.
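The proposed fallback could be sketched like this; the classes below are hypothetical stand-ins, not the actual Flink Kafka consumer code:

```java
// Hypothetical recovery loop (not the actual Flink consumer code): when a
// fetch fails because retention already deleted the requested offset, fall
// back to the earliest valid offset instead of failing the job.
class OffsetOutOfRange extends RuntimeException {}

class PartitionReader {
    long offset;                      // next offset to read
    final long earliestValidOffset;   // oldest offset the broker still retains

    PartitionReader(long startOffset, long earliestValidOffset) {
        this.offset = startOffset;
        this.earliestValidOffset = earliestValidOffset;
    }

    // Simulates a fetch: offsets below the retention horizon are invalid.
    String fetch(long off) {
        if (off < earliestValidOffset) {
            throw new OffsetOutOfRange();
        }
        return "record@" + off;
    }

    // Proposed behaviour: on an out-of-range offset, seek to the earliest
    // valid one and retry, rather than failing unrecoverably.
    String fetchOrSeekToEarliest() {
        try {
            return fetch(offset);
        } catch (OffsetOutOfRange e) {
            offset = earliestValidOffset;
            return fetch(offset);
        }
    }
}
```

This mirrors the spirit of Kafka's own consumer-side offset-reset behaviour, which applications can opt into rather than treating an out-of-range offset as fatal.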


