Async I/O: preserve stream order for requests on key level

2023-06-09 Thread Juho Autio
I need to make some slower external requests in parallel, so Async I/O helps greatly with that. However, I also need to make the requests in a certain order per key. Is that possible with Async I/O? The documentation[1] talks about preserving the stream order of results, but it doesn't discuss
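
For orientation, a minimal sketch (not from the thread) of the ordered Async I/O wiring this question builds on, assuming Flink's AsyncDataStream API. Note that orderedWait preserves the overall arrival order of results, which is stricter than the per-key ordering being asked about; the class below and its external call are illustrative.

    import java.util.Collections;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;
    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    // Illustrative async enrichment; the external request is a placeholder.
    public class EnrichAsync extends RichAsyncFunction<String, String> {
        @Override
        public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
            CompletableFuture
                .supplyAsync(() -> key + ":enriched") // stands in for the slow request
                .thenAccept(r -> resultFuture.complete(Collections.singletonList(r)));
        }
    }

    // orderedWait emits results in the arrival order of inputs across all keys,
    // which implies per-key order but also serializes emission across keys:
    // DataStream<String> result = AsyncDataStream.orderedWait(
    //     input, new EnrichAsync(), 30, TimeUnit.SECONDS, 100);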

Re: Data loss when restoring from savepoint

2019-02-14 Thread Juho Autio
(no missing data)? > > Cheers, > > Konstantin > > > > > On Wed, Feb 13, 2019 at 3:19 PM Gyula Fóra wrote: > >> Sorry not posting on the mail list was my mistake :/ >> >> >> On Wed, 13 Feb 2019 at 15:01, Juho Autio wrote: >> >>>

Re: Data loss when restoring from savepoint

2019-02-13 Thread Juho Autio
Stefan (or anyone!), please, could I have some feedback on the findings that I reported on Dec 21, 2018? This is still a major blocker.. On Thu, Jan 31, 2019 at 11:46 AM Juho Autio wrote: > Hello, is there anyone that could help with this? > > On Fri, Jan 11, 2019 at 8:14 AM Juho Aut

Re: Data loss when restoring from savepoint

2019-01-31 Thread Juho Autio
Hello, is there anyone that could help with this? On Fri, Jan 11, 2019 at 8:14 AM Juho Autio wrote: > Stefan, would you have time to comment? > > On Wednesday, January 2, 2019, Juho Autio wrote: > >> Bump – does anyone know if Stefan will be available to comment the latest

Re: Data loss when restoring from savepoint

2019-01-10 Thread Juho Autio
Stefan, would you have time to comment? On Wednesday, January 2, 2019, Juho Autio wrote: > Bump – does anyone know if Stefan will be available to comment the latest > findings? Thanks. > > On Fri, Dec 21, 2018 at 2:33 PM Juho Autio wrote: > >> Stefan, I managed to analyze

Re: Data loss when restoring from savepoint

2019-01-02 Thread Juho Autio
Bump – does anyone know if Stefan will be available to comment the latest findings? Thanks. On Fri, Dec 21, 2018 at 2:33 PM Juho Autio wrote: > Stefan, I managed to analyze savepoint with bravo. It seems that the data > that's missing from output *is* found in savepoint. > > I s

Re: Data loss when restoring from savepoint

2018-12-21 Thread Juho Autio
"side effect kafka output" on individual operators. This should allow tracking more closely at which point the data gets lost. However, maybe this would have to be in some of Flink's internal components, and I'm not sure which those would be. Cheers, Juho On Mon, Nov 19, 2018 at 11:52 AM Juho Autio

Re: Data loss when restoring from savepoint

2018-11-19 Thread Juho Autio
PM Juho Autio wrote: > I was glad to find that bravo had now been updated to support installing > bravo to a local maven repo. > > I was able to load a checkpoint created by my job, thanks to the example > provided in bravo README, but I'm still missing the essential pie

Re: Data loss when restoring from savepoint

2018-10-23 Thread Juho Autio
" threw – obviously there's no operator by that name). Cheers, Juho On Mon, Oct 15, 2018 at 2:25 PM Juho Autio wrote: > Hi Stefan, > > Sorry but it doesn't seem immediately clear to me what's a good way to use > https://github.com/king/bravo. > > How are people using it? Would you for

Re: Data loss when restoring from savepoint

2018-10-15 Thread Juho Autio
) and run it from there (using an IDE, for example)? Also it doesn't seem like the bravo gradle project supports building a flink job jar, but if it does, how do I do it? Thanks. On Thu, Oct 4, 2018 at 9:30 PM Juho Autio wrote: > Good then, I'll try to analyze the savepoints with Bravo. Tha

Re: Data loss when restoring from savepoint

2018-10-04 Thread Juho Autio
> > Best, > Stefan > > [1] https://github.com/king/bravo > [2] > https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration > > On 04.10.2018 at 14:53, Juho Autio wrote: > > Thanks for the sugges

Re: Data loss when restoring from savepoint

2018-10-04 Thread Juho Autio
primitive types saved as >> csv >> - minimal deduplication job which processes them and misses records >> - check if it happens for shorter windows, like 1h etc >> - setup which you use for the job, ideally locally reproducible or cloud >> >> Best, >> Andr

Re: Data loss when restoring from savepoint

2018-10-04 Thread Juho Autio
that is missing from window output. On Mon, Oct 1, 2018 at 11:56 AM Juho Autio wrote: > Hi Andrey, > > To rule out for good any questions about sink behaviour, the job was > killed and started with an additional Kafka sink. > > The same number of ids were missed in both

Re: Data loss when restoring from savepoint

2018-10-01 Thread Juho Autio
9 PM Juho Autio wrote: > Thanks, Andrey. > > > so it means that the savepoint does not lose at least some dropped > records. > > I'm not sure what you mean by that? I mean, it was known from the > beginning, that not everything is lost before/after restoring a savepoint

Re: Data loss when restoring from savepoint

2018-09-21 Thread Juho Autio
. > > Another suggestion is to try to write records to some other sink or to > both. > E.g. if you can access file system of workers, maybe just into local files > and check whether the records are also dropped there. > > Best, > Andrey > > On 20 Sep 2018, at 15:37,

Re: Data loss when restoring from savepoint

2018-09-20 Thread Juho Autio
D) > returns false and you risk rewriting the previous part. > > The BucketingSink was designed for a standard file system. s3 is used over > a file system wrapper atm but does not always provide normal file system > guarantees. See also last example in [1]. > > Cheers,

Re: 1.5.1

2018-09-17 Thread Juho Autio
the log extracts you sent, I cannot really draw any > conclusions. > > Best, > Gary > > > On Wed, Aug 15, 2018 at 10:38 AM, Juho Autio wrote: > >> Thanks Gary.. >> >> What could be blocking the RPC threads? Slow checkpointing? >>

Re: Data loss when restoring from savepoint

2018-08-29 Thread Juho Autio
nts that should > be included into the savepoint (logged before) or not: > “{} ({}, synchronous part) in thread {} took {} ms” (template) > Also check if the savepoint has been overall completed: > "{} ({}, asynchronous part) in thread {} took {} ms." > > Best, > Andre

Re: Data loss when restoring from savepoint

2018-08-24 Thread Juho Autio
he previous 'BucketingSink'? > > Cheers, > Andrey > > [1] > https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html > > On 24 Aug 2018, at 18:03, Juho Autio wrote: > > Yes, sorry for my confusing comment. I just meant that it

Re: Data loss when restoring from savepoint

2018-08-24 Thread Juho Autio
before the window result is written into s3. > > Allowed lateness should not affect it, I am just saying that the final > result in s3 should include all records after it. > This is what should be guaranteed but not the contents of the intermediate > savepoint. > > Cheers, > A

Re: Data loss when restoring from savepoint

2018-08-24 Thread Juho Autio
t in Kafka. > Then it should just come later after the restore and should be reduced > within the allowed lateness into the final result which is saved into s3. > > Also, is this `DistinctFunction.reduce` just an example or the actual > implementation, basically saving just one of

Re: Data loss when restoring from savepoint

2018-08-23 Thread Juho Autio
I changed to allowedLateness=0, no change, still missing data when restoring from savepoint. On Tue, Aug 21, 2018 at 10:43 AM Juho Autio wrote: > I realized that BucketingSink must not play any role in this problem. This > is because only when the 24-hour window triggers, BucketinSin

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-22 Thread Juho Autio
lete first and then cancel > the job. Thus, the later savepoints might complete or not depending on the > correct timing. Since savepoint can flush results to external systems, I > would recommend not calling the API multiple times. > > Cheers, > Till > > On Wed, Aug 22, 2018 at 10:40

State TTL in Flink 1.6.0

2018-08-22 Thread Juho Autio
First, I couldn't find anything about State TTL in Flink docs, is there anything like that? I can manage based on Javadocs & source code, but just wondering. Then to the main question: why doesn't the TTL support event time, and is there any sensible use case for the TTL if the streaming
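
For reference, a minimal sketch of the 1.6 StateTtlConfig API this question refers to; the TTL clock here is processing time, which is exactly the limitation being asked about. The descriptor name is illustrative.

    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.time.Time;

    public class TtlExample {
        // Entries expire one day after creation/update, measured in
        // processing time (event-time TTL is not supported in 1.6).
        static ValueStateDescriptor<String> descriptorWithTtl() {
            StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.days(1))
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();
            ValueStateDescriptor<String> descriptor =
                new ValueStateDescriptor<>("lastValue", String.class);
            descriptor.enableTimeToLive(ttlConfig);
            return descriptor;
        }
    }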

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-22 Thread Juho Autio
ecause cancelWithSavepoint is actually waiting for savepoint to complete >> synchronization, and then execute the cancel command. >> >> We do not use CLI. I think since you are through the CLI, you can observe >> whether the savepoint is complete by combining the

Re: Data loss when restoring from savepoint

2018-08-21 Thread Juho Autio
restoring a savepoint? On Fri, Aug 17, 2018 at 4:23 PM Juho Autio wrote: > Some data is silently lost on my Flink stream job when state is restored > from a savepoint. > > Do you have any debugging hints to find out where exactly the data gets > dropped? > > My job gathers

Data loss when restoring from savepoint

2018-08-17 Thread Juho Autio
Some data is silently lost on my Flink stream job when state is restored from a savepoint. Do you have any debugging hints to find out where exactly the data gets dropped? My job gathers distinct values using a 24-hour window. It doesn't have any custom state management. When I cancel the job
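
For orientation, a stripped-down sketch of the kind of job described here: distinct values gathered in a keyed 24-hour window with a keep-one reduce (the DistinctFunction discussed later in the thread). Types and field choices are illustrative, not the actual job.

    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // Keeps one record per distinct key within the window.
    public class DistinctFunction implements ReduceFunction<Tuple2<String, String>> {
        @Override
        public Tuple2<String, String> reduce(Tuple2<String, String> a,
                                             Tuple2<String, String> b) {
            return a;
        }
    }

    // stream.keyBy(0)
    //       .timeWindow(Time.days(1))
    //       .reduce(new DistinctFunction());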

Re: 1.5.1

2018-08-15 Thread Juho Autio
g.html#web-timeout Cheers, Juho On Wed, Aug 15, 2018 at 11:43 AM Juho Autio wrote: > Vishal, from which version did you upgrade to 1.5.1? Maybe from 1.5.0 > (release)? Knowing that might help narrowing down the source of this. > > On Wed, Aug 15, 2018 at 11:38 AM Juho Autio wrote:

Re: 1.5.1

2018-08-15 Thread Juho Autio
Vishal, from which version did you upgrade to 1.5.1? Maybe from 1.5.0 (release)? Knowing that might help narrowing down the source of this. On Wed, Aug 15, 2018 at 11:38 AM Juho Autio wrote: > Thanks Gary.. > > What could be blocking the RPC threads? Slow checkpointing? > > In p

Re: 1.5.1

2018-08-15 Thread Juho Autio
0/flink-runtime/src/main/java/org/apache/flink/runtime/heartbeat/HeartbeatManagerSenderImpl.java#L64 > > On Mon, Aug 13, 2018 at 9:52 AM, Juho Autio wrote: > >> I also have jobs failing on a daily basis with the error "Heartbeat of >> TaskManager with id timed out"

Re: 1.5.1

2018-08-13 Thread Juho Autio
I also have jobs failing on a daily basis with the error "Heartbeat of TaskManager with id timed out". I'm using Flink 1.5.2. Could anyone suggest how to debug possible causes? I already set these in flink-conf.yaml, but I'm still getting failures: heartbeat.interval: 1 heartbeat.timeout:
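
The values above are cut off by the preview. For reference, the relevant flink-conf.yaml keys look like this; the defaults in this Flink generation were 10000 ms (interval) and 50000 ms (timeout), and the raised values below are illustrative, not the poster's actual settings.

    # flink-conf.yaml (illustrative values)
    heartbeat.interval: 10000
    heartbeat.timeout: 120000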

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-09 Thread Juho Autio
users on the mailing list have > encountered similar problems. > > In our environment, it seems that JM shows that the savepoint is complete > and JM has stopped itself, but the client will still connect to the old JM > and report a timeout exception. > > Thanks, vino. >

Could not cancel job (with savepoint) "Ask timed out"

2018-08-08 Thread Juho Autio
I was trying to cancel a job with savepoint, but the CLI command failed with "akka.pattern.AskTimeoutException: Ask timed out". The stack trace reveals that ask timeout is 10 seconds: Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager_0#106635280]]
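
The 10-second limit in that stack trace matches the default akka.ask.timeout. A common mitigation, though not necessarily the resolution reached in this thread, is raising the timeouts in flink-conf.yaml (values illustrative):

    # flink-conf.yaml (illustrative values)
    akka.ask.timeout: 60 s
    # the REST/web layer has a separate timeout (see the "1.5.1" thread above)
    web.timeout: 60000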

Re: Flink kafka consumer stopped committing offsets

2018-06-19 Thread Juho Autio
>> offsets for fault tolerance guarantees. The committed offsets are only a >>> means to expose the consumer’s progress for monitoring purposes. >>> >>> Can you post full logs from all TaskManagers/JobManager and can you >>> say/estimate when did the committ

Re: Flink kafka consumer stopped committing offsets

2018-06-11 Thread Juho Autio
mer-79720/m-p/51188 > https://stackoverflow.com/questions/42362911/kafka-high-level-consumer-error-code-15/42416232#42416232 > Especially that in your case this offsets committing is superseded by > Kafka coordinator failure. > > Piotrek > > > On 8 Jun 2018, at 10:05,

Flink kafka consumer stopped committing offsets

2018-06-08 Thread Juho Autio
Hi, We have a Flink stream job that uses Flink kafka consumer. Normally it commits consumer offsets to Kafka. However this stream ended up in a state where it's otherwise working just fine, but it isn't committing offsets to Kafka any more. The job keeps writing correct aggregation results to
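
For context, a sketch of the usual offset-commit setup with this consumer, assuming the FlinkKafkaConsumer010 API: with checkpointing enabled, offsets are committed back to Kafka when checkpoints complete, and as the reply above notes they are only a monitoring aid, not a fault-tolerance mechanism. Topic, group, and broker names are illustrative.

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

    public class OffsetCommitSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            // Offsets are committed to Kafka when a checkpoint completes.
            env.enableCheckpointing(60_000);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092"); // illustrative
            props.setProperty("group.id", "my-group");            // illustrative

            FlinkKafkaConsumer010<String> consumer = new FlinkKafkaConsumer010<>(
                "my-topic", new SimpleStringSchema(), props);
            consumer.setCommitOffsetsOnCheckpoints(true); // the default

            env.addSource(consumer).print();
            env.execute("offset-commit-sketch");
        }
    }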

Re: Late data before window end is even close

2018-06-08 Thread Juho Autio
mponents Etc. >> Either such process help you figure out what’s wrong on your own and if >> not, if you share us such minimal program that reproduces the issue, it >> will allow us to debug it. >> >> Piotrek >> >> >> On 11 May 2018, at 13:54, Juho

Re: REST API "broken" on YARN because POST is not allowed via YARN proxy

2018-05-30 Thread Juho Autio
only. One thing which we have to > fix first is that also the jar file upload goes through REST. > > [1] https://issues.apache.org/jira/browse/FLINK-9478 > > Cheers, > Till > > On Wed, May 30, 2018 at 9:07 AM, Juho Autio wrote: > >> Hi, I tried to search Flink Jira

Re: REST API "broken" on YARN because POST is not allowed via YARN proxy

2018-05-30 Thread Juho Autio
provement will make it into the 1.6 release. > > Cheers, > Till > > On Thu, Apr 5, 2018 at 4:45 PM, Juho Autio wrote: > >> Thanks for the answer. Wrapping with GET sounds good to me. You said next >> version; do you mean that Flink 1.5 would already include this improvem

Re: Missing MapState when Timer fires after restored state

2018-05-18 Thread Juho Autio
er.handleWatermark(StreamInputProcessor.java:262) ... 7 more On Fri, May 18, 2018 at 11:06 AM, Juho Autio <juho.au...@rovio.com> wrote: > Thanks Sihua, I'll give that RC a try. > > On Fri, May 18, 2018 at 10:58 AM, sihua zhou <summerle...@163.com> wrote: > >> Hi Juho,

Re: Missing MapState when Timer fires after restored state

2018-05-18 Thread Juho Autio
> checkpoint? The latest RC includes a fix for the potential silent data > loss. If it's the reason, you will see a different exception when you > try to recover your job. > > Best, Sihua > > > > On 05/18/2018 15:02, Juho Autio <juho.au...@rovio.com>

Re: Missing MapState when Timer fires after restored state

2018-05-18 Thread Juho Autio
e existing checkpoint or also creating other problematic > checkpoints? I am asking because maybe a log from the job that produces the > problematic checkpoint might be more helpful. You can create a ticket if > you want. > > Best, > Stefan > > > Am 18.05.2018 um 09:02 sch

Re: Missing MapState when Timer fires after restored state

2018-05-18 Thread Juho Autio
I see. I appreciate keeping this option available even if it's "beta". The current situation could be documented better, though. As long as rescaling from checkpoint is not officially supported, I would put it behind a flag similar to --allowNonRestoredState. The flag could be called

Re: Missing MapState when Timer fires after restored state

2018-05-16 Thread Juho Autio
k you're asking the question I have asked in > https://github.com/apache/flink/pull/5490, you can refer to it and find > the comments there. > > @Stefan, PR (https://github.com/apache/flink/pull/6020) has been prepared. > > Best, Sihua > > > On 05/16/2018 17:20, Juho Auti

Re: Missing MapState when Timer fires after restored state

2018-05-16 Thread Juho Autio
to dig it out because currently we also > use the checkpoint like the way you are) ... > > Best, Sihua > > On 05/16/2018 01:46, Juho Autio <juho.au...@rovio.com> wrote: > > I was able to reproduce this error. > > I just happened to

Re: Missing MapState when Timer fires after restored state

2018-05-15 Thread Juho Autio
-console.properties log4j.properties log4j-yarn-session.properties logback-console.xml logback.xml logback-yarn.xml On Tue, May 15, 2018 at 11:49 AM, Stefan Richter <s.rich...@data-artisans.com> wrote: > Hi, > > On 15.05.2018 at 10:34, Juho Autio <juho.au...@rovio.com> wrote

Re: Missing MapState when Timer fires after restored state

2018-05-15 Thread Juho Autio
- files that are written for checkpoints/savepoints. > - completed checkpoints/savepoints ids. > - the restored checkpoint/savepoint id. > - files that are loaded on restore. > > On 15.05.2018 at 10:02, Juho Autio <juho.au...@rovio.com> wrote: > > Thanks all. I'll have to s

Re: Missing MapState when Timer fires after restored state

2018-05-15 Thread Juho Autio
e using incremental checkpoints or not. > Are you using the local recovery feature? Are you restarting the job from a > checkpoint or a savepoint? Can you provide logs for both the job that > failed and the restarted job? > > Best, > Stefan > > > Am 14.05.2018 um 13:00 schrieb

Missing MapState when Timer fires after restored state

2018-05-14 Thread Juho Autio
We have a Flink streaming job (1.5-SNAPSHOT) that uses timers to clear old state. After restoring state from a checkpoint, it seems like a timer had been restored, but not the data that was expected to be in a related MapState if such timer has been added. The way I see this is that there's a
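
A minimal sketch of the pattern being described, assuming a keyed ProcessFunction whose event-time timer clears a MapState; all names, the Event type, and the cleanup delay are illustrative. The reported symptom is that after a restore the timer fires, but the MapState it should find is empty.

    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    public class TimerCleanup extends ProcessFunction<TimerCleanup.Event, TimerCleanup.Event> {
        public static class Event {
            public String id;
            public long timestamp;
        }

        private static final long CLEANUP_DELAY_MS = 60 * 60 * 1000L; // illustrative
        private transient MapState<String, Event> buffer;

        @Override
        public void open(Configuration parameters) {
            buffer = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("buffer", String.class, Event.class));
        }

        @Override
        public void processElement(Event value, Context ctx, Collector<Event> out)
                throws Exception {
            buffer.put(value.id, value);
            // Clear the buffered state some time after the event.
            ctx.timerService().registerEventTimeTimer(value.timestamp + CLEANUP_DELAY_MS);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out)
                throws Exception {
            // After a restore this fires, but the MapState is unexpectedly empty.
            buffer.clear();
        }
    }

    // Used on a keyed stream: stream.keyBy(e -> e.id).process(new TimerCleanup());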

Re: Late data before window end is even close

2018-05-11 Thread Juho Autio
: > > https://gist.github.com/pnowojski/8cd650170925cf35be521cf236f1d97a > > Prints only ONE number to the standard err: > > > 1394 > > And there is nothing on the side output. > > Piotrek > > On 11 May 2018, at 12:32, Juho Autio <juho.au...@rovio.com> w

Re: Late data before window end is even close

2018-05-11 Thread Juho Autio
ould end up in can > also be tricky if records are assigned to multiple windows (e.g., sliding > windows). > In this case, side-outputted records could still be in some windows and > not in others. > > @Aljoscha (CC) Might have an explanation for the current behavior. > > Thanks

Late data before window end is even close

2018-05-11 Thread Juho Autio
I don't understand why I'm getting some data discarded as late on my Flink stream job a long time before the window even closes. I can not be 100% sure, but to me it seems like the kafka consumer is basically causing the data to be dropped as "late", not the window. I didn't expect this to ever
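
One commonly suggested setup for this symptom, though not necessarily the resolution of this thread: assign timestamps and watermarks on the Kafka consumer itself, which makes watermarking per partition, so a fast partition cannot push the watermark past a slower partition's data. The MyEvent type and its field are illustrative.

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

    public class PerPartitionWatermarks {
        public static class MyEvent {
            public long eventTime;
        }

        static void attach(FlinkKafkaConsumer010<MyEvent> consumer) {
            // Assigned on the consumer, watermarks are tracked per Kafka
            // partition and merged, rather than per consumer subtask.
            consumer.assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
                    @Override
                    public long extractTimestamp(MyEvent event) {
                        return event.eventTime;
                    }
                });
        }
    }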

Re: Flink State monitoring

2018-05-11 Thread Juho Autio
Bump this. I can create a ticket if it helps? On Tue, Apr 24, 2018 at 4:47 PM, Juho Autio <juho.au...@rovio.com> wrote: > Anything to add? Is there a Jira ticket for this yet? > > > On Fri, Apr 20, 2018 at 1:03 PM, Stefan Richter < > s.rich...@data-artisans.com> wrote:

Re: Application logs missing from jobmanager log

2018-05-11 Thread Juho Autio
> It seems like good feature request to include the client logs though. > Would you mind opening a JIRA issue for this? > > Thanks, Fabian > > 2018-04-27 11:27 GMT+02:00 Juho Autio <juho.au...@rovio.com>: > >> Ah, found the place! In my case they seem to be going to >

Re: Kafka Consumers Partition Discovery doesn't work

2018-05-11 Thread Juho Autio
ave a code snippet to demonstrate the > configuration for partition discovery. > Could you open a JIRA for that? > > Cheers, > Gordon > > On Tue, Apr 10, 2018, 8:44 AM Juho Autio <juho.au...@rovio.com> wrote: > >> Ahhh looks like I had simply misunderstood where that property

Re: Application logs missing from jobmanager log

2018-04-27 Thread Juho Autio
Ah, found the place! In my case they seem to be going to /home/hadoop/flink-1.5-SNAPSHOT/log/flink-hadoop-client-ip-10-0-10-29.log (for example). Any reason why these can't be shown in Flink UI, maybe in jobmanager log? On Fri, Apr 27, 2018 at 12:13 PM, Juho Autio <juho.au...@rovio.com>

Application logs missing from jobmanager log

2018-04-27 Thread Juho Autio
The logs logged by my job jar before env.execute can't be found in jobmanager log. I can't find them anywhere else either. I can see all the usual logs by Flink components in the jobmanager log, though. And in taskmanager log I can see both Flink's internal & my application's logs from the

Re: Flink State monitoring

2018-04-24 Thread Juho Autio
he estimates of > RocksDB are helpful or could be misleading. > > > Am 20.04.2018 um 11:59 schrieb Juho Autio <juho.au...@rovio.com>: > > Thanks. At least for us it doesn't matter how exact the number is. I would > expect most users to be only interested in monitoring if the to

Re: Flink State monitoring

2018-04-20 Thread Juho Autio
Thanks. At least for us it doesn't matter how exact the number is. I would expect most users to be only interested in monitoring if the total state size keeps growing (rapidly), or remains about the same. I suppose all of the options that you suggested would satisfy this need? On Fri, Apr 20,

Re: Flink State monitoring

2018-04-20 Thread Juho Autio
Hi Aljoscha & co., Is there any way to monitor the state size yet? Maybe a ticket in Jira? When using incremental checkpointing, the total state size can't be seen anywhere. For example the checkpoint details only show the size of the increment. It would be nice to add the total size there as

Re: Kafka topic partition skewness causes watermark not being emitted

2018-04-17 Thread Juho Autio
A possible workaround while waiting for FLINK-5479, if someone is hitting the same problem: we chose to send "heartbeat" messages periodically to all topics & partitions found on our Kafka. We do that through the service that normally writes to our Kafka. This way every partition always has some

Re: Kafka consumer to sync topics by event time?

2018-04-16 Thread Juho Autio
wrote: > Fully agree Juho! > > Do you want to contribute the docs fix? > If yes, we should update FLINK-5479 to make sure that the warning is > removed once the bug is fixed. > > Thanks, Fabian > > 2018-04-12 9:32 GMT+02:00 Juho Autio <juho.au...@rovio.com>: > >

Re: Kafka consumer to sync topics by event time?

2018-04-12 Thread Juho Autio
ess using watermarks? > > 2017-12-04 14:42 GMT+01:00 Juho Autio <juho.au...@rovio.com>: > >> Thank you Fabian. Really clear explanation. That matches with my >> observation indeed (data is not dropped from either small or big topic, but >> the offsets are advancing

Re: Allowed lateness + side output late data, what's included?

2018-04-11 Thread Juho Autio
Thanks! On Wed, Apr 11, 2018 at 12:59 PM, Chesnay Schepler <ches...@apache.org> wrote: > Data that arrives within the allowed lateness should not be written to the > side output. > > > On 11.04.2018 11:12, Juho Autio wrote: > > If I use a non-zero value f

Allowed lateness + side output late data, what's included?

2018-04-11 Thread Juho Autio
If I use a non-zero value for allowedLateness and also sideOutputLateData, does the late data output contain also the events that were triggered in the bounds of allowed lateness? By looking at the docs I can't be sure which way it is. Code example: .timeWindow(Time.days(1))
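
The code example in the message is truncated; a hedged reconstruction of the shape of the setup, with the answer from the reply above folded into the comments (records within the allowed lateness update the window result and do not reach the side output). Names are illustrative.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.OutputTag;

    public class LateDataSketch {
        static DataStream<String> lateOutput(DataStream<String> events) {
            final OutputTag<String> lateTag = new OutputTag<String>("late-data") {};
            SingleOutputStreamOperator<String> result = events
                .keyBy(v -> v)
                .timeWindow(Time.days(1))
                .allowedLateness(Time.hours(1)) // within lateness: window output, not side output
                .sideOutputLateData(lateTag)    // only records beyond the lateness land here
                .reduce((a, b) -> a);
            return result.getSideOutput(lateTag);
        }
    }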

Re: Kafka Consumers Partition Discovery doesn't work

2018-04-10 Thread Juho Autio
ake. The docs are clear though, I just had become blind to this detail as I thought I had already read it. On Thu, Apr 5, 2018 at 10:26 AM, Juho Autio <juho.au...@rovio.com> wrote: > Still not working after I had a fresh build from https://github.com/apache/flink/tree/release-1.5

Re: REST API "broken" on YARN because POST is not allowed via YARN proxy

2018-04-05 Thread Juho Autio
s.apache.org/jira/browse/YARN-2084 > > Cheers, > Till > > On Wed, Apr 4, 2018 at 4:31 PM, Fabian Hueske <fhue...@gmail.com> wrote: > >> Hi Juho, >> >> Thanks for raising this point! >> >> I'll add Chesnay and Till to the thread who contributed to

Re: Kafka Consumers Partition Discovery doesn't work

2018-04-05 Thread Juho Autio
> anything about discovering a new partition. We should probably add this. > > And yes, it would be great if you can report back on this using either the > latest master, release-1.5 or release-1.4 branches. > > On 22 March 2018 at 10:24:09 PM, Juho Autio (juho.au...@rovio.com) wrote: >

REST API "broken" on YARN because POST is not allowed via YARN proxy

2018-04-04 Thread Juho Autio
I just learned that Flink savepoints API was refactored to require using HTTP POST. That's fine otherwise, but makes life harder when Flink is run on top of YARN. I've added example calls below to show how POST is declined by the hadoop-yarn-server-web-proxy*, which only supports GET and PUT.

Re: cancel-with-savepoint: 404 Not Found

2018-04-04 Thread Juho Autio
:port/jobs/:jobid/savepoints -d '{"cancel-job": true}' > > Let me know if it works for you. > > Best, > Gary > > On Thu, Mar 29, 2018 at 10:39 AM, Juho Autio <juho.au...@rovio.com> wrote: > >> Thanks Gary. And what if I want to match the old behaviour ie

Re: All sub-tasks stuck in CREATED state after "BlobServerConnection: GET operation failed"

2018-03-29 Thread Juho Autio
Sorry, my bad. I checked the persisted jobmanager logs and can see that job was still being restarted at 15:31 and then at 15:36. If I wouldn't have terminated the cluster, I believe the flink job / yarn app would've eventually exited as failed. On Thu, Mar 29, 2018 at 4:49 PM, Juho Autio

Re: All sub-tasks stuck in CREATED state after "BlobServerConnection: GET operation failed"

2018-03-29 Thread Juho Autio
> minutes. > > As a side note, beginning from Flink 1.5, you do not need to specify -yn > -ys > because resource allocations are dynamic by default (FLIP-6). The > parameter -yst > is deprecated and should not be needed either. > > Best, > Gary > > On Thu, Mar 29, 2018

Re: cancel-with-savepoint: 404 Not Found

2018-03-29 Thread Juho Autio
] https://issues.apache.org/jira/browse/FLINK-9104 > [4] https://github.com/apache/flink/blob/release-1.5/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L59 > > On Thu, Mar 29, 2018 at 10:04 AM, Juho Autio <juho.au...@

cancel-with-savepoint: 404 Not Found

2018-03-29 Thread Juho Autio
With a fresh build from release-1.5 branch, calling /cancel-with-savepoint fails with 404 Not Found. The snapshot docs still mention /cancel-with-savepoint: https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#cancel-job-with-savepoint 1. How can I achieve the same

All sub-tasks stuck in CREATED state after "BlobServerConnection: GET operation failed"

2018-03-29 Thread Juho Autio
I built a new Flink distribution from release-1.5 branch yesterday. The first time I tried to run a job with it, it ended up in some stalled state: the job didn't manage to (re)start, but what makes it worse is that it didn't exit as failed either. Next time I tried running the same job (but

Re: NoClassDefFoundError for jersey-core on YARN

2018-03-29 Thread Juho Autio
Never mind, I'll post this new problem as a new thread. On Wed, Mar 28, 2018 at 6:35 PM, Juho Autio <juho.au...@rovio.com> wrote: > Thank you. The YARN job was started now, but the Flink job itself is in > some bad state. > > Flink UI keeps showing status CREATED for all sub

Re: NoClassDefFoundError for jersey-core on YARN

2018-03-28 Thread Juho Autio
> Best, > Gary > > [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/hadoop.html#configuring-flink-with-hadoop-classpaths > > On Wed, Mar 28, 2018 at 4:26 PM, Juho Autio <juho.au...@rovio.com> wrote: > >> I built a new Flink distribution

NoClassDefFoundError for jersey-core on YARN

2018-03-28 Thread Juho Autio
I built a new Flink distribution from release-1.5 branch today. I tried running a job but get this error: java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties I use yarn-cluster mode. The jersey-core jar is found in the hadoop lib on my EMR cluster, but seems like it's

Re: Kafka Consumers Partition Discovery doesn't work

2018-03-22 Thread Juho Autio
y this issue: > https://issues.apache.org/jira/browse/FLINK-8419 > > This issue should have been fixed in the recently released 1.4.2 version. > > Cheers, > Gordon > > On 22 March 2018 at 8:02:40 PM, Juho Autio (

Kafka Consumers Partition Discovery doesn't work

2018-03-22 Thread Juho Autio
According to the docs*, flink.partition-discovery.interval-millis can be set to enable automatic partition discovery. I'm testing this, but apparently it doesn't work. I'm using Flink Version: 1.5-SNAPSHOT Commit: 8395508 and FlinkKafkaConsumer010. I had my flink stream running, consuming an
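
A sketch of enabling partition discovery as the docs describe, assuming FlinkKafkaConsumer010; the detail resolved later in the thread (see the 2018-04-10 reply above) is that the property belongs in the consumer Properties, not in flink-conf.yaml. Interval, topic, group, and broker names are illustrative.

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

    public class PartitionDiscoverySketch {
        static FlinkKafkaConsumer010<String> consumer() {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092"); // illustrative
            props.setProperty("group.id", "my-group");            // illustrative
            // Goes into the consumer properties, not flink-conf.yaml:
            props.setProperty("flink.partition-discovery.interval-millis", "10000");
            return new FlinkKafkaConsumer010<>(
                "my-topic", new SimpleStringSchema(), props);
        }
    }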

Re: SQL Table API: Naming operations done in query

2018-03-16 Thread Juho Autio
Hi, has there been any changes to state handling with Flink SQL? Anything planned? I didn't find it at https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html. Recently I ran into problems when trying to restore the state after changes that I thought wouldn't change the

Adding a new field in java class -> restore fails with "KryoException: Unable to find class"

2018-03-16 Thread Juho Autio
Is it possible to add new fields to the object type of a stream, and then restore from savepoint? I tried to add a new field "private String" to my java class. It previously had "private String" and a "private final Map". When trying to restore an old savepoint after this code

Re: Unexpected hop start & end timestamps after stream SQL join

2018-02-27 Thread Juho Autio
ts in real-time. If the batching is not required, you should be good > by adding a filter on occurrence = 1. > Otherwise, you could add the filter and wrap it by 10 secs tumbling window. > > Hope this helps, > Fabian > > > 2018-02-14 15:30 GMT+01:00 Juho Autio <juho.au...@r

Unexpected hop start & end timestamps after stream SQL join

2018-02-14 Thread Juho Autio
I'm joining a tumbling & hopping window in Flink 1.5-SNAPSHOT. The result is unexpected. Am I doing something wrong? Maybe this is just not a supported join type at all? Anyway, here goes: I first register these two tables: 1. new_ids: a tumbling window of seen ids within the last 10 seconds:
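
The table definitions in the message are truncated; a hedged sketch of the first one as described, a 10-second tumbling window of seen ids, using the group-window SQL of that era. Table and column names are illustrative.

    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    public class NewIdsSketch {
        // Assumes an "events" table with an `id` column and a rowtime attribute.
        static void registerNewIds(StreamTableEnvironment tableEnv) {
            Table newIds = tableEnv.sqlQuery(
                "SELECT id, " +
                "       TUMBLE_START(rowtime, INTERVAL '10' SECOND) AS window_start, " +
                "       TUMBLE_END(rowtime, INTERVAL '10' SECOND)   AS window_end " +
                "FROM events " +
                "GROUP BY id, TUMBLE(rowtime, INTERVAL '10' SECOND)");
            tableEnv.registerTable("new_ids", newIds);
        }
    }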

Trigger Time vs. Latest Acknowledgement

2018-01-29 Thread Juho Autio
I'm triggering nightly savepoints at 23:59:00 with crontab on the flink cluster. For example last night's savepoint has this information: Trigger Time: 23:59:14 Latest Acknowledgement: 00:00:59 What are the min/max boundaries for the data contained by the savepoint? Can I deduce from this

Re: SideOutput doesn't receive anything if filter is applied after the process function

2018-01-15 Thread Juho Autio
serve a JIRA issue. > > > On 15.01.2018 14:09, Juho Autio wrote: > > Thanks for the explanation. Did you meant that process() would return a > SingleOutputWithSideOutputOperator? > > Any way, that should be enough to avoid the problem that I hit (and it > also seems like t

Re: SideOutput doesn't receive anything if filter is applied after the process function

2018-01-15 Thread Juho Autio
would have to explicitly define the code as below, which makes the > behavior unambiguous: > > processed = stream > .process(...) > > filtered = processed > .filter(...) > > filteredSideOutput = processed > .getSideOutput(...) > .filter(...) > >

Re: SideOutput doesn't receive anything if filter is applied after the process function

2018-01-15 Thread Juho Autio
ich one should get resource and how much > sharing same hardware resources, I guess it will open gate to lots of edge > cases -> strategies-> more edge cases :) > > Chen > > On Fri, Jan 12, 2018 at 6:34 AM, Juho Autio <juho.au...@rovio.com> wrote: > >> Maybe I

Re: Restoring 1.3.1 savepoint in 1.4.0 fails in TimerService

2018-01-12 Thread Juho Autio
Thanks, the window operator is just: .timeWindow(Time.seconds(10)) We haven't changed key types. I tried debugging this issue in IDE and found the root cause to be this: !this.keyDeserializer.equals(keySerializer) -> true => throw new IllegalStateException("Tried to initialize restored

SideOutput doesn't receive anything if filter is applied after the process function

2018-01-12 Thread Juho Autio
When I run the code below (Flink 1.4.0 or 1.3.1), only "a" is printed. If I switch the position of .process() & .filter() (ie. filter first, then process), both "a" & "b" are printed, as expected. I guess it's a bit hard to say what the side output should include in this case: the stream before
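
A condensed sketch of the reported behavior, assuming the 1.4-era side-output API: the side output has to be taken from the result of process() itself, because filter() creates a new operator with no side outputs of its own. The ProcessFunction body is illustrative.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class SideOutputSketch {
        static void demo(DataStream<String> stream) {
            final OutputTag<String> sideTag = new OutputTag<String>("side") {};
            SingleOutputStreamOperator<String> processed = stream.process(
                new ProcessFunction<String, String>() {
                    @Override
                    public void processElement(String value, Context ctx,
                                               Collector<String> out) {
                        out.collect("a");         // main output
                        ctx.output(sideTag, "b"); // side output
                    }
                });

            processed.getSideOutput(sideTag).print(); // receives "b"

            // The surprise in this thread: taking the side output after a
            // downstream filter() silently yields an empty stream.
            processed.filter(v -> true).getSideOutput(sideTag).print();
        }
    }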

Re: Kafka consumer to sync topics by event time?

2017-12-04 Thread Juho Autio
pic is still catching up. > The watermarks are moving at the speed of the bigger topic, but all > "early" events of the smaller topic are stored in stateful operators and > are checkpointed as well. > > So, you lose neither early nor late data. > > Best, F

Re: Kafka consumer to sync topics by event time?

2017-12-01 Thread Juho Autio
Thanks for the answers. I still don't understand why I can see the offsets being quickly committed to Kafka for the "small topic"? Are they committed to Kafka before their watermark has passed on Flink's side? That would be quite confusing.. Indeed when Flink handles the state/offsets internally,

Kafka consumer to sync topics by event time?

2017-11-22 Thread Juho Autio
I would like to understand how FlinkKafkaConsumer treats "unbalanced" topics. We're using FlinkKafkaConsumer010 with 2 topics, say "small_topic" & "big_topic". After restoring from an old savepoint (4 hours before), I checked the consumer offsets on Kafka (Flink commits offsets to kafka for

Flink config as an argument (akka.client.timeout)

2016-05-25 Thread Juho Autio
Is there any way to set akka.client.timeout (or other flink config) when calling bin/flink run instead of editing flink-conf.yaml? I tried to add it as a -yD flag but couldn't get it working. Related: https://issues.apache.org/jira/browse/FLINK-3964

Re: Dynamic partitioning for stream output

2016-05-25 Thread Juho Autio
Related issue: https://issues.apache.org/jira/browse/FLINK-2672 On Wed, May 25, 2016 at 9:21 AM, Juho Autio <juho.au...@rovio.com> wrote: > Thanks, indeed the desired behavior is to flush if bucket size exceeds a > limit but also if the bucket has been open long enough. Contrary to t

Re: writeAsCSV with partitionBy

2016-05-25 Thread Juho Autio
RollingSink is part of Flink Streaming API. Can it be used in Flink Batch jobs, too? As implied in FLINK-2672, RollingSink doesn't support dynamic bucket paths based on the tuple fields. The path must be given when creating the RollingSink instance, ie. before deploying the job. Yes, a custom

Re: Dynamic partitioning for stream output

2016-05-25 Thread Juho Autio
tition by date, > you mean the date of the event, right? Not the processing time. > > Kostas > > > On May 24, 2016, at 1:22 PM, Juho Autio <juho.au...@rovio.com> wrote: > > > > Could you suggest how to dynamically partition data with Flink streaming? > > > > We'

Dynamic partitioning for stream output

2016-05-24 Thread Juho Autio
Could you suggest how to dynamically partition data with Flink streaming? We've looked at RollingSink, that takes care of writing batches to S3, but it doesn't allow defining the partition dynamically based on the tuple fields. Our data is coming from Kafka and essentially has the kafka topic
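
This limitation was later addressed: the BucketingSink that replaced RollingSink (it appears in the 2018 threads above) exposes an element-aware Bucketer. A sketch against that later API, with an illustrative record type:

    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.connectors.fs.Clock;
    import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer;
    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

    // Bucket path derived from each record, e.g. its Kafka topic + event date.
    public class TopicDateBucketer implements Bucketer<TopicDateBucketer.Record> {
        public static class Record {
            public String topic;
            public String date;
        }

        @Override
        public Path getBucketPath(Clock clock, Path basePath, Record element) {
            return new Path(basePath, element.topic + "/date=" + element.date);
        }
    }

    // BucketingSink<TopicDateBucketer.Record> sink =
    //     new BucketingSink<>("s3://bucket/out");
    // sink.setBucketer(new TopicDateBucketer());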

Re: FlinkKafkaConsumer bootstrap.servers vs. broker hosts

2015-10-15 Thread Juho Autio
Hi, Don't worry – this is quite a low priority question. Definitely not a production issue and as a work around it can be fixed rather easily with suitable network setup. Probably quite rare, too, that this kind of network scenario happens with anyone. But I think that it might be possible to