Hello Fabian,
I created the JIRA bug https://issues.apache.org/jira/browse/FLINK-9940
BTW, I have one more question: is it worth checkpointing that list of
processed files? Does the current implementation of the file source
guarantee exactly-once?
Thanks for your support.
Thank you Fabian.
I tried to implement a quick test based on what you suggested: using an
offset from system time, and I did get an improvement: with offset = 500ms,
the problem was completely gone. With offset = 50ms, I still had around 3-5
files missed out of 10,000. This number might come
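For readers following the thread, the offset workaround described above can be sketched as a simple readiness check. This is only an illustration, not Flink API; `ReadinessCheck`, `isReady`, and the parameter names are invented for the sketch:

```java
// Sketch of the "offset from system time" workaround: a file is considered
// ready for ingestion only once its modification time is at least offsetMs
// in the past, giving an eventually consistent listing (e.g. S3) time to
// settle before the file source picks the file up.
class ReadinessCheck {
    static boolean isReady(long modTimeMs, long nowMs, long offsetMs) {
        return modTimeMs <= nowMs - offsetMs;
    }
}
```

A larger offset trades ingestion latency for fewer missed files, which matches the 500ms vs. 50ms observation above.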
Hi all,
I have a standalone cluster with 3 jobmanagers and high-availability set to
zookeeper. Our client submits jobs via the REST API (POST /jars/:jarid/run),
which means we need to know the host of any of the currently alive
jobmanagers. The problem is: how can we know which job manager is
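One client-side workaround for the question above is to probe each candidate jobmanager's REST endpoint and submit to the first one that answers. The following is only a sketch under that assumption; `JobManagerLocator` and the injected `isAlive` probe (which would normally issue e.g. a GET against `http://host:8081/overview`) are illustrative names, not Flink API:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Client-side sketch: given the known jobmanager hosts of an HA cluster,
// return the first one whose REST endpoint responds. The isAlive probe is
// injected so the selection logic stays independent of the HTTP client used.
class JobManagerLocator {
    static Optional<String> findAliveJobManager(List<String> hosts,
                                                Predicate<String> isAlive) {
        return hosts.stream().filter(isAlive).findFirst();
    }
}
```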
Hi Cederic,
I just looked at the project you mentioned; its README file includes the
following statement:
*“flink-jpmml is tested with the latest Flink (i.e. 1.3.2), but any working
Apache Flink version (repo) should work properly.”*
This project was born a year ago and should not rely on
Hi Alex,
Is it possible that the data has been corrupted?
Or have you confirmed that the Avro version is consistent across the
different Flink versions?
Also, if you don't upgrade Flink and still use version 1.3.1, can it be
recovered?
Thanks, vino.
2018-07-25 8:32 GMT+08:00 Alex Vinnik :
> Vino,
>
Vino,
Upgraded Flink to Hadoop 2.8.1:
$ docker exec -it flink-jobmanager cat /var/log/flink/flink.log | grep
entrypoint | grep 'Hadoop version'
2018-07-25T00:19:46.142+
[org.apache.flink.runtime.entrypoint.ClusterEntrypoint] INFO Hadoop
version: 2.8.1
but job still fails to start
Ideas?
Dear
I am working on a streaming prediction model for which I want to try to use
the flink-jpmml extension. (https://github.com/FlinkML/flink-jpmml)
Unfortunately, it supports only the 0.7.0-SNAPSHOT and 0.6.1 versions of
Flink, and I am using the 1.7-SNAPSHOT version of Flink.
How can I
The previous list excluded a number of dependencies to prevent clashes
with Flink (for example netty), which is no longer required.
If you could provide the output of "mvn dependency:tree" we might be
able to figure out why the jar is larger.
On 24.07.2018 20:49, jlist9 wrote:
We started out with a sample project from an earlier version of flink-java.
The sample project's pom.xml contained a long list of elements
for building the fat jar. The fat jar size is slightly over 100MB in our
case.
We are looking to upgrade to Flink 1.5 so we updated the pom.xml using one
Hi there,
I am using the Avro format and writing data to S3. I recently upgraded from
Flink 1.3.2 to 1.5 and noticed the errors below:
I am using RocksDB and checkpointDataUri is an S3 location.
My writer looks like something below.
val writer = new AvroKeyValueSinkWriter[String,
The app is checkpointing, so it will pick up if an operation fails. I suppose
you mean a TM completely crashes; even in that case another TM would spin up
and it "should" pick up from the checkpoint. We are running YARN, but I would
assume TM recovery would be possible in any other cluster. I haven't
Hi Jayant,
I think you should be able to implement your own StaticLoggerBinder which
returns your own LoggerFactory. That is quite similar to how the different
logging backends (log4j, logback) integrate with slf4j.
Cheers,
Till
On Tue, Jul 24, 2018 at 5:41 PM Jayant Ameta wrote:
> I am using
What happens when one of your workers dies? Say the machine is dead and not
recoverable. How do you recover from that situation? Will the pipeline die
and will you go through the entire bootstrap process again?
On Tue, Jul 24, 2018 at 11:56 ashish pok wrote:
> BTW,
>
> We got around bootstrap problem for
BTW,
We got around the bootstrap problem for a similar use case using a "nohup"
topic as the input stream. Our CICD pipeline currently passes an initialize
option to the app IF there is a need to bootstrap, and waits for X minutes
before taking a savepoint and restarting the app normally, listening to the
right
I am using a custom LoggingFactory. Is there a way to provide access to
this custom LoggingFactory to all the operators other than adding it to all
constructors?
This is somewhat related to:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Adding-Context-To-Logs-td7351.html
Alas, this suffers from the bootstrap problem. At the moment Flink does not
allow you to pause a source (the positions), so you can't fully consume and
preload the accounts or products to perform the join before the positions
start flowing. Additionally, Flink SQL does not support
Hi Alex,
Based on your log information, the likely cause is the Hadoop version. To
rule out an exception caused by mismatched Hadoop versions, I suggest you
align the Hadoop versions on both sides.
You can:
1. Upgrade the Hadoop version that the Flink cluster depends on; Flink's
official
Hi Till,
Thanks for responding. Below are the entrypoint logs. One thing I noticed is
"Hadoop version: 2.7.3". My fat jar is built with the Hadoop 2.8 client.
Could that be the reason for the error? If so, how can I use the same Hadoop
version (2.8) on the Flink server side? BTW, the job runs fine locally
reading from
Yes, using Kafka which you initialize with the initial values and then feed
changes to the Kafka topic from which you consume could be a solution.
On Tue, Jul 24, 2018 at 3:58 PM Harshvardhan Agrawal <
harshvardhan.ag...@gmail.com> wrote:
> Hi Till,
>
> How would we do the initial hydration of
Hi Till,
How would we do the initial hydration of the Product and Account data, since
it’s currently in a relational DB? Do we have to copy the data over to Kafka
and then use it from there?
Regards,
Harsh
On Tue, Jul 24, 2018 at 09:22 Till Rohrmann wrote:
> Hi Harshvardhan,
>
> I agree with Ankit that
Dear community,
this is the weekly community update thread #30. Please post any news and
updates you want to share with the community to this thread.
# First RC for Flink 1.6.0
The community has published the first release candidate for Flink 1.6.0 [1].
Please help the community by trying the RC
Hi Harshvardhan,
I agree with Ankit that this problem could actually be solved quite
elegantly with Flink's state. If you can ingest the product/account
information changes as a stream, you can keep the latest version of it in
Flink state by using a co-map function [1, 2]. One input of the co-map
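Stripped of the Flink API, the co-map enrichment pattern described above boils down to the following toy sketch (the class and method names are invented for illustration; in a real job the map below would be Flink keyed state and the two methods would correspond to `CoMapFunction.map1`/`map2`):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of enrichment via a co-map: one input carries reference-data
// updates, the other carries events that are enriched with the latest
// reference value seen so far for their key.
class EnrichState {
    private final Map<String, String> reference = new HashMap<>();

    void onReferenceUpdate(String key, String value) {  // first input
        reference.put(key, value);
    }

    String onEvent(String key) {                        // second input
        String ref = reference.get(key);
        return ref == null ? key + "=<no reference yet>" : key + "=" + ref;
    }
}
```

Note the bootstrap caveat raised elsewhere in this thread: events arriving before the reference data has been consumed see no reference value yet.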
Hi,
The problem is that Flink tracks which files it has read by remembering the
modification time of the file that was added (or modified) last.
We use the modification time to avoid having to remember the names of all
files that were ever consumed, which would be expensive to check and
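The bookkeeping described above can be modeled in a few lines. This is a toy sketch of the idea, not the actual source code: only files strictly newer than the largest modification time seen so far are consumed, which is exactly why a file that becomes visible late, with an older timestamp (as can happen on an eventually consistent store), is silently skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of the file source's tracking: remember only the largest
// modification time consumed so far and pick up files strictly newer than
// it. A file that shows up in a later listing with an older timestamp is
// never picked up.
class ModTimeTracker {
    private long maxSeenModTime = Long.MIN_VALUE;

    List<String> pickUp(Map<String, Long> listing) {  // file name -> mod time
        List<String> selected = new ArrayList<>();
        long newMax = maxSeenModTime;
        for (Map.Entry<String, Long> e : listing.entrySet()) {
            if (e.getValue() > maxSeenModTime) {
                selected.add(e.getKey());
                newMax = Math.max(newMax, e.getValue());
            }
        }
        maxSeenModTime = newMax;
        return selected;
    }
}
```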
Hello Jörn.
Thanks for your help.
"Probably the system is putting them to the folder and Flink is triggered
before they are consistent." <<< yes, I also guess so. However, if Flink is
triggered before they are consistent, then either (a) there should be some
error messages, or (b) Flink should be
Hi Oliver,
which Flink image are you using? If you are using the docker image from
docker hub [1], then the memory logging will go to stdout and not to a log
file. The reason for this behavior is that the docker image configures the
logger to print to stdout such that one can easily access the
Hi Chris,
a `DataStream` represents a stream of events which have the same type. A
`SingleOutputStreamOperator` is a subclass of `DataStream` and represents a
user defined transformation applied to an input `DataStream` and producing
an output `DataStream` (represented by itself). Since you can
Hi Chang Liu,
if you are dealing with an unlimited number of keys and keep state around
for every key, then your state size will keep growing with the number of
keys. If you are using the FileStateBackend which keeps state in memory,
you will eventually run into an OutOfMemoryException. One way
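As a purely illustrative sketch of bounding such state on the application side (this is not a Flink feature and not what the reply above suggests; in Flink you would rather switch state backends or clean state up explicitly), an access-ordered `LinkedHashMap` gives LRU eviction:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU-bounded map: once capacity is exceeded, the least recently accessed
// key is evicted, so memory stays bounded even with an unlimited key space
// (at the cost of forgetting cold keys).
class BoundedState<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedState(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder=true enables LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```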
Dear Cederic,
I did something similar to yours a while ago in this work [1], but I've
always been working within the batch context. I'm also a co-author of
flink-jpmml and, since a flink2pmml model-saver library doesn't currently
exist, I'd suggest a twofold strategy to tackle this
Hi Syed,
could you check whether this class is actually contained in the twitter
example jar? If not, then you have to build an uber jar containing all
required dependencies.
Cheers,
Till
On Tue, Jul 24, 2018 at 5:11 AM syed wrote:
> I am facing the *java.lang.NoClassDefFoundError:
>
Hi Alex,
I'm not entirely sure what causes this problem because it is the first time
I see it.
First question would be if the problem also arises if using a different
Hadoop version.
Are you using the same Java versions on the client as well as on the server?
Could you provide us with the
Hi,
Thanks for your responses.
There is no fixed interval for the data being updated. It’s more that
onboarding a new product, or any mandate change, will trigger the reference
data to change.
It’s not just the enrichment we are doing here. Once we have enriched the
data
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
You will find there a passage on the consistency model.
Probably the system is putting them into the folder and Flink is triggered
before they are consistent.
What happens after Flink puts them on S3? Are they reused by another
Hi Gerard,
the first log snippet from the client does not show anything suspicious.
The warning just says that you cannot use the Yarn CLI because it lacks the
Hadoop dependencies in the classpath.
The second snippet is indeed more interesting. If the TaskExecutors are not
notified about the
Hi everyone,
We are using a simple Flink setup with one jobmanager and one taskmanager
running inside a docker container. We are having issues enabling the
*taskmanager.debug.memory.startLogThread* setting. We added
*taskmanager.debug.memory.startLogThread: true*
One option (which I haven't tried myself) would be to somehow get the model
into PMML format, and then use https://github.com/FlinkML/flink-jpmml to
score the model. You could either use another machine learning framework to
train the model (i.e., a framework that directly supports PMML export),
I'm trying to get a list of late elements in my tumbling-windows application
and I noticed that I need to use SingleOutputStreamOperator instead of
DataStream to get access to the .sideOutputLateData(...) method.
Can someone explain what the difference is between
SingleOutputStreamOperator and
Could you please explain in more detail what you mean by "try read after
write consistency (assuming the files are not modified)"?
I guess that the problem I got comes from inconsistency in S3 file listing.
Otherwise, I would have got file-not-found exceptions.
My use case is to read output
Sure, Kinesis is another way.
Can you try read-after-write consistency (assuming the files are not
modified)? In any case, it looks like you would be better suited with a
NoSQL store or Kinesis. (I don’t know your exact use case, so I can’t
provide more details.)
> On 24. Jul 2018, at 09:51, Averell
Dear All,
I have questions regarding keys. In general: what happens if I do keyBy on
an unlimited number of keys? How does Flink manage each KeyedStream under
the hood? Will I get a memory overflow, for example, if every KeyedStream
associated with a specific key is
How often is the product DB updated? Based on that, you can store product
metadata as state in Flink: maybe set up the state on cluster startup and
then update it daily, etc.
Also, just based on this feature, Flink doesn’t seem to add a lot of value
on top of Kafka. As Jörn said below, you can very
Just an update: I tried to enable the "EMRFS Consistent View" option, but it
didn't help. Not sure whether that's what you recommended, or something
else.
Thanks!
Hi Darshan,
The join implementation in SQL / Table API does what is demanded by the SQL
semantics.
Hence, what results to emit and also what data to store (state) to compute
these results is pretty much given.
You can think of the semantics of the join as writing both streams into a
relational
Hi Jörn,
Thanks. I had missed that EMRFS strong consistency configuration. Will try
that now.
We also had a backup solution - using Kinesis instead of S3 (I don't see
Kinesis in your suggestion, but hope that it would be alright).
"The small size and high rate is not suitable for S3 or HDFS"
It could be related to S3, which seems to be configured for eventual
consistency. Maybe it helps to configure strong consistency.
However, I recommend replacing S3 with a NoSQL database (since you are on
Amazon, Dynamo would help, plus DynamoDB Streams; alternatively SNS or SQS).
The small size and
Hi Darshan,
In your use case, I think you can implement the outer join with the
DataStream API (using State + ProcessFunction + Timer). With suitable state,
you can store 1 value per key and do not need to keep every key's full value
history.
You can also refer to Flink's implementation of