is this? Specific operator latency? Because the end-to-end latency is between roughly 50 ms and 370 ms.
I was just curious how latency is seen from a different perspective; it would really help my understanding.
Thanks a lot,
Biplob
Thanks & Regards
Biplob Biswas
On Mon, Oct 30, 2017 at 8:53 AM, S
Hi Hayden,
From what I know, "No KvStateLocation found for KvState instance with name
'word_sums'" means exactly what it says: your current job can't find the
KvState instance. This could be due to a few reasons that I know of:
1. The jobID you supplied for the query client job is not equal to
Hi,
are you sure your JobManager is running and is accessible from the supplied
hostname and port? If you can start up the Flink UI of the job which creates
your queryable state, it should have the details of the JobManager and the
port to be used in this queryable client job.
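For reference, a minimal sketch of wiring those JobManager coordinates into a query client, following the Flink 1.3-style API where the client talks to the JobManager directly; the host and port are placeholders to be taken from the UI:

    Configuration config = new Configuration();
    config.setString(JobManagerOptions.ADDRESS, "jobmanager-host"); // placeholder host
    config.setInteger(JobManagerOptions.PORT, 6123);                // default RPC port
    QueryableStateClient client = new QueryableStateClient(config); // throws Exception

Queries are then issued via client.getKvState(...), passing the JobID of the running job and the state name it registered, e.g. "word_sums" above.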
Change the type of the main stream from DataStream to
SingleOutputStreamOperator. The getSideOutput() method is not part of the
base class DataStream but of the extended class SingleOutputStreamOperator.
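For illustration, a minimal sketch of the pattern (the tag name, types, and predicate are invented):

    final OutputTag<String> rejectedTag = new OutputTag<String>("rejected") {};

    SingleOutputStreamOperator<Integer> mainStream = input
        .process(new ProcessFunction<Integer, Integer>() {
            @Override
            public void processElement(Integer value, Context ctx, Collector<Integer> out) {
                if (value >= 0) {
                    out.collect(value);                       // main output
                } else {
                    ctx.output(rejectedTag, "neg: " + value); // side output
                }
            }
        });

    DataStream<String> rejected = mainStream.getSideOutput(rejectedTag);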
How are you determining that your data is stale? Also, if you want to know the
key, why don't you store the key in your state as well?
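For illustration, one hedged way to do that is to keep the key next to the value in the keyed state (names and types here are invented):

    ValueStateDescriptor<Tuple2<String, Long>> descriptor =
        new ValueStateDescriptor<>(
            "last-update", // placeholder name
            TypeInformation.of(new TypeHint<Tuple2<String, Long>>() {}));
    // state.value().f0 is then the key, f1 e.g. the last-update timestamp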
I just came across this slide deck about Drizzle where they claim to achieve
sub-millisecond latency and compare with Flink:
https://www.slideshare.net/SparkSummit/drizzlelow-latency-execution-for-apache-spark-spark-summit-east-talk-by-shivaram-venkataraman
The normal Drizzle still performs a
Hi,
I am still stuck here and still haven't found a way to make Avro accept
null values.
Any help here would be really appreciated.
Thanks,
Biplob
Hi Till,
Thanks for the response. I was assuming that the AvroSerializer would
create a corresponding Avro schema from the object class I provide. To that
end, I did the following:
AvroSerializer<TransactionStateModel> txnAvroSerde =
    new AvroSerializer<>(TransactionStateModel.class);
ValueStateDescriptor
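The preview cuts off at the descriptor; a sketch of how the two pieces would typically fit together inside a keyed RichFunction (the descriptor name is a placeholder, and the AvroSerializer package differs across Flink versions):

    AvroSerializer<TransactionStateModel> txnAvroSerde =
        new AvroSerializer<>(TransactionStateModel.class);

    ValueStateDescriptor<TransactionStateModel> txnStateDescriptor =
        new ValueStateDescriptor<>("txn-state", txnAvroSerde); // placeholder name

    ValueState<TransactionStateModel> txnState =
        getRuntimeContext().getState(txnStateDescriptor);      // e.g. in open()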
Hi,
I am getting the following exception in my code. I can observe that there's
something wrong while serializing my object, the class of which looks
something like this:
https://gist.github.com/revolutionisme/1eea5ccf5e1d4a5452f27a1fd5c05ff1
The exact cause, it seems, is some field inside my
Thanks a lot Gordon, that really helps a lot. :) One last thing: is there any
way to verify that an object has been serialized with a specific serializer,
other than trying to deserialize it with a different deserializer and failing?
Can anyone please shed some light on this?
Hi Basanth,
AFAIK, CEP works like session windows: a session is started for each
event which comes in and expires at the end of the time limit.
Technically the count is kept separately for each event, so there's no
reset. For example, if you have 6 events,
1,2,3,4,5,6 (and they arrive in order
I am not really sure you can do that out of the box; if not, it should
indeed be possible in the near future.
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#handling-serializer-upgrades-and-compatibility
There are already plans for state migration (with
Could you clarify a bit more? Do you want existing state on a running job
to be migrated from FsStateBackend to RocksDBStateBackend?
Or
Do you have the option of restarting your job after changing existing code?
Hi Alex,
Your problem sounds interesting, and I have always found dealing with
timestamps cumbersome.
Nevertheless, what I understand is that your start and end timestamps for
American and European customers are based on their local clocks.
For example, the start and end timestamps of 12 AM - 12 AM in
Hi,
This is somewhat related to my previous query here:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Evolving-serializers-and-impact-on-flink-managed-states-td14777.html
I was exploring Avro serialization, and in that regard I enabled the forced
use of Avro using,
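The preview is cut off here; presumably the call in question is the standard ExecutionConfig switch, roughly:

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.getConfig().enableForceAvro(); // force the Avro serializer for POJOs instead of Kryo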
Regarding timezones, you should probably convert your time to a Unix
timestamp, which is consistent all over the world, and then create your
window based on that timestamp.
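For illustration, a sketch of that conversion with java.time (the zone and date are placeholder values):

    import java.time.LocalDate;
    import java.time.ZoneId;
    import java.time.ZonedDateTime;

    // Local midnight for a US-East customer, expressed as epoch millis.
    ZonedDateTime localMidnight =
        LocalDate.of(2017, 10, 30).atStartOfDay(ZoneId.of("America/New_York"));
    long epochMillis = localMidnight.toInstant().toEpochMilli();
    // epochMillis is timezone-independent and can drive the window boundaries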
Hi Stefan,
Thanks a lot for such a helpful response. That really made things a lot
clearer for me. At this point I have one more, and probably final,
question.
According to the Flink documentation,
[Attention] Currently, as of Flink 1.3, if the result of the compatibility
check
Hi Aljoscha,
So basically, I am reading events from a Kafka topic. These events have
corresponding eventIds and a list of modes.
*Job1*
1. When I first read from the Kafka topic, I key by the eventIds and use a
ProcessFunction to create a state named "events". Also, the list of modes is
used
Hi,
We have a set of XSDs which define our schema for the data, and we are using
the corresponding POJOs for serialization with the Flink serialization stack.
Now, I am concerned about any evolution of our XSD schema, which will
subsequently change the generated POJOs, which in turn are used for
Hi Aljoscha,
Thanks for the link. I read through it, but I still can't imagine
implementing something similar for my use case.
I explained my use case to Fabian in a previous post; I will try again to be
as clear as possible.
So, my use case, in short, is something like below:
1) From input
Thanks Aljoscha, this clarification probably ends my search for a way of
accessing local states from within the same job.
Thanks for the help :)
Hi Aljoscha,
I was expecting that I could set the JobManager address and port in the
configuration and pass it to the execution environment, but learned later
that this was the wrong approach.
My motivation for accessing the JobManager coordinates was to set up a
Hi Nico,
This behaviour was on my cluster and not in local mode, as I wanted to
check whether it's an issue with my job or whether the JobManager behaviour
is consistent everywhere.
When I run my job in yarn-cluster mode, it's not honouring the IP and
port I specified, and it's randomly
Hi Nico,
I had actually tried doing that, but I still get the same error as before
with the actor not found. I then ran on my mock cluster and got the same
error, although I could observe the JobManager in yarn-cluster mode with a
defined port.
The address and port combination was
Also, is it possible to get the JobID of a streaming job from a running
Flink instance?
I know I can get it for a batch job with
ExecutionEnvironment.getExecutionEnvironment().getId(), but apparently it
generates a new execution environment and returns the job id of that
environment for a batch
Hi,
Is there a way to fetch the JobManager address and port from a running Flink
job? I was expecting the address and port to be constant, but they change
every time I run a job. And somehow it's not honoring the
jobmanager.rpc.address and jobmanager.rpc.port set in the flink-conf.yaml
file.
I managed to get the Web UI up and running, but I am still getting the
"Actor not found" error.
Before the job failed I got the output for the Flink config from the WebUI,
and it seems okay to me; it corresponds to the config I have already set.
When I start my Flink job I get the following warning; if I am not wrong,
this is because it can't find the JobManager at the given address (localhost).
I tried changing:
config.setString(JobManagerOptions.ADDRESS, "localhost");
to the LAN IP, 127.0.0.1, and localhost, but none of them seems to work. I
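If the goal is to submit to a specific JobManager from code, a hedged alternative to mutating the configuration is the remote environment factory (host, port, and jar path are placeholders):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
        "jobmanager-host",          // placeholder JobManager address
        6123,                       // placeholder JobManager RPC port
        "target/my-flink-job.jar"); // placeholder jar containing the job classes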
Hi,
I had a simple question as to how POJOs are stored in a state backend like
RocksDB. Are they serialized internally (with a default serde, or do we have
to specify something)? And if so, is Kryo the default serde?
Thanks,
Biplob
Hi Fabian,
I am not really sure using a CoProcessFunction would be useful for my use
case. My use case, in short, can be explained as follows:
1) Create 2 different local state stores, where both have a 1-N relationship,
e.g. 1 -> [A,B,C] and A -> [1,2,3].
2) Based on the key A, get the list of
Hi Fabian,
Thanks for the insight. I am currently exploring QueryableStateClient and
would attempt to get the value for a corresponding key using the
getKvState() function.
I was confused about the jobId, but I am expecting this would provide me with
the jobid of the current job -
Hi Fabian,
I read about the process function and it seems a perfect fit for my
requirement, although while reading more about queryable state I found that
it's not meant to do lookups within a job (your comment in the following link).
Hi Fabian,
Thanks a lot for pointing that out; I will read about it and give it a try.
Regards,
Biplob
Thanks Fabian for the reply. I was reconsidering my design and the
requirement, and what I mentioned already is partially confusing.
I realized that using a session window is better in this scenario, where I
want a value to be updated per key and the session resets to wait for the
gap period with
Hi,
We recently moved from Spark Streaming to Flink for our stream processing
requirements in our organization, and we are in the process of reducing the
number of external calls as much as possible. Earlier we were using HBase to
store the incoming data, but we now want to try out stateful
Hi Kostas,
I ended up changing my
currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
to
currentMaxTimestamp = Math.min(timestamp, currentMaxTimestamp);
and changing this:
if (firstEventFlag && (currentTime - systemTimeSinceLastModification > 1)) {
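For context, a sketch of the kind of assigner this thread describes: a periodic watermark assigner that advances the watermark once no events have arrived for a while (the Event type, its getTimestamp() accessor, and the exact bounds are assumptions taken from the thread):

    public class IdleAwareTimestampExtractor implements AssignerWithPeriodicWatermarks<Event> {
        private final long maxOutOfOrderness = 20000L; // 20 s bound mentioned in the thread
        private final long idleTimeout = 10000L;       // 10 s idle bound mentioned in the thread
        private long currentMaxTimestamp = Long.MIN_VALUE + maxOutOfOrderness;
        private long systemTimeSinceLastModification = System.currentTimeMillis();

        @Override
        public long extractTimestamp(Event element, long previousElementTimestamp) {
            currentMaxTimestamp = Math.max(element.getTimestamp(), currentMaxTimestamp);
            systemTimeSinceLastModification = System.currentTimeMillis();
            return element.getTimestamp();
        }

        @Override
        public Watermark getCurrentWatermark() {
            if (System.currentTimeMillis() - systemTimeSinceLastModification > idleTimeout) {
                // no new events for a while: advance the watermark so timers can fire
                return new Watermark(currentMaxTimestamp);
            }
            return new Watermark(currentMaxTimestamp - maxOutOfOrderness);
        }
    }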
I know that there wouldn't be a scenario where the first event type (coming
from topic t1) would come with a timestamp higher than the current
watermark, although I am still investigating whether the other events from
other topics (specifically t3 and t4) are arriving after the watermark
Hi Kostas,
Yes, I have a flag in my timestamp extractor.
As you can see from the code below, I am checking whether
currentTime - systemTimeSinceLastModification > 10 sec. As long as new events
come in, the watermark isn't incremented. But as soon as I have a
difference of more than 10
Hi Kostas,
I have out-of-orderness of around 5 seconds from what I have observed, but
that too from events coming from a different topic. The initial topic
doesn't have out-of-order events; still, I have added a generous time bound
of 20 seconds. I will try a higher number just in order
But if that's the case, I don't understand why some of my events are just
lost if the watermark which is used is the smallest ... For them I either
expect a match or a timed-out event.
The only way I can imagine my events getting lost is a watermark higher than
the incoming event and
Hi Dawid,
Yes, I am reading from multiple topics, and yes, a few topics have multiple
partitions, though not all of them.
But I didn't understand the concept of a stalled partition.
Hi Kostas,
Implementing my custom timestamp assigner made me realise a problem which we
have in our architecture, you may say.
Any inputs would be really appreciated.
So, for now, we are reading from 4 different Kafka topics, and we have a
flow similar to something like this:
Event 1 (on topic
Hi Kostas,
Thanks for that suggestion, I will try that next. I have out-of-order
events on one of my Kafka topics, and that's why I am using
BoundedOutOfOrdernessTimestampExtractor(). Now that this doesn't work as
expected, I will try to work with the base class as you suggested.
Although this
Hi Andrea,
If you are using Flink for research and/or testing purposes, standalone Flink
is more or less sufficient. But if you have a huge amount of data, it
may take forever to process it with only one node/machine, and that's where
a cluster would be needed. A YARN or Mesos cluster could
Hi Kostas,
Thanks for the reply, that makes things a bit clearer.
Also, I went through this link, and it is similar to what I am trying to
observe:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Listening-to-timed-out-patterns-in-Flink-CEP-td9371.html
I am checking for
new event comes in the system. So, I
added setAutoWatermarkInterval(1000) to the code, but to no avail.
Thanks & Regards,
Biplob Biswas
Thanks a lot Aljoscha (again).
I created my project from scratch using the flink-maven-archetype, and now
it works in yarn-cluster mode. I was creating a fat jar initially as well
with my old project setup, so I'm not really sure what went wrong there, as
it was working on my local test
One more thing I noticed is that the streaming wordcount from the Flink
package works when I run it, but when I used the same GitHub code, packaged
it, and uploaded the fat jar with the word count example to the cluster, I
get the same error.
I am wondering how making my uber jar can produce such
Hi Nico,
I tried running my job with 3 and even 2 YARN containers, and the result is
the same. Then I tried running the example wordcount (streaming and batch
both), and they seem to find the task and job managers and run successfully.
./bin/flink run -m yarn-cluster -yn 3 -yt ./lib
I am trying to run a Flink job on our cluster with 3 datanodes and 1
namenode. I am using the following command line arguments to run this job,
but I get an exception saying "Could not connect to the leading JobManager.
Please check that the JobManager is running" ... what could I be doing wrong?
Surprisingly,
Hi,
Can anyone check whether they can reproduce this issue on their end?
There's no log yet as to what is happening. Any idea for debugging this issue
would be much appreciated.
Regards,
Biplob
Hi Dawid,
Yes, now I understand what you meant. Although I added exactly the input you
asked me to, I still get no alerts.
I also observed that I am not getting alerts even with normal ordering of
timestamps and with AscendingTimestampExtractor.
I am adding an image where I entered the data
Hi Dawid,
What you wrote is exactly correct: it wouldn't generate a new watermark
(and subsequently throw away events) unless the maxOutOfOrderness time has
elapsed. Thus, I was expecting alerts to be raised, as the stream was out of
order but not beyond maxOutOfOrderness.
Nevertheless, I tried your
Sorry to bombard you with so many messages, but one last thing: the example
produces alerts if the line specifying event time is commented out.
More specifically, this one:
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
Only with event time is there no alert.
Thanks,
Also, my test environment was Flink 1.4-Snapshot with Kafka 0.10.0 on HDP
2.5.
And I sent my test messages via the console producer.
Thanks,
Biplob
Thanks a lot, Till and Dawid, for such detailed replies.
I tried checking and waiting as both of you suggested, and I still have no
events. Thus, as pointed out by Till, I created a self-contained example to
reproduce the issue, and the behaviour is the same as in my original
case.
Please find the
Hi,
Thanks a lot for the help last time. I have a few more questions, and I chose
to create a new topic as the problem in the previous topic was solved,
thanks to useful inputs from the Flink community. The questions are as follows:
*1.* What time does the "within" operator work on, "Event Time" or
Hi,
I am sorry, it worked with the BoundedOutOfOrdernessTimestampExtractor;
somehow I replayed my events from Kafka, and the older events were also on
the bus and didn't correlate with my new events.
Now I cleaned up my code, restarted it from the beginning, and it works.
Thanks a lot for
Hi Kostas,
I am okay with processing time at the moment, but as my events already have a
creation timestamp added to them, and also to further explore the event time
aspect of FlinkCEP, I proceeded with evaluating event time.
For this I tried both:
1. AscendingTimestampExtractor:
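The preview is cut off after the first option; from the rest of the thread the second one was presumably BoundedOutOfOrdernessTimestampExtractor. Sketches of both, assuming an Event type with a getTimestamp() accessor:

    // Option 1: timestamps are strictly ascending per partition
    stream.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Event>() {
        @Override
        public long extractAscendingTimestamp(Event element) {
            return element.getTimestamp();
        }
    });

    // Option 2: bounded out-of-orderness (here 5 s, as observed in the thread)
    stream.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(Event element) {
                return element.getTimestamp();
            }
        });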
Hi Kostas,
My application didn't have any timestamp extractor, nor did my events have
any timestamps. Still, I was using event time for processing; probably that's
why it was blocked.
Now I removed the part where I set the time characteristic to event time,
and it works.
For example:
Previously:
Hi Dawid,
Thanks for the response. Timeout patterns work like a charm; I saw them
previously but didn't understand what they do, so thanks for explaining that.
Also, my problem with no alerts is solved now.
The problem was that I was using "Event Time" for processing, whereas my
events didn't have
Hello Kostas,
Thanks for the suggestions.
I checked, and I am getting my events in the partitionedInput stream when I
print it, but still nothing on the alert side. I checked the Flink UI for
backpressure and all seems to be normal (I have at most 1000 events per
second on the Kafka topic, so
Hi,
I just started exploring Flink CEP a day ago, and I thought I could use it to
make a simple event processor. For that I looked into the CEP examples by
Till and some other articles.
Now I have 2 questions which I would like to ask:
*Part 1:*
I came up with the following piece of code, but
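The code itself is cut off in the preview; for orientation, a minimal FlinkCEP pattern of that era looks roughly like this (the event type and predicates are invented):

    Pattern<Event, ?> pattern = Pattern.<Event>begin("first")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) {
                return "start".equals(e.getType()); // invented predicate
            }
        })
        .next("second")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) {
                return "end".equals(e.getType());
            }
        })
        .within(Time.seconds(10)); // the "within" bound discussed in these threads

    PatternStream<Event> patternStream = CEP.pattern(partitionedInput, pattern);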
Hi,
I want to test my Flink streaming code, and thus I want to run Flink
streaming jobs with different parameters one by one.
So, when one job finishes after it doesn't receive new data points for
some time, the next job with a different set of parameters should start.
For this, I am
Hi Max,
Yeah, I tried that and it's definitely better. Only a few points go missing,
compared to a huge amount in the beginning. For now, it's good for me and my
work.
Thanks a lot for the workaround.
-Biplob
Thanks a ton, Till.
That worked. Thank you so much.
-Biplob
d connect to it using the getRemoteExecutionEnvironment(). That will
>> allow
>> you to access the jobs statuses on the dashboard when you finish running
>> your job.
>>
>> Sameer
>>
>> On Tue, Jul 19, 2016 at 6:35 AM, Biplob Biswas
> revolutionisme@
to access the jobs statuses on the dashboard when you finish
> running your job.
>
> Sameer
>
> On Tue, Jul 19, 2016 at 6:35 AM, Biplob Biswas
> revolutionisme@
>
> wrote:
>
>> Hi,
>>
>> I am running my flink program using Eclipse and I can't
Hi,
I am running my Flink program using Eclipse, and I can't access the dashboard
at http://localhost:8081. Can someone help me with this?
I read that I need to check my flink-conf.yaml, but it's a Maven project and
I don't have a flink-conf.
Any help would be really appreciated.
Thanks a lot
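For what it's worth, an era-appropriate fix (the previews don't show what Till actually suggested) is to start the local environment with the web server enabled instead of relying on a flink-conf.yaml:

    Configuration conf = new Configuration();
    conf.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true); // serve the dashboard in local mode
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createLocalEnvironment(4, conf);
    // the dashboard is then reachable at http://localhost:8081 while the job runs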
Hi Ufuk,
Thanks for the update. Is there any known way to fix this issue? Any
workaround that you know of which I can try?
Hi Ufuk,
Did you get time to go through my issue? Just wanted to follow up to see
whether I can get a solution or not.
Hi,
Sorry for the late reply, I was trying different things in my code. And what
I observed is very weird to me.
After experimentation, I found out that when I increase the number of
centroids, the number of data points forwarded decreases; when I lower the
number of centroids, the
Thanks a lot, I would really appreciate it.
Also, please let me know if you don't understand it well; the documentation
in the code is not really great at the moment.
I have sent you my code in a separate email, I hope you can solve my issue.
Thanks a lot
Biplob
Can anyone check this once and help me out with it?
I would be really obliged.
total number of datapoints.
I don't really know what is happening here exactly; why would the number of
data points suddenly drop like that?
Thanks and Regards
Biplob Biswas
and not as fast as the first one.
Thanks for replying though; I will try that out.
Regards
Biplob Biswas
Hi,
I was wondering whether it is possible to stop streaming data in from one of
the map operators until some data arrives in the second map operator.
For example,
if I have ds1.connect(ds2).map(new CoFlatMapper()),
then I need data to stop flowing from ds1 until some data arrives in ds2.
Is
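There is no built-in way to block one input, but a common sketch is to buffer the first input inside the CoFlatMapFunction until the second input delivers something (all names are invented, and note the buffer lives on the heap, so this is not real backpressure):

    public class GatedCoFlatMap implements CoFlatMapFunction<Long, String, Long> {
        private boolean gateOpen = false;
        private final List<Long> buffer = new ArrayList<>();

        @Override
        public void flatMap1(Long value, Collector<Long> out) {
            if (gateOpen) {
                out.collect(value);
            } else {
                buffer.add(value); // hold ds1 elements until ds2 delivers something
            }
        }

        @Override
        public void flatMap2(String signal, Collector<Long> out) {
            gateOpen = true;       // first ds2 element opens the gate
            for (Long v : buffer) {
                out.collect(v);
            }
            buffer.clear();
        }
    }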
Hi,
I want to keep the latest data point processed in a datastream variable. So
technically I need just one value in the variable and to discard all the
older ones.
Can this be done somehow? I was thinking about using filters, but I don't
think I can use them for this scenario.
Any ideas as
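One hedged way to express "keep only the latest value" is a rolling reduce that always returns the newer element (the single-key grouping is an assumption just to make the sketch complete):

    DataStream<Long> latest = input
        .keyBy(value -> 0)                // assumption: one logical partition
        .reduce((older, newer) -> newer); // state retains only the most recent element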
Yes, thanks a lot. Also, the fact that I was using ParallelSourceFunction was
problematic. So, as suggested by Fabian and Robert, I used SourceFunction,
and then in the Flink job I set the output of the map with a parallelism of 4
to get the desired result.
Thanks again.
Hi Aljoscha,
I went to the Flink hackathon by Buzzwords yesterday, where Fabian and Robert
helped me with this issue. Apparently I was assuming that the file would be
handled in a single thread, but I was using ParallelSourceFunction, and it was
creating 4 different threads and thus reading the same
http://pastebin.com/NenvXShH
I hope to get a solution out of it.
Thanks and Regards
Biplob Biswas
and Regards
Biplob Biswas
Hi,
Before giving the method you described above a try, I tried adding the
timestamp to my data directly at the stream source.
The following is my stream source:
http://pastebin.com/AsXiStMC
and I am using the stream source as follows:
DataStream tuples = env.addSource(new
well.
Any help is much appreciated.
Thanks a lot.
Biplob Biswas
Aah, thanks a lot for that insight. I'm pretty new to Flink and learning on
my own, so I'm prone to making mistakes.
Thanks a lot for helping.
Hi,
I am trying to send some static integer values down to each map function,
using the following code:
public static void main(String[] args) throws Exception {
    ParameterTool params = ParameterTool.fromArgs(args);
    String
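The snippet is truncated; a sketch of one common pattern is to parse the arguments and hand the static values to the function through its constructor (all names are invented):

    public static void main(String[] args) throws Exception {
        ParameterTool params = ParameterTool.fromArgs(args);
        int staticValue = params.getInt("value", 42); // hypothetical parameter

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(1L, 2L, 3L)
           .map(new MyMapper(staticValue))
           .print();
        env.execute("static-value-demo");
    }

    public static class MyMapper implements MapFunction<Long, Long> {
        private final int offset;

        public MyMapper(int offset) {
            this.offset = offset; // shipped to each parallel map instance
        }

        @Override
        public Long map(Long value) {
            return value + offset;
        }
    }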
and also,
"Incorrect number of arguments for type
IterativeStream.ConnectedIterativeStreams; it cannot be parameterized with
arguments <Long[], Long, Long[]>"
What am I doing wrong here?
Thanks and Regards
Biplob Biswas
Ufuk Celebi wrote
> Please provide the error message a
Hi,
Is there a way to connect 2 datastreams, iterate, and then get the
feedback as a third stream?
I tried doing
mergedDS = datastream1.connect(datastream2)
iterateDS = mergedDS.iterate().withFeedback(datastream3type.class)
but this didn't work, and it was showing me errors.
Is there any
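For what it's worth, the streaming API doesn't iterate a connected stream directly; the usual route is iterate().withFeedbackType(...), which yields a two-input loop. A sketch with invented types:

    // datastream1 is a DataStream<Long>; feedback elements are of type Centroid (invented)
    IterativeStream.ConnectedIterativeStreams<Long, Centroid> loop =
        datastream1.iterate().withFeedbackType(Centroid.class);

    DataStream<Centroid> feedback =
        loop.flatMap(new MyCoFlatMap()); // a CoFlatMapFunction<Long, Centroid, Centroid>

    loop.closeWith(feedback);            // routes the Centroid stream back as the second input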
I am also a newbie, but from what I experienced during my experiments, the
same implementation doesn't work in the streaming context because:
1) In the streaming context the stream is assumed to be infinite, so the
iteration is also infinite, and the part with which you close your
Hi,
I read that article already, but it is very simplistic, and thus, based on
that article and other examples, I was trying to understand how my centroids
can be sent to all the partitions and updated accordingly.
I also understood that the order of the input and the feedback stream can't
be
Because it's not working the way I am expecting it to, and the input stream
is completely consumed before anything is sent back and iterated.
Could you please point me in the right direction and help me understand
things properly?
Thanks and Regards
Biplob Biswas
Can anyone help me understand how the out.collect() and the corresponding
broadcast() are working?
Hi,
I am running the following sample code to understand how iteration and
broadcast work in the streaming context.
final StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(4);
long i = 5;
Hi Gyula,
I tried doing something like the following in the 2 flatmaps, but I am not
getting the desired results and am still confused about how the concept you
put forward would work:
public static final class MyCoFlatmap implements CoFlatMapFunction{
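The class declaration is cut off in the preview; for orientation, a sketch of the shape such a CoFlatMapFunction could take for the centroid use case (all types and logic are assumptions, not the original code):

    public static final class MyCoFlatmap
            implements CoFlatMapFunction<Point, Centroid[], String> {

        private Centroid[] centroids; // latest centroids seen on the broadcast input

        @Override
        public void flatMap1(Point point, Collector<String> out) {
            if (centroids != null) {
                out.collect("nearest centroid: " + nearest(point, centroids));
            }
        }

        @Override
        public void flatMap2(Centroid[] updated, Collector<String> out) {
            centroids = updated; // refresh the local copy on each broadcast update
        }

        private int nearest(Point p, Centroid[] cs) {
            return 0; // placeholder for the actual distance computation
        }
    }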
but I can't see the centroids already broadcast by
centroidStream.broadcast() in the map functions.
Any kind of help is hugely appreciated.
Thanks and Regards
Biplob Biswas
for your help throughout.
Regards
Biplob Biswas
Gyula Fóra wrote
> Hi,
>
> Iterating after every incoming point/centroid update means that you
> basically defeat the purpose of having parallelism in your Flink job.
>
> If you only "sync" the centroids periodically by
that this operation is not done in
>> parallel as if streams are sent in parallel how would I ensure correct
>> update of the centroids as multiple points can try to update the same
>> centroid in parallel .
>>
>> I hope I made myself clear with this.
>>
>>
as if streams are sent in parallel, how would I ensure correct
update of the centroids, as multiple points can try to update the same
centroid in parallel.
I hope I made myself clear with this.
Thanks and Regards
Biplob
Biplob Biswas wrote
> Hi Gyula,
>
> I read your workaround and starte