Re: state size affects latency

2017-10-30 Thread Biplob Biswas
is this? Specific operator latency? Because the end-to-end latency is around 50 ms and 370 ms. Was just curious how latency is seen from a different perspective; it would really help me in my understanding. Thanks a lot, Biplob

Re: QueryableState - No KvStateLocation found for KvState instance

2017-09-13 Thread Biplob Biswas
Hi Hayden, From what I know, "No KvStateLocation found for KvState instance with name 'word_sums'" is exactly what it means: your current job can't find the KvState instance. This could result from a few reasons that I know of: 1. The jobID you supplied for the query client job is not equal to

Re: Queryable State

2017-09-13 Thread Biplob Biswas
Hi, are you sure your jobmanager is running and is accessible from the supplied hostname and port? If you can start the Flink UI of the job which creates your queryable state, it should have the details of the job manager and the port to be used in this queryable client job.

Re: Fwd: some question about side output

2017-09-06 Thread Biplob Biswas
Change the type of the main stream from DataStream to SingleOutputStreamOperator. The getSideOutput() function is not part of the base class DataStream but of the extended class SingleOutputStreamOperator. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Process Function

2017-09-05 Thread Biplob Biswas
How are you determining your data is stale? Also, if you want to know the key, why don't you store the key in your state as well?

Optimized-Drizzle vs Flink

2017-09-04 Thread Biplob Biswas
I just came across this slide deck about Drizzle, where they claim to achieve sub-millisecond latency and compare with Flink: https://www.slideshare.net/SparkSummit/drizzlelow-latency-execution-for-apache-spark-spark-summit-east-talk-by-shivaram-venkataraman The normal Drizzle still performs a

Re: Exception with Avro Serialization on RocksDBStateBackend

2017-08-31 Thread Biplob Biswas
Hi, I am still stuck here, and I still couldn't find a way to make Avro accept null values. Any help here would be really appreciated. Thanks, Biplob
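[Archive note] For anyone hitting the same issue: Avro only accepts null for a field whose schema is a union that includes "null". If the serializer reflects the schema from the class (as Flink's AvroSerializer does for POJOs), annotating the field with org.apache.avro.reflect.Nullable has the same effect. A hedged sketch of the schema-level fix — the record and field names below are made up for illustration, the actual class is in the linked gist:

```json
{
  "type": "record",
  "name": "TransactionStateModel",
  "fields": [
    {"name": "txnId", "type": "string"},
    {"name": "comment", "type": ["null", "string"], "default": null}
  ]
}
```

With the union, a missing or null "comment" serializes cleanly; a plain "string" field would throw on null.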

Re: Exception with Avro Serialization on RocksDBStateBackend

2017-08-24 Thread Biplob Biswas
Hi Till, Thanks for the response. I was assuming that the Avro Serializer will create a corresponding Avro schema with the Object class I provide. In that respect, I did the following: AvroSerializer txnAvroSerde = new AvroSerializer<>(TransactionStateModel.class); ValueStateDescriptor

Exception with Avro Serialization on RocksDBStateBackend

2017-08-22 Thread Biplob Biswas
Hi, I am getting the following exception in my code; I can observe that there's something wrong while serializing my object, whose class looks something like this: https://gist.github.com/revolutionisme/1eea5ccf5e1d4a5452f27a1fd5c05ff1 The exact cause, it seems, is some field inside my

Re: Avro Serialization and RocksDB Internal State

2017-08-18 Thread Biplob Biswas
Thanks a lot Gordon, that really helps a lot. :) One last thing: is there any way to verify that an object has been serialized with a specific serializer, other than trying to deserialize with a different deserializer and failing?

Re: Avro Serialization and RocksDB Internal State

2017-08-18 Thread Biplob Biswas
Can anyone please shed some light on this? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Avro-Serialization-and-RocksDB-Internal-State-tp14912p15002.html

Re: Flink CEP questions

2017-08-18 Thread Biplob Biswas
Hi Basanth, AFAIK, CEP works like a session window: a session is started for each event which comes in and expires at the end of the time limit. Technically the count is kept separately for each event, so there's no reset. For example, if you have 6 events, 1,2,3,4,5,6 (and they arrive in order

Re: Change state backend.

2017-08-17 Thread Biplob Biswas
I am not really sure you can do that out of the box; if not, it should indeed be possible in the near future. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#handling-serializer-upgrades-and-compatibility There are already plans for state migration (with

Re: Change state backend.

2017-08-16 Thread Biplob Biswas
Could you clarify a bit more? Do you want an existing state on a running job to be migrated from FsStateBackend to RocksDBStateBackend? Or do you have the option of restarting your job after changing existing code?

Re: Time zones problem

2017-08-16 Thread Biplob Biswas
Hi Alex, Your problem sounds interesting, and I have always found dealing with timestamps cumbersome. Nevertheless, what I understand is that your start and end timestamps for American and European customers are based on their local clock. For example, the start and end timestamps of 12 AM - 12 AM in

Avro Serialization and RocksDB Internal State

2017-08-15 Thread Biplob Biswas
Hi, This is somewhat related to my previous query here: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Evolving-serializers-and-impact-on-flink-managed-states-td14777.html I was exploring Avro serialization, and in that regard I enabled the forced use of Avro using,

Re: Time zones problem

2017-08-15 Thread Biplob Biswas
Regarding timezones, you should probably convert your time to a Unix timestamp, which will be consistent all over the world, and then you can create your window based on this timestamp.
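[Archive note] To illustrate the suggestion above: the same local wall-clock instant in two zones maps to different, but each globally unambiguous, epoch timestamps. A minimal sketch with java.time — the date and zone choices are mine, not from the thread:

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class EpochDemo {
    // Convert local midnight on 2017-08-15 in a given zone to epoch millis.
    static long midnightEpochMillis(String zone) {
        return ZonedDateTime.of(2017, 8, 15, 0, 0, 0, 0, ZoneId.of(zone))
                .toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        long berlin  = midnightEpochMillis("Europe/Berlin");    // CEST, UTC+2
        long newYork = midnightEpochMillis("America/New_York"); // EDT,  UTC-4
        // Same wall-clock time, six hours apart on the shared Unix timeline.
        System.out.println((newYork - berlin) / 3_600_000L); // prints 6
    }
}
```

Windowing on the epoch value keeps all customers on one consistent timeline; converting back to local time is then purely a display concern.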

Re: Evolving serializers and impact on flink managed states

2017-08-11 Thread Biplob Biswas
Hi Stefan, Thanks a lot for such a helpful response. That really made things a lot clearer for me. Although at this point I have one more, and probably last, question. According to the Flink documentation: [Attention] Currently, as of Flink 1.3, if the result of the compatibility check

Re: Getting JobManager address and port within a running job

2017-08-09 Thread Biplob Biswas
Hi Aljoscha, So basically, I am reading events from a Kafka topic. These events have corresponding eventIds and a list of modes. *Job1* 1. When I first read from the Kafka topic, I key by the eventIds and use a ProcessFunction to create a state named "events". Also, the list of modes are used

Evolving serializers and impact on flink managed states

2017-08-09 Thread Biplob Biswas
Hi, We have a set of XSDs which define our schema for the data, and we are using the corresponding POJOs for serialization with the Flink serialization stack. Now, I was concerned about any evolution of our XSD schema, which will subsequently change the generated POJOs, which in turn are used for

Re: Getting JobManager address and port within a running job

2017-08-09 Thread Biplob Biswas
Hi Aljoscha, Thanks for the link. I read through it, but I still can't imagine implementing something similar for my use case. I explained my use case to Fabian in a previous post; I will try again to be as clear as possible. So, my use case, in short, is something like below: 1) From input

Re: Can't find correct JobManager address, job fails with Queryable state

2017-08-09 Thread Biplob Biswas
Thanks Aljoscha, this clarification probably ends my search for a way to access local state from within the same job. Thanks for the help :)

Re: Getting JobManager address and port within a running job

2017-08-09 Thread Biplob Biswas
Hi Aljoscha, I was expecting that I could set the jobmanager address and port by setting them in the configuration and passing it to the execution environment, but learnt later that it was the wrong approach. My motivation for accessing the jobmanager coordinates was to set up a

Re: Getting JobManager address and port within a running job

2017-08-03 Thread Biplob Biswas
Hi Nico, This behaviour was on my cluster and not in local mode, as I wanted to check whether it's an issue with my job or whether the behaviour with the jobmanager is consistent everywhere. When I run my job in yarn-cluster mode, it's not honouring the IP and port I specified, and it's randomly

Re: Can't find correct JobManager address, job fails with Queryable state

2017-08-03 Thread Biplob Biswas
Hi Nico, I had actually tried doing that, but I still get the same error as before with the actor not found. I then ran on my mock cluster and was getting the same error, although I could observe the jobmanager in yarn-cluster mode with a defined port. The address and port combination was

Re: Getting JobManager address and port within a running job

2017-08-03 Thread Biplob Biswas
Also, is it possible to get the JobID from a running Flink instance for a streaming job? I know I can get it for a batch job with ExecutionEnvironment.getExecutionEnvironment().getId(), but apparently it generates a new execution environment and returns the job id of that environment for a batch

Getting JobManager address and port within a running job

2017-08-03 Thread Biplob Biswas
Hi, Is there a way to fetch the jobmanager address and port from a running Flink job? I was expecting the address and port to be constant, but they change every time I run a job. And somehow it's not honoring the jobmanager.rpc.address and jobmanager.rpc.port set in the flink-conf.yaml file.

Re: Can't find correct JobManager address, job fails with Queryable state

2017-08-03 Thread Biplob Biswas
I managed to get the Web UI up and running, but I am still getting the "Actor not found" error. Before the job failed, I got the output for the Flink config from the Web UI, and it seems okay to me; it corresponds to the config I have already set.

Can't find correct JobManager address, job fails with Queryable state

2017-08-02 Thread Biplob Biswas
When I start my flink job I get the following warning, if I am not wrong this is because it can't find the jobmanager at the given address(localhost), I tried changing: config.setString(JobManagerOptions.ADDRESS, "localhost"); to LAN IP, 127.0.0.1 and localhost but none of it seems to work. I

Storing POJO's to RocksDB state backend

2017-08-02 Thread Biplob Biswas
Hi, I had a simple query as to how POJOs are stored in a state backend like RocksDB. Are they serialized internally (with a default serde, or do we have to specify something)? And if yes, is Kryo the default serde? Thanks, Biplob

Re: Flink QueryableState with Sliding Window on RocksDB

2017-08-01 Thread Biplob Biswas
Hi Fabian, I am not really sure using CoProcessFunction would be useful for my use case. My use case, in short, can be explained as follows: 1) create 2 different local state stores, where both have a 1-N relationship, e.g. 1 -> [A,B,C] and A -> [1,2,3] 2) Based on the key A, get list of

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Biplob Biswas
Hi Fabian, Thanks for the insight, I am currently exploring QueryableStateClient and would attempt to get the value for a corresponding key using the getkvstate() function, I was confused about the jobId but I am expecting this would provide me with the jobid of the current job -

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Biplob Biswas
Hi Fabian, I read about the process function and it seems a perfect fit for my requirement. Although while reading more about queryable state, I found that it's not meant to do lookups within a job (your comment in the following link).

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Biplob Biswas
Hi Fabian, Thanks a lot for pointing that out; I will read about it and give it a try. Regards, Biplob -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-QueryableState-with-Sliding-Window-on-RocksDB-tp14514p14551.html

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Biplob Biswas
Thanks Fabian for the reply. I was reconsidering my design and the requirement, and what I mentioned already is partially confusing. I realized that using a session window is better in this scenario, where I want a value to be updated per key and the session resets to wait for the gap period with

Flink QueryableState with Sliding Window on RocksDB

2017-07-28 Thread Biplob Biswas
Hi, We recently moved from Spark Streaming to Flink for our stream processing requirements in our organization, and we are in the process of reducing the number of external calls as much as possible. Earlier we were using HBase to store the incoming data, but we now want to try out stateful

Re: Flink CEP not emitting timed out events properly

2017-06-26 Thread Biplob Biswas
Hi Kostas, I ended up setting my currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp); to currentMaxTimestamp = Math.min(timestamp, currentMaxTimestamp); and changing this : if(firstEventFlag && (currentTime - systemTimeSinceLastModification > 1)){

Re: Flink CEP not emitting timed out events properly

2017-06-20 Thread Biplob Biswas
I know that there wouldn't be a scenario where the first event type(coming from topic t1) would be coming with a timestamp higher than the current watermark. Although I am still investigating whether the other events from other topics (specifically t3 and t4) are arriving after the watermark

Re: Flink CEP not emitting timed out events properly

2017-06-20 Thread Biplob Biswas
Hi Kostas, Yes, I have a flag in my timestampextractor. As you can see from the code below, I am checking whether currentTime - systemTimeSinceLastModification > 10 sec. as new events come then the watermark wouldn't be incremented. But as soon as I have a difference of more than 10

Re: Flink CEP not emitting timed out events properly

2017-06-20 Thread Biplob Biswas
Hi Kostas, I have out-of-orderness of around 5 seconds from what I have observed but that too from events coming from a different topic. The initial topic doesn't have out-of-order events still I have added a generous time bound of 20 seconds. Still, I will try for a higher number just in order

Re: Flink CEP not emitting timed out events properly

2017-06-20 Thread Biplob Biswas
But if that's the case, I don't understand why some of my events are just lost if the watermark which is used is the smallest... Either I expect a match or I expect a timed-out event. The only way I can imagine my events getting lost is a higher watermark than the incoming event, and
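[Archive note] The "smallest watermark" behaviour discussed here is how Flink combines watermarks across an operator's inputs: the operator's event-time clock is the minimum of the per-input watermarks, so one lagging topic holds everything back, and an event whose timestamp is already below that combined watermark is treated as late (CEP in this era simply dropped such events). A toy sketch of the arithmetic — the numbers are illustrative, not from the thread:

```java
import java.util.stream.LongStream;

public class WatermarkDemo {
    // Combined watermark = minimum over all input channels.
    static long combinedWatermark(long... perInput) {
        return LongStream.of(perInput).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // Four topics; the third lags far behind the others.
        long wm = combinedWatermark(10_000, 9_500, 4_000, 9_900);
        System.out.println(wm);         // prints 4000
        System.out.println(3_000 < wm); // an event at t=3000 is late: true
    }
}
```

So an event can be "lost" without matching or timing out: it arrived behind the combined watermark and was discarded before the pattern ever saw it.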

Re: Queries regarding FlinkCEP

2017-06-20 Thread Biplob Biswas
Hi Dawid, Yes, I am reading from multiple topics, and yes, a few topics have multiple partitions, not all of them. But I didn't understand the concept of a stalled partition.

Re: Flink CEP not emitting timed out events properly

2017-06-20 Thread Biplob Biswas
Hi Kostas, Implementing my custom timestamp assigner made me realise a problem which we have in our architecture, you may say. Any inputs would be really appreciated. So, for now, we are reading from 4 different Kafka topics, and we have a flow similar to something like this: Event 1 (on topic

Re: Flink CEP not emitting timed out events properly

2017-06-16 Thread Biplob Biswas
Hi Kostas, Thanks for that suggestion, I would try that next, I have out of order events on one of my Kafka topics and that's why I am using BoundedOutOfOrdernessTimestampExtractor(), now that this doesn't work as expected I would try to work with the Base class as you suggested. Although this

Re: How choose between YARN/Mesos/StandAlone Flink

2017-06-16 Thread Biplob Biswas
Hi Andrea, If you are using Flink for research and/or testing purposes, standalone Flink is more or less sufficient. Although if you have a huge amount of data, it may take forever to process it with only one node/machine, and that's where a cluster is needed. A YARN or Mesos cluster could

Re: Flink CEP not emitting timed out events properly

2017-06-16 Thread Biplob Biswas
Hi Kostas, Thanks for the reply, makes things a bit more clear. Also, I went through this link and it is something similar I am trying to observe. http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Listening-to-timed-out-patterns-in-Flink-CEP-td9371.html I am checking for

Flink CEP not emitting timed out events properly

2017-06-16 Thread Biplob Biswas
new event comes in the system. So, I added setAutoWatermarkInterval(1000) to the code, but to no avail. Thanks & Regards, Biplob Biswas

Re: Error running Flink job in Yarn-cluster mode

2017-06-13 Thread Biplob Biswas
Thanks a lot Aljoscha (again) I created my project from scratch and used the flink-maven-archetype and now it works on the yarn-cluster mode. I was creating a fat jar initially as well with my old project setup so not really sure what went wrong there as it was working on my local test

Re: Error running Flink job in Yarn-cluster mode

2017-06-09 Thread Biplob Biswas
One more thing I noticed is that the streaming wordcount from the Flink package works when I run it, but when I used the same GitHub code, packaged it and uploaded the fat jar with the word count example to the cluster, I got the same error. I am wondering how building my uber jar could produce such

Re: Error running Flink job in Yarn-cluster mode

2017-06-09 Thread Biplob Biswas
Hi Nico, I tried running my job with 3 and even 2 yarn containers and the result is the same. Then I tried running the example wordcount (streaming and batch both) and they seem to find the task and job managers and run successfully. ./bin/flink run -m yarn-cluster -yn 3 -yt ./lib

Error running Flink job in Yarn-cluster mode

2017-06-08 Thread Biplob Biswas
I am trying to run a Flink job on our cluster with 3 DNs and 1 NN. I am using the following command line arguments to run this job, but I get an exception saying "Could not connect to the leading JobManager. Please check that the JobManager is running" ... what could I be doing wrong? Surprisingly,

Re: Queries regarding FlinkCEP

2017-06-08 Thread Biplob Biswas
Hi, Can anyone check whether they can reproduce this issue on their end? There's no log yet as to what is happening. Any idea to debug this issue is well appreciated. Regards, Biplob

Re: Queries regarding FlinkCEP

2017-06-07 Thread Biplob Biswas
Hi Dawid, Yes, now I understood what you meant. Although I added exactly the input you asked me to, I still get no alerts. I also observed that I am not getting alerts even with normal ordering of timestamps and with AscendingTimestampExtractor. I am adding an image where I entered the data

Re: Queries regarding FlinkCEP

2017-06-06 Thread Biplob Biswas
Hi Dawid, What you wrote is exactly correct: it wouldn't generate a new watermark (and subsequently drop events) unless the maxOutOfOrderness time has elapsed. Thus, I was expecting alerts to be raised, as the stream was out of order but not beyond maxOutOfOrderness. Nevertheless I tried your

Re: Queries regarding FlinkCEP

2017-06-06 Thread Biplob Biswas
Sorry to bombard you with so many messages, but one last thing: the example would produce an alert if the line specifying event time is commented out. More specifically, this one: env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); Only with event time is there no alert. Thanks,

Re: Queries regarding FlinkCEP

2017-06-06 Thread Biplob Biswas
Also, my test environment was Flink 1.4-SNAPSHOT with Kafka 0.10.0 on HDP 2.5. And I sent my test messages via the console producer. Thanks, Biplob

Re: Queries regarding FlinkCEP

2017-06-06 Thread Biplob Biswas
Thanks a lot, Till and Dawid, for such detailed replies. I tried checking and waiting as both of you suggested, and I still have no events. Thus, as pointed out by Till, I created a self-contained example to reproduce the issue, and the behaviour is the same as in my original case. Please find the

Queries regarding FlinkCEP

2017-06-02 Thread Biplob Biswas
Hi, Thanks a lot for the help last time. I have a few more questions, and I chose to create a new topic as the problem in the previous topic was solved, thanks to useful inputs from the Flink community. The questions are as follows: *1.* What time does the "within" operator work on, "Event Time" or

Re: No Alerts with FlinkCEP

2017-05-31 Thread Biplob Biswas
Hi, I am sorry; it worked with the BoundedOutOfOrdernessTimestampExtractor. Somehow I replayed my events from Kafka and the older events were also on the bus, and they didn't correlate with my new events. Now I cleaned up my code and restarted it from the beginning and it works. Thanks a lot for

Re: No Alerts with FlinkCEP

2017-05-31 Thread Biplob Biswas
Hi Kostas, I am okay with processing time at the moment but as my events already have a creation timestamp added to them and also to explore further the event time aspect with FlinkCEP, I proceeded further with evaluating with event time. For this I tried both 1. AscendingTimestampExtractor:

Re: No Alerts with FlinkCEP

2017-05-31 Thread Biplob Biswas
Hi Kostas, My application didn't have any timestamp extractor, nor did my events have any timestamp. Still, I was using event time for processing; probably that's why it was blocked. Now I removed the part where I set the time characteristic to event time, and it works now. For example: Previously:

Re: No Alerts with FlinkCEP

2017-05-31 Thread Biplob Biswas
Hi Dawid, Thanks for the response. Timeout patterns work like a charm; I saw them previously but didn't understand what they do, thanks for explaining that. Also, my problem with no alerts is solved now. The problem was that I was using "Event Time" for processing whereas my events didn't have

Re: No Alerts with FlinkCEP

2017-05-26 Thread Biplob Biswas
Hello Kostas, Thanks for the suggestions. I checked, and I am getting my events in the partitionedInput stream when I print it, but still nothing on the alert side. I checked the Flink UI for backpressure and all seems to be normal (I have at most 1000 events per second on the Kafka topic, so

No Alerts with FlinkCEP

2017-05-26 Thread Biplob Biswas
Hi, I just started exploring Flink CEP a day back, and I thought I could use it to make a simple event processor. For that I looked into the CEP examples by Till and some other articles. Now I have 2 questions which I would like to ask: *Part 1:* I came up with the following piece of code, but

Running multiple Flink Streaming Jobs, one by one

2016-07-20 Thread Biplob Biswas
Hi, I want to test my Flink streaming code, and thus I want to run Flink streaming jobs with different parameters one by one. So, when one job finishes after it doesn't receive new data points for some time, the next job with a different set of parameters should start. For this, I am

Re: Data point goes missing within iteration

2016-07-20 Thread Biplob Biswas
Hi Max, Yeah, I tried that and it's definitely better. Only a few points go missing compared to a huge amount in the beginning. For now, it's good for me and my work. Thanks a lot for the workaround. -Biplob

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Biplob Biswas
Thanks a ton, Till. That worked. Thank you so much. -Biplob -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-t-access-Flink-Dashboard-at-8081-running-Flink-program-using-Eclipse-tp8016p8035.html

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Biplob Biswas
d connect to it using the getRemoteExecutionEnvironment(). That will allow you to access the job statuses on the dashboard when you finish running your job. Sameer On Tue, Jul 19, 2016 at 6:35 AM, Biplob Biswas revolutionisme@

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Biplob Biswas
to access the job statuses on the dashboard when you finish running your job. Sameer On Tue, Jul 19, 2016 at 6:35 AM, Biplob Biswas revolutionisme@ wrote: Hi, I am running my Flink program using Eclipse and I can't

Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Biplob Biswas
Hi, I am running my Flink program using Eclipse and I can't access the dashboard at http://localhost:8081; can someone help me with this? I read that I need to check my flink-conf.yaml, but it's a Maven project and I don't have a flink-conf. Any help would be really appreciated. Thanks a lot

Re: Data point goes missing within iteration

2016-07-19 Thread Biplob Biswas
Hi Ufuk, Thanks for the update. Is there any known way to fix this issue? Any workaround that you know of which I can try? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-point-goes-missing-within-iteration-tp7776p8015.html

Re: Data point goes missing within iteration

2016-07-19 Thread Biplob Biswas
Hi Ufuk, Did you get time to go through my issue? Just wanted to follow up to see whether I can get a solution or not. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-point-goes-missing-within-iteration-tp7776p8010.html

Re: Data point goes missing within iteration

2016-07-13 Thread Biplob Biswas
Hi, Sorry for the late reply; I was trying different stuff with my code. And from what I observed, it's very weird to me. After experimentation, I found out that when I increase the number of centroids, the number of data points forwarded decreases; when I lower the number of centroids, the

Re: Data point goes missing within iteration

2016-07-06 Thread Biplob Biswas
Thanks a lot, I would really appreciate it. Also, please let me know if you don't understand it well; the documentation in the code is not really great at the moment.

Re: Data point goes missing within iteration

2016-07-04 Thread Biplob Biswas
I have sent you my code in a separate email; I hope you can solve my issue. Thanks a lot, Biplob -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-point-goes-missing-within-iteration-tp7776p7798.html

Re: Data point goes missing within iteration

2016-07-04 Thread Biplob Biswas
Can anyone check this once and help me out with this? I would be really obliged. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-point-goes-missing-within-iteration-tp7776p7795.html

Data point goes missing within iteration

2016-07-03 Thread Biplob Biswas
total number of datapoints. I don't really know what is happening here exactly; why would the number of data points reduce like that suddenly? Thanks and Regards Biplob Biswas

Re: Way to hold execution of one of the map operator in Co-FlatMaps

2016-06-27 Thread Biplob Biswas
and not as fast as the first one. Thanks for replying though, will try that out. Regards Biplob Biswas -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Way-to-hold-execution-of-one-of-the-map-operator-in-Co-FlatMaps-tp7689p7699.html

Way to hold execution of one of the map operator in Co-FlatMaps

2016-06-26 Thread Biplob Biswas
Hi, I was wondering whether it is possible to stop streaming data in from one of the map operators until some data arrives in the second map operator. For example, if I have ds1.connect(ds2).map(new coflatmapper()), then I need data to stop flowing from ds1 until some data arrives in ds2. Is

Keeping latest data point in a data stream variable

2016-06-21 Thread Biplob Biswas
Hi, I want to keep the latest data point which is processed in a datastream variable. So technically I need just one value in the variable and discard all the older ones. Can this be done somehow? I was thinking about using filters, but I don't think I can use them for this scenario. Any ideas as

Re: Data Source Generator emits 4 instances of the same tuple

2016-06-09 Thread Biplob Biswas
Yes, thanks a lot; also, the fact that I was using ParallelSourceFunction was problematic. So, as suggested by Fabian and Robert, I used SourceFunction, and then in the Flink job I set the output of map with a parallelism of 4 to get the desired result. Thanks again.

Re: Extracting Timestamp in MapFunction

2016-06-09 Thread Biplob Biswas
Hi Aljoscha, I went to the Flink hackathon at Buzzwords yesterday, where Fabian and Robert helped me with this issue. Apparently I was assuming that the file would be handled in a single thread, but I was using ParallelSourceFunction and it was creating 4 different threads and thus reading the same

Re: Data Source Generator emits 4 instances of the same tuple

2016-06-06 Thread Biplob Biswas
://pastebin.com/NenvXShH I hope to get a solution out of it. Thanks and Regards Biplob Biswas -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-Source-Generator-emits-4-instances-of-the-same-tuple-tp7392p7405.html

Data Source Generator emits 4 instances of the same tuple

2016-06-06 Thread Biplob Biswas
and Regards Biplob Biswas -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-Source-Generator-emits-4-instances-of-the-same-tuple-tp7392.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Extracting Timestamp in MapFunction

2016-06-01 Thread Biplob Biswas
Hi, Before giving the method you described above a try, I tried adding the timestamp to my data directly at the stream source. Following is my stream source: http://pastebin.com/AsXiStMC and I am using the stream source as follows: DataStream tuples = env.addSource(new

Extracting Timestamp in MapFunction

2016-05-29 Thread Biplob Biswas
well. Any help is much appreciated. Thanks a lot. Biplob Biswas -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Extracting-Timestamp-in-MapFunction-tp7240.html

Re: Reading Parameter values sent to partition

2016-05-29 Thread Biplob Biswas
Aah, thanks a lot for that insight. I'm pretty new to Flink and learning on my own, so prone to making mistakes. Thanks a lot for helping.

Reading Parameter values sent to partition

2016-05-28 Thread Biplob Biswas
Hi, I am trying to send some static integer values down to each map function, using the following code public static void main(String[] args) throws Exception { ParameterTool params = ParameterTool.fromArgs(args); String

Re: Connect 2 datastreams and iterate

2016-05-24 Thread Biplob Biswas
and also, "Incorrect number of arguments for type IterativeStream.ConnectedIterativeStreams; it cannot be parameterized with arguments <Long[], Long, Long[]>" What am I doing wrong here? Thanks and Regards Biplob Biswas Ufuk Celebi wrote: Please provide the error message a

Connect 2 datastreams and iterate

2016-05-19 Thread Biplob Biswas
Hi, Is there a way to connect 2 datastreams, iterate, and then get the feedback as a third stream? I tried doing mergedDS = datastream1.connect(datastream2) iterateDS = mergedDS.iterate().withFeedback(datastream3type.class) but this didn't work and it was showing me errors. Is there any

Re: closewith(...) not working in DataStream error, but works in DataSet

2016-05-17 Thread Biplob Biswas
I am also a newbie, but from what I experienced during my experiments: the same implementation doesn't work for the streaming context because 1) In the streaming context the stream is assumed to be infinite, so the process of iteration is also infinite, and the part with which you close your

Re: Regarding Broadcast of datasets in streaming context

2016-05-15 Thread Biplob Biswas
Hi, I read that article already, but it is very simplistic, and thus based on that article and other examples I was trying to understand how my centroids can be sent to all the partitions and updated accordingly. I also understood that the order of the input and the feedback stream can't be

Re: Regarding Broadcast of datasets in streaming context

2016-05-15 Thread Biplob Biswas
Because it's not working the way I am expecting it to, and the input stream is completely consumed before anything is sent back and iterated. Could you please point me in the proper direction and help me in understanding things properly? Thanks and Regards Biplob Biswas

Re: Unexpected behaviour in datastream.broadcast()

2016-05-14 Thread Biplob Biswas
Can anyone help me understand how the out.collect() and the corresponding broadcast() are working? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Unexpected-behaviour-in-datastream-broadcast-tp6848p6925.html

Unexpected behaviour in datastream.broadcast()

2016-05-12 Thread Biplob Biswas
Hi, I am running the following sample code to understand how iteration and broadcast work in a streaming context. final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setParallelism(4); long i = 5;

Re: Regarding Broadcast of datasets in streaming context

2016-05-11 Thread Biplob Biswas
Hi Gyula, I tried doing something like the following in the 2 flatmaps, but I am not getting the desired results and am still confused how the concept you put forward would work: public static final class MyCoFlatmap implements CoFlatMapFunction{

Regarding source Parallelism and usage of coflatmap transformation

2016-05-05 Thread Biplob Biswas
t()); but I can't see the centroids already broadcast by centroidStream.broadcast() in the map functions. Any kind of help is hugely appreciated. Thanks and Regards Biplob Biswas

Re: Regarding Broadcast of datasets in streaming context

2016-05-05 Thread Biplob Biswas
for your help throughout. Regards Biplob Biswas Gyula Fóra wrote: Hi, Iterating after every incoming point/centroid update means that you basically defeat the purpose of having parallelism in your Flink job. If you only "sync" the centroids periodically by

Re: Regarding Broadcast of datasets in streaming context

2016-05-02 Thread Biplob Biswas
that this operation is not done in parallel, as if streams are sent in parallel how would I ensure correct update of the centroids as multiple points can try to update the same centroid in parallel. I hope I made myself clear with this.

Re: Regarding Broadcast of datasets in streaming context

2016-05-02 Thread Biplob Biswas
as if streams are sent in parallel how would I ensure correct update of the centroids, as multiple points can try to update the same centroid in parallel. I hope I made myself clear with this. Thanks and Regards Biplob Biplob Biswas wrote: Hi Gyula, I read your workaround and starte
