Hi
I was looking at the new full outer join. It seems to work fine for my use
case, but I have a question regarding the state size.
I have two streams, each with hundreds of millions of unique keys. Each of
these keys will also be updated hundreds of times per day.
As per my un
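(The message is cut off above, but on the state-size concern: the knob I would check first is idle state retention, which lets the runtime drop join state for keys that stop updating. A sketch, assuming the 1.6-era Table API and that tEnv is the StreamTableEnvironment:

import org.apache.flink.api.common.time.Time;
import org.apache.flink.table.api.StreamQueryConfig;

StreamQueryConfig qConfig = tEnv.queryConfig();
// keep a key's state at least 1 day and at most 2 days after its last update
qConfig.withIdleStateRetentionTime(Time.days(1), Time.days(2));
// pass qConfig when emitting the joined table, e.g.
// tEnv.toRetractStream(joined, Row.class, qConfig);
)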
I am trying to submit a job with a savepoint/checkpoint and it is failing
with the error below. Without the -s flag it works fine. Am I missing
something here?
Thanks
>bin/flink run -d -c st -s file:///tmp/db/checkpoint/
./target/poc-1.0-SNAPSHOT-jar-with-dependencies.jar
Starting execution of progr
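(A guess from the command above, since the error text is cut off: with externalized checkpoints, the -s argument normally has to point at one concrete checkpoint, not at the parent checkpoint directory. Something along these lines, where <job-id> and chk-<n> stand for an actual retained checkpoint:

>bin/flink run -d -c st -s file:///tmp/db/checkpoint/<job-id>/chk-<n>
./target/poc-1.0-SNAPSHOT-jar-with-dependencies.jar
)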
Hi,
I am implementing a source and I want to use checkpointing and would like
to restore the job from these external checkpoints. I used Kafka for my
tests and it worked fine.
However, I would like to know what I need to do if I have my own source.
I am sure that I will need to implement Check
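(The sentence above is cut off, but assuming it ends in CheckpointedFunction, a minimal sketch of the usual pattern, with OffsetSource and its offset as placeholders: the source keeps its read position in operator state, snapshots it in snapshotState(), and restores it in initializeState(), which is also what makes restoring from an externalized checkpoint work.

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

public class OffsetSource extends RichSourceFunction<String>
        implements CheckpointedFunction {

    private volatile boolean running = true;
    private long offset;                          // current read position
    private transient ListState<Long> offsetState;

    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception {
        offsetState = ctx.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("offset", Long.class));
        if (ctx.isRestored()) {                   // restoring from checkpoint/savepoint
            for (Long o : offsetState.get()) {
                offset = o;
            }
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        offsetState.clear();
        offsetState.add(offset);                  // persist the read position
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            // hold the checkpoint lock so emitting and advancing the offset are atomic
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect("record-" + offset);
                offset++;
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
)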
Hi Guys,
Is there a limit on the number of files a Flink dataset can read? My
question is: will there be any sort of issue if I have, say, millions of
files to read to create a single dataset?
Thanks
>> Furthermore if you have them stored on HDFS then the bottleneck is the
>> namenode which will have to answer millions of requests.
>> The latter point will change in future Hadoop versions with
>> http://ozone.hadoop.apache.org/
>>
>> On 13. Aug 2018, at 21:01, Darshan S
0 Jörn Franke :
>
>> It causes more overhead (processes etc) which might make it slower.
>> Furthermore if you have them stored on HDFS then the bottleneck is the
>> namenode which will have to answer millions of requests.
>> The latter point will change in future Hadoop ve
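(Not from the thread, just how I would point a single dataset at a whole directory tree instead of enumerating millions of paths by hand. A sketch; recursive.file.enumeration is the input-format flag for descending into nested directories, though the per-file namenode requests mentioned above still happen:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

Configuration params = new Configuration();
params.setBoolean("recursive.file.enumeration", true);   // descend into subdirectories

// one logical dataset over the whole tree
DataSet<String> lines = env.readTextFile("hdfs:///data/root")
        .withParameters(params);
)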
Hi,
I am using a map function on a data stream which has one column, i.e. a JSON
string. The map function simply uses a Jackson mapper to convert the String
to an ObjectNode, and also assigns a key based on one of the values in the
ObjectNode. The code seems to work fine for 2-3 minutes as expected, and then suddenl
ve been a corrupted message in your
> decoding, for example a malformed JSON string.
>
> --
> Rong
>
> [1] https://issues.apache.org/jira/browse/FLINK-8836
>
> On Wed, Aug 22, 2018 at 8:41 AM Darshan Singh
> wrote:
>
>> Hi,
>>
>> I am using a map function
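(Building on the malformed-JSON suggestion above, a sketch of a defensive variant, assuming that really is the failure since the stack trace is cut off: a flatMap can log and skip a bad record, where a map has to return something or throw.

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class SafeJsonParser extends RichFlatMapFunction<String, ObjectNode> {

    private transient ObjectMapper mapper;

    @Override
    public void open(Configuration parameters) {
        mapper = new ObjectMapper();   // ObjectMapper is not serializable; create it per task
    }

    @Override
    public void flatMap(String json, Collector<ObjectNode> out) {
        try {
            out.collect((ObjectNode) mapper.readTree(json));
        } catch (Exception e) {
            // malformed record: log and skip instead of failing the whole job
        }
    }
}
)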
I faced an issue with back pressure in streams. I was wondering if we
could face the same with batches as well.
In theory it should be possible. But in the Web UI's backpressure tab for
batch jobs I saw only the task status, and no rating like "OK" etc.
So I was wonderi
tream
> finishes, so the slower downstream will not block upstream running, then
> the backpressure may not exist in this case.
>
> Best,
> Zhijiang
>
> ------
> From: Darshan Singh
> Sent: Wednesday, 29 August 2018, 16:20
> To
record was emitted by
> the previous operator. As such back-pressure exists in batch just like in
> streaming.
>
> On 29.08.2018 11:39, Darshan Singh wrote:
>
> Thanks,
>
> My job is simple. I am using table Api
> 1. Read from hdfs
> 2. Deserialize json to pojo and
fast
> node(source in your case).
>
> Best,
> Zhijiang
>
> ------
> From: Darshan Singh
> Sent: Wednesday, 29 August 2018, 18:16
> To: chesnay
> Cc: wangzhijiang999 ; user <
> user@flink.apache.org>
> Subject: Re: Bac
Hi All,
I was playing with queryable state. As a queryable stream cannot be modified,
how do I use the output of, say, my reduce function for further processing?
Below is one example. I am sure I have done it wrong :). I am using the reduce
function twice, or do I need to use a rich reduce function and use th
Hi All,
I was playing with queryable state. As a queryable stream cannot be modified,
how do I use the output of, say, my reduce function for further processing?
Below is one example. I can see that I am processing the data twice: once for
the queryable stream and once for my original stream. That m
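(A sketch of the variant I believe avoids processing twice, hedged since the original example is cut off: keep the pipeline as a normal keyed function and mark its state descriptor queryable via setQueryable(), so the same state serves queryable-state clients and the downstream stream.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class RunningSum extends RichFlatMapFunction<Long, Long> {

    private transient ValueState<Long> sum;

    @Override
    public void open(Configuration parameters) {
        ValueStateDescriptor<Long> desc =
                new ValueStateDescriptor<>("sum", Long.class);
        desc.setQueryable("sum-query");   // exposes this state to queryable-state clients
        sum = getRuntimeContext().getState(desc);
    }

    @Override
    public void flatMap(Long value, Collector<Long> out) throws Exception {
        long current = (sum.value() == null) ? 0L : sum.value();
        current += value;
        sum.update(current);
        out.collect(current);             // the stream continues downstream as usual
    }
}

// usage: keyedStream.flatMap(new RunningSum())
)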
Hi,
I am writing a stateful source very similar to KafkaBaseConsumer but not as
generic. I was looking at how to write unit tests and integration tests for
it. I looked at the kafka-connector-based unit tests, but it seems there is
too much external machinery at play there, like lots of
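(For what it's worth, the harness the Kafka tests build on can be used without the Kafka machinery. A sketch, with the caveats that AbstractStreamOperatorTestHarness ships in the flink-streaming-java test-jar, so it needs a test-scoped dependency, and that MySource stands for the custom SourceFunction<String>:

import org.apache.flink.runtime.checkpoint.OperatorSubtaskState;
import org.apache.flink.streaming.api.operators.StreamSource;
import org.apache.flink.streaming.util.AbstractStreamOperatorTestHarness;

// wrap the source function in a StreamSource operator and drive it with the harness
MySource source = new MySource();
AbstractStreamOperatorTestHarness<String> harness =
        new AbstractStreamOperatorTestHarness<>(
                new StreamSource<>(source), 1, 1, 0);   // maxParallelism, parallelism, subtask
harness.open();

// take a snapshot as a checkpoint would, then restore a fresh instance from it
OperatorSubtaskState snapshot = harness.snapshot(0L, 0L);

MySource restored = new MySource();
AbstractStreamOperatorTestHarness<String> restoredHarness =
        new AbstractStreamOperatorTestHarness<>(
                new StreamSource<>(restored), 1, 1, 0);
restoredHarness.initializeState(snapshot);
restoredHarness.open();
)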
Hi ,
I am creating a new custom source for reading some streaming data which has
different streams. So I assign streams to task slots and then read them.
This works fine, but in some cases I have fewer streams than task slots, and
in that case some workers are not assigned any streams and these
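(A sketch of the assignment scheme I read this as, with discoverStreams() and readNext() as placeholders: each parallel instance keeps the streams whose index matches its subtask index modulo the parallelism, and an instance that ends up with no streams simply idles until cancelled.

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class MultiStreamSource extends RichParallelSourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> out) throws Exception {
        int subtask = getRuntimeContext().getIndexOfThisSubtask();
        int parallelism = getRuntimeContext().getNumberOfParallelSubtasks();

        List<String> mine = new ArrayList<>();
        List<String> all = discoverStreams();     // placeholder: list all logical streams
        for (int i = 0; i < all.size(); i++) {
            if (i % parallelism == subtask) {     // round-robin ownership
                mine.add(all.get(i));
            }
        }

        while (running) {
            for (String stream : mine) {
                out.collect(readNext(stream));    // placeholder read
            }
            if (mine.isEmpty()) {
                Thread.sleep(1000);               // idle subtask with nothing assigned
            }
        }
    }

    @Override
    public void cancel() { running = false; }

    private List<String> discoverStreams() { return new ArrayList<>(); }
    private String readNext(String stream) { return stream; }
}
)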
Hi, I would like to understand the execution model.
1. I have a CSV file which is, say, 10 GB.
2. I created a table from this file.
3. Now I have created filtered tables on this, say 10 of them.
4. Now I created a writeToSink for all these 10 filtered tables.
Now my question is: are these 10 f
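(To make the question concrete, here is roughly how I picture the setup, which is also what the replies below are about. A sketch; csv_table, the bucket column, and the output paths are placeholders:

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.sinks.CsvTableSink;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

Table source = tEnv.scan("csv_table");   // the 10 GB CSV registered as a table

for (int i = 0; i < 10; i++) {
    // each writeToSink() is translated into its own plan, so the scan/parse
    // work is repeated per sink rather than shared across the 10 pipelines
    source.filter("bucket === " + i)
          .writeToSink(new CsvTableSink("/tmp/out-" + i, ","));
}
)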
ad them out again.
>
> Don't ask *me* about what happens in failure scenarios... I have myself
> not figured that out yet.
>
> HTH
> Niclas
>
> On Mon, Feb 19, 2018 at 3:11 AM, Darshan Singh
> wrote:
>
>> Hi I would like to understand the execution mode
s that the SQL optimizer cannot translate queries
with multiple sinks.
Instead, each sink is individually translated and the optimizer does not
know that common execution paths could be shared.
Best,
Fabian
2018-02-19 2:19 GMT+01:00 Darshan Singh :
> Thanks for reply.
>
> I guess I am no
elp to diagnose such problems.
>
> Best, Fabian
>
> 2018-02-19 11:22 GMT+01:00 Darshan Singh :
>
>> Thanks Fabian for such a detailed explanation.
>>
>> I am using a dataset in between so I guess the CSV is read once. Now to my
>> real issue: I have 6 task managers ea
:34 AM, Fabian Hueske wrote:
> No, there is no size or cardinality estimation happening at the moment.
>
> Best, Fabian
>
> 2018-02-19 21:56 GMT+01:00 Darshan Singh :
>
>> Thanks, is there a metric or other way to know how much space each
>> task/job is taking?
Hi
I have a dataset which has almost 99% correct data. As of now, if some data
is bad, I just ignore it, log it, and return only the correct data. I do
this inside a map function.
The part which decides whether the data is correct or not is the expensive one.
Now I want to store the bad data somewher
Hi,
I am not able to find the best way to query the output of a scalar
table function.
Suppose I have a table which has a column col1 which is a string.
I have a scalar function which returns a POJO
{col1V1 String, col1V2 String, col1V3 String}.
I am using the following:
table.select("sf(col1) as
and see if it fits you).
>
> Thanks,
> Kostas
>
> [1] https://ci.apache.org/projects/flink/flink-docs-
> release-1.4/dev/stream/side_output.html
>
>
> On Mar 29, 2018, at 8:53 PM, Darshan Singh wrote:
>
> Hi
>
> I have a dataset which has almost 99% of correct d
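(Following the side-output link above [1], a sketch of the pattern, i.e. the DataStream variant the doc describes; input and the isValid() check are placeholders for the stream and the expensive correctness test:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// anonymous subclass so the element type survives erasure
final OutputTag<String> badTag = new OutputTag<String>("bad-records") {};

SingleOutputStreamOperator<String> good =
        input.process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                if (isValid(value)) {          // the expensive check, run exactly once
                    out.collect(value);        // good data continues on the main stream
                } else {
                    ctx.output(badTag, value); // bad data goes to the side output
                }
            }
        });

DataStream<String> bad = good.getSideOutput(badTag);  // sink this wherever bad data should live
)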
plan into two parts which are independently optimized
> and hence the select() operators would not be merged.
>
> Best, Fabian
>
>
>
> 2018-03-30 22:07 GMT+02:00 Darshan Singh :
>
>> Hi,
>>
>> I am not able to find what is best way to query the output of a sc
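(On the question above, the expression-syntax pattern I believe applies, hedged; sf is the registered scalar function and the field names come from the POJO: alias the result, then flatten it or read single fields with get().

import org.apache.flink.table.api.Table;

// expand the POJO into one column per field (named r$col1V1, r$col1V2, r$col1V3)
Table flat = table
        .select("sf(col1) as r")
        .select("r.flatten()");

// or address a single POJO field
Table one = table
        .select("sf(col1) as r")
        .select("r.get('col1V1') as col1V1");
)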
Hi,
I would like to use accumulators with table/scalar functions. However, I
am not able to figure out how to get the runtime context from inside the
scalar function's open() method.
The only thing I see is the FunctionContext, which cannot provide the
runtime context.
I tried using AbstractRichFunction.get
Fabian
>
> 2018-04-04 21:31 GMT+02:00 Darshan Singh :
>
>> Hi,
>>
>> I would like to use accumulators with table /scalar functions. However, I
>> am not able to figure out how to get the runtime context from inside of
>> scalar function open method.
>> On
once the job is done. The number of these strings is
very small. I thought I could create a custom accumulator which works
more like a list, so that in the end I get all the values.
I guess I will need to keep trying something.
Thanks
On Wed, Apr 4, 2018 at 9:01 PM, Darshan Singh
wrote:
> Thanks
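(On the list-like accumulator idea above: Flink already ships a ListAccumulator. A sketch, assuming the collection point can live in an ordinary rich function, since per this thread the scalar function's FunctionContext does not expose accumulators:

import org.apache.flink.api.common.accumulators.ListAccumulator;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class CollectingMapper extends RichMapFunction<String, String> {

    private final ListAccumulator<String> badValues = new ListAccumulator<>();

    @Override
    public void open(Configuration parameters) {
        getRuntimeContext().addAccumulator("bad-values", badValues);
    }

    @Override
    public String map(String value) {
        if (value.isEmpty()) {            // placeholder condition
            badValues.add(value);         // collect the interesting strings
        }
        return value;
    }
}

// after the job finishes:
// JobExecutionResult result = env.execute("job");
// List<String> collected = result.getAccumulatorResult("bad-values");
)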
Hi
I have a table and I want to rebalance the data so that each partition is
equal. I can convert to a dataset, rebalance, and then convert back to a table.
I couldn't find any rebalance in the Table API. Does anyone know a better way
to rebalance table data?
Thanks
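(For reference, a sketch of exactly that round trip, assuming a BatchTableEnvironment tEnv; Row keeps the schema generic, since the Table API itself offers no rebalance():

import org.apache.flink.api.java.DataSet;
import org.apache.flink.table.api.Table;
import org.apache.flink.types.Row;

DataSet<Row> rows = tEnv.toDataSet(table, Row.class);   // table -> dataset
Table rebalanced = tEnv.fromDataSet(rows.rebalance());  // even redistribution, back to a table
)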
Hi
Are there any useful metrics in Flink which tell me that a given operator
read, say, 1 GB of data, shuffled (or anything else), and wrote (in case it
was written to temp or anywhere else), say, 1 or 2 GB of data?
One of my jobs is failing with disk space, and there are many sort, group and
join is ha
e nice to have as well.
>
> Michael
>
> > On Apr 13, 2018, at 6:37 AM, Darshan Singh
> wrote:
> >
> > Hi
> >
> > Is there any useful metrics in flink which tells me that a given
> operator read say 1 GB of data and shuffled(or anything else) and
> writt