Hi
I was looking at the new full outer join. It seems to work fine for my use
case, but I have a question regarding the state size.
I have two streams, each with hundreds of millions of unique keys. Each of
these keys will also be updated hundreds of times per day.
As per my un
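(The message is cut off above, but on the state-size concern: the knob I would check first is idle state retention, which lets the runtime drop join state for keys that stop updating. A sketch, assuming the 1.6-era Table API and that tEnv is the StreamTableEnvironment:

import org.apache.flink.api.common.time.Time;
import org.apache.flink.table.api.StreamQueryConfig;

StreamQueryConfig qConfig = tEnv.queryConfig();
// keep a key's state at least 1 day and at most 2 days after its last update
qConfig.withIdleStateRetentionTime(Time.days(1), Time.days(2));
// pass qConfig when emitting the joined table, e.g.
// tEnv.toRetractStream(joined, Row.class, qConfig);
)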
I am trying to submit a job with a savepoint/checkpoint and it is failing
with the error below. Without the -s flag it works fine. Am I missing
something here?
Thanks
>bin/flink run -d -c st -s file:///tmp/db/checkpoint/
./target/poc-1.0-SNAPSHOT-jar-with-dependencies.jar
Starting execution of progr
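(A guess from the command above, since the error text is cut off: with externalized checkpoints, the -s argument normally has to point at one concrete checkpoint, not at the parent checkpoint directory. Something along these lines, where <job-id> and chk-<n> stand for an actual retained checkpoint:

>bin/flink run -d -c st -s file:///tmp/db/checkpoint/<job-id>/chk-<n>
./target/poc-1.0-SNAPSHOT-jar-with-dependencies.jar
)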
Hi,
I am implementing a source and I want to use checkpointing and would like
to restore the job from these external checkpoints. I used Kafka for my
tests and it worked fine.
However, I would like to know what I need to do if I have my own source.
I am sure that I will need to implement Check
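(The sentence above is cut off, but assuming it ends in CheckpointedFunction, a minimal sketch of the usual pattern, with OffsetSource and its offset as placeholders: the source keeps its read position in operator state, snapshots it in snapshotState(), and restores it in initializeState(), which is also what makes restoring from an externalized checkpoint work.

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

public class OffsetSource extends RichSourceFunction<String>
        implements CheckpointedFunction {

    private volatile boolean running = true;
    private long offset;                          // current read position
    private transient ListState<Long> offsetState;

    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception {
        offsetState = ctx.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("offset", Long.class));
        if (ctx.isRestored()) {                   // restoring from checkpoint/savepoint
            for (Long o : offsetState.get()) {
                offset = o;
            }
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        offsetState.clear();
        offsetState.add(offset);                  // persist the read position
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            // hold the checkpoint lock so emitting and advancing the offset are atomic
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect("record-" + offset);
                offset++;
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
)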
Hi Guys,
Is there a limit on the number of files a Flink dataset can read? My
question is: will there be any sort of issue if I have, say, millions of
files to read to create a single dataset?
Thanks
>> Furthermore if you have them stored on HDFS then the bottleneck is the
>> namenode which will have to answer millions of requests.
>> The latter point will change in future Hadoop versions with
>> http://ozone.hadoop.apache.org/
>>
>> On 13. Aug 2018, at 21:01, Darshan S
0 Jörn Franke :
>
>> It causes more overhead (processes etc) which might make it slower.
>> Furthermore if you have them stored on HDFS then the bottleneck is the
>> namenode which will have to answer millions of requests.
>> The latter point will change in future Hadoop ve
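(Not from the thread, just how I would point a single dataset at a whole directory tree instead of enumerating millions of paths by hand. A sketch; recursive.file.enumeration is the input-format flag for descending into nested directories, though the per-file namenode requests mentioned above still happen:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

Configuration params = new Configuration();
params.setBoolean("recursive.file.enumeration", true);   // descend into subdirectories

// one logical dataset over the whole tree
DataSet<String> lines = env.readTextFile("hdfs:///data/root")
        .withParameters(params);
)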
Hi,
I am using a map function on a data stream which has one column, i.e. a JSON
string. The map function simply uses a Jackson mapper to convert the String
to an ObjectNode, and also assigns a key based on one of the values in the
ObjectNode. The code seems to work fine for 2-3 minutes as expected, and then suddenl
ve been a corrupted message in your
> decoding, for example a malformed JSON string.
>
> --
> Rong
>
> [1] https://issues.apache.org/jira/browse/FLINK-8836
>
> On Wed, Aug 22, 2018 at 8:41 AM Darshan Singh
> wrote:
>
>> Hi,
>>
>> I am using a map function
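(Building on the malformed-JSON suggestion above, a sketch of a defensive variant, assuming that really is the failure since the stack trace is cut off: a flatMap can log and skip a bad record, where a map has to return something or throw.

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class SafeJsonParser extends RichFlatMapFunction<String, ObjectNode> {

    private transient ObjectMapper mapper;

    @Override
    public void open(Configuration parameters) {
        mapper = new ObjectMapper();   // ObjectMapper is not serializable; create it per task
    }

    @Override
    public void flatMap(String json, Collector<ObjectNode> out) {
        try {
            out.collect((ObjectNode) mapper.readTree(json));
        } catch (Exception e) {
            // malformed record: log and skip instead of failing the whole job
        }
    }
}
)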
I faced an issue with back pressure in streams. I was wondering if we
could face the same with batches as well.
In theory it should be possible. But in the Web UI's backpressure tab for
batch jobs I saw only the task status, and no rating like "OK" etc.
So I was wonderi
tream
> finishes, so the slower downstream will not block upstream running, then
> the backpressure may not exist in this case.
>
> Best,
> Zhijiang
>
> ------
> From: Darshan Singh
> Sent: Wednesday, 29 August 2018, 16:20
> To
record was emitted by
> the previous operator. As such back-pressure exists in batch just like in
> streaming.
>
> On 29.08.2018 11:39, Darshan Singh wrote:
>
> Thanks,
>
> My job is simple. I am using table Api
> 1. Read from hdfs
> 2. Deserialize json to pojo and
fast
> node(source in your case).
>
> Best,
> Zhijiang
>
> ------
> From: Darshan Singh
> Sent: Wednesday, 29 August 2018, 18:16
> To: chesnay
> Cc: wangzhijiang999 ; user <
> user@flink.apache.org>
> Subject: Re: Bac
Hi All,
I was playing with queryable state. As a queryable stream cannot be modified,
how do I use the output of, say, my reduce function for further processing?
Below is one example. I am sure I have done it wrong :). I am using the reduce
function twice, or do I need to use a rich reduce function and use th
Hi All,
I was playing with queryable state. As a queryable stream cannot be modified,
how do I use the output of, say, my reduce function for further processing?
Below is one example. I can see that I am processing the data twice: once for
the queryable stream and once for my original stream. That m
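(A sketch of the variant I believe avoids processing twice, hedged since the original example is cut off: keep the pipeline as a normal keyed function and mark its state descriptor queryable via setQueryable(), so the same state serves queryable-state clients and the downstream stream.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class RunningSum extends RichFlatMapFunction<Long, Long> {

    private transient ValueState<Long> sum;

    @Override
    public void open(Configuration parameters) {
        ValueStateDescriptor<Long> desc =
                new ValueStateDescriptor<>("sum", Long.class);
        desc.setQueryable("sum-query");   // exposes this state to queryable-state clients
        sum = getRuntimeContext().getState(desc);
    }

    @Override
    public void flatMap(Long value, Collector<Long> out) throws Exception {
        long current = (sum.value() == null) ? 0L : sum.value();
        current += value;
        sum.update(current);
        out.collect(current);             // the stream continues downstream as usual
    }
}

// usage: keyedStream.flatMap(new RunningSum())
)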
Hi,
I am writing a stateful source very similar to KafkaBaseConsumer but not as
generic. I was looking at how to write unit tests and integration tests for
it. I looked at the kafka-connector-based unit tests, but it seems there is
too much external machinery at play there, like lots of
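(For what it's worth, the harness the Kafka tests build on can be used without the Kafka machinery. A sketch, with the caveats that AbstractStreamOperatorTestHarness ships in the flink-streaming-java test-jar, so it needs a test-scoped dependency, and that MySource stands for the custom SourceFunction<String>:

import org.apache.flink.runtime.checkpoint.OperatorSubtaskState;
import org.apache.flink.streaming.api.operators.StreamSource;
import org.apache.flink.streaming.util.AbstractStreamOperatorTestHarness;

// wrap the source function in a StreamSource operator and drive it with the harness
MySource source = new MySource();
AbstractStreamOperatorTestHarness<String> harness =
        new AbstractStreamOperatorTestHarness<>(
                new StreamSource<>(source), 1, 1, 0);   // maxParallelism, parallelism, subtask
harness.open();

// take a snapshot as a checkpoint would, then restore a fresh instance from it
OperatorSubtaskState snapshot = harness.snapshot(0L, 0L);

MySource restored = new MySource();
AbstractStreamOperatorTestHarness<String> restoredHarness =
        new AbstractStreamOperatorTestHarness<>(
                new StreamSource<>(restored), 1, 1, 0);
restoredHarness.initializeState(snapshot);
restoredHarness.open();
)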
Hi ,
I am creating a new custom source for reading some streaming data which has
different streams. So I assign streams to task slots and then read them.
This works fine, but in some cases I have fewer streams than task slots, and
in that case some workers are not assigned any streams and these
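(A sketch of the assignment scheme I read this as, with discoverStreams() and readNext() as placeholders: each parallel instance keeps the streams whose index matches its subtask index modulo the parallelism, and an instance that ends up with no streams simply idles until cancelled.

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class MultiStreamSource extends RichParallelSourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> out) throws Exception {
        int subtask = getRuntimeContext().getIndexOfThisSubtask();
        int parallelism = getRuntimeContext().getNumberOfParallelSubtasks();

        List<String> mine = new ArrayList<>();
        List<String> all = discoverStreams();     // placeholder: list all logical streams
        for (int i = 0; i < all.size(); i++) {
            if (i % parallelism == subtask) {     // round-robin ownership
                mine.add(all.get(i));
            }
        }

        while (running) {
            for (String stream : mine) {
                out.collect(readNext(stream));    // placeholder read
            }
            if (mine.isEmpty()) {
                Thread.sleep(1000);               // idle subtask with nothing assigned
            }
        }
    }

    @Override
    public void cancel() { running = false; }

    private List<String> discoverStreams() { return new ArrayList<>(); }
    private String readNext(String stream) { return stream; }
}
)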
Hi, I would like to understand the execution model.
1. I have a CSV file which is, say, 10 GB.
2. I created a table from this file.
3. Now I have created filtered tables on this, say 10 of them.
4. Now I created a writeToSink for all these 10 filtered tables.
Now my question is: are these 10 f
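(To make the question concrete, here is roughly how I picture the setup, which is also what the replies below are about. A sketch; csv_table, the bucket column, and the output paths are placeholders:

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.sinks.CsvTableSink;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

Table source = tEnv.scan("csv_table");   // the 10 GB CSV registered as a table

for (int i = 0; i < 10; i++) {
    // each writeToSink() is translated into its own plan, so the scan/parse
    // work is repeated per sink rather than shared across the 10 pipelines
    source.filter("bucket === " + i)
          .writeToSink(new CsvTableSink("/tmp/out-" + i, ","));
}
)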
ad them out again.
>
> Don't ask *me* about what happens in failure scenarios... I have myself
> not figured that out yet.
>
> HTH
> Niclas
>
> On Mon, Feb 19, 2018 at 3:11 AM, Darshan Singh
> wrote:
>
>> Hi I would like to understand the execution mode
s that the SQL optimizer cannot translate queries
with multiple sinks.
Instead, each sink is individually translated and the optimizer does not
know that common execution paths could be shared.
Best,
Fabian
2018-02-19 2:19 GMT+01:00 Darshan Singh :
> Thanks for reply.
>
> I guess I am no
elp to diagnose such problems.
>
> Best, Fabian
>
> 2018-02-19 11:22 GMT+01:00 Darshan Singh :
>
>> Thanks Fabian for such a detailed explanation.
>>
>> I am using a dataset in between so I guess the CSV is read once. Now to my
>> real issue: I have 6 task managers ea
:34 AM, Fabian Hueske wrote:
> No, there is no size or cardinality estimation happening at the moment.
>
> Best, Fabian
>
> 2018-02-19 21:56 GMT+01:00 Darshan Singh :
>
>> Thanks, is there a metric or other way to know how much space each
>> task/job is taking?
Hi
I have a dataset which has almost 99% correct data. As of now, if some data
is bad, I just ignore it, log it, and return only the correct data. I do
this inside a map function.
The part which decides whether the data is correct or not is the expensive one.
Now I want to store the bad data somewher
Hi,
I am not able to find the best way to query the output of a scalar
table function.
Suppose I have a table which has a column col1 which is a string.
I have a scalar function which returns a POJO
{col1V1 String, col1V2 String, col1V3 String}.
I am using the following:
table.select("sf(col1) as
and see if it fits you).
>
> Thanks,
> Kostas
>
> [1] https://ci.apache.org/projects/flink/flink-docs-
> release-1.4/dev/stream/side_output.html
>
>
> On Mar 29, 2018, at 8:53 PM, Darshan Singh wrote:
>
> Hi
>
> I have a dataset which has almost 99% of correct d
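(Following the side-output link above [1], a sketch of the pattern, i.e. the DataStream variant the doc describes; input and the isValid() check are placeholders for the stream and the expensive correctness test:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// anonymous subclass so the element type survives erasure
final OutputTag<String> badTag = new OutputTag<String>("bad-records") {};

SingleOutputStreamOperator<String> good =
        input.process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                if (isValid(value)) {          // the expensive check, run exactly once
                    out.collect(value);        // good data continues on the main stream
                } else {
                    ctx.output(badTag, value); // bad data goes to the side output
                }
            }
        });

DataStream<String> bad = good.getSideOutput(badTag);  // sink this wherever bad data should live
)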
plan into two parts which are independently optimized
> and hence the select() operators would not be merged.
>
> Best, Fabian
>
>
>
> 2018-03-30 22:07 GMT+02:00 Darshan Singh :
>
>> Hi,
>>
>> I am not able to find what is best way to query the output of a sc
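(On the question above, the expression-syntax pattern I believe applies, hedged; sf is the registered scalar function and the field names come from the POJO: alias the result, then flatten it or read single fields with get().

import org.apache.flink.table.api.Table;

// expand the POJO into one column per field (named r$col1V1, r$col1V2, r$col1V3)
Table flat = table
        .select("sf(col1) as r")
        .select("r.flatten()");

// or address a single POJO field
Table one = table
        .select("sf(col1) as r")
        .select("r.get('col1V1') as col1V1");
)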
Hi,
I would like to use accumulators with table/scalar functions. However, I
am not able to figure out how to get the runtime context from inside the
scalar function's open() method.
The only thing I see is the FunctionContext, which cannot provide the
runtime context.
I tried using AbstractRichFunction.get
Fabian
>
> 2018-04-04 21:31 GMT+02:00 Darshan Singh :
>
>> Hi,
>>
>> I would like to use accumulators with table /scalar functions. However, I
>> am not able to figure out how to get the runtime context from inside of
>> scalar function open method.
>> On
once the job is done. The number of these strings is
very small. I thought I could create a custom accumulator which works
more like a list, so that in the end I get all the values.
I guess I will need to keep trying something.
Thanks
On Wed, Apr 4, 2018 at 9:01 PM, Darshan Singh
wrote:
> Thanks
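(On the list-like accumulator idea above: Flink already ships a ListAccumulator. A sketch, assuming the collection point can live in an ordinary rich function, since per this thread the scalar function's FunctionContext does not expose accumulators:

import org.apache.flink.api.common.accumulators.ListAccumulator;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class CollectingMapper extends RichMapFunction<String, String> {

    private final ListAccumulator<String> badValues = new ListAccumulator<>();

    @Override
    public void open(Configuration parameters) {
        getRuntimeContext().addAccumulator("bad-values", badValues);
    }

    @Override
    public String map(String value) {
        if (value.isEmpty()) {            // placeholder condition
            badValues.add(value);         // collect the interesting strings
        }
        return value;
    }
}

// after the job finishes:
// JobExecutionResult result = env.execute("job");
// List<String> collected = result.getAccumulatorResult("bad-values");
)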
Hi
I have a table and I want to rebalance the data so that each partition is
equal. I can convert to a dataset, rebalance, and then convert back to a table.
I couldn't find any rebalance in the Table API. Does anyone know a better way
to rebalance table data?
Thanks
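(For reference, a sketch of exactly that round trip, assuming a BatchTableEnvironment tEnv; Row keeps the schema generic, since the Table API itself offers no rebalance():

import org.apache.flink.api.java.DataSet;
import org.apache.flink.table.api.Table;
import org.apache.flink.types.Row;

DataSet<Row> rows = tEnv.toDataSet(table, Row.class);   // table -> dataset
Table rebalanced = tEnv.fromDataSet(rows.rebalance());  // even redistribution, back to a table
)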
Hi
Are there any useful metrics in Flink which tell me that a given operator
read, say, 1 GB of data, shuffled (or anything else), and wrote (in case it
was written to temp or anywhere else), say, 1 or 2 GB of data?
One of my jobs is failing with disk space, and there are many sort, group and
join is ha
e nice to have as well.
>
> Michael
>
> > On Apr 13, 2018, at 6:37 AM, Darshan Singh
> wrote:
> >
> > Hi
> >
> > Is there any useful metrics in flink which tells me that a given
> operator read say 1 GB of data and shuffled(or anything else) and
> writt