Re: Queryable state and TTL

2019-07-06 Thread Avi Levi
Thanks, I'll check it out.

On Sun, Jul 7, 2019 at 5:40 AM Eron Wright  wrote:

> Here's a PR for queryable state TLS that I closed because I didn't have
> time, and because I get the impression that the queryable state feature is
> used very often. Feel free to take it up, if you like.
> https://github.com/apache/flink/pull/6626
> 
>
> -Eron
>
> On Wed, Jul 3, 2019 at 11:21 PM Avi Levi  wrote:
>
>> Hi Yu,
>> Our sink is actually Kafka, so we cannot query it properly; from there
>> we distribute the data to different consumers. We keep information in our
>> state, such as entry time and some accumulated data. This data is not kept
>> elsewhere, so we need to query our state.
>>
>> Best regards
>> Avi
>>
>>
>> On Thu, Jul 4, 2019 at 7:20 AM Yu Li  wrote:
>>
>>> Thanks for the ping Andrey.
>>>
>>> For me the general answer is yes, but TBH it will probably not be added
>>> in the foreseeable future due to lack of committer bandwidth (this applies
>>> not only to QueryableState with TTL but to the whole QueryableState
>>> module), as Aljoscha pointed out in another thread [1].
>>>
>>> Although we have seen emerging requirements and proposals for
>>> QueryableState recently, prioritization is important for every open source
>>> project. Personally, I think it would help if we could gather and clearly
>>> describe more production use cases of QueryableState other than
>>> debugging [2]. @Avi, could you share your case with us, and why
>>> QueryableState is necessary rather than querying the data from the sink?
>>> Thanks.
>>>
>>> [1] https://s.apache.org/MaOl
>>> [2] https://s.apache.org/hJDA
>>>
>>> Best Regards,
>>> Yu
>>>
>>>
>>> On Wed, 3 Jul 2019 at 23:13, Andrey Zagrebin 
>>> wrote:
>>>
>>>> Hi Avi,
>>>>
>>>> It is on the roadmap, but I am not aware of any contributor planning
>>>> to work on it for the next releases.
>>>> I think the community will first work on event time support for
>>>> TTL.
>>>> I will loop Yu in; maybe he has some plans to work on TTL for
>>>> queryable state.
>>>>
>>>> Best,
>>>> Andrey
>>>>
>>>> On Wed, Jul 3, 2019 at 3:17 PM Avi Levi
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> Adding queryable state to state with TTL is not supported in 1.8.0
>>>>> (it throws java.lang.IllegalArgumentException: Queryable state is currently
>>>>> not supported with TTL).
>>>>>
>>>>> I saw in a previous mailing list thread that it is on the roadmap.
>>>>> Is it still on the roadmap?
>>>>>
>>>>> * There is a workaround, which is to use timers to clear the state, but
>>>>> in our case that means firing billions of timers daily, all at the
>>>>> same time, which does not seem very efficient and might cause resource
>>>>> issues.
>>>>>
>>>>> Cheers
>>>>> Avi


Re: Queryable state and TTL

2019-07-06 Thread Eron Wright
Here's a PR for queryable state TLS that I closed because I didn't have
time, and because I get the impression that the queryable state feature is
used very often. Feel free to take it up, if you like.
https://github.com/apache/flink/pull/6626

-Eron

On Wed, Jul 3, 2019 at 11:21 PM Avi Levi  wrote:

> Hi Yu,
> Our sink is actually Kafka, so we cannot query it properly; from there
> we distribute the data to different consumers. We keep information in our
> state, such as entry time and some accumulated data. This data is not kept
> elsewhere, so we need to query our state.
>
> Best regards
> Avi
>
>
> On Thu, Jul 4, 2019 at 7:20 AM Yu Li  wrote:
>
>> Thanks for the ping Andrey.
>>
>> For me the general answer is yes, but TBH it will probably not be added
>> in the foreseeable future due to lack of committer bandwidth (this applies
>> not only to QueryableState with TTL but to the whole QueryableState
>> module), as Aljoscha pointed out in another thread [1].
>>
>> Although we have seen emerging requirements and proposals for
>> QueryableState recently, prioritization is important for every open source
>> project. Personally, I think it would help if we could gather and clearly
>> describe more production use cases of QueryableState other than
>> debugging [2]. @Avi, could you share your case with us, and why
>> QueryableState is necessary rather than querying the data from the sink?
>> Thanks.
>>
>> [1] https://s.apache.org/MaOl
>> [2] https://s.apache.org/hJDA
>>
>> Best Regards,
>> Yu
>>
>>
>> On Wed, 3 Jul 2019 at 23:13, Andrey Zagrebin 
>> wrote:
>>
>>> Hi Avi,
>>>
>>> It is on the roadmap, but I am not aware of any contributor planning
>>> to work on it for the next releases.
>>> I think the community will first work on event time support for
>>> TTL.
>>> I will loop Yu in; maybe he has some plans to work on TTL for
>>> queryable state.
>>>
>>> Best,
>>> Andrey
>>>
>>> On Wed, Jul 3, 2019 at 3:17 PM Avi Levi  wrote:
>>>
>>>> Hi,
>>>> Adding queryable state to state with TTL is not supported in 1.8.0
>>>> (it throws java.lang.IllegalArgumentException: Queryable state is currently
>>>> not supported with TTL).
>>>>
>>>> I saw in a previous mailing list thread that it is on the roadmap.
>>>> Is it still on the roadmap?
>>>>
>>>> * There is a workaround, which is to use timers to clear the state, but
>>>> in our case that means firing billions of timers daily, all at the
>>>> same time, which does not seem very efficient and might cause resource
>>>> issues.
>>>>
>>>> Cheers
>>>> Avi





Flink best configurations for Production

2019-07-06 Thread Cam Mach
Hello Flink community,

I believe the question below has already been asked, but since I couldn't find
an answer on the internet, I'd like to reach out to the community for help.

We basically want to find the best configuration for Flink running on
Kubernetes to achieve the best performance: for example, which parameters
should we tune (number of TaskManagers, number of task slots, parallelism)?

Our use case:
We have terabytes of data in legacy systems and want to stream it to the cloud.
Our pipeline is a streaming one with 2 sources (one from Kinesis and the
other from SQL), one operator (that joins the two sources by key), and a sink.
We would like to enable RocksDB and checkpointing to S3. We're also looking for
the best windowing strategy to apply in this scenario.

We would love to achieve at least 100GB/s, assuming resources are not a
constraint (since we're running Flink on AWS's Kubernetes).

We'd appreciate any help or pointers.

Thanks,
Cam Mach
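
The three parameters asked about above are tied together by simple arithmetic: the cluster offers (number of TaskManagers x slots per TaskManager) task slots, and with Flink's default slot sharing a job needs as many slots as its highest operator parallelism. The sketch below is only a back-of-the-envelope aid, not a tuning recommendation; the class SlotSizing, its method names, and the example numbers in the note after it are hypothetical and not part of Flink.

```java
/** Hypothetical sizing helper (not a Flink API); inputs are placeholders, not recommendations. */
final class SlotSizing {
    private SlotSizing() {}

    /** Total task slots offered by the cluster: TaskManagers times slots per TaskManager. */
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return Math.multiplyExact(taskManagers, slotsPerTaskManager);
    }

    /** Throughput in MB/s that each parallel subtask must sustain to reach the target. */
    static double requiredMbPerSecondPerSubtask(double targetGbPerSecond, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        // 1 GB is taken as 1000 MB here.
        return targetGbPerSecond * 1000.0 / parallelism;
    }
}
```

For instance, under the assumed figures of 50 TaskManagers with 4 slots each, totalSlots(50, 4) gives 200 slots, and reaching the stated 100GB/s at parallelism 200 would require every subtask to sustain 500 MB/s. Whether a single subtask can do that depends on the sources, the join state, and the sink, which has to be measured rather than assumed.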

Cannot write DataSet as csv file

2019-07-06 Thread Soheil Pourbafrani
Hi,

Using the JDBCInputFormat, I loaded a DataSet. When I tried to
save it as a CSV file, it failed with:
java.lang.ClassCastException: org.apache.flink.types.Row cannot be cast to
org.apache.flink.api.java.tuple.Tuple

However, I can save it as a text file. Here is the code:

DataSet dataset = env.createInput(InputFormat);

dataset.writeAsCsv("table_data");

Is it a bug?
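
The exception suggests that the elements are Row objects, while writeAsCsv is documented to work only on DataSets of Tuples. Since writing as text works, one possible workaround is to turn each record into a CSV line and write the result as text. Below is a minimal plain-Java sketch of that formatting step; CsvLine and its methods are hypothetical helpers, not part of Flink, and the quoting rule is a common CSV convention rather than anything Flink prescribes.

```java
import java.util.List;

/** Hypothetical helper (not part of Flink): renders one record's fields as a CSV line. */
final class CsvLine {
    private CsvLine() {}

    /** Quotes a field only if it contains a comma, a double quote, or a line break. */
    static String escape(Object field) {
        if (field == null) {
            return ""; // null fields become empty columns
        }
        String s = field.toString();
        boolean needsQuotes = s.contains(",") || s.contains("\"")
                || s.contains("\n") || s.contains("\r");
        // Embedded double quotes are doubled, then the whole field is wrapped in quotes.
        return needsQuotes ? "\"" + s.replace("\"", "\"\"") + "\"" : s;
    }

    /** Joins the escaped fields with commas, in order. */
    static String toCsvLine(List<?> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }
}
```

In the job itself, the idea would be to map every Row to such a line (collecting its fields with Row.getArity() and Row.getField(int)) and then call writeAsText on the mapped DataSet. Mapping each Row to a Flink Tuple of matching arity before calling writeAsCsv is another option.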