[jira] [Created] (FLINK-5873) KeyedStream: expose a user-defined function for key group assignment

2017-02-21 Thread Ovidiu Marcu (JIRA)
Ovidiu Marcu created FLINK-5873:
---

 Summary: KeyedStream: expose a user-defined function for key 
group assignment
 Key: FLINK-5873
 URL: https://issues.apache.org/jira/browse/FLINK-5873
 Project: Flink
  Issue Type: New Feature
  Components: DataStream API
Reporter: Ovidiu Marcu
Priority: Minor


http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/KeyGroupRangeAssignment-td16041.html
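
For context on what this request would expose: Flink's built-in key-group assignment maps each key to a key group in [0, maxParallelism), and each key group to a parallel operator instance. The sketch below is a simplified Python illustration of that two-step mapping, not Flink's actual code (Flink applies a murmur hash to the key's hashCode; plain `hash()` here is a stand-in):

```python
def assign_to_key_group(key, max_parallelism):
    # Map a key into one of max_parallelism key groups.
    # Simplified: Flink murmur-hashes the key's hashCode instead of hash().
    return hash(key) % max_parallelism

def key_group_to_operator_index(key_group, max_parallelism, parallelism):
    # Each parallel operator instance owns a contiguous range of key groups,
    # which is what makes state rescaling possible.
    return key_group * parallelism // max_parallelism
```

A user-defined assignment function, as requested here, would replace the first step while keeping the contiguous key-group ranges that state rescaling relies on.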



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-4815) Automatic fallback to earlier checkpoints when checkpoint restore fails

2016-10-13 Thread Ovidiu Marcu (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571398#comment-15571398 ]

Ovidiu Marcu commented on FLINK-4815:
-

Hi,
I am currently working on a very similar problem.
I am interested in any documentation describing what is currently done in the framework.
Will further discussion of these issues happen on the dev/user lists? I would like to join. Thanks.
Ovidiu

> Automatic fallback to earlier checkpoints when checkpoint restore fails
> ---
>
> Key: FLINK-4815
> URL: https://issues.apache.org/jira/browse/FLINK-4815
> Project: Flink
>  Issue Type: New Feature
>  Components: State Backends, Checkpointing
>Reporter: Stephan Ewen
>
> Flink should keep multiple completed checkpoints.
> When the restore of one completed checkpoint fails for a certain number of 
> times, the CheckpointCoordinator should fall back to an earlier checkpoint to 
> restore.
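
A hypothetical sketch of the behavior described above (illustrative only, not the actual CheckpointCoordinator logic): try the newest retained checkpoint first, and after a bounded number of failed restore attempts fall back to the next older one.

```python
def restore_with_fallback(completed_checkpoints, try_restore, max_attempts=3):
    # completed_checkpoints: retained checkpoint ids, in any order.
    # try_restore(cp_id) -> bool: attempts a restore, True on success.
    for cp_id in sorted(completed_checkpoints, reverse=True):
        for _attempt in range(max_attempts):
            if try_restore(cp_id):
                return cp_id  # restored from this checkpoint
    raise RuntimeError("all retained checkpoints failed to restore")
```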



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1404) Add support to cache intermediate results

2016-04-09 Thread Ovidiu Marcu (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233507#comment-15233507 ]

Ovidiu Marcu commented on FLINK-1404:
-

Thanks. I am trying to understand the engine, including its runtime, but I don't 
yet see what abstractions you have arrived at. Runtime intermediate results are 
described in many different ways in Flink, but they also rely on network buffers.


> Add support to cache intermediate results
> -
>
> Key: FLINK-1404
> URL: https://issues.apache.org/jira/browse/FLINK-1404
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Runtime
>Reporter: Ufuk Celebi
>Assignee: Maximilian Michels
>
> With blocking intermediate results (FLINK-1350) and proper partition state 
> management (FLINK-1359) it is necessary to allow the network buffer pool to 
> request eviction of historic intermediate results when not enough buffers are 
> available. With the currently available pipelined intermediate partitions 
> this is not an issue, because buffer pools can be released as soon as a 
> partition is consumed.
> We need to be able to trigger the recycling of buffers held by historic 
> intermediate results when not enough buffers are available for new local 
> pools.
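
As a toy illustration of that requirement (hypothetical names, not Flink's actual NetworkBufferPool API), a pool could recycle buffers held by historic results on demand:

```python
class ToyBufferPool:
    def __init__(self, available):
        self.available = available      # free network buffers
        self.historic_results = []      # already-consumed blocking results

    def register_historic(self, result_id, buffers_held):
        self.historic_results.append((result_id, buffers_held))

    def request(self, needed):
        # Evict oldest historic results until the request can be served.
        while self.available < needed and self.historic_results:
            result_id, held = self.historic_results.pop(0)
            self.available += held      # buffers recycled back into the pool
        if self.available < needed:
            raise MemoryError("not enough network buffers")
        self.available -= needed
        return needed
```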





[jira] [Commented] (FLINK-1404) Add support to cache intermediate results

2016-04-08 Thread Ovidiu Marcu (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232639#comment-15232639 ]

Ovidiu Marcu commented on FLINK-1404:
-

Hi, is this related to the 'persistent, blocking, no-back-pressure intermediate 
result partition' variant specified in https://github.com/apache/flink/pull/254?






[jira] [Commented] (FLINK-2099) Add a SQL API

2015-12-09 Thread Ovidiu Marcu (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048601#comment-15048601 ]

Ovidiu Marcu commented on FLINK-2099:
-

Thanks, I will look into FLINK-3152.
Could you please create an umbrella task linking all currently open SQL JIRA issues?
I am also investigating ways to use the TPC-DS benchmark; this will probably 
surface other missing features.

Regarding the Table API, these are the only docs I can find: 
https://ci.apache.org/projects/flink/flink-docs-release-0.10/libs/table.html. 
Maybe there is something else I am missing that you can point out?

Thanks.

> Add a SQL API
> -
>
> Key: FLINK-2099
> URL: https://issues.apache.org/jira/browse/FLINK-2099
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> From the mailing list:
> Fabian: Flink's Table API is pretty close to what SQL provides. IMO, the best
> approach would be to leverage that and build a SQL parser (maybe together
> with a logical optimizer) on top of the Table API. Parser (and optimizer)
> could be built using Apache Calcite which is providing exactly this.
> Since the Table API is still a fairly new component and not very feature
> rich, it might make sense to extend and strengthen it before putting
> something major on top.
> Ted: It would also be relatively simple (I think) to retarget drill to Flink 
> if Flink doesn't provide enough typing meta-data to do traditional SQL.
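
The layering Fabian describes, a SQL parser that produces Table API operations, can be sketched with a toy translator. This is purely illustrative; a real implementation would use Apache Calcite to parse and optimize a full logical plan:

```python
import re

def sql_to_table_ops(sql):
    # Naive parse of "SELECT <cols> FROM <table> [WHERE <predicate>]" into a
    # sequence of table-API-like operations. Illustrative only.
    m = re.match(r"SELECT\s+(.+?)\s+FROM\s+(\w+)(?:\s+WHERE\s+(.+))?\s*$",
                 sql.strip(), re.IGNORECASE)
    if not m:
        raise ValueError("unsupported query: " + sql)
    cols, table, predicate = m.groups()
    ops = [("scan", table)]
    if predicate:
        ops.append(("filter", predicate))
    ops.append(("select", tuple(c.strip() for c in cols.split(","))))
    return ops
```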





[jira] [Commented] (FLINK-2099) Add a SQL API

2015-12-08 Thread Ovidiu Marcu (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046707#comment-15046707 ]

Ovidiu Marcu commented on FLINK-2099:
-

Hi Timo,
Thank you for your detailed response.
For the benchmark you want to use, I am not sure TPC-H is the best one to 
follow, as the newer TPC-DS benchmark (next-generation decision support) was 
developed to address the deficiencies of TPC-H (reference: 
http://www.tpc.org/tpcds/presentations/The_Making_of_TPCDS.pdf).
I would like to help and contribute; I will look into your branch and get back 
to you.

I think this is one of the most important features of Flink, and I don't know 
why the Flink developers don't want to raise its priority and give it more attention.

Best regards,
Ovidiu






[jira] [Commented] (FLINK-2099) Add a SQL API

2015-12-07 Thread Ovidiu Marcu (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045170#comment-15045170 ]

Ovidiu Marcu commented on FLINK-2099:
-

Hi Timo,
I would be interested in the SQL prototype. 
What are your objectives for this presentation? Will it be available soon?
This is a very important feature; can you give more details about it?
Thank you!






[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-12-06 Thread Ovidiu Marcu (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044136#comment-15044136 ]

Ovidiu Marcu commented on FLINK-1731:
-

Will you consider this issue for the next release?

> Add kMeans clustering algorithm to machine learning library
> ---
>
> Key: FLINK-1731
> URL: https://issues.apache.org/jira/browse/FLINK-1731
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Peter Schrott
>  Labels: ML
>
> The Flink repository already contains a kMeans implementation but it is not 
> yet ported to the machine learning library. I assume that only the used data 
> types have to be adapted and then it can be more or less directly moved to 
> flink-ml.
> The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better 
> implementation because they improve the initial seeding phase to achieve 
> near-optimal clustering. It might be worthwhile to implement kMeans||.
> Resources:
> [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
> [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf
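
For reference, the k-means++ seeding from [1] can be sketched in a few lines (one-dimensional points for brevity; any flink-ml implementation would of course differ):

```python
import random

def kmeans_pp_seed(points, k, rng=None):
    # k-means++ seeding: pick the first center uniformly at random, then pick
    # each following center with probability proportional to its squared
    # distance to the nearest already-chosen center.
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            if w == 0:
                continue  # never re-pick an existing center
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers
```

kMeans|| [2] parallelizes this idea by oversampling candidate centers in a small number of rounds instead of making one pass per center.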


