[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-06-24 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347835#comment-15347835
 ] 

Simone Robutti commented on FLINK-1873:
---

I opened a PR for FLINK-3920 with the rest of my work.

> Distributed matrix implementation
> -
>
> Key: FLINK-1873
> URL: https://issues.apache.org/jira/browse/FLINK-1873
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: liaoyuxi
>Assignee: Simone Robutti
>  Labels: ML
>
> It would help to implement machine learning algorithm more quickly and 
> concise if Flink would provide support for storing data and computation in 
> distributed matrix. The design of the implementation is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-06-03 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313938#comment-15313938
 ] 

Simone Robutti commented on FLINK-1873:
---

That would be perfect :)



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-06-03 Thread Chiwan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313915#comment-15313915
 ] 

Chiwan Park commented on FLINK-1873:


I think we don't need to hurry, but I'll review the first PR and merge it in 5 
hours if there are no further problems.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-06-03 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313903#comment-15313903
 ] 

Simone Robutti commented on FLINK-1873:
---

Ok. Anyway, tomorrow will be my first day of holiday and from Wednesday I won't 
have continuous access to the internet for 2 weeks. I hope to get the first PR 
merged before that day so that I can submit the second PR. Otherwise, for 
trivial corrections to the first PR, I will hand over to a colleague of mine 
for the 2 weeks I'm away.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-06-02 Thread Chiwan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313585#comment-15313585
 ] 

Chiwan Park commented on FLINK-1873:


I think we can use this issue (FLINK-1873) as the finalizing issue, including 
the documentation.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-06-01 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309972#comment-15309972
 ] 

Simone Robutti commented on FLINK-1873:
---

I thought I could open a dedicated issue for the documentation. Is it ok?



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-25 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1527#comment-1527
 ] 

Simone Robutti commented on FLINK-1873:
---

I know there's a lot of backlog with all the PRs on the ML part but I would 
like to know if there's a chance to get this PR reviewed in the coming weeks. 
If not it's not a problem (even if I've already developed some stuff 
depending on this contribution and it would be nice to have it included). It's 
just to better schedule my work and avoid taking on more issues if there's 
work to do on these PRs.

Thank you.




[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-18 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288605#comment-15288605
 ] 

Till Rohrmann commented on FLINK-1873:
--

I fear that you have to make a fake commit to trigger a new execution. The 
commits will be squashed anyway when the PR is merged.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-18 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288533#comment-15288533
 ] 

Simone Robutti commented on FLINK-1873:
---

[~till.rohrmann] I have opened two issues and a PR for one of them. Travis 
failed because Apache Nexus was down and I would like to know how to trigger 
the build again without making a fake commit, if that's possible.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-17 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286948#comment-15286948
 ] 

Suneel Marthi commented on FLINK-1873:
--

You are more than welcome to post on dev@mahout and also dev@flink. We are 
subscribed to both lists.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-17 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286945#comment-15286945
 ] 

Simone Robutti commented on FLINK-1873:
---

Sorry Trevor, I was talking about Mahout's mailing list, not Flink's. I 
don't know their etiquette and I don't know if this would be considered a 
nuisance on the dev ML. I will go for it anyway; it looks like a better 
option than the user ML.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-17 Thread Trevor Grant (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286900#comment-15286900
 ] 

Trevor Grant commented on FLINK-1873:
-

It's ok and preferred to ask on dev.  No reason for confidentiality here. 




[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-17 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286888#comment-15286888
 ] 

Simone Robutti commented on FLINK-1873:
---

Is it ok to ask for this on the dev mailing list? Or do you have a contact I 
can engage in private?



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-17 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286725#comment-15286725
 ] 

Suneel Marthi commented on FLINK-1873:
--

I would suggest that you talk this out with the Mahout community; we have faced 
issues with matrix multiplications for DRMs using the Flink DataSet API. Adding 
Andrew Palumbo and Dmitriy Lyubimov to this watch list.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-17 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286562#comment-15286562
 ] 

Simone Robutti commented on FLINK-1873:
---

Yes, and I know your work. I looked at your design before working on this 
contribution. Samsara is clearly a mature option for doing the same things I've 
implemented, but the purpose of Flink-ML, to my knowledge, is to offer 
algorithms out-of-the-box, and so most of its algorithms are reimplementations 
of algorithms already available through other libraries, or that could be made 
available more easily through an integration with other ML engines like H2O. 
Also, in this specific case, these structures and operations are needed to 
implement other algorithms based on matrix operations.
 



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-17 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286532#comment-15286532
 ] 

Suneel Marthi commented on FLINK-1873:
--

FWIW, this seems to be pretty much what's already been done on the Mahout 
project with implementing our DRMs using Flink DataSets. We do support 
row-based and block-based matrices. Feel free to reach out to us.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-17 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286214#comment-15286214
 ] 

Simone Robutti commented on FLINK-1873:
---

Ok, I just need to split the work I did and then I can open it. The work can 
be split in two but not more: I would like to keep conversions between formats 
in their own PR, but at the moment you can't build block-based matrices 
directly in a clean way; you have to convert from a row-distributed matrix. So 
I think I will go for a row-based matrix PR and then the rest.
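
Editorial illustration: the dependency described above (block-based matrices 
being obtained from a row-based one) can be sketched in plain Python. This is 
not the code from the PR and it uses no Flink API; the names rows_to_blocks, 
blocks_to_rows and block_size are hypothetical, and the row-based matrix is 
assumed to be a mapping from row index to {column index: value}.

```python
from collections import defaultdict


def rows_to_blocks(rows, block_size):
    """Regroup {row: {col: value}} into {(block_row, block_col): {(r, c): value}}.

    (r, c) are offsets inside the block. In a distributed setting this would
    amount to re-keying each entry and grouping by the block coordinates.
    """
    blocks = defaultdict(dict)
    for i, cols in rows.items():
        for j, v in cols.items():
            key = (i // block_size, j // block_size)
            blocks[key][(i % block_size, j % block_size)] = v
    return dict(blocks)


def blocks_to_rows(blocks, block_size):
    """Inverse conversion: rebuild the row-indexed form from the blocks."""
    rows = defaultdict(dict)
    for (bi, bj), cells in blocks.items():
        for (r, c), v in cells.items():
            rows[bi * block_size + r][bj * block_size + c] = v
    return dict(rows)


rows = {0: {0: 1.0, 3: 2.0}, 2: {1: 3.0}, 3: {3: 4.0}}
blocks = rows_to_blocks(rows, block_size=2)
```

Converting back with blocks_to_rows(blocks, 2) gives the original rows again, 
which is the round trip a pair of conversions like the ones mentioned above 
would be expected to satisfy.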



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-17 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286201#comment-15286201
 ] 

Till Rohrmann commented on FLINK-1873:
--

Great to hear that you've made good progress with the implementation :-) I 
think the feature set is perfectly fine. We can always extend it when we see 
the need for it. 

I agree with Chiwan that the smaller the PRs the easier they are to review. I 
don't know how much they are interconnected but if it's possible (without too 
much work required) it would be good if we could start with the row-based 
distribution. A neat side effect is that you're gonna get more contributions 
that way :-) So whenever you think the code is in good shape you can simply 
open a PR.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-16 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284278#comment-15284278
 ] 

Simone Robutti commented on FLINK-1873:
---

The tasks are not independent so I would go for sub-tasks if necessary but I 
will adapt to what you consider better. Just let me know. 



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-13 Thread Chiwan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283386#comment-15283386
 ] 

Chiwan Park commented on FLINK-1873:


[~chobeat] [~till.rohrmann] How about splitting this issue into several issues? 
For example, one issue that covers the row-based matrix implementation and 
another that covers the block-based matrix implementation. This approach makes 
reviewing and tracking this issue easier.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-13 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282684#comment-15282684
 ] 

Simone Robutti commented on FLINK-1873:
---

Hello Till,

I worked full time on this issue this week and I almost have a draft for a PR.

I would like to submit it with the following features:

2 matrix formats:
* row-based distribution
* block-based distribution

* conversion from block-based to row-based
* conversion from row-based to block-based

Operations on block-based matrices:
* per-block operations on two matrices:
  * sum
  * sub
  * multiplication

Row-based builders:
* from COO

Row-based collectors:
* local SparseMatrix
* local DenseMatrix
* local Seq of COO entries

There are many basic features that are actually simpler than the ones I already 
implemented and many others that may have a rather high priority (SVD?), but 
before proceeding I would like to receive a review of what is already done to 
stabilize the structures I'm working on. Also, this is my first open source 
contribution, so I would like to receive validation of the technical and 
stylistic aspects to avoid repeating the same errors in the work yet to be done.

If you think there are other core features to consider for this first 
iteration, please let me know. Otherwise I plan to open a PR next week. 
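
Editorial illustration: the per-block operations listed above (sum and 
multiplication on two block-based matrices) can be sketched in plain Python. 
This is only a sketch under assumptions, not the code from the PR and not 
Flink API: a block-based matrix is assumed to be a dict from (block row, 
block column) to a small dense block stored as a list of rows, and the names 
block_add, block_mul, matrix_sum and matrix_mul are hypothetical.

```python
from collections import defaultdict


def block_add(x, y):
    """Entry-wise sum of two small dense blocks (lists of rows)."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def block_mul(x, y):
    """Ordinary matrix product of two small dense blocks."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def matrix_sum(a, b):
    """Sum block by block; both maps must contain the same block keys."""
    assert a.keys() == b.keys()
    return {key: block_add(a[key], b[key]) for key in a}


def matrix_mul(a, b):
    """C[i, j] = sum over k of A[i, k] * B[k, j], computed on whole blocks."""
    b_by_row = defaultdict(list)
    for (k, j), blk in b.items():
        b_by_row[k].append((j, blk))
    out = {}
    for (i, k), a_blk in a.items():
        for j, b_blk in b_by_row[k]:  # join on the shared block index k
            partial = block_mul(a_blk, b_blk)
            if (i, j) in out:
                out[(i, j)] = block_add(out[(i, j)], partial)
            else:
                out[(i, j)] = partial
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
m = {(0, 0): [[1.0, 2.0], [3.0, 4.0]], (0, 1): [[5.0, 6.0], [7.0, 8.0]]}
eye = {(0, 0): identity, (1, 1): identity}  # absent blocks are treated as zero
doubled = matrix_sum(m, m)
same = matrix_mul(m, eye)
```

The sum is a join on identical block keys, while the multiplication joins the 
column index of the left blocks with the row index of the right blocks and 
then adds up the partial products per output block. In a distributed version 
those two steps would presumably become a join and a grouped reduction.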




[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-09 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276091#comment-15276091
 ] 

Till Rohrmann commented on FLINK-1873:
--

I wouldn't like to bake in the dependency on Breeze, because there are also 
other math libraries out there which could be used (e.g. BIDMat). Ideally these 
things are swappable so that the user can choose which math backend suits his 
needs the best. For the beginning we've only included Breeze but adding support 
for different math libraries should not be a problem.

To make a long story short, I think it would be best if we could implement the 
matrix representation math-backend agnostic. Whenever we have to perform a math 
operation, we can then choose one math backend (right now there is only one 
choice) for it.
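
Editorial illustration: one way to read "math-backend agnostic" is that the 
matrix type only stores data, and every operation receives the backend to use 
as an argument. The sketch below is plain Python under that assumption; the 
names MathBackend, PurePythonBackend and LocalMatrix are hypothetical and do 
not correspond to Flink or Breeze classes.

```python
from abc import ABC, abstractmethod


class MathBackend(ABC):
    """Hypothetical interface: the only place that knows about a math library."""

    @abstractmethod
    def multiply(self, a, b):
        """Return the product of two dense matrices given as lists of rows."""


class PurePythonBackend(MathBackend):
    """Stand-in backend; a real one would delegate to a numerical library."""

    def multiply(self, a, b):
        assert len(a[0]) == len(b), "inner dimensions must match"
        cols_of_b = list(zip(*b))
        return [[sum(x * y for x, y in zip(row, col)) for col in cols_of_b]
                for row in a]


class LocalMatrix:
    """Backend-agnostic representation: it stores data and nothing else."""

    def __init__(self, rows):
        self.rows = rows

    def times(self, other, backend):
        # The backend is chosen at the call site, not baked into the type.
        return LocalMatrix(backend.multiply(self.rows, other.rows))


backend = PurePythonBackend()
product = LocalMatrix([[1.0, 2.0], [3.0, 4.0]]).times(
    LocalMatrix([[0.0, 1.0], [1.0, 0.0]]), backend)
```

Swapping the math library then means providing another MathBackend subclass, 
with no change to the matrix representation itself.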



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-06 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274265#comment-15274265
 ] 

Simone Robutti commented on FLINK-1873:
---

Umh ok. I think that on block-partitioned matrices I will need to perform 
block-wise operations so I think it makes sense to represent the blocks as 
Breeze matrices. 

Anyway, talking about Flink's implementations, why were they implemented in the 
first place if we must rely on Breeze for operations?



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-06 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274229#comment-15274229
 ] 

Till Rohrmann commented on FLINK-1873:
--

I think it's good to use Flink's matrix and vector representations as long as 
you don't have to perform operations. For that you can convert Flink's 
primitives into Breeze's primitives. The conversion should almost come for 
free, because Flink uses the same underlying representation of dense and sparse 
matrices/vectors as Breeze. 
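
Editorial illustration: the reason such a conversion can be nearly free is 
that two types with the same underlying layout can share one backing array 
instead of copying it. The plain-Python sketch below assumes a flat, 
column-major list as that shared layout; StorageMatrix, MathMatrix, as_math 
and as_storage are hypothetical names, not the Flink or Breeze classes.

```python
class StorageMatrix:
    """Hypothetical storage-only matrix: shape plus a flat column-major list."""

    def __init__(self, num_rows, num_cols, data):
        assert len(data) == num_rows * num_cols
        self.num_rows, self.num_cols, self.data = num_rows, num_cols, data


class MathMatrix:
    """Hypothetical math-backend matrix with the same layout, plus operations."""

    def __init__(self, num_rows, num_cols, data):
        self.num_rows, self.num_cols, self.data = num_rows, num_cols, data

    def __add__(self, other):
        assert (self.num_rows, self.num_cols) == (other.num_rows, other.num_cols)
        summed = [a + b for a, b in zip(self.data, other.data)]
        return MathMatrix(self.num_rows, self.num_cols, summed)


def as_math(m):
    # No copy: the wrapper reuses the very same backing list.
    return MathMatrix(m.num_rows, m.num_cols, m.data)


def as_storage(m):
    # Same idea in the other direction.
    return StorageMatrix(m.num_rows, m.num_cols, m.data)


a = StorageMatrix(2, 2, [1.0, 2.0, 3.0, 4.0])
b = StorageMatrix(2, 2, [10.0, 20.0, 30.0, 40.0])
total = as_storage(as_math(a) + as_math(b))
```

Only the addition allocates a new list; the two conversions just re-wrap 
existing data, which is the property the comment above relies on.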



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-06 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274177#comment-15274177
 ] 

Simone Robutti commented on FLINK-1873:
---

I began working right away on this issue. 

For now I'm focusing on an indexed row matrix format but I will probably 
implement a partitioned format with the same operations to perform some 
operations in a more straightforward way. I will write conversions from one 
format to the other.

For now I'm just initializing the distributed data structure and writing 
conversions to local formats (COO, Sparse, Dense). I'm doing everything 
according to the conventions of the local linear algebra package (indices as 
Int, values as Double, same method names and so on). Also I'm working with 
Flink's implementations of all these classes. Is it ok or should I go directly 
to Breeze's implementations?

Then I will start thinking about common operations (multiplication, dot 
product, SVD (?), A^T A and so on).
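
Editorial illustration: the formats mentioned above can be sketched in plain 
Python. The sketch assumes that COO means a list of (row, column, value) 
triples and that an indexed row matrix maps each row index to its non-zero 
columns; rows_from_coo, to_coo and to_dense are hypothetical names, and 
nothing here is Flink or Breeze API.

```python
from collections import defaultdict


def rows_from_coo(entries):
    """Group (row, col, value) triples into {row_index: {col_index: value}}."""
    rows = defaultdict(dict)
    for i, j, v in entries:
        rows[i][j] = float(v)
    return dict(rows)


def to_coo(rows):
    """Flatten the row-indexed form back into sorted (row, col, value) triples."""
    return sorted((i, j, v) for i, cols in rows.items() for j, v in cols.items())


def to_dense(rows, num_rows, num_cols):
    """Materialize a local dense matrix as a zero-filled list of lists."""
    dense = [[0.0] * num_cols for _ in range(num_rows)]
    for i, cols in rows.items():
        for j, v in cols.items():
            dense[i][j] = v
    return dense


coo = [(0, 0, 1.0), (0, 2, 2.0), (2, 1, 3.0)]
rows = rows_from_coo(coo)
dense = to_dense(rows, 3, 3)
```

Rows without non-zero entries (row 1 here) simply do not appear in the 
row-indexed form, so the dense conversion needs the matrix dimensions as 
explicit arguments.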



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-06 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273867#comment-15273867
 ] 

Fabian Hueske commented on FLINK-1873:
--

[~chobeat], I gave you contributor permissions for JIRA as well. You can now 
assign issues to yourself. 



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-06 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273854#comment-15273854
 ] 

Simone Robutti commented on FLINK-1873:
---

Sure. I will have to study a bit to do it properly but I was already going to 
do something like that for an algorithm I'm implementing (MinHash).

> Distributed matrix implementation
> -
>
> Key: FLINK-1873
> URL: https://issues.apache.org/jira/browse/FLINK-1873
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: liaoyuxi
>Assignee: liaoyuxi
>  Labels: ML
>
> It would help to implement machine learning algorithm more quickly and 
> concise if Flink would provide support for storing data and computation in 
> distributed matrix. The design of the implementation is attached.





[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-06 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273826#comment-15273826
 ] 

Fabian Hueske commented on FLINK-1873:
--

I agree, looks like the issue is abandoned. 
[~chobeat], if you want to work on this issue, I can assign it to you.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-06 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273770#comment-15273770
 ] 

Simone Robutti commented on FLINK-1873:
---

This issue has been dead for one year. What about reassigning it? I think it 
would help implement many algorithms, and right now people need to implement 
their own distributed matrix operations every time.



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2015-04-17 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499594#comment-14499594
 ] 

Till Rohrmann commented on FLINK-1873:
--

+1 for distributed matrices :-)



[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2015-04-13 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492009#comment-14492009
 ] 

Robert Metzger commented on FLINK-1873:
---

I gave you Contributor permission in our JIRA. You should now be able to 
attach files.
I assigned the issue to you.

 Distributed matrix implementation
 -

 Key: FLINK-1873
 URL: https://issues.apache.org/jira/browse/FLINK-1873
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: liaoyuxi

 It would help to implement machine learning algorithm more quickly and 
 concise if Flink would provide support for storing data and computation in 
 distributed matrix. The design of the implementation is attached.


