[jira] [Commented] (SPARK-23266) Matrix Inversion on BlockMatrix

2018-05-31 Thread Chandan Misra (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496306#comment-16496306
 ] 

Chandan Misra commented on SPARK-23266:
---

I would like to contribute this feature to one of the upcoming Spark releases. 
Kindly let me know how this can be done.

> Matrix Inversion on BlockMatrix
> ---
>
> Key: SPARK-23266
> URL: https://issues.apache.org/jira/browse/SPARK-23266
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Affects Versions: 2.2.1
>Reporter: Chandan Misra
>Priority: Minor
>
> Matrix inversion is the basic building block for many other algorithms like 
> regression, classification, geostatistical analysis using ordinary kriging 
> etc. A simple Spark BlockMatrix based efficient distributed 
> divide-and-conquer algorithm can be implemented using only *6* 
> multiplications in each recursion level of the algorithm. The reference paper 
> can be found in
> [https://arxiv.org/abs/1801.04723]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23266) Matrix Inversion on BlockMatrix

2018-03-06 Thread Chandan Misra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389061#comment-16389061
 ] 

Chandan Misra commented on SPARK-23266:
---

I have implemented matrix inversion using Spark version 2.2.0, though the 
implementation runs on Spark version 2.0.0 onwards. It would be really 
helpful if the inversion were added in the next Spark version. As already 
mentioned, I have the implementation ready and am happy to contribute it.




[jira] [Commented] (SPARK-23266) Matrix Inversion on BlockMatrix

2018-02-01 Thread Chandan Misra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348310#comment-16348310
 ] 

Chandan Misra commented on SPARK-23266:
---

*How big is n typically for your use case?*
To give a sense of how large the datasets used in Kriging can be, the following 
paper might interest you:
 [http://www.tandfonline.com/doi/full/10.1080/2150704X.2016.1275053]
The number of points there is 650 million and the data size is 18 GB. I think 
inverting the variance-covariance matrix C is impossible if it has to be 
processed locally.

*I'm also not clear how common this operation is?*

Kriging is used extensively in many fields, such as earth science, mining, 
weather prediction, wireless sensor networks, and remote sensing applications 
like filling gaps in satellite raster images and creating Digital Elevation 
Models from LiDAR data, to name a few, and it is backed by a large number of 
research papers. There are R packages implemented solely for Kriging, such as 
gstat and geoR. But these are limited to a single node and fail when a large 
dataset is fed to them.

Additionally, there is ongoing research (like 
[this|https://www.spiedigitallibrary.org/journals/Journal-of-Applied-Remote-Sensing/volume-11/issue-1/016011/High-performance-parallel-approaches-for-three-dimensional-light-detection-and/10.1117/1.JRS.11.016011.short?SSO=1])
 on parallelizing Kriging with MPI, Hadoop and GPUs. One of the teams is 
[GIST at Oak Ridge National 
Laboratory|http://web.ornl.gov/sci/gist/res_high_performance.shtml], which 
performs geo-computation in an HPC setup. I think Spark, with its benefits, can 
easily substitute for the others in this regard. Thus, as a core processing 
component of Kriging, matrix inversion is highly relevant, and a Spark 
implementation would provide a hassle-free solution for a large fraction of 
researchers outside computer science.




[jira] [Comment Edited] (SPARK-23266) Matrix Inversion on BlockMatrix

2018-01-31 Thread Chandan Misra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346398#comment-16346398
 ] 

Chandan Misra edited comment on SPARK-23266 at 1/31/18 8:05 AM:


[Kriging|https://en.wikipedia.org/wiki/Kriging] is a geostatistical method that 
interpolates the known attribute values of scattered data points to predict 
unknown attribute values at other points. It is based on the kriging weight 
calculation, which has an equation of the form Cw=D. To get the weight vector 
w, we need one matrix inversion and one matrix-vector multiplication. The sizes 
of C (the covariance matrix), the vector w and the vector D are nxn, nx1 and 
nx1 respectively, where n is the number of interpolating points (input points) 
and the interpolation is done for a single output point.

When the output point changes, only D changes (it is a vector for a single 
output point and a matrix for several), so the inverse does not have to be 
computed again. Each additional output point then needs only a matrix-vector 
multiplication, which is cheap, i.e. O(n^2), and easy to distribute over 
several nodes.
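
The reuse of the inverse described above can be sketched on a single machine 
as follows. This is only an illustration: the 3x3 matrix C and the D vectors 
are made-up values, the helper names invert and mat_vec are hypothetical, and 
the extra constraint row used in ordinary kriging is left out for brevity.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    # Build the augmented matrix [C | I].
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


def mat_vec(m, v):
    """Matrix-vector product, O(n^2)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical covariance matrix for n = 3 input points.
C = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 1.0],
     [0.5, 1.0, 2.0]]

# The expensive step is done once.
C_inv = invert(C)

# One D vector per output point (made-up values).
D_vectors = [[1.0, 0.5, 0.2],
             [0.3, 0.9, 0.4]]

# Each new output point costs only one matrix-vector product.
weights = [mat_vec(C_inv, D) for D in D_vectors]
```

Multiplying C by each resulting weight vector should give back the 
corresponding D up to rounding error, which is a quick way to check the result.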







[jira] [Commented] (SPARK-23266) Matrix Inversion on BlockMatrix

2018-01-31 Thread Chandan Misra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346398#comment-16346398
 ] 

Chandan Misra commented on SPARK-23266:
---

Kriging is a geostatistical method that interpolates the known attribute values 
of scattered data points to predict unknown attribute values at other points. 
It is based on the kriging weight calculation, which has an equation of the 
form Cw=D. To get the weight vector w, we need one matrix inversion and one 
matrix-vector multiplication. The sizes of C (the covariance matrix), the 
vector w and the vector D are nxn, nx1 and nx1 respectively, where n is the 
number of interpolating points (input points) and the interpolation is done for 
a single output point.

When the output point changes, only D changes (it is a vector for a single 
output point and a matrix for several), so the inverse does not have to be 
computed again. Each additional output point then needs only a matrix-vector 
multiplication, which is cheap, i.e. O(n^2), and easy to distribute over 
several nodes.





[jira] [Commented] (SPARK-23266) Matrix Inversion on BlockMatrix

2018-01-30 Thread Chandan Misra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345083#comment-16345083
 ] 

Chandan Misra commented on SPARK-23266:
---

I am one of the authors of the above-mentioned paper. I would like to 
contribute and help in this regard.

> Matrix Inversion on BlockMatrix
> ---
>
> Key: SPARK-23266
> URL: https://issues.apache.org/jira/browse/SPARK-23266
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Affects Versions: 2.2.1
>Reporter: Chandan Misra
>Priority: Critical
>
> Matrix inversion is the basic building block for many other algorithms like 
> regression, classification, geostatistical analysis using ordinary kriging 
> etc. A simple Spark BlockMatrix based efficient distributed 
> divide-and-conquer algorithm can be implemented using only *6* 
> multiplications in each recursion level of the algorithm. The reference paper 
> can be found in
> [https://arxiv.org/abs/1801.04723]






[jira] [Created] (SPARK-23266) Matrix Inversion on BlockMatrix

2018-01-30 Thread Chandan Misra (JIRA)
Chandan Misra created SPARK-23266:
-

 Summary: Matrix Inversion on BlockMatrix
 Key: SPARK-23266
 URL: https://issues.apache.org/jira/browse/SPARK-23266
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 2.2.1
Reporter: Chandan Misra


Matrix inversion is a basic building block for many other algorithms, such as 
regression, classification and geostatistical analysis using ordinary kriging. 
An efficient distributed divide-and-conquer algorithm based on Spark's 
BlockMatrix can be implemented using only *6* multiplications at each recursion 
level of the algorithm. The reference paper can be found at

[https://arxiv.org/abs/1801.04723]
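
To illustrate how six multiplications per recursion level can be enough, here 
is a minimal single-machine sketch of a Strassen-style recursive block 
inversion. It is an assumption that this is the scheme the description refers 
to, so please consult the paper for the exact algorithm. The sketch uses plain 
Python lists instead of BlockMatrix, the helper names are hypothetical, and it 
does no pivoting, so it assumes that the top-left block and the Schur 
complement are invertible at every level (true, for example, for symmetric 
positive definite matrices).

```python
def mul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def sub(a, b):
    """Element-wise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def neg(a):
    """Element-wise negation."""
    return [[-x for x in row] for row in a]


def block_inverse(a):
    """Recursive 2x2 block inversion: 2 recursive inversions, 6 multiplications."""
    n = len(a)
    if n == 1:
        if a[0][0] == 0:
            raise ZeroDivisionError("singular 1x1 block (no pivoting in this sketch)")
        return [[1.0 / a[0][0]]]

    # Split into four blocks A11, A12, A21, A22.
    h = n // 2
    a11 = [row[:h] for row in a[:h]]
    a12 = [row[h:] for row in a[:h]]
    a21 = [row[:h] for row in a[h:]]
    a22 = [row[h:] for row in a[h:]]

    i = block_inverse(a11)   # inversion 1: A11^-1
    ii = mul(a21, i)         # multiplication 1: A21 * A11^-1
    iii = mul(i, a12)        # multiplication 2: A11^-1 * A12
    iv = mul(a21, iii)       # multiplication 3: A21 * A11^-1 * A12
    v = sub(iv, a22)         # negated Schur complement of A11
    vi = block_inverse(v)    # inversion 2
    c12 = mul(iii, vi)       # multiplication 4: top-right block of the inverse
    c21 = mul(vi, ii)        # multiplication 5: bottom-left block of the inverse
    vii = mul(iii, c21)      # multiplication 6
    c11 = sub(i, vii)        # top-left block of the inverse
    c22 = neg(vi)            # bottom-right block of the inverse

    # Stitch the four result blocks back together.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom


# Made-up symmetric, diagonally dominant test matrix.
A = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 5.0, 1.0, 0.0],
     [0.0, 1.0, 6.0, 1.0],
     [1.0, 0.0, 1.0, 7.0]]
A_inv = block_inverse(A)
```

Multiplying A by A_inv should give the identity matrix up to rounding error. 
In a distributed setting the block products and differences would presumably 
map onto BlockMatrix operations such as multiply and subtract, but that 
mapping is not shown here.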


