[jira] [Commented] (SPARK-3278) Isotonic regression

2015-03-23 Thread Xiangrui Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377205#comment-14377205
 ] 

Xiangrui Meng commented on SPARK-3278:
--

Did you try truncating the digits of x to reduce the number of possible 
buckets? If the loss of precision is not super important, this could help 
scalability.
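
A minimal sketch of the truncation idea, assuming the input is a list of (x, y) pairs and that rounding x to a fixed number of decimals is acceptable. The function name `bucket_by_rounding` and the `digits` parameter are hypothetical, chosen only for illustration. Points that share a rounded x are merged into one weighted point, so the regression sees at most 10^digits distinct x values in [0, 1].

```python
from collections import defaultdict


def bucket_by_rounding(points, digits=3):
    """Merge (x, y) points whose x values agree after rounding.

    Returns a list of (x_rounded, mean_y, weight) tuples sorted by x,
    where weight is the number of original points in the bucket.
    """
    sums = defaultdict(lambda: [0.0, 0])  # x_rounded -> [sum of y, count]
    for x, y in points:
        key = round(x, digits)
        sums[key][0] += y
        sums[key][1] += 1
    return [(x, total / count, float(count))
            for x, (total, count) in sorted(sums.items())]


if __name__ == "__main__":
    data = [(0.1234, 1.0), (0.1231, 0.0), (0.5678, 1.0)]
    for row in bucket_by_rounding(data, digits=3):
        print(row)
```

Because each bucket keeps its mean label and its count as a weight, a weighted least-squares fit on the buckets minimizes the same objective as a fit on the original points with x rounded; only the precision of x is lost.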

 Isotonic regression
 ---

 Key: SPARK-3278
 URL: https://issues.apache.org/jira/browse/SPARK-3278
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Xiangrui Meng
Assignee: Martin Zapletal
 Fix For: 1.3.0


 Add isotonic regression for score calibration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-3278) Isotonic regression

2015-03-16 Thread Martin Zapletal (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362986#comment-14362986
]

Martin Zapletal commented on SPARK-3278:

Vladimir,

just to update you on progress: I was able to complete the isotonic
regression with 100M records, but it failed with an insufficient memory error
at 150M records on my machine. You may be able to run with larger amounts of
data on better machines.

The pool adjacent violators algorithm can theoretically run in linear time,
but although I have used the best algorithm I could find, I am not convinced
it reaches this efficiency. I will work on providing evidence.
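
For reference, a minimal stack-based sketch of pool adjacent violators, assuming the labels are already sorted by x and all weights are positive. This is an illustration only, not the implementation discussed in this ticket; the name `pava` is hypothetical. Each point is pushed once and every merge removes one block, so the total number of merges is bounded by the number of points.

```python
def pava(y, w=None):
    """Weighted least-squares non-decreasing fit to y (ordered by x)."""
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand the pooled blocks back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


if __name__ == "__main__":
    print(pava([1.0, 3.0, 2.0, 4.0]))
```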

The biggest issue with the current algorithm, however, is the parallelization
approach. Its properties are unfortunately nowhere near linear scalability
(solution time decreasing linearly as parallelism increases, or solution time
staying constant as parallelism and problem size increase linearly together).
This was expected and is caused by the algorithm itself, for the following
reasons:

1) The algorithm works in two steps. First the computation is distributed to
all partitions, but then the results are gathered and the algorithm is run
again on the whole data set. This approach may leave most of the work for the
last sequential step, and thus gain very little compared to a purely sequential
implementation, or even perform worse. That can happen when the parallel
isotonic regressions return locally optimal solutions that then have to change
to reach a global solution in the last step. Another performance drawback in
comparison to sequential processing is the potential need to copy data to each
process.
2) It requires the whole data set to fit into one process's memory in the last
step (or repeated disk access).
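
The two-step scheme in point 1 can be sketched as follows. This is a simplified single-machine simulation under stated assumptions: partitions are plain list slices of the sorted data, all weights start at 1, and the names `pool` and `two_step_fit` are hypothetical. It is not the Spark code itself.

```python
def pool(blocks):
    """Pool adjacent violators over (x_lo, x_hi, mean, weight) blocks ordered by x."""
    out = []
    for block in blocks:
        out.append(block)
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(out) > 1 and out[-2][2] > out[-1][2]:
            lo2, hi2, m2, w2 = out.pop()
            lo1, hi1, m1, w1 = out.pop()
            w = w1 + w2
            out.append((lo1, hi2, (m1 * w1 + m2 * w2) / w, w))
    return out


def two_step_fit(points, num_partitions):
    """points: (x, y) pairs. Fit each partition, then refit the gathered blocks."""
    ordered = sorted(points)
    size = max(1, -(-len(ordered) // num_partitions))  # ceiling division
    partitions = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    # Step 1: independent fits, one per range partition (the parallel part).
    local = [pool([(x, x, y, 1.0) for x, y in part]) for part in partitions]
    # Step 2: gather all blocks in one place and rerun sequentially.
    gathered = [b for blocks in local for b in blocks]
    return pool(gathered)


if __name__ == "__main__":
    data = [(1, 1.0), (2, 3.0), (3, 2.0), (4, 0.0)]
    print(two_step_fit(data, 2))
```

With this input and two partitions, the first partition is already monotone and returns both of its points unchanged, yet the second step still has to merge one of them with the block from the second partition. The amount of work left for the sequential step depends on how much the per-partition fits manage to compress the data.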

I started looking into the issue and was able to design an iterative algorithm
that addressed both of the above issues and scaled very close to linearly. It
still has correctness (rounding) issues, however, and will require further
research.

Let me know if that helped. In the meantime I will continue working on
benchmarks and performance quantification of the current algorithm as well as
on research for potentially more efficient solutions.

[jira] [Commented] (SPARK-3278) Isotonic regression

2015-03-10 Thread Vladimir Vladimirov (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355119#comment-14355119
 ] 

Vladimir Vladimirov commented on SPARK-3278:


Martin,
This would be really nice.
Is it possible to run isotonic regression on data I'll provide (~150-200M 
records)?
The answers I'm looking for: how long it would take to train the model on this 
data set, how many resources it would take on a cluster, and confirmation that 
it won't blow up Spark.

I'll export the values in the format float1, float2 per line, similar to what 
is described in the doc 
http://people.apache.org/~pwendell/spark-1.3.0-rc1-docs/mllib-isotonic-regression.html
Is that OK? Here float1 is between 0 and 1, and float2 is 0 or 1.
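
A small sketch of reading the proposed export format, assuming one "float1, float2" pair per line with float1 as the score and float2 as the binary label. The function name `parse_line` is hypothetical, and the (label, feature, weight) output order with a unit weight is an assumption about what the training input expects; check it against the linked docs.

```python
def parse_line(line):
    """Parse one 'score, label' line into a (label, feature, weight) tuple."""
    score_text, label_text = line.split(",")
    score = float(score_text)  # float() ignores surrounding whitespace
    label = float(label_text)
    if not 0.0 <= score <= 1.0:
        raise ValueError("score out of [0, 1]: %r" % line)
    if label not in (0.0, 1.0):
        raise ValueError("label must be 0 or 1: %r" % line)
    return (label, score, 1.0)  # unit weight is an assumption


if __name__ == "__main__":
    print(parse_line("0.75, 1"))
```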

[jira] [Commented] (SPARK-3278) Isotonic regression

2015-03-09 Thread Vladimir Vladimirov (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353252#comment-14353252
 ] 

Vladimir Vladimirov commented on SPARK-3278:


Has anyone benchmarked the performance of Spark's isotonic regression 
implementation on big datasets (100M, 1000M)?

[jira] [Commented] (SPARK-3278) Isotonic regression

2015-03-09 Thread Martin Zapletal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353549#comment-14353549
 ] 

Martin Zapletal commented on SPARK-3278:


What particular benchmarks would you like to see? I can do them.

[jira] [Commented] (SPARK-3278) Isotonic regression

2015-03-09 Thread Xiangrui Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353419#comment-14353419
 ] 

Xiangrui Meng commented on SPARK-3278:
--

I don't know of any. It really depends on how many buckets it outputs. I can 
imagine problems with 100M buckets.

[jira] [Commented] (SPARK-3278) Isotonic regression

2014-11-30 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229271#comment-14229271
 ] 

Apache Spark commented on SPARK-3278:
-

User 'zapletal-martin' has created a pull request for this issue:
https://github.com/apache/spark/pull/3519

[jira] [Commented] (SPARK-3278) Isotonic regression

2014-10-23 Thread Martin Zapletal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182064#comment-14182064
 ] 

Martin Zapletal commented on SPARK-3278:


I am interested in working on this ticket. Can you please assign it to me?
