[jira] [Commented] (SPARK-23528) Expose vital statistics of GaussianMixtureModel

2018-03-05 Thread Erich Schubert (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386318#comment-16386318
 ] 

Erich Schubert commented on SPARK-23528:


I had only been looking at the mllib API. There is no summary there. What a 
mess that is.

> Expose vital statistics of GaussianMixtureModel
> ---
>
> Key: SPARK-23528
> URL: https://issues.apache.org/jira/browse/SPARK-23528
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.2.1
>Reporter: Erich Schubert
>Priority: Minor
>
> Spark ML should expose vital statistics of the GMM model:
>  * *Number of iterations* (actual, not max) until the tolerance threshold was 
> hit: we can set a maximum, but how do we know the limit was large enough, and 
> how many iterations it really took?
>  * Final *log likelihood* of the model: if we run multiple times with 
> different starting conditions, how do we know which run converged to the 
> better fit?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23528) Expose vital statistics of GaussianMixtureModel

2018-02-27 Thread Erich Schubert (JIRA)
Erich Schubert created SPARK-23528:
--

 Summary: Expose vital statistics of GaussianMixtureModel
 Key: SPARK-23528
 URL: https://issues.apache.org/jira/browse/SPARK-23528
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 2.2.1
Reporter: Erich Schubert


Spark ML should expose vital statistics of the GMM model:
 * *Number of iterations* (actual, not max) until the tolerance threshold was 
hit: we can set a maximum, but how do we know the limit was large enough, and 
how many iterations it really took?
 * Final *log likelihood* of the model: if we run multiple times with different 
starting conditions, how do we know which run converged to the better fit?
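
To make the two requested statistics concrete, here is a minimal sketch of a 1-D EM fit that reports them. It is not Spark code: `fit_gmm_1d`, `GmmFit`, and every parameter name are hypothetical and chosen only for illustration. The result carries the number of iterations actually performed and the log likelihood under the final parameters, so runs from different starting conditions can be compared and a run that stopped at the iteration limit can be spotted.

```python
import math
import random
from typing import List, NamedTuple


class GmmFit(NamedTuple):
    weights: List[float]
    means: List[float]
    variances: List[float]
    n_iter: int            # EM iterations actually performed (not the maximum)
    log_likelihood: float  # log likelihood of the data under the final parameters
    converged: bool        # True if the tolerance was reached within max_iter


def _e_step(data, weights, means, variances):
    """Return (responsibilities, total log likelihood) for the given parameters."""
    resp, total = [], 0.0
    for x in data:
        logs = [
            math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(logs)
        # log-sum-exp keeps the normalisation numerically stable
        log_norm = top + math.log(sum(math.exp(lp - top) for lp in logs))
        total += log_norm
        resp.append([math.exp(lp - log_norm) for lp in logs])
    return resp, total


def fit_gmm_1d(data, k, max_iter=100, tol=1e-6, seed=0):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    rng = random.Random(seed)
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-6)
    weights = [1.0 / k] * k
    means = rng.sample(data, k)  # the seed determines the starting condition
    variances = [grand_var] * k

    resp, ll = _e_step(data, weights, means, variances)
    n_iter, converged = 0, False
    while n_iter < max_iter and not converged:
        # M-step: re-estimate each component from its responsibilities
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, 1e-6)  # floor avoids a collapsed component
        n_iter += 1
        # E-step on the updated parameters, so ll always matches what is returned
        resp, new_ll = _e_step(data, weights, means, variances)
        converged = abs(new_ll - ll) < tol
        ll = new_ll
    return GmmFit(weights, means, variances, n_iter, ll, converged)


# Synthetic data: two clusters, generated deterministically.
data_rng = random.Random(42)
data = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
data += [data_rng.gauss(5.0, 1.0) for _ in range(200)]

# Several runs with different starting conditions; keep the best fit.
runs = [fit_gmm_1d(data, k=2, max_iter=200, seed=s) for s in range(5)]
best = max(runs, key=lambda r: r.log_likelihood)
for s, r in enumerate(runs):
    print(s, r.n_iter, r.converged, round(r.log_likelihood, 3))
```

A run with `converged == False` (equivalently `n_iter == max_iter` without reaching the tolerance) is the signal that the iteration limit was too small, and `best` is the run to keep when comparing restarts.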






[jira] [Commented] (SPARK-3624) Failed to find Spark assembly in /usr/share/spark/lib for RELEASED debian packages

2015-04-14 Thread Erich Schubert (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493799#comment-14493799
 ] 

Erich Schubert commented on SPARK-3624:
---

https://github.com/kno10/bigtop/commit/038c5184329da132252957d5ca5ab44dbe9fd980

This is a one-line bug fix that includes the .jar files in the Debian package.

It doesn't solve all the other issues with building Scala on Debian, or the 
overall abysmal quality of the Debian/Ubuntu packages. :-(

 Failed to find Spark assembly in /usr/share/spark/lib for RELEASED debian 
 packages
 

 Key: SPARK-3624
 URL: https://issues.apache.org/jira/browse/SPARK-3624
 Project: Spark
  Issue Type: Bug
  Components: Build, Deploy
Affects Versions: 1.1.0
Reporter: Christian Tzolov
Priority: Minor

 The compute-classpath.sh requires that for a 'RELEASED' package the Spark 
 assembly jar is accessible from a <spark home>/lib folder.
 Currently the jdeb packaging (assembly module) bundles the assembly jar into 
 a folder called 'jars'. 
 The result is:
   /usr/share/spark/bin/spark-submit --num-executors 10 --master yarn-cluster \
     --class org.apache.spark.examples.SparkPi \
     /usr/share/spark/jars/spark-examples-1.1.0-hadoop2.2.0-gphd-3.0.1.0.jar 10
   ls: cannot access /usr/share/spark/lib: No such file or directory
   Failed to find Spark assembly in /usr/share/spark/lib
   You need to build Spark before running this program.
 Trivial solution is to rename the '<prefix>${deb.install.path}/jars</prefix>' 
 inside assembly/pom.xml to '<prefix>${deb.install.path}/lib</prefix>'.
 Another less impactful (considering backward compatibility) solution is to 
 define a lib -> jars symlink in the assembly/pom.xml
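
The layout the symlink option aims for can be illustrated outside the packaging. The sketch below is only a demonstration in a temporary directory, not the pom.xml change the issue proposes and not something taken from the Spark build; `ensure_lib_symlink` and the jar file name are hypothetical. It shows that a relative `lib` link pointing at `jars` makes the same files visible under both paths.

```python
import os
import tempfile


def ensure_lib_symlink(spark_home):
    """Create <spark_home>/lib as a relative symlink to 'jars' if lib is missing.

    Returns True if the link was created, False if 'lib' already existed.
    """
    jars = os.path.join(spark_home, "jars")
    lib = os.path.join(spark_home, "lib")
    if not os.path.isdir(jars):
        raise FileNotFoundError(jars)
    if os.path.lexists(lib):
        return False  # leave an existing directory or link untouched
    os.symlink("jars", lib)  # relative target, so the tree stays relocatable
    return True


# Demonstration in a throwaway directory standing in for the install path.
with tempfile.TemporaryDirectory() as home:
    os.mkdir(os.path.join(home, "jars"))
    # Placeholder file; the name is made up for this example.
    open(os.path.join(home, "jars", "example-assembly.jar"), "w").close()

    created = ensure_lib_symlink(home)
    created_again = ensure_lib_symlink(home)
    lib_listing = os.listdir(os.path.join(home, "lib"))
    lib_is_link = os.path.islink(os.path.join(home, "lib"))
    print(created, created_again, lib_listing, lib_is_link)
```

Because the existing `jars` directory is left in place, anything that already refers to it keeps working, which is the backward-compatibility argument made above.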


