Re: Spark 2.0.0 : GLM problem

2016-06-22 Thread april_ZMQ
The picture below shows the reply from the creator of this package, Yanbo
Liang (https://github.com/yanboliang).

 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-0-GLM-problem-tp27145p27203.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark 2.0.0 : GLM problem

2016-06-14 Thread april_ZMQ
To update the post:

•   First problem: This can be worked around by adding an epsilon (a very
small value) to every 0 value, because the poisson model here does not allow
the y value to be zero. In general, though, a Poisson model has no such
requirement.
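
A minimal sketch of that workaround, written in plain Python rather than SparkR so it runs without a Spark session. The names EPSILON, replace_zeros and poisson_log_pmf, the value 1e-6, and the sample counts are all made up for illustration. The second function only shows that the Poisson probability at y = 0 is well defined, which is why plain R accepts zero counts.

```python
import math

EPSILON = 1e-6  # hypothetical "very small value"; choose one that suits your data


def replace_zeros(values, epsilon=EPSILON):
    """Return a copy of values with exact zeros replaced by epsilon."""
    return [epsilon if v == 0 else v for v in values]


def poisson_log_pmf(y, mean):
    """Log of the Poisson probability mass at count y, for a mean > 0."""
    return y * math.log(mean) - mean - math.lgamma(y + 1)


flows = [3, 0, 7, 0, 1]          # made-up response values containing zeros
patched = replace_zeros(flows)   # every value is now strictly positive
```

Note that this changes the data slightly, so the fitted coefficients may differ a little from a fit on the original counts.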

But now I encounter another problem, and it occurs with every GLM model:
"Values to assemble cannot be null"
 

I've found the code in 
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala

  
 

Can you explain what that means?
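
For anyone hitting the same message: as far as I can tell from the linked source, it is raised when a row has a null in one of the columns being assembled into the feature vector. The sketch below imitates that behaviour in plain Python and shows one possible workaround, dropping incomplete rows first. It is not the Spark API; the function names and the columns "distance" and "population" are hypothetical.

```python
def assemble(row, columns):
    """Collect the named columns of one row into a flat list of floats.

    A missing (None) value is rejected rather than silently converted.
    """
    features = []
    for name in columns:
        value = row[name]
        if value is None:
            raise ValueError("Values to assemble cannot be null")
        features.append(float(value))
    return features


def drop_incomplete(rows, columns):
    """Keep only the rows that have a value in every listed column."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


rows = [
    {"distance": 12.0, "population": 300},
    {"distance": None, "population": 150},   # this row would trigger the error
]
columns = ["distance", "population"]

clean = drop_incomplete(rows, columns)
vectors = [assemble(r, columns) for r in clean]
```

Whether dropping rows or filling in a default value is the better choice depends on the dataset.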














--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-0-GLM-problem-tp27145p27164.html



Spark 2.0.0 : GLM problem

2016-06-13 Thread april_ZMQ
Hi ALL,

I’ve tried the GLM (Generalized Linear Model) in Spark 2.0.0-preview, and I’ve
encountered some unexpected problems.
•   First problem:
I tested the “poisson” family GLM on a very small dataset using SparkR
2.0.0. The same dataset fits a “poisson” family GLM in plain R
successfully, but SparkR showed the error below, and I have no idea where
it comes from.

16/06/13 14:10:58 WARN WeightedLeastSquares: regParam is zero, which might
cause numerical instability and overfitting.
16/06/13 14:10:58 ERROR Executor: Exception in task 0.0 in stage 28.0 (TID
28)
java.lang.IllegalArgumentException: requirement failed: The response
variable of Poisson family should be positive, but got 0.0
 

•   Second problem:
When I run the same dataset that ran successfully on Spark 1.6.0, Spark
2.0.0 generates the error below.

ERROR RBackendHandler: fit on
org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  org.apache.spark.SparkException: Currently, GeneralizedLinearRegression
only supports number of features <= 4096. Found 7664 in the input dataset.
 

This is the R code:
model <- glm(flow ~ Origin + Destination, data = distance_flow,
             family = gaussian(link = "identity"))
Is this because Spark 2.0.0 does not support datasets as large as the
previous version did?
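
One possible reading of the "number of features" in the error, assuming Origin and Destination are categorical columns that the formula expands into indicator (dummy) columns: the count depends on how many distinct values those columns have, not on how many rows there are. The sketch below is plain Python, not the Spark implementation. The helper count_dummy_features, the sample rows, and the choice to drop one level per column are assumptions made for illustration.

```python
MAX_FEATURES = 4096  # the limit quoted in the error message above


def count_dummy_features(rows, categorical_columns, drop_one_level=True):
    """Count the indicator columns produced by dummy-coding the given columns."""
    total = 0
    for column in categorical_columns:
        levels = {row[column] for row in rows}  # distinct values in this column
        total += (len(levels) - 1) if drop_one_level else len(levels)
    return total


# Made-up rows: 3 distinct origins and 4 distinct destinations.
rows = [
    {"Origin": "A", "Destination": "X"},
    {"Origin": "B", "Destination": "Y"},
    {"Origin": "C", "Destination": "Z"},
    {"Origin": "A", "Destination": "W"},
]

n_features = count_dummy_features(rows, ["Origin", "Destination"])
fits = n_features <= MAX_FEATURES
```

Running the same count on the real data would show whether the distinct Origin and Destination values alone account for the 7664 features.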






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-0-GLM-problem-tp27145.html



SparkR : glm model

2016-06-09 Thread april_ZMQ
Hi all,

I'm a student working on a data analysis project with SparkR.

I found out that GLM (generalized linear model) supports only two types of
distribution, "gaussian" and "binomial".
However, our project requires the "poisson" distribution. I also found
that SparkR supported "poisson" before, but this function is now
closed: https://issues.apache.org/jira/browse/SPARK-12566

Is there any way I can use the previous official package for the
poisson distribution in SparkR instead?

Thank you very much!





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-glm-model-tp27134.html