GSoC projects related to Spark

2016-10-29 Thread aditya1702
Hello all,
I am really interested in Spark. I have been doing small machine learning
projects with Spark and would love to work on one for this year's GSoC. Can
anyone tell me whether there are any Spark-related projects for this year's GSoC?






Re: Regularized Logistic regression

2016-10-14 Thread aditya1702
I used the CrossValidator tool for tuning the parameters. My code is here:

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.evaluation import BinaryClassificationEvaluator

lr = LogisticRegression(maxIter=500)

# Grid over the regularization strength and the L1/L2 mixing parameter.
paramGrid = (ParamGridBuilder()
             .addGrid(lr.regParam,
                      [0.02, 0.01, 0.2, 0.1, 1.0, 2.0, 10.0, 15.0, 20.0, 100.0])
             .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])
             .build())

crossval = CrossValidator(estimator=lr,
                          estimatorParamMaps=paramGrid,
                          evaluator=BinaryClassificationEvaluator(),
                          numFolds=3)
model = crossval.fit(data_train_df)

Finally, I predicted the values:

prediction = model.transform(data_test_df)
prediction.show()
  
+-----+----------+
|label|prediction|
+-----+----------+
|  1.0|       1.0|
|  1.0|       1.0|
|  1.0|       1.0|
|  1.0|       1.0|
|  1.0|       1.0|
|  0.0|       1.0|
|  0.0|       1.0|
|  0.0|       1.0|
|  0.0|       1.0|
|  0.0|       1.0|
|  0.0|       1.0|
|  0.0|       1.0|
|  0.0|       1.0|
|  0.0|       1.0|
|  0.0|       1.0|
|  0.0|       1.0|
|  0.0|       1.0|
+-----+----------+

Why am I getting the wrong predictions? The model predicts 1.0 for every row.
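
One way to narrow this down is to check which grid point the cross-validator
actually selected, whether the training labels are imbalanced, and what the
test AUC is. A minimal diagnostic sketch, reusing paramGrid, model, and the
DataFrames from the code above:

# Average cross-validation metric (areaUnderROC, the evaluator's default)
# for each parameter combination, in the same order as paramGrid.
for params, metric in zip(paramGrid, model.avgMetrics):
    print({p.name: v for p, v in params.items()}, metric)

# Class balance of the training data -- a skewed label distribution plus
# strong regularization often yields a constant prediction.
data_train_df.groupBy('label').count().show()

# AUC of the selected model on the held-out test set.
from pyspark.ml.evaluation import BinaryClassificationEvaluator
print(BinaryClassificationEvaluator().evaluate(model.transform(data_test_df)))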






RE: Regularized Logistic regression

2016-10-13 Thread aditya1702
OK, so I tried setting regParam and lowering it. How do I evaluate which
regParam is best? Do I have to do it by trial and error? I am currently
calculating the log loss for the model. Is that a good way to find the best
regParam value? Here is my code:

from math import exp, log

epsilon = 1e-16  # keeps log() away from exact 0 and 1

def sigmoid_log_loss(w, b, x):
    # Predicted probability: sigmoid of the linear score w.x + b
    # (b is the model intercept, which the score needs as well).
    ans = float(1 / (1 + exp(-(w.dot(x.features) + b))))
    # Clip to the open interval (0, 1) so the log terms stay finite.
    if ans == 0:
        ans = ans + epsilon
    if ans == 1:
        ans = ans - epsilon
    log_loss = -(x.label * log(ans) + (1 - x.label) * log(1 - ans))
    return ((ans, x.label), log_loss)

---
from pyspark.ml.classification import LogisticRegression

reg = 0.02
lr = LogisticRegression(regParam=reg, maxIter=500,
                        standardization=True, elasticNetParam=0.5)
model = lr.fit(data_train_df)

w = model.coefficients
intercept = model.intercept
# Score the validation set (via the underlying RDD) and average the log loss.
data_predicted_df = data_val_df.rdd.map(lambda x: sigmoid_log_loss(w, intercept, x))
log_loss = data_predicted_df.map(lambda x: x[1]).mean()
print(log_loss)
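
To compare regParam values without full cross-validation, one option is a
simple scan: fit one model per candidate and keep the value with the lowest
validation log loss. A rough sketch reusing sigmoid_log_loss and the
DataFrames above (the candidate list is only illustrative):

candidate_regs = [0.0, 0.01, 0.02, 0.1, 0.5, 1.0]
results = []
for r in candidate_regs:
    lr_r = LogisticRegression(regParam=r, maxIter=500,
                              standardization=True, elasticNetParam=0.5)
    m = lr_r.fit(data_train_df)
    w_r, b_r = m.coefficients, m.intercept
    # Mean log loss on the held-out validation set.
    loss = data_val_df.rdd.map(lambda x: sigmoid_log_loss(w_r, b_r, x)[1]).mean()
    results.append((r, loss))

best_reg, best_loss = min(results, key=lambda t: t[1])
print(best_reg, best_loss)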






RE: Regularized Logistic regression

2016-10-13 Thread aditya1702
Thank you, Anurag Verma, for replying. I tried increasing the iterations;
however, I still get underfitted results. I am checking the model's
predictions by counting how many (label, prediction) pairs it gets right:

from pyspark.sql.functions import col

data_predict_with_model = best_model.transform(data_test_df)
final_pred_df = data_predict_with_model.select(col('label'), col('prediction'))
# Count the occurrences of each (label, prediction) pair.
ans = (final_pred_df.rdd
       .map(lambda x: ((x[0], x[1]), 1))
       .reduceByKey(lambda a, b: a + b)
       .toDF())
ans.show()

+---------+---+
|       _1| _2|
+---------+---+
|[1.0,1.0]|  5|
|[0.0,1.0]| 12|
+---------+---+

Do you know any other methods by which I can check the model, and what am I
doing wrong? I have filtered the data and arranged it into features and label
columns, so I guess only the model-creation part is wrong. Can anyone help me,
please? I am still learning machine learning.
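
For reference, the same check can be done without dropping to the RDD API,
and the built-in evaluators give standard metrics directly. A sketch assuming
Spark 2.0+ (where the accuracy metric is available) and reusing
data_predict_with_model from above:

from pyspark.ml.evaluation import (BinaryClassificationEvaluator,
                                   MulticlassClassificationEvaluator)

# Full confusion matrix with a plain DataFrame groupBy.
data_predict_with_model.groupBy('label', 'prediction').count().show()

# Area under the ROC curve (uses the rawPrediction column).
auc = BinaryClassificationEvaluator().evaluate(data_predict_with_model)

# Plain accuracy over the test set.
acc = (MulticlassClassificationEvaluator(metricName='accuracy')
       .evaluate(data_predict_with_model))
print(auc, acc)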






Regularized Logistic regression

2016-10-13 Thread aditya1702
Hello, I am trying to solve a problem using regularized logistic regression
in Spark. I am using the model created by LogisticRegression():

lr = LogisticRegression(regParam=10.0, maxIter=10, standardization=True)
model = lr.fit(data_train_df)
data_predict_with_model = model.transform(data_test_df)

However, I am not able to get proper results. Can anyone tell me whether we
have to pass any other parameters to the model?
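
Two things stand out in this snippet: regParam=10.0 is very strong
regularization (it pushes the coefficients toward zero, which tends to
underfit), and maxIter=10 may stop the optimizer before it converges. The
main knobs worth tuning are sketched below; the parameter names are from
pyspark.ml.classification.LogisticRegression and the values are only
illustrative:

from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(
    featuresCol='features',   # column of feature vectors
    labelCol='label',         # 0.0/1.0 target column
    regParam=0.01,            # overall regularization strength
    elasticNetParam=0.0,      # 0.0 = pure L2 (ridge), 1.0 = pure L1 (lasso)
    maxIter=100,              # optimizer iteration cap
    standardization=True,     # standardize features before fitting
    threshold=0.5)            # probability cutoff for predicting 1.0

model = lr.fit(data_train_df)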






Contribution to Apache Spark

2016-09-03 Thread aditya1702
Hello,
I am Aditya Vyas and I am currently in my third year of college, doing a BTech
in engineering. I know Python and a little bit of Java. I want to start
contributing to Apache Spark. This is my first time in the field of Big Data.
Can someone please help me get started? Which resources should I look at?


