Contribution to Apache Spark

2016-09-03 Thread aditya1702
Hello, I am Aditya Vyas and I am currently in my third year of college, pursuing a BTech in engineering. I know Python and a little bit of Java. I want to start contributing to Apache Spark. This is my first time in the field of Big Data. Can someone please help me get started? Which resourc

Re: Contribution to Apache Spark

2016-09-05 Thread aditya1702
Thank you for replying. I will surely check the link :)

Regularized Logistic regression

2016-10-13 Thread aditya1702
Hello, I am trying to solve a problem using regularized logistic regression in Spark. I am using the model created by LogisticRegression():
    lr=LogisticRegression(regParam=10.0,maxIter=10,standardization=True)
    model=lr.fit(data_train_df)
    data_predict_with_model=model.transform(data_test_df)
Howeve

RE: Regularized Logistic regression

2016-10-13 Thread aditya1702
Thank you, Anurag Verma, for replying. I tried increasing the iterations. However, I still get underfitted results. I am checking the model's predictions by counting how many pairs of labels and predictions it gets right:
    data_predict_with_model=best_model.transform(data_test_df)
    final_pred_df=data_predi
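The check described in this message, counting how many (label, prediction) pairs agree, is the model's accuracy. A minimal plain-Python sketch follows, assuming the pairs have already been collected from the prediction DataFrame into a list. The function name `accuracy` and the sample data are hypothetical and do not come from the original post.

```python
def accuracy(pairs):
    """Fraction of (label, prediction) pairs where the two values agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (label, prediction) pair")
    correct = sum(1 for label, prediction in pairs if label == prediction)
    return correct / len(pairs)


# Hypothetical sample: 3 of the 4 predictions match their labels.
sample_pairs = [(1.0, 1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
sample_accuracy = accuracy(sample_pairs)
```

Accuracy alone can hide underfitting on imbalanced data, because a model that always predicts the majority class may still score well, so it is usually worth looking at the per-class counts too.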

RE: Regularized Logistic regression

2016-10-13 Thread aditya1702
OK, so I tried setting the regParam and lowering it. How do I evaluate which regParam is best? Do I have to do it by trial and error? I am currently calculating the log_loss for the model. Is that a good way to find the best regParam value? Here is my code:
    from math import exp,log
    #from pyspark.s
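For reference, binary log loss can be sketched in plain Python as below. This is not the poster's code, which is truncated in the archive. It assumes labels are 0 or 1 and that each probability is the predicted probability of class 1. The function name `log_loss` mirrors the name used in the message, while the `eps` clipping parameter is an added assumption to avoid taking log(0).

```python
from math import log


def log_loss(labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted P(class 1)."""
    if not labels or len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must be non-empty and equal length")
    total = 0.0
    for y, p in zip(labels, probabilities):
        # Clip the probability away from 0 and 1 so log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * log(p) + (1 - y) * log(1.0 - p))
    return total / len(labels)


# Hypothetical predictions: a confident model and an uninformative one.
confident_loss = log_loss([1, 0, 1], [0.9, 0.1, 0.8])
uninformative_loss = log_loss([1, 0, 1], [0.5, 0.5, 0.5])
```

Lower values are better. When comparing regParam settings, the loss should be computed on held-out data rather than on the training set, since training loss generally favors the weakest regularization.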

Re: Regularized Logistic regression

2016-10-14 Thread aditya1702
I used the cross validator tool for tuning the parameter. My code is here:
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    reg=100.0
    lr=LogisticRegression(maxIter

GSoC projects related to Spark

2016-10-29 Thread aditya1702
Hello all, I am really interested in Spark. I have been doing small projects in machine learning using Spark and would love to do a project in this year's GSoC. Can anyone tell me whether there are any projects related to Spark for this year's GSoC?