How to compute basic statistics on a JSON file to explore my numeric and non-numeric variables?

2015-07-30 Thread SparknewUser
I've imported a JSON file which has this schema:

sqlContext.read.json("filename").printSchema
root
 |-- COL: long (nullable = true)
 |-- DATA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Crate: string (nullable = true)
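A sketch of how the exploration might look with the Spark 1.x DataFrame API, assuming a DataFrame loaded as above (column names taken from the schema shown; everything else is illustrative): describe() summarizes numeric columns, and a groupBy/count gives a frequency table for a non-numeric field once the nested DATA array is flattened with explode().

```scala
import org.apache.spark.sql.functions.explode

val df = sqlContext.read.json("filename")

// Summary statistics (count, mean, stddev, min, max) for the numeric column:
df.describe("COL").show()

// Flatten the DATA array: explode() turns each array element into its own row,
// so the struct fields inside it become addressable columns.
val flat = df.select(explode(df("DATA")).as("d")).select("d.Crate")

// Frequency table for the non-numeric Crate field:
flat.groupBy("Crate").count().show()
```

This is a sketch against the Spark 1.4-era API, not a tested recipe; it needs a running SparkContext/SQLContext to execute.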

How to read a JSON file with a specific format?

2015-07-29 Thread SparknewUser
I'm trying to read a JSON file which is like:

[ {"IFAM":"EQR","KTM":143000640,"COL":21,"DATA":[
    {"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},
    {"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},
    {"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},
    {"MLrate":"30","Nrout":"0","up":null,"Crate":"
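The likely snag here: Spark 1.x's read.json expects one complete JSON document per line, so a pretty-printed top-level array like the one above comes back as corrupt records. One workaround (a sketch, assuming each file fits comfortably in memory) is to read every file whole and hand the parser the full array as a single string, which yields one row per array element:

```scala
// wholeTextFiles gives (path, fileContent) pairs; keep just the content,
// so each RDD record is one complete JSON document (here, the whole array).
val wholeFile = sc.wholeTextFiles("filename").map(_._2)
val df = sqlContext.read.json(wholeFile)
df.printSchema()
```

Reading files whole defeats line-level parallelism, so this is only reasonable for modest file sizes; for large inputs, reformatting the data to one object per line is the safer route.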

Re: How to get the best performance with LogisticRegressionWithSGD?

2015-05-29 Thread SparknewUser
I've tried several different pairs of parameters for my LogisticRegressionWithSGD and here are my results. My numIterations varies from 100 to 500 in steps of 50 and my stepSize varies from 0.1 to 1 in steps of 0.1. My last line represents the maximum of each column and my last column the maximum of each line and w
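The grid described above can be sketched as a loop that trains one model per (numIterations, stepSize) pair and scores each on a held-out set by AUC. This is illustrative: `training` and `test` are assumed RDD[LabeledPoint]s not shown in the thread.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// numIterations: 100..500 by 50 (9 values); stepSize: 0.1..1.0 by 0.1 (10 values)
val grid = for {
  numIterations <- 100 to 500 by 50
  stepSize      <- (1 to 10).map(_ / 10.0)
} yield (numIterations, stepSize)

val scored = grid.map { case (iters, step) =>
  val model = LogisticRegressionWithSGD.train(training, iters, step)
  model.clearThreshold()                      // emit raw scores instead of 0/1 labels
  val scoreAndLabel = test.map(p => (model.predict(p.features), p.label))
  val auc = new BinaryClassificationMetrics(scoreAndLabel).areaUnderROC()
  ((iters, step), auc)
}

val ((bestIters, bestStep), bestAuc) = scored.maxBy(_._2)
```

clearThreshold() matters here: without it predict() returns hard 0/1 labels, which makes the AUC meaningless.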

How to get the best performance with LogisticRegressionWithSGD?

2015-05-27 Thread SparknewUser
I'm new to Spark and I'm getting poor performance with classification methods in Spark MLlib (worse than R in terms of AUC). I am trying to set my own parameters rather than use the defaults. Here is the method I want to use: train(RDD input, int numIterations, doub
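Besides the static train(...) overloads, the class form exposes the optimizer's setters, which gives finer control than the train signature above. A sketch (the parameter values are illustrative, not recommendations; `training` is an assumed RDD[LabeledPoint]):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD

val lr = new LogisticRegressionWithSGD()
lr.optimizer
  .setNumIterations(200)
  .setStepSize(0.5)
  .setRegParam(0.01)          // L2 regularization strength
  .setMiniBatchFraction(1.0)  // use the full data set each iteration
val model = lr.run(training)
```

Feature scaling also matters a great deal for SGD; standardizing features (e.g. with MLlib's StandardScaler) before training often closes much of the gap with R.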

MLlib: how to get the best model with only the most significant explanatory variables in LogisticRegressionWithLBFGS or LogisticRegressionWithSGD?

2015-05-22 Thread SparknewUser
I am new to MLlib and to Spark (I use Scala). I'm trying to understand how LogisticRegressionWithLBFGS and LogisticRegressionWithSGD work. I usually use R for logistic regressions, but now I do it in Spark to be able to analyze Big Data. The model only returns weights and an intercept. My problem is
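For context: MLlib's logistic regression reports only weights and an intercept; it does not compute per-coefficient p-values the way R's glm summary does. One workaround (a sketch, with illustrative values) is to pre-select features with ChiSqSelector and refit on the reduced set. `training` is an assumed RDD[LabeledPoint], and note that the chi-squared test assumes categorical features.

```scala
import org.apache.spark.mllib.feature.ChiSqSelector
import org.apache.spark.mllib.regression.LabeledPoint

// Keep the 5 features most associated with the label (5 is illustrative).
val selector    = new ChiSqSelector(5)
val transformer = selector.fit(training)

// Rebuild the training set with only the selected features, then retrain
// LogisticRegressionWithLBFGS / LogisticRegressionWithSGD on `reduced`.
val reduced = training.map(p =>
  LabeledPoint(p.label, transformer.transform(p.features)))
```

Alternatively, standardizing the features and ranking the absolute weights gives a rough, informal measure of variable importance, but it is not a significance test.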