RE: SparkR read.df Option type doesn't match
Yes - please see the code example in the SparkR API doc: http://spark.apache.org/docs/latest/api/R/read.df.html. Suggestions or contributions to improve the doc are welcome!

> Date: Thu, 26 Nov 2015 15:08:31 -0700
> From: s...@phemi.com
> To: dev@spark.apache.org
> Subject: Re: SparkR read.df Option type doesn't match
>
> I found the answer myself. Options should be added as extra named arguments, like:
>
>     read.df(sqlContext, path = NULL, source = "***", option1 = "", option2 = "")
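For readers more familiar with the Java/Scala side: the extra named arguments to read.df are passed through to the data source as options, the same key/value pairs you would set via DataFrameReader.option(). A minimal Java sketch of the equivalent call, assuming the spark-csv package and the "header"/"inferSchema" option names purely for illustration:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class ReadDfOptionsExample {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ReadDfOptionsExample");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(jsc);

        // Roughly the Java equivalent of
        //   read.df(sqlContext, path, source = "com.databricks.spark.csv",
        //           header = "true", inferSchema = "true")
        // in SparkR: each extra named argument becomes a string option.
        DataFrame df = sqlContext.read()
            .format("com.databricks.spark.csv") // data source name ("source" in read.df)
            .option("header", "true")           // option keys/values are plain strings
            .option("inferSchema", "true")
            .load(args[0]);                     // input path

        df.printSchema();
      }
    }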
RE: SparkR read.df Option type doesn't match
There is a bug in the examples on that page; I have filed it in JIRA as SPARK-12019. I don't know how to change the page myself, but I think an example that shows how to write the options would be great, like:

    sc <- sparkR.init(master = "yarn-client",
                      appName = "SparkR",
                      sparkHome = "/home/spark",
                      sparkEnvir = list(spark.executor.memory = "1g"),
                      sparkExecutorEnv = list(LD_LIBRARY_PATH = "/directory of JVM libraries (libjvm.so) on workers/"),
                      sparkJars = c("jarfile1.jar,jarfile2.jar"),
                      sparkPackages = "")

The sparkJars example at https://spark.apache.org/docs/1.5.2/api/R/sparkR.init.html has a bug in it; the one shown here is the correct form.
Re: Subtract implementation using broadcast
We first need to implement subtract and intersect natively in Spark SQL (i.e. add physical operators for them rather than using RDD.subtract/intersect). Then it should be pretty easy to do, given it is just about injecting the right exchange operators.

> On Nov 27, 2015, at 11:19 PM, Justin Uang wrote:
>
> Hi,
>
> I have seen massive gains with the broadcast hint for joins with DataFrames,
> and I was wondering if we have thought about allowing the broadcast hint for
> the implementation of subtract and intersect.
>
> Right now, when I try it, it says that there is no plan for the broadcast
> hint.
>
> Justin
Problem in running MLlib SVM
Hi, I am trying to run a straightforward SVM example, but I am getting low accuracy (around 50%) when I predict using the same data I used for training. I am probably doing the prediction in the wrong way. My code is below; I would appreciate any help.

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.mllib.classification.SVMModel;
import org.apache.spark.mllib.classification.SVMWithSGD;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;

import scala.Tuple2;

import edu.illinois.biglbjava.readers.LabeledPointReader;

public class SimpleDistSVM {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("SVM Classifier Example");
    SparkContext sc = new SparkContext(conf);
    String inputPath = args[0];

    // Read training data in LIBSVM format
    JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, inputPath).toJavaRDD();

    // Run training algorithm to build the model
    int numIterations = 3;
    final SVMModel model = SVMWithSGD.train(data.rdd(), numIterations);

    // Clear the default threshold so predict() returns raw scores
    model.clearThreshold();

    // Predict the points in the training set and map to an RDD of 0/1 values,
    // where 0 is a misclassification and 1 is a correct classification
    JavaRDD<Integer> classification = data.map(new Function<LabeledPoint, Integer>() {
      public Integer call(LabeledPoint p) {
        int label = (int) p.label();
        double score = model.predict(p.features());
        if ((score >= 0 && label == 1) || (score < 0 && label == 0)) {
          return 1; // correct classification
        } else {
          return 0;
        }
      }
    });

    // Sum up all values in the RDD to get the number of correctly classified examples
    int sum = classification.reduce(new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer arg0, Integer arg1) throws Exception {
        return arg0 + arg1;
      }
    });

    // Compute accuracy as the fraction of correctly classified examples
    double accuracy = ((double) sum) / ((double) classification.count());
    System.out.println("Accuracy = " + accuracy);
  }
}
Subtract implementation using broadcast
Hi,

I have seen massive gains with the broadcast hint for joins with DataFrames, and I was wondering if we have thought about allowing the broadcast hint for the implementation of subtract and intersect.

Right now, when I try it, it says that there is no plan for the broadcast hint.

Justin
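To make the request concrete, here is a minimal Java sketch of what works today versus what does not; the parquet inputs and the join column "id" are assumptions for illustration only:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;
    import static org.apache.spark.sql.functions.broadcast;

    public class BroadcastHintExample {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("BroadcastHintExample");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(jsc);

        DataFrame large = sqlContext.read().parquet(args[0]);
        DataFrame small = sqlContext.read().parquet(args[1]);

        // Join: the broadcast hint is honored, so the planner chooses a
        // broadcast hash join and ships the small side to every executor.
        DataFrame joined = large.join(broadcast(small), "id");
        System.out.println("joined: " + joined.count());

        // Set operations: except()/intersect() are currently planned through
        // RDD.subtract/intersect, so the hint is not picked up and the planner
        // reports that there is no plan for the broadcast hint.
        DataFrame subtracted = large.except(broadcast(small));
        System.out.println("subtracted: " + subtracted.count());
      }
    }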