RE: SparkR read.df Option type doesn't match

2015-11-27 Thread Felix Cheung
Yes - please see the code example on the SparkR API doc: 
http://spark.apache.org/docs/latest/api/R/read.df.html
Suggestions or contributions to improve the doc are welcome!
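
For reference, a minimal sketch of passing data source options as named
arguments (the path, source, and option names below are illustrative, using
the third-party spark-csv package; they are not taken from the doc):

df <- read.df(sqlContext, path = "people.csv",
              source = "com.databricks.spark.csv",   # illustrative data source
              header = "true", inferSchema = "true") # illustrative options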

 
> Date: Thu, 26 Nov 2015 15:08:31 -0700
> From: s...@phemi.com
> To: dev@spark.apache.org
> Subject: Re: SparkR read.df Option type doesn't match
> 
> I found the answer myself.
> Options should be added as named arguments, like:
> read.df(sqlContext, path = NULL, source = "***", option1 = "", option2 = "")

RE: SparkR read.df Option type doesn't match

2015-11-27 Thread liushiqi9
There is a bug in the examples on that page; I have filed it in JIRA as
SPARK-12019. I don't know how to change the page, but I think an example
that shows how to write options would be great, like:

sc <- sparkR.init(master = "yarn-client", appName = "SparkR",
                  sparkHome = "/home/spark",
                  sparkEnvir = list(spark.executor.memory = "1g"),
                  sparkExecutorEnv = list(LD_LIBRARY_PATH = "/directory of JVM libraries (libjvm.so) on workers/"),
                  sparkJars = c("jarfile1.jar,jarfile2.jar"),
                  sparkPackages = "",
                  option1 = "", option2 = "")

The sparkJars example at
https://spark.apache.org/docs/1.5.2/api/R/sparkR.init.html has a bug in it;
the one I showed here is the correct one.
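
Putting this together, a minimal end-to-end sketch in the same named-argument
style might be (master, jar names, path, and source are placeholders, not
tested values):

# Placeholder values throughout; this only shows where the named arguments go.
sc <- sparkR.init(master = "yarn-client", appName = "SparkR",
                  sparkJars = c("jarfile1.jar,jarfile2.jar"))
sqlContext <- sparkRSQL.init(sc)
df <- read.df(sqlContext, path = "data.json", source = "json")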






Re: Subtract implementation using broadcast

2015-11-27 Thread Reynold Xin
We first need to implement subtract and intersect natively in Spark SQL
(i.e. add physical operators for them rather than falling back on
RDD.subtract/intersect).

Then it should be pretty easy to do, given it is just about injecting the
right exchange operators.



> On Nov 27, 2015, at 11:19 PM, Justin Uang  wrote:
> 
> Hi,
> 
> I have seen massive gains with the broadcast hint for joins with DataFrames, 
> and I was wondering if we have thought about allowing the broadcast hint for 
> the implementation of subtract and intersect.
> 
> Right now, when I try it, it says that there is no plan for the broadcast 
> hint.
> 
> Justin




Problem in running MLlib SVM

2015-11-27 Thread Tarek Elgamal
Hi,

I am trying to run the straightforward SVM example, but I am getting low
accuracy (around 50%) when I predict using the same data I used for
training. I am probably doing the prediction the wrong way. My code is
below; I would appreciate any help.


import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.mllib.classification.SVMModel;
import org.apache.spark.mllib.classification.SVMWithSGD;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;

public class SimpleDistSVM {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("SVM Classifier Example");
    SparkContext sc = new SparkContext(conf);
    String inputPath = args[0];

    // Read training data in LIBSVM format.
    JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, inputPath).toJavaRDD();

    // Run the training algorithm to build the model.
    int numIterations = 3;
    final SVMModel model = SVMWithSGD.train(data.rdd(), numIterations);

    // Clear the default threshold so predict() returns raw margins.
    model.clearThreshold();

    // Predict points in the test set and map to an RDD of 0/1 values,
    // where 0 is a misclassification and 1 is a correct classification.
    JavaRDD<Integer> classification = data.map(new Function<LabeledPoint, Integer>() {
      public Integer call(LabeledPoint p) {
        int label = (int) p.label();
        double score = model.predict(p.features());
        if ((score >= 0 && label == 1) || (score < 0 && label == 0)) {
          return 1; // correct classification
        } else {
          return 0; // misclassification
        }
      }
    });

    // Sum all values in the RDD to get the number of correctly classified examples.
    int sum = classification.reduce(new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer arg0, Integer arg1) throws Exception {
        return arg0 + arg1;
      }
    });

    // Compute accuracy as the fraction of correctly classified examples.
    double accuracy = ((double) sum) / ((double) classification.count());
    System.out.println("Accuracy = " + accuracy);

    sc.stop();
  }
}


Subtract implementation using broadcast

2015-11-27 Thread Justin Uang
Hi,

I have seen massive gains with the broadcast hint for joins with
DataFrames, and I was wondering if we have thought about allowing the
broadcast hint for the implementation of subtract and intersect.

Right now, when I try it, it says that there is no plan for the broadcast
hint.

Justin