Example of GBTClassifier

2017-10-02 Thread mckunkel
Greetings, 
I am trying to run the GBTClassifier example from the examples directory, but when I view the code in Eclipse I get the error

"The method setLabelCol(String) is undefined for the type GBTClassifier"

for the line

GBTClassifier gbt = new GBTClassifier()
        .setLabelCol("indexedLabel")
        .setFeaturesCol("indexedFeatures")
        .setMaxIter(10);

However, the API documentation says this method exists; Eclipse says it does not.
I did a straight copy-paste, including all imports.

Someone please help.
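For reference, a minimal compile check against spark-mllib 2.x (the class name and the diagnostic comment are mine, not part of the original example):

import org.apache.spark.ml.classification.GBTClassifier;

public class GBTCheck {
    public static void main(String[] args) {
        // setLabelCol/setFeaturesCol come from the Predictor base class, so
        // "undefined for the type GBTClassifier" usually means the build path
        // holds a different spark-mllib version than the example targets.
        GBTClassifier gbt = new GBTClassifier()
                .setLabelCol("indexedLabel")
                .setFeaturesCol("indexedFeatures")
                .setMaxIter(10);
        System.out.println(gbt.explainParams());
    }
}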






Upgraded to spark 2.2 and get Guava error

2017-09-28 Thread mckunkel
Greetings,
I am trying to upgrade from Spark 2.1.1 to 2.2.

When I run some of the basic examples from the webpage, I get an error:

Exception in thread "main" java.lang.IllegalAccessError: tried to access
method com.google.common.base.Stopwatch.<init>()V from class
org.apache.hadoop.mapred.FileInputFormat
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)

...
...

I do not get this error if I revert to 2.1.1.

I am pulling Spark from the Maven Central repository.

Has anyone else had this issue, and if so, how can I fix it?
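This IllegalAccessError is the classic symptom of two Guava versions colliding: Hadoop's FileInputFormat was compiled against an old Guava whose Stopwatch constructor was public, and a newer Guava on the classpath hides it. A hedged diagnostic sketch (assumption: run inside the same JVM/classpath as the failing job) to see which jar Stopwatch is loaded from:

// Print the jar that com.google.common.base.Stopwatch is loaded from;
// if it is not the Guava version Hadoop expects, the versions collide.
System.out.println(com.google.common.base.Stopwatch.class
        .getProtectionDomain().getCodeSource().getLocation());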






Re: Testing another Dataset after ML training

2017-07-11 Thread mckunkel
I'm not sure why I cannot subscribe so that everyone can view the conversation.
Help?






Testing another Dataset after ML training

2017-07-11 Thread mckunkel
Greetings,

Following the Naive Bayes example on the Apache Spark ML page
https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes

I want to predict the outcome of a separate set of data. So instead of
splitting one dataset into training and testing sets, I have one training
set and one testing set, i.e.:
Dataset<Row> training = spark.createDataFrame(dataTraining, schemaForFrame);
Dataset<Row> testing = spark.createDataFrame(dataTesting, schemaForFrame);

NaiveBayes nb = new NaiveBayes();
NaiveBayesModel model = nb.fit(training);
Dataset<Row> predictions = model.transform(testing);
predictions.show();

But I get the following error:

17/07/11 13:40:38 INFO DAGScheduler: Job 2 finished: collect at NaiveBayes.scala:171, took 3.942413 s
Exception in thread "main" org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1: (vector) => vector)
	at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1075)
	at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:144)
	at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:48)
	at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:30)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)

...
...
...


How do I run predictions on a dataset that was not produced by splitting the
training data?
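A minimal sketch of one way to do this, assuming both sets share schemaForFrame and the features column is assembled with a single shared VectorAssembler (the raw column names f1..f3 and the "label" column are hypothetical, not taken from the original code):

import org.apache.spark.ml.classification.NaiveBayes;
import org.apache.spark.ml.classification.NaiveBayesModel;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

static Dataset<Row> predictOnSeparateSet(Dataset<Row> training, Dataset<Row> testing) {
    // One shared assembler guarantees both sets get feature vectors of the
    // same size and metadata; vectors that differ between fit() and
    // transform() are a common cause of the (vector) => vector UDF failure.
    VectorAssembler assembler = new VectorAssembler()
            .setInputCols(new String[] {"f1", "f2", "f3"})  // hypothetical
            .setOutputCol("features");

    NaiveBayesModel model = new NaiveBayes()
            .setLabelCol("label")
            .setFeaturesCol("features")
            .fit(assembler.transform(training));

    return model.transform(assembler.transform(testing));
}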






Spark Encoder with mysql Enum and data truncated Error

2017-06-27 Thread mckunkel
I am using Spark via Java for a MySQL/ML (machine learning) project.

In the MySQL database, I have a column "status_change_type" of type enum =
{broke, fixed} in a table called "status_change" in a DB called "test".

I have an object StatusChangeDB that models the table's structure; for
"status_change_type" I declared the field as a String. I know a MySQL enum
and a Java String are represented very differently, but I am using Spark,
and Spark's encoders do not handle enums properly. However, when I try to
set the enum's value via a Java String, I receive the "data truncated"
error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 9, localhost, executor driver): java.sql.BatchUpdateException: Data truncated for column 'status_change_type' at row 1
	at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2055)


I have tried using a Java enum for "status_change_type", but it fails with
the following stack trace:

Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
	at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
	at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126)
	at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
	at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:127)
	at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
	at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:127)
	at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	...


I have tried the JDBC setting "jdbcCompliantTruncation=false", but it
changes nothing; I get the same "data truncated" error as above. Here is my
JDBC options map, in case I am using "jdbcCompliantTruncation=false"
incorrectly.

public static Map<String, String> jdbcOptions() {
    Map<String, String> jdbcOptions = new HashMap<>();
    jdbcOptions.put("url", "jdbc:mysql://localhost:3306/test?jdbcCompliantTruncation=false");
    jdbcOptions.put("driver", "com.mysql.jdbc.Driver");
    jdbcOptions.put("dbtable", "status_change");
    jdbcOptions.put("user", "root");
    jdbcOptions.put("password", "");
    return jdbcOptions;
}
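For what it's worth, a sketch of how such a map can be consumed wholesale when reading (assuming the same SparkSession named spark used elsewhere in the project):

// DataFrameReader.options() accepts a java.util.Map<String, String>,
// so the whole map above can be passed in one call.
Dataset<Row> statusChange = spark.read()
        .format("jdbc")
        .options(jdbcOptions())
        .load();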

Here is the Spark method for inserting into the MySQL DB:

private void insertMYSQLQuery(Dataset<Row> changeDF) {
    try {
        changeDF.write().mode(SaveMode.Append)
                .jdbc(SparkManager.jdbcAppendOptions(), "status_change", new java.util.Properties());
    } catch (Exception e) {
        System.out.println(e);
    }
}

where jdbcAppendOptions uses the jdbcOptions method as follows:

public static String jdbcAppendOptions() {
    return SparkManager.jdbcOptions().get("url")
            + "&user=" + SparkManager.jdbcOptions().get("user")
            + "&password=" + SparkManager.jdbcOptions().get("password");
}
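An alternative sketch (my suggestion, not from the original code): pass the credentials through java.util.Properties instead of concatenating them onto the URL, which sidesteps escaping issues in the query string:

// Build connection properties once instead of appending &user=/&password=
// to the JDBC URL; DataFrameWriter.jdbc() takes them directly.
java.util.Properties props = new java.util.Properties();
props.setProperty("user", SparkManager.jdbcOptions().get("user"));
props.setProperty("password", SparkManager.jdbcOptions().get("password"));

changeDF.write()
        .mode(SaveMode.Append)
        .jdbc(SparkManager.jdbcOptions().get("url"), "status_change", props);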

How do I get values into a MySQL enum column using Spark, or otherwise
avoid this "data truncated" error?

My only other thought is to change the column to VARCHAR, but the project
leader is not too happy with that idea.





Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve

2017-06-26 Thread mckunkel
First Spark project.

I have a Java method that returns a Dataset<Row>. I want to convert this to
a Dataset<StatusChangeDB>. I have created a POJO StatusChangeDB.java and
coded it with all the query fields found in the MySQL table.
I then create an Encoder<StatusChangeDB> and convert the Dataset<Row> to a
Dataset<StatusChangeDB>. However, when I try to .show() the values of the
Dataset, I receive the error

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`hvpinid_quad`' given input columns: [status_change_type, superLayer, loclayer, sector, locwire];
	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:86)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:83)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:290)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:290)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:289)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:307)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:305)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$10.apply(TreeNode.scala:324)
	at scala.collection.MapLike$MappedValues$$anonfun$iterator$3.apply(MapLike.scala:246)
	at scala.collection.MapLike$MappedValues$$anonfun$iterator$3.apply(MapLike.scala:246)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.IterableLike$$anon$1.foreach(IterableLike.scala:311)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.MapBuilder.$plus$plus$eq(MapBuilder.scala:25)
	at scala.collection.TraversableViewLike$class.force(TraversableViewLike.scala:88)
	at scala.collection.IterableLike$$anon$1.force(IterableLike.scala:311)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:332)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:305)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:287)
	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:255)
	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:255)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:266)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:276)
	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$6.apply(QueryPlan.scala:285)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:285)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:255)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:83)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:76)
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:128)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:76)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:57)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolveAndBind(ExpressionEncoder.scala:259)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:209)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
	at org.apache.spark.sql.Dataset$.apply(Dataset.
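A minimal sketch of the conversion being described, assuming Encoders.bean is used to build the POJO encoder and that df stands for the hypothetical Dataset<Row> returned by the method above. The bean encoder resolves every property of StatusChangeDB by column name, so a property such as hvpinid_quad with no matching column in [status_change_type, superLayer, loclayer, sector, locwire] produces exactly this AnalysisException; either add the column to the query or drop the property from the POJO.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// Hypothetical conversion: every getter/setter pair on StatusChangeDB
// must correspond to a column of df, or resolveAndBind fails as above.
Encoder<StatusChangeDB> encoder = Encoders.bean(StatusChangeDB.class);
Dataset<StatusChangeDB> typed = df.as(encoder);
typed.show();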