FP Growth saveAsTextFile
I am having trouble saving an FP-Growth model as a text file. I can print out the results, but when I try to save the model I get a NullPointerException.

model.freqItemsets.saveAsTextFile("c://fpGrowth/model")

Thanks,
Eric
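In case it helps anyone hitting the same NullPointerException: a minimal sketch of one way to write the frequent itemsets out, assuming the MLlib FPGrowthModel API. Mapping each itemset to a string first keeps the saved lines readable, and saveAsTextFile expects a directory-style URI that does not already exist; the output path below is a placeholder, not the original "c://fpGrowth/model".

// Sketch: write the frequent itemsets of an already-trained FPGrowthModel as text.
// "model" is the trained model from the post; the URI is a placeholder and must
// point to a directory that does not exist yet.
model.freqItemsets
  .map(itemset => itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
  .saveAsTextFile("file:///tmp/fpGrowth/model")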
Help understanding the FP-Growth algorithm
I am a total newbie to Spark, so be kind. I am looking for an example that implements the FP-Growth algorithm so I can better understand both the algorithm and Spark. The one example I found (on spark.apache.org) was incomplete.

Thanks,
Eric
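For reference, a minimal end-to-end sketch based on the MLlib FPGrowth API in Scala; the input path, minimum support, and partition count are placeholder values, not anything from the original post.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.FPGrowth

object FPGrowthExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("FPGrowthExample"))

    // Each line of the input file is one transaction: items separated by spaces.
    val transactions = sc.textFile("data/sample_fpgrowth.txt").map(_.trim.split(' '))

    // Mine itemsets that appear in at least 20% of the transactions.
    val model = new FPGrowth()
      .setMinSupport(0.2)
      .setNumPartitions(10)
      .run(transactions)

    // Print each frequent itemset with its frequency count.
    model.freqItemsets.collect().foreach { itemset =>
      println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
    }

    sc.stop()
  }
}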
Cluster getting a null pointer error
I have set up a cluster on AWS and am trying a really simple hello world program as a test. The cluster was built using the ec2 scripts that come with Spark. I have included the error message (using --verbose) below, and the source code further below that. Any help would be greatly appreciated.

Thanks,
Eric

*Error code:*

r...@ip-xx.xx.xx.xx ~]$ ./spark/bin/spark-submit --verbose --class com.je.test.Hello --master spark://xx.xx.xx.xx:7077 Hello-assembly-1.0.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using properties file: /root/spark/conf/spark-defaults.conf
Adding default property: spark.executor.memory=5929m
Adding default property: spark.executor.extraClassPath=/root/ephemeral-hdfs/conf
Adding default property: spark.executor.extraLibraryPath=/root/ephemeral-hdfs/lib/native/
Using properties file: /root/spark/conf/spark-defaults.conf
Adding default property: spark.executor.memory=5929m
Adding default property: spark.executor.extraClassPath=/root/ephemeral-hdfs/conf
Adding default property: spark.executor.extraLibraryPath=/root/ephemeral-hdfs/lib/native/
Parsed arguments:
  master                  spark://xx.xx.xx.xx:7077
  deployMode              null
  executorMemory          5929m
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /root/spark/conf/spark-defaults.conf
  extraSparkProperties    Map()
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               com.je.test.Hello
  primaryResource         file:/root/Hello-assembly-1.0.jar
  name                    com.je.test.Hello
  childArgs               []
  jars                    null
  verbose                 true
Default properties from /root/spark/conf/spark-defaults.conf:
  spark.executor.extraLibraryPath - /root/ephemeral-hdfs/lib/native/
  spark.executor.memory - 5929m
  spark.executor.extraClassPath - /root/ephemeral-hdfs/conf
Using properties file: /root/spark/conf/spark-defaults.conf
Adding default property: spark.executor.memory=5929m
Adding default property: spark.executor.extraClassPath=/root/ephemeral-hdfs/conf
Adding default property: spark.executor.extraLibraryPath=/root/ephemeral-hdfs/lib/native/
Main class: com.je.test.Hello
Arguments:
System properties:
  spark.executor.extraLibraryPath - /root/ephemeral-hdfs/lib/native/
  spark.executor.memory - 5929m
  SPARK_SUBMIT - true
  spark.app.name - com.je.test.Hello
  spark.jars - file:/root/Hello-assembly-1.0.jar
  spark.executor.extraClassPath - /root/ephemeral-hdfs/conf
  spark.master - spark://xxx.xx.xx.xxx:7077
Classpath elements:
  file:/root/Hello-assembly-1.0.jar

*Actual Error:*

Exception in thread "main" java.lang.NullPointerException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

*Source Code:*

package com.je.test

import org.apache.spark.{SparkConf, SparkContext}

class Hello {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(true) //.set("spark.cassandra.connection.host", "xxx.xx.xx.xxx")
    val sc = new SparkContext("spark://xxx.xx.xx.xxx:7077", "Season", conf)
    println("Hello World")
  }
}
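A likely cause of this particular NullPointerException (it is thrown from SparkSubmit's reflective call into the main method) is that Hello is declared as a class rather than an object, so there is no static main method for spark-submit to invoke. A sketch of the corrected entry point, keeping the app name from the original; letting spark-submit supply the master URL instead of hard-coding it is an assumption about how you want to run it.

package com.je.test

import org.apache.spark.{SparkConf, SparkContext}

// An object (not a class) gives the JVM a static main method,
// which is what spark-submit invokes via reflection.
object Hello {
  def main(args: Array[String]): Unit = {
    // Take the master from spark-submit (--master ...) rather than hard-coding it here.
    val conf = new SparkConf().setAppName("Season")
    val sc = new SparkContext(conf)
    println("Hello World")
    sc.stop()
  }
}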
Unresolved attributes
I am running Spark 1.1.0 and DSE Cassandra 4.6. When I try to run the following SQL statement:

val sstring = "Select * from seasonality where customer_id = " + customer_id + " and cat_id = " + seg + " and period_desc = " + cDate
println("sstring = " + sstring)
val rrCheckRdd = sqlContext.sql(sstring).collect().array

I get the following error:

Segment Code = 205
cDate=Year_2011_Month_0_Week_0_Site
reRunCheck seg = 205
sstring = Select * from seasonality where customer_id = 6 and cat_id = 205 and period_desc = Year_2011_Month_0_Week_0_Site
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: *, tree:
Project [*]
 Filter (((customer_id#144 = 6) && (CAST(cat_id#148, DoubleType) = CAST(205, DoubleType))) && (period_desc#150 = 'Year_2011_Month_0_Week_0_Site))
  Subquery seasonality
   SparkLogicalPlan (ExistingRdd [customer_id#144,period_id#145,season_id#146,cat_lvl#147,cat_id#148,season_avg#149,period_desc#150,analyzed_date#151,sum_amt#152,total_count#153,process_id#154], MapPartitionsRDD[36] at mapPartitions at basicOperators.scala:208)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)

It looks like an internal join error or possibly something else. I need a workaround if possible, or a quick patch. Any help is appreciated.

Eric
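One possible explanation for the unresolved-attribute error (note the leading apostrophe on 'Year_2011_Month_0_Week_0_Site in the plan, which is how Catalyst prints an unresolved identifier) is that the period_desc value is not quoted in the generated SQL, so it is parsed as a column name rather than a string literal. A sketch of quoting the literal, reusing the variable names from the post; this is an assumption about the fix, not a confirmed patch.

// Quote the string value so the SQL parser treats it as a literal, not a column reference.
val sstring = "SELECT * FROM seasonality" +
  " WHERE customer_id = " + customer_id +
  " AND cat_id = " + seg +
  " AND period_desc = '" + cDate + "'"

println("sstring = " + sstring)
val rrCheckRdd = sqlContext.sql(sstring).collect()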
Scala Spark IDE help
I am a Scala / Spark newbie (attending Paco Nathan's class). What I need is some advice on how to set up IntelliJ (or Eclipse) so that I can attach the debugger to the executing process. I know that this is not feasible if the code is executing within the cluster. However, when Spark is running locally (on my laptop), I would like to attach the debugger to the locally running Spark program so that I can step through it. Any advice will be helpful.

Eric
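One common setup, sketched here under the assumption that stepping through the driver is the goal: run the program with a local master directly from the IDE, so the whole job executes in one JVM and ordinary breakpoints work without any attach step. The object name and sample data below are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

// Run this directly from IntelliJ/Eclipse; with a local master the driver and
// executors all live in the IDE's JVM, so breakpoints and stepping just work.
object DebugLocally {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("DebugLocally")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    val counts = sc.parallelize(Seq("a", "b", "a"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println) // set a breakpoint here and step through

    sc.stop()
  }
}

If the job is launched through spark-submit instead, the standard JVM remote-debug options should also work: add --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" to the submit command and attach a Remote debug configuration in IntelliJ or Eclipse to port 5005.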