FP Growth saveAsTextFile

2015-05-20 Thread Eric Tanner
I am having trouble saving an FP-Growth model as a text file.  I can print
out the results, but when I try to save the model I get a
NullPointerException.

model.freqItemsets.saveAsTextFile("c://fpGrowth/model")
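
For reference, a minimal sketch of writing the frequent itemsets out as plain
text (the FPGrowth setup and the output path below are illustrative, not the
code from this thread); mapping each itemset to a string first gives readable
one-line output, and a local Windows path is usually passed as a quoted
file:/// URI:

import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// transactions: RDD[Array[String]] built elsewhere
val model = new FPGrowth().setMinSupport(0.2).run(transactions)

// Render each frequent itemset as one line of text, then save.
model.freqItemsets
  .map(is => is.items.mkString("[", ",", "]") + ": " + is.freq)
  .saveAsTextFile("file:///c:/fpGrowth/model")  // illustrative output path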

Thanks,

Eric


Help understanding the FP-Growth algorithm

2015-04-14 Thread Eric Tanner
I am a total newbie to Spark, so be kind.

I am looking for an example that implements the FP-Growth algorithm so I
can better understand both the algorithm and Spark.  The one example I
found (on the spark.apache.org examples page) was incomplete.
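
For anyone finding this thread later, a minimal self-contained sketch in the
spirit of the MLlib documentation example (the input path, minimum support,
and partition count are illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

object FPGrowthExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("FPGrowthExample"))

    // Each input line is one transaction: items separated by spaces.
    val data = sc.textFile("data/mllib/sample_fpgrowth.txt")
    val transactions: RDD[Array[String]] = data.map(_.trim.split(' '))

    // Keep itemsets that appear in at least 20% of the transactions.
    val fpg = new FPGrowth().setMinSupport(0.2).setNumPartitions(10)
    val model = fpg.run(transactions)

    model.freqItemsets.collect().foreach { itemset =>
      println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
    }

    sc.stop()
  }
}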

Thanks,
Eric


Cluster getting a null pointer error

2014-12-09 Thread Eric Tanner
I have set up a cluster on AWS and am trying a really simple hello-world
program as a test.  The cluster was built using the EC2 scripts that come
with Spark.  I have included the error output (run with --verbose) below,
and the source code further below that.

Any help would be greatly appreciated.

Thanks,

Eric

*Error code:*

r...@ip-xx.xx.xx.xx ~]$ ./spark/bin/spark-submit  --verbose  --class
com.je.test.Hello --master spark://xx.xx.xx.xx:7077
 Hello-assembly-1.0.jar
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
Using properties file: /root/spark/conf/spark-defaults.conf
Adding default property: spark.executor.memory=5929m
Adding default property:
spark.executor.extraClassPath=/root/ephemeral-hdfs/conf
Adding default property:
spark.executor.extraLibraryPath=/root/ephemeral-hdfs/lib/native/
Using properties file: /root/spark/conf/spark-defaults.conf
Adding default property: spark.executor.memory=5929m
Adding default property:
spark.executor.extraClassPath=/root/ephemeral-hdfs/conf
Adding default property:
spark.executor.extraLibraryPath=/root/ephemeral-hdfs/lib/native/
Parsed arguments:
  master                  spark://xx.xx.xx.xx:7077
  deployMode              null
  executorMemory          5929m
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /root/spark/conf/spark-defaults.conf
  extraSparkProperties    Map()
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               com.je.test.Hello
  primaryResource         file:/root/Hello-assembly-1.0.jar
  name                    com.je.test.Hello
  childArgs               []
  jars                    null
  verbose                 true

Default properties from /root/spark/conf/spark-defaults.conf:
  spark.executor.extraLibraryPath - /root/ephemeral-hdfs/lib/native/
  spark.executor.memory - 5929m
  spark.executor.extraClassPath - /root/ephemeral-hdfs/conf


Using properties file: /root/spark/conf/spark-defaults.conf
Adding default property: spark.executor.memory=5929m
Adding default property:
spark.executor.extraClassPath=/root/ephemeral-hdfs/conf
Adding default property:
spark.executor.extraLibraryPath=/root/ephemeral-hdfs/lib/native/
Main class:
com.je.test.Hello
Arguments:

System properties:
spark.executor.extraLibraryPath - /root/ephemeral-hdfs/lib/native/
spark.executor.memory - 5929m
SPARK_SUBMIT - true
spark.app.name - com.je.test.Hello
spark.jars - file:/root/Hello-assembly-1.0.jar
spark.executor.extraClassPath - /root/ephemeral-hdfs/conf
spark.master - spark://xxx.xx.xx.xxx:7077
Classpath elements:
file:/root/Hello-assembly-1.0.jar

*Actual Error:*
Exception in thread "main" java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


*Source Code:*
package com.je.test


import org.apache.spark.{SparkConf, SparkContext}

class Hello {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf(true) //.set("spark.cassandra.connection.host", "xxx.xx.xx.xxx")
    val sc = new SparkContext("spark://xxx.xx.xx.xxx:7077", "Season", conf)

    println("Hello World")

  }
}
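
For reference, a Spark driver's entry point is normally declared on an object
rather than a class, so that main is a static method spark-submit can invoke
reflectively; invoking an instance method with no instance is consistent with
the NullPointerException thrown inside SparkSubmit above.  A minimal sketch
(master URL and app name copied from the code above):

package com.je.test

import org.apache.spark.{SparkConf, SparkContext}

object Hello {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(true)
    val sc = new SparkContext("spark://xxx.xx.xx.xxx:7077", "Season", conf)
    println("Hello World")
    sc.stop()
  }
}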


Unresolved attributes

2014-12-02 Thread Eric Tanner
I am running
spark 1.1.0
DSE cassandra 4.6

When I try to run the following SQL statement:

val sstring = "Select * from seasonality where customer_id = " + customer_id +
  " and cat_id = " + seg + " and period_desc = " + cDate
println("sstring = " + sstring)
val rrCheckRdd = sqlContext.sql(sstring).collect().array

I get the following error:

Segment Code = 205
cDate=Year_2011_Month_0_Week_0_Site
reRunCheck seg = 205
sstring = Select * from seasonality where customer_id = 6 and cat_id = 205
and period_desc = Year_2011_Month_0_Week_0_Site
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
attributes: *, tree:
Project [*]
 Filter (((customer_id#144 = 6) && (CAST(cat_id#148, DoubleType) =
CAST(205, DoubleType))) && (period_desc#150 =
'Year_2011_Month_0_Week_0_Site))
  Subquery seasonality
   SparkLogicalPlan (ExistingRdd
[customer_id#144,period_id#145,season_id#146,cat_lvl#147,cat_id#148,season_avg#149,period_desc#150,analyzed_date#151,sum_amt#152,total_count#153,process_id#154],
MapPartitionsRDD[36] at mapPartitions at basicOperators.scala:208)

at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)


It looks like an internal join error, or possibly something else.  I need a
workaround if possible, or a quick patch.
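
For later readers: in the generated SQL above, the period_desc value is not
quoted, so the parser treats Year_2011_Month_0_Week_0_Site as a column name
rather than a string literal (it appears as the unresolved
'Year_2011_Month_0_Week_0_Site in the plan).  A sketch of building the
statement with a quoted literal (variable names as in the code above):

val sstring = "Select * from seasonality where customer_id = " + customer_id +
  " and cat_id = " + seg +
  " and period_desc = '" + cDate + "'"  // quote the string value
val rrCheckRdd = sqlContext.sql(sstring).collect()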

Any help is appreciated.

Eric


Scala Spark IDE help

2014-10-27 Thread Eric Tanner
I am a Scala / Spark newbie (attending Paco Nathan's class).

What I need is some advice on how to set up IntelliJ (or Eclipse) so that I
can attach the debugger to the running process.  I know this is not feasible
when the code is executing on the cluster.  However, when Spark is running
locally (on my laptop), I would like to attach the debugger to the Spark
program so that I can step through it.

Any advice would be helpful.
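
For later readers, one simple approach (a sketch, not from this thread; all
names are illustrative) is to run the driver with a local master directly
from the IDE, so the ordinary IntelliJ/Eclipse debugger works on the driver
JVM without a remote attach:

import org.apache.spark.{SparkConf, SparkContext}

object LocalDebugApp {
  def main(args: Array[String]): Unit = {
    // local[*] runs the driver and the executors inside this single JVM,
    // so breakpoints set in the IDE are hit directly.
    val conf = new SparkConf().setMaster("local[*]").setAppName("LocalDebugApp")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 10).sum())
    sc.stop()
  }
}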

Eric
