Hi,
I have a list of Person records in the following format:
case class Person(fName:String, city:String)
val l=List(Person("A","City1"),Person("B","City2"),Person("C","City1"))
val rdd:RDD[Person]=sc.parallelize(l)
val groupBy:RDD[(String, Iterable[Person])]=rdd.groupBy(_.city)
I would like to sav
Hi,
We are currently working on a Market Basket Analysis, deploying the FP-Growth
algorithm on Spark to generate association rules for product recommendation.
We are running on close to 24 million invoices over an assortment of more
than 100k products. However, whenever we relax the support threshol
Hi,
My Cassandra table has a custom user-defined type, for example:
CREATE TYPE address (
addressline1 text,
addressline2 text,
city text,
state text,
country text,
pincode text
)
create table person (
id text,
name text,
addresses set<frozen<address>>,
PRIMARY KEY (id));
val rdd=sqlContext.read.format("
Hi,
I am working with Cassandra and Spark and would like to know which performs
better: a Cassandra filter based on the primary key and clustering key, or
Spark DataFrame transformations/filters.
For example, in Spark:
val rdd = sqlContext.read.format("org.apache.spark.sql.cassandra")
.op
An RDD is created through the SparkContext, whereas a DataFrame is created
through the SQLContext.
So the DataFrame API is part of Spark SQL, whereas the RDD API is part of Spark Core.
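To make the distinction concrete, here is a minimal sketch, assuming a Spark 1.x-style setup with the spark-core and spark-sql artifacts on the classpath. The object name RddVsDataFrame, the method city1Counts, the Person case class, and the sample rows are hypothetical and only for illustration. It builds the same data once as an RDD (from the SparkContext) and once as a DataFrame (from the SQLContext), then applies the same filter to each.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical record type, used only for this illustration.
case class Person(fName: String, city: String)

object RddVsDataFrame {

  // Counts the "City1" rows twice: once via the RDD API (Spark Core)
  // and once via the DataFrame API (Spark SQL).
  def city1Counts(sc: SparkContext): (Long, Long) = {
    // RDDs are created from the SparkContext.
    val rdd: RDD[Person] = sc.parallelize(Seq(
      Person("A", "City1"), Person("B", "City2"), Person("C", "City1")))

    // DataFrames are created from the SQLContext, which wraps the SparkContext.
    val sqlContext = new SQLContext(sc)
    val df: DataFrame = sqlContext.createDataFrame(rdd)

    // RDD filter: an opaque Scala function over Person objects.
    val viaRdd = rdd.filter(_.city == "City1").count()
    // DataFrame filter: a column expression that Spark SQL can optimize.
    val viaDf = df.filter(df("city") === "City1").count()

    (viaRdd, viaDf)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("rdd-vs-dataframe").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(city1Counts(sc))
    finally sc.stop()
  }
}
```

Both filters select the same rows. The difference is that the RDD version works on plain Scala objects, while the DataFrame version works on named columns with a schema.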
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/DataFrame-vs-RDD-tp26570p26573.html
Sent from the Apache Spa
Hi,
I am new to Spark and would like to know of any guidelines on when to use a
DataFrame vs. an RDD.
Thanks,
As
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/DataFrame-vs-RDD-tp26570.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.