It worked man.. Thanks a lot :)
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Using-sparkSQL-to-convert-a-collection-of-python-dictionary-of-dictionaries-to-schma-RDD-tp20228p20461.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi Davies,
Thanks for the reply
The problem is I have empty dictionaries in my field3 as well. It gives me
an error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/spark/python/pyspark/sql.py", line 1042, in inferSchema
    schema = _infer_schema(first)
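(A side note for readers of the archive: one workaround that should sidestep this inference failure on Spark < 1.2 is to replace empty dicts with None before calling inferSchema(), since a null is still typeable for a nullable column while an empty dict is not. A minimal sketch, assuming field3 is the only field that may be empty:)

```python
# Sketch of a pre-inferSchema cleanup, assuming field3 may be an empty dict.
# inferSchema() in Spark < 1.2 cannot pick a value type for {}, but None is
# acceptable because the inferred column is nullable.
records = [
    {'field1': 5, 'field2': 'string', 'field3': {'a': 1, 'c': 2}},
    {'field1': 6, 'field2': 'other', 'field3': {}},
]

def clean(rec):
    out = dict(rec)        # copy so the original record is untouched
    if not out['field3']:  # {} (or None) -> None
        out['field3'] = None
    return out

cleaned = [clean(r) for r in records]
```

On the real RDD this would be `rdd.map(clean)` applied before `sqlContext.inferSchema(...)`.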
Which version of Spark are you using? inferSchema() was improved to
support empty dicts in 1.2+; could you try the 1.2-RC1?
Also, you can use applySchema():

from pyspark.sql import *
fields = [StructField('field1', IntegerType(), True),
          StructField('field2', StringType(), True),
          StructField('field3', MapType(StringType(), IntegerType()), True)]
schema = StructType(fields)
srdd = sqlContext.applySchema(rdd.map(lambda x: (x['field1'], x['field2'], x['field3'])), schema)
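(The explicit-schema route can also be mimicked locally to sanity-check records before applySchema() ever sees them. A plain-Python sketch, with `expected_types` and `conforms` as hypothetical names; the types mirror the StructFields above, and field3 is assumed to be a string-to-int map:)

```python
# Plain-Python stand-in for the declared schema, so records can be checked
# locally. field3 is assumed to hold a string->int map (a dict here).
expected_types = {'field1': int, 'field2': str, 'field3': dict}

def conforms(rec):
    # Every field must be None (nullable) or an instance of its declared type.
    return all(isinstance(rec.get(name), typ) or rec.get(name) is None
               for name, typ in expected_types.items())

ok = conforms({'field1': 5, 'field2': 'string', 'field3': {'a': 1, 'c': 2}})
bad = conforms({'field1': 'oops', 'field2': 'string', 'field3': {}})
```

A pre-flight check like this gives a clearer error than a serialization failure deep inside a Spark job.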
Hi Guys,
I am trying to use SparkSQL to convert an RDD to SchemaRDD so that I can
save it in parquet format.
A record in my RDD has the following format:
RDD1
{
  field1: 5,
  field2: 'string',
  field3: {'a': 1, 'c': 2}
}
I am using field3 to represent a sparse vector and it can have keys:
inferSchema() will work better than jsonRDD() in your case:
from pyspark.sql import Row
srdd = sqlContext.inferSchema(rdd.map(lambda x: Row(**x)))
srdd.first()
Row(field1=5, field2='string', field3={'a': 1, 'c': 2})
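(For readers skimming the archive without a Spark shell handy, the `Row(**x)` step can be sketched in plain Python: pyspark's Row behaves much like a keyword-constructed named tuple whose fields are the sorted dict keys. This is a rough stand-in for illustration, not pyspark's actual class:)

```python
from collections import namedtuple

# Rough stand-in for pyspark.sql.Row: a named tuple built from the dict's
# keys. pyspark sorts field names, so we sort here too.
record = {'field1': 5, 'field2': 'string', 'field3': {'a': 1, 'c': 2}}
Row = namedtuple('Row', sorted(record))
row = Row(**record)  # same call shape as Row(**x) in the rdd.map above
```

After the map, every element carries named fields, which is what lets inferSchema() read types off the first row.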
On Wed, Dec 3, 2014 at 12:11 AM, sahanbull sa...@skimlinks.com wrote:
Hi