It worked man.. Thanks a lot :)
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Using-sparkSQL-to-convert-a-collection-of-python-dictionary-of-dictionaries-to-schma-RDD-tp20228p20461.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
>   File "/root/spark/python/pyspark/sql.py", line 552, in _drop_schema
>     yield converter(i)
>   File "/root/spark/python/pyspark/sql.py", line 540, in nested_conv
>     return tuple(f(v) for f, v in zip(convs, conv(row)))
>   File "/root/spark/python/pyspark/sql.py", line 54
>   File "/root/spark/python/pyspark/sql.py", line 508, in
>     return lambda row: dict((k, conv(v)) for k, v in row.iteritems())
> AttributeError: 'int' object has no attribute 'iteritems'
I am clueless about what to do here. Hope you can help :)
Many thanks
SahanB
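[Editor's note] Judging from the trace, the likely cause is that schema inference decided the nested field is a map, so the generated converter iterates key/value pairs of every value; any record where that field holds a plain int (not a dict) then blows up. A minimal plain-Python sketch of the failure, with hypothetical data and no Spark required (`.items()` stands in for Python 2's `.iteritems()`):

```python
# Hypothetical records: one has a dict in field3, one has a bare int.
records = [
    {"field3": {"a": 1, "c": 2}},  # dict -> converts fine
    {"field3": 7},                 # int  -> AttributeError
]

def convert_map(value):
    # Mirrors the failing line in pyspark/sql.py:
    #   dict((k, conv(v)) for k, v in row.iteritems())
    return dict((k, v) for k, v in value.items())

print(convert_map(records[0]["field3"]))   # {'a': 1, 'c': 2}
try:
    convert_map(records[1]["field3"])
except AttributeError as exc:
    print(exc)  # 'int' object has no attribute 'items'
```

If this matches your data, making every record store a dict (even an empty one) in that field, or declaring an explicit schema, avoids the crash.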
> schemaRDD.first()
>
> my output is: Row(field1=5, field2='string', field3=Row(a=1, b=None, c=2))
>
> in reality, I have 1000s of probable keys in field3 and only 2 to 3 of them
> occur per record. So when it converts to a Row, it generates thousands of
> None fields per record.
Is there any way for me to store "field3" as a dictionary instead of
converting it into a Row in the schemaRDD?
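[Editor's note] One way around the sparse-Row problem (a sketch, not the thread's verified answer): declare field3 explicitly as a map instead of letting schema inference build a struct — in Spark 1.x PySpark that meant passing an explicit schema with a `MapType` field to `SQLContext.applySchema(...)` rather than relying on `inferSchema`. The difference in shape, illustrated in plain Python with hypothetical data (no Spark required):

```python
# Hypothetical records: thousands of possible keys, only 2-3 present each.
records = [
    {"field3": {"a": 1, "c": 2}},
    {"field3": {"x": 9}},
]

# Struct-style inference: one field per key ever seen; absent keys become None.
all_keys = sorted({k for r in records for k in r["field3"]})
as_struct = [{k: r["field3"].get(k) for k in all_keys} for r in records]
# -> [{'a': 1, 'c': 2, 'x': None}, {'a': None, 'c': None, 'x': 9}]

# Map-style storage: each record keeps only the keys it actually has.
as_map = [dict(r["field3"]) for r in records]
# -> [{'a': 1, 'c': 2}, {'x': 9}]
```

With thousands of probable keys, the struct version pads every record with thousands of Nones, while the map version stays the size of the actual data.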