Let's assume K is a String and V is an Integer:
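from pyspark.sql.functions import udf
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType

# rdd is your existing pair RDD of (K, V); sqlContext is an existing SQLContext
schema = StructType([StructField("K", StringType(), True),
                     StructField("V", IntegerType(), True)])
df = sqlContext.createDataFrame(rdd, schema=schema)
# wrap each V in a one-element array
udf1 = udf(lambda x: [x], ArrayType(IntegerType()))
df1 = df.select("K", udf1("V").alias("arrayV"))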
df1.show()

On Tue, Apr 19, 2016 at 12:51 PM, pth001 <[email protected]> wrote:

> Hi,
>
> How can I split pair rdd [K, V] to map [K, Array(V)] efficiently in
> Pyspark?
>
> Best,
> Patcharee

--
Wei Chen, Ph.D.
Astronomer and Data Scientist
Phone: (832)646-7124
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/weichen1984
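
P.S. The question can be read two ways: wrap each V in its own one-element array (what the UDF above does per row), or collect all the V's that share a key into one array. Here is a rough plain-Python sketch of the difference, with no Spark needed to run it. The sample data is made up for illustration. On an RDD, I believe the two readings would correspond to rdd.mapValues(lambda v: [v]) and rdd.groupByKey().mapValues(list), but treat that mapping as a suggestion rather than something tested against your data.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the contents of the pair RDD.
pairs = [("a", 1), ("b", 2), ("a", 3)]

# Reading 1: wrap each value in its own one-element list,
# mirroring what the UDF `lambda x: [x]` does for every row.
wrapped = [(k, [v]) for k, v in pairs]

# Reading 2: collect all values that share a key into a single list.
by_key = defaultdict(list)
for k, v in pairs:
    by_key[k].append(v)
grouped = dict(by_key)

print(wrapped)
print(grouped)
```

If the second reading is what you need, note that grouping shuffles data across the cluster, so it will generally cost more than the per-row wrap.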
