Re: how to create List in pyspark

2017-04-28 Thread Felix Cheung
Why not use the SQL functions explode and split?
They would perform better and be more stable than a UDF.


From: Yanbo Liang <yblia...@gmail.com>
Sent: Thursday, April 27, 2017 7:34:54 AM
To: Selvam Raman
Cc: user
Subject: Re: how to create List in pyspark

You can try a UDF, like the following code snippet:

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType
df = spark.read.text("./README.md")
split_func = udf(lambda text: text.split(" "), ArrayType(StringType()))
df.withColumn("split_value", split_func("value")).show()

Thanks
Yanbo

On Tue, Apr 25, 2017 at 12:27 AM, Selvam Raman <sel...@gmail.com> wrote:

documentDF = spark.createDataFrame([

("Hi I heard about Spark".split(" "), ),

("I wish Java could use case classes".split(" "), ),

("Logistic regression models are neat".split(" "), )

], ["text"])


How can I achieve the same DataFrame while reading from a source?

doc = spark.read.text("/Users/rs/Desktop/nohup.out")

How can I create an array-type "sentences" column from doc (a DataFrame)?


The line below creates more than one column:

rdd.map(lambda row: row[0]).map(lambda row: row.split(" "))

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"



Re: how to create List in pyspark

2017-04-27 Thread Yanbo Liang
You can try a UDF, like the following code snippet:

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType
df = spark.read.text("./README.md")
split_func = udf(lambda text: text.split(" "), ArrayType(StringType()))
df.withColumn("split_value", split_func("value")).show()

Thanks
Yanbo

On Tue, Apr 25, 2017 at 12:27 AM, Selvam Raman  wrote:

> documentDF = spark.createDataFrame([
>
> ("Hi I heard about Spark".split(" "), ),
>
> ("I wish Java could use case classes".split(" "), ),
>
> ("Logistic regression models are neat".split(" "), )
>
> ], ["text"])
>
>
> How can I achieve the same DataFrame while reading from a source?
>
> doc = spark.read.text("/Users/rs/Desktop/nohup.out")
>
> How can I create an array-type "sentences" column from
> doc (a DataFrame)?
>
>
> The line below creates more than one column:
>
> rdd.map(lambda row: row[0]).map(lambda row: row.split(" "))
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>


how to create List in pyspark

2017-04-24 Thread Selvam Raman
documentDF = spark.createDataFrame([

("Hi I heard about Spark".split(" "), ),

("I wish Java could use case classes".split(" "), ),

("Logistic regression models are neat".split(" "), )

], ["text"])


How can I achieve the same DataFrame while reading from a source?

doc = spark.read.text("/Users/rs/Desktop/nohup.out")

How can I create an array-type "sentences" column from
doc (a DataFrame)?


The line below creates more than one column:

rdd.map(lambda row: row[0]).map(lambda row: row.split(" "))

-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"