I now know the right way to do it:

val df = spark.read.parquet("/parquetdata/weixin/page/month=201607")
val df2 = df.withColumn("pa_bid",
  when(isnull($"url"), "AAAA".split("#")(0))
    .otherwise(split(split(col("url"), "_biz=")(1), "&mid")(1)))

scala> df2.select("pa_bid","url").show
+----------------+--------------------+
|          pa_bid|                 url|
+----------------+--------------------+
|MjM5MjEyNTk2MA==|http://mp.weixin....|
|MzAxODIwMDcwNA==|http://mp.weixin....|
|MzIzMjQ4NzQwOA==|http://mp.weixin....|
|MzAwOTIxMTcyMQ==|http://mp.weixin....|
|MzA3OTAyNzY2OQ==|http://mp.weixin....|
|MjM5NDAzMDAwMA==|http://mp.weixin....|
|MzAwMjE4MzU0Nw==|http://mp.weixin....|
|MzA4NzcyNjI0Mw==|http://mp.weixin....|
|MzI5OTE5Nzc5Ng==|http://mp.weixin....|
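
A note on why the first attempt in the quoted message returned null: when(condition, value) without a following .otherwise(...) evaluates to null for every row where the condition is false, and url.isNull is false for every row that actually has a URL. The sketch below contrasts the two forms. It is only an illustration under assumptions: the sample URL, the object name PaBidSketch, and the local SparkSession are hypothetical (the real URLs are truncated in the output), and index 0 after the second split assumes the id sits directly before "&mid" in the URL.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, split, when}

object PaBidSketch {
  // when(...) alone: rows where the condition is false get null,
  // so every non-null url ends up with pa_bid = null.
  def whenOnly(df: DataFrame): DataFrame =
    df.withColumn("pa_bid", when(col("url").isNull, lit("AAAA")))

  // when(...).otherwise(...): null urls get the placeholder,
  // all other rows fall through to the split-based extraction.
  // Assumption: the id is the text between "_biz=" and "&mid".
  def whenOtherwise(df: DataFrame): DataFrame =
    df.withColumn("pa_bid",
      when(col("url").isNull, lit("AAAA"))
        .otherwise(split(split(col("url"), "_biz=")(1), "&mid")(0)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pa-bid-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one made-up URL and one null row.
    val df = Seq("http://mp.weixin.example/s?__biz=ABC123==&mid=1", null).toDF("url")

    whenOnly(df).show(false)
    whenOtherwise(df).show(false)

    spark.stop()
  }
}
```

If the only goal is to replace nulls in an existing string column, df.na.fill("AAAA", Seq("url")) is a shorter alternative; it does not derive a new column from the URL, though.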

2016-12-06
lk_spark

From: "lk_spark" <lk_sp...@163.com>
Sent: 2016-12-06 17:44
Subject: Re: Re: how to add colum to dataframe
To: "Pankaj Wahane" <pankajwah...@live.com>, "user.spark" <user@spark.apache.org>
Cc:

Thanks for the reply. I will look into how to use na.fill. I also don't know how to get the value of a column and apply an operation such as substr or split to it.

2016-12-06
lk_spark

From: Pankaj Wahane <pankajwah...@live.com>
Sent: 2016-12-06 17:39
Subject: Re: how to add colum to dataframe
To: "lk_spark" <lk_sp...@163.com>, "user.spark" <user@spark.apache.org>
Cc:

You may want to try using df2.na.fill(…)

From: lk_spark <lk_sp...@163.com>
Date: Tuesday, 6 December 2016 at 3:05 PM
To: "user.spark" <user@spark.apache.org>
Subject: how to add colum to dataframe

Hi all,

My Spark version is 2.0. I have a parquet file with one column named url, of type string. I want to extract a substring from the url and add it to the DataFrame as a new column:

val df = spark.read.parquet("/parquetdata/weixin/page/month=201607")
val df2 = df.withColumn("pa_bid",when($"url".isNull,col("url").substr(3, 5)))
df2.select("pa_bid","url").show
+------+--------------------+
|pa_bid|                 url|
+------+--------------------+
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|

Why is the result null?

2016-12-06
lk_spark