Thanks, White.

On Thu, Nov 17, 2016 at 11:15 PM, Stuart White <stuart.whi...@gmail.com> wrote:
> Sorry. Small typo. That last part should be:
>
> val modifiedRows = rows
>   .select(
>     substring('age, 0, 2) as "age",
>     when('gender === 1, "male").otherwise(when('gender === 2, "female").otherwise("unknown")) as "gender"
>   )
> modifiedRows.show
>
> +---+-------+
> |age| gender|
> +---+-------+
> | 90|   male|
> | 80| female|
> | 80|unknown|
> +---+-------+
>
> On Thu, Nov 17, 2016 at 8:57 AM, Stuart White <stuart.whi...@gmail.com> wrote:
> > import org.apache.spark.sql.functions._
> >
> > val rows = Seq(("90s", 1), ("80s", 2), ("80s", 3)).toDF("age", "gender")
> > rows.show
> >
> > +---+------+
> > |age|gender|
> > +---+------+
> > |90s|     1|
> > |80s|     2|
> > |80s|     3|
> > +---+------+
> >
> > val modifiedRows
> >   .select(
> >     substring('age, 0, 2) as "age",
> >     when('gender === 1, "male").otherwise(when('gender === 2, "female").otherwise("unknown")) as "gender"
> >   )
> > modifiedRows.show
> >
> > +---+-------+
> > |age| gender|
> > +---+-------+
> > | 90|   male|
> > | 80| female|
> > | 80|unknown|
> > +---+-------+
> >
> > On Thu, Nov 17, 2016 at 3:37 AM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
> >> Could you give me an example of how to use the Column functions?
> >> Thanks very much.
> >>
> >> On Thu, Nov 17, 2016 at 12:23 PM, Divya Gehlot <divya.htco...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> You can use the Column functions provided by the Spark API:
> >>>
> >>> https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html
> >>>
> >>> Hope this helps.
> >>>
> >>> Thanks,
> >>> Divya
> >>>
> >>> On 17 November 2016 at 12:08, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>> I have a sample like:
> >>>>
> >>>> +---+------+--------------------+
> >>>> |age|gender|             city_id|
> >>>> +---+------+--------------------+
> >>>> |   |     1|1042015:city_2044...|
> >>>> |90s|     2|1042015:city_2035...|
> >>>> |80s|     2|1042015:city_2061...|
> >>>> +---+------+--------------------+
> >>>>
> >>>> and the expectation is:
> >>>> "age": 90s -> 90, 80s -> 80
> >>>> "gender": 1 -> "male", 2 -> "female"
> >>>>
> >>>> I have two solutions:
> >>>>
> >>>> 1. Handle each column separately, then join them all back together by index:
> >>>>    val age = input.select("age").map(...)
> >>>>    val gender = input.select("gender").map(...)
> >>>>    val result = ...
> >>>>
> >>>> 2. Write a UDF for each column, then use them together:
> >>>>    val result = input.select(ageUDF($"age"), genderUDF($"gender"))
> >>>>
> >>>> However, both are awkward.
> >>>>
> >>>> Does anyone have a better workflow?
> >>>> Write some custom Transformers and use a Pipeline?
> >>>>
> >>>> Thanks.
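For comparison with the when/otherwise answer above, the udf-per-column approach from the original question can also be written compactly. A minimal sketch, assuming a Spark 2.x SparkSession named `spark` is in scope; the `genderUDF` lookup map and the "unknown" fallback are illustrative assumptions, not part of the original code:

```scala
import org.apache.spark.sql.functions.{substring, udf}
import spark.implicits._

val rows = Seq(("90s", 1), ("80s", 2), ("80s", 3)).toDF("age", "gender")

// Map gender codes to labels; any unmapped code falls back to "unknown".
val genderUDF = udf((g: Int) => Map(1 -> "male", 2 -> "female").getOrElse(g, "unknown"))

val result = rows.select(
  substring($"age", 0, 2) as "age",
  genderUDF($"gender") as "gender"
)
result.show
```

Note that built-in Column functions such as `when`/`otherwise` are generally preferable to UDFs where they suffice, since Catalyst can optimize them while a UDF is opaque to the optimizer.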