…doing this in Java instead of Scala.
> Note: I am using Spark version 1.6.1.
>
> -----Original Message-----
> From: Stuart White [mailto:stuart.whi...@gmail.com]
> Sent: Monday, November 28, 2016 10:26 AM
> To: Hitesh Goyal
> Cc: user@spark.apache.org
> Subject: Re: if conditi
Use the when() and otherwise() functions. For example:
import org.apache.spark.sql.functions._
val rows = Seq(("bob", 1), ("lucy", 2), ("pat", 3)).toDF("name", "genderCode")
rows.show
+----+----------+
|name|genderCode|
+----+----------+
| bob|         1|
|lucy|         2|
| pat|         3|
+----+----------+
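The rest of this message was cut off in the archive. A plausible completion, assuming the same when()/otherwise() pattern and the 1 = male / 2 = female coding used in the related example further down this page:

val withGender = rows
  .select(
    'name,
    when('genderCode === 1, "male")
      .when('genderCode === 2, "female")
      .otherwise("unknown") as "gender")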
Yes, that's what I was looking for. Thanks!
On Mon, Nov 21, 2016 at 6:56 PM, Michael Armbrust wrote:
> You are looking for org.apache.spark.sql.functions.expr()
>
> On Sat, Nov 19, 2016 at 6:12 PM, Stuart White wrote:
I'd like to allow for runtime-configured Column expressions in my
Spark SQL application. For example, if my application needs a 5-digit
zip code, but the file I'm processing contains a 9-digit zip code, I'd
like to be able to configure my application with the expression
"substring('zipCode, 0, 5)"
Is this what you're looking for?
val df = Seq(
  (1, "A"),
  (1, "B"),
  (1, "C"),
  (2, "D"),
  (3, "E")
).toDF("foo", "bar")
val colList = Seq("foo", "bar")
df.sort(colList.map(col(_).desc): _*).show
+---+---+
|foo|bar|
+---+---+
|  3|  E|
|  2|  D|
|  1|  C|
|  1|  B|
|  1|  A|
+---+---+
modifiedRows.show
+---+-------+
|age| gender|
+---+-------+
| 90|   male|
| 80| female|
| 80|unknown|
+---+-------+
On Thu, Nov 17, 2016 at 8:57 AM, Stuart White wrote:
import org.apache.spark.sql.functions._
val rows = Seq(("90s", 1), ("80s", 2), ("80s", 3)).toDF("age", "gender")
rows.show
+---+------+
|age|gender|
+---+------+
|90s|     1|
|80s|     2|
|80s|     3|
+---+------+
val modifiedRows = rows
  .select(
    substring('age, 0, 2) as "age",
    when('gender === 1, "male")
      .when('gender === 2, "female")
      .otherwise("unknown") as "gender")
> …partitioned.
>
> Thanks,
> Silvio
> ------
> From: Stuart White
> Sent: Saturday, November 12, 2016 11:20:28 AM
> To: Silvio Fiorito
> Cc: user@spark.apache.org
> Subject: Re: Joining to a large, pre-sorted file
>
> Hi Silvio,
>
> Thanks very much
Thanks for the reply.
I understand that I need to use bucketBy() to write my master file,
but I still can't seem to make it work as expected. Here's a code
example for how I'm writing my master file:
Range(0, 100)
  .map(i => (i, s"master_$i"))
  .toDF("key", "value")
  .write
  .format("json")
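The code example is cut off at this point in the archive. A sketch of what the full write might look like; the bucket count and table name are assumptions, and note that bucketBy() requires saveAsTable() rather than save():

Range(0, 100)
  .map(i => (i, s"master_$i"))
  .toDF("key", "value")
  .write
  .format("json")
  .bucketBy(3, "key")      // assumed bucket count
  .sortBy("key")
  .saveAsTable("master")   // assumed table name; bucketBy only works with saveAsTable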
…dots. It seems like this functionality is pretty new, so there aren't a
lot of examples available.
On Thu, Nov 10, 2016 at 7:33 PM, Jörn Franke wrote:
> Can you split the files beforehand into several files (e.g. by the column
> you do the join on)?
>
> On 10 Nov 2016, at 23:45, Stuart White wrote:
I have a large "master" file (~700m records) that I frequently join smaller
"transaction" files to. (The transaction files have tens of millions of
records, so they are too large for a broadcast join.)
I would like to pre-sort the master file, write it to disk, and then, in
subsequent jobs, read the file back from disk and join to it without having
to re-sort it.
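To illustrate the goal with the bucketing approach discussed above, a hedged sketch of the read/join side (the table name "master" and the transactions DataFrame are assumptions):

// Read the bucketed master table back. Because master was bucketed
// (and sorted) by the join key, Spark can plan a sort-merge join
// without shuffling or re-sorting the master side.
val master = spark.table("master")
val joined = transactions.join(master, Seq("key"))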