…doing this in Java instead of Scala.
> Note: I am using Spark version 1.6.1.
>
> -----Original Message-----
> From: Stuart White [mailto:stuart.whi...@gmail.com]
> Sent: Monday, November 28, 2016 10:26 AM
> To: Hitesh Goyal
> Cc: user@spark.apache.org
> Subject: Re: if conditi
Use the when() and otherwise() functions. For example:
import org.apache.spark.sql.functions._
val rows = Seq(("bob", 1), ("lucy", 2), ("pat", 3)).toDF("name", "genderCode")
rows.show
+----+----------+
|name|genderCode|
+----+----------+
| bob|         1|
|lucy|         2|
| pat|         3|
+----+----------+
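The rest of this message was cut off in the archive. A plausible completion, assuming the same when()/otherwise() pattern and the 1 = male / 2 = female coding used in the related example further down this page:

val withGender = rows
  .select(
    'name,
    when('genderCode === 1, "male")
      .when('genderCode === 2, "female")
      .otherwise("unknown") as "gender")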
Yes, that's what I was looking for. Thanks!
On Mon, Nov 21, 2016 at 6:56 PM, Michael Armbrust wrote:
> You are looking for org.apache.spark.sql.functions.expr()
>
> On Sat, Nov 19, 2016 at 6:12 PM, Stuart White wrote:
I'd like to allow for runtime-configured Column expressions in my
Spark SQL application. For example, if my application needs a 5-digit
zip code, but the file I'm processing contains a 9-digit zip code, I'd
like to be able to configure my application with the expression
"substring('zipCode, 0, 5)"
Is this what you're looking for?
val df = Seq(
  (1, "A"),
  (1, "B"),
  (1, "C"),
  (2, "D"),
  (3, "E")
).toDF("foo", "bar")
val colList = Seq("foo", "bar")
df.sort(colList.map(col(_).desc): _*).show
+---+---+
|foo|bar|
+---+---+
|  3|  E|
|  2|  D|
|  1|  C|
|  1|  B|
|  1|  A|
+---+---+
modifiedRows.show
+---+-------+
|age| gender|
+---+-------+
| 90|   male|
| 80| female|
| 80|unknown|
+---+-------+
On Thu, Nov 17, 2016 at 8:57 AM, Stuart White wrote:
import org.apache.spark.sql.functions._
val rows = Seq(("90s", 1), ("80s", 2), ("80s", 3)).toDF("age", "gender")
rows.show
+---+------+
|age|gender|
+---+------+
|90s|     1|
|80s|     2|
|80s|     3|
+---+------+
val modifiedRows = rows
  .select(
    substring('age, 0, 2) as "age",
    when('gender === 1, "male")
      .when('gender === 2, "female")
      .otherwise("unknown") as "gender")
> …partitioned.
>
> Thanks,
> Silvio
> ------
> From: Stuart White
> Sent: Saturday, November 12, 2016 11:20:28 AM
> To: Silvio Fiorito
> Cc: user@spark.apache.org
> Subject: Re: Joining to a large, pre-sorted file
>
> Hi Silvio,
>
> Thanks very much
Thanks for the reply.
I understand that I need to use bucketBy() to write my master file,
but I still can't seem to make it work as expected. Here's a code
example for how I'm writing my master file:
Range(0, 100)
  .map(i => (i, s"master_$i"))
  .toDF("key", "value")
  .write
  .format("json")
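The code example is cut off at this point in the archive. A sketch of what the full write might look like; the bucket count and table name are assumptions, and note that bucketBy() requires saveAsTable() rather than save():

Range(0, 100)
  .map(i => (i, s"master_$i"))
  .toDF("key", "value")
  .write
  .format("json")
  .bucketBy(3, "key")      // assumed bucket count
  .sortBy("key")
  .saveAsTable("master")   // assumed table name; bucketBy only works with saveAsTable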
…dots. It seems like this functionality is pretty new, so there aren't a
lot of examples available.
On Thu, Nov 10, 2016 at 7:33 PM, Jörn Franke wrote:
> Can you split the files beforehand into several files (e.g. by the column
> you do the join on)?
>
> On 10 Nov 2016, at 23:45, Stuart White wrote:
I have a large "master" file (~700m records) that I frequently join smaller
"transaction" files to. (The transaction files have tens of millions of
records, so they are too large for a broadcast join.)
I would like to pre-sort the master file, write it to disk, and then, in
subsequent jobs, read the file back from disk and join to it without having
to re-sort it.
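To illustrate the goal with the bucketing approach discussed above, a hedged sketch of the read/join side (the table name "master" and the transactions DataFrame are assumptions):

// Read the bucketed master table back. Because master was bucketed
// (and sorted) by the join key, Spark can plan a sort-merge join
// without shuffling or re-sorting the master side.
val master = spark.table("master")
val joined = transactions.join(master, Seq("key"))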