Are the $"foo" and mydf("foo") forms (either or both) checked at compile
time to verify that the column reference is valid? Thanks.

Dean

On Wednesday, May 13, 2015, Michael Armbrust <mich...@databricks.com> wrote:

> I would not say that either method is preferred (neither is
> old/deprecated).  One advantage of the second is that you are referencing a
> column from a specific dataframe, instead of just providing a string that
> will be resolved much like an identifier in a SQL query.
>
> This means given:
> df1 = [id: int, name: string ....]
> df2 = [id: int, zip: int]
>
> I can do something like:
>
> df1.join(df2, df1("id") === df2("id"))
>
> Whereas I would need aliases if I were only using strings:
>
> df1.as("a").join(df2.as("b"), $"a.id" === $"b.id")
>
> On Wed, May 13, 2015 at 9:55 AM, Diana Carroll <dcarr...@cloudera.com> wrote:
>
>> I'm just getting started with Spark SQL and DataFrames in 1.3.0.
>>
>> I notice that the Spark API shows a different syntax for referencing
>> columns in a dataframe than the Spark SQL Programming Guide.
>>
>> For instance, the API docs for the select method show this:
>> df.select($"colA", $"colB")
>>
>>
>> Whereas the programming guide shows this:
>> df.filter(df("name") > 21).show()
>>
>> I tested, and both the $"column" and df("column") syntaxes work, but I'm
>> wondering which is *preferred*.  Is one the original and the other a new
>> feature we should be using?
>>
>> Thanks,
>> Diana
>> (Spark Curriculum Developer for Cloudera)
>>
>
>
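To make the join example quoted above concrete, here is a minimal self-contained sketch. It assumes Spark 1.3+ running in local mode; the data and column names (id, name, zip) are illustrative, matching the schemas Michael sketched:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object JoinExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("join-example").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    // Brings the $"col" string interpolator and toDF into scope.
    import sqlContext.implicits._

    val df1 = sc.parallelize(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")
    val df2 = sc.parallelize(Seq((1, 10001), (2, 10002))).toDF("id", "zip")

    // df("col") ties the reference to a specific DataFrame, so the
    // otherwise-ambiguous "id" is fine here:
    val joined = df1.join(df2, df1("id") === df2("id"))

    // With plain strings, aliases are needed to disambiguate instead:
    val aliased = df1.as("a").join(df2.as("b"), $"a.id" === $"b.id")

    println(joined.count())
    println(aliased.count())
    sc.stop()
  }
}
```

Both forms resolve to the same join; the df("col") style simply carries the originating DataFrame with the reference instead of deferring resolution to analysis time.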

-- 
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com
