Is $"foo", or mydf("foo"), or both, checked at compile time to verify that the column reference is valid? Thx.
Dean

On Wednesday, May 13, 2015, Michael Armbrust <mich...@databricks.com> wrote:

> I would not say that either method is preferred (neither is
> old/deprecated). One advantage to the second is that you are referencing a
> column from a specific dataframe, instead of just providing a string that
> will be resolved much like an identifier in a SQL query.
>
> This means given:
>
> df1 = [id: int, name: string ....]
> df2 = [id: int, zip: int]
>
> I can do something like:
>
> df1.join(df2, df1("id") === df2("id"))
>
> whereas I would need aliases if I was only using strings:
>
> df1.as("a").join(df2.as("b"), $"a.id" === $"b.id")
>
> On Wed, May 13, 2015 at 9:55 AM, Diana Carroll <dcarr...@cloudera.com> wrote:
>
>> I'm just getting started with Spark SQL and DataFrames in 1.3.0.
>>
>> I notice that the Spark API shows a different syntax for referencing
>> columns in a dataframe than the Spark SQL Programming Guide.
>>
>> For instance, the API docs for the select method show this:
>>
>> df.select($"colA", $"colB")
>>
>> Whereas the programming guide shows this:
>>
>> df.filter(df("name") > 21).show()
>>
>> I tested both, and both the $"column" and df("column") syntaxes work, but
>> I'm wondering which is *preferred*. Is one the original and one a new
>> feature we should be using?
>>
>> Thanks,
>> Diana
>> (Spark Curriculum Developer for Cloudera)

--
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com
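Michael's join example can be illustrated with a toy sketch of the distinction he describes: a frame-qualified column like df1("id") carries the frame it came from, while a bare string must be resolved by name alone against the combined schema, like a SQL identifier. This is plain Python, not Spark, and every class and function name here is hypothetical, invented purely for illustration:

```python
# Toy sketch (NOT Spark): why df("id") can disambiguate a join condition
# where a bare string cannot. All names here are illustrative inventions.

class Column:
    """A column reference that remembers which frame it came from."""
    def __init__(self, frame, name):
        self.frame, self.name = frame, name
    def __eq__(self, other):
        return (self.frame, self.name) == (other.frame, other.name)

class Frame:
    def __init__(self, alias, columns):
        self.alias, self.columns = alias, list(columns)
    def __call__(self, name):
        # df("id") style: the result is tied to this specific frame
        if name not in self.columns:
            raise KeyError(f"no column {name!r} in {self.alias}")
        return Column(self.alias, name)

def resolve_string(name, frames):
    """$"id" style: resolve by name alone, like a SQL identifier."""
    hits = [Column(f.alias, name) for f in frames if name in f.columns]
    if len(hits) != 1:
        raise ValueError(f"ambiguous or unresolved column {name!r}")
    return hits[0]

df1 = Frame("df1", ["id", "name"])
df2 = Frame("df2", ["id", "zip"])

# df1("id") and df2("id") are unambiguous even though both frames have "id"
assert df1("id") == Column("df1", "id")
assert df2("id") == Column("df2", "id")

# A bare "id" cannot be resolved against the joined schema...
try:
    resolve_string("id", [df1, df2])
except ValueError:
    print("bare 'id' is ambiguous")

# ...but "zip" appears in only one frame, so string resolution succeeds
assert resolve_string("zip", [df1, df2]) == Column("df2", "zip")
```

This mirrors why, in the string-only version of Michael's join, the frames first need distinct aliases ("a", "b") so that "a.id" and "b.id" become resolvable by name.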