df.filter(col("paid") > "").select(col("name1").as("newName"), ...)

On Wed, Mar 23, 2016 at 6:17 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:

> Thank you again
>
> For
>
> val r = df.filter(col("paid") > "").map(x =>
> (x.getString(0),x.getString(1).....)
>
> Can you give an example of a column expression please,
> like
>
> df.filter(col("paid") > "").col("firstcolumn").getString   ?....
>
>
>
>
> On Thursday, 24 March 2016, 0:45, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>
> You can only use `as` on a Column expression, not inside of a lambda
> function.  The reason is that the lambda function is compiled into opaque
> bytecode that Spark SQL is not able to see; we just blindly execute it.
>
> However, there are a couple of ways to name the columns that come out of a
> map: either use a case class instead of a tuple, or use .toDF("name1",
> "name2"....) after the map.
>
> From a performance perspective, it's even better if you can avoid maps
> altogether and stick to Column expressions.  The reason is that for maps we
> have to actually materialize an object to pass to your function, whereas if
> you stick to Column expressions we can work directly on the serialized
> data.
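>
> For illustration, a rough sketch of both naming options (the Item case
> class and the column names are made up; this assumes the spark-shell
> implicits, i.e. import sqlContext.implicits._):
>
>   // Option 1: map into a case class; its fields become the column names
>   case class Item(name: String, category: String, price: Double)
>   val withCaseClass = df.filter(col("paid") > "")
>     .map(x => Item(x.getString(0), x.getString(1), x.getDouble(2)))
>     .toDF()
>
>   // Option 2: keep the tuple and name the columns afterwards with toDF
>   val withToDF = df.filter(col("paid") > "")
>     .map(x => (x.getString(0), x.getString(1)))
>     .toDF("name1", "name2")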
>
> On Wed, Mar 23, 2016 at 5:27 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:
>
> thank you sir
>
> sql("select `_1` as firstcolumn from items")
>
> is there any way one can keep the CSV column names using databricks when
> mapping
>
> val r = df.filter(col("paid") > "").map(x =>
> (x.getString(0),x.getString(1).....)
>
> can I, for example, call x.getString(0).as.(firstcolumn) in the above when
> mapping, if possible, so the columns will have labels
>
>
>
>
>
> On Thursday, 24 March 2016, 0:18, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>
> You probably need to use `backticks` to escape `_1`, since I don't think
> it's a valid SQL identifier.
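>
> Something like this should work against your items table:
>
>   sql("select `_1` from items")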
>
> On Wed, Mar 23, 2016 at 5:10 PM, Ashok Kumar <ashok34...@yahoo.com.invalid
> > wrote:
>
> Gurus,
>
> If I register a temporary table as below
>
>  r.toDF
> res58: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3:
> double, _4: double, _5: double]
>
> r.toDF.registerTempTable("items")
>
> sql("select * from items")
> res60: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3:
> double, _4: double, _5: double]
>
> Is there any way I can do a select on the first column only?
>
> sql("select _1 from items" throws error
>
> Thanking you
>
>
