df.filter(col("paid") > "").select(col("name1").as("newName"), ...)
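A fuller sketch of the column-expression approach in the line above, assuming a Spark 1.6-era `DataFrame` with the `paid` and `name1` columns mentioned in the thread (`amountPaid` is a hypothetical output name, not from the original):

```scala
import org.apache.spark.sql.functions.col

// Column expressions let Spark SQL work directly on serialized data,
// unlike a map, which must first materialize a Row object per record.
val renamed = df
  .filter(col("paid") > "")
  .select(
    col("name1").as("newName"),   // rename with Column.as, outside any lambda
    col("paid").as("amountPaid")  // hypothetical name, for illustration
  )
```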
On Wed, Mar 23, 2016 at 6:17 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:

> Thank you again
>
> For
>
> val r = df.filter(col("paid") > "").map(x =>
> (x.getString(0), x.getString(1).....)
>
> Can you give an example of a column expression please, like
>
> df.filter(col("paid") > "").col("firstcolumn").getString ?....
>
>
> On Thursday, 24 March 2016, 0:45, Michael Armbrust <mich...@databricks.com> wrote:
>
> You can only use as on a Column expression, not inside of a lambda
> function. The reason is that the lambda function is compiled into opaque
> bytecode that Spark SQL is not able to see. We just blindly execute it.
>
> However, there are a couple of ways to name the columns that come out of a
> map. Either use a case class instead of a tuple, or use .toDF("name1",
> "name2"....) after the map.
>
> From a performance perspective, it's even better if you can avoid
> maps and stick to Column expressions. The reason is that for maps, we have
> to actually materialize an object to pass to your function. However, if
> you stick to Column expressions, we can work directly on serialized
> data.
>
> On Wed, Mar 23, 2016 at 5:27 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:
>
> thank you sir
>
> sql("select `_1` as firstcolumn from items")
>
> is there any way one can keep the csv column names using databricks when
> mapping
>
> val r = df.filter(col("paid") > "").map(x =>
> (x.getString(0), x.getString(1).....)
>
> can I call, for example, x.getString(0).as(firstcolumn) in the above when
> mapping, if possible, so columns will have labels
>
>
> On Thursday, 24 March 2016, 0:18, Michael Armbrust <mich...@databricks.com> wrote:
>
> You probably need to use `backticks` to escape `_1` since I don't think
> that it's a valid SQL identifier.
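The two naming options Michael describes above can be sketched as follows, using the Spark 1.6-era `map`-on-`DataFrame` API from the thread (`Item`, `firstcolumn`, and `secondcolumn` are hypothetical names introduced here for illustration):

```scala
// Hypothetical case class; its field names become the column names.
case class Item(firstcolumn: String, secondcolumn: String)

// Option 1: map each Row to a case class instance.
val named1 = df.filter(col("paid") > "")
  .map(x => Item(x.getString(0), x.getString(1)))
  .toDF()

// Option 2: map to a plain tuple, then rename with toDF(...).
val named2 = df.filter(col("paid") > "")
  .map(x => (x.getString(0), x.getString(1)))
  .toDF("firstcolumn", "secondcolumn")
```

Both produce a DataFrame with labeled columns; the case-class route also gives you a typed schema in one place.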
> On Wed, Mar 23, 2016 at 5:10 PM, Ashok Kumar
> <ashok34...@yahoo.com.invalid> wrote:
>
> Gurus,
>
> If I register a temporary table as below
>
> r.toDF
> res58: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3:
> double, _4: double, _5: double]
>
> r.toDF.registerTempTable("items")
>
> sql("select * from items")
> res60: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3:
> double, _4: double, _5: double]
>
> Is there any way I can do a select on the first column only?
>
> sql("select _1 from items") throws an error
>
> Thanking you
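Putting the backtick fix from the thread together, a minimal sketch (assuming a spark-shell session where `r` is the RDD of tuples from the original question; `registerTempTable` is the Spark 1.x API):

```scala
// _1, _2, ... are the default column names a tuple RDD gets from toDF.
val items = r.toDF()
items.registerTempTable("items")

// _1 starts with an underscore, so it is not a plain SQL identifier;
// backticks escape it.
val first = sqlContext.sql("select `_1` as firstcolumn from items")
```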