from:"dsgriffin"

Re: Nullable is true for the schema of parquet data

2015-05-10 Thread dsgriffin

Ran into this same issue. The only solution seems to be to coerce the DataFrame's schema back into the right state. It looks like you have to convert the DF to an RDD, which adds some overhead, but otherwise this worked for me: val newDF = sqlContext.createDataFrame(origDF.rdd, new
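The message above is cut off before the schema argument, so what follows is only a sketch of the general idea, not the author's exact code. It assumes the goal is to rebuild the DataFrame from its underlying RDD with a copy of the schema whose nullable flags have been reset; the helper name withNullability is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Hypothetical helper (not from the original post): rebuild the DataFrame
// from its RDD of rows, using a copy of the schema in which every field's
// nullable flag is set to the requested value.
def withNullability(df: DataFrame, nullable: Boolean): DataFrame = {
  // StructType is a Seq[StructField]; StructField is a case class, so copy() works.
  val schema = StructType(df.schema.map(_.copy(nullable = nullable)))
  // Going through df.rdd is the conversion overhead mentioned in the message.
  df.sqlContext.createDataFrame(df.rdd, schema)
}
```

Usage would look like val newDF = withNullability(origDF, nullable = false). Marking fields as non-nullable is presumably only safe when the data really contains no nulls in those columns.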

Re: How to add a column to a spark RDD with many columns?

2015-05-02 Thread dsgriffin

val newRdd = myRdd.map(row => row ++ Array((row(1).toLong * row(199).toLong).toString)) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-a-column-to-a-spark-RDD-with-many-columns-tp22729p22735.html
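To make the shape of that map easier to see, here is a small sketch that runs on an ordinary Scala Seq instead of an RDD. It assumes each row is an Array[String]; the sample rows, the column indices, and the helper name addProductColumn are made up for illustration.

```scala
// Stand-in for the RDD: a local collection whose rows are Array[String].
val rows: Seq[Array[String]] = Seq(
  Array("a", "2", "3"),
  Array("b", "4", "5")
)

// Hypothetical helper: append one derived column holding the product of
// columns i and j, converted back to a String like the other columns.
def addProductColumn(row: Array[String], i: Int, j: Int): Array[String] =
  row :+ (row(i).toLong * row(j).toLong).toString

// Same pattern as the one-liner in the message, applied to every row.
val widened: Seq[Array[String]] = rows.map(row => addProductColumn(row, 1, 2))
```

On an RDD of the same row type the call would have the same form, for example myRdd.map(row => addProductColumn(row, 1, 199)).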

Re: Drop a column from the DataFrame.

2015-05-02 Thread dsgriffin

Just use select() to create a new DataFrame with only the columns you want. It is sort of the opposite of what you asked for, but you can select all of the columns minus the one you don't want. You could even use a filter to remove just that one column on the fly:
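The code that followed in the original message is not shown in this preview, so the snippet below is only one possible way to do what is described, not necessarily what the author posted. It assumes the column to drop is identified by name; the helper name dropColumn is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: keep every column except `unwanted` by filtering the
// column names and passing the survivors to select().
def dropColumn(df: DataFrame, unwanted: String): DataFrame =
  df.select(df.columns.filter(_ != unwanted).map(col): _*)
```

For example, dropColumn(df, "tmp") would return a new DataFrame without a column named tmp, leaving df itself unchanged.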

Re: RDD.filter vs. RDD.join--advice please

2015-04-22 Thread dsgriffin

Test it out, but I would be willing to bet the join is going to be a good deal faster. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-filter-vs-RDD-join-advice-please-tp22612p22614.html


4 matches
