How about the following?

scala> df.registerTempTable("df")
scala> df1.registerTempTable("df1")
scala> sql("select customer_id, uri, browser, epoch from df union select customer_id, uri, browser, epoch from df1").show()
+-----------+-------------+-------+-----+
|customer_id|          uri|browser|epoch|
+-----------+-------------+-------+-----+
|        999|http://foobar|firefox| 1234|
|        888|http://foobar|     ie|12343|
+-----------+-------------+-------+-----+
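Since unionAll in Spark 1.4 matches columns by position rather than by name, another option (a sketch, not tested on your data) is to reorder the second DataFrame's columns to match the first before the union:

// Sketch: align df1's column order with df's, then unionAll.
// unionAll resolves columns positionally, so the select fixes the mismatch.
import org.apache.spark.sql.functions.col

val df1Aligned = df1.select(df.columns.map(col): _*)
val combined = df.unionAll(df1Aligned)
// combined keeps df's schema: [epoch, browser, customer_id, uri]

This avoids registering temp tables and keeps everything in the DataFrame API.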
Cheers

On Fri, Oct 30, 2015 at 12:11 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
> Hi folks,
>
> I have a need to "append" two dataframes -- I was hoping to use unionAll,
> but it seems that this operation treats the underlying dataframes as a
> sequence of columns rather than a map.
>
> In particular, my problem is that the columns in the two DFs are not in
> the same order -- notice that my customer_id somehow comes out a string:
>
> This is Spark 1.4.1:
>
> case class Test(epoch: Long, browser: String, customer_id: Int, uri: String)
> val test = Test(1234L, "firefox", 999, "http://foobar")
>
> case class Test1(customer_id: Int, uri: String, browser: String, epoch: Long)
> val test1 = Test1(888, "http://foobar", "ie", 12343)
> val df = sc.parallelize(Seq(test)).toDF
> val df1 = sc.parallelize(Seq(test1)).toDF
> df.unionAll(df1)
>
> // res2: org.apache.spark.sql.DataFrame = [epoch: bigint, browser: string,
> // customer_id: string, uri: string]
>
> Is unionAll the wrong operation? Any special incantations? Or advice on
> how to otherwise get this to succeed?