Not a bad idea, I suspect, but it doesn't help me. I dumbed down the repro to ask for help. In reality one of my dataframes is a Cassandra DF, so cassDF.registerTempTable("df1") registers the temp table in a different SQLContext (a new CassandraSQLContext(sc)), and a temp table is only visible to sql() calls made through the context it was registered in -- hence the "no such table" below.
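What I'll try next is pulling both frames into one context. A minimal, untested sketch -- it assumes cassDF was created through the same CassandraSQLContext instance (I'm calling it cc here), and it reuses the test/Test repro quoted further down:

import org.apache.spark.sql.cassandra.CassandraSQLContext

// One context for everything: temp tables registered here are
// visible to sql() calls issued through this same context.
val cc = new CassandraSQLContext(sc)

// cassDF already originates from cc, so this lands in cc's catalog.
cassDF.registerTempTable("df1")

// Re-create the non-Cassandra frame through cc rather than the
// default sqlContext, so both names resolve in one catalog.
val df = cc.createDataFrame(sc.parallelize(Seq(test)))
df.registerTempTable("df")

cc.sql("select customer_id, uri, browser, epoch from df union all " +
  "select customer_id, uri, browser, epoch from df1").show()

As it stands, with the two tables split across contexts, the union fails: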
scala> sql("select customer_id, uri, browser, epoch from df union all select customer_id, uri, browser, epoch from df1").show()
org.apache.spark.sql.AnalysisException: no such table df1; line 1 pos 103
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:225)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:233)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:229)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:221)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:242)

On Fri, Oct 30, 2015 at 3:34 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> How about the following?
>
> scala> df.registerTempTable("df")
> scala> df1.registerTempTable("df1")
> scala> sql("select customer_id, uri, browser, epoch from df union select
> customer_id, uri, browser, epoch from df1").show()
> +-----------+-------------+-------+-----+
> |customer_id|          uri|browser|epoch|
> +-----------+-------------+-------+-----+
> |        999|http://foobar|firefox| 1234|
> |        888|http://foobar|     ie|12343|
> +-----------+-------------+-------+-----+
>
> Cheers
>
> On Fri, Oct 30, 2015 at 12:11 PM, Yana Kadiyska <yana.kadiy...@gmail.com>
> wrote:
>
>> Hi folks,
>>
>> I have a need to "append" two dataframes -- I was hoping to use unionAll,
>> but it seems that this operation treats the underlying dataframes as a
>> sequence of columns rather than as a map.
>>
>> In particular, my problem is that the columns in the two DFs are not in
>> the same order -- notice that my customer_id somehow comes out as a string.
>>
>> This is Spark 1.4.1.
>>
>> case class Test(epoch: Long, browser: String, customer_id: Int, uri: String)
>> val test = Test(1234L, "firefox", 999, "http://foobar")
>>
>> case class Test1(customer_id: Int, uri: String, browser: String, epoch: Long)
>> val test1 = Test1(888, "http://foobar", "ie", 12343)
>>
>> val df = sc.parallelize(Seq(test)).toDF
>> val df1 = sc.parallelize(Seq(test1)).toDF
>> df.unionAll(df1)
>>
>> // res2: org.apache.spark.sql.DataFrame = [epoch: bigint, browser: string,
>> // customer_id: string, uri: string]
>>
>> Is unionAll the wrong operation? Any special incantations? Or advice on
>> how to otherwise get this to succeed?
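P.S. On the original column-order problem: unionAll in 1.4.x matches columns by position, not by name, so one workaround is to project one frame into the other's column order before the union. Untested sketch, using the df/df1 from the repro above:

import org.apache.spark.sql.functions.col

// Reorder df1's columns to match df's column order, by name.
val aligned = df1.select(df.columns.map(col): _*)

// Both frames now line up positionally, so unionAll should do the
// right thing and customer_id should stay an int.
val combined = df.unionAll(aligned)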