Efe - You probably hit this bug: https://issues.apache.org/jira/browse/SPARK-18058

On 10/21/16 2:03 AM, Agraj Mangal wrote:
I have sometimes seen this error when fields in the two schemas have different nullability. Could you print the schema for data and for someCode.thatReturnsADataset() and see if there is any difference between the two?
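For example, something like this would show any difference, including nullability (StructType equality takes the nullable flags into account):

data.printSchema()
someCode.thatReturnsADataset().printSchema()

// or compare the schemas programmatically
println(data.schema == someCode.thatReturnsADataset().schema)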

On Fri, Oct 21, 2016 at 9:14 AM, Efe Selcuk <efema...@gmail.com> wrote:

    Thanks for the response. What do you mean by "semantically" the
    same? They're both Datasets of the same type, which is a case
    class, so I would expect compile-time integrity of the data. Is
    there a situation where this wouldn't be the case?

    Interestingly enough, if I instead create an empty RDD with
    sparkContext.emptyRDD of the same case class type, it works!

    So something like:
    var data = spark.sparkContext.emptyRDD[SomeData]

    // loop
    data = data.union(someCode.thatReturnsADataset().rdd)
    // end loop

    data.toDS // so I can union it with the actual Dataset I have elsewhere
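    Spelled out a bit more fully, the workaround looks roughly like
    this (just a sketch; SomeData and nextBatch are stand-ins for my
    real case class and for someCode.thatReturnsADataset()):

    import org.apache.spark.sql.{Dataset, SparkSession}

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // stand-in for my real case class
    case class SomeData(id: String, amount: BigDecimal)

    // stand-in for someCode.thatReturnsADataset()
    def nextBatch(i: Int): Dataset[SomeData] =
      Seq(SomeData(s"row-$i", BigDecimal(i))).toDS()

    // accumulate at the RDD level, then convert back to a Dataset once at the end
    var data = spark.sparkContext.emptyRDD[SomeData]
    for (i <- 1 to 3) {
      data = data.union(nextBatch(i).rdd)
    }
    val result: Dataset[SomeData] = data.toDS()
    result.show()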

    On Thu, Oct 20, 2016 at 8:34 PM Agraj Mangal <agraj....@gmail.com> wrote:

        I believe this normally happens when Spark is unable to perform
        a union due to a difference in the schemas of the operands. Can
        you check whether the schemas of the two datasets are
        semantically the same?

        On Tue, Oct 18, 2016 at 9:06 AM, Efe Selcuk <efema...@gmail.com> wrote:

            Bump!

            On Thu, Oct 13, 2016 at 8:25 PM Efe Selcuk <efema...@gmail.com> wrote:

                I have a use case where I want to build a dataset
                based off of conditionally available data. I thought
                I'd do something like this:

                case class SomeData( ... ) // parameters are basic encodable types like strings and BigDecimals

                var data = spark.emptyDataset[SomeData]

                // loop, determining what data to ingest and process into datasets
                  data = data.union(someCode.thatReturnsADataset)
                // end loop

                However I get a runtime exception:

                Exception in thread "main" org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
                        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
                        at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:58)
                        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:361)
                        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
                        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
                        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67)
                        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:58)
                        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
                        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:161)
                        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
                        at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59)
                        at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:2594)
                        at org.apache.spark.sql.Dataset.union(Dataset.scala:1459)

                Granted, I'm new to Spark so this might be an
                anti-pattern, and I'm open to suggestions. However, it
                doesn't seem like I'm doing anything incorrect here;
                the types are correct. Searching for this error online
                mostly turns up results about DataFrames with
                mismatching schemas or a different order of fields,
                and it seems like fixes have already gone in for those
                cases.
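                For reference, the kind of fix those results describe
                looks roughly like this (just a sketch; ds1/ds2 are
                placeholders, and it assumes the mismatch is only in
                column order, which doesn't seem to be my situation):

                import org.apache.spark.sql.functions.col

                // reorder ds2's columns to match ds1 before the union;
                // assumes spark.implicits._ is in scope for the .as[SomeData] encoder
                val aligned = ds2.toDF().select(ds1.columns.map(col): _*).as[SomeData]
                val combined = ds1.union(aligned)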

                Thanks in advance.
                Efe




        --
        Thanks & Regards,
        Agraj Mangal




--
Thanks & Regards,
Agraj Mangal
