Re: [Spark 2.0.0] error when unioning to an empty dataset
All right, I looked at the schemas. There is one mismatching nullability, on a scala.Boolean. It looks like that field *cannot* be nullable in an empty Dataset. However, when I run my code to generate the Dataset, the schema comes back with nullable = true. Effectively:

scala> val empty = spark.createDataset[SomeClass]
scala> empty.printSchema
root
 |-- aCaseClass: struct (nullable = true)
 |    |-- aBool: boolean (nullable = false)

scala> val data = // Dataset#flatMap that returns a Dataset[SomeClass]
scala> data.printSchema
root
 |-- aCaseClass: struct (nullable = true)
 |    |-- aBool: boolean (nullable = true)

scala> empty.union(data)
org.apache.spark.sql.AnalysisException: unresolved operator 'Union;

If I switch the Boolean to a java.lang.Boolean, I get nullable = true in the empty schema and the union starts working.

1) Is there a fix for this that I can do without jumping through hoops? I don't know the implications of switching to java.lang.Boolean.

2) It looks like this is probably the issue that these PRs fix: https://github.com/apache/spark/pull/15595 and https://github.com/apache/spark/pull/15602 Is there a timeline for 2.0.2? I'm in a situation where I can't easily build from source.

On Mon, Oct 24, 2016 at 12:29 PM Cheng Lian wrote:
> On 10/22/16 1:42 PM, Efe Selcuk wrote:
> > Ah, looks similar. Next opportunity I get, I'm going to do a printSchema
> > on the two datasets and see if they don't match up.
> >
> > I assume that unioning the underlying RDDs doesn't run into this problem
> > because of less type checking or something along those lines?
>
> Exactly.
> [...]
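One way to sidestep the nullability mismatch entirely (a sketch, not a confirmed fix for 2.0.x; `inputs` and `buildDataset` are hypothetical stand-ins for the conditional-ingest code) is to never seed the fold with spark.emptyDataset and instead union only the Datasets that are actually produced:

```scala
import org.apache.spark.sql.Dataset

// Hypothetical stand-ins for the real ingest code
val parts: Seq[Dataset[SomeClass]] = inputs.flatMap(buildDataset)

// Pairwise union: every operand comes from the same code path, so the
// schemas (including nullability) agree. Only fall back to the empty
// Dataset when there is nothing at all to union.
val combined: Dataset[SomeClass] =
  parts.reduceOption(_ union _).getOrElse(spark.emptyDataset[SomeClass])
```

This keeps the empty Dataset off the hot path: it only ever appears alone, never as a union operand with a differently-nullable schema.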
Re: [Spark 2.0.0] error when unioning to an empty dataset
On 10/22/16 1:42 PM, Efe Selcuk wrote:
> Ah, looks similar. Next opportunity I get, I'm going to do a printSchema on
> the two datasets and see if they don't match up.
>
> I assume that unioning the underlying RDDs doesn't run into this problem
> because of less type checking or something along those lines?

Exactly.

On Fri, Oct 21, 2016 at 3:39 PM Cheng Lian wrote:
> Efe - You probably hit this bug:
> https://issues.apache.org/jira/browse/SPARK-18058
> [...]
Re: [Spark 2.0.0] error when unioning to an empty dataset
Ah, looks similar. Next opportunity I get, I'm going to do a printSchema on the two datasets and see if they don't match up.

I assume that unioning the underlying RDDs doesn't run into this problem because of less type checking or something along those lines?

On Fri, Oct 21, 2016 at 3:39 PM Cheng Lian wrote:
> Efe - You probably hit this bug:
> https://issues.apache.org/jira/browse/SPARK-18058
> [...]
Re: [Spark 2.0.0] error when unioning to an empty dataset
Efe - You probably hit this bug: https://issues.apache.org/jira/browse/SPARK-18058

On 10/21/16 2:03 AM, Agraj Mangal wrote:
> I have seen this error sometimes when the elements in the schema have
> different nullabilities. Could you print the schema for data and for
> someCode.thatReturnsADataset() and see if there is any difference between
> the two?
> [...]
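For reference, SPARK-18058 comes down to the analyzer rejecting a union whose operands differ only in nullability. A minimal sketch of the shape of the problem (untested; the exact behavior depends on the Spark 2.0.x patch level):

```scala
case class Wrap(b: Boolean) // primitive Boolean encodes as nullable = false

val empty = spark.emptyDataset[Wrap]     // schema: b: boolean (nullable = false)
// A Dataset produced by a transformation may come back with b nullable = true
val mapped = spark.range(2).map(i => Wrap(i % 2 == 0))

// On affected 2.0.x versions this can fail with:
//   org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
empty.union(mapped)
```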
Re: [Spark 2.0.0] error when unioning to an empty dataset
I have seen this error sometimes when the elements in the schema have different nullabilities. Could you print the schema for data and for someCode.thatReturnsADataset() and see if there is any difference between the two?

On Fri, Oct 21, 2016 at 9:14 AM, Efe Selcuk wrote:
> Thanks for the response. What do you mean by "semantically" the same?
> They're both Datasets of the same type, which is a case class, so I would
> expect compile-time integrity of the data. Is there a situation where this
> wouldn't be the case?
> [...]

--
Thanks & Regards,
Agraj Mangal
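Beyond eyeballing printSchema output, the two schemas can be compared programmatically; StructType supports equality that covers field names, types, nullability, and order. A quick sketch:

```scala
val left = data.schema
val right = someCode.thatReturnsADataset().schema

// StructType equality covers names, types, nullability, and field order
if (left != right) {
  // Zip fields to spot the exact mismatch (assumes the same field count)
  left.fields.zip(right.fields)
    .filter { case (a, b) => a != b }
    .foreach { case (a, b) => println(s"mismatch: $a vs $b") }
}
```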
Re: [Spark 2.0.0] error when unioning to an empty dataset
Thanks for the response. What do you mean by "semantically" the same? They're both Datasets of the same type, which is a case class, so I would expect compile-time integrity of the data. Is there a situation where this wouldn't be the case?

Interestingly enough, if I instead create an empty RDD with sparkContext.emptyRDD of the same case class type, it works!

So something like:

var data = spark.sparkContext.emptyRDD[SomeData]

// loop
data = data.union(someCode.thatReturnsADataset().rdd)
// end loop

data.toDS // so I can union it to the actual Dataset I have elsewhere

On Thu, Oct 20, 2016 at 8:34 PM Agraj Mangal wrote:
> I believe this normally comes when Spark is unable to perform union due to
> "difference" in schema of the operands. Can you check if the schema of both
> the datasets are semantically same?
> [...]
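Spelled out, that RDD-level workaround looks like the following (a sketch; `sources` and `buildDataset` are hypothetical names for the loop's ingest logic). It presumably works because RDD.union is a plain lineage operation that never goes through the SQL analyzer's schema check; the schema is only derived once, at the final toDS:

```scala
import spark.implicits._ // needed for .toDS on an RDD of a case class

var data = spark.sparkContext.emptyRDD[SomeData]
for (src <- sources) {
  data = data.union(buildDataset(src).rdd) // RDD union: no schema analysis
}
val ds = data.toDS() // single encode at the end; union with other Datasets here
```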
Re: [Spark 2.0.0] error when unioning to an empty dataset
I believe this normally comes when Spark is unable to perform a union due to a "difference" in the schema of the operands. Can you check if the schemas of both datasets are semantically the same?

On Tue, Oct 18, 2016 at 9:06 AM, Efe Selcuk wrote:
> Bump!
> [...]

--
Thanks & Regards,
Agraj Mangal
Re: [Spark 2.0.0] error when unioning to an empty dataset
Bump!

On Thu, Oct 13, 2016 at 8:25 PM Efe Selcuk wrote:
> I have a use case where I want to build a dataset based off of
> conditionally available data. I thought I'd do something like this:
> [...]
[Spark 2.0.0] error when unioning to an empty dataset
I have a use case where I want to build a dataset based off of conditionally available data. I thought I'd do something like this:

case class SomeData( ... ) // parameters are basic encodable types like strings and BigDecimals

var data = spark.emptyDataset[SomeData]

// loop, determining what data to ingest and process into datasets
data = data.union(someCode.thatReturnsADataset)
// end loop

However I get a runtime exception:

Exception in thread "main" org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:58)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:361)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:58)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:161)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
    at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59)
    at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:2594)
    at org.apache.spark.sql.Dataset.union(Dataset.scala:1459)

Granted, I'm new at Spark so this might be an anti-pattern, so I'm open to suggestions. However it doesn't seem like I'm doing anything incorrect here; the types are correct. Searching for this error online returns results that seem to be about DataFrames with mismatching schemas or a different order of fields, and it seems like bugfixes have gone in for those cases.

Thanks in advance.
Efe