[GitHub] [carbondata] ajantha-bhat commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types
ajantha-bhat commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-642476599

retest this please

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
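For context on PR #3771's subject: pushing `array_contains` down to the Carbon scan layer only changes *where* the predicate is evaluated, not *what* it matches. The SQL semantics the pushed-down filter must preserve can be sketched in plain Python (a hypothetical illustration only; this is neither CarbonData nor Spark code):

```python
def array_contains(arr, value):
    """SQL array_contains semantics: NULL array -> NULL (a filter treats
    NULL as a non-match); otherwise true iff some element equals value."""
    if arr is None:
        return None
    return value in arr

rows = [
    {"id": 1, "vals": [10, 20]},
    {"id": 2, "vals": [30]},
    {"id": 3, "vals": None},
]
# Applied as a scan-level filter: keep only rows where the predicate is true.
kept = [r["id"] for r in rows if array_contains(r["vals"], 10)]
print(kept)  # [1]
```

Evaluating this check inside the scan (instead of after materializing the array column) is what lets whole blocks be skipped for arrays of primitive types.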
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-642548570

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3145/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types
CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-642548932

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1421/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [WIP] test
CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-642657316

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1422/
[jira] [Created] (CARBONDATA-3851) Merge Update and Insert with Partition Table is giving different results in different spark deploy modes
Sachin Ramachandra Setty created CARBONDATA-3851:

Summary: Merge Update and Insert with Partition Table is giving different results in different spark deploy modes
Key: CARBONDATA-3851
URL: https://issues.apache.org/jira/browse/CARBONDATA-3851
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 2.0.0
Reporter: Sachin Ramachandra Setty

The result sets are different when the queries are run in spark-shell --master local and spark-shell --master yarn (two different Spark deploy modes).

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3851) Merge Update and Insert with Partition Table is giving different results in different spark deploy modes
[ https://issues.apache.org/jira/browse/CARBONDATA-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sachin Ramachandra Setty updated CARBONDATA-3851:

Description: The result sets are different when the queries are run in spark-shell --master local and spark-shell --master yarn (two different Spark deploy modes).

Steps to reproduce the issue:

{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

sql("drop table if exists order").show()
sql("drop table if exists order_hist").show()
sql("create table order_hist(id string, name string, quantity int, price int, state int) PARTITIONED BY (c_name String) STORED AS carbondata").show()

val initframe = sc.parallelize(1 to 10, 4)
  .map { x => ("id" + x, s"order$x", s"customer$x", x * 10, x * 75, 1) }
  .toDF("id", "name", "c_name", "quantity", "price", "state")
initframe.write
  .format("carbondata")
  .option("tableName", "order")
  .option("partitionColumns", "c_name")
  .mode(SaveMode.Overwrite)
  .save()

val dwframe = spark.read.format("carbondata").option("tableName", "order").load()
val dwSelframe = dwframe.as("A")

val ds1 = sc.parallelize(3 to 10, 4)
  .map { x =>
    if (x <= 4) {
      ("id" + x, s"order$x", s"customer$x", x * 10, x * 75, 2)
    } else {
      ("id" + x, s"order$x", s"customer$x", x * 10, x * 75, 1)
    }
  }.toDF("id", "name", "c_name", "quantity", "price", "state")
ds1.show()

val ds2 = sc.parallelize(1 to 2, 4)
  .map { x => ("newid" + x, s"order$x", s"customer$x", x * 10, x * 75, 1) }
  .toDS().toDF()
ds2.show()

val ds3 = ds1.union(ds2)
ds3.show()
val odsframe = ds3.as("B")

var matches = Seq.empty[MergeMatch]
val updateMap = Map(col("id") -> col("A.id"), col("price") -> expr("B.price + 1"), col("state") -> col("B.state"))
val insertMap = Map(col("id") -> col("B.id"), col("name") -> col("B.name"), col("c_name") -> col("B.c_name"), col("quantity") -> col("B.quantity"), col("price") -> expr("B.price * 100"), col("state") -> col("B.state"))
val insertMap_u = Map(col("id") -> col("id"), col("name") -> col("name"), col("c_name") -> lit("insert"), col("quantity") -> col("quantity"), col("price") -> expr("price"), col("state") -> col("state"))
val insertMap_d = Map(col("id") -> col("id"), col("name") -> col("name"), col("c_name") -> lit("delete"), col("quantity") -> col("quantity"), col("price") -> expr("price"), col("state") -> col("state"))

matches ++= Seq(WhenMatched(Some(col("A.state") =!= col("B.state")))
  .addAction(UpdateAction(updateMap))
  .addAction(InsertInHistoryTableAction(insertMap_u, TableIdentifier("order_hist"))))
matches ++= Seq(WhenNotMatched().addAction(InsertAction(insertMap)))
matches ++= Seq(WhenNotMatchedAndExistsOnlyOnTarget()
  .addAction(DeleteAction())
  .addAction(InsertInHistoryTableAction(insertMap_d, TableIdentifier("order_hist"))))

// NOTE: the merge invocation itself is truncated in the original report;
// the join condition below (A.id = B.id) is an assumption.
CarbonMergeDataSetCommand(dwSelframe, odsframe,
  MergeDataSetMatches(col("A.id") === col("B.id"), matches.toList)).run(spark)

sql("select count(*) from order").show()
sql("select count(*) from order where state = 2").show()
sql("select price from order where id = 'newid1'").show()
sql("select count(*) from order_hist where c_name = 'delete'").show()
sql("select count(*) from order_hist where c_name = 'insert'").show()
{code}

*Results in spark-shell --master yarn*

{code:java}
scala> sql("select count(*) from order").show()
+--------+
|count(1)|
+--------+
|      10|
+--------+

scala> sql("select count(*) from order where state = 2").show()
+--------+
|count(1)|
+--------+
|       0|
+--------+

scala> sql("select price from order where id = 'newid1'").show()
+-----+
|price|
+-----+
+-----+

scala> sql("select count(*) from order_hist where c_name = 'delete'").show()
+--------+
|count(1)|
+--------+
|       0|
+--------+

scala> sql("select count(*) from order_hist where c_name = 'insert'").show()
+--------+
|count(1)|
+--------+
|       0|
+--------+
{code}

*Results in spark-shell --master local*

{code:java}
scala> sql("select count(*) from order").show()
+--------+
|count(1)|
+--------+
|      10|
+--------+

scala> sql("select count(*) from order where state = 2").show()
+--------+
|count(1)|
+--------+
|       2|
+--------+

scala> sql("select price from order where id = 'newid1'").show()
+-----+
|price|
+-----+
| 7500|
+-----+

scala> sql("select count(*) from order_hist where c_name = 'delete'").show()
+--------+
|count(1)|
+--------+
|       2|
+--------+

scala> sql("select count(*) from order_hist where c_name = 'insert'").show()
{code}
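The discrepancy can be checked against the intended MERGE semantics. The plain-Python simulation below walks the three match clauses from the repro over the same data (the join key `id` is an assumption, and this is a model of the semantics, not CarbonData code); it reproduces the local-mode numbers, which suggests yarn mode is the deviating one:

```python
# Target table "order": id1..id10, price = x*75, state = 1.
target = {f"id{x}": {"price": x * 75, "state": 1} for x in range(1, 11)}

# Source B = ds1 (id3..id10, state 2 when x <= 4) union ds2 (newid1, newid2).
source = {f"id{x}": {"price": x * 75, "state": 2 if x <= 4 else 1}
          for x in range(3, 11)}
source.update({f"newid{x}": {"price": x * 75, "state": 1} for x in range(1, 3)})

history = []  # rows written to order_hist, tagged 'insert' or 'delete'
merged = {}

for key, b in source.items():
    if key in target and target[key]["state"] != b["state"]:
        # WhenMatched(A.state =!= B.state): update price/state, log history row.
        merged[key] = {"price": b["price"] + 1, "state": b["state"]}
        history.append(("insert", key))
    elif key in target:
        merged[key] = target[key]  # matched, states equal: row unchanged
    else:
        # WhenNotMatched: insert into target with price = B.price * 100.
        merged[key] = {"price": b["price"] * 100, "state": b["state"]}

for key in target:
    if key not in source:
        # WhenNotMatchedAndExistsOnlyOnTarget: delete from target, log history.
        history.append(("delete", key))

print(len(merged))                                    # 10
print(sum(r["state"] == 2 for r in merged.values()))  # 2
print(merged["newid1"]["price"])                      # 7500
print(sum(tag == "delete" for tag, _ in history))     # 2
print(sum(tag == "insert" for tag, _ in history))     # 2
```

These five values line up with the five queries at the end of the repro as answered in local mode (10, 2, 7500, 2, 2), whereas yarn mode reports no updates, no inserts, and no history rows at all.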
[jira] [Updated] (CARBONDATA-3851) Merge Update and Insert with Partition Table is giving different results in different spark deploy modes
[ https://issues.apache.org/jira/browse/CARBONDATA-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sachin Ramachandra Setty updated CARBONDATA-3851:
Description:

The result sets differ when the queries are run in spark-shell --master local and spark-shell --master yarn (two different Spark deploy modes).

Steps to reproduce the issue:

{code}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

sql("drop table if exists order").show()
sql("drop table if exists order_hist").show()
sql("create table order_hist(id string, name string, quantity int, price int, state int) PARTITIONED BY (c_name String) STORED AS carbondata").show()

val initframe = sc.parallelize(1 to 10, 4)
  .map { x => ("id"+x, s"order$x", s"customer$x", x*10, x*75, 1) }
  .toDF("id", "name", "c_name", "quantity", "price", "state")
initframe.write
  .format("carbondata")
  .option("tableName", "order")
  .option("partitionColumns", "c_name")
  .mode(SaveMode.Overwrite)
  .save()

val dwframe = spark.read.format("carbondata").option("tableName", "order").load()
val dwSelframe = dwframe.as("A")

val ds1 = sc.parallelize(3 to 10, 4)
  .map { x =>
    if (x <= 4) { ("id"+x, s"order$x", s"customer$x", x*10, x*75, 2) }
    else { ("id"+x, s"order$x", s"customer$x", x*10, x*75, 1) }
  }.toDF("id", "name", "c_name", "quantity", "price", "state")
ds1.show()
val ds2 = sc.parallelize(1 to 2, 4)
  .map { x => ("newid"+x, s"order$x", s"customer$x", x*10, x*75, 1) }
  .toDS().toDF()
ds2.show()
val ds3 = ds1.union(ds2)
ds3.show()
val odsframe = ds3.as("B")

var matches = Seq.empty[MergeMatch]
val updateMap = Map(col("id") -> col("A.id"), col("price") -> expr("B.price + 1"), col("state") -> col("B.state"))
val insertMap = Map(col("id") -> col("B.id"), col("name") -> col("B.name"), col("c_name") -> col("B.c_name"), col("quantity") -> col("B.quantity"), col("price") -> expr("B.price * 100"), col("state") -> col("B.state"))
val insertMap_u = Map(col("id") -> col("id"), col("name") -> col("name"), col("c_name") -> lit("insert"), col("quantity") -> col("quantity"), col("price") -> expr("price"), col("state") -> col("state"))
val insertMap_d = Map(col("id") -> col("id"), col("name") -> col("name"), col("c_name") -> lit("delete"), col("quantity") -> col("quantity"), col("price") -> expr("price"), col("state") -> col("state"))
matches ++= Seq(WhenMatched(Some(col("A.state") =!= col("B.state"))).addAction(UpdateAction(updateMap)).addAction(InsertInHistoryTableAction(insertMap_u, TableIdentifier("order_hist"
matches ++= Seq(WhenNotMatched().addAction(InsertAction(insertMap)))
matches ++= Seq(WhenNotMatchedAndExistsOnlyOnTarget().addAction(DeleteAction()).addAction(InsertInHistoryTableAction(insertMap_d, TableIdentifier("order_hist"
{code}

SQL queries:

{code}
sql("select count(*) from order").show()
sql("select count(*) from order where state = 2").show()
sql("select price from order where id = 'newid1'").show()
sql("select count(*) from order_hist where c_name = 'delete'").show()
sql("select count(*) from order_hist where c_name = 'insert'").show()
{code}

Results in spark-shell --master yarn:

{code}
scala> sql("select count(*) from order").show()
+--------+
|count(1)|
+--------+
|      10|
+--------+

scala> sql("select count(*) from order where state = 2").show()
+--------+
|count(1)|
+--------+
|       0|
+--------+

scala> sql("select price from order where id = 'newid1'").show()
+-----+
|price|
+-----+
+-----+

scala> sql("select count(*) from order_hist where c_name = 'delete'").show()
+--------+
|count(1)|
+--------+
|       0|
+--------+

scala> sql("select count(*) from order_hist where c_name = 'insert'").show()
+--------+
|count(1)|
+--------+
|       0|
+--------+
{code}

Results in spark-shell --master local:

{code}
scala> sql("select count(*) from order").show()
+--------+
|count(1)|
+--------+
|      10|
+--------+

scala> sql("select count(*) from order where state = 2").show()
+--------+
|count(1)|
+--------+
|       2|
+--------+

scala> sql("select price from order where id = 'newid1'").show()
+-----+
|price|
+-----+
| 7500|
+-----+

scala> sql("select count(*) from order_hist where c_name = 'delete'").show()
+--------+
|count(1)|
+--------+
|       2|
+--------+

scala> sql("select count(*) from order_hist where c_name = 'insert'").show()
{code}
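For reference, the merge semantics described above can be replayed without Spark. The following plain-Scala sketch is an illustration written for this digest (not part of the original reproduction steps); it applies the same update/insert/delete matching rules to in-memory collections and arrives at the counts reported for --master local mode, which suggests those are the expected results.

```scala
object MergeSketch {
  case class Rec(id: String, name: String, price: Int, state: Int)

  // Target table "order": ids 1..10, price x*75, state 1 (as in the report)
  val target: Seq[Rec] = (1 to 10).map(x => Rec("id" + x, s"order$x", x * 75, 1))
  // Source: ids 3..10 (state 2 for x <= 4) plus two brand-new "newid" rows
  val source: Seq[Rec] =
    (3 to 10).map(x => Rec("id" + x, s"order$x", x * 75, if (x <= 4) 2 else 1)) ++
    (1 to 2).map(x => Rec("newid" + x, s"order$x", x * 75, 1))

  private val srcById = source.map(r => r.id -> r).toMap
  private val tgtIds  = target.map(_.id).toSet

  // WhenMatched(A.state =!= B.state): price := B.price + 1, state := B.state
  val afterUpdateAndDelete: Seq[Rec] = target.flatMap { a =>
    srcById.get(a.id) match {
      case Some(b) if a.state != b.state => Some(a.copy(price = b.price + 1, state = b.state))
      case Some(_)                       => Some(a) // matched with equal state: untouched
      case None                          => None    // WhenNotMatchedAndExistsOnlyOnTarget: delete
    }
  }
  // WhenNotMatched: insert with price := B.price * 100
  val inserted: Seq[Rec] =
    source.filterNot(r => tgtIds(r.id)).map(r => r.copy(price = r.price * 100))

  val result: Seq[Rec] = afterUpdateAndDelete ++ inserted
  // Rows routed to order_hist with c_name = 'delete' (only on target)
  val deletedCount: Int = target.count(a => !srcById.contains(a.id))

  def main(args: Array[String]): Unit = {
    println(result.size)                                // 10
    println(result.count(_.state == 2))                 // 2  (the --master local result)
    println(result.find(_.id == "newid1").map(_.price)) // Some(7500)
    println(deletedCount)                               // 2
  }
}
```

In other words, the yarn-mode zeros indicate that the update, insert and history actions were not applied at all, not that the matching rules evaluate differently.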
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [WIP] test
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-642659075 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3146/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 commented on a change in pull request #3772: [CARBONDATA-3832]Added block and blocket pruning for the polygon expression processing
kunal642 commented on a change in pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#discussion_r438818433

File path: geo/src/main/java/org/apache/carbondata/geo/scan/filter/executor/PolygonFilterExecutorImpl.java

{code}
@@ -0,0 +1,84 @@
+package org.apache.carbondata.geo.scan.filter.executor;
{code}

Review comment: Add header
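For context, the "Add header" review comment refers to the standard ASF license header that Apache source files are expected to begin with. A sketch of what is being requested (the package line is the one from the new file; the header text is the standard ASF boilerplate, not taken from this thread):

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.carbondata.geo.scan.filter.executor;
```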
[GitHub] [carbondata] kunal642 commented on a change in pull request #3772: [CARBONDATA-3832]Added block and blocket pruning for the polygon expression processing
kunal642 commented on a change in pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#discussion_r438820691

File path: geo/src/main/java/org/apache/carbondata/geo/scan/filter/executor/PolygonFilterExecutorImpl.java

{code}
@@ -0,0 +1,84 @@
+package org.apache.carbondata.geo.scan.filter.executor;
+
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.GenericQueryType;
+import org.apache.carbondata.core.scan.filter.executer.RowLevelFilterExecuterImpl;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.MeasureColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.processor.RawBlockletColumnChunks;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.geo.scan.expression.PolygonExpression;
+
+import org.apache.log4j.Logger;
+
+public class PolygonFilterExecutorImpl extends RowLevelFilterExecuterImpl {
+  public PolygonFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,
{code}

Review comment: Format this class
[jira] [Created] (CARBONDATA-3852) CCD Merge with Partition Table is giving different results in different spark deploy modes
Sachin Ramachandra Setty created CARBONDATA-3852:

Summary: CCD Merge with Partition Table is giving different results in different spark deploy modes
Key: CARBONDATA-3852
URL: https://issues.apache.org/jira/browse/CARBONDATA-3852
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 2.0.0
Reporter: Sachin Ramachandra Setty

The result sets differ when the SQL queries are run in spark-shell --master local and spark-shell --master yarn (two different Spark deploy modes).

{code}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

val df1 = sc.parallelize(1 to 10, 4)
  .map { x => ("id"+x, s"order$x", s"customer$x", x*10, x*75, 1) }
  .toDF("id", "name", "c_name", "quantity", "price", "state")
df1.write.format("carbondata").option("tableName", "order").mode(SaveMode.Overwrite).save()
val dwframe = spark.read.format("carbondata").option("tableName", "order").load()
val dwSelframe = dwframe.as("A")

val ds1 = sc.parallelize(3 to 10, 4)
  .map { x =>
    if (x <= 4) { ("id"+x, s"order$x", s"customer$x", x*10, x*75, 2) }
    else { ("id"+x, s"order$x", s"customer$x", x*10, x*75, 1) }
  }.toDF("id", "name", "c_name", "quantity", "price", "state")
val ds2 = sc.parallelize(1 to 2, 4)
  .map { x => ("newid"+x, s"order$x", s"customer$x", x*10, x*75, 1) }
  .toDS().toDF()
val ds3 = ds1.union(ds2)
val odsframe = ds3.as("B")

sql("drop table if exists target").show()
val initframe = spark.createDataFrame(Seq(
  Row("a", "0"), Row("b", "1"), Row("c", "2"), Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType
initframe.write
  .format("carbondata")
  .option("tableName", "target")
  .option("partitionColumns", "value")
  .mode(SaveMode.Overwrite)
  .save()
val target = spark.read.format("carbondata").option("tableName", "target").load()

var ccd = spark.createDataFrame(Seq(
  Row("a", "10", false, 0),
  Row("a", null, true, 1),
  Row("b", null, true, 2),
  Row("c", null, true, 3),
  Row("c", "20", false, 4),
  Row("c", "200", false, 5),
  Row("e", "100", false, 6)
).asJava, StructType(Seq(StructField("key", StringType), StructField("newValue", StringType), StructField("deleted", BooleanType), StructField("time", IntegerType
ccd.createOrReplaceTempView("changes")
ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
target.as("A").merge(ccd.as("B"), "A.key=B.key").
  whenMatched("B.deleted=false").
  updateExpr(updateMap).
  whenNotMatched("B.deleted=false").
  insertExpr(insertMap).
  whenMatched("B.deleted=true").
  delete().execute()
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
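The core of the CCD preparation above is the "latest change per key" step, which the report expresses as max(struct(time, newValue, deleted)) ... GROUP BY key: struct comparison orders by the first field, so the max picks the change with the highest time stamp for each key. A collection-level Scala sketch of that step (written for this digest, using the same change rows as the report):

```scala
object LatestChange {
  case class Change(key: String, newValue: String, deleted: Boolean, time: Int)

  // The "changes" rows from the reproduction steps
  val changes = Seq(
    Change("a", "10", deleted = false, time = 0),
    Change("a", null, deleted = true, time = 1),
    Change("b", null, deleted = true, time = 2),
    Change("c", null, deleted = true, time = 3),
    Change("c", "20", deleted = false, time = 4),
    Change("c", "200", deleted = false, time = 5),
    Change("e", "100", deleted = false, time = 6))

  // Equivalent of: SELECT key, max(struct(time, newValue, deleted)) GROUP BY key
  val latest: Map[String, Change] =
    changes.groupBy(_.key).map { case (k, cs) => k -> cs.maxBy(_.time) }

  def main(args: Array[String]): Unit = {
    println(latest("c").newValue) // 200  -> c should end up updated to "200"
    println(latest("a").deleted)  // true -> a should be deleted from the target
    println(latest("e").newValue) // 100  -> e should be inserted
  }
}
```

Applying the merge to the deduplicated changes, the expected outcome is therefore: a and b deleted, c updated to "200", e inserted, and d left untouched; a deploy-mode difference in these rows points at the merge execution rather than the dedup query.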
[jira] [Updated] (CARBONDATA-3851) Merge Update and Insert with Partition Table is giving different results in different spark deploy modes
[ https://issues.apache.org/jira/browse/CARBONDATA-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sachin Ramachandra Setty updated CARBONDATA-3851: - Description: The result sets are different when run the queries in spark-shell--master local and spark-shell --master yarn (Two Different Spark Deploy Modes) Steps to Reproduce Issue : {code} import scala.collection.JavaConverters._ import java.sql.Date import org.apache.spark.sql._ import org.apache.spark.sql.CarbonSession._ import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.execution.command.mutation.merge. {CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget} import org.apache.spark.sql.functions._ import org.apache.spark.sql.test.util.QueryTest import org.apache.spark.sql.types. {BooleanType, DateType, IntegerType, StringType, StructField, StructType} import spark.implicits._ sql("drop table if exists order").show() sql("drop table if exists order_hist").show() sql("create table order_hist(id string, name string, quantity int, price int, state int) PARTITIONED BY (c_name String) STORED AS carbondata").show() val initframe = sc.parallelize(1 to 10, 4).map { x => ("id"+x, s"order$x",s"customer$x", x*10, x*75, 1)} .toDF("id", "name", "c_name", "quantity", "price", "state") initframe.write .format("carbondata") .option("tableName", "order") .option("partitionColumns", "c_name") .mode(SaveMode.Overwrite) .save() val dwframe = spark.read.format("carbondata").option("tableName", "order").load() val dwSelframe = dwframe.as("A") val ds1 = sc.parallelize(3 to 10, 4) .map {x => if (x <= 4) { ("id"+x, s"order$x",s"customer$x", x*10, x*75, 2) } else { ("id"+x, s"order$x",s"customer$x", x*10, x*75, 1) } }.toDF("id", "name", "c_name", "quantity", "price", "state") ds1.show() val ds2 = sc.parallelize(1 to 2, 4) .map {x => ("newid"+x, 
s"order$x", s"customer$x", x * 10, x * 75, 1) }.toDS().toDF()
ds2.show()
val ds3 = ds1.union(ds2)
ds3.show()
val odsframe = ds3.as("B")

var matches = Seq.empty[MergeMatch]
val updateMap = Map(col("id") -> col("A.id"),
  col("price") -> expr("B.price + 1"),
  col("state") -> col("B.state"))
val insertMap = Map(col("id") -> col("B.id"),
  col("name") -> col("B.name"),
  col("c_name") -> col("B.c_name"),
  col("quantity") -> col("B.quantity"),
  col("price") -> expr("B.price * 100"),
  col("state") -> col("B.state"))
val insertMap_u = Map(col("id") -> col("id"),
  col("name") -> col("name"),
  col("c_name") -> lit("insert"),
  col("quantity") -> col("quantity"),
  col("price") -> expr("price"),
  col("state") -> col("state"))
val insertMap_d = Map(col("id") -> col("id"),
  col("name") -> col("name"),
  col("c_name") -> lit("delete"),
  col("quantity") -> col("quantity"),
  col("price") -> expr("price"),
  col("state") -> col("state"))
matches ++= Seq(WhenMatched(Some(col("A.state") =!= col("B.state")))
  .addAction(UpdateAction(updateMap))
  .addAction(InsertInHistoryTableAction(insertMap_u, TableIdentifier("order_hist"))))
matches ++= Seq(WhenNotMatched().addAction(InsertAction(insertMap)))
matches ++= Seq(WhenNotMatchedAndExistsOnlyOnTarget()
  .addAction(DeleteAction())
  .addAction(InsertInHistoryTableAction(insertMap_d, TableIdentifier("order_hist"))))
{code}

SQL Queries :
{code}
sql("select count(*) from order").show()
sql("select count(*) from order where state = 2").show()
sql("select price from order where id = 'newid1'").show()
sql("select count(*) from order_hist where c_name = 'delete'").show()
sql("select count(*) from order_hist where c_name = 'insert'").show()
{code}

Results in spark-shell --master yarn
{code}
scala> sql("select count(*) from order").show()
+--------+
|count(1)|
+--------+
|      10|
+--------+

scala> sql("select count(*) from order where state = 2").show()
+--------+
|count(1)|
+--------+
|       0|
+--------+

scala> sql("select price from order where id = 'newid1'").show()
+-----+
|price|
+-----+
+-----+

scala> sql("select count(*) from order_hist where c_name = 'delete'").show()
+--------+
|count(1)|
+--------+
|       0|
+--------+

scala> sql("select count(*) from order_hist where c_name = 'insert'").show()
+--------+
|count(1)|
+--------+
|       0|
+--------+
{code}

Results in spark-shell --master local
{code}
scala> sql("select count(*) from order").show()
+--------+
|count(1)|
+--------+
|      10|
+--------+

scala> sql("select count(*) from order where state = 2").show()
+--------+
|count(1)|
+--------+
|       2|
+--------+

scala> sql("select price from order where id = 'newid1'").show()
+-----+
|price|
+-----+
| 7500|
+-----+

scala> sql("select count(*) from order_hist where c_name = 'delete'").show()
+--------+
|count(1)|
+--------+
|       2|
+--------+

scala> sql("select count(*) from order_hist where c_name =
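As a cross-check on the expected behaviour, the merge rules in the reproduction script can be simulated on plain collections without Spark or CarbonData. The sketch below is a minimal stand-in (class and variable names are hypothetical, not from the report); it applies the three match rules to the same data and reproduces the counts reported for --master local.

```java
import java.util.*;

// Plain-Java simulation of the merge in the report above (no Spark/CarbonData needed).
// Rows are int[]{price, state} keyed by id; history "tables" are lists of ids.
public class MergeSimulation {
  public static void main(String[] args) {
    // Target table "order": id1..id10, price = x*75, state = 1
    Map<String, int[]> target = new LinkedHashMap<>();
    for (int x = 1; x <= 10; x++) target.put("id" + x, new int[]{x * 75, 1});

    // Source "B": id3..id10 (state 2 when x <= 4, else 1), plus newid1, newid2
    Map<String, int[]> source = new LinkedHashMap<>();
    for (int x = 3; x <= 10; x++) source.put("id" + x, new int[]{x * 75, x <= 4 ? 2 : 1});
    for (int x = 1; x <= 2; x++) source.put("newid" + x, new int[]{x * 75, 1});

    List<String> histInsert = new ArrayList<>(); // order_hist rows with c_name = 'insert'
    List<String> histDelete = new ArrayList<>(); // order_hist rows with c_name = 'delete'

    for (Map.Entry<String, int[]> e : source.entrySet()) {
      int[] b = e.getValue();
      int[] a = target.get(e.getKey());
      if (a != null && a[1] != b[1]) {
        // WhenMatched(A.state =!= B.state): price = B.price + 1, state = B.state, log history
        target.put(e.getKey(), new int[]{b[0] + 1, b[1]});
        histInsert.add(e.getKey());
      } else if (a == null) {
        // WhenNotMatched: insert with price = B.price * 100
        target.put(e.getKey(), new int[]{b[0] * 100, b[1]});
      }
    }
    // WhenNotMatchedAndExistsOnlyOnTarget: delete from target and log history
    for (Iterator<String> it = target.keySet().iterator(); it.hasNext(); ) {
      String id = it.next();
      if (!source.containsKey(id)) { histDelete.add(id); it.remove(); }
    }

    System.out.println(target.size());                                                // 10
    System.out.println(target.values().stream().filter(r -> r[1] == 2).count());      // 2
    System.out.println(target.get("newid1")[0]);                                      // 7500
    System.out.println(histDelete.size());                                            // 2
    System.out.println(histInsert.size());                                            // 2
  }
}
```

The simulation matches the --master local results, which suggests local mode is the correct behaviour and yarn mode is losing the merge actions.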
[jira] [Updated] (CARBONDATA-3852) CCD Merge with Partition Table is giving different results in different spark deploy modes
[ https://issues.apache.org/jira/browse/CARBONDATA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sachin Ramachandra Setty updated CARBONDATA-3852: - Description: The result sets differ when the SQL queries are run in spark-shell --master local and spark-shell --master yarn (two different Spark deploy modes).

{code}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

val df1 = sc.parallelize(1 to 10, 4)
  .map { x => ("id" + x, s"order$x", s"customer$x", x * 10, x * 75, 1) }
  .toDF("id", "name", "c_name", "quantity", "price", "state")
df1.write.format("carbondata").option("tableName", "order").mode(SaveMode.Overwrite).save()
val dwframe = spark.read.format("carbondata").option("tableName", "order").load()
val dwSelframe = dwframe.as("A")

val ds1 = sc.parallelize(3 to 10, 4)
  .map { x =>
    if (x <= 4) {
      ("id" + x, s"order$x", s"customer$x", x * 10, x * 75, 2)
    } else {
      ("id" + x, s"order$x", s"customer$x", x * 10, x * 75, 1)
    }
  }.toDF("id", "name", "c_name", "quantity", "price", "state")
val ds2 = sc.parallelize(1 to 2, 4).map { x => ("newid" + x, s"order$x", s"customer$x", x * 10, x * 75, 1) }.toDS().toDF()
val ds3 = ds1.union(ds2)
val odsframe = ds3.as("B")

sql("drop table if exists target").show()
val initframe = spark.createDataFrame(Seq(
  Row("a", "0"), Row("b", "1"), Row("c", "2"), Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))
initframe.write
  .format("carbondata")
  .option("tableName", "target")
  .option("partitionColumns", "value")
  .mode(SaveMode.Overwrite)
  .save()
val target = spark.read.format("carbondata").option("tableName", "target").load()

var ccd = spark.createDataFrame(Seq(
  Row("a", "10", false, 0), Row("a", null, true, 1), Row("b", null, true, 2),
  Row("c", null, true, 3), Row("c", "20", false, 4), Row("c", "200", false, 5),
  Row("e", "100", false, 6)
).asJava, StructType(Seq(StructField("key", StringType), StructField("newValue", StringType),
  StructField("deleted", BooleanType), StructField("time", IntegerType))))
ccd.createOrReplaceTempView("changes")
ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
  whenMatched("B.deleted=false").
  updateExpr(updateMap).
  whenNotMatched("B.deleted=false").
  insertExpr(insertMap).
  whenMatched("B.deleted=true").
  delete().execute()
{code}

SQL Queries to run :
{code}
sql("select count(*) from target").show()
sql("select * from target order by key").show()
{code}

Results in spark-shell --master yarn
{code}
scala> sql("select count(*) from target").show()
+--------+
|count(1)|
+--------+
|       4|
+--------+

scala> sql("select * from target order by key").show()
+---+-----+
|key|value|
+---+-----+
|  a|    0|
|  b|    1|
|  c|    2|
|  d|    3|
+---+-----+
{code}

Results in spark-shell --master local
{code}
scala> sql("select count(*) from target").show()
+--------+
|count(1)|
+--------+
|       3|
+--------+

scala> sql("select * from target order by key").show()
+---+-----+
|key|value|
+---+-----+
|  c|  200|
|  d|    3|
|  e|  100|
+---+-----+
{code}
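As a cross-check on the expected behaviour, the CCD flow above (pick the latest change per key by time via `max(struct(time, newValue, deleted))`, then update/insert/delete) can be simulated without Spark. The plain-Java sketch below uses hypothetical names and reproduces the --master local result.

```java
import java.util.*;

// Plain-Java simulation of the CCD merge in the report above (no Spark/CarbonData needed).
public class CcdSimulation {
  public static void main(String[] args) {
    // Target table: key -> value (TreeMap so the final print is ordered by key)
    Map<String, String> target = new TreeMap<>(Map.of("a", "0", "b", "1", "c", "2", "d", "3"));

    // Change rows: {key, newValue, deleted, time} -- same data as the ccd DataFrame
    Object[][] changes = {
        {"a", "10", false, 0}, {"a", null, true, 1}, {"b", null, true, 2},
        {"c", null, true, 3}, {"c", "20", false, 4}, {"c", "200", false, 5},
        {"e", "100", false, 6}};

    // Equivalent of: SELECT key, max(struct(time, newValue, deleted)) ... GROUP BY key
    Map<String, Object[]> latest = new HashMap<>();
    for (Object[] row : changes) {
      Object[] prev = latest.get((String) row[0]);
      if (prev == null || (int) row[3] > (int) prev[3]) latest.put((String) row[0], row);
    }

    // whenMatched(deleted=false) -> update; whenNotMatched(deleted=false) -> insert;
    // whenMatched(deleted=true) -> delete
    for (Object[] row : latest.values()) {
      String key = (String) row[0];
      boolean deleted = (boolean) row[2];
      if (target.containsKey(key)) {
        if (deleted) target.remove(key);
        else target.put(key, (String) row[1]);
      } else if (!deleted) {
        target.put(key, (String) row[1]);
      }
    }

    System.out.println(target); // {c=200, d=3, e=100} -- matches --master local
  }
}
```

The simulation agrees with the --master local output (3 rows: c=200, d=3, e=100), again pointing at yarn mode as the deploy mode that drops the merge actions on partitioned tables.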
[GitHub] [carbondata] kunal642 commented on a change in pull request #3772: [CARBONDATA-3832]Added block and blocket pruning for the polygon expression processing
kunal642 commented on a change in pull request #3772: URL: https://github.com/apache/carbondata/pull/3772#discussion_r438837210

## File path: geo/src/main/java/org/apache/carbondata/geo/scan/filter/executor/PolygonFilterExecutorImpl.java
@@ -0,0 +1,84 @@
+package org.apache.carbondata.geo.scan.filter.executor;
+
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.GenericQueryType;
+import org.apache.carbondata.core.scan.filter.executer.RowLevelFilterExecuterImpl;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.MeasureColumnResolvedFilterInfo;
+import org.apache.carbondata.core.scan.processor.RawBlockletColumnChunks;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.geo.scan.expression.PolygonExpression;
+
+import org.apache.log4j.Logger;
+
+public class PolygonFilterExecutorImpl extends RowLevelFilterExecuterImpl {
+  public PolygonFilterExecutorImpl(List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,
+      List<MeasureColumnResolvedFilterInfo> msrColEvalutorInfoList, Expression exp,
+      AbsoluteTableIdentifier tableIdentifier, SegmentProperties segmentProperties,
+      Map<Integer, GenericQueryType> complexDimensionInfoMap) {
+    super(dimColEvaluatorInfoList, msrColEvalutorInfoList, exp, tableIdentifier, segmentProperties,
+        complexDimensionInfoMap);
+  }
+
+  private int getNearestRangeIndex(List<Long[]> ranges, long searchForNumber) {
+    Long[] range;
+    int low = 0, mid = 0, high = ranges.size() - 1;
+    while (low <= high) {
+      mid = low + ((high - low) / 2);
+      range = ranges.get(mid);
+      if (searchForNumber >= range[0]) {
+        if (searchForNumber <= range[1]) {
+          // Return the range index if the number is between min and max values of the range
+          return mid;
+        } else {
+          // Number is bigger than this range's min and max. Search on the right side of the range
+          low = mid + 1;
+        }
+      } else {
+        // Number is smaller than this range's min and max. Search on the left side of the range
+        high = mid - 1;
+      }
+    }
+    return mid;
+  }
+
+  private boolean isScanRequired(byte[] maxValue, byte[] minValue) {

Review comment: Please write a detailed explanation for the logic

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
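The pruning idea behind the diff above is a binary search over a sorted list of [min, max] ranges. The standalone sketch below (plain Java, mirroring `getNearestRangeIndex` from the diff; the class name and sample ranges are hypothetical) shows how it finds the containing range, or the nearest range probed when the value falls between ranges.

```java
import java.util.Arrays;
import java.util.List;

public class NearestRangeSearch {
  // Binary search over sorted, non-overlapping [min, max] ranges. Returns the index of the
  // range containing searchForNumber, or the last probed index when no range contains it.
  static int getNearestRangeIndex(List<Long[]> ranges, long searchForNumber) {
    int low = 0, mid = 0, high = ranges.size() - 1;
    while (low <= high) {
      mid = low + ((high - low) / 2);
      Long[] range = ranges.get(mid);
      if (searchForNumber >= range[0]) {
        if (searchForNumber <= range[1]) {
          return mid;      // number lies inside this range
        }
        low = mid + 1;     // number is above this range: search the right half
      } else {
        high = mid - 1;    // number is below this range: search the left half
      }
    }
    return mid;            // no containing range: index where the search stopped
  }

  public static void main(String[] args) {
    List<Long[]> ranges = Arrays.asList(
        new Long[]{10L, 20L}, new Long[]{30L, 40L}, new Long[]{50L, 60L});
    System.out.println(getNearestRangeIndex(ranges, 35)); // inside [30,40] -> 1
    System.out.println(getNearestRangeIndex(ranges, 25)); // between ranges -> last probed, 0
    System.out.println(getNearestRangeIndex(ranges, 55)); // inside [50,60] -> 2
  }
}
```

Presumably `isScanRequired` can then skip a blocklet when neither its min nor its max value falls in (or spans) any polygon range; that is an inference about the intent, since the diff is truncated before that method's body — which is exactly why the reviewer asks for a detailed explanation of the logic.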
[jira] [Created] (CARBONDATA-3853) Dataload fails for date column configured as BUCKET_COLUMNS
Chetan Bhat created CARBONDATA-3853: --- Summary: Dataload fails for date column configured as BUCKET_COLUMNS Key: CARBONDATA-3853 URL: https://issues.apache.org/jira/browse/CARBONDATA-3853 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 2.0.0 Reporter: Chetan Bhat

Steps and Issue

0: jdbc:hive2://10.20.255.171:23040/> create table if not exists all_data_types1(bool_1 boolean,bool_2 boolean,chinese string,Number int,smallNumber smallint,BigNumber bigint,LargeDecimal double,smalldecimal float,customdecimal decimal(38,15),words string,smallwords char(8),varwords varchar(20),time timestamp,day date,emptyNumber int,emptysmallNumber smallint,emptyBigNumber bigint,emptyLargeDecimal double,emptysmalldecimal float,emptycustomdecimal decimal(38,38),emptywords string,emptysmallwords char(8),emptyvarwords varchar(20)) stored as carbondata TBLPROPERTIES ('BUCKET_NUMBER'='2', *'BUCKET_COLUMNS'='day'*);
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.494 seconds)

0: jdbc:hive2://10.20.255.171:23040/> LOAD DATA INPATH 'hdfs://hacluster/chetan/datafile_0.csv' into table all_data_types1 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='bool_1 ,bool_2 ,chinese ,Number ,smallNumber ,BigNumber ,LargeDecimal ,smalldecimal ,customdecimal,words ,smallwords ,varwords ,time ,day ,emptyNumber ,emptysmallNumber ,emptyBigNumber ,emptyLargeDecimal ,emptysmalldecimal,emptycustomdecimal ,emptywords ,emptysmallwords ,emptyvarwords');
*Error: java.lang.Exception: DataLoad failure (state=,code=0)*
*Log-*

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3853) Dataload fails for date column configured as BUCKET_COLUMNS
[ https://issues.apache.org/jira/browse/CARBONDATA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-3853: Description:

Steps and Issue

0: jdbc:hive2://10.20.255.171:23040/> create table if not exists all_data_types1(bool_1 boolean,bool_2 boolean,chinese string,Number int,smallNumber smallint,BigNumber bigint,LargeDecimal double,smalldecimal float,customdecimal decimal(38,15),words string,smallwords char(8),varwords varchar(20),time timestamp,day date,emptyNumber int,emptysmallNumber smallint,emptyBigNumber bigint,emptyLargeDecimal double,emptysmalldecimal float,emptycustomdecimal decimal(38,38),emptywords string,emptysmallwords char(8),emptyvarwords varchar(20)) stored as carbondata TBLPROPERTIES ('BUCKET_NUMBER'='2', *'BUCKET_COLUMNS'='day'*);
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.494 seconds)

0: jdbc:hive2://10.20.255.171:23040/> LOAD DATA INPATH 'hdfs://hacluster/chetan/datafile_0.csv' into table all_data_types1 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='bool_1 ,bool_2 ,chinese ,Number ,smallNumber ,BigNumber ,LargeDecimal ,smalldecimal ,customdecimal,words ,smallwords ,varwords ,time ,day ,emptyNumber ,emptysmallNumber ,emptyBigNumber ,emptyLargeDecimal ,emptysmalldecimal,emptycustomdecimal ,emptywords ,emptysmallwords ,emptyvarwords');
*Error: java.lang.Exception: DataLoad failure (state=,code=0)*

*Log-*
java.lang.Exception: DataLoad failure
    at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:560)
    at org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadData(CarbonLoadDataCommand.scala:207)
    at org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:168)
    at org.apache.spark.sql.execution.command.AtomicRunnableCommand$$anonfun$run$3.apply(package.scala:148)
    at org.apache.spark.sql.execution.command.AtomicRunnableCommand$$anonfun$run$3.apply(package.scala:145)
    at org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:104)
    at org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:141)
    at org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:145)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:232)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

2020-06-11 23:47:24,973 | ERROR | [HiveServer2-Background-Pool: Thread-104] | Error running hive query: | org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:179)
org.apache.hive.service.cli.HiveSQLException: java.lang.Exception: DataLoad failure at
[jira] [Created] (CARBONDATA-3854) Quote char support to unprintable character like \u0009 \u0010
Mahesh Raju Somalaraju created CARBONDATA-3854: -- Summary: Quote char support to unprintable character like \u0009 \u0010 Key: CARBONDATA-3854 URL: https://issues.apache.org/jira/browse/CARBONDATA-3854 Project: CarbonData Issue Type: New Feature Reporter: Mahesh Raju Somalaraju

Quote char support for unprintable characters like \u0009 and \u0010. Currently, CarbonData does not support setting the quote char to an unprintable character such as \u0009: the current behaviour is that the quote char throws an exception if more than one character is given. The quote char needs to support more than one character, the same as the delimiter.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] maheshrajus opened a new pull request #3790: [CARBONDATA-3854] Quotechar support to more than one character
maheshrajus opened a new pull request #3790: URL: https://github.com/apache/carbondata/pull/3790 ### Why is this PR needed? The quote char needs to support more than one character, the same as the **delimiter**, including unprintable characters like \u0009 and \u0010. Currently, CarbonData does not support setting the quote char to an unprintable character such as \u0009: the current behaviour is that the quote char throws an exception if more than one character is given. ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3790: [CARBONDATA-3854] Quotechar support to more than one character
CarbonDataQA1 commented on pull request #3790: URL: https://github.com/apache/carbondata/pull/3790#issuecomment-642882015 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1423/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3790: [CARBONDATA-3854] Quotechar support to more than one character
CarbonDataQA1 commented on pull request #3790: URL: https://github.com/apache/carbondata/pull/3790#issuecomment-642881814 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3147/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org