[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422205#comment-17422205 ] Bigicecream commented on CARBONDATA-4279:
--
[~Indhumathi27] Sorry for the slow response.
1. No. These are the logs:
{noformat}
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/livy/filecache/48/__spark_libs__3665716770347383703.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/09/29 15:44:36 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 18902@ip-10-4-181-156
21/09/29 15:44:37 INFO SignalUtils: Registered signal handler for TERM
21/09/29 15:44:37 INFO SignalUtils: Registered signal handler for HUP
21/09/29 15:44:37 INFO SignalUtils: Registered signal handler for INT
21/09/29 15:44:37 INFO SecurityManager: Changing view acls to: yarn,livy
21/09/29 15:44:37 INFO SecurityManager: Changing modify acls to: yarn,livy
21/09/29 15:44:37 INFO SecurityManager: Changing view acls groups to:
21/09/29 15:44:37 INFO SecurityManager: Changing modify acls groups to:
21/09/29 15:44:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, livy); groups with view permissions: Set(); users with modify permissions: Set(yarn, livy); groups with modify permissions: Set()
21/09/29 15:44:38 INFO TransportClientFactory: Successfully created connection to ip-10-4-137-125.eu-west-1.compute.internal/10.4.137.125:34545 after 78 ms (0 ms spent in bootstraps)
21/09/29 15:44:38 INFO SecurityManager: Changing view acls to: yarn,livy
21/09/29 15:44:38 INFO SecurityManager: Changing modify acls to: yarn,livy
21/09/29 15:44:38 INFO SecurityManager: Changing view acls groups to:
21/09/29 15:44:38 INFO SecurityManager: Changing modify acls groups to:
21/09/29 15:44:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, livy); groups with view permissions: Set(); users with modify permissions: Set(yarn, livy); groups with modify permissions: Set()
21/09/29 15:44:38 INFO TransportClientFactory: Successfully created connection to ip-10-4-137-125.eu-west-1.compute.internal/10.4.137.125:34545 after 1 ms (0 ms spent in bootstraps)
21/09/29 15:44:38 INFO DiskBlockManager: Created local directory at /mnt2/yarn/usercache/livy/appcache/application_1632902169938_0005/blockmgr-5aa03748-2d6d-4c78-9da5-1ef0e23cc506
21/09/29 15:44:38 INFO DiskBlockManager: Created local directory at /mnt1/yarn/usercache/livy/appcache/application_1632902169938_0005/blockmgr-2dba9cef-1782-4baa-a13f-fe379e090118
21/09/29 15:44:38 INFO DiskBlockManager: Created local directory at /mnt/yarn/usercache/livy/appcache/application_1632902169938_0005/blockmgr-d279178b-8dc9-4319-a64c-1e5bad11fe29
21/09/29 15:44:38 INFO MemoryStore: MemoryStore started with capacity 4.0 GB
21/09/29 15:44:38 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://coarsegrainedschedu...@ip-10-4-137-125.eu-west-1.compute.internal:34545
21/09/29 15:44:38 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
21/09/29 15:44:38 INFO Executor: Starting executor ID 4 on host ip-10-4-181-156.eu-west-1.compute.internal
21/09/29 15:44:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38947.
21/09/29 15:44:38 INFO NettyBlockTransferService: Server created on ip-10-4-181-156.eu-west-1.compute.internal:38947
21/09/29 15:44:38 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/09/29 15:44:38 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(4, ip-10-4-181-156.eu-west-1.compute.internal, 38947, None)
21/09/29 15:44:38 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(4, ip-10-4-181-156.eu-west-1.compute.internal, 38947, None)
21/09/29 15:44:38 INFO BlockManager: external shuffle service port = 7337
21/09/29 15:44:38 INFO BlockManager: Registering executor with local external shuffle service.
21/09/29 15:44:38 INFO TransportClientFactory: Successfully created connection to ip-10-4-181-156.eu-west-1.compute.internal/10.4.181.156:7337 after 2 ms (0 ms spent in bootstraps)
21/09/29 15:44:38 INFO BlockManager: Initialized BlockManager: BlockManagerId(4, ip-10-4-181-156.eu-west-1.compute.internal, 38947, None)
21/09/29 15:44:38 INFO Executor: Using REPL class URI: spark://ip-10-4-137-125.eu-west-1.compute.internal:34545/classes
21/09/29 15:44:38 INFO CoarseGrainedExecutorBackend: Got assigned task 1
21/09/29
{noformat}
[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418713#comment-17418713 ] Indhumathi Muthumurugesh commented on CARBONDATA-4279:
--
Thanks for the clarification. So the initial issue (as mentioned in the description) is without the LOCATION keyword. Please check the following:
# Did any exception occur during the insert?
# Does the scenario work fine with a non-partition table?

> Insert data to table with partitions resulting in 'Marked for Delete' segment in Spark in EMR
> ---
>
> Key: CARBONDATA-4279
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4279
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.3.0
> Environment: Release label: emr-5.24.1
> Hadoop distribution: Amazon 2.8.5
> Applications: Hue 4.4.0, Spark 2.4.5, JupyterHub 0.9.6
> Jar compiled with:
> apache-carbondata:2.3.0-SNAPSHOT
> spark:2.4.5
> hadoop:2.8.3
> Reporter: Bigicecream
> Priority: Blocker
>
> As described [here|https://github.com/apache/carbondata/issues/4212].
> After the commit [https://github.com/apache/carbondata/commit/42f69827e0a577b6128417104c0a49cd5bf21ad7]
> I have successfully created a table with partitions, but when I try to insert data the job ends with success, yet the segment is marked as "Marked for Delete".
> I am running:
> {code:sql}
> CREATE TABLE lior_carbon_tests.mark_for_del_bug(
>   timestamp string,
>   name string
> )
> STORED AS carbondata
> PARTITIONED BY (dt string, hr string)
> {code}
> {code:sql}
> INSERT INTO lior_carbon_tests.mark_for_del_bug select
> '2021-07-07T13:23:56.012+00:00','spark','2021-07-07','13'
> {code}
> {code:sql}
> select * from lior_carbon_tests.mark_for_del_bug
> {code}
> gives:
> {code:java}
> +---------+----+---+---+
> |timestamp|name| dt| hr|
> +---------+----+---+---+
> +---------+----+---+---+
> {code}
> And
> {code:java}
> show segments for TABLE lior_carbon_tests.mark_for_del_bug
> {code}
> gives:
> {code:java}
> +---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
> |ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
> +---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
> |0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
> +---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
> {code}
> I took a look at the folder structure in S3 and it seems fine.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
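For reference, the stale segment shown in the description can be removed with CarbonData's segment-management DDL. A hedged sketch (this only cleans up segments already in 'Marked for Delete' state; it does not explain why the fresh load was marked):
{code:sql}
-- Sketch: physically remove segments already marked for delete.
-- Cleanup only; it does not address the root cause of this bug.
CLEAN FILES FOR TABLE lior_carbon_tests.mark_for_del_bug
{code}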
[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418708#comment-17418708 ] Bigicecream commented on CARBONDATA-4279:
--
I think I confused you. The bug I opened happens when I am *not* using the LOCATION keyword in the table creation, and I do not get any error when inserting the data. In your first comment you suggested adding the LOCATION keyword; after doing that, I started getting errors when inserting data into the table and when running _'show segments'_. I think it is a different bug, but maybe it is related, so I will answer for each case:

Without LOCATION keyword:
# Yes
# Without Location keyword
# I try to insert 4 columns (timestamp and name are normal columns, dt and hr are partitions); it works fine
# It works the same when the database is created without specifying LOCATION

With LOCATION keyword:
# Yes
# With Location keyword
# I try to insert 4 columns (timestamp and name are normal columns, dt and hr are partitions); I get an error when I do that. The stack trace of the error:
{code:java}
org.apache.spark.sql.AnalysisException: Cannot insert into target table because number of columns mismatch;
  at org.apache.spark.sql.util.CarbonException$.analysisException(CarbonException.scala:23)
  at org.apache.spark.sql.hive.CarbonPreInsertionCasts.castChildOutput(CarbonAnalysisRules.scala:330)
  at org.apache.spark.sql.hive.CarbonPreInsertionCasts$$anonfun$apply$3.applyOrElse(CarbonAnalysisRules.scala:261)
  at org.apache.spark.sql.hive.CarbonPreInsertionCasts$$anonfun$apply$3.applyOrElse(CarbonAnalysisRules.scala:253)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:286)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:286)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:71)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:285)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:275)
  at org.apache.spark.sql.hive.CarbonPreInsertionCasts.apply(CarbonAnalysisRules.scala:253)
  at org.apache.spark.sql.hive.CarbonPreInsertionCasts.apply(CarbonAnalysisRules.scala:251)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
  at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:91)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:88)
  at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
  at scala.collection.immutable.List.foldLeft(List.scala:84)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:88)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:80)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:164)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
  at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withLocalMetrics(Analyzer.scala:104)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:155)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:126)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:125)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:125)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
  at
{code}
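One hedged check for the column-mismatch error above (an assumption, not verified on this cluster): with a static partition spec, the SELECT list supplies only the non-partition columns, which sidesteps any disagreement over the expected column count for the partitioned table:
{code:sql}
-- Hypothetical variant: dt and hr given as static partition values,
-- so the SELECT provides only the two non-partition columns.
INSERT INTO lior_carbon_tests.mark_for_del_bug
PARTITION (dt='2021-07-07', hr='13')
SELECT '2021-07-07T13:23:56.012+00:00', 'spark'
{code}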
[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418667#comment-17418667 ] Indhumathi Muthumurugesh commented on CARBONDATA-4279:
--
Hi, the steps that you have mentioned in the description and in the comment are different.
[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418665#comment-17418665 ] Bigicecream commented on CARBONDATA-4279:
--
[~Indhumathi27] Is this what you meant?
[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418575#comment-17418575 ] Bigicecream commented on CARBONDATA-4279:
--
Sure. Without LOCATION in the CREATE TABLE:
{code:java}
+-----------------------------+------------------------------------------------------------+-------+
|col_name                     |data_type                                                   |comment|
+-----------------------------+------------------------------------------------------------+-------+
|timestamp                    |string                                                      |null   |
|name                         |string                                                      |null   |
|dt                           |string                                                      |null   |
|hr                           |string                                                      |null   |
|                             |                                                            |       |
|## Detailed Table Information|                                                            |       |
|Database                     |lior_carbon_tests                                           |       |
|Table                        |mark_for_del_bug                                            |       |
|Owner                        |livy                                                        |       |
|Created                      |Wed Sep 22 11:51:40 UTC 2021                                |       |
|Location                     |s3a://coralogix-bigicecream/CarbonDataTests/mark_for_del_bug|       |
|External                     |false                                                       |       |
|Transactional                |true                                                        |       |
|Streaming                    |false                                                       |       |
|Table Block Size             |1024 MB                                                     |       |
|Table Blocklet Size          |64 MB                                                       |       |
|Comment                      |                                                            |       |
|Bad Record Path              |                                                            |       |
|Date Format                  |                                                            |       |
|Timestamp Format             |                                                            |       |
+-----------------------------+------------------------------------------------------------+-------+
{code}
With LOCATION in the CREATE TABLE:
{code:java}
+-----------------------------+-------------------------------------------+-------+
|col_name                     |data_type                                  |comment|
+-----------------------------+-------------------------------------------+-------+
|timestamp                    |string                                     |null   |
|name                         |string                                     |null   |
|dt                           |string                                     |null   |
|hr                           |string                                     |null   |
|                             |                                           |       |
|## Detailed Table Information|                                           |       |
|Database                     |lior_carbon_tests                          |       |
|Table                        |mark_for_del_bug                           |       |
|Owner                        |livy                                       |       |
|Created                      |Wed Sep 22 12:43:04 UTC 2021               |       |
|Location                     |s3a://coralogix-bigicecream/CarbonDataTests|       |
|External                     |true                                       |       |
|Transactional                |false                                      |       |
|Streaming                    |false                                      |       |
|Table Block Size             |1024 MB                                    |       |
|Table Blocklet Size          |64 MB                                      |       |
|Comment                      |                                           |       |
|Bad Record Path              |                                           |       |
|Date Format                  |                                           |       |
|Timestamp Format             |                                           |       |
+-----------------------------+-------------------------------------------+-------+
{code}
[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418560#comment-17418560 ] Indhumathi Muthumurugesh commented on CARBONDATA-4279:
--
Can you please share the 'describe formatted' results for the table?
[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418555#comment-17418555 ] Bigicecream commented on CARBONDATA-4279:
--
[~Indhumathi27] It lets me create the table, but then I get many errors when trying to work with it. For example: I cannot insert the data (I get '_Cannot insert into target table because number of columns mismatch_'), and I cannot see the segments (as I said before).
[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418478#comment-17418478 ] Indhumathi Muthumurugesh commented on CARBONDATA-4279:
--
Does create table with LOCATION not work in your cluster?
[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418476#comment-17418476 ] Bigicecream commented on CARBONDATA-4279:
--
[~Indhumathi27] Hi, I specify the location via 'spark.sql.warehouse.dir' (I set it when running the spark-shell). Setting LOCATION in the table creation:
{code:sql}
CREATE TABLE lior_carbon_tests.mark_for_del_bug(
  timestamp string,
  name string
)
STORED AS carbondata
PARTITIONED BY (dt string, hr string)
LOCATION 's3a://bla/CarbonDataTests'
{code}
causes:
{code:java}
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: Unsupported operation on non transactional table
{code}
when running:
{code:sql}
show segments for TABLE lior_carbon_tests.mark_for_del_bug
{code}
I will take a look at the tests. It is strange that it doesn't work on my cluster.
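The 'Unsupported operation on non transactional table' error above is consistent with the describe output shared elsewhere in this thread, where the table created with LOCATION shows External=true and Transactional=false, and segment management only applies to transactional tables. A hedged way to confirm which mode a table came up in:
{code:sql}
-- Sketch: check the External and Transactional rows to see whether
-- segment management (and thus SHOW SEGMENTS) applies to this table.
DESCRIBE FORMATTED lior_carbon_tests.mark_for_del_bug
{code}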
[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417549#comment-17417549 ] Indhumathi Muthumurugesh commented on CARBONDATA-4279:
--
Hi, I have the following question for this JIRA:
# Is the table created with `LOCATION '' ` or not?