[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-29 Thread Bigicecream (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422205#comment-17422205
 ] 

Bigicecream commented on CARBONDATA-4279:
-----------------------------------------

[~Indhumathi27] 
 Sorry for the slow response.

 

1. No.
 These are the logs:
{noformat}
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/mnt/yarn/usercache/livy/filecache/48/__spark_libs__3665716770347383703.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/09/29 15:44:36 INFO CoarseGrainedExecutorBackend: Started daemon with 
process name: 18902@ip-10-4-181-156
21/09/29 15:44:37 INFO SignalUtils: Registered signal handler for TERM
21/09/29 15:44:37 INFO SignalUtils: Registered signal handler for HUP
21/09/29 15:44:37 INFO SignalUtils: Registered signal handler for INT
21/09/29 15:44:37 INFO SecurityManager: Changing view acls to: yarn,livy
21/09/29 15:44:37 INFO SecurityManager: Changing modify acls to: yarn,livy
21/09/29 15:44:37 INFO SecurityManager: Changing view acls groups to: 
21/09/29 15:44:37 INFO SecurityManager: Changing modify acls groups to: 
21/09/29 15:44:37 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(yarn, livy); 
groups with view permissions: Set(); users  with modify permissions: Set(yarn, 
livy); groups with modify permissions: Set()
21/09/29 15:44:38 INFO TransportClientFactory: Successfully created connection 
to ip-10-4-137-125.eu-west-1.compute.internal/10.4.137.125:34545 after 78 ms (0 
ms spent in bootstraps)
21/09/29 15:44:38 INFO SecurityManager: Changing view acls to: yarn,livy
21/09/29 15:44:38 INFO SecurityManager: Changing modify acls to: yarn,livy
21/09/29 15:44:38 INFO SecurityManager: Changing view acls groups to: 
21/09/29 15:44:38 INFO SecurityManager: Changing modify acls groups to: 
21/09/29 15:44:38 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(yarn, livy); 
groups with view permissions: Set(); users  with modify permissions: Set(yarn, 
livy); groups with modify permissions: Set()
21/09/29 15:44:38 INFO TransportClientFactory: Successfully created connection 
to ip-10-4-137-125.eu-west-1.compute.internal/10.4.137.125:34545 after 1 ms (0 
ms spent in bootstraps)
21/09/29 15:44:38 INFO DiskBlockManager: Created local directory at 
/mnt2/yarn/usercache/livy/appcache/application_1632902169938_0005/blockmgr-5aa03748-2d6d-4c78-9da5-1ef0e23cc506
21/09/29 15:44:38 INFO DiskBlockManager: Created local directory at 
/mnt1/yarn/usercache/livy/appcache/application_1632902169938_0005/blockmgr-2dba9cef-1782-4baa-a13f-fe379e090118
21/09/29 15:44:38 INFO DiskBlockManager: Created local directory at 
/mnt/yarn/usercache/livy/appcache/application_1632902169938_0005/blockmgr-d279178b-8dc9-4319-a64c-1e5bad11fe29
21/09/29 15:44:38 INFO MemoryStore: MemoryStore started with capacity 4.0 GB
21/09/29 15:44:38 INFO CoarseGrainedExecutorBackend: Connecting to driver: 
spark://coarsegrainedschedu...@ip-10-4-137-125.eu-west-1.compute.internal:34545
21/09/29 15:44:38 INFO CoarseGrainedExecutorBackend: Successfully registered 
with driver
21/09/29 15:44:38 INFO Executor: Starting executor ID 4 on host 
ip-10-4-181-156.eu-west-1.compute.internal
21/09/29 15:44:38 INFO Utils: Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 38947.
21/09/29 15:44:38 INFO NettyBlockTransferService: Server created on 
ip-10-4-181-156.eu-west-1.compute.internal:38947
21/09/29 15:44:38 INFO BlockManager: Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy
21/09/29 15:44:38 INFO BlockManagerMaster: Registering BlockManager 
BlockManagerId(4, ip-10-4-181-156.eu-west-1.compute.internal, 38947, None)
21/09/29 15:44:38 INFO BlockManagerMaster: Registered BlockManager 
BlockManagerId(4, ip-10-4-181-156.eu-west-1.compute.internal, 38947, None)
21/09/29 15:44:38 INFO BlockManager: external shuffle service port = 7337
21/09/29 15:44:38 INFO BlockManager: Registering executor with local external 
shuffle service.
21/09/29 15:44:38 INFO TransportClientFactory: Successfully created connection 
to ip-10-4-181-156.eu-west-1.compute.internal/10.4.181.156:7337 after 2 ms (0 
ms spent in bootstraps)
21/09/29 15:44:38 INFO BlockManager: Initialized BlockManager: 
BlockManagerId(4, ip-10-4-181-156.eu-west-1.compute.internal, 38947, None)
21/09/29 15:44:38 INFO Executor: Using REPL class URI: 
spark://ip-10-4-137-125.eu-west-1.compute.internal:34545/classes
21/09/29 15:44:38 INFO CoarseGrainedExecutorBackend: Got assigned task 1
21/09/29 
{noformat}

[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418713#comment-17418713
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4279:
------------------------------------------------------

Thanks for the clarification.

So, now the initial issue (as mentioned in the description) is without the 
LOCATION keyword.

Please check the following:
 # Did any exception occur during the insert?
 # Does the scenario work fine with a non-partitioned table?

> Insert data to table with a partitions resulting in 'Marked for Delete' 
> segment in Spark in EMR
> -------------------------------------------------------------------------
>
> Key: CARBONDATA-4279
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4279
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.3.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hue 4.4.0, Spark 2.4.5,JupyterHub 0.9.6
> Jar complied with:
> apache-carbondata:2.3.0-SNAPSHOT
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Blocker
>
> as described [here|https://github.com/apache/carbondata/issues/4212]
> After the commit 
> [https://github.com/apache/carbondata/commit/42f69827e0a577b6128417104c0a49cd5bf21ad7]
> I have successfully created a table with partitions, but when I try to insert 
> data, the job ends with success,
>  but the segment is marked as "Marked for Delete"
> I am running:
> {code:sql}
> CREATE TABLE lior_carbon_tests.mark_for_del_bug(
> timestamp string,
> name string
> )
> STORED AS carbondata
> PARTITIONED BY (dt string, hr string)
> {code}
> {code:sql}
> INSERT INTO lior_carbon_tests.mark_for_del_bug select 
> '2021-07-07T13:23:56.012+00:00','spark','2021-07-07','13'
> {code}
> {code:sql}
> select * from lior_carbon_tests.mark_for_del_bug
> {code}
> gives:
> {code:java}
> +-++---+---+
> |timestamp|name| dt| hr|
> +-++---+---+
> +-++---+---+
> {code}
> And
> {code:java}
> show segments for TABLE lior_carbon_tests.mark_for_del_bug
> {code}
> gives
>  
> {code:java}
> +---+-+---+---+-+-+--+---+
> |ID |Status   |Load Start Time|Load Time Taken|Partition|Data 
> Size|Index Size|File Format|
> +---+-+---+---+-+-+--+---+
> |0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S|NA   |NA   
> |NA|columnar_v3|
> +---+-+---+---+-+-+--+---+
> {code}
>  
> I took a look at the folder structure in S3 and it seems fine



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Bigicecream (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418708#comment-17418708
 ] 

Bigicecream commented on CARBONDATA-4279:
-----------------------------------------

I think I confused you.

The bug I opened happens when I am not using the LOCATION keyword in the 
table creation, and I am not getting any error when inserting the data.

 

In your first comment, you suggested trying to add the LOCATION keyword. After 
doing that, I started getting errors when inserting data into the table and when 
running _'show segments'_. I think it is a different bug, but maybe it is related.

 

So I will answer each case:

Without the LOCATION keyword:
 # Yes
 # Without the LOCATION keyword
 # I try to insert 4 columns (timestamp and name are normal columns, and dt and 
hr are partitions); it works fine
 # It works the same when the database is not created with a LOCATION specified

With the LOCATION keyword:
 # Yes
 # With the LOCATION keyword
 # I try to insert 4 columns (timestamp and name are normal columns, and dt and 
hr are partitions)
I get an error when I do that.
 The stack trace of the error:
{code:java}
org.apache.spark.sql.AnalysisException: Cannot insert into target table because 
number of columns mismatch;
  at 
org.apache.spark.sql.util.CarbonException$.analysisException(CarbonException.scala:23)
  at 
org.apache.spark.sql.hive.CarbonPreInsertionCasts.castChildOutput(CarbonAnalysisRules.scala:330)
  at 
org.apache.spark.sql.hive.CarbonPreInsertionCasts$$anonfun$apply$3.applyOrElse(CarbonAnalysisRules.scala:261)
  at 
org.apache.spark.sql.hive.CarbonPreInsertionCasts$$anonfun$apply$3.applyOrElse(CarbonAnalysisRules.scala:253)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:286)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:286)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:71)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:285)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:275)
  at 
org.apache.spark.sql.hive.CarbonPreInsertionCasts.apply(CarbonAnalysisRules.scala:253)
  at 
org.apache.spark.sql.hive.CarbonPreInsertionCasts.apply(CarbonAnalysisRules.scala:251)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
  at 
org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:91)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:88)
  at 
scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
  at scala.collection.immutable.List.foldLeft(List.scala:84)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:88)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:80)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:164)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
  at 
org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withLocalMetrics(Analyzer.scala:104)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:155)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:126)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:125)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:125)
  at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
  at 
{code}

[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418667#comment-17418667
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4279:
------------------------------------------------------

Hi, the steps that you have mentioned in the description and in the comment are 
different.



[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Bigicecream (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418665#comment-17418665
 ] 

Bigicecream commented on CARBONDATA-4279:
-----------------------------------------

[~Indhumathi27]

Is this what you meant?



[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Bigicecream (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418575#comment-17418575
 ] 

Bigicecream commented on CARBONDATA-4279:
-----------------------------------------

Sure.
 Without LOCATION in the create table:
{code:java}
+-----------------------------+------------------------------------------------------------+-------+
|col_name                     |data_type                                                   |comment|
+-----------------------------+------------------------------------------------------------+-------+
|timestamp                    |string                                                      |null   |
|name                         |string                                                      |null   |
|dt                           |string                                                      |null   |
|hr                           |string                                                      |null   |
|                             |                                                            |       |
|## Detailed Table Information|                                                            |       |
|Database                     |lior_carbon_tests                                           |       |
|Table                        |mark_for_del_bug                                            |       |
|Owner                        |livy                                                        |       |
|Created                      |Wed Sep 22 11:51:40 UTC 2021                                |       |
|Location                     |s3a://coralogix-bigicecream/CarbonDataTests/mark_for_del_bug|       |
|External                     |false                                                       |       |
|Transactional                |true                                                        |       |
|Streaming                    |false                                                       |       |
|Table Block Size             |1024 MB                                                     |       |
|Table Blocklet Size          |64 MB                                                       |       |
|Comment                      |                                                            |       |
|Bad Record Path              |                                                            |       |
|Date Format                  |                                                            |       |
|Timestamp Format             |                                                            |       |
+-----------------------------+------------------------------------------------------------+-------+
{code}

with LOCATION in the create table:

{code:java}
+-----------------------------+-------------------------------------------+-------+
|col_name                     |data_type                                  |comment|
+-----------------------------+-------------------------------------------+-------+
|timestamp                    |string                                     |null   |
|name                         |string                                     |null   |
|dt                           |string                                     |null   |
|hr                           |string                                     |null   |
|                             |                                           |       |
|## Detailed Table Information|                                           |       |
|Database                     |lior_carbon_tests                          |       |
|Table                        |mark_for_del_bug                           |       |
|Owner                        |livy                                       |       |
|Created                      |Wed Sep 22 12:43:04 UTC 2021               |       |
|Location                     |s3a://coralogix-bigicecream/CarbonDataTests|       |
|External                     |true                                       |       |
|Transactional                |false                                      |       |
|Streaming                    |false                                      |       |
|Table Block Size             |1024 MB                                    |       |
|Table Blocklet Size          |64 MB                                      |       |
|Comment                      |                                           |       |
|Bad Record Path              |                                           |       |
|Date Format                  |                                           |       |
|Timestamp Format             |                                           |       |
+-----------------------------+-------------------------------------------+-------+
{code}



[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418560#comment-17418560
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4279:
------------------------------------------------------

Can you please share the 'describe formatted' table results?



[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Bigicecream (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418555#comment-17418555
 ] 

Bigicecream commented on CARBONDATA-4279:
-----------------------------------------

[~Indhumathi27]
 It lets me create it,
 but then I get many errors when trying to work with it.
 For example:
 I cannot insert the data (I am getting
 '_Cannot insert into target table because number of columns mismatch_'),

and I cannot see the segments (as I said before).
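A possible workaround sketch for the column-count mismatch (hedged: this assumes 
Spark's static-partition INSERT syntax is accepted for this Carbon table, which I 
have not verified on the cluster) is to name the partition values explicitly, so 
the SELECT supplies only the two non-partition columns:
{code:sql}
-- Hypothetical sketch: static partition values go in the PARTITION clause,
-- so the SELECT lists only the non-partition columns (timestamp, name).
INSERT INTO lior_carbon_tests.mark_for_del_bug
PARTITION (dt = '2021-07-07', hr = '13')
SELECT '2021-07-07T13:23:56.012+00:00', 'spark'
{code}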



[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418478#comment-17418478
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4279:
------------------------------------------------------

Does 'create table' with LOCATION not work in your cluster?



[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Bigicecream (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418476#comment-17418476
 ] 

Bigicecream commented on CARBONDATA-4279:
-----------------------------------------

[~Indhumathi27] 
Hi, I specify the location via 'spark.sql.warehouse.dir' (I set it when running 
the spark-shell).

Setting LOCATION in the table creation:
{code:sql}
CREATE TABLE lior_carbon_tests.mark_for_del_bug(
timestamp string,
name string )
STORED AS carbondata
PARTITIONED BY (dt string, hr string)
LOCATION 's3a://bla/CarbonDataTests'
{code}
causes:
{code:java}
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
Unsupported operation on non transactional table
{code}
when running:
{code:sql}
show segments for TABLE lior_carbon_tests.mark_for_del_bug
{code}

I will take a look at the tests.
It does seem strange that it doesn't work on my cluster.



[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-20 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417549#comment-17417549
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4279:
------------------------------------------------------

Hi, I have the following question for this JIRA:
 # Is the table created with `LOCATION '' ` or not?
