[jira] [Resolved] (CARBONDATA-4232) Add missing doc change for secondary index.

2021-07-07 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4232.

Fix Version/s: 2.2.0
   Resolution: Fixed

> Add missing doc change for secondary index.
> ---
>
> Key: CARBONDATA-4232
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4232
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Doc changes were not handled for PR-4116, which leverages the secondary index 
> till the segment level.
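
For context, a hedged sketch (table, column and index names are illustrative, not from this issue) of the secondary index DDL that the secondary-index documentation describes:

{code:scala}
// create a secondary index table on a column of the main table
spark.sql("CREATE INDEX idx_city ON TABLE sales(city) AS 'carbondata'")

// repair the index for specific main-table segments; queries can then leverage
// the SI for the segments it covers, even if it lags behind the main table
spark.sql("REINDEX INDEX TABLE idx_city ON sales WHERE SEGMENT.ID IN (1, 2)")
{code}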



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4230) table properties not updated with lower-case

2021-06-29 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4230.

Fix Version/s: 2.2.0
   Resolution: Fixed

> table properties not updated with lower-case
> 
>
> Key: CARBONDATA-4230
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4230
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> h1. table properties not updated with lower-case
> When table properties are created with mixed case, they are not stored in 
> lower-case. Due to this, queries on these properties fail.
>  
> reproduce steps:
> test("testing geo case sensitive") {
>   // Source columns must be present in the table. Fails to create table.
>   sql("drop table source_index")
>   sql(
>     s"""
>        | CREATE TABLE source_index(timevalue BIGINT, longitude LONG, latitude LONG)
>        | STORED AS carbondata
>        | TBLPROPERTIES ('SPATIAL_INDEX.MYGEOHASH.type'='geohash',
>        | 'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude,latitude',
>        | 'SPATIAL_INDEX.mygeohash.originalLatitude'='39.930753',
>        | 'SPATIAL_INDEX.mygeohash.gridSize'='50',
>        | 'SPATIAL_INDEX'='MYGEOHASH',
>        | 'SPATIAL_INDEX.MYGEOHASH.conversionRatio'='100')
>      """.stripMargin)
> }
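
To see how the properties were actually stored, one could run (a quick sketch in the same test/spark-shell context, using the table from the snippet above):

{code:scala}
// lists the stored TBLPROPERTIES; after the fix, the SPATIAL_INDEX.* keys are
// expected to appear in lower-case regardless of the case used at CREATE TABLE time
sql("DESCRIBE FORMATTED source_index").show(200, false)
{code}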



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4271) Support DPP for carbon filters

2021-08-18 Thread Indhumathi (Jira)
Indhumathi created CARBONDATA-4271:
--

 Summary: Support DPP for carbon filters
 Key: CARBONDATA-4271
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4271
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Indhumathi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4119) User Input for GeoID column not validated.

2021-08-24 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4119.

Fix Version/s: 2.3.0
   Resolution: Fixed

> User Input for GeoID column not validated.
> --
>
> Key: CARBONDATA-4119
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4119
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.1.0
>Reporter: PURUJIT CHAUGULE
>Priority: Minor
> Fix For: 2.3.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> * A user-supplied geoId value can be paired with multiple different pairs of 
> source column values (the correct, internally calculated geoId values for such 
> source column values would differ).
>  * The advantage of using geoId is lost when the user input for the geoId column 
> is not validated, since the input values may differ from the actual internally 
> calculated values. The geoId value is only generated internally if the user does 
> not supply the geoId column.
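
For illustration, a hedged sketch (table layout and values are assumptions, modeled on the geo examples in the spatial-index guide) of the two insert paths this issue contrasts:

{code:scala}
// geoId generated internally: only the source column values are supplied
spark.sql("INSERT INTO source_index SELECT 1575428400000, 116285807, 40084087")

// geoId supplied by the user as the first value; per this issue, such input is
// not validated and may not match the internally calculated geoId
spark.sql("INSERT INTO source_index SELECT 855280799612, 1575428400000, 116285807, 40084087")
{code}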



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4238) Documentation Issue in Github Docs Link https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md#add-columns

2021-08-24 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4238.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Documentation Issue in Github Docs Link 
> https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md#add-columns
> --
>
> Key: CARBONDATA-4238
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4238
> Project: CarbonData
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 2.2.0
>Reporter: PURUJIT CHAUGULE
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: Alter Add Complex.png, Alter Add 
> Complex_Error_message.png
>
>
> [https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md#add-columns]
>  * The example provided for adding single-level complex datatype columns (only 
> array and struct) uses a double-level array column, which is not supported and 
> needs to be changed to a single-level array column.
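
For reference, a hedged sketch (illustrative table and column names) of a single-level complex ADD COLUMNS statement of the kind the doc example should use:

{code:scala}
// single-level array and struct columns; a doubly nested column such as
// array<array<int>> is what this issue reports as unsupported in the example
spark.sql("ALTER TABLE test_table ADD COLUMNS(arr1 array<int>, struct1 struct<id:int, name:string>)")
{code}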



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4236) Documentation correctness and link issues in https://github.com/apache/carbondata/blob/master/docs/

2021-08-24 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4236.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Documentation correctness and link issues in 
> https://github.com/apache/carbondata/blob/master/docs/
> ---
>
> Key: CARBONDATA-4236
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4236
> Project: CarbonData
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 2.2.0
> Environment: docs with content and examples verified on Spark 2.4.5 
> and Spark 3.1.1 compatible carbon.
>Reporter: Chetan Bhat
>Priority: Minor
> Fix For: 2.3.0
>
>
> In the documentation link 
> https://github.com/apache/carbondata/blob/master/docs/
> Issue 1 :- 
> In link --> 
> https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md
>  the "See detail" links do not open the target 
> "http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence".
> In link --> 
> https://github.com/apache/carbondata/blob/master/docs/documentation.md the 
> link "Apache CarbonData wiki", when clicked, tries to open the link 
> "https://cwiki.apache.org/confluence/display/CARBONDATA/CarbonData+Home" but the 
> target page cannot be opened. Similarly, the other links in the "External 
> Resources" section cannot be opened due to the same error.
> In link 
> https://github.com/apache/carbondata/blob/master/docs/faq.md#what-are-bad-records
>  the link "https://thrift.apache.org/docs/install", when clicked, does not open 
> the target page.
> In link 
> https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md 
> when the "Spark website" link is clicked, the 
> https://spark.apache.org/downloads.html page is not opened. Also, on the same 
> page, when the "Apache Spark Documentation" link is clicked, the 
> "http://spark.apache.org/docs/latest/" page is not opened.
> In the link 
> https://github.com/apache/carbondata/blob/master/docs/release-guide.md 
> the "Product Release Policy", "release signing guidelines", "Apache Nexus 
> repository" and "repository.apache.org" links, when clicked, do not open the 
> target pages.
> Issue 2:-
> In link --> 
> https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md
>  the "To configure Ranges-based Compaction" to be changed to "To configure 
> Range-based Compaction"
> Issue 3:-
> In link --> 
> https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md
>  the "Making this true degrade the LOAD performance" to be changed to "Making 
> this true degrades the LOAD performance"
> Issue 4 :-
> In link --> 
> https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md
>  the "user an either set to true" to be changed to "user can either set to 
> true"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4237) documentation issues in github master docs.

2021-08-24 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4237.

Fix Version/s: 2.3.0
   Resolution: Fixed

> documentation issues in github master docs.
> ---
>
> Key: CARBONDATA-4237
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4237
> Project: CarbonData
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 2.2.0
> Environment: Contents verified on Spark 2.4.5 and Spark 3.1.1
>Reporter: PRIYESH RANJAN
>Priority: Minor
> Fix For: 2.3.0
>
>
> +Modification 1 :+
> [https://github.com/apache/carbondata/blob/master/docs/streaming-guide.md]
> Streaming tables don't support alter table operations (alter add columns, drop 
> column, rename column, change datatype and rename table), so this can be added 
> to the Constraint section of this doc; see the examples below.
>  
> 0: jdbc:hive2://100-112-148-186:22550/> alter table uniqdata_alter add 
> columns(id2 int);
>  Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Alter table add column is not allowed for streaming table
> 0: jdbc:hive2://100-112-148-186:22550/> alter table uniqdata_alter drop 
> columns(integer_column1);
>  Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Alter table drop column is not allowed for streaming table.
> 0: jdbc:hive2://100-112-148-186:22550/> ALTER TABLE uniqdata_alter rename TO 
> uniqdata_alterTable ;
>  Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Alter rename table is not allowed for streaming table.
>  
> +Modification 2 :+
> [https://github.com/apache/carbondata/blob/master/docs/file-structure-of-carbondata.md]
> Since the Metadata folder contains the segments folder plus the tablestatus and 
> schema files, the dictionary-file related content about the metadata folder can 
> be removed from the doc.
> e.g.: "Metadata directory stores schema files, tablestatus and *dictionary 
> files (including .dict, .dictmeta and .sortindex).*" This line from the doc can 
> be modified to "Metadata directory stores schema files, tablestatus and 
> segment details."
>  
> +Modification 3 :+
> [https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md]
>  In the Quick Example section of this doc, the snippet still converts the date 
> datatype to an integer value and the timestamp datatype to a long value, whereas 
> the reader now returns them as date and timestamp values respectively.
>  
> while (reader.hasNext()) {
>   Object[] row = (Object[]) reader.readNextRow();
>   System.out.println(String.format("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t",
>     i, row[0], row[1], row[2], row[3], row[4], row[5],
>     +*new Date((day * ((int) row[6]))), new Timestamp((long) row[7] / 1000)*+,
>     row[8]));
> }
> can be modified to
> while (reader.hasNext()) {
>   Object[] row = (Object[]) reader.readNextRow();
>   System.out.println(String.format("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t",
>     i, row[0], row[1], row[2], row[3], row[4], row[5], +*row[6], row[7]*+,
>     row[8], row[9]));
> }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR

2021-08-31 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407301#comment-17407301
 ] 

Indhumathi edited comment on CARBONDATA-4273 at 8/31/21, 12:47 PM:
---

[~bigicecream] what kind of data is present in location 
s3a://my-bucket/CarbonDataTests/will_not_work ?

 
Is the location empty, or does it have some partition folders holding carbon 
data and index files?


was (Author: indhumuthumurugesh):
[~bigicecream] what kind of data is present in location 
s3a://my-bucket/CarbonDataTests/will_not_work ?

> Cannot create table with partitions in Spark in EMR
> ---
>
> Key: CARBONDATA-4273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4273
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.2.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, 
> JupyterHub 0.9.6
> Jar complied with:
> apache-carbondata:2.2.0
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Critical
>  Labels: EMR, spark
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
>  
> When trying to create a table like this:
> {code:sql}
> CREATE TABLE IF NOT EXISTS will_not_work(
> timestamp string,
> name string
> )
> PARTITIONED BY (dt string, hr string)
> STORED AS carbondata
> LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'
> {code}
> I get the following error:
> {noformat}
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Partition is not supported for external table
>   at 
> org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
>   at 
> org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   ... 64 elided
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR

2021-08-31 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407301#comment-17407301
 ] 

Indhumathi commented on CARBONDATA-4273:


[~bigicecream] what kind of data is present in location 
s3a://my-bucket/CarbonDataTests/will_not_work ?

> Cannot create table with partitions in Spark in EMR
> ---
>
> Key: CARBONDATA-4273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4273
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.2.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, 
> JupyterHub 0.9.6
> Jar complied with:
> apache-carbondata:2.2.0
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Critical
>  Labels: EMR, spark
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
>  
> When trying to create a table like this:
> {code:sql}
> CREATE TABLE IF NOT EXISTS will_not_work(
> timestamp string,
> name string
> )
> PARTITIONED BY (dt string, hr string)
> STORED AS carbondata
> LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'
> {code}
> I get the following error:
> {noformat}
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Partition is not supported for external table
>   at 
> org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
>   at 
> org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   ... 64 elided
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4198) Support adding of single-level and multi-level map columns

2021-08-26 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4198.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Support adding of single-level and multi-level map columns
> --
>
> Key: CARBONDATA-4198
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4198
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Akshay
>Priority: Minor
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4164) Support adding of multi-level complex columns(array/struct)

2021-08-26 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4164.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Support adding of multi-level complex columns(array/struct)
> ---
>
> Key: CARBONDATA-4164
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4164
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Add multi-level (up to 3 nested levels) complex columns (only array and struct) 
> to a carbon table. For example - 
> Command - 
> ALTER TABLE <table_name> ADD COLUMNS(arr array<array<int>>)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4199) Support renaming of map columns including nested levels

2021-08-26 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4199.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Support renaming of map columns including nested levels
> ---
>
> Key: CARBONDATA-4199
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4199
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Akshay
>Priority: Minor
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4234) Alter change datatype at nested levels

2021-08-26 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4234.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Alter change datatype at nested levels
> --
>
> Key: CARBONDATA-4234
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4234
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Akshay
>Priority: Major
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR

2021-08-26 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi updated CARBONDATA-4273:
---
Comment: was deleted

(was: Can you tell me, in which file environment you are facing this issue ? in 
Hadoop FileSystem or are you running this in local ?)

> Cannot create table with partitions in Spark in EMR
> ---
>
> Key: CARBONDATA-4273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4273
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.2.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, 
> JupyterHub 0.9.6
> Jar complied with:
> apache-carbondata:2.2.0
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Critical
>  Labels: EMR, spark
>
>  
> When trying to create a table like this:
> {code:sql}
> CREATE TABLE IF NOT EXISTS will_not_work(
> timestamp string,
> name string
> )
> PARTITIONED BY (dt string, hr string)
> STORED AS carbondata
> LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'
> {code}
> I get the following error:
> {noformat}
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Partition is not supported for external table
>   at 
> org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
>   at 
> org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   ... 64 elided
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR

2021-08-26 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405273#comment-17405273
 ] 

Indhumathi commented on CARBONDATA-4273:


Can you tell me, in which file environment you are facing this issue ? in 
Hadoop FileSystem or are you running this in local ?

> Cannot create table with partitions in Spark in EMR
> ---
>
> Key: CARBONDATA-4273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4273
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.2.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, 
> JupyterHub 0.9.6
> Jar complied with:
> apache-carbondata:2.2.0
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Critical
>  Labels: EMR, spark
>
>  
> When trying to create a table like this:
> {code:sql}
> CREATE TABLE IF NOT EXISTS will_not_work(
> timestamp string,
> name string
> )
> PARTITIONED BY (dt string, hr string)
> STORED AS carbondata
> LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'
> {code}
> I get the following error:
> {noformat}
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Partition is not supported for external table
>   at 
> org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
>   at 
> org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   ... 64 elided
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4277) Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1)

2021-09-15 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4277.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 
> 2.2.0 (Spark 2.4.5 and Spark 3.1.1)
> -
>
> Key: CARBONDATA-4277
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4277
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.2.0
> Environment: Spark 2.4.5
> Spark 3.1.1
>Reporter: PURUJIT CHAUGULE
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
>  
>  
> *Issue 1 : Load on a geospatial table created in 2.1.0 fails in 2.2.0 (Spark 
> 2.4.5 and 3.1.1)*
> *STEPS:-*
>  # create table in CarbonData 2.1.0 : create table 
> source_index_2_1_0(TIMEVALUE BIGINT,LONGITUDE long,LATITUDE long) STORED AS 
> carbondata TBLPROPERTIES 
> ('SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude,
>  
> latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.930753','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.minLongitude'='116.176090','SPATIAL_INDEX.mygeohash.maxLongitude'='116.736367','SPATIAL_INDEX.mygeohash.minLatitude'='39.930753','SPATIAL_INDEX.mygeohash.maxLatitude'='40.179415','SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.conversionRatio'='100');
>  # LOAD DATA INPATH 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO 
> TABLE source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 
> 'QUOTECHAR'='|');
>  # Take the table store and place it in the HDFS of the CarbonData 2.2.0 (Spark 
> 2.4.5 and Spark 3.1.1) clusters
>  # refresh table source_index_2_1_0;
>  # 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 
> 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO TABLE 
> source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 'QUOTECHAR'='|');
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> java.lang.Exception: DataLoad failure: Data Loading failed for table 
> source_index_2_1_0
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: java.lang.Exception: DataLoad failure: Data Loading failed for 
> table source_index_2_1_0
>  at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:460)
>  at 
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadData(CarbonLoadDataCommand.scala:226)
>  at 
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:163)
>  at 
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162)
>  at 
> org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118)
>  at 
> org.apache.spark.sql.execution.command.Auditable.runWithAudit$(package.scala:114)
>  at 
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:155)
> 

[jira] [Commented] (CARBONDATA-4150) Information about indexed datamap

2021-07-13 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379629#comment-17379629
 ] 

Indhumathi commented on CARBONDATA-4150:


[~imsuyash] Hope the information provided above by Mahesh has clarified your 
doubts.

Closing this JIRA. For further doubts, you can contact us through the carbondata 
Slack channel.

[https://join.slack.com/t/carbondataworkspace/shared_invite/zt-g8sv1g92-pr3GTvjrW5H9DVvNl6H2dg]

> Information about indexed datamap
> -
>
> Key: CARBONDATA-4150
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4150
> Project: CarbonData
>  Issue Type: Wish
>  Components: core
>Affects Versions: 2.0.1
> Environment: apache 2.0.1 spark 2.4.5 hadoop 2.7.2
>Reporter: suyash yadav
>Priority: Critical
> Fix For: 2.0.1
>
>
> Hi Team,
>  
> We would like to know detailed information about indexed datamap and possible 
> use cases for this datamap.
> So please help us in getting answer to below queries:-
>  
> 1) What is an indexed datamap and related use cases.
> 2) how it is to be used,
> 3) any reference documents
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4108) How to connect carbondata with Hive

2021-07-13 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379640#comment-17379640
 ] 

Indhumathi commented on CARBONDATA-4108:


[~imsuyash] Hope the information provided above has resolved your doubts.

Closing this JIRA. For further doubts, you can contact us through the carbondata 
Slack channel.

[https://join.slack.com/t/carbondataworkspace/shared_invite/zt-g8sv1g92-pr3GTvjrW5H9DVvNl6H2dg]

> How to connect carbondata with Hive
> ---
>
> Key: CARBONDATA-4108
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4108
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
> Environment: apache carbondata 2.0.1, spark 2.4.5, Hive 2.0
>Reporter: suyash yadav
>Priority: Major
> Fix For: 2.0.1
>
>
> Hi Team,
> We would like to know how to connect hive with carbondata. We are doing a POC 
> wherein we need to access a carbondata table through hive, but we need this 
> configuration with username and password. So our hive connection should have 
> some username and password configuration to connect to carbondata tables.
>  
> Could you guys please review above requirement and suggest steps to achieve 
> the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (CARBONDATA-4108) How to connect carbondata with Hive

2021-07-13 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi closed CARBONDATA-4108.
--
Resolution: Not A Problem

> How to connect carbondata with Hive
> ---
>
> Key: CARBONDATA-4108
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4108
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
> Environment: apache carbondata 2.0.1, spark 2.4.5, Hive 2.0
>Reporter: suyash yadav
>Priority: Major
> Fix For: 2.0.1
>
>
> Hi Team,
> We would like to know how to connect hive with carbondata. We are doing a POC 
> wherein we need to access a carbondata table through hive, but we need this 
> configuration with username and password. So our hive connection should have 
> some username and password configuration to connect to carbondata tables.
>  
> Could you guys please review above requirement and suggest steps to achieve 
> the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-3970) Carbondata 2.0.1 MV ERROR CarbonInternalMetastore$: Adding/Modifying tableProperties operation failed

2021-07-13 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379648#comment-17379648
 ] 

Indhumathi commented on CARBONDATA-3970:


Hi, from the below exception, it looks like an issue with the configuration 
related to 'spark.sql.extensions=org.apache.spark.sql.CarbonExtensions'.

20/08/26 01:04:50 ERROR CarbonInternalMetastore$: Adding/Modifying 
tableProperties operation failed: org.apache.spark.sql.hive.HiveExternalCatalog 
cannot be cast to 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.

Closing this JIRA; for further clarifications, please contact us through the 
Slack channel.
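
A minimal sketch (assumed standalone/spark-shell usage; names are illustrative, the MV statement is taken from the issue below) of creating the session with the Carbon extension configured, which is the configuration the comment above refers to:

{code:scala}
import org.apache.spark.sql.SparkSession

// CarbonExtensions injects Carbon's parser and rules into the Spark session
val spark = SparkSession.builder()
  .appName("carbon-mv-example")
  .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
  .getOrCreate()

spark.sql("CREATE MATERIALIZED VIEW agg_sales_mv AS " +
  "SELECT country, sex, sum(quantity), avg(price) FROM sales GROUP BY country, sex")
{code}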

> Carbondata 2.0.1 MV  ERROR CarbonInternalMetastore$: Adding/Modifying 
> tableProperties operation failed
> --
>
> Key: CARBONDATA-3970
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3970
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query, hive-integration
>Affects Versions: 2.0.1
> Environment: CarbonData 2.0.1 with Spark 2.4.5
>Reporter: Sushant Sammanwar
>Priority: Major
>
> Hi ,
>  
> I am facing issues with materialized views - the query is not hitting the 
> view in the explain plan. I would really appreciate it if you could help me 
> with this.
> Below are the details : 
> I am using Spark shell to connect to Carbon 2.0.1 using Spark 2.4.5.
> The underlying table has data loaded.
> I think the problem is while creating the materialized view, as I am getting an 
> error related to the metastore.
>  
>  
> scala> carbon.sql("create MATERIALIZED VIEW agg_sales_mv as select country, 
> sex,sum(quantity),avg(price) from sales group by country,sex").show()
> 20/08/26 01:04:41 AUDIT audit: \{"time":"August 26, 2020 1:04:41 AM 
> IST","username":"root","opName":"CREATE MATERIALIZED 
> VIEW","opId":"16462372696035311","opStatus":"START"}
> 20/08/26 01:04:45 AUDIT audit: \{"time":"August 26, 2020 1:04:45 AM 
> IST","username":"root","opName":"CREATE 
> TABLE","opId":"16462377160819798","opStatus":"START"}
> 20/08/26 01:04:46 AUDIT audit: \{"time":"August 26, 2020 1:04:46 AM 
> IST","username":"root","opName":"CREATE 
> TABLE","opId":"16462377696791275","opStatus":"START"}
> 20/08/26 01:04:48 AUDIT audit: \{"time":"August 26, 2020 1:04:48 AM 
> IST","username":"root","opName":"CREATE 
> TABLE","opId":"16462377696791275","opStatus":"SUCCESS","opTime":"2326 
> ms","table":"NA","extraInfo":{}}
> 20/08/26 01:04:48 AUDIT audit: \{"time":"August 26, 2020 1:04:48 AM 
> IST","username":"root","opName":"CREATE 
> TABLE","opId":"16462377160819798","opStatus":"SUCCESS","opTime":"2955 
> ms","table":"default.agg_sales_mv","extraInfo":{"local_dictionary_threshold":"1","bad_record_path":"","table_blocksize":"1024","local_dictionary_enable":"true","flat_folder":"false","external":"false","sort_columns":"","comment":"","carbon.column.compressor":"snappy","mv_related_tables":"sales"}}
> 20/08/26 01:04:50 ERROR CarbonInternalMetastore$: Adding/Modifying 
> tableProperties operation failed: 
> org.apache.spark.sql.hive.HiveExternalCatalog cannot be cast to 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener
> 20/08/26 01:04:50 ERROR CarbonInternalMetastore$: Adding/Modifying 
> tableProperties operation failed: 
> org.apache.spark.sql.hive.HiveExternalCatalog cannot be cast to 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener
> 20/08/26 01:04:51 AUDIT audit: \{"time":"August 26, 2020 1:04:51 AM 
> IST","username":"root","opName":"CREATE MATERIALIZED 
> VIEW","opId":"16462372696035311","opStatus":"SUCCESS","opTime":"10551 
> ms","table":"NA","extraInfo":{"mvName":"agg_sales_mv"}}
> ++
> ||
> ++
> ++
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (CARBONDATA-3970) Carbondata 2.0.1 MV ERROR CarbonInternalMetastore$: Adding/Modifying tableProperties operation failed

2021-07-13 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi closed CARBONDATA-3970.
--
Resolution: Won't Fix

> Carbondata 2.0.1 MV  ERROR CarbonInternalMetastore$: Adding/Modifying 
> tableProperties operation failed
> --
>
> Key: CARBONDATA-3970
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3970
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query, hive-integration
>Affects Versions: 2.0.1
> Environment: CarbonData 2.0.1 with Spark 2.4.5
>Reporter: Sushant Sammanwar
>Priority: Major
>
> Hi ,
>  
> I am facing issues with materialized views - the query is not hitting the 
> view in the explain plan. I would really appreciate it if you could help me 
> with this.
> Below are the details : 
> I am using Spark shell to connect to Carbon 2.0.1 using Spark 2.4.5.
> The underlying table has data loaded.
> I think the problem is while creating the materialized view, as I am getting an 
> error related to the metastore.
>  
>  
> scala> carbon.sql("create MATERIALIZED VIEW agg_sales_mv as select country, 
> sex,sum(quantity),avg(price) from sales group by country,sex").show()
> 20/08/26 01:04:41 AUDIT audit: \{"time":"August 26, 2020 1:04:41 AM 
> IST","username":"root","opName":"CREATE MATERIALIZED 
> VIEW","opId":"16462372696035311","opStatus":"START"}
> 20/08/26 01:04:45 AUDIT audit: \{"time":"August 26, 2020 1:04:45 AM 
> IST","username":"root","opName":"CREATE 
> TABLE","opId":"16462377160819798","opStatus":"START"}
> 20/08/26 01:04:46 AUDIT audit: \{"time":"August 26, 2020 1:04:46 AM 
> IST","username":"root","opName":"CREATE 
> TABLE","opId":"16462377696791275","opStatus":"START"}
> 20/08/26 01:04:48 AUDIT audit: \{"time":"August 26, 2020 1:04:48 AM 
> IST","username":"root","opName":"CREATE 
> TABLE","opId":"16462377696791275","opStatus":"SUCCESS","opTime":"2326 
> ms","table":"NA","extraInfo":{}}
> 20/08/26 01:04:48 AUDIT audit: \{"time":"August 26, 2020 1:04:48 AM 
> IST","username":"root","opName":"CREATE 
> TABLE","opId":"16462377160819798","opStatus":"SUCCESS","opTime":"2955 
> ms","table":"default.agg_sales_mv","extraInfo":{"local_dictionary_threshold":"1","bad_record_path":"","table_blocksize":"1024","local_dictionary_enable":"true","flat_folder":"false","external":"false","sort_columns":"","comment":"","carbon.column.compressor":"snappy","mv_related_tables":"sales"}}
> 20/08/26 01:04:50 ERROR CarbonInternalMetastore$: Adding/Modifying 
> tableProperties operation failed: 
> org.apache.spark.sql.hive.HiveExternalCatalog cannot be cast to 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener
> 20/08/26 01:04:50 ERROR CarbonInternalMetastore$: Adding/Modifying 
> tableProperties operation failed: 
> org.apache.spark.sql.hive.HiveExternalCatalog cannot be cast to 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener
> 20/08/26 01:04:51 AUDIT audit: \{"time":"August 26, 2020 1:04:51 AM 
> IST","username":"root","opName":"CREATE MATERIALIZED 
> VIEW","opId":"16462372696035311","opStatus":"SUCCESS","opTime":"10551 
> ms","table":"NA","extraInfo":{"mvName":"agg_sales_mv"}}
> ++
> ||
> ++
> ++
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (CARBONDATA-4150) Information about indexed datamap

2021-07-13 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi closed CARBONDATA-4150.
--
Resolution: Not A Problem

> Information about indexed datamap
> -
>
> Key: CARBONDATA-4150
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4150
> Project: CarbonData
>  Issue Type: Wish
>  Components: core
>Affects Versions: 2.0.1
> Environment: apache 2.0.1 spark 2.4.5 hadoop 2.7.2
>Reporter: suyash yadav
>Priority: Critical
> Fix For: 2.0.1
>
>
> Hi Team,
>  
> We would like to know detailed information about indexed datamap and possible 
> use cases for this datamap.
> So please help us in getting answer to below queries:-
>  
> 1) What is an indexed datamap and related use cases.
> 2) how it is to be used,
> 3) any reference documents
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (CARBONDATA-3354) how to use filiters in datamaps

2021-07-13 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi closed CARBONDATA-3354.
--
Resolution: Not A Problem

> how to use filiters in datamaps
> ---
>
> Key: CARBONDATA-3354
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3354
> Project: CarbonData
>  Issue Type: Task
>  Components: core
>Affects Versions: 1.5.2
> Environment: apache carbon data 1.5.x
>Reporter: suyash yadav
>Priority: Major
>
> Hi Team,
>  
> We are doing a POC on apache carbon data so that we can verify if this 
> database is capable of handling the amount of data we are collecting from 
> network devices.
>  
> We are stuck on few of our datamap related activities and have below queries: 
>  
>  # How to use time-based filters while creating a datamap. We tried a 
> time-based condition while creating a datamap but it didn't work.
>  # How to create a timeseries datamap with column which is having value of 
> epoch time.Our query is like below:-  *carbon.sql("CREATE DATAMAP test ON 
> TABLE carbon_RT_test USING 'timeseries' DMPROPERTIES 
> ('event_time'='endMs','minute_granularity'='1',) AS SELECT sum(inOctets) FROM 
> carbon_RT_test GROUP BY inIfId")*
>  # *In above query endMs is having epoch time value.*
>  # We got an error like below: "Timeseries event time is only supported on 
> Timestamp column"
>  # Also we need to know if we can have a time granularity other than 1; like 
> in the above query, can we have minute_granularity='5'?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-3354) how to use filiters in datamaps

2021-07-13 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379642#comment-17379642
 ] 

Indhumathi commented on CARBONDATA-3354:


[~imsuyash] Hope the information provided above by Akash has clarified your 
doubts.

Closing this JIRA. For further doubts, you can contact us through the carbondata 
Slack channel.

[https://join.slack.com/t/carbondataworkspace/shared_invite/zt-g8sv1g92-pr3GTvjrW5H9DVvNl6H2dg]

> how to use filiters in datamaps
> ---
>
> Key: CARBONDATA-3354
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3354
> Project: CarbonData
>  Issue Type: Task
>  Components: core
>Affects Versions: 1.5.2
> Environment: apache carbon data 1.5.x
>Reporter: suyash yadav
>Priority: Major
>
> Hi Team,
>  
> We are doing a POC on apache carbon data so that we can verify if this 
> database is capable of handling the amount of data we are collecting from 
> network devices.
>  
> We are stuck on few of our datamap related activities and have below queries: 
>  
>  # How to use time-based filters while creating a datamap. We tried a 
> time-based condition while creating a datamap but it didn't work.
>  # How to create a timeseries datamap with column which is having value of 
> epoch time.Our query is like below:-  *carbon.sql("CREATE DATAMAP test ON 
> TABLE carbon_RT_test USING 'timeseries' DMPROPERTIES 
> ('event_time'='endMs','minute_granularity'='1',) AS SELECT sum(inOctets) FROM 
> carbon_RT_test GROUP BY inIfId")*
>  # *In above query endMs is having epoch time value.*
>  # We got an error like below: "Timeseries event time is only supported on 
> Timestamp column"
>  # Also we need to know if we can have a time granularity other than 1; like 
> in the above query, can we have minute_granularity='5'?
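
For illustration only, a hedged sketch of the datamap from the question adjusted so the event time references a Timestamp column, as the quoted error message requires; the end_time column name is an assumption (the epoch-millis endMs value would need to be stored as a timestamp):

{code:scala}
// assumes carbon_RT_test stores the event time in a TIMESTAMP column named
// end_time (hypothetical) instead of the epoch-millis BIGINT endMs
carbon.sql("CREATE DATAMAP test ON TABLE carbon_RT_test USING 'timeseries' " +
  "DMPROPERTIES('event_time'='end_time', 'minute_granularity'='1') " +
  "AS SELECT end_time, inIfId, sum(inOctets) FROM carbon_RT_test " +
  "GROUP BY end_time, inIfId")
{code}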



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (CARBONDATA-4205) MINOR compaction getting triggered by it self while inserting data to a table

2021-07-13 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi closed CARBONDATA-4205.
--
Resolution: Won't Fix

> MINOR compaction getting triggered by it self while inserting data to a table
> -
>
> Key: CARBONDATA-4205
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4205
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
> Environment: apache carbondata 2.0.1, hadoop 2.7.2, spark 2.4.5
>Reporter: suyash yadav
>Priority: Major
>
> Hi Team, we have created a table and also created a timeseries MV on it. Later 
> we tried to insert some data from another table into this newly created table, 
> but we observed that while inserting, MINOR compaction on the MV is getting 
> triggered by itself. It doesn't happen for every insert, but whenever we insert 
> the 6th to 7th hour data and then the 14th to 15th hour data, the MINOR 
> compaction gets triggered. Could you tell us why the MINOR compaction is 
> getting triggered by itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CARBONDATA-4205) MINOR compaction getting triggered by it self while inserting data to a table

2021-07-13 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379645#comment-17379645
 ] 

Indhumathi edited comment on CARBONDATA-4205 at 7/13/21, 6:34 AM:
--

Hi, Looks like this is not an issue.

 

From your description, it looks like auto-compaction is enabled in your 
environment, which will trigger compaction when the threshold is reached.

In your case, the threshold looks like (6,0). Closing this JIRA, as it is not an issue.


was (Author: indhumuthumurugesh):
Hi, Looks like this is not an issue.

 

From your description, looks like auto-compaction is enabled is enabled in 
your environment, which will trigger compaction, when threshold is reached.

In your case, threshold looks (6,0). Closing this JIRA, as it is not an issue.

> MINOR compaction getting triggered by it self while inserting data to a table
> -
>
> Key: CARBONDATA-4205
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4205
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
> Environment: apache carbondata 2.0.1, hadoop 2.7.2, spark 2.4.5
>Reporter: suyash yadav
>Priority: Major
>
> Hi Team, we have created a table and also created a timeseries MV on it. Later 
> we tried to insert some data from another table into this newly created table, 
> but we observed that while inserting, MINOR compaction on the MV is getting 
> triggered by itself. It doesn't happen for every insert, but whenever we insert 
> the 6th to 7th hour data and then the 14th to 15th hour data, the MINOR 
> compaction gets triggered. Could you tell us why the MINOR compaction is 
> getting triggered by itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4205) MINOR compaction getting triggered by it self while inserting data to a table

2021-07-13 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379645#comment-17379645
 ] 

Indhumathi commented on CARBONDATA-4205:


Hi, Looks like this is not an issue.

 

From your description, looks like auto-compaction is enabled is enabled in 
your environment, which will trigger compaction, when threshold is reached.

In your case, threshold looks (6,0). Closing this JIRA, as it is not an issue.

> MINOR compaction getting triggered by it self while inserting data to a table
> -
>
> Key: CARBONDATA-4205
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4205
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
> Environment: apache carbondata 2.0.1, hadoop 2.7.2, spark 2.4.5
>Reporter: suyash yadav
>Priority: Major
>
> Hi Team, we have created a table and also created a timeseries MV on it. Later 
> we tried to insert some data from another table into this newly created table, 
> but we observed that while inserting, MINOR compaction on the MV is getting 
> triggered by itself. It doesn't happen for every insert, but whenever we insert 
> the 6th to 7th hour data and then the 14th to 15th hour data, the MINOR 
> compaction gets triggered. Could you tell us why the MINOR compaction is 
> getting triggered by itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4132) Numer of records not matching in MVs

2021-07-15 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381080#comment-17381080
 ] 

Indhumathi commented on CARBONDATA-4132:


Please refer to the comment that I have added in CARBONDATA-4239, which can help 
you use MV in a better way for your scenario to get both storage and performance 
benefits.

> Numer of records not matching in MVs
> 
>
> Key: CARBONDATA-4132
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4132
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
> Environment: Apache carbondata 2.0.1
>Reporter: suyash yadav
>Priority: Major
> Fix For: 2.0.1
>
>
> Hi Team, 
> We are working on a POC where we need to insert 300k records/second into a 
> table on which we have already created Timeseries MVs with Minute, Hour and Day 
> granularity.
>  
> As per our expectation, the Minute based MV should contain 300K records till 
> the insertion of the next minute's data. Also, the Hour and Day based MVs should 
> contain 300K records till the arrival of the next hour and next day data 
> respectively.
>  
> But the count of records in the MV is not coming out as per our expectation. It 
> is always more than we expect.
> But the strange thing is, when we drop the MV and create the MV after 
> inserting the data in the table, then the count of records comes out correct. 
> So it is clear there is no problem with the MV definition or the data.
>  
> Kindly help us in resolving this issue on priority. Please find more details 
> below:
> Table definition:
> ===
> spark.sql("create table Flow_Raw_TS(export_ms bigint,exporter_ip 
> string,pkt_seq_num bigint,flow_seq_num int,src_ip string,dst_ip 
> string,protocol_id smallint,src_tos smallint,dst_tos smallint,raw_src_tos 
> smallint,raw_dst_tos smallint,src_mask smallint,dst_mask smallint,tcp_bits 
> int,src_port int,in_if_id bigint,in_if_entity_id bigint,in_if_enabled 
> boolean,dst_port int,out_if_id bigint,out_if_entity_id bigint,out_if_enabled 
> boolean,direction smallint,in_octets bigint,out_octets bigint,in_packets 
> bigint,out_packets bigint,next_hop_ip string,bgp_src_as_num 
> bigint,bgp_dst_as_num bigint,bgp_next_hop_ip string,end_ms timestamp,start_ms 
> timestamp,app_id string,app_name string,src_ip_group string,dst_ip_group 
> string,policy_qos_classification_hierarchy string,policy_qos_queue_id 
> bigint,worker_id int,day bigint ) stored as carbondata TBLPROPERTIES 
> ('local_dictionary_enable'='false')
> MV definition:
>  
> ==
> +*Minute based*+
> spark.sql("create materialized view Flow_Raw_TS_agg_001_min as select 
> timeseries(end_ms,'minute') as 
> end_ms,src_ip,dst_ip,app_name,in_if_id,src_tos,src_ip_group,dst_ip_group,protocol_id,bgp_src_as_num,
>  bgp_dst_as_num,policy_qos_classification_hierarchy, 
> policy_qos_queue_id,sum(in_octets) as octects, sum(in_packets) as packets, 
> sum(out_packets) as out_packets, sum(out_octets) as out_octects FROM 
> Flow_Raw_TS group by 
> timeseries(end_ms,'minute'),src_ip,dst_ip,app_name,in_if_id,src_tos,src_ip_group,
>  
> dst_ip_group,protocol_id,bgp_src_as_num,bgp_dst_as_num,policy_qos_classification_hierarchy,
>  policy_qos_queue_id").show()
> +*Hour Based*+
> val startTime = System.nanoTime
> spark.sql("create materialized view Flow_Raw_TS_agg_001_hour as select 
> timeseries(end_ms,'hour') as end_ms,app_name,sum(in_octets) as octects, 
> sum(in_packets) as packets, sum(out_packets) as out_packets, sum(out_octets) 
> as out_octects, in_if_id,src_tos,src_ip_group, 
> dst_ip_group,protocol_id,src_ip, dst_ip,bgp_src_as_num, 
> bgp_dst_as_num,policy_qos_classification_hierarchy, policy_qos_queue_id FROM 
> Flow_Raw_TS group by 
> timeseries(end_ms,'hour'),in_if_id,app_name,src_tos,src_ip_group,dst_ip_group,protocol_id,src_ip,
>  dst_ip,bgp_src_as_num,bgp_dst_as_num,policy_qos_classification_hierarchy, 
> policy_qos_queue_id").show()
> val endTime = System.nanoTime
> val elapsedSeconds = (endTime - startTime) / 1e9d



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4239) Carbondata 2.1.1 MV : Incremental refresh : Doesnot aggregate data correctly

2021-07-15 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381075#comment-17381075
 ] 

Indhumathi commented on CARBONDATA-4239:


 

For bulk data loading (LOAD DATA from csv/.txt files, where each load brings in
a large amount of data), incremental loading saves time and benefits load
performance. If your case is an INSERT scenario with small writes, then an MV
table with automatic refresh (which is enabled by default) will not give much
benefit in terms of either storage or performance.

For your scenario, I suggest using an MV with manual refresh. You can refresh
the MV at some interval (say, every hour, which would fold 4 segments of the
main table into a single segment of the MV); this benefits both storage cost
and MV query performance.

To create MV with manual Refresh, use

create materialized view mv_name with deferred refresh as SELECT(..)

(or)

create materialized view mv_name properties('refresh_trigger_mode'='on_manual') 
as SELECT(...)

 

Refer 
https://github.com/apache/carbondata/blob/master/docs/mv-guide.md#loading-data
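
As an illustration, a minimal sketch of the manual-refresh flow (mv_name,
main_table and the SELECT are placeholders following the generic syntax above,
not your exact DDL):

create materialized view mv_name with deferred refresh as
  select metric, timeseries(ts, 'hour') as ts, sum(value) as sum_value
  from main_table group by metric, timeseries(ts, 'hour');

-- run this periodically (for example once an hour, from a scheduler) to fold
-- the newly loaded main-table segments into a single MV segment
refresh materialized view mv_name;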

> Carbondata 2.1.1 MV : Incremental refresh : Doesnot aggregate data correctly 
> -
>
> Key: CARBONDATA-4239
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4239
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, data-load
>Affects Versions: 2.1.1
> Environment: RHEL  spark-2.4.5-bin-hadoop2.7 for carbon 2.1.1 
>Reporter: Sushant Sammanwar
>Priority: Major
>  Labels: Materialistic_Views, materializedviews, refreshnodes
>
> Hi Team ,
> We are doing a POC with Carbondata using MV .
> Our MV doesnot contain AVG function as we wanted to utilize the feature of 
> incremental refresh.
> But with incremetnal refresh , we noticed the MV doesnot aggregate value 
> correctly.
> If a row is inserted , it creates another row in MV instead of adding 
> incremental value .
> As a result no. of rows in MV are almost same as raw table.
> This doesnot happen with full refresh MV. 
> Below is the data in MV with 3 rows :
> scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show()
> ++---+---+--+-+-++
> |fact_365_1_eutrancell_21_tags_id|fact_365_1_eutrancell_21_metric| ts| 
> sum_value|min_value|max_value|fact_365_1_eutrancell_21_ts2|
> ++---+---+--+-+-++
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 
> 06:30:00|5412.68105| 31.345| 4578.112| 2020-09-25 05:30:00|
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 05:30:00| 1176.7035| 
> 392.2345| 392.2345| 2020-09-25 05:30:00|
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 06:00:00| 58.112| 
> 58.112| 58.112| 2020-09-25 05:30:00|
> ++---+---+--+-+-++
> Below , i am inserting data for 6th hour, and it should add incremental 
> values to 6th hour row of MV. 
> Note the data being inserted ; columns which are part of groupby clause are 
> having same values as existing data.
> scala> carbon.sql("insert into fact_365_1_eutrancell_21 values ('2020-09-25 
> 06:05:00','eUtranCell.HHO.X2.InterFreq.PrepAttOut','ff6cb0f7-fba0-4134-81ee-55e820574627',118.112,'2020-09-25
>  05:30:00')").show()
> 21/06/28 16:01:31 AUDIT audit: \{"time":"June 28, 2021 4:01:31 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332282307468267","opStatus":"START"}
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:33 AUDIT audit: \{"time":"June 28, 2021 4:01:33 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332284066443156","opStatus":"START"}
> [Stage 40:=>(199 + 1) / 
> 200]21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row 
> batch one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:44 AUDIT audit: \{"time":"June 28, 2021 4:01:44 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332284066443156","opStatus":"SUCCESS","opTime":"11343 
> ms","table":"default.fact_365_1_eutrancell_21_30_minute","extraInfo":{}}
> 21/06/28 16:01:44 

[jira] [Commented] (CARBONDATA-4239) Carbondata 2.1.1 MV : Incremental refresh : Doesnot aggregate data correctly

2021-07-15 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381175#comment-17381175
 ] 

Indhumathi commented on CARBONDATA-4239:


MV can be used for real-time data loading, even for data arriving every 15
minutes, provided each load carries a reasonable amount of data.

If you use INSERT to add a single row every 5/15 minutes, it will not give much
benefit.

As already suggested in the previous comments, you can still use MV for your
scenario with manual refresh.

 

> Carbondata 2.1.1 MV : Incremental refresh : Doesnot aggregate data correctly 
> -
>
> Key: CARBONDATA-4239
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4239
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, data-load
>Affects Versions: 2.1.1
> Environment: RHEL  spark-2.4.5-bin-hadoop2.7 for carbon 2.1.1 
>Reporter: Sushant Sammanwar
>Priority: Major
>  Labels: Materialistic_Views, materializedviews, refreshnodes
>
> Hi Team ,
> We are doing a POC with Carbondata using MV .
> Our MV doesnot contain AVG function as we wanted to utilize the feature of 
> incremental refresh.
> But with incremetnal refresh , we noticed the MV doesnot aggregate value 
> correctly.
> If a row is inserted , it creates another row in MV instead of adding 
> incremental value .
> As a result no. of rows in MV are almost same as raw table.
> This doesnot happen with full refresh MV. 
> Below is the data in MV with 3 rows :
> scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show()
> ++---+---+--+-+-++
> |fact_365_1_eutrancell_21_tags_id|fact_365_1_eutrancell_21_metric| ts| 
> sum_value|min_value|max_value|fact_365_1_eutrancell_21_ts2|
> ++---+---+--+-+-++
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 
> 06:30:00|5412.68105| 31.345| 4578.112| 2020-09-25 05:30:00|
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 05:30:00| 1176.7035| 
> 392.2345| 392.2345| 2020-09-25 05:30:00|
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 06:00:00| 58.112| 
> 58.112| 58.112| 2020-09-25 05:30:00|
> ++---+---+--+-+-++
> Below , i am inserting data for 6th hour, and it should add incremental 
> values to 6th hour row of MV. 
> Note the data being inserted ; columns which are part of groupby clause are 
> having same values as existing data.
> scala> carbon.sql("insert into fact_365_1_eutrancell_21 values ('2020-09-25 
> 06:05:00','eUtranCell.HHO.X2.InterFreq.PrepAttOut','ff6cb0f7-fba0-4134-81ee-55e820574627',118.112,'2020-09-25
>  05:30:00')").show()
> 21/06/28 16:01:31 AUDIT audit: \{"time":"June 28, 2021 4:01:31 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332282307468267","opStatus":"START"}
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:33 AUDIT audit: \{"time":"June 28, 2021 4:01:33 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332284066443156","opStatus":"START"}
> [Stage 40:=>(199 + 1) / 
> 200]21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row 
> batch one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:44 AUDIT audit: \{"time":"June 28, 2021 4:01:44 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332284066443156","opStatus":"SUCCESS","opTime":"11343 
> ms","table":"default.fact_365_1_eutrancell_21_30_minute","extraInfo":{}}
> 21/06/28 16:01:44 AUDIT audit: \{"time":"June 28, 2021 4:01:44 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332282307468267","opStatus":"SUCCESS","opTime":"13137 
> ms","table":"default.fact_365_1_eutrancell_21","extraInfo":{}}
> +--+
> |Segment ID|
> +--+
> | 8|
> +--+
> Below we can see it has added another row of 2020-09-25 06:00:00 .
> Note: All values of columns which are part of groupby caluse have same value.
> This means there should have been single row for 2020-09-25 06:00:00 .
> scala> carbon.sql("select * from 
> 

[jira] [Resolved] (CARBONDATA-4091) Upgrade prestosql to 333 version

2021-08-11 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4091.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Upgrade prestosql to 333 version
> 
>
> Key: CARBONDATA-4091
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4091
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> Currently carbondata is integrated with presto-sql 316, which is 1.5 years 
> older.
> There are many good features and optimization that came into presto like 
> dynamic filtering, Rubix data cache and some performance improvements.
>  
> It is always good to use latest version, latest version is presto-sql 348.
> But jumping from 316 to 348 will be too many changes. 
> So, to utilize these new features and based on customer demand, I suggest to 
> upgrade presto-sql to 333 version. 
> Later it will be again upgraded to more latest version in few months.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4256) SI creation on a complex column that includes child column with a dot(.) fails with parse exception.

2021-08-11 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4256.

Fix Version/s: 2.3.0
   Resolution: Fixed

> SI creation on a complex column that includes child column with a dot(.) 
> fails with parse exception.
> 
>
> Key: CARBONDATA-4256
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4256
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Minor
> Fix For: 2.3.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> sql("create table complextable (country struct, name string, id 
> Map, arr1 array, arr2 array) stored as 
> carbondata");
> sql("create index index_1 on table complextable(country.b) as 'carbondata'");
>  
> The above query fails with a parsing exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4255) Prohibit DROP DATABASE when databaselocation is inconsistent

2021-07-30 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4255.

Fix Version/s: 2.2.0
   Resolution: Fixed

> Prohibit DROP DATABASE when databaselocation is inconsistent
> 
>
> Key: CARBONDATA-4255
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4255
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Hao
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> When carbon.storelocation and spark.sql.warehouse.dir are configured to 
> different values, the databaselocation maybe inconsistent. When DROP DATABASE 
> command is executed, maybe both location (carbon dblcation and hive 
> dblocation) will be cleared, which may confuses the users



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4248) Explain query with upper case column is throwing key not found exception.

2021-07-28 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4248.

Fix Version/s: 2.2.0
   Resolution: Fixed

> Explain query with upper case column is throwing key not found exception.
> -
>
> Key: CARBONDATA-4248
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4248
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> sql("drop table if exists carbon_table")
>  sql("drop table if exists parquet_table")
>  sql("create table IF NOT EXISTS carbon_table(`BEGIN_TIME` BIGINT," +
>  " `SAI_CGI_ECGI` STRING) stored as carbondata")
>  sql("create table IF NOT EXISTS parquet_table(CELL_NAME string, CGISAI 
> string)" +
>  " stored as parquet")
>  sql("explain extended with grpMainDatathroughput as (select" +
>  " from_unixtime(begin_time, 'MMdd') as data_time, SAI_CGI_ECGI from 
> carbon_table)," +
>  " grpMainData as (select * from grpMainDatathroughput a JOIN(select 
> CELL_NAME, CGISAI from" +
>  " parquet_table) b ON b.CGISAI=a.SAI_CGI_ECGI) " +
>  "select * from grpMainData a left join grpMainData b on 
> a.cell_name=b.cell_name")



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4247) Add fix for timestamp issue induced due to Spark3.0 changes

2021-07-28 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi updated CARBONDATA-4247:
---
Description: 
Add fix for timestamp issue induced due to Spark3.0 changes

 

With spark 3.1, timestamp values loaded before 1900 years gives wrong results

 

Refer 
https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/directdictionary/TimestampNoDictionaryColumnTestCase.scala

  was:Add fix for timestamp issue induced due to Spark3.0 changes


> Add fix for timestamp issue induced due to Spark3.0 changes
> ---
>
> Key: CARBONDATA-4247
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4247
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Vikram Ahuja
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Add fix for timestamp issue induced due to Spark3.0 changes
>  
> With spark 3.1, timestamp values loaded before 1900 years gives wrong results
>  
> Refer 
> https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/directdictionary/TimestampNoDictionaryColumnTestCase.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4215) When carbon.enable.vector.reader=false and upon adding a parquet segment through alter add segments in a carbon table , we are getting error in count(*)

2021-10-08 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4215.

Fix Version/s: 2.3.0
   Resolution: Fixed

> When carbon.enable.vector.reader=false and upon adding a parquet segment 
> through alter add segments in a carbon table , we are getting error in 
> count(*)
> 
>
> Key: CARBONDATA-4215
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4215
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.1.1
> Environment: 3 node FI
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Fix For: 2.3.0
>
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> When carbon.enable.vector.reader=false and upon adding a parquet segment 
> through alter add segments in a carbon table , we are getting error in 
> count(*).
>  
> Test queries:
> --set carbon.enable.vector.reader=false in carbon.properties;
> use default;
> drop table if exists uniqdata;
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> drop table if exists uniqdata_parquet;
> CREATE TABLE uniqdata_parquet (cust_id int,cust_name 
> String,active_emui_version string, dob timestamp, doj timestamp, 
> bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), 
> decimal_column2 decimal(36,36),double_column1 double, double_column2 
> double,integer_column1 int) stored as parquet;
> insert into uniqdata_parquet select * from uniqdata;
> create database if not exists test;
> use test;
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> Alter table uniqdata add segment options 
> ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_parquet','format'='parquet');
>  select count(*) from uniqdata; -- throwing error class cast exception;
>  
> Error Log traces:
> java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch 
> cannot be cast to org.apache.spark.sql.catalyst.InternalRow
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown
>  Source)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>  at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:584)
>  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>  at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
>  at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:58)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>  at org.apache.spark.scheduler.Task.run(Task.scala:123)
>  at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:413)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1551)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:419)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2021-06-19 13:50:59,035 | WARN | task-result-getter-2 | Lost task 0.0 in 
> stage 4.0 (TID 28, localhost, executor driver): java.lang.ClassCastException: 
> 

[jira] [Resolved] (CARBONDATA-4292) Support spatial index creation using data frame

2021-10-12 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4292.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Support spatial index creation using data frame
> ---
>
> Key: CARBONDATA-4292
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4292
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> To support spatial index creation using data frame



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4281) document update for range column and COLUMN_META_CACHE for complex column

2021-10-21 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4281.

Fix Version/s: 2.3.0
   Resolution: Fixed

> document update for range column and  COLUMN_META_CACHE for complex column
> --
>
> Key: CARBONDATA-4281
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4281
> Project: CarbonData
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 2.3.0
> Environment: Contents verified on Spark 2.4.5 and Spark 3.1.1
>Reporter: PRIYESH RANJAN
>Priority: Minor
> Fix For: 2.3.0
>
>
> +Modification 1 :+
> [https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md]
> Range column and COLUMN_META_CACHE does not support complex columns .This 
> details need to be updated in doc.
>  
> *+Query:+*
> CREATE TABLE alter_com(intField INT,EDUCATED string ,rankk string) STORED AS 
> carbondata 
> TBLPROPERTIES('inverted_index'='intField','sort_columns'='intField','COLUMN_META_CACHE'='rankk','range_column'='EDUCATED');
> insert into alter_com values(1,'pti','tanj');
> ALTER TABLE alter_com ADD COLUMNS(arr1 array>, arr2 
> array>>) ;
> *+For Range column :+*
> 0: jdbc:hive2://linux-29:22550/> ALTER TABLE alter_com SET 
> TBLPROPERTIES('COLUMN_META_CACHE'='arr2');
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> java.lang.RuntimeException: Alter table newProperties operation failed: arr2 
> is a complex type column and *complex type is not allowed for the option(s): 
> column_meta_cach*e
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:387)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:276)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:276)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:290)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> +*For COLUMN_META_CACHE* :+
> 0: jdbc:hive2://linux-29:22550/> ALTER TABLE alter_com SET 
> TBLPROPERTIES('range_column'='arr2');
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> java.lang.RuntimeException: Alter table newProperties operation failed: 
> *RANGE_COLUMN doesn't support ARRAY data type: arr2*
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:387)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:276)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:276)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
>  at 

[jira] [Resolved] (CARBONDATA-4298) IS_EMPTY_DATA_BAD_RECORD property not supported for complex types.

2021-10-21 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4298.

Fix Version/s: 2.3.0
   Resolution: Fixed

> IS_EMPTY_DATA_BAD_RECORD property not supported for complex types.
> --
>
> Key: CARBONDATA-4298
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4298
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> {{IS_EMPTY_DATA_BAD_RECORD}} property not supported for complex types. A flag 
> to determine if empty record is to be considered a bad record or not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4306) Query Performance issue with Spark 3.1

2021-10-20 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi updated CARBONDATA-4306:
---
Description: Some rules are applied many times while running benchmark 
queries like TPCDS and TPCH

> Query Performance issue with Spark 3.1
> --
>
> Key: CARBONDATA-4306
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4306
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi
>Priority: Major
>
> Some rules are applied many times while running benchmark queries like TPCDS 
> and TPCH



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4306) Query Performance issue with Spark 3.1

2021-10-20 Thread Indhumathi (Jira)
Indhumathi created CARBONDATA-4306:
--

 Summary: Query Performance issue with Spark 3.1
 Key: CARBONDATA-4306
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4306
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4243) Select filter query with to_date in filter fails for table with column_meta_cache configured also having SI

2021-10-07 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4243.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Select filter query with to_date in filter fails for table with 
> column_meta_cache configured also having SI
> ---
>
> Key: CARBONDATA-4243
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4243
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.2.0
> Environment: Spark 3.1.1, Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Fix For: 2.3.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Create table with column_meta_cache, create secondary indexes and load data 
> to table. 
> Execute the Select filter query with to_date in filter.
> CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
> string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
> int) stored as carbondata 
> TBLPROPERTIES('COLUMN_META_CACHE'='CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ');
>  CREATE INDEX indextable2 ON TABLE uniqdata (DOB) AS 'carbondata';
>  CREATE INDEX indextable3 ON TABLE uniqdata (DOJ) AS 'carbondata';
>  LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table 
> uniqdata OPTIONS('DELIMITER'=',' , 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
>  
> *Issue: Select filter query with to_date in filter fails for table with 
> column_meta_cache configured also having SI*
> 0: jdbc:hive2://10.21.19.14:23040/default> select 
> max(to_date(DOB)),min(to_date(DOB)),count(to_date(DOB)) from uniqdata where 
> to_date(DOB)='1975-06-11' or to_date(Dn select 
> max(to_date(DOB)),min(to_date(DOB)),count(to_date(DOB)) from uniqdata where 
> to_date(DOB)='1975-06-11' or to_date(DOB)='1975-06-23';
>  Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, 
> tree:
>  !BroadCastSIFilterPushJoin [none#0|#0], [none#1|#1], Inner, BuildRight
>  :- *(6) ColumnarToRow
>  : +- Scan CarbonDatasourceHadoopRelation chetan.uniqdata[dob#847024|#847024] 
> Batched: true, DirectScan: false, PushedFilters: [((cast(input[0] as date) = 
> 1987) or (cast(in9))], ReadSchema: [dob]
>  +- *(8) HashAggregate(keys=[positionReference#847161|#847161], functions=[], 
> output=[positionReference#847161|#847161])
>  +- ReusedExchange [positionReference#847161|#847161], Exchange 
> hashpartitioning(positionReference#847161, 200), ENSURE_REQUIREMENTS, 
> [id=#195473|#195473]
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(Sparation.scala:361)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> makeCopy, tree:
>  

[jira] [Resolved] (CARBONDATA-4303) Columns mismatch when insert into table with static partition

2021-10-25 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4303.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Columns mismatch when insert into table with static partition
> -
>
> Key: CARBONDATA-4303
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4303
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Reporter: Yahui Liu
>Priority: Minor
> Fix For: 2.3.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Following insert will have null value:
> CREATE TABLE select_from (i int, b string) stored as carbondata;
> CREATE TABLE table1 (i int) partitioned by (a int, b string) stored as 
> carbondata;
> insert into table select_from select 1, 'a';
> insert into table table1 partition(a='100',b) select 1, b from select_from;
> select * from table1;
> Expected:
> 1, 100, a
> Actual result:
> 1, 100, null



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4207) MV data getting lost

2021-07-20 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384056#comment-17384056
 ] 

Indhumathi commented on CARBONDATA-4207:


Hi Suyash,

Can you provide the CREATE MATERIALIZED VIEW SQL to replicate the issue?

In the FULL REFRESH case, the MV table is loaded with INSERT OVERWRITE. If that
load fails because of a system or application crash/failure, the MV will be
left without data and will be disabled. You then have to sync the data again
using the REFRESH MATERIALIZED VIEW command to enable it.

Please also let me know the reason for the insertion failure.
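
A minimal sketch of how to check and recover a disabled MV (the table name is
taken from the audit log in your description and the MV name is a placeholder;
adjust both to your actual names):

show materialized views on table Flow_TS_2day_stats_04062021;   -- check the ENABLED/DISABLED status and refresh mode
refresh materialized view your_timeseries_mv;                    -- full rebuild, re-enables the MV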

> MV data getting lost
> 
>
> Key: CARBONDATA-4207
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4207
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
>Reporter: suyash yadav
>Priority: Major
> Fix For: 2.0.1
>
>
> Hi Team,
> We have observed one more issue, We had created one table and a timeseries MV 
> on it. We had loaded almost 15 hours of data into it and then when we were 
> loading 16th hour data the loading failed because of some reason but it 
> caused MV to go empty. Our mv has now zero rows. Could you please let us know 
> if there is any bug or this is how it is supposed to work. Because our MV did 
> not have any avg function so ideally the loading to MV should have been 
> incremental , and in that case MV should not have got impacted if the 
> subsequent hour loading to main table failed. Please have a look into this 
> issue. And let us know what information you need.
>  
> scala> spark.sql("insert into Flow_TS_2day_stats_04062021 select 
> start_time,end_time,source_ip_address,destintion_ip_address,appname,protocol_id,source_tos,src_as,dst_as,source_mask,destination_mask,dst_tos,input_pkt,output_pkt,input_byt,output_byt,source_port,destination_port,in_interface,out_interface
>  from Flow_TS_1day_stats_24052021  where start_time>='2021-03-04 07:00:00' 
> and start_time< '2021-03-04 09:00:00'").show()
>  
> [1:38|https://carbondataworkspace.slack.com/archives/D01GLHKSAFL/p1623226096008700]
> scala> spark.sql("insert into Flow_TS_2day_stats_04062021 select 
> start_time,end_time,source_ip_address,destintion_ip_address,appname,protocol_id,source_tos,src_as,dst_as,source_mask,destination_mask,dst_tos,input_pkt,output_pkt,input_byt,output_byt,source_port,destination_port,in_interface,out_interface
>  from Flow_TS_1day_stats_24052021  where start_time>='2021-03-04 15:00:00' 
> and start_time< '2021-03-04 16:00:00'").show()
> 21/06/06 14:25:33 AUDIT audit: \{"time":"June 6, 2021 2:25:33 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"4069819623887063","opStatus":"START"}
> 21/06/06 14:44:14 AUDIT audit: \{"time":"June 6, 2021 2:44:14 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"4070940294400824","opStatus":"START"}
> 21/06/06 16:06:05 AUDIT audit: \{"time":"June 6, 2021 4:06:05 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"4070940294400824","opStatus":"SUCCESS","opTime":"4911240 
> ms","table":"default.Interface_Level_Agg_10min_MV_04062021","extraInfo":{"SegmentId":"6","DataSize":"4.52GB","IndexSize":"108.27KB"}}
> 21/06/06 16:06:09 AUDIT audit: \{"time":"June 6, 2021 4:06:09 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"4069819623887063","opStatus":"SUCCESS","opTime":"6036073 
> ms","table":"default.flow_ts_2day_stats_04062021","extraInfo":{"SegmentId":"6","DataSize":"12.37GB","IndexSize":"262.43KB"}}[^Stack_Trace]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (CARBONDATA-4239) Carbondata 2.1.1 MV : Incremental refresh : Doesnot aggregate data correctly

2021-07-14 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi closed CARBONDATA-4239.
--
Resolution: Won't Fix

> Carbondata 2.1.1 MV : Incremental refresh : Doesnot aggregate data correctly 
> -
>
> Key: CARBONDATA-4239
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4239
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, data-load
>Affects Versions: 2.1.1
> Environment: RHEL  spark-2.4.5-bin-hadoop2.7 for carbon 2.1.1 
>Reporter: Sushant Sammanwar
>Priority: Major
>  Labels: Materialistic_Views, materializedviews, refreshnodes
>
> Hi Team ,
> We are doing a POC with Carbondata using MV .
> Our MV doesnot contain AVG function as we wanted to utilize the feature of 
> incremental refresh.
> But with incremetnal refresh , we noticed the MV doesnot aggregate value 
> correctly.
> If a row is inserted , it creates another row in MV instead of adding 
> incremental value .
> As a result no. of rows in MV are almost same as raw table.
> This doesnot happen with full refresh MV. 
> Below is the data in MV with 3 rows :
> scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show()
> ++---+---+--+-+-++
> |fact_365_1_eutrancell_21_tags_id|fact_365_1_eutrancell_21_metric| ts| 
> sum_value|min_value|max_value|fact_365_1_eutrancell_21_ts2|
> ++---+---+--+-+-++
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 
> 06:30:00|5412.68105| 31.345| 4578.112| 2020-09-25 05:30:00|
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 05:30:00| 1176.7035| 
> 392.2345| 392.2345| 2020-09-25 05:30:00|
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 06:00:00| 58.112| 
> 58.112| 58.112| 2020-09-25 05:30:00|
> ++---+---+--+-+-++
> Below , i am inserting data for 6th hour, and it should add incremental 
> values to 6th hour row of MV. 
> Note the data being inserted ; columns which are part of groupby clause are 
> having same values as existing data.
> scala> carbon.sql("insert into fact_365_1_eutrancell_21 values ('2020-09-25 
> 06:05:00','eUtranCell.HHO.X2.InterFreq.PrepAttOut','ff6cb0f7-fba0-4134-81ee-55e820574627',118.112,'2020-09-25
>  05:30:00')").show()
> 21/06/28 16:01:31 AUDIT audit: \{"time":"June 28, 2021 4:01:31 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332282307468267","opStatus":"START"}
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:33 AUDIT audit: \{"time":"June 28, 2021 4:01:33 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332284066443156","opStatus":"START"}
> [Stage 40:=>(199 + 1) / 
> 200]21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row 
> batch one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:44 AUDIT audit: \{"time":"June 28, 2021 4:01:44 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332284066443156","opStatus":"SUCCESS","opTime":"11343 
> ms","table":"default.fact_365_1_eutrancell_21_30_minute","extraInfo":{}}
> 21/06/28 16:01:44 AUDIT audit: \{"time":"June 28, 2021 4:01:44 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332282307468267","opStatus":"SUCCESS","opTime":"13137 
> ms","table":"default.fact_365_1_eutrancell_21","extraInfo":{}}
> +--+
> |Segment ID|
> +--+
> | 8|
> +--+
> Below we can see it has added another row of 2020-09-25 06:00:00 .
> Note: All values of columns which are part of groupby caluse have same value.
> This means there should have been single row for 2020-09-25 06:00:00 .
> scala> carbon.sql("select * from 
> fact_365_1_eutrancell_21_30_minute").show(1000,false)
> ++--+---+--+-+-++
> |fact_365_1_eutrancell_21_tags_id |fact_365_1_eutrancell_21_metric |ts 
> |sum_value 

[jira] [Commented] (CARBONDATA-4132) Numer of records not matching in MVs

2021-07-14 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380491#comment-17380491
 ] 

Indhumathi commented on CARBONDATA-4132:


Hi Suyash,

I think this is not an issue. The data stored in the MV is partially-aggregated
data, because of the incremental data-loading concept.

Doing a select * / count(*) directly on the MV table will give
partially-aggregated results. If you want to check data correctness, fire the
query on which you created the MV; it performs the final aggregation on top of
the partially-aggregated data stored in the MV.

That should give you correct results. It is recommended to check data
correctness against the results of the MV query, not against the raw MV table.

To check whether that query is hitting the MV table or not, you can run the
EXPLAIN command with the query and check the plan.

Refer [https://github.com/apache/carbondata/blob/master/docs/mv-guide.md] for
more information.
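
For example, prefixing your minute-level MV query with EXPLAIN (the query is
taken from the MV definition in your description; the table name expected in
the plan assumes the MV is backed by a table with the same name as the MV):

explain select timeseries(end_ms,'minute') as end_ms, src_ip, dst_ip, app_name,
  in_if_id, src_tos, src_ip_group, dst_ip_group, protocol_id, bgp_src_as_num,
  bgp_dst_as_num, policy_qos_classification_hierarchy, policy_qos_queue_id,
  sum(in_octets) as octects, sum(in_packets) as packets,
  sum(out_packets) as out_packets, sum(out_octets) as out_octects
from Flow_Raw_TS
group by timeseries(end_ms,'minute'), src_ip, dst_ip, app_name, in_if_id,
  src_tos, src_ip_group, dst_ip_group, protocol_id, bgp_src_as_num,
  bgp_dst_as_num, policy_qos_classification_hierarchy, policy_qos_queue_id;

-- if the query is mapped to the MV, the plan should show a scan of
-- flow_raw_ts_agg_001_min instead of flow_raw_ts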

> Numer of records not matching in MVs
> 
>
> Key: CARBONDATA-4132
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4132
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
> Environment: Apache carbondata 2.0.1
>Reporter: suyash yadav
>Priority: Major
> Fix For: 2.0.1
>
>
> Hi Team, 
> We are working on a POC where we need to insert 300k records/second in a 
> table where we have already created Timeeries MVs with Minute,Hour,Day 
> granularity.
>  
> As per our the Minute based MV should contain 300K records till the insertion 
> of next minute data. Also the hour and Day based MVs should contain 300K 
> records till the arrival of next hour and next day data respectively.
>  
> But The count of records in MV is not coming out as per our expectation.It is 
> always more than our expectation.
> But the strange thing is, When we drop the MV and create the MV after 
> inserting the data in the table then the count if reocrds comes correct.So it 
> is clear there is no problem with MV definition and the data.
>  
> Kindly help us in resolving this issue on priority.Please find more details 
> below:
> Table definition:
> ===
> spark.sql("create table Flow_Raw_TS(export_ms bigint,exporter_ip 
> string,pkt_seq_num bigint,flow_seq_num int,src_ip string,dst_ip 
> string,protocol_id smallint,src_tos smallint,dst_tos smallint,raw_src_tos 
> smallint,raw_dst_tos smallint,src_mask smallint,dst_mask smallint,tcp_bits 
> int,src_port int,in_if_id bigint,in_if_entity_id bigint,in_if_enabled 
> boolean,dst_port int,out_if_id bigint,out_if_entity_id bigint,out_if_enabled 
> boolean,direction smallint,in_octets bigint,out_octets bigint,in_packets 
> bigint,out_packets bigint,next_hop_ip string,bgp_src_as_num 
> bigint,bgp_dst_as_num bigint,bgp_next_hop_ip string,end_ms timestamp,start_ms 
> timestamp,app_id string,app_name string,src_ip_group string,dst_ip_group 
> string,policy_qos_classification_hierarchy string,policy_qos_queue_id 
> bigint,worker_id int,day bigint ) stored as carbondata TBLPROPERTIES 
> ('local_dictionary_enable'='false')
> MV definition:
>  
> ==
> +*Minute based*+
> spark.sql("create materialized view Flow_Raw_TS_agg_001_min as select 
> timeseries(end_ms,'minute') as 
> end_ms,src_ip,dst_ip,app_name,in_if_id,src_tos,src_ip_group,dst_ip_group,protocol_id,bgp_src_as_num,
>  bgp_dst_as_num,policy_qos_classification_hierarchy, 
> policy_qos_queue_id,sum(in_octets) as octects, sum(in_packets) as packets, 
> sum(out_packets) as out_packets, sum(out_octets) as out_octects FROM 
> Flow_Raw_TS group by 
> timeseries(end_ms,'minute'),src_ip,dst_ip,app_name,in_if_id,src_tos,src_ip_group,
>  
> dst_ip_group,protocol_id,bgp_src_as_num,bgp_dst_as_num,policy_qos_classification_hierarchy,
>  policy_qos_queue_id").show()
> +*Hour Based*+
> val startTime = System.nanoTime
> spark.sql("create materialized view Flow_Raw_TS_agg_001_hour as select 
> timeseries(end_ms,'hour') as end_ms,app_name,sum(in_octets) as octects, 
> sum(in_packets) as packets, sum(out_packets) as out_packets, sum(out_octets) 
> as out_octects, in_if_id,src_tos,src_ip_group, 
> dst_ip_group,protocol_id,src_ip, dst_ip,bgp_src_as_num, 
> bgp_dst_as_num,policy_qos_classification_hierarchy, policy_qos_queue_id FROM 
> Flow_Raw_TS group by 
> timeseries(end_ms,'hour'),in_if_id,app_name,src_tos,src_ip_group,dst_ip_group,protocol_id,src_ip,
>  dst_ip,bgp_src_as_num,bgp_dst_as_num,policy_qos_classification_hierarchy, 
> policy_qos_queue_id").show()
> val endTime = System.nanoTime
> val elapsedSeconds = (endTime - startTime) / 1e9d



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4250) Ignore presto test cases as they are failing randomly, fix by below JIRA issue

2021-07-27 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4250.

Fix Version/s: 2.2.0
   Resolution: Fixed

> Ignore presto test cases as they are failing randomly, fix by below JIRA issue
> --
>
> Key: CARBONDATA-4250
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4250
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Jira raised for presto random test case failure fix in concurrent case.
>  *CARBONDATA-4249*
> Please get more details on this JIRA



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4253) Optimize Rename Table Performance

2021-07-27 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4253.

Resolution: Fixed

> Optimize Rename Table Performance
> -
>
> Key: CARBONDATA-4253
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4253
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.1.1
>Reporter: Hao
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The rename command will list partitions for the table, but the partitions 
> information is not actually used. If the table has hundreds of thousands 
> partitions, the performance of rename table will degrade a lot



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4251) optimize clean index file performance

2021-07-27 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4251.

Resolution: Fixed

> optimize clean index file performance
> -
>
> Key: CARBONDATA-4251
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4251
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2.0
>Reporter: Jiayu Shen
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> When cleanfile cleans up data, it cleans up all the carbonindex and 
> carbonmergeindex that once existed, even though many carbonindex have been 
> all deleted, which have been merged into carbonergeindex. considering that 
> there are tens of thousands of carbonindex that once existed after the 
> completion of the compaction, the clean file command will take serveral hours.
> Here, we just need to clean up the existing files, carbonmergeindex or 
> carbonindex files



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4326) mv created in beeline not hitting in sql/shell and vice versa if both beeline and sql/shell are running in parellel

2022-03-04 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4326.

Fix Version/s: 2.3.0
   Resolution: Fixed

> mv created in beeline not hitting in sql/shell and vice versa if both beeline 
> and sql/shell are running in parellel
> ---
>
> Key: CARBONDATA-4326
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4326
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [Steps] :-
> When MV is created in spark-shell/spark-sql on table created using Spark 
> Dataframe, Explain query hits MV in spark-shell/spark-sql, but doesnt hit MV 
> in spark-beeline, Same is the case when MV is created in spark-beeline on 
> table created using Spark Dataframe, query hits MV in spark-beeline, but 
> doesnt hit MV in spark-shell/spark-sql. This issue is faced when both 
> sessions are running in parallel during MV Creation. On restarting the 
> sessions of Spark-shell/ Spark-beeline, query hits the MV in both sessions.
> Queries Table created using Spark Dataframe:
> val geoSchema = StructType(Seq(StructField("timevalue", LongType, nullable = 
> true), StructField("longitude", LongType, nullable = false), 
> StructField("latitude", LongType, nullable = false))) val geoDf = 
> sqlContext.read.option("delimiter", ",").option("header", 
> "true").schema(geoSchema).csv("hdfs://hacluster/geodata/geodata.csv")
> sql("drop table if exists source_index_df").show() geoDf.write 
> .format("carbondata") .option("tableName", "source_index_df") 
> .mode(SaveMode.Overwrite) .save()
> Queries for MV created in spark-shell: sql("CREATE MATERIALIZED VIEW 
> datamap_mv1 as select latitude,longitude from source_index_df group by 
> latitude,longitude").show() sql("explain select latitude,longitude from 
> source_index_df group by latitude,longitude").show(100,false)
> Queries for MV created in spark-beeline/spark-sql: CREATE MATERIALIZED VIEW 
> datamap_mv1 as select latitude,longitude from source_index_df group by 
> latitude,longitude; explain select latitude,longitude from source_index_df 
> group by latitude,longitude;
> [Expected Result] :- mv created in beeline should hit the sql/shell and vice 
> versa if both beeline and sql/shell are running in parellel
> [Actual Issue]:- mv created in beeline not hitting in sql/shell and vice 
> versa if both beeline and sql/shell are running in parellel
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4325) Documentation Issue in Github Link: https://github.com/apache/carbondata/blob/master/docs/carbon-as-spark-datasource-guide.md and fix partition table creation with

2022-03-04 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4325.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Documentation Issue in Github Link: 
> https://github.com/apache/carbondata/blob/master/docs/carbon-as-spark-datasource-guide.md
>  and fix partition table creation with df issue
> 
>
> Key: CARBONDATA-4325
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4325
> Project: CarbonData
>  Issue Type: Bug
>  Components: docs
>Reporter: PURUJIT CHAUGULE
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: 
> Partition_Table_Creation_Fail_With_Spatial_Index_Property.png
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> *Scenario 1:*
> [https://github.com/apache/carbondata/blob/master/docs/carbon-as-spark-datasource-guide.md]
>  :
>  * Under _*SUPPORTED Options,*_ mention all supported Table Properties. 
> Following are list of supported Table Properties not mentioned in the 
> document:
>  * 
>  ** bucketNumber
>  ** bucketColumns
>  ** streaming
>  ** timestampformat
>  ** dateformat
>  ** SPATIAL_INDEX
>  ** SPATIAL_INDEX_type
>  ** SPATIAL_INDEX_sourcecolumns
>  ** SPATIAL_INDEX_originLatitude
>  ** SPATIAL_INDEX_gridSize
>  ** SPATIAL_INDEX_conversionRatio
>  ** SPATIAL_INDEX_class
> *Scenario 2:*
> _Partition Table Creation Using Spark Dataframe Fails with Spatial Index 
> Property._
> Queries:
> val geoSchema = StructType(Seq(StructField("timevalue", LongType, nullable = 
> true),
>       StructField("longitude", LongType, nullable = false),
>       StructField("latitude", LongType, nullable = false)))
> val geoDf = sqlContext.read.option("delimiter", ",").option("header", 
> "true").schema(geoSchema).csv("hdfs://hacluster/geodata/geodata.csv")
> sql("drop table if exists source_index_df").show()
> geoDf.write
>       .format("carbondata")
>       .option("tableName", "source_index_df")
>       .option("partitionColumns", "timevalue")
>       .option("SPATIAL_INDEX", "mygeohash")
>       .option("SPATIAL_INDEX.mygeohash.type", "geohash")
>       .option("spatial_index.MyGeoHash.sourcecolumns", "longitude, latitude")
>       .option("SPATIAL_INDEX.MyGeoHash.originLatitude", "39.832277")
>       .option("SPATIAL_INDEX.mygeohash.gridSize", "50")
>       .option("spatial_index.mygeohash.conversionRatio", "100")
>       .option("spatial_index.mygeohash.CLASS", 
> "org.apache.carbondata.geo.GeoHashIndex")
>       .mode(SaveMode.Overwrite)
>       .save()
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4322) Insert into local sort partition table select * from text table launch thousands tasks

2022-02-14 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4322.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Insert into local sort partition table select * from text table launches 
> thousands of tasks
> --
>
> Key: CARBONDATA-4322
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4322
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> [Reproduce steps]
>  # CREATE TABLE partitionthree1 (empno int, doj Timestamp, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int, empname String, designation String) PARTITIONED 
> BY (workgroupcategory int) STORED AS carbondata 
> tblproperties('sort_scope'='local_sort', 'sort_columns'='deptname,empname');
>  # CREATE TABLE partitionthree2 (empno int, doj Timestamp, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int, empname String, designation String) PARTITIONED 
> BY (workgroupcategory int);
>  # LOAD DATA local inpath 'hdfs://hacluster/user/data.csv' INTO TABLE 
> partitionthree1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"', 
> 'TIMESTAMPFORMAT'='dd-MM-');
>  # set hive.exec.dynamic.partition.mode=nonstrict;
>  # insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
>  # insert into partitionthree1 select * from partitionthree2;
>  
> [Expected Result]
> Step 6 launches only as many tasks as there are nodes.
>  
> [Current Behavior]
> The number of tasks is far larger than the number of nodes.
>  
> [Impact]
> At several production sites, query performance is impacted significantly.
>  
> [Initial Analysis]
> Insert into a non-partitioned local sort table launches as many tasks as there 
> are nodes; a partition table should behave the same way.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (CARBONDATA-4329) External Table Creation overwrites schema and drop external table deletes the location data

2022-03-24 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi updated CARBONDATA-4329:
---
Description: 
Issue 1:

When we create an external table on a transactional table's location, a schema 
file is already present at that location. While creating the external table, which 
is also transactional, the existing schema file is overwritten.

Issue 2:

If an external table is created on a location where the source table's data 
already exists, dropping the external table deletes the data at that location, 
and queries on the source table then fail.

> External Table Creation overwrites schema and drop external table deletes the 
> location data
> ---
>
> Key: CARBONDATA-4329
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4329
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi
>Priority: Major
>
> Issue 1:
> When we create an external table on a transactional table's location, a schema 
> file is already present at that location. While creating the external table, 
> which is also transactional, the existing schema file is overwritten.
> Issue 2:
> If an external table is created on a location where the source table's data 
> already exists, dropping the external table deletes the data at that location, 
> and queries on the source table then fail.
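
For illustration, a minimal SQL sketch of the two issues (the table names and the
location path below are hypothetical; the external table syntax follows the
CarbonData DDL documentation):

  CREATE TABLE source_t (id INT, name STRING) STORED AS carbondata;
  INSERT INTO source_t SELECT 1, 'a';
  -- Issue 1: creating a transactional external table on source_t's location
  -- overwrites the schema file already present there.
  CREATE EXTERNAL TABLE ext_t STORED AS carbondata
    LOCATION '<store path>/default/source_t';
  -- Issue 2: dropping the external table deletes the data at that location,
  -- so queries on source_t subsequently fail.
  DROP TABLE ext_t;
  SELECT * FROM source_t;   -- fails after the drop above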



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (CARBONDATA-4329) External Table Creation overwrites schema and drop external table deletes the location data

2022-03-24 Thread Indhumathi (Jira)
Indhumathi created CARBONDATA-4329:
--

 Summary: External Table Creation overwrites schema and drop 
external table deletes the location data
 Key: CARBONDATA-4329
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4329
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4328) Load parquet table with options error message fix

2022-03-29 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4328.

Fix Version/s: 2.3.1
   Resolution: Fixed

> Load parquet table with options error message fix
> -
>
> Key: CARBONDATA-4328
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4328
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.3.1
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> If a parquet table is created and a load statement with options is triggered, 
> it fails with NoSuchTableException: Table ${tableIdentifier.table} 
> does not exist.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (CARBONDATA-4331) MV query displays incorrect projection name

2022-04-13 Thread Indhumathi (Jira)
Indhumathi created CARBONDATA-4331:
--

 Summary: MV query displays incorrect projection name
 Key: CARBONDATA-4331
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4331
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (CARBONDATA-4331) MV query displays incorrect projection name

2022-04-13 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi updated CARBONDATA-4331:
---
Description: 
create materialized view mv_alias as select empname as e1, designation from 
fact_table1;

select empname,designation from fact_table1;

Actual:
||e1||designation||
|a|abc|

Expected:
||empname||designation||
|a|abc|

> MV query displays incorrect projection name
> ---
>
> Key: CARBONDATA-4331
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4331
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi
>Priority: Minor
>
> create materialized view mv_alias as select empname as e1, designation from 
> fact_table1;
> select empname,designation from fact_table1;
> Actual:
> ||e1||designation||
> |a|abc|
> Expected:
> ||empname||designation||
> |a|abc|



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4335) Disable the mv which is enabled by default

2022-06-01 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4335.

Fix Version/s: 2.3.1
   Resolution: Fixed

> Disable the mv which is enabled by default
> ---
>
> Key: CARBONDATA-4335
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4335
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Mahesh Raju Somalaraju
>Priority: Major
> Fix For: 2.3.1
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> 1) Disable carbon.enable.mv by default.
> 2) The MV rewrite rule adds overhead to concurrent queries, while MV is rarely used.
> 3) When needed, MV can be enabled explicitly and used (see the sketch below).
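
A minimal sketch of turning the feature back on. The property name carbon.enable.mv
comes from the issue; whether this particular property can be set dynamically from
a SQL session is an assumption here (if it cannot, the entry would instead be added
to carbon.properties before restarting):

  -- assumption: carbon.enable.mv is accepted by the SET command
  SET carbon.enable.mv=true;
  CREATE MATERIALIZED VIEW mv1 AS SELECT empname, designation FROM fact_table1;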



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (CARBONDATA-4336) Table Status Versioning

2022-05-12 Thread Indhumathi (Jira)
Indhumathi created CARBONDATA-4336:
--

 Summary: Table Status Versioning
 Key: CARBONDATA-4336
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4336
 Project: CarbonData
  Issue Type: New Feature
Reporter: Indhumathi


Currently, carbondata stores the records of a transaction 
(load/insert/IUD/add/drop segment) in a metadata file named 'tablestatus', 
which is present in the Metadata directory. If the tablestatus file is 
lost, the metadata for the transactions cannot be recovered directly, as 
there is no previous version file available for tablestatus. Hence, if we 
support versioning for tablestatus files, it will be easy to recover the 
current tablestatus metadata from previous version tablestatus files.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (CARBONDATA-4341) Drop Index Fails after TABLE RENAME

2022-06-21 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4341.

Fix Version/s: 2.3.1
   Resolution: Fixed

>  Drop Index Fails after TABLE RENAME
> 
>
> Key: CARBONDATA-4341
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4341
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.3.1
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Drop Index Fails after TABLE RENAME
> [Steps] :-
> From spark beeline the queries are executed.
> drop table if exists uniqdata;
> CREATE TABLE uniqdata(CUST_ID int, CUST_NAME string, ACTIVE_EMUI_VERSION string, 
> DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, 
> DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), 
> Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) 
> STORED AS carbondata;
> LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata 
> OPTIONS ('FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,
> BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,
> Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
> create index uniq2_index on table uniqdata(CUST_NAME) as 'carbondata';
> alter table uniqdata rename to uniqdata_i;
> drop index if exists uniq2_index on uniqdata_i;
> [Expected Result]: Drop Index should succeed after TABLE RENAME.
> [Actual Issue]: Drop Index fails after TABLE RENAME.
> Error message: Table or view 'uniqdata_i' not found in database 'default';



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (CARBONDATA-4344) Create MV fails with "LOCAL_DICTIONARY_INCLUDE/LOCAL _DICTIONARY_EXCLUDE column: does not exist in table. Please check the DDL" error

2022-06-27 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4344.

Fix Version/s: 2.3.1
   Resolution: Fixed

>  Create MV fails with "LOCAL_DICTIONARY_INCLUDE/LOCAL _DICTIONARY_EXCLUDE 
> column: does not exist in table. Please check the DDL" error
> --
>
> Key: CARBONDATA-4344
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4344
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.3.1
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [Steps] :-
> From spark beeline the queries are executed.
> drop table if exists uniqdata;
> CREATE TABLE uniqdata(CUST_ID int ,CUST_NAME string,ACTIVE_EMUI_VERSION 
> string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 
> int) STORED AS carbondata;
> LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table 
> uniqdata OPTIONS ('FILEHEADER'='CUST_ID,CUST_NAME 
> ,ACTIVE_EMUI_VERSION,DOB,DOJ, 
> BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, 
> Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
> alter table uniqdata add columns(a int);
> take a copy of the table store from hdfs
> Drop table uniqdata;
> put the copied store back in hdfs
> refresh table uniqdata;
> drop MATERIALIZED VIEW if exists uniq2_mv;
> create MATERIALIZED VIEW uniq2_mv as select CUST_NAME, sum(CUST_ID) from 
> uniqdata group by CUST_NAME;
> [Expected Result]: Create MV should be successful for a table created 
> in an older version.
> [Actual Issue]:- Create MV fails with "LOCAL_DICTIONARY_INCLUDE/LOCAL 
> _DICTIONARY_EXCLUDE column: does not exist in table. Please check the DDL" 
> error



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (CARBONDATA-4345) update/delete operation is failing when deleting the other format segments

2022-06-30 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4345.

Fix Version/s: 2.3.1
   Resolution: Fixed

> update/delete operation is failing when deleting the other format segments
> --
>
> Key: CARBONDATA-4345
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4345
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Mahesh Raju Somalaraju
>Priority: Major
> Fix For: 2.3.1
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> 1) create a carbon table
> 2) create parquet/orc tables
> 3) add the other-format segments to the carbon table with the alter add segment command
> 4) delete the other-format segments which were added in step 3
> 5) try to perform an update/delete operation on the carbon table. These operations 
> should not fail, but currently they do (a hedged SQL sketch of these steps follows below).
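
A hedged SQL sketch of the reproduce steps above (the table names and segment
path are hypothetical; the add/delete segment syntax follows the CarbonData
documentation):

  CREATE TABLE carbon_t (id INT, name STRING) STORED AS carbondata;
  CREATE TABLE parquet_t (id INT, name STRING) STORED AS parquet;
  INSERT INTO parquet_t SELECT 1, 'a';
  -- step 3: register the parquet data as a segment of the carbon table
  ALTER TABLE carbon_t ADD SEGMENT
    OPTIONS ('path'='<parquet_t table path>', 'format'='parquet');
  -- step 4: delete the added segment (segment id 0 here is an assumption)
  DELETE FROM TABLE carbon_t WHERE SEGMENT.ID IN (0);
  -- step 5: update/delete should still work on the carbon table, but currently fail
  UPDATE carbon_t SET (name)=('b') WHERE id = 1;
  DELETE FROM carbon_t WHERE id = 1;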



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CARBONDATA-4342) Desc Column Shows new Column added, even though alter add column operation failed

2022-06-21 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi updated CARBONDATA-4342:
---
Description: 
# Create table and add new column.
 # If Alter add column failed in the final step, then the revert operation is 
unsuccessful

 

> Desc Column Shows new Column added, even though alter add column operation 
> failed
> -
>
> Key: CARBONDATA-4342
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4342
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi
>Priority: Minor
>
> # Create table and add new column.
>  # If Alter add column failed in the final step, then the revert operation is 
> unsuccessful
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (CARBONDATA-4342) Desc Column Shows new Column added, even though alter add column operation failed

2022-06-21 Thread Indhumathi (Jira)
Indhumathi created CARBONDATA-4342:
--

 Summary: Desc Column Shows new Column added, even though alter add 
column operation failed
 Key: CARBONDATA-4342
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4342
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (CARBONDATA-4338) dropped partition data moving to trash

2022-07-18 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4338.

Fix Version/s: 2.3.1
   Resolution: Fixed

> dropped partition data moving to trash
> --
>
> Key: CARBONDATA-4338
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4338
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Mahesh Raju Somalaraju
>Priority: Minor
> Fix For: 2.3.1
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> dropped partition data moving to trash



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (CARBONDATA-4339) Nullpointer exception during load overwrite operation

2022-06-23 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4339.

Fix Version/s: 2.3.1
   Resolution: Fixed

> Nullpointer exception during load overwrite operation
> -
>
> Key: CARBONDATA-4339
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4339
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Akash R Nilugal
>Assignee: Akash R Nilugal
>Priority: Minor
> Fix For: 2.3.1
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> NullPointerException when a load overwrite is performed after delete segment 
> and clean files with the force option set to true (a hedged sketch follows below).
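
A minimal sketch of the scenario. The table t and staging table t_staging are
hypothetical; the clean files syntax with the force option follows the CarbonData
documentation, and insert overwrite is used here to represent the load overwrite
step:

  DELETE FROM TABLE t WHERE SEGMENT.ID IN (0);
  CLEAN FILES FOR TABLE t OPTIONS ('force'='true');
  -- the overwrite below hit a NullPointerException before the fix
  INSERT OVERWRITE TABLE t SELECT * FROM t_staging;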



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (CARBONDATA-4330) Incremental Dataload of Average aggregate in MV

2022-04-28 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4330.

Fix Version/s: 2.3.1
   Resolution: Fixed

>  Incremental Dataload of Average aggregate in MV
> ---
>
> Key: CARBONDATA-4330
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4330
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.3.1
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Currently, whenever an MV is created with an average aggregate, a full refresh is 
> done, meaning the whole MV is reloaded whenever new segments are added. This 
> slows down loading. With incremental data load, only the newly added segments 
> need to be loaded into the MV.
> If avg is present, rewrite the query with the sum and count of the columns to 
> create the MV and use them to derive avg (a hedged sketch follows below).
> Refer: 
> https://docs.google.com/document/d/1kPEMCX50FLZcmyzm6kcIQtUH9KXWDIqh-Hco7NkTp80/edit
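
A hedged sketch of the rewrite idea (the table and column names are hypothetical).
The MV keeps sum and count, which are incrementally mergeable per segment, and an
avg query is derived from them:

  -- MV defined with avg by the user
  CREATE MATERIALIZED VIEW sales_avg_mv AS
    SELECT city, avg(price) FROM sales GROUP BY city;
  -- internally rewritable to store mergeable aggregates, roughly:
  --   SELECT city, sum(price) AS sum_price, count(price) AS count_price
  --   FROM sales GROUP BY city;
  -- so a user query such as
  SELECT city, avg(price) FROM sales GROUP BY city;
  -- can be answered from the MV as sum_price / count_price.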



--
This message was sent by Atlassian Jira
(v8.20.7#820007)