[jira] [Created] (CARBONDATA-4140) The carbon implement of DataSourceV2

2021-02-25 Thread David Cai (Jira)
David Cai created CARBONDATA-4140:
-

 Summary: The carbon implement of DataSourceV2
 Key: CARBONDATA-4140
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4140
 Project: CarbonData
  Issue Type: Sub-task
Reporter: David Cai






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4139) Integration for Spark 3 and Hadoop 3

2021-02-25 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-4139:
--
Summary: Integration for Spark 3 and Hadoop 3  (was: integration for Spark 
3 and Hadoop 3)

> Integration for Spark 3 and Hadoop 3
> 
>
> Key: CARBONDATA-4139
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4139
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: David Cai
>Priority: Major
>






[jira] [Created] (CARBONDATA-4139) integration for Spark 3 and Hadoop 3

2021-02-25 Thread David Cai (Jira)
David Cai created CARBONDATA-4139:
-

 Summary: integration for Spark 3 and Hadoop 3
 Key: CARBONDATA-4139
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4139
 Project: CarbonData
  Issue Type: Sub-task
Reporter: David Cai








[jira] [Created] (CARBONDATA-4138) Carbon Expression Reorder instead of Spark Filter Reorder

2021-02-25 Thread David Cai (Jira)
David Cai created CARBONDATA-4138:
-

 Summary: Carbon Expression Reorder instead of Spark Filter Reorder
 Key: CARBONDATA-4138
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4138
 Project: CarbonData
  Issue Type: Sub-task
Reporter: David Cai








[jira] [Created] (CARBONDATA-4137) Refactor CarbonDataSourceScan without Spark Filter

2021-02-25 Thread David Cai (Jira)
David Cai created CARBONDATA-4137:
-

 Summary: Refactor CarbonDataSourceScan without Spark Filter
 Key: CARBONDATA-4137
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4137
 Project: CarbonData
  Issue Type: Sub-task
Reporter: David Cai








[jira] [Created] (CARBONDATA-4136) Support Spark 3 and Hadoop 3

2021-02-25 Thread David Cai (Jira)
David Cai created CARBONDATA-4136:
-

 Summary: Support Spark 3 and Hadoop 3
 Key: CARBONDATA-4136
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4136
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai








[jira] [Updated] (CARBONDATA-4075) Should refactor to use withEvents instead of fireEvent

2020-12-06 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-4075:
--
Summary: Should refactor to use withEvents instead of fireEvent  (was: 
Should refactor carbon to use withEvents instead of fireEvent)

> Should refactor to use withEvents instead of fireEvent
> --
>
> Key: CARBONDATA-4075
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4075
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Priority: Minor
>






[jira] [Created] (CARBONDATA-4075) Should refactor carbon to use withEvents instead of fireEvent

2020-12-06 Thread David Cai (Jira)
David Cai created CARBONDATA-4075:
-

 Summary: Should refactor carbon to use withEvents instead of 
fireEvent
 Key: CARBONDATA-4075
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4075
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai








[jira] [Created] (CARBONDATA-4074) Should clean stale data in success segments

2020-12-06 Thread David Cai (Jira)
David Cai created CARBONDATA-4074:
-

 Summary: Should clean stale data in success segments
 Key: CARBONDATA-4074
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4074
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai


Cleaning stale data in success segments includes the following parts:

1. clean stale delete delta (when force is true)

2. clean stale small files for the index table

3. clean stale data files for loading/compaction
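The third part amounts to a set difference between what is on disk and what the segment metadata references. A minimal sketch, assuming hypothetical names (`SegmentCleaner`, `findStaleFiles`) rather than the actual CarbonData API:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: a stale data file is one present on disk that no
// success segment's metadata references (hypothetical helper, not the
// real CarbonData cleaner).
public class SegmentCleaner {
    public static Set<String> findStaleFiles(Set<String> filesOnDisk,
                                             Set<String> indexedFiles) {
        Set<String> stale = new HashSet<>(filesOnDisk);
        stale.removeAll(indexedFiles);  // keep only unreferenced files
        return stale;
    }
}
```

Anything the metadata no longer references is a candidate for deletion; a real implementation also has to respect in-progress loads and compactions.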





[jira] [Updated] (CARBONDATA-4062) Should make clean files become data trash manager

2020-11-27 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-4062:
--
Description: 
To prevent accidental deletion of data, carbon will introduce data trash 
management. It will provide a buffer period in which an accidental delete 
operation can be rolled back.

Data trash management is a part of carbon data lifecycle management. Clean 
files, as the data trash manager, should contain the following two parts.
part 1: manage metadata-indexed data trash.
  This data is at the original place of the table and indexed by metadata. 
Carbon manages this data through the metadata index and should avoid using the 
listFile() interface.
part 2: manage the ".Trash" folder.
  For now, the ".Trash" folder has no metadata index, and operations on it are 
based on timestamps and the listFile() interface. In the future, carbon will 
index the ".Trash" folder to improve data trash management.

  was:
To prevent accidental deletion of data, carbon will introduce data trash 
management. It will provide buffer time for accidental deletion of data to roll 
back the delete operation.

Data trash management is a part of carbon data lifecycle management. Clean 
files as a data trash manager should contain the following two parts.
 part 1: manage metadata-indexed data trash.
 this data should be at the original place
 part 2: manage ".Trash" folder.
 Now this ".Trash" folder is without metadata index, and the operation on it 
depends on the timestamp and listFile interface. It should be improved in the 
future.


> Should make clean files become data trash manager
> -
>
> Key: CARBONDATA-4062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Priority: Major
>
> To prevent accidental deletion of data, carbon will introduce data trash 
> management. It will provide a buffer period in which an accidental delete 
> operation can be rolled back.
> Data trash management is a part of carbon data lifecycle management. Clean 
> files, as the data trash manager, should contain the following two parts.
> part 1: manage metadata-indexed data trash.
>   This data is at the original place of the table and indexed by metadata. 
> Carbon manages this data through the metadata index and should avoid using the 
> listFile() interface.
> part 2: manage the ".Trash" folder.
>   For now, the ".Trash" folder has no metadata index, and operations on it are 
> based on timestamps and the listFile() interface. In the future, carbon will 
> index the ".Trash" folder to improve data trash management.
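The timestamp-based handling of the ".Trash" folder (part 2) can be sketched as follows; `TrashSweeper`, the retention window, and the map-based listing are illustrative stand-ins for the real listFile()-based code:

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative timestamp-based expiry for ".Trash": entries whose
// modification time is older than the retention window are returned as
// deletion candidates. The map stands in for a listFile() result.
public class TrashSweeper {
    public static Set<String> expired(Map<String, Long> trashEntries,
                                      long nowMillis, long retentionMillis) {
        return trashEntries.entrySet().stream()
                .filter(e -> nowMillis - e.getValue() > retentionMillis)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }
}
```

Indexing the trash folder, as the ticket proposes, would replace this listing-plus-timestamp scan with a metadata lookup.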





[jira] [Updated] (CARBONDATA-4062) Should make clean files become data trash manager

2020-11-27 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-4062:
--
Description: 
To prevent accidental deletion of data, carbon will introduce data trash 
management. It will provide buffer time for accidental deletion of data to roll 
back the delete operation.

Data trash management is a part of carbon data lifecycle management. Clean 
files as a data trash manager should contain the following two parts.
 part 1: manage metadata-indexed data trash.
 this data should be at the original place
 part 2: manage ".Trash" folder.
 Now this ".Trash" folder is without metadata index, and the operation on it 
depends on the timestamp and listFile interface. It should be improved in the 
future.

  was:
To prevent accidental deletion of data, carbon will introduce data trash 
management. It will provide buffer time for accidental deletion of data to roll 
back the delete operation.

Data trash management is a part of carbon data lifecycle management. Clean 
files as a data trash manager should contain the following two parts.
 part 1: manage metadata-indexed data trash.
 this data should be at the original place
 part 2: manage ".Trash" folder.
 Now this ".Trash" folder is without metadata index, and the operation on it 
depends on the listFile interface. It should be improved in the future.


> Should make clean files become data trash manager
> -
>
> Key: CARBONDATA-4062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Priority: Major
>
> To prevent accidental deletion of data, carbon will introduce data trash 
> management. It will provide buffer time for accidental deletion of data to 
> roll back the delete operation.
> Data trash management is a part of carbon data lifecycle management. Clean 
> files as a data trash manager should contain the following two parts.
>  part 1: manage metadata-indexed data trash.
>  this data should be at the original place
>  part 2: manage ".Trash" folder.
>  Now this ".Trash" folder is without metadata index, and the operation on it 
> depends on the timestamp and listFile interface. It should be improved in the 
> future.





[jira] [Updated] (CARBONDATA-4062) Should make clean files become data trash manager

2020-11-27 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-4062:
--
Description: 
To prevent accidental deletion of data, carbon will introduce data trash 
management. It will provide buffer time for accidental deletion of data to roll 
back the delete operation.

Data trash management is a part of carbon data lifecycle management. Clean 
files as a data trash manager should contain the following two parts.
 part 1: manage metadata-indexed data trash.
 this data should be at the original place
 part 2: manage ".Trash" folder.
 Now this ".Trash" folder is without metadata index, and the operation on it 
depends on the listFile interface. It should be improved in the future.

  was:To prevent accidental deletion of data, carbon introduced a data garbage 
manager. It will provide buffer time for accidental deletion of data to roll 
back the delete operation.


> Should make clean files become data trash manager
> -
>
> Key: CARBONDATA-4062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Priority: Major
>
> To prevent accidental deletion of data, carbon will introduce data trash 
> management. It will provide buffer time for accidental deletion of data to 
> roll back the delete operation.
> Data trash management is a part of carbon data lifecycle management. Clean 
> files as a data trash manager should contain the following two parts.
>  part 1: manage metadata-indexed data trash.
>  this data should be at the original place
>  part 2: manage ".Trash" folder.
>  Now this ".Trash" folder is without metadata index, and the operation on it 
> depends on the listFile interface. It should be improved in the future.





[jira] [Updated] (CARBONDATA-4062) Should make clean files become data trash manager

2020-11-27 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-4062:
--
Summary: Should make clean files become data trash manager  (was: should 
Make clean files become data trash manager)

> Should make clean files become data trash manager
> -
>
> Key: CARBONDATA-4062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Priority: Major
>
> To prevent accidental deletion of data, carbon introduced a data garbage 
> manager. It will provide buffer time for accidental deletion of data to roll 
> back the delete operation.





[jira] [Created] (CARBONDATA-4062) should Make clean files become data trash manager

2020-11-27 Thread David Cai (Jira)
David Cai created CARBONDATA-4062:
-

 Summary: should Make clean files become data trash manager
 Key: CARBONDATA-4062
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai


To prevent accidental deletion of data, carbon introduced a data garbage 
manager. It will provide buffer time for accidental deletion of data to roll 
back the delete operation.





[jira] [Resolved] (CARBONDATA-4015) RetryCount and retryInterval of updateLock and compactLock is fixed as 3 when they try to get lock

2020-10-08 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai resolved CARBONDATA-4015.
---
Resolution: Fixed

> RetryCount and retryInterval of updateLock and compactLock is fixed as 3 when 
> they try to get lock 
> ---
>
> Key: CARBONDATA-4015
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4015
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 2.0.1
>Reporter: Kejian Li
>Priority: Trivial
> Fix For: 2.1.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>






[jira] [Created] (CARBONDATA-3960) Column comment should be null by default when adding column

2020-08-25 Thread David Cai (Jira)
David Cai created CARBONDATA-3960:
-

 Summary: Column comment should be null by default when adding 
column
 Key: CARBONDATA-3960
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3960
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai


1. create table

create table test_add_column_with_comment(
 col1 string comment 'col1 comment',
 col2 int,
 col3 string)
 stored as carbondata

2 . alter table

alter table test_add_column_with_comment add columns(
col4 string comment "col4 comment",
col5 int,
col6 string comment "")

3. describe table

describe test_add_column_with_comment

+--------+---------+------------+
|col_name|data_type|comment     |
+--------+---------+------------+
|col1    |string   |col1 comment|
|col2    |int      |null        |
|col3    |string   |null        |
|col4    |string   |col4 comment|
|col5    |int      |            |
|col6    |string   |            |
+--------+---------+------------+

The comment of col5 is "" by default, but it should be null.





[jira] [Created] (CARBONDATA-3958) CDC Merge task can't finish

2020-08-24 Thread David Cai (Jira)
David Cai created CARBONDATA-3958:
-

 Summary: CDC Merge task can't finish
 Key: CARBONDATA-3958
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3958
 Project: CarbonData
  Issue Type: Bug
Reporter: David Cai


1. The merge tasks take a long time and can't finish in some cases.
2. We find the warning "This scenario should not happen" in the log.





[jira] [Created] (CARBONDATA-3930) MVExample is throwing DataLoadingException

2020-07-28 Thread David Cai (Jira)
David Cai created CARBONDATA-3930:
-

 Summary: MVExample is throwing DataLoadingException
 Key: CARBONDATA-3930
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3930
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 2.1.0
Reporter: David Cai


[Reproduce]

Run 
examples/spark/src/main/scala/org/apache/carbondata/examples/MVExample.scala in 
IDEA

[LOG]

Exception in thread "main" org.apache.carbondata.processing.exception.DataLoadingException: The input file does not exist: /***/carbondata/integration/spark-common-test/src/test/resources/sample.csv
Exception in thread "main" org.apache.carbondata.processing.exception.DataLoadingException: The input file does not exist: /home/david/Documents/code/carbondata/integration/spark-common-test/src/test/resources/sample.csv
    at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:81)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at org.apache.spark.util.FileUtils$.getPaths(FileUtils.scala:77)
    at org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:97)
    at org.apache.spark.sql.execution.command.AtomicRunnableCommand$$anonfun$run$3.apply(package.scala:148)
    at org.apache.spark.sql.execution.command.AtomicRunnableCommand$$anonfun$run$3.apply(package.scala:145)
    at org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:104)
    at org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:141)
    at org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:145)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$$anonfun$51.apply(Dataset.scala:3265)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3264)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
    at org.apache.carbondata.examples.MVExample$.exampleBody(MVExample.scala:67)
    at org.apache.carbondata.examples.MVExample$.main(MVExample.scala:37)
    at org.apache.carbondata.examples.MVExample.main(MVExample.scala)





[jira] [Created] (CARBONDATA-3924) Should add default dynamic parameters only one time in one JVM process

2020-07-27 Thread David Cai (Jira)
David Cai created CARBONDATA-3924:
-

 Summary: Should add default dynamic parameters only one time in 
one JVM process
 Key: CARBONDATA-3924
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3924
 Project: CarbonData
  Issue Type: Bug
Reporter: David Cai


Because the ConfigEntry.registerEntry method cannot register the same entry 
twice, the default dynamic parameters should be added only one time in one JVM 
process.
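One common way to enforce one-time registration per JVM is a compare-and-set guard; the names below (`DynamicParamRegistrar`, `registerDefaultsOnce`) are illustrative, not the actual CarbonData fix:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative once-per-JVM guard: the first caller wins, later calls
// are no-ops, so registerEntry is never invoked twice for the same set
// of default dynamic parameters.
public class DynamicParamRegistrar {
    private static final AtomicBoolean REGISTERED = new AtomicBoolean(false);

    public static boolean registerDefaultsOnce(Runnable registration) {
        if (REGISTERED.compareAndSet(false, true)) {
            registration.run();
            return true;   // performed the registration
        }
        return false;      // already done in this JVM process
    }
}
```

The AtomicBoolean makes the guard safe even when multiple sessions initialize concurrently in the same JVM.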





[jira] [Created] (CARBONDATA-3889) Should optimize the code of the inspection result of Intellij IDEA

2020-07-04 Thread David Cai (Jira)
David Cai created CARBONDATA-3889:
-

 Summary: Should optimize the code of the inspection result of 
Intellij IDEA
 Key: CARBONDATA-3889
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3889
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai








[jira] [Updated] (CARBONDATA-3888) Should move .flattened-pom.xml to target folder

2020-07-04 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-3888:
--
Description: After .flattened-pom.xml is generated in the project folder, 
it impacts the project import of IntelliJ IDEA  (was: When )

> Should move .flattened-pom.xml to target folder
> ---
>
> Key: CARBONDATA-3888
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3888
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Priority: Minor
>
> After .flattened-pom.xml is generated in the project folder, it impacts 
> the project import of IntelliJ IDEA





[jira] [Updated] (CARBONDATA-3888) Should move .flattened-pom.xml to target folder

2020-07-04 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-3888:
--
Description: When 

> Should move .flattened-pom.xml to target folder
> ---
>
> Key: CARBONDATA-3888
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3888
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Priority: Minor
>
> When 





[jira] [Created] (CARBONDATA-3888) Should move .flattened-pom.xml to target folder

2020-07-04 Thread David Cai (Jira)
David Cai created CARBONDATA-3888:
-

 Summary: Should move .flattened-pom.xml to target folder
 Key: CARBONDATA-3888
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3888
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai








[jira] [Resolved] (CARBONDATA-3878) Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'

2020-06-28 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai resolved CARBONDATA-3878.
---
Resolution: Fixed

> Should get the last modified time from 'tablestatus' file instead of segment 
> file to reduce file operation 'getLastModifiedTime' 
> -
>
> Key: CARBONDATA-3878
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3878
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Assignee: David Cai
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (CARBONDATA-3878) Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'

2020-06-28 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai reassigned CARBONDATA-3878:
-

Assignee: David Cai

> Should get the last modified time from 'tablestatus' file instead of segment 
> file to reduce file operation 'getLastModifiedTime' 
> -
>
> Key: CARBONDATA-3878
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3878
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Assignee: David Cai
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Created] (CARBONDATA-3878) Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'

2020-06-28 Thread David Cai (Jira)
David Cai created CARBONDATA-3878:
-

 Summary: Should get the last modified time from 'tablestatus' file 
instead of segment file to reduce file operation 'getLastModifiedTime' 
 Key: CARBONDATA-3878
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3878
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai








[jira] [Created] (CARBONDATA-3870) Global lock impact the performance of the concurrent query

2020-06-24 Thread David Cai (Jira)
David Cai created CARBONDATA-3870:
-

 Summary: Global lock impact the performance of the concurrent 
query
 Key: CARBONDATA-3870
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3870
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai








[jira] [Assigned] (CARBONDATA-3837) Should fallback to the original plan when MV rewrite throw exception

2020-06-01 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai reassigned CARBONDATA-3837:
-

Assignee: David Cai

> Should fallback to the original plan when MV rewrite throw exception
> 
>
> Key: CARBONDATA-3837
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3837
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Assignee: David Cai
>Priority: Major
> Fix For: 2.0.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-3835) Global sort doesn't sort string columns properly

2020-05-31 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai resolved CARBONDATA-3835.
---
Fix Version/s: (was: 2.0.0)
   2.0.1
   Resolution: Fixed

> Global sort doesn't sort string columns properly
> 
>
> Key: CARBONDATA-3835
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3835
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Major
> Fix For: 2.0.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> problem:
> For global sort without partition, strings come as byte[]. If we use the string 
> comparator (StringSerializableComparator), it will convert the byte[] via 
> toString, which gives an address, and the comparison goes wrong.
>  
> solution: change the data type to byte before choosing the comparator.
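The failure mode is visible with plain JDK calls: `toString()` on a byte[] returns a type-and-identity string of the form "[B@&lt;hex&gt;", effectively an address, so ordering by that string is meaningless, while `java.util.Arrays.compareUnsigned` gives the intended lexicographic order. A standalone illustration, not the CarbonData comparator itself:

```java
import java.util.Arrays;

// byte[].toString() inherits Object.toString(): "[B@" plus an identity
// hash, i.e. effectively an address, so ordering by it is meaningless.
public class ByteArrayOrdering {
    public static boolean looksLikeAddress(byte[] a) {
        return a.toString().startsWith("[B@");
    }

    // The intended ordering: compare the actual bytes, unsigned and
    // lexicographic (available since Java 9).
    public static int properCompare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```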





[jira] [Created] (CARBONDATA-3837) Should fallback to the original plan when MV rewrite throw exception

2020-05-31 Thread David Cai (Jira)
David Cai created CARBONDATA-3837:
-

 Summary: Should fallback to the original plan when MV rewrite 
throw exception
 Key: CARBONDATA-3837
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3837
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai








[jira] [Updated] (CARBONDATA-3812) Data load jobs are missing output metrics

2020-05-08 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-3812:
--
Attachment: Screenshot from 2020-05-09 11-54-59.png

> Data load jobs are missing output metrics
> -
>
> Key: CARBONDATA-3812
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3812
> Project: CarbonData
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: David Cai
>Priority: Minor
> Attachments: Screenshot from 2020-05-09 11-54-59.png
>
>






[jira] [Updated] (CARBONDATA-3812) Data load jobs are missing output metrics

2020-05-08 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-3812:
--
Description: Please check the attachments; the output item is empty.

> Data load jobs are missing output metrics
> -
>
> Key: CARBONDATA-3812
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3812
> Project: CarbonData
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: David Cai
>Priority: Minor
> Attachments: Screenshot from 2020-05-09 11-54-59.png
>
>
> Please check the attachments; the output item is empty.





[jira] [Created] (CARBONDATA-3812) Data load jobs are missing output metrics

2020-05-08 Thread David Cai (Jira)
David Cai created CARBONDATA-3812:
-

 Summary: Data load jobs are missing output metrics
 Key: CARBONDATA-3812
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3812
 Project: CarbonData
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: David Cai








[jira] [Created] (CARBONDATA-3810) Partition column name should be case insensitive

2020-05-08 Thread David Cai (Jira)
David Cai created CARBONDATA-3810:
-

 Summary: Partition column name should be case insensitive
 Key: CARBONDATA-3810
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3810
 Project: CarbonData
  Issue Type: Bug
Reporter: David Cai


[Reproduce]

create table cs_insert_p
(id int, Name string)
stored as carbondata
partitioned by (c1 int, c2 int, C3 string)


insert into table cs_insert_p
partition(c1=3, C2=111, c3='2019-11-18')
select 200, 'cc'

 

It will throw NoSuchElementException: key not found: c2
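A standard fix is to resolve the partition spec through a case-insensitive map, so c2 and C2 hit the same column. A sketch using the JDK's `TreeMap` (the actual CarbonData patch may differ):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative fix: copy the user-supplied partition spec into a map
// ordered by String.CASE_INSENSITIVE_ORDER so that "C2" and "c2"
// resolve to the same entry instead of raising "key not found".
public class PartitionSpecResolver {
    public static Map<String, String> caseInsensitive(Map<String, String> spec) {
        Map<String, String> resolved = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        resolved.putAll(spec);
        return resolved;
    }
}
```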





[jira] [Closed] (CARBONDATA-910) Implement Partition feature

2020-05-06 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai closed CARBONDATA-910.

Resolution: Invalid

deprecated since 2.0

> Implement Partition feature
> ---
>
> Key: CARBONDATA-910
> URL: https://issues.apache.org/jira/browse/CARBONDATA-910
> Project: CarbonData
>  Issue Type: New Feature
>  Components: core, data-load, data-query
>Reporter: Cao, Lionel
>Assignee: Cao, Lionel
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Why we need a partition table
> A partition table provides an option to divide a table into smaller pieces. 
> With a partition table:
>   1. Data could be better managed, organized and stored. 
>   2. We can avoid full table scans in some scenarios and improve query 
> performance. (partition column in filter, 
>   multiple partition tables joined on the same partition column, etc.)
> Partitioning design
> Range Partitioning   
>range partitioning maps data to partitions according to the range of 
> partition column values, operator '<' defines non-inclusive upper bound of 
> current partition.
> List Partitioning
>list partitioning allows you to map data to partitions with a specific 
> value list
> Hash Partitioning
>hash partitioning maps data to partitions with a hash algorithm and puts 
> them into the given number of partitions
> Composite Partitioning(2 levels at most for now)
>Range-Range, Range-List, Range-Hash, List-Range, List-List, List-Hash, 
> Hash-Range, Hash-List, Hash-Hash
> DDL-Create 
> Create table sales(
>  itemid long, 
>  logdate datetime, 
>  customerid int
>  ...
>  ...)
> [partition by range logdate(...)]
> [subpartition by list area(...)]
> Stored By 'carbondata'
> [tblproperties(...)];
> range partition: 
>  partition by range logdate(<  '2016-01-01', < '2017-01-01', < 
> '2017-02-01', < '2017-03-01', < '2099-01-01')
> list partition:
>  partition by list area('Asia', 'Europe', 'North America', 'Africa', 
> 'Oceania')
> hash partition:
>  partition by hash(itemid, 9) 
> composite partition:
>  partition by range logdate(<  '2016-01-01', < '2017-01-01', < 
> '2017-02-01', < '2017-03-01', < '2099-01-01')
>  subpartition by list area('Asia', 'Europe', 'North America', 'Africa', 
> 'Oceania')
> DDL-Rebuild, Add
> Alter table sales rebuild partition by (range|list|hash)(...);
> Alter table sales add partition (< '2018-01-01');#only supports range 
> partitioning, list partitioning
> Alter table sales add partition ('South America');
> #Note: No delete operation for partition, please use rebuild. 
> If you need to delete data, use the delete statement, but the definition of 
> the partition will not be deleted.
> Partition Table Data Store
> [Option One]
> Use the current design, keep partition folder out of segments
> Fact
>|___Part0
>|  |___Segment_0
>| |___ ***-[bucketId]-.carbondata
>| |___ ***-[bucketId]-.carbondata
>|  |___Segment_1
>|  ...
>|___Part1
>|  |___Segment_0
>|  |___Segment_1
>|...
> [Option Two]
> Remove the partition folder, add the partition id into the file name, and 
> build a B-tree on the driver side.
> Fact
>|___Segment_0
>|  |___ ***-[bucketId]-[partitionId].carbondata
>|  |___ ***-[bucketId]-[partitionId].carbondata
>|___Segment_1
>|___Segment_2
>...
> Pros & Cons: 
> Option one would be faster at locating target files.
> Option two needs to store more metadata about folders.
> Partition Table MetaData Store
> Partition info should be stored in the file footer/index file and loaded into 
> memory before user queries.
> Relationship with Bucket
> Bucketing should be at a lower level than partitioning.
> Partition Table Query
> Example:
> Select * from sales
> where logdate <= date '2016-12-01';
> Users should remember to add a partition filter when writing SQL on a 
> partition table.
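The pruning enabled by such a partition filter can be sketched as follows. The bounds mirror the range-partition DDL example above; the helper name and pruning logic are hypothetical, not CarbonData code:

```python
# Hypothetical sketch of range-partition pruning for
#   select * from sales where logdate <= date '2016-12-01'
# Upper bounds mirror the DDL example; each partition pid covers
# [previous_bound, bound). ISO dates compare correctly as strings.

UPPER_BOUNDS = ["2016-01-01", "2017-01-01", "2017-02-01", "2017-03-01", "2099-01-01"]

def prune_partitions(filter_upper: str) -> list:
    """Keep only partitions that can contain rows with logdate <= filter_upper."""
    kept = []
    for pid, bound in enumerate(UPPER_BOUNDS):
        # a partition can match unless its lower bound is already above
        # the filter value
        lower = UPPER_BOUNDS[pid - 1] if pid > 0 else None
        if lower is None or lower <= filter_upper:
            kept.append(pid)
    return kept

print(prune_partitions("2016-12-01"))  # only the first two partitions are scanned
```

With the filter `logdate <= '2016-12-01'`, only partitions 0 and 1 need to be scanned; the other three are skipped entirely, which is the query-performance benefit described above.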



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-2917) Should support binary datatype

2020-05-06 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai resolved CARBONDATA-2917.
---
Resolution: Fixed

> Should support binary datatype
> --
>
> Key: CARBONDATA-2917
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2917
> Project: CarbonData
>  Issue Type: Improvement
>  Components: file-format
>Affects Versions: 1.5.0
>Reporter: David Cai
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-2776) Support ingesting data from Kafka service

2020-05-06 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai resolved CARBONDATA-2776.
---
Resolution: Fixed

> Support ingesting data from Kafka service
> -
>
> Key: CARBONDATA-2776
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2776
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: David Cai
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-3021) Streaming throw Unsupported data type exception

2020-05-06 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai resolved CARBONDATA-3021.
---
Resolution: Fixed

> Streaming throw Unsupported data type exception
> ---
>
> Key: CARBONDATA-3021
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3021
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: David Cai
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:343)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
> Caused by: org.apache.carbondata.streaming.CarbonStreamException: Job failed 
> to write data file
>   at 
> org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:288)
>   at 
> org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply(CarbonAppendableStreamSink.scala:238)
>   at 
> org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply(CarbonAppendableStreamSink.scala:238)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
>   at 
> org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileJob(CarbonAppendableStreamSink.scala:238)
>   at 
> org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink.addBatch(CarbonAppendableStreamSink.scala:133)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply$mcV$sp(StreamExecution.scala:666)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply(StreamExecution.scala:666)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply(StreamExecution.scala:666)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch(StreamExecution.scala:665)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:306)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
>   at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:290)
>   ... 1 more
> Caused by: java.lang.IllegalArgumentException: Unsupported data type: LONG
>   at 
> org.apache.carbondata.core.util.comparator.Comparator.getComparatorByDataTypeForMeasure(Comparator.java:73)
>   at 
> org.apache.carbondata.streaming.segment.StreamSegment.mergeBatchMinMax(StreamSegment.java:471)
>   at 
> org.apache.carbondata.streaming.segment.StreamSegment.updateStreamFileIndex(StreamSegment.java:610)
>   at 
> org.apache.carbondata.streaming.segment.StreamSegment.updateIndexFile(StreamSegment.java:627)
>   at 
> org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:277)
>   ... 20 more





[jira] [Resolved] (CARBONDATA-2923) should log the info of the min/max identification on streaming table

2020-05-06 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai resolved CARBONDATA-2923.
---
Resolution: Fixed

> should log the info of the min/max identification on streaming table
> 
>
> Key: CARBONDATA-2923
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2923
> Project: CarbonData
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: David Cai
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, the query doesn't log the info of the min/max identification on 
> the streaming table, so we don't know whether the min/max of the streaming 
> table is working fine or not.





[jira] [Resolved] (CARBONDATA-3641) Should improve data loading performance for partition table

2020-05-06 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai resolved CARBONDATA-3641.
---
Resolution: Fixed

> Should improve data loading performance for partition table
> ---
>
> Key: CARBONDATA-3641
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3641
> Project: CarbonData
>  Issue Type: Improvement
>  Components: data-load
>Reporter: David Cai
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [Background]
>  # only implemented commit algorithm version 1
>  # generated too many segment files during loading
>  # generated too many small data files and index files
> [Modification]
>       1.  Implemented the carbon commit algorithm to avoid moving data files 
> and index files
>       2.  Generate the final segment file directly
>       3.  Optimize global_sort to avoid the small-files issue





[jira] [Resolved] (CARBONDATA-3347) support SORT_COLUMNS modification

2020-05-06 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai resolved CARBONDATA-3347.
---
Resolution: Fixed

> support SORT_COLUMNS modification
> -
>
> Key: CARBONDATA-3347
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3347
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Reporter: David Cai
>Assignee: David Cai
>Priority: Major
> Attachments: sort_columns modification.pdf, sort_columns 
> modification_v2.pdf
>
>
> *Background*
> Now SORT_COLUMNS can’t be modified after the table is created. If we want to 
> modify SORT_COLUMNS on this table, we need to create a new table and migrate 
> the data. If the data is huge, the migration will take a long time and may 
> even impact the user's business.
> SORT_SCOPE in table properties can be modified now. And we can specify a new 
> SORT_SCOPE during data loading. The carbon index file will mark whether a 
> segment is sorted or not. So different segments may have different 
> SORT_SCOPE.
> *Motivation*
> After the table is created, the user can adjust SORT_SCOPE/SORT_COLUMNS 
> according to their business. History segments will still use the old 
> SORT_SCOPE/SORT_COLUMNS, but the user can also re-sort old segments one by 
> one if needed.
> But we still suggest the user give a proper SORT_SCOPE/SORT_COLUMNS when they 
> create the table, because the modification will take significant resources to 
> re-sort the data of old segments.
>  
> please check design doc for more detail.
> [^sort_columns modification_v2.pdf]





[jira] [Created] (CARBONDATA-3803) Should mark CarbonSession as deprecated in version 2.0

2020-05-06 Thread David Cai (Jira)
David Cai created CARBONDATA-3803:
-

 Summary: Should mark CarbonSession as deprecated in version 2.0
 Key: CARBONDATA-3803
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3803
 Project: CarbonData
  Issue Type: Wish
Affects Versions: 2.0.0
Reporter: David Cai


Better to use CarbonExtensions instead of CarbonSession in version 2.0.

We should mark CarbonSession as deprecated in version 2.0.





[jira] [Created] (CARBONDATA-3756) the query of stage files only read the first blocklet of each carbondata file

2020-03-27 Thread David Cai (Jira)
David Cai created CARBONDATA-3756:
-

 Summary: the query of stage files only read the first blocklet of 
each carbondata file
 Key: CARBONDATA-3756
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3756
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai


The query of stage files only reads the first blocklet of each carbondata 
file.

If a file contains multiple blocklets, the query result will be wrong.





[jira] [Created] (CARBONDATA-3752) Query on carbon table should support reusing Exchange

2020-03-26 Thread David Cai (Jira)
David Cai created CARBONDATA-3752:
-

 Summary: Query on carbon table should support reusing Exchange
 Key: CARBONDATA-3752
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3752
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai


Query on carbon table should support reusing Exchange

[Reproduce]

create table t1(c1 int, c2 string) using carbondata

insert into t1 values(1, 'abc')

explain
 select c2, sum(c1) from t1 group by c2
 union all
 select c2, sum(c1) from t1 group by c2

[Physical Plan]
{noformat}
Union
:- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
:  +- Exchange hashpartitioning(c2#37, 200)
:     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
:        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct
+- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   +- Exchange hashpartitioning(c2#37, 200)
      +- *(3) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
         +- *(3) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct{noformat}
It should reuse the Exchange like the following:
{noformat}
Union
:- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
:  +- Exchange hashpartitioning(c2#37, 200)
:     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
:        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct
+- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   +- ReusedExchange [c2#37, sum#54L], Exchange hashpartitioning(c2#37, 200){noformat}





[jira] [Created] (CARBONDATA-3668) CarbonSession should use old flow (not CarbonExtensions flow)

2020-01-20 Thread David Cai (Jira)
David Cai created CARBONDATA-3668:
-

 Summary: CarbonSession should use old flow (not CarbonExtensions 
flow)
 Key: CARBONDATA-3668
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3668
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai


Considering backward compatibility, CarbonSession should use the old flow (not 
the CarbonExtensions flow).





[jira] [Created] (CARBONDATA-3641) Should improve data loading performance for partition table

2019-12-29 Thread David Cai (Jira)
David Cai created CARBONDATA-3641:
-

 Summary: Should improve data loading performance for partition 
table
 Key: CARBONDATA-3641
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3641
 Project: CarbonData
  Issue Type: Improvement
  Components: data-load
Reporter: David Cai


[Background]
 # only implemented commit algorithm version 1
 # generated too many segment files during loading
 # generated too many small data files and index files

[Modification]

      1.  Implemented the carbon commit algorithm to avoid moving data files 
and index files

      2.  Generate the final segment file directly

      3.  Optimize global_sort to avoid the small-files issue





[jira] [Assigned] (CARBONDATA-3547) Delete duplicate data in a segment

2019-10-12 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai reassigned CARBONDATA-3547:
-

Assignee: David Cai

> Delete duplicate data in a segment
> --
>
> Key: CARBONDATA-3547
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3547
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: David Cai
>Assignee: David Cai
>Priority: Major
>






[jira] [Updated] (CARBONDATA-3545) support deduplication

2019-10-12 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-3545:
--
Description: Support deleting duplicate data in the table

> support deduplication 
> --
>
> Key: CARBONDATA-3545
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3545
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: David Cai
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Support deleting duplicate data in the table





[jira] [Assigned] (CARBONDATA-3545) support deduplication

2019-10-12 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai reassigned CARBONDATA-3545:
-

Assignee: David Cai

> support deduplication 
> --
>
> Key: CARBONDATA-3545
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3545
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: David Cai
>Assignee: David Cai
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Support deleting duplicate data in the table





[jira] [Assigned] (CARBONDATA-3546) Delete duplicate data between segments

2019-10-12 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai reassigned CARBONDATA-3546:
-

Assignee: David Cai

> Delete duplicate data between segments
> --
>
> Key: CARBONDATA-3546
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3546
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: David Cai
>Assignee: David Cai
>Priority: Major
>






[jira] [Created] (CARBONDATA-3547) Delete duplicate data in a segment

2019-10-12 Thread David Cai (Jira)
David Cai created CARBONDATA-3547:
-

 Summary: Delete duplicate data in a segment
 Key: CARBONDATA-3547
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3547
 Project: CarbonData
  Issue Type: Sub-task
Reporter: David Cai








[jira] [Created] (CARBONDATA-3546) Delete duplicate data between segments

2019-10-12 Thread David Cai (Jira)
David Cai created CARBONDATA-3546:
-

 Summary: Delete duplicate data between segments
 Key: CARBONDATA-3546
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3546
 Project: CarbonData
  Issue Type: Sub-task
Reporter: David Cai








[jira] [Updated] (CARBONDATA-3545) support deduplication

2019-10-12 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai updated CARBONDATA-3545:
--
Description: (was: delete duplicate the data from new segments if old 
segments exist the same data.

delete repeated col1
from t1
where new.segment.id between 3 and 4
and old.segment.id between 0 and 2)
 Issue Type: New Feature  (was: Improvement)
Summary: support deduplication   (was: support delete repeated data 
from a segment if the data is exists in other segments)

> support deduplication 
> --
>
> Key: CARBONDATA-3545
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3545
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: David Cai
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Created] (CARBONDATA-3545) support delete repeated data from a segment if the data exists in other segments

2019-10-10 Thread David Cai (Jira)
David Cai created CARBONDATA-3545:
-

 Summary: support delete repeated data from a segment if the data 
exists in other segments
 Key: CARBONDATA-3545
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3545
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai


Delete duplicate data from the new segments if the old segments contain the same data.

delete repeated col1
from t1
where new.segment.id between 3 and 4
and old.segment.id between 0 and 2
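The proposed statement above can be sketched in Python. The segment layout, key column, and helper names below are assumptions for illustration only, not the CarbonData implementation:

```python
# Hypothetical sketch of the proposed cross-segment deduplication:
# drop rows from new segments (3-4) whose col1 value already exists
# in old segments (0-2). Data layout is invented for illustration.

segments = {
    0: [{"col1": "a"}, {"col1": "b"}],
    1: [{"col1": "c"}],
    2: [],
    3: [{"col1": "b"}, {"col1": "d"}],  # "b" duplicates segment 0
    4: [{"col1": "c"}, {"col1": "e"}],  # "c" duplicates segment 1
}

def dedupe(segments, key, new_ids, old_ids):
    """Remove rows from new segments whose key value appears in old segments."""
    seen = {row[key] for sid in old_ids for row in segments[sid]}
    for sid in new_ids:
        segments[sid] = [row for row in segments[sid] if row[key] not in seen]
    return segments

dedupe(segments, "col1", new_ids=[3, 4], old_ids=[0, 1, 2])
print(segments[3], segments[4])  # only "d" and "e" remain in the new segments
```

Note that only the new segments are rewritten; the old segments are left untouched, matching the asymmetry of the `new.segment.id` / `old.segment.id` ranges in the proposed syntax.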





[jira] [Created] (CARBONDATA-3544) CLI should support an option to show statistics for all columns

2019-10-10 Thread David Cai (Jira)
David Cai created CARBONDATA-3544:
-

 Summary: CLI should support an option to show statistics for all 
columns
 Key: CARBONDATA-3544
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3544
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai


Better to add an option -C to show statistics for all columns.


